
Corpus Linguistics

Corpus

The word corpus (pl. corpora or corpuses) is Latin for "body"; hence, according to McEnery and Wilson, "any collection of more than one text can be called a corpus." In this broad sense, a corpus is any body of text.

Introduction

  • Corpus linguistics has undergone a remarkable renaissance in recent years.
  • It has become an increasingly prevalent methodology in many areas of linguistics, despite the unpopularity of the approach in the 1960s and 1970s.
  • From being a marginalised approach used largely in English linguistics, and more specifically in studies of English grammar, corpus linguistics has started to widen its scope.
  • Corpus linguistics is also increasingly multilingual, with many languages and many varieties of those languages being studied with the help of corpus data.

What is Corpus Linguistics?

Corpus linguistics is perhaps best described in simple terms as "the study of language based on examples of 'real life' language use."

Is Corpus Linguistics a Branch of Linguistics?

For some scholars the answer is yes, but the question is one of many debates in the field.

  • Among those who consider it a branch of linguistics, one view holds that corpus linguistics comprises "all involved activities of data gathering and theorising which inevitably lead to qualitative changes in the understanding of language."

Corpus as a Branch of Linguistics

  • For this reason, it should be considered an independent branch of linguistics, a "new research enterprise and a new philosophical approach to linguistic enquiry" (Tognini-Bonelli 2001).

Opposite View/Corpus as a Methodology

  • Scholars holding the opposite view consider corpus linguistics (CL) simply a "methodology" (and not an aspect of language requiring explanation or description).
  • Corpus linguistics represents a new approach to the use of empirical data which is functional and applicable to (just about) every field of linguistics, for example, lexicography, applied linguistics, critical discourse analysis, stylistics and studies in language variation.

Early Corpus Linguistics

  • It is a term used to describe linguistics before the advent of Chomsky.
  • The dominant methodological approach to linguistics immediately prior to Chomsky was based upon observed language use.
  • The term thus links all pre-Chomskyan linguistics to the modern methodology of corpus linguistics, with which it has an affinity.
  • We could argue that those linguists who worked before Chomsky were interested in gathering and studying from...
  • While not identifying themselves with the term "corpus linguistics", field linguists such as Boas (1940), and later linguists of the structuralist tradition, all used a basic methodology that can undoubtedly be called "corpus-based".
  • Fries and Traver (1940) and Fongers (1944) are examples of linguists who used the corpus in research on foreign language pedagogy.
  • Eaton's (1940) study also shows evidence of a corpus-based inclination: he compared the frequency of word meanings in Dutch, French, German and Italian.
  • The semantic frequency lists used by Eaton were also used by other researchers interested in monolingual description. Lorge (1949) is an example.
  • Fries also gives an example of a descriptive grammar of English based on a corpus. This work anticipates the corpus-based grammars of the late 1980s.

But after the 1950s, the corpus as a source of data underwent a period of almost total unpopularity and neglect.

As a methodology, it was widely perceived as discredited.

Why are we so sure about the fifties?

Because of one man, whose criticism of the corpus as a source of information is well documented!

Noam Chomsky (1928-)

Why look in a corpus for what lies in your head?

  • According to Chomsky, corpora are, by their very nature, incomplete.
  • Language is non-enumerable: it allows an infinite number of sentences.
  • Hence no finite corpus can adequately represent language.
  • Corpora are skewed. Some sentences are in the corpus because they are frequent constructions, some by sheer chance.
  • Corpora can never be a useful tool for the linguist as the linguist must seek to model language competence rather than performance.

Competence

  • = Our tacit, internalized knowledge of a language
  • = What we know about language
  • The linguist attempts to explain and characterize a speaker's knowledge of the language

Performance

  • = External evidence of language competence and its usage on particular occasions
  • = How we use language
  • Performance is a poor mirror of competence because it may be influenced by factors other than our competence.

"A corpus is a collection of externalised utterances. It is performance data and, as such, it must be a poor guide to modeling linguistic competence".

  • They helped to train academics for the grammatical analysis of English.
  • Geoffrey Leech started a computer research centre which has produced a number of well-known corpus linguistics projects, including:
    • The famous Lancaster-Oslo/Bergen Corpus (LOB) and more recently
    • The British National Corpus.
  • The work of Francis and Kučera, as well as that of Quirk and his colleagues, inspired the creation of English corpus centres elsewhere, in Scandinavia and in Western and Eastern Europe.

Work After Firth

  • Who was Firth?
    • He became a professor of English at Lahore, India.
    • His papers were collected and published in a single volume in 1957, in which he outlines an approach to language where social context and the social purpose of communication are paramount.
    • “The central concept... is the context of situation. In that context are the human participant, what they say, what is going on. The phonetician can find his phonetic context, and the grammarian and the lexicographer theirs.” - J. R. Firth (1957:29)
    • “Attested language... duly recorded is the focus of attention for the linguist.” - J. R. Firth.
  • What happened after Firth...
    • His ideas dominated much of British linguistics for the best part of a generation.
    • His exhortation to study “attested language” inspired the linguists we now call neo-Firthian, such as Halliday and Sinclair.
    • His term “collocation” is still very much used in modern corpus linguistics.
  • The largest programme of research by neo-Firthian corpus linguists has been the COBUILD project, carried out at the University of Birmingham by John Sinclair and his team from around 1980 onwards.

Conclusions

  • During the 1950s, a series of criticisms were made of the corpus-based approach to language study.
  • Some were right, some were half-right and some have proved themselves to be wrong or irrelevant.
  • Some linguists carried on using the corpus as a technique and tried to establish a balance between the use of the corpus and the use of intuition.
  • Although the methodology went through a period of relative neglect for two decades, it was far from abandoned.
  • Indeed, important advances in the use of corpora were made during this period.

There are still few examples of comparable corpora.

One of the clearest is ICE, the International Corpus of English, which gathers around one million words for each of the many varieties of English around the world (all the components have been assembled following the same criteria).

Specific Corpora

In recent times, there has been considerable growth in the production of corpora representing the use of languages for specific purposes. These corpora vary considerably in size.

Corpora for specific purposes include texts from fields such as economics, medicine, science and technology, computer science, academic writing, and literature.

Learner Corpora

These are made up of texts produced by learners of a language. They consist of samples of the written or oral production of students of a foreign language.

They include information about the age, education, mother tongue and learning situation of the non-native speakers.

They are used to identify the characteristics of interlanguage as well as to carry out error analysis.

They are considered useful tools for both language teachers and students.

The use of learner corpora encourages autonomy and language awareness, and such corpora often become a source of learning and teaching materials and activities.

Translation Corpora

Translation corpora differ from parallel corpora in that they do not represent texts in translation.

Rather, they allow one to compare, for example, L1 English texts in one genre with L1 Italian texts in the same genre.

Why Use Translation Corpora?

  • Parallel corpora are being exploited for a variety of translation-related tasks.
  • However, translation corpora have been developed to overcome the problems of artificiality and error that corpora incorporating translated material can pose, for both machines and humans.
Details
Academic year 2016-2017
Scientific-disciplinary sector (SSD): Sciences of antiquity, philology, literature and art history; L-LIN/12 English language and translation

The contents of this page are personal reworkings by the Publisher hardrockmetallover97 of information learned by attending the lectures of Lingua e traduzione inglese I (English Language and Translation I) and by independent study of reference books, in preparation for the final exam or thesis. They are not to be taken as official material of the Università degli Studi di Palermo or of Prof. Sciarrino Chiara.