Corpus Linguistics - Corpora Typology

Revised on 13/04/2026

by alessia.lento

Content verified and approved by the Skuola.net Team of Experts

Corpus Linguistics notes for Professor Freddi's exam. They include a summary of all the corpora covered in the Corpus Linguistics handbook, neatly schematized with …

Exam: Corpus Linguistics

Faculty: Arts and Philosophy

From the course of Prof. Maria Freddi

University: Università degli Studi di Pavia

Academic year 2013-2014

4 pages

Document excerpt

Corpora typology

Hansard reports corpus

The Hansard reports corpus is a corpus of parliamentary debates held in the UK. It is an opportunistic corpus: it follows neither a particular sampling frame nor the collection of an ever-larger body of data (as a monitor corpus does), and simply represents what it was possible to gather. It was built to exploit the first machine-readable material available at the time. However, it is not a very reliable record of what was actually said, because the transcribers are known to make certain changes, such as adding information about the speakers and the people referred to. The transcripts also omit situational references, such as those that manage turn-taking, so MPs appear to speak one after the other without any meta-comment on how and when to speak.

Canadian Hansard

The Canadian Hansard is a parallel corpus: it contains a source text (ST) in one language together with its translations into one or more other languages.
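
The idea of a parallel corpus can be sketched as a list of aligned sentence pairs. This is only a minimal illustration: the `AlignedPair` class, the `find_translations` helper, and the English and French example sentences are all invented for this sketch and are not taken from the Canadian Hansard or from any particular corpus tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One aligned unit of a parallel corpus (hypothetical structure)."""
    source: str  # sentence in the source text (ST)
    target: str  # its translation


# Invented example sentences, not real Hansard data.
pairs = [
    AlignedPair("The House will now proceed.",
                "La Chambre va maintenant procéder."),
    AlignedPair("I thank the honourable member.",
                "Je remercie l'honorable député."),
]


def find_translations(pairs, term):
    """Return the translations of every source sentence containing `term`."""
    term = term.lower()
    return [p.target for p in pairs if term in p.source.lower()]


result = find_translations(pairs, "member")
print(result)
```

Because each source sentence is stored next to its translation, a search in one language immediately returns the corresponding text in the other, which is what makes parallel corpora useful for translation studies.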

Helsinki corpus

The Helsinki corpus is a historical corpus compiled in the 1980s. It spans from 850 CE to the 18th century, possibly the widest time span of any available corpus. It attempts to cover a variety of text types and contains 1.5 million words, although the coverage of some periods is inevitably scanty.

ARCHER (A Representative Corpus of Historical English Registers)

Created by Biber at Northern Arizona University, ARCHER is one of the most important diachronic corpora of recent years. It contains 1.7 million words and represents both a spread of time periods and a spread of genres. Its focus is on the last 350 years, and it supports the analysis of diachronic change with an emphasis on genre variation. It is well suited to research on the emergence of the grammatical structures that characterize present-day English.

Specialized historical corpora

Corpus of Early English Correspondence: consists solely of letters.
Lancaster Newsbooks Corpus: contains only very early news publications from the 1650s.
Corpus of English Dialogues 1560-1760: represents historical speech. Since no audio recording existed at that time, it collects written dialogues, such as transcripts of court trials and scripts of plays, because they are likely to approximate speech more closely than other genres of writing.

COLT corpus of London teenage speech

This corpus takes a thoughtful approach to the issue of anonymity in corpus building. The recordings were made in London, and the recruits were school pupils. It includes time alignment, like ONZE (Origins of New Zealand English) and ICE-GB.

International Corpus of English (ICE)

The ICE represents one language (English) in a number of its international varieties, allowing these varieties to be compared and contrasted. It is a family of matched corpora. Its British component, ICE-GB, is parsed, with the main syntactic constituents bracketed, and is explored with ICECUP (the ICE Corpus Utility Program; third-generation software from the 1990s), which gives a clear graphical display of the parse trees (treebank). The corpus includes spoken data and also covers areas where English is spoken mainly as a foreign or second language, such as Hong Kong and India. However, it lacks the diachronic dimension of the Brown family of corpora.
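
What it means to bracket the main syntactic constituents can be sketched with a toy tree. This is only an illustration under stated assumptions: the sentence, the category labels (`S`, `NP`, `VP` and so on), and the `bracket` helper are invented for this sketch and do not reproduce the ICE-GB annotation scheme or the ICECUP output format.

```python
# A constituent is (label, list_of_children); a leaf is (label, word).
# Invented sentence and generic labels, not ICE-GB data.
tree = ("S", [
    ("NP", [("PRON", "She")]),
    ("VP", [
        ("V", "reads"),
        ("NP", [("DET", "the"), ("N", "report")]),
    ]),
])


def bracket(node):
    """Render a tree as a labelled bracketing, one pair of brackets per constituent."""
    label, content = node
    if isinstance(content, str):  # leaf: a single word
        return f"({label} {content})"
    return f"({label} " + " ".join(bracket(child) for child in content) + ")"


bracketed = bracket(tree)
print(bracketed)
# prints: (S (NP (PRON She)) (VP (V reads) (NP (DET the) (N report))))
```

A treebank stores one such structure per sentence; a graphical display draws the same nesting as a tree rather than as a bracketed string.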

Details

SSD (scientific-disciplinary sector)

Antiquity, philological-literary and historical-artistic sciences, L-LIN/12 Language and translation - English language

The contents of this page are the Publisher alessia.lento's personal reworking of information learned by attending the Corpus Linguistics lectures and by independent study of any reference books, in preparation for the final exam or the thesis. They should not be regarded as official material of the Università degli Studi di Pavia or of Prof. Maria Freddi.
