Sunday, April 22, 2012

Corpus, Corpus, Corpus


For this week's assignment, I reviewed three corpuses . . . corpusi . . . corpora: the American National Corpus, the Corpus of Contemporary American English, and the Cambridge English Corpus. Here are my thoughts:

The American National Corpus contains over 14 million words drawn from authentic texts donated by contributors. The goal of this corpus is to enable software designers to analyze typical American English so that their products and the web will “handle [actual] American usage.” This corpus is principled, authentic, and accessible, and it would be a good resource for business professionals in software and web design.

The Corpus of Contemporary American English contains 425 million words collected from more than 175,000 sources, including spoken language, fiction, popular magazines, newspapers, and academic journals. This corpus does not have any stated goals or explanatory information. The “see notes” link, which should let readers verify the authenticity of the texts, doesn’t work. The corpus is downloadable, and no membership is required to access the data. This corpus might be great for an individual researcher. However, I would not recommend it because the authenticity of the corpus cannot be verified.

The Cambridge English Corpus would be the best site for educators and language learners to use. The goal of the corpus is “to help in writing books for learners of English.” This is a principled corpus. It contains 1.76 billion words taken from authentic sources: “newspapers, best-selling novels, non-fiction books on a wide range of topics, websites, magazines, junk mail, TV and radio programmes, recordings of people's everyday conversations and many other sources.” This corpus consists of eight corpora specializing in spoken English in the UK, business language, spoken English in North America, business reports and documents in the UK and US, legal English, financial English from the UK and US, academic English from the UK and US, and student exam scripts from the Cambridge ESOL exams.

This corpus is ideal for educators and textbook developers. Only members have full access, but many features of this corpus are available to the public. The corpus provides free online learning materials, including interactive quizzes and games. I will recommend this to my tutees.
Oh, Corpora!
