Corpus Resources ENGL 2017

Online corpus PolyU Language Bank Over 36 million words of multilingual, multi-genre corpora free
RCPCE Profession-specific Corpora A large collection of texts used in different professions in Hong Kong free
A Query to Internet Corpora (Leeds U) Updated general-purpose online corpora in different languages
British National Corpus (1980-1993) A standard English corpus often used as a reference corpus.  
British Academic Written English corpus (BAWE) A 6-million-word collection of student essays in different disciplines
Business Letter Corpus A corpus of different English letters
BYU Corpora
A collection of mega-corpora, including the BNC and NOW (News on the Web, from 2010 to yesterday)
The Corpus of Contemporary American English (COCA, 1990-present) Representative of modern American English
Time Magazine (1923-2006) A corpus for diachronic language study free
GloWbE (Global Web-Based English) 1.9 billion words of English used in 20 countries free
MICASE Transcripts of a wide range of spoken academic texts from the University of Michigan. free
The Oxford Text Archive The Archive develops, collects, catalogues and preserves a variety of electronic literary and linguistic resources free
WebCorp Allows corpus-type searches of documents in English on the Internet. free
LAMAL A link to different corpora PW
Fashion Communication Corpus (FCC) A 1-million-word collection of texts obtained from fashion magazines, literature, journals, websites, etc. PW
Enron email corpus Enron email data sets compiled at UC Berkeley free
Corpora maintained by Geoffrey Sampson A collection of different texts
Parallel corpus Bilingual Parallel Corpora of Chinese Classics Parallel texts of Chinese classic novels and government documents
English-Chinese parallel concordancer A collection of novels, fables and essays free

Text Archive

The Gutenberg Project
The pioneering project designed to make non-copyright text available electronically free
The Internet Archive
"The Internet Archive Text Archive contains a wide range of fiction, popular books, children's books, historical texts and academic books." free
Word cloud

Voyant Tools
Creates word clouds based on word frequency free
Bookworm A simple and powerful way to visualize trends in repositories of digitized texts.
Wordle Wordle is a tool for generating "word clouds" from text that you provide. free
Corpus tools ConcGram A program for concordancing and concgram text analysis free
AntConc A freeware concordance program for Windows free
ParaConc A bilingual or multilingual concordancer that can be used in contrastive analyses and translation studies free trial
WordSmith Tools Concordancing, word lists, key words free (4.0) 
Leximancer Lexical analysis free trial
WMatrix In addition to frequency lists and concordances, WMatrix extends the keywords method to key grammatical categories and key semantic domains. free trial
Sketch Engine Sketch Engine can provide a one-page summary of the word’s grammatical and collocational behavior, showing the word’s collocates categorised by grammatical relations.
ATLAS.ti (7) For qualitative data analysis and discourse analysis free trial
NVivo (10) For qualitative data analysis and discourse analysis  
kfNgram kfNgram makes n-gram indices of any text(s) you give it, similar to the Cluster function in WordSmith Tools. free
The IMS Open Corpus Workbench   free
Lexical analysers The Ultimate Research Assistant Lexical semantic thematic analysis of web documents free
Tagger CLAWS Word class (part-of-speech) tagger free
Stanford Log-linear Part Of Speech tagger Different software for POS tagging free
GUM The Georgetown University Multilayer Corpus free
Phonetic analysis Praat Praat (the Dutch word for "talk") is a free scientific software program for the analysis of speech in phonetics. free

EMU (The Emu Speech Database System) EMU is a collection of software tools for the creation, manipulation and analysis of speech databases. free
WaveSurfer WaveSurfer is an Open Source tool for sound visualization and manipulation. free


Speech Analyzer Speech Analyzer is a computer program for acoustic analysis of speech sounds. free
Development workbench KPML Workbench for developing grammatical descriptions and defining computational grammars free


Database for developing and storing terminologies free
Descriptive resources WordNet A lexical database organizing nouns, verbs, adjectives and adverbs into synonym sets, each representing one underlying lexical concept. free


A lexical database containing around 1,200 semantic frames, 13,000 lexical units and over 190,000 example sentences. free
