The Open American National Corpus
The Open American National Corpus (OANC) is a massive electronic collection of American English, including texts of all genres and transcripts of spoken data produced from 1990 onward. All data and annotations are fully open and unrestricted for any use.
Available Data and Annotations
Contribute Text, Annotations, and Derived Data
The OANC and the Manually Annotated Sub-Corpus (MASC) are collaboratively developed resources that rely on contributions of data and annotations from the linguistics and natural language processing communities as well as the public at large.
We solicit contributions of written texts and spoken transcripts in American English, produced in or after 1990, for inclusion in the OANC and/or MASC.
Native speakers of American English (Am I a Native Speaker?) who have produced documents of any kind (including college student essays, blogs, poetry, fiction, and email) are invited to become a part of linguistic history by contributing these materials to the OANC/MASC. Authors can consult the frequently asked questions page to learn more about how the data will be used and why they should consider contributing their work to the OANC.
Those who have developed corpora of post-1989 American English for any purpose are also encouraged to contribute their unrestricted data. We also ask users to contribute annotations for linguistic features of any kind on all or part of the OANC and/or MASC, as well as data derived from OANC/MASC, such as word lists, for free distribution and use.