The American National Corpus has obtained the anc.org domain name. We would like to thank the Animal News Center for transferring the domain to us when they decided not to renew it. In honor of the Animal News Center the American National Corpus has made a donation to the Humane Society of the United States on their behalf.
October 22, 2008: Version 1.2.5 of the ANC Tool is now available. The new version fixes a problem that prevented it from starting on Mac OS X.
July 24, 2008: Version 1.2.3 of the ANC Tool is now available. The new version includes better support for selecting the Unicode character encoding, a few bug fixes, and (experimental) NLTK output.
The open portion of the ANC (approximately 15 million words of text, with annotations) is now available for download.
Frequency counts for the second release are now available and can be downloaded here.
Both sets of annotations can be downloaded from our annotations page.
The ANC, in collaboration with the FrameNet project, WordNet, and Columbia University, has received a grant from the National Science Foundation to produce a balanced sub-corpus of the ANC that is manually annotated for WordNet senses and FrameNet frames, and validated for word and sentence boundaries, part of speech, noun chunks, and verb chunks.
The newly formed Association for Computational Linguistics Special Interest Group for Annotations (SIGANN) is publishing a 40K-word "sharable corpus" consisting of texts drawn from the ANC 2nd Release, with the intent of gathering as many annotations of the corpus as possible.
The American National Corpus (ANC) project is creating a massive electronic collection of American English, including texts of all genres and transcripts of spoken data produced from 1990 onward. The ANC will provide the most comprehensive picture of American English ever created, and will serve as a resource for education, linguistic and lexicographic research, and technology development.
When completed, the ANC will contain a core corpus of at least 100 million words, comparable across genres to the British National Corpus (BNC). The corpus will also include an "opportunistic" component of potentially several hundred million words, chosen to provide both the broadest and largest selection of texts (and, where available, annotations) possible.
Do you have public domain (or "sharealike") texts in American English produced in or after 1990? You can upload all or parts of this data to be included in the ANC.
Authors may consult the frequently asked questions page to learn more about how the data will be used and why they should consider contributing their work to the ANC.
If you have annotated any part of the ANC for linguistic features of any kind, or produced linguistic information derived from it, please contribute the annotations to the ANC so that they can be freely distributed to and used by anyone who has the ANC data.
ANC annotations in the format specified for the Linguistic Annotation Format developed by ISO TC37 SC4, and a version of the ANC Tool that handles data in this format.
The ANC is working with annotation projects that are generating layers of annotation for some or all of the following: Penn Treebank-style syntactic annotations, PropBank, NomBank, TimeML, and opinion annotations. The data and annotations from these projects will be added to the ANC.
The American National Corpus project has received support from the ANC Consortium, the TalkBank project, the Department of Chinese, Translation, and Linguistics at the City University of Hong Kong, and the National Science Foundation.
The ANC also acknowledges the following, who have provided software and/or support for ANC development: