Using the ANC with Xaira
Note that Xaira is currently available for Windows systems only.
Contents
- Xaira and the First Release
- Xaira and the Second Release
- Indexing the ANC (both releases)
Xaira and the First Release
It is assumed that you have unpacked the ANC files somewhere on your hard drive; we refer to this location as the "ANC home directory". At a minimum, you will need the "merged" directory and the XML files respStmt.xml and publicationStmt.xml in the ANC home directory.
It is also assumed that you have the latest available build of Xaira installed (v1.10 as of this writing).
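Before going further, you may want to confirm that the ANC home directory contains the items listed above. The following Python sketch is not part of the ANC distribution; the function name `missing_prerequisites` is hypothetical, and it only checks that the items exist, not that their contents are valid.

```python
from pathlib import Path

# Items the instructions say must be present in the ANC home directory.
REQUIRED_DIRS = ["merged"]
REQUIRED_FILES = ["respStmt.xml", "publicationStmt.xml"]


def missing_prerequisites(anc_home):
    """Return the names of required items not found in anc_home.

    Hypothetical helper: checks existence only, not file contents.
    """
    home = Path(anc_home)
    missing = [d for d in REQUIRED_DIRS if not (home / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (home / f).is_file()]
    return missing
```

For example, `print(missing_prerequisites(r"C:\ANC"))` (with your own ANC home directory in place of the hypothetical `C:\ANC`) prints an empty list when everything is in place.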
- Download ANCXaira.zip, which contains the following files:
- BulkXsltW.exe : a program for preprocessing the ANC files.
- anc.xsl : an XSLT style sheet used to preprocess the ANC.
- bib.xsl.txt : a text file containing the XSL to be pasted into Xaira when creating the bibliography.
- nyt-headers-fixed.zip : a zip file containing 14 updated/repaired headers for the NY Times files.
Unzip ANCXaira.zip into your ANC home directory. If your system does not automatically unzip (i.e., extract) the files from the zip archive, you can use WinZip to do so. Then extract nyt-headers-fixed.zip into the ANC home directory as well.
- Create a new directory called "texts" in your ANC home directory.
- Start the BulkXsltW program. If the BulkXsltW program is in the ANC home directory, you can accept the default values and simply click the "Transform" button. Otherwise enter the following:
- For the input directory browse to the ANC home directory and select the "merged" directory.
- For the XSL file select the "anc.xsl" file you installed above.
- For the output directory select the "texts" directory you created above.
- Leave the rest of the fields as they are.
- Click the "Transform" button.
- Wait. (This takes approximately 15 minutes on a 1.5 GHz machine.)
Quit the BulkXsltW program when it completes.
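If you cannot run BulkXsltW, or want to script the preparation, the unzipping, directory creation, and transform steps above can be approximated in Python. This is a sketch under several assumptions: it uses the third-party lxml library for XSLT (not something the ANC distribution provides), it assumes ANCXaira.zip has been downloaded into the ANC home directory and has no top-level folder inside it, and it assumes the directory layout under "merged" should be mirrored under "texts". Its output is not guaranteed to match BulkXsltW's byte for byte, and all function names are hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_and_prepare(anc_home):
    """Extract both archives into anc_home and create the texts/ directory.

    Assumes ANCXaira.zip sits in anc_home and has no top-level folder, so
    extracting it places nyt-headers-fixed.zip directly in anc_home.
    """
    home = Path(anc_home)
    for archive in ("ANCXaira.zip", "nyt-headers-fixed.zip"):
        with zipfile.ZipFile(home / archive) as zf:
            zf.extractall(home)
    texts = home / "texts"
    texts.mkdir(exist_ok=True)
    return texts


def plan_outputs(merged_dir, texts_dir):
    """Pair every .xml file under merged_dir with a target path under texts_dir.

    Assumption: the layout below "merged" is mirrored in "texts".
    """
    merged, texts = Path(merged_dir), Path(texts_dir)
    return [(src, texts / src.relative_to(merged))
            for src in sorted(merged.rglob("*.xml"))]


def transform_all(merged_dir, xsl_path, texts_dir):
    """Apply the style sheet (e.g. anc.xsl) to every planned file.

    Requires the third-party lxml package; imported here so the other
    helpers work without it.
    """
    from lxml import etree

    transform = etree.XSLT(etree.parse(str(xsl_path)))
    for src, dst in plan_outputs(merged_dir, texts_dir):
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = transform(etree.parse(str(src)))
        # bytes() serializes the result as the style sheet's xsl:output specifies.
        dst.write_bytes(bytes(result))
```

A typical run would call `unpack_and_prepare(home)` once and then `transform_all(home / "merged", home / "anc.xsl", home / "texts")`. Keeping `plan_outputs` separate lets you inspect which files will be written before starting the long transform.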
Note: The BulkXsltW program is a Java application bundled as a Windows executable and requires Java 1.4 or later to be installed on your system. For other platforms, the program can be provided as a Java JAR file or as Java source code; contact the ANC technician if you need either.
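To confirm the Java 1.4 requirement, a small Python sketch can run `java -version` and compare the reported version. It assumes `java` is on your PATH and that the output contains a quoted version string such as `"1.4.2_05"` or `"17.0.2"`; the function names are hypothetical.

```python
import re
import subprocess


def parse_java_version(text):
    """Return (major, minor) from `java -version` output, or None.

    Assumes a quoted version string, e.g. 'java version "1.4.2_05"'.
    """
    match = re.search(r'version "([^"]+)"', text)
    if not match:
        return None
    numbers = []
    for part in re.split(r"[._+-]", match.group(1)):
        if not part.isdigit():
            break  # stop at suffixes such as "ea"
        numbers.append(int(part))
    if not numbers:
        return None
    return tuple((numbers + [0])[:2])


def meets_requirement(version, required=(1, 4)):
    """True if the parsed version is at least the required one."""
    return version is not None and version >= required


def installed_java_version():
    """Run `java -version`; returns None if java cannot be found."""
    try:
        proc = subprocess.run(["java", "-version"],
                              capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # `java -version` traditionally prints to stderr rather than stdout.
    return parse_java_version(proc.stderr + proc.stdout)
```

Comparing (major, minor) tuples handles both the old `1.x` numbering and the scheme used from Java 9 onward, because any newer major number is greater than 1.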
Xaira and the Second Release
Before the ANC can be used with Xaira, its standoff annotations must be inlined into the XML files with the ANC Tool. We recommend including the Hepple (Penn) part-of-speech tags when processing the ANC, since not all files are annotated with the Biber part-of-speech tags.
Indexing the ANC (both releases)
Note: During local testing, our Xaira client crashed when more than 10,000 files had been indexed. We have yet to determine whether this is a problem with our local configuration or a current limitation (bug) in Xaira. In the meantime, you may wish to limit the number of files you index to 9,999 or fewer. The easiest temporary solution is probably to process the entire ANC with the ANC Tool and then delete or move the files and directories you are not interested in, or at least can live without.
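One way to follow this advice is to move surplus files out of the directory you will point Xaira at. The Python sketch below keeps the first 9,999 XML files in sorted path order and moves the rest, preserving their relative paths, to a separate holding directory. Sorted order is an arbitrary choice, so adjust the selection to keep the texts you care about. `cap_files` and the holding directory are hypothetical, and the holding directory should sit outside the directory Xaira indexes.

```python
import shutil
from pathlib import Path

MAX_FILES = 9_999  # stays below the 10,000-file threshold described above


def cap_files(texts_dir, holding_dir, limit=MAX_FILES):
    """Move .xml files beyond `limit` from texts_dir into holding_dir.

    Files are kept in sorted path order (an arbitrary choice). The holding
    directory should be outside texts_dir. Returns the moved destinations.
    """
    texts, holding = Path(texts_dir), Path(holding_dir)
    files = sorted(texts.rglob("*.xml"))
    moved = []
    for src in files[limit:]:
        dst = holding / src.relative_to(texts)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

Because files are moved rather than deleted, you can restore them later by moving them back.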
- Start the Xaira Indextools program and run the Index Wizard (File menu).
- Corpus name dialog. Enter a name and, optionally, a title statement. Click the Next button.
- Corpus root dialog. Click the Next button. You can change where Xaira will store its files, but the default values are recommended unless you know what you are doing.
- Texts dialog. Click the Browse button and select the directory containing the ANC files. Click the Next button.
- Markup dialog. The XML button should be checked (the default). Click the Next button.
Xaira will copy the ANC files from their current location to Xaira's text directory. This may take several minutes.
- File structure dialog. The Model 1 button should be checked (the default). Click the Next button.
- File list dialog. Click the Next button. Note: if you have processed the entire ANC and have more than 10,000 files, you can use this dialog to remove files. Files removed from the file list are not deleted from disk; Xaira simply won't index them.
- Reading files dialog. Click the Go button. Xaira will parse the ANC files; this may take a long time. When all the texts have been parsed, you should see the message "All texts are well-formed." Click the Next button.
- Language dialog. "en" should be selected by default. Click the Next button.
- Text delineation dialog. Put a checkmark in the "Just use file names" box and click the Next button.
- Unit delineation dialog. Select "s" in the left hand list and "Auto-number" in the right hand list. Click the Next button.
- Word delineation dialog. Select "tok" (should be the default) and click the Next button.
- Additional keys dialog. Click the Next button.
- Bibliography dialog. Click the Next button. No bibliography is built since the ANC files do not contain the information Xaira expects to find in the header.
- Indexing dialog. Click the Index button. Xaira will start to index the ANC files; this may take a long time. When the indexer completes, you should see the message "Indexer terminated with code 0 (OK)". If you would like to load the Xaira client immediately, put a check in the box beside "View corpus in Xaira client", then click the Finish button.
You should now be able to open and query the ANC with the Xaira client.
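Because the Reading files dialog only reports problems after Xaira has copied and parsed every file, a quick pre-check can save a lengthy rerun. The Python sketch below (hypothetical names, not part of Xaira or the ANC distribution) checks that each XML file is well-formed and contains at least one `s` and one `tok` element, the elements chosen in the Unit and Word delineation dialogs. It ignores any XML namespace on those elements, and passing it does not guarantee that Xaira will accept a file.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_file(path):
    """Return (problem or None, sentence count, token count) for one file."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as err:
        return f"not well-formed: {err}", 0, 0
    names = [local_name(el.tag) for el in root.iter()]
    sentences, tokens = names.count("s"), names.count("tok")
    if sentences == 0 or tokens == 0:
        return "no <s> or <tok> elements", sentences, tokens
    return None, sentences, tokens


def check_corpus(texts_dir):
    """Map each problematic .xml file under texts_dir to its problem."""
    problems = {}
    for path in sorted(Path(texts_dir).rglob("*.xml")):
        problem, _, _ = check_file(path)
        if problem:
            problems[path] = problem
    return problems
```

Running `check_corpus` on the directory you select in the Texts dialog and fixing or removing the reported files first should make it more likely that Xaira reports "All texts are well-formed."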