Xoro is a simple scripting language that was developed by the ANC to automate tasks in Gate. While Gate provides Pipelines and Conditional Pipelines, sometimes more complex processing is required. For example, different gazetteers/transducers may be needed depending on some feature of the file, or only the files that match a certain pattern should be processed. In particular, working with a corpus stored as a large number of files is awkward unless the documents are first placed in a "DataStore", which may not be optimal in some cases.
One of the most powerful features of Xoro is the ability to import and use any Java class, including Gate resources, in a simple scripting environment without having to recompile Java code. In addition, without the overhead of a GUI, Gate resources tend to run considerably faster in Xoro than they do in Gate itself. Xoro can also be run as a processing resource inside Gate.
Before you are able to use Xoro you will need to have the following:
Unfortunately, Xoro does not come with an installer or install script. You will have to unpack the Xoro.zip file and manually move the files to their proper locations. The first step is to download Xoro (link to be added later) and unzip it to a temporary location on your hard drive. The zip file should contain the following files:
Installation steps:
Gate itself is not distributed with Xoro. It is assumed that you already have Gate up and running.
Xoro was never intended to become a full-blown scripting language, and it was never intended to be released to the public. Xoro was developed as an in-house hack to get a job done. Over time features have been added and Xoro has evolved into something that is quite useful and worth sharing with the rest of the world. However, Xoro's early roots as a quick hack do show through from time to time. In particular:
The Xoro executable (jar file) is freely distributable and free to use for any purposes.
The source code for the current version of Xoro is available by request only and is not re-distributable. A future version of Xoro will be released as open source under a BSD-like license.
Xoro is Copyright © 2004, 2005 The American National Corpus Project. All rights reserved.
The entry point for every Xoro script is the main function with the signature
void main(string[] args)
The string array parameter args will be filled in with any command line arguments. If you are not going to be using command line arguments in your script you can leave the parentheses empty. The classic HelloWorld program in Xoro looks like:
void main()
{
println("Hello world.");
}
Xoro supports both single line comments and multiple line comments.
// This is a single line comment.
/*
 * This is a multiple
 * line comment.
 */
Only basic operators and types have been implemented (Xoro only supports integer types, mostly used for loop counters) so don't expect to calculate pi or ridiculously large prime numbers with Xoro.
| Arithmetic Operators | |
|---|---|
| + | addition |
| - | subtraction |
| * | multiplication |
| / | division |
| ++ | increment |
| -- | decrement |

| Relational Operators | |
|---|---|
| == | equal |
| != | not equal |
| < | less than |
| <= | less than or equal |
| > | greater than |
| >= | greater than or equal |

| Logical Operators | |
|---|---|
| && | and |
| \|\| | or |
| ! | not |
Note: Only prefix forms of the increment and decrement operators are implemented. That is, use ++i, do not use i++.
Note: There are no bitwise operators.
Xoro contains four built in types:
| Type | Description |
|---|---|
| int | an integer type represented internally by a java.lang.Integer object |
| boolean | a boolean type represented internally by a java.lang.Boolean object |
| string | a string type represented internally by a java.lang.String object |
| object | a generic object type represented by a java.lang.Object object |
There is no type in Xoro that corresponds to Java's primitive int type. This makes it impossible to call methods such as java.lang.String.substring() that expect primitive int types for their parameters.
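One plausible reason for this limitation is that a reflective lookup using java.lang.Integer does not match a method declared with primitive int. This assumes that Xoro finds Java methods by reflection using the runtime class of each argument; the source does not say how method lookup is implemented. The Java sketch below only illustrates that mismatch. BoxedLookupDemo and hasSubstring are names made up for the example.

```java
import java.lang.reflect.Method;

class BoxedLookupDemo {
    // Returns true when java.lang.String has a public substring method
    // declared with exactly these parameter types.
    static boolean hasSubstring(Class<?>... parameterTypes) {
        try {
            Method m = String.class.getMethod("substring", parameterTypes);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // substring(int) exists, so a lookup with the primitive type succeeds.
        System.out.println(hasSubstring(int.class));
        // There is no substring(Integer), so a lookup with the boxed type fails.
        System.out.println(hasSubstring(Integer.class));
    }
}
```

Running the sketch prints true and then false. An interpreter that holds every integer as a java.lang.Integer would need an extra unboxing step in its lookup to reach methods like substring.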
There is a pseudo void type; however, it is only used as a placeholder in function signatures to indicate that a function does not return a value. The void keyword can not be used in any other context.
Global variables are allowed, but they must be declared after the Java import statements and before any function definitions.
Only very basic (and mostly broken) arrays are implemented. Arrays are declared similar to the way arrays are declared in Java, for example:
int[10] array1;
object[] array2;
In addition, arrays may be initialized when they are declared.
string[] message = {"hello world", "foo bar"};
object[] resources = { resource1, resource2 };
You can not allocate arrays at runtime with the new operator.
Xoro has no user defined types. If you need a user defined type then you will have to write a Java class and use that.
Java classes may be imported with the use keyword. Each class must be imported individually; there is no mechanism to import entire Java packages. All import statements must appear at the top of the program, before any variable or function definitions.
To use a Java class the format is:
use class as alias;
Where class is the full package and class name of the Java class to be imported, and alias is the name the class will be known as internally. The alias and the class name are not required to be the same.
For example:
use java.lang.String as String;
use java.io.File as File;
use java.util.LinkedList as List;
use gate.Document as Document;
Note: To use a Java class you must import it with the use statement; you can not simply specify the full package and class name in the program. That is, the following will not work:
use java.lang.String as String;
void main(String[] args)
{
String filename = "somefile.xml";
java.io.File file = new java.io.File(filename); // KA-BOOM!!!
}
You can use any Java class as long as it can be found on the classpath. The new operator is used to allocate new objects as it is in Java.
use java.util.Stack as Stack;
void main(string[] args)
{
Stack stack = new Stack();
stack.push("foo");
...
}
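As a rough illustration of what importing a class under an alias and allocating it with new might involve behind the scenes, the Java sketch below maps an alias to a class loaded with Class.forName and creates instances through the no-argument constructor. This is not Xoro's implementation, which the source does not describe. AliasTableDemo and its use and create methods are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

class AliasTableDemo {
    // Hypothetical alias table: maps a script-level alias to a loaded Java class.
    private final Map<String, Class<?>> aliases = new HashMap<>();

    // Plays the role of "use <className> as <alias>;".
    // Fails if the class cannot be found on the classpath.
    void use(String className, String alias) throws ClassNotFoundException {
        aliases.put(alias, Class.forName(className));
    }

    // Plays the role of "new <alias>()" for classes with a public
    // no-argument constructor. Only registered aliases are accepted.
    Object create(String alias) throws ReflectiveOperationException {
        Class<?> type = aliases.get(alias);
        if (type == null) {
            throw new ClassNotFoundException("No class imported as " + alias);
        }
        return type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        AliasTableDemo table = new AliasTableDemo();
        table.use("java.util.LinkedList", "List");
        Object list = table.create("List");
        System.out.println(list.getClass().getName());
    }
}
```

Running the sketch prints java.util.LinkedList. Because create only consults the alias table, a full package and class name that was never registered is rejected, which matches the restriction described above.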
Note: Only very basic type checking is done when the script is compiled; full type checking does not take place until each statement is evaluated. For example, the following code will run until the second push statement, at which time a Java exception will be thrown.
use java.io.File as File;
use java.util.Stack as Stack;
void main(string[] args)
{
object stack = new Stack(); // ok, a Stack is a kind of object
stack.push("foo"); // ok, the stack object does have a push method
stack = new File("foo.txt"); // ok, File is an object too.
stack.push("bar"); // KA-BOOM...
}
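The failure above is the kind of error that a name-based, reflective method call produces in Java. The sketch below assumes that Xoro resolves a method on the runtime class of the object at the moment the statement is evaluated; the source does not confirm this. LateBindingDemo and callByName are hypothetical names.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.util.Stack;

class LateBindingDemo {
    // Looks up a public one-argument method by name on the receiver's
    // runtime class and invokes it. Nothing is checked before this call.
    static String callByName(Object receiver, String name, Object arg) {
        for (Method m : receiver.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == 1) {
                try {
                    m.invoke(receiver, arg);
                    return "ok";
                } catch (ReflectiveOperationException | IllegalArgumentException e) {
                    return "failed: " + e;
                }
            }
        }
        return "no such method: " + name;
    }

    public static void main(String[] args) {
        Object target = new Stack<Object>();
        // A Stack has a push method, so the call succeeds.
        System.out.println(callByName(target, "push", "foo"));
        target = new File("somefile.xml");
        // A File has no push method, so the problem only shows up here.
        System.out.println(callByName(target, "push", "bar"));
    }
}
```

Running the sketch prints ok and then no such method: push. A compiler with full static type checking would reject the second call before the program ran; a late-bound lookup like this one cannot.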
Note: There is no way to catch exceptions in Xoro. When an exception is thrown Xoro will print an incomprehensible error message and terminate.
While it is possible to use Gate resources as you would any other Java class, Xoro provides special syntactic candy for initializing resources in resource blocks. A resource block takes the form
resource alias = java.package.class
{
parameter = "value";
...
}
For example:
resource splitter = gate.creole.splitter.SentenceSplitter
{
name = "Splitter";
gazetteerURL = "file:/c:/gate/gazetteer/lists.def";
transducerURL = "file:/c:/gate/grammar/main.jape";
}
Before you can use a Gate resource you need to call the built-in load function, which invokes gate.Factory.createResource() to create the resource for you. For example:
// A custom resource
resource saveStandoff = org.xces.creole.SaveStandoff
{
name = "Save standoff markup";
standoffTags = [ "tok" ];
standoffASName = "Standoff markups";
namespace = "http://www.xces.org/schema/2003";
destination = "file:/c:/corpus/sample.txt";
}
void main(string[] args)
{
load(saveStandoff);
...
saveStandoff.execute();
}
Since the gate.Factory class is declared as abstract it can not be imported like the other Gate classes and resources. So Xoro provides a proxy for the gate.Factory class named Factory (of course).
use gate.FeatureMap as FeatureMap;
void main(string[] args)
{
FeatureMap fm = Factory.newFeatureMap();
...
}
Functions may be defined after the import block and global variable definitions. User defined functions may appear either before or after the main function. The syntax to define a function is:
<type> <identifier> ( [<parameter_list>] ) <block>
Where:
For example:
void doNothing(string name, int value)
{
...
}
All statement blocks in Xoro must be enclosed in curly braces. There are no single line blocks after if, while, and for statements as there are in C++, Java, etc. That is, use:
if (x == y)
{
do_something();
}
Don't use:
if (x == y) do_something();
If statements take the form:
if (<boolean_expression>) <block>
Where <boolean_expression> is the expression to be evaluated to determine if the statement block is executed. An if statement may also have an else clause:
if (<boolean_expression>) <block> else <block>
And multiple if-else statements can be chained together.
if (<boolean_expression>) <block> else if (<boolean_expression>) <block> else ...
The for statement takes the form
for (<init>; <boolean_expression>;<step>) <block>
Where:
For example:
for (int i = 0; i < 10; ++i) // remember, no post-fix increment operator...
{
...
}
Note: multiple expressions can not be used in for statements. That is, the following will not work:
for (int i = 0, int j = 10; i < 10; ++i, --j)
{
...
}
The while statement takes the form:
while ( <boolean_expression> ) <block>
Where <boolean_expression> is the expression that will be evaluated to determine if an iteration of the statement block should be performed.
exit
Can be used to cause the early termination of the script.
break
continue
Coming soon. Using them won't cause a syntax error in the script; they just don't do anything yet.
Below are some example scripts to show how to perform various tasks with Xoro.
Several factors can make debugging a Jape grammar a daunting task, not the least of which is simply making sure your grammar is syntactically correct. A Jape grammar that contains syntax errors will cause whatever processing resource tries to use it to throw an exception. Depending on what threw the exception, and where, restarting Gate may be required to reload the resource. This can happen, for example, if you are developing a custom processing resource that uses a gate.creole.Transducer resource internally and you want Gate to reload your jar file.
Below is a script that tries to instantiate a gate.creole.Transducer with a specified Jape grammar. If the grammar is syntactically correct the script will print an appropriate message and exit. If the grammar is not syntactically correct a Gate exception will be thrown and the corresponding error message and stack trace will be displayed.
use gate.FeatureMap as FeatureMap;
use java.io.File as File;
void
main(string[] args)
{
// Check the args
if (length(args) == 0)
{
println("No grammar file specified.");
exit;
}
// Test for the existence of the grammar file.
File file = new File(args[0]);
if (!file.exists())
{
println("Could not find " + file.getPath());
exit;
}
FeatureMap fm = Factory.newFeatureMap();
fm.put("grammarURL", "file:" + file.getPath());
object transducer = Factory.createResource("gate.creole.Transducer", fm);
// If we made it this far the grammar was ok!
println("Success! The grammar appears to be syntactically correct.");
}
This script is similar to a pipeline application in Gate. This example runs Gate's default tokenizer, sentence splitter, and POS tagger on all the files in a given directory and saves the resulting XML files in another directory.
use gate.Document as Document;
use java.io.File as File;
use java.net.URL as URL;
// Use the default values for all the resources
resource tokenizer = gate.creole.tokeniser.DefaultTokeniser { }
resource splitter = gate.creole.splitter.SentenceSplitter { }
resource tagger = gate.creole.POSTagger { }
resource saver = org.xces.creole.SaveMarkup { }
// Put our resources in an array for easy access. This is the order in which they will
// be executed as well.
object[] resources = { tokenizer, splitter, tagger, saver };
int nResources = length(resources);
// Where we will write the processed files.
File outputDir;
void main(string[] args)
{
string xces = getenv("XCES_HOME");
LoadResources();
// Error checking omitted for brevity... we'll assume the args are correct...
File inputDir = new File(args[0]);
outputDir = new File(args[1]); // assign the global so Process() can see it
File[] files = inputDir.listFiles();
int nFiles = length(files);
for (int i = 0; i < nFiles; ++i)
{
Process(files[i]);
}
println("Done.");
}
void LoadResources()
{
for (int i = 0; i < nResources; ++i)
{
load(resources[i]);
}
}
void Process(File file)
{
Document doc = Factory.newDocument(new URL("file:" + file.getPath()));
// Set the destination file.
File dest = new File(outputDir, file.getName());
saver.setDestination(new URL("file:" + dest.getPath()));
// Now process the document with all the resources.
for (int i = 0; i < nResources; ++i)
{
object res = resources[i];
res.setDocument(doc);
res.execute();
// make sure no resource keeps hanging onto the document.
res.setDocument(null);
}
// Clean up the document
Factory.deleteResource(doc);
doc = null;
}