
Welcome to the Wonderful World of Xoro


Introduction

Xoro is a simple scripting language developed by the ANC to automate tasks in GATE. GATE provides Pipelines and Conditional Pipelines, but sometimes more complex processing is required. For example, different gazetteers or transducers may be needed depending on some feature of the file, or only the files that match a certain pattern should be processed. In particular, working with a corpus stored as a large number of files is awkward unless the documents are first placed in a "DataStore", which may not be optimal in some cases.

One of the most powerful features of Xoro is the ability to import and use any Java class, including GATE resources, in a simple scripting environment without having to recompile Java code. In addition, without the overhead of a GUI, GATE resources tend to run considerably faster in Xoro than they do in GATE itself. Xoro can also be run as a processing resource inside GATE.

Getting Started

Requirements

Before you are able to use Xoro you will need to have the following:

  1. GATE installed and working correctly.
  2. Good knowledge of Java programming and GATE's Java API.
  3. Knowledge of running Java programs from the command line and editing shell (.bat) scripts.
  4. A good sense of humour and lots of time on your hands. Using Xoro is not for the faint of heart.

Installation

Unfortunately, Xoro does not come with an installer or install script. You will have to unpack the Xoro.zip file and manually move the files to their proper locations. The first step is to download Xoro (link to be added later) and unzip it to a temporary location on your hard drive. The zip file should contain the following files:

Installation steps:

  1. Move the plugins/Xoro directory into GATE's plugin directory (this will be GATE_HOME/plugins)
  2. Start GATE and register the Xoro plugin.
  3. Copy the Xoro.jar file somewhere; GATE's home directory is a good location.
  4. Copy the Xoro.bat file somewhere as well. You could place it in GATE's home directory too, but it is handy to have it somewhere on your PATH.
  5. Edit the Xoro.bat file and change the system variables GATE_HOME and XORO as required.
  6. Run the included hello.xoro program to test your installation.

Gate itself is not distributed with Xoro. It is assumed that you already have Gate up and running.

Limitations

Xoro was never intended to become a full-blown scripting language, and it was never intended to be released to the public. Xoro was developed as an in-house hack to get a job done. Over time features have been added and Xoro has evolved into something that is quite useful and worth sharing with the rest of the world. However, Xoro's early roots as a quick hack do show through from time to time. In particular:

License

The Xoro executable (jar file) is freely distributable and free to use for any purpose.

The source code for the current version of Xoro is available by request only. The source code for the current version of Xoro is not re-distributable. A future version of Xoro will be released as open source under a BSD-like license.

Xoro is Copyright © 2004, 2005 The American National Corpus Project. All rights reserved.

The Xoro Language

The entry point for every Xoro script is the main function with the signature

void main(string[] args)

The string array parameter args will be filled in with any command line arguments. If you are not going to use command line arguments in your script you can leave the parentheses empty. The classic HelloWorld program in Xoro looks like:

 
void main() 
{ 
		println("Hello world."); 
}

Comments

Xoro supports both single line comments and multiple line comments.

 
// This is a single line comment. 
/* 
 * This is a multiple 
 * line comment. 
 */

Operators

Only basic operators and types have been implemented. The only numeric type is the integer, mostly used for loop counters, so don't expect to calculate pi or ridiculously large prime numbers with Xoro.

Arithmetic Operators
+ addition
- subtraction
* multiplication
/ division
++ increment
-- decrement
 
Relational Operators
== equal
!= not equal
< less than
<= less than or equal
> greater than
>= greater than or equal
 
Logical Operators
&& and
|| or
! not

Note: Only prefix forms of the increment and decrement operators are implemented. That is, use ++i, do not use i++.

Note: There are no bitwise operators.

Types

Built-in Types

Xoro contains four built-in types:

int an integer type represented internally by a java.lang.Integer object.
boolean a boolean type represented internally by a java.lang.Boolean object.
string a string type represented internally by a java.lang.String object.
object a generic object type represented internally by a java.lang.Object object.

There is no type in Xoro that corresponds to Java's primitive int type. This makes it impossible to call methods such as java.lang.String.substring(), which expect primitive int values for their parameters.
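
Why boxed integers get in the way can be sketched in plain Java. The snippet below is only an illustration, under the assumption (not confirmed by the Xoro source) that Xoro looks methods up by reflection using the runtime class of each argument. The class name BoxedLookupDemo and the helper hasSubstring are made up for this example. An exact-signature lookup with java.lang.Integer parameters does not find substring(int, int):

```java
import java.lang.reflect.Method;

class BoxedLookupDemo {
    // Returns true if java.lang.String has a public substring method
    // declared with exactly the given parameter types.
    static boolean hasSubstring(Class<?>... parameterTypes) {
        try {
            Method method = String.class.getMethod("substring", parameterTypes);
            return method != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Lookup with primitive int parameters succeeds.
        System.out.println(hasSubstring(int.class, int.class));         // prints "true"
        // Lookup with boxed Integer parameters fails: no such signature exists.
        System.out.println(hasSubstring(Integer.class, Integer.class)); // prints "false"
    }
}
```

If Xoro works this way, any Java method declared with primitive parameters would be out of reach from a script, which matches the behaviour described above.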

There is a pseudo void type; however, it is only used as a placeholder in function signatures to indicate that a function does not return a value. The void keyword cannot be used in any other context.

Global variables are allowed, but they must be declared after the Java import statements and before any function definitions.

Array Types

Only very basic (and mostly broken) arrays are implemented. Arrays are declared in much the same way as in Java, for example:

 
int[10] array1; 
object[] array2;
		

In addition, arrays may be initialized when they are declared.

 
string[] message = {"hello world", "foo bar"}; 
object[] resources = { resource1, resource2 };
		

You can not allocate arrays at runtime with the new operator.

User Defined Types

There are none. If you need a user defined type then you will have to write a Java class and use that.

Using Java Classes

Java classes may be imported with the use keyword. Each class must be imported individually; there is no mechanism to import entire Java packages. All import statements must appear at the top of the program, before any variable or function definitions.

To use a Java class the format is:

			use class as alias;
		

Where class is the full package and class name of the Java class to be imported, and alias is the name the class will be known by inside the script. The alias and the class name are not required to be the same.

For example:

 
use java.lang.String as String; 
use java.io.File as File; 
use java.util.LinkedList as List; 
use gate.Document as Document;
		

Note: To use a Java class you must import it with the use statement; you cannot simply specify the full package and class name in the program. That is, the following will not work:

 
use java.lang.String as String; 
			
void main(String[] args)
{ 
	String filename = "somefile.xml"; 
	java.io.File file = new java.io.File(filename);   // KA-BOOM!!! 
}
		

You can use any Java class as long as it can be found on the classpath. The new operator is used to allocate new objects, as it is in Java.

 
use java.util.Stack as Stack; 
			
void main(string[] args) 
{ 
	Stack stack = new Stack(); 
	stack.push("foo"); 
	... 
}

Note: Only very basic type checking is done when the script is compiled; full type checking does not take place until each statement is evaluated. For example, the following code will run until the second push statement, at which time a Java exception will be thrown.

 
use java.io.File as File; 
use java.util.Stack as Stack; 
void main(string[] args) 
{ 
		object stack = new Stack(); // ok, a Stack is a kind of object 
		stack.push("foo"); // ok, the stack object does have a push method 
		stack = new File("foo.txt"); // ok, File is an object too.
		stack.push("bar"); // KA-BOOM... 
}
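
The same late failure can be reproduced in plain Java with reflection. This is a sketch for illustration only, not Xoro's actual implementation: it assumes that a method call on an object variable is resolved by name against the runtime class of the value at the moment the statement runs. The names LateCheckDemo, call and canCall are invented here.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.util.Stack;

class LateCheckDemo {
    // Finds a public method with one Object parameter by name on the
    // target's runtime class and invokes it.
    static Object call(Object target, String name, Object arg)
            throws ReflectiveOperationException {
        Method method = target.getClass().getMethod(name, Object.class);
        return method.invoke(target, arg);
    }

    // Same as call(), but reports a failed lookup or invocation as false.
    static boolean canCall(Object target, String name, Object arg) {
        try {
            call(target, name, arg);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Object stack = new Stack<String>();                 // a Stack is a kind of Object
        System.out.println(canCall(stack, "push", "foo"));  // prints "true"
        stack = new File("foo.txt");                        // a File is an Object too
        System.out.println(canCall(stack, "push", "bar"));  // prints "false"
    }
}
```

Nothing is wrong with either assignment on its own; the problem only shows up when push is looked up on a File.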

Note: There is no way to catch exceptions in Xoro. When an exception is thrown Xoro will print an incomprehensible error message and terminate.

Using Gate Resources

While it is possible to use GATE resources as you would any other Java class, Xoro provides special syntactic candy for initializing resources in resource blocks. A resource block takes the form

 
resource alias = java.package.class 
{ 
		parameter = "value"; 
		... 
}

For example:

 
resource splitter = gate.creole.splitter.SentenceSplitter 
{ 
		name = "Splitter"; 
		gazetteerURL = "file:/c:/gate/gazetteer/lists.def";
		transducerURL = "file:/c:/gate/grammar/main.jape"; 
}
		

Before you can use a GATE resource you need to call the built-in load function, which invokes gate.Factory.createResource() to create the resource for you. For example:

 
// A custom resource
resource saveStandoff = org.xces.creole.SaveStandoff 
{ 
		name = "Save standoff markup"; 
		standoffTags = [ "tok" ]; 
		standoffASName = "Standoff markups"; 
		namespace = "http://www.xces.org/schema/2003"; 
		destination = "file:/c:/corpus/sample.txt"; 
} 
void main(string[] args) 
{ 
		load(saveStandoff); 
		... 
		saveStandoff.execute(); 
}

The Factory Object

Since the gate.Factory class is declared as abstract, it cannot be imported like the other GATE classes and resources. Instead, Xoro provides a proxy for the gate.Factory class named Factory (of course).

 
use gate.FeatureMap as FeatureMap; 
			
void main(string[] args) 
{
		FeatureMap fm = Factory.newFeatureMap(); 
		... 
}
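
The proxy idea can be sketched in plain Java. This is not Xoro's code: StaticOnlyFactory is a hypothetical stand-in for an abstract class that offers only static factory methods, FactoryProxy is an invented name, and the grammarURL value is a placeholder. The sketch only shows how an ordinary object can forward calls to an abstract class that cannot be instantiated itself.

```java
import java.util.HashMap;
import java.util.Map;

class FactoryProxyDemo {
    public static void main(String[] args) {
        // "new StaticOnlyFactory()" would not compile because the class is abstract,
        // so the script-facing object is a proxy instead.
        FactoryProxy factory = new FactoryProxy();
        Map<String, Object> fm = factory.newFeatureMap();
        fm.put("grammarURL", "file:main.jape"); // placeholder value
        System.out.println(fm.size());          // prints "1"
    }
}

// Hypothetical stand-in for an abstract class with static factory methods only.
abstract class StaticOnlyFactory {
    static Map<String, Object> newFeatureMap() {
        return new HashMap<>();
    }
}

// An instantiable class whose instance methods forward to the static ones.
final class FactoryProxy {
    Map<String, Object> newFeatureMap() {
        return StaticOnlyFactory.newFeatureMap();
    }
}
```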
		

Built-in Functions

The examples in this document use the following built-in functions:

println prints a string followed by a newline.
length returns the number of elements in an array.
load creates a GATE resource that was declared in a resource block.
getenv returns the value of an environment variable.

User Defined Functions

Functions may be defined after the import block and global variable definitions. User defined functions may appear either before or after the main function. The syntax to define a function is:

<type> <identifier> ( [<parameter_list>] ) <block>

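Where:

<type> is the return type of the function, or void if the function does not return a value.
<identifier> is the name of the function.
<parameter_list> is an optional, comma-separated list of parameter declarations, each consisting of a type followed by a name.
<block> is the statement block, enclosed in curly braces, that forms the body of the function.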

For example:

 
void doNothing(string name, int value) 
{ 
		... 
}
		

Flow of Control

Blocks

All statement blocks in Xoro must be enclosed in curly braces. There are no single-line blocks after if, while, and for statements as there are in C++, Java, etc. That is, use:

 
if (x == y) 
{ 
		do_something(); 
}

Don't use:

 
	if (x == y) 
		do_something();

If Statements

If statements take the form:

 if (<boolean_expression>) <block>
		

Where <boolean_expression> is the expression to be evaluated to determine if the statement block is executed. An if statement may also have an else clause:

 
if (<boolean_expression>) 
	<block> 
else 
	<block>

And multiple if-else statements can be chained together.

 
if (<boolean_expression>) 
	<block> 
else if (<boolean_expression>) 
	<block> 
else ...

For Loops

The for statement takes the form

 
for (<init>; <boolean_expression>;<step>) 
	<block>

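Where:

<init> is a statement executed once before the loop starts, typically the declaration of a loop counter.
<boolean_expression> is evaluated before each iteration; the loop ends when it is false.
<step> is an expression evaluated after each iteration, typically an increment or decrement of the counter.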

For example:

 
for (int i = 0; i < 10; ++i)  // remember, no post-fix increment operator... 
{ 
	... 
}

Note: multiple expressions can not be used in for statements. That is, the following will not work:

 
for (int i = 0, int j = 10; i < 10; ++i, --j) 
{ 
	... 
}

While Loops

The while statement takes the form:

 
while ( <boolean_expression> ) 
	<block>

Where <boolean_expression> is the expression that will be evaluated to determine if an iteration of the statement block should be performed.

Flow of Control Statements

exit
Can be used to cause the early termination of the script.

break
continue
Coming soon. Using them won't cause a syntax error in the script; they just don't do anything yet.

Examples

Below are some example scripts to show how to perform various tasks with Xoro.

Check the syntax of a Jape Grammar

Several factors can make debugging a Jape grammar a daunting task, not the least of which is simply making sure your grammar is syntactically correct. A Jape grammar that contains syntax errors will cause whatever processing resource tries to use it to throw an exception. Depending on what threw the exception, and where, restarting GATE may be required to reload the resource. This can happen, for example, if you are developing a custom processing resource that uses a gate.creole.Transducer resource internally and you want GATE to reload your jar file.

Below is a script that tries to instantiate a gate.creole.Transducer with a specified Jape grammar. If the grammar is syntactically correct the script will print an appropriate message and exit. If the grammar is not syntactically correct a GATE exception will be thrown and the corresponding error message and stack trace will be displayed.

 
use gate.FeatureMap as FeatureMap; 
use java.io.File as File; 
void main(string[] args) 
{
	// Check the args 
	if (length(args) == 0) 
	{
		println("No grammar file specified."); 
		exit; 
	} 
	// Test for the existence of the grammar file. 
	File file = new File(args[0]); 
	if (!file.exists()) 
	{
		println("Could not find " + file.getPath()); 
		exit; 
	} 
	FeatureMap fm = Factory.newFeatureMap(); 
	fm.put("grammarURL", "file:" + file.getPath());
	object transducer = Factory.createResource("gate.creole.Transducer", fm);
	// If we made it this far the grammar was ok! 
	println("Success! The grammar appears to be syntactically correct."); 
}

Run several resources on a directory of files

This script is similar to a pipeline application in Gate. This example runs Gate's default tokenizer, sentence splitter, and POS tagger on all the files in a given directory and saves the resulting XML files in another directory.

 
use gate.Document as Document; 
use java.io.File as File; 
use java.net.URL as URL; 

// Use the default values for all the resources
resource tokenizer = gate.creole.tokeniser.DefaultTokeniser { } 
resource splitter = gate.creole.splitter.SentenceSplitter { } 
resource tagger = gate.creole.POSTagger { } 
resource saver = org.xces.creole.SaveMarkup { }
	
// Put our resources in an array for easy access. This is the order in which they will 
// be executed as well. 
object[] resources = { tokenizer, splitter, tagger, saver };
int nResources = length(resources); 

// Where we will write the processed files. 
File outputDir; 

void main(string[] args) 
{ 
	string xces = getenv("XCES_HOME"); 
	LoadResources(); 
	// Error checking omitted for brevity... we'll assume the args are correct... 
	File inputDir = new File(args[0]); 
	// Assign the global declared above so that Process() can see it. 
	outputDir = new File(args[1]); 
	File[] files = inputDir.listFiles(); 
	int nFiles = length(files); 
	for (int i = 0; i < nFiles; ++i) 
	{ 
		Process(files[i]); 
	}
	println("Done."); 
}
			
void LoadResources() 
{ 
	for (int i = 0; i < nResources; ++i) 
	{
		load(resources[i]); 
	} 
} 
			
void Process(File file) 
{ 
	Document doc = Factory.newDocument(new URL("file:" + file.getPath())); 
	// Set the destination file. 
	File dest = new File(outputDir, file.getName());
	saver.setDestination(new URL("file:" + dest.getPath())); 
	// Now process the document with all the resources. 
	for (int i = 0; i < nResources; ++i) 
	{ 
		object res = resources[i]; 
		res.setDocument(doc); 
		res.execute();
		// make sure no resource keeps hanging onto the document. 
		res.setDocument(null); 
	} 
	// Clean up the document 
	Factory.deleteResource(doc); 
	doc = null; 
}
			

Acknowledgements