What is Callisto?

Callisto is an annotation tool developed to support linguistic annotation of textual sources in any Unicode-supported language. It is written in Java. Its initial development by the MITRE Corporation was funded by the U.S. government.

Where can I get help with Callisto?

This FAQ answers some frequently asked questions. If you do not find the answer to your question here, consider joining the callisto-users-list and posting your question there.

What system requirements does Callisto have?

Callisto requires Java version 1.5 or later.

If you are developing Java programs, you need the SDK. If you only want to run Java programs such as Callisto, the Java Runtime Environment (JRE) is enough.
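To check whether an installed Java meets this requirement, run `java -version`, which prints the version to standard error. The Python sketch below parses that output. The function name `java_version_ok` is hypothetical, and the code assumes the output contains a quoted version string such as `"1.5.0_22"` or `"17.0.2"`, as Sun/Oracle and OpenJDK builds print.

```python
import re

def java_version_ok(version_output: str, minimum: tuple = (1, 5)) -> bool:
    """Return True if `java -version` output reports a version >= minimum.

    Covers both the legacy scheme ("1.5.0_22", "1.8.0_351") and the
    scheme used since Java 9 ("9", "11.0.2", "17.0.2").
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no quoted version string found")
    # Keep only the first two numeric components, e.g. "1.5.0_22" -> (1, 5).
    numbers = re.findall(r"\d+", match.group(1))[:2]
    return tuple(int(n) for n in numbers) >= minimum
```

For example, pass it the `stderr` of `subprocess.run(["java", "-version"], capture_output=True, text=True)`. Comparing only the first two numbers is a deliberate simplification that treats legacy `1.x` strings and the newer single-number scheme the same way.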

Using Callisto

Can I annotate files which are not UTF-8 encoded?

Yes. When opening or importing a signal file, you can specify its character encoding (the default is UTF-8). If you choose the wrong encoding, your text may appear in the wrong font, or some characters may look meaningless (although this can also be caused by a font that does not contain all the characters in the text). You can re-read the file in a different encoding by selecting another "Character Encoding" from the "Format" menu.
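The effect of a wrong encoding can be reproduced outside Callisto. This Python sketch is not Callisto's own code, and `decode_signal` is a hypothetical helper; it decodes the same bytes with the right and the wrong encoding:

```python
def decode_signal(raw: bytes, encoding: str = "utf-8", errors: str = "strict") -> str:
    """Decode the bytes of a signal file; UTF-8 is the default, as in Callisto."""
    return raw.decode(encoding, errors)

# Bytes as they would be stored in a UTF-8 encoded signal file.
raw = "Café über".encode("utf-8")

right = decode_signal(raw)             # "Café über"
wrong = decode_signal(raw, "latin-1")  # "CafÃ© Ã¼ber": each accented letter becomes two symbols

# Latin-1 bytes read as UTF-8 are invalid; "replace" shows U+FFFD instead of failing.
latin1_raw = "Café".encode("latin-1")
replaced = decode_signal(latin1_raw, "utf-8", "replace")  # "Caf\ufffd"
```

Nothing is lost when the wrong encoding is chosen: the bytes on disk are unchanged, which is why re-reading the file with the right encoding fixes the display.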

Why are tag offsets in my colleague's file wrong?

This is almost always caused by a program that automatically changes the new-line characters while the files are being exchanged. It should not occur with the latest versions of Callisto, because the annotation file now includes the original signal.

The cause of the problem:

Different operating systems use different characters to represent a new-line: some use two characters (carriage return followed by line feed), while others use only one. With stand-off annotation, if the new-lines in a data file are changed, every offset in the annotation file must be updated, or each annotation will be "off by one" for each preceding new-line.
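The arithmetic can be seen in a small Python sketch, assuming annotation offsets count characters from the start of the signal. The helper functions are hypothetical, and they assume every new-line was converted in one known direction, which is exactly what cannot be taken for granted without the original file.

```python
def lf_to_crlf_offset(lf_text: str, offset: int) -> int:
    """Map a character offset in LF text to the same character after LF -> CR LF conversion."""
    # Each LF before the offset gains a CR in front of it.
    return offset + lf_text.count("\n", 0, offset)

def crlf_to_lf_offset(crlf_text: str, offset: int) -> int:
    """Map a character offset in CR LF text to the same character after CR LF -> LF conversion.

    Assumes the offset does not fall between a CR and its LF.
    """
    # Each CR LF pair before the offset loses its CR.
    return offset - crlf_text.count("\r\n", 0, offset)

unix_text = "first line\nsecond line\nthird line\n"
dos_text = unix_text.replace("\n", "\r\n")

start_unix = unix_text.index("third")  # 23
start_dos = dos_text.index("third")    # 25: shifted by the two preceding new-lines
```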

The following are known to convert new-lines automatically:

  • Using FTP to transfer files in ASCII mode
  • Using WinZip to unzip archives (WinZip's default settings convert automatically, though you can change this in its preferences)
  • Sending files as text attachments in e-mail. Many e-mail clients will convert when attaching and detaching.
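To tell whether a file was converted in transit, compare the new-line style of your copy with your colleague's. This Python sketch (the function name is hypothetical) counts each style in a file's raw bytes:

```python
def newline_styles(raw: bytes) -> dict:
    """Count DOS (CR LF), UNIX (LF) and old Mac (CR) new-lines in raw bytes."""
    dos = raw.count(b"\r\n")
    # Every CR LF pair contains one LF and one CR; subtract them to get the bare ones.
    unix = raw.count(b"\n") - dos
    mac = raw.count(b"\r") - dos
    return {"dos": dos, "unix": unix, "mac": mac}
```

Read the file in binary mode, e.g. `newline_styles(pathlib.Path("signal.txt").read_bytes())` with your own file name, because Python's text mode translates new-lines while reading and would hide the difference.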

How you can fix it

We've considered several ways of correcting the problem automatically in Callisto. Unfortunately, without embedding the original data file in the stand-off annotation file, it is impossible to correct every case automatically.

That said, the fix may be as easy as converting all new-lines back to the style (DOS or UNIX) the file had when it was annotated. This can be done in several ways:

  • Good text editors can save with different new-line styles (e.g. Emacs, EditPlus, jEdit, BBEdit)
  • Several utilities do nothing but change new-lines (e.g. dos2unix, unix2dos, d2u, u2d)
  • If you have Perl, these one-liners will work:
      DOS to UNIX:  perl -i -pe 's/\x0d\x0a/\x0a/g' <filename>
      UNIX to DOS:  perl -i -pe 's/\x0a/\x0d\x0a/g' <filename>
      MAC to DOS:   perl -i -pe 's/\x0d/\x0d\x0a/g' <filename>
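If Perl is not available, the same byte-level conversions can be sketched in Python. The function names are hypothetical; the code works on raw bytes so that Python's own new-line translation in text mode does not interfere.

```python
from pathlib import Path

def dos_to_unix(data: bytes) -> bytes:
    """CR LF -> LF, like the DOS to UNIX Perl one-liner."""
    return data.replace(b"\r\n", b"\n")

def unix_to_dos(data: bytes) -> bytes:
    """LF -> CR LF. Unlike the Perl one-liner, existing CR LF pairs are
    normalized first, so they do not become CR CR LF."""
    return dos_to_unix(data).replace(b"\n", b"\r\n")

def mac_to_dos(data: bytes) -> bytes:
    """CR -> CR LF, like the MAC to DOS Perl one-liner; assumes the file has only bare CRs."""
    return data.replace(b"\r", b"\r\n")

def convert_in_place(path: str, convert) -> None:
    """Rewrite a file with one of the converters above (no backup is kept)."""
    p = Path(path)
    p.write_bytes(convert(p.read_bytes()))
```

For example, `convert_in_place("signal.txt", dos_to_unix)`, where `signal.txt` stands for your data file. Like `perl -i`, this overwrites the file, so keep a copy of the original.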

How you can prevent it

The most reliable method we have found is to use the "tar" (and optionally "gzip") utilities to archive files before transferring them and to unpack them afterwards. Windows users can get these command-line tools by installing Cygwin.
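tar stores each file's bytes unchanged, which is why it preserves new-lines. If no tar program is at hand, Python's standard `tarfile` module writes the same kind of archive. The sketch below packs and unpacks in memory to show that the bytes round-trip; `pack` and `unpack` are hypothetical helpers.

```python
import io
import tarfile

def pack(files: dict) -> bytes:
    """Build an in-memory .tar.gz from {name: bytes}; contents are stored verbatim."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack(archive: bytes) -> dict:
    """Read every regular file back out of a .tar.gz produced by pack()."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}
```

For real files you would open an archive with `tarfile.open("corpus.tar.gz", "w:gz")`, call `tar.add(...)` on the directory to send, and use `tar.extractall(...)` on the receiving side; the names here are placeholders.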

Windows users can also use WinZip, provided its preferences are changed on the machine where the files are unpacked. Open WinZip and choose "Options->Configuration". On the "Miscellaneous" tab, in the "Other" group, uncheck the "TAR file smart CR/LF conversion" option.