Download of software

The following software is released under the GNU General Public License (GPL).

CST's lemmatiser
The package contains the source code (C++) of CST's lemmatiser. After compiling it for your platform (Linux, Unix, Windows) you can train the programme. For languages with rich morphology, such as Danish, Norwegian and Swedish, you need large full-form word lists to attain reasonably good results. Contact CST if you want to use CST's linguistic resources for Danish to train the lemmatiser. These resources are not covered by the GPL.
Bracmat
Bracmat is an interpreted programming language that one of CST's staff members has been developing since 1986. It was originally designed as a computer algebra system, but it has proved its merits in natural language processing as well. It has been used in the field of general relativity for the algebraic computation of Ricci tensors from given space-time metrics, for the implementation of a dialogue manager in the Staging project, for the analysis of texts in the "Controlled Language" part of the VID project, and for automatic error correction of CST's many HTML pages. Read more about Bracmat.

Licenses other than the GNU GPL

CST uses free third-party software that we have adapted to our needs, typically to enable us to run the software on platforms that it originally had not been written for. We are happy to pass on these programmes to other users under the same licenses:

POS-tagger written by Eric Brill
CST uses this POS-tagger in many applications for the analysis of English texts (using Eric Brill's linguistic resources, in some cases with small adaptations) as well as Danish texts (with CST's linguistic resources). The distribution comprises Eric Brill's original distribution and a Zip-file with CST's software adaptations. Note that the training part of Brill's tagger is unchanged! We have made the following adaptations:
  • Reformatting from UNIX-style C to standard C++,
  • Replacement of some UNIX-specific functions with standard C functions,
  • Better handling of capitals in what are presumably headings, and
  • The introduction of an option file "xoptions" to make the source code independent of language and tagset.
The CASS parser written by Steven Abney
CST has used the CASS parser in the VID project for marking up noun phrases in large text corpora. The distribution comprises Steven Abney's original distribution and a Zip-file with CST's adaptations. The adaptations are minimal but relevant if you want to compile the programme with one of the newer GNU C++ compilers. (UPDATE: after we made these adaptations, Steven Abney made adaptations of his own to solve the same compatibility problems. However, neither CST's distribution nor Steven Abney's seems to compile with the newest generation of GNU C++ compilers (version 4 and above)!)

Linguistic resources

If you are interested in obtaining linguistic resources that have been produced under the auspices of CST (STO, training data for the POS-tagger or for the lemmatiser, grammars for the NP-recogniser, rules for the name recogniser), please contact Hanne Fersøe (hanne@cst.dk).


Njalsgade 140-142, building 25, DK-2300 Copenhagen S
Tel: +45 35329090 - Fax: +45 35329089