Download of software
GNU
The following software is under the GNU General Public License (GPL).
- CST's lemmatiser
- The package contains the C++ source code of CST's lemmatiser. After compiling it for your platform (Linux, Unix, Windows), you can train the programme.
For morphologically rich languages such as Danish, Norwegian and Swedish, you need large full-form word lists to attain reasonably good results.
Contact CST if you want to use CST's linguistic resources for Danish to train the lemmatiser.
These resources are not covered by the GPL.
- Bracmat
- Bracmat is an interpreted programming language that has been developed by one of CST's
staff members since 1986. It was originally designed as a computer algebra system,
but it has proven its merits in natural language processing as well. It has been
used in the field of General Relativity for the algebraic computation of Ricci tensors
from given space-time metrics, for the implementation of a dialogue manager in the
Staging project, for the analysis of texts in the "Controlled
Language" part of the VID project, and for automatic error correction
of CST's many HTML pages. Read more about Bracmat.
Licenses other than GNU
CST uses free third-party software that we have adapted to our needs,
typically to enable us to run the software on platforms for which it was not originally written.
We are happy to pass on these programmes to other users under the same licenses:
- POS-tagger written by Eric Brill
- CST uses this POS-tagger in many applications for the analysis of
English texts (using Eric Brill's linguistic resources, in some cases with small
adaptations) as well as Danish texts (with CST's linguistic resources).
The distribution comprises Eric Brill's original
distribution and a Zip-file with CST's software adaptations. Note that the training part
of Brill's tagger is unchanged!
We have made the following adaptations:
- Reformatting from UNIX-style C to standard C++,
- Replacement of some UNIX-specific functions with standard C
functions,
- Better handling of capitals in text presumed to be headings, and
- The introduction of an option file "xoptions" to make the source code independent of language and tagset.
- The CASS parser written by Steven Abney
- CST has used the CASS-parser in the VID-project for marking up
noun phrases in large text corpora. The distribution comprises
Steven Abney's original distribution and a Zip-file with CST's
adaptations. The adaptations are minimal but relevant if you want to compile
the programme with one of the newer GNU C++ compilers.
(UPDATE: after we made these adaptations, Steven Abney made his own adaptations to solve the same compatibility problems. However, neither CST's distribution nor Steven Abney's seems to compile with the newest generation of GNU C++ compilers (version 4 and above)!)
Linguistic resources
If you are interested in obtaining linguistic resources that have been produced
under the auspices of CST (STO, training data for the POS-tagger or for the
lemmatiser, grammars for the NP-recogniser, rules for the name recogniser), please
contact Hanne Fersøe (hanne@cst.dk).