CST's online demos

This page links to demo versions of our language technological tools, grammars and word bases.

Here you can see where you need language technological applications.

Tokeniser
CST's tokeniser creates segments containing one sentence each and divides these into tokens: words, numbers and punctuation.
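The two steps described above (split into sentences, then split each sentence into tokens) can be illustrated with a minimal sketch. This is not CST's implementation; the names `SENTENCE_END`, `TOKEN` and `tokenise` are hypothetical, and the sentence splitter is deliberately naive (it would, for example, split after abbreviations that end in a full stop).

```python
import re

# Hypothetical, simplified rules; not CST's actual tokeniser.
# A sentence boundary is assumed wherever whitespace follows '.', '!' or '?'.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A token is a number (optionally with '.' or ',' separators),
# a word (optionally with internal '-' or "'"), or a single punctuation mark.
TOKEN = re.compile(r"\d+(?:[.,]\d+)*|\w+(?:[-']\w+)*|[^\w\s]")

def tokenise(text):
    """Return a list of sentences, each given as a list of tokens."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    return [TOKEN.findall(s) for s in sentences]

if __name__ == "__main__":
    for sentence in tokenise("Hello world. It costs 3,50 now!"):
        print(sentence)
```

Each inner list corresponds to one sentence segment, so downstream tools such as a tagger or lemmatiser could process the text one sentence at a time.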
Name recogniser
CST's named entity recogniser demarcates and classifies proper nouns in a text.
POS-tagger
The POS-tagger automatically assigns word class information to each word in a text, marking whether it is a noun, a verb, etc.
Lemmatiser
CST's lemmatiser reduces each word form in a text to the word's lemma, its base form. The lemma is the form you would look up in a dictionary.
NP-recogniser
The Cass NP-chunker demarcates simple noun phrases.
Repetitiveness checker
The program uses a statistical method to find repeated word groups that stand out in a text. In an EU-related text, for example, it might find "on the basis of" or "the high contracting parties".
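The page does not say which statistical method the checker uses. As a rough illustration only, the sketch below swaps in a plain frequency count of word n-grams; the function name `repeated_ngrams` and its parameters `n` and `min_count` are assumptions made for this example.

```python
import re
from collections import Counter

def repeated_ngrams(text, n=3, min_count=2):
    """Count word n-grams and keep those occurring at least min_count times.

    A plain frequency count, used here as a stand-in for the (unspecified)
    statistical method of the actual repetitiveness checker.
    """
    words = re.findall(r"\w+", text.lower())
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {" ".join(g): c for g, c in grams.items() if c >= min_count}

if __name__ == "__main__":
    sample = ("The high contracting parties agree. "
              "The high contracting parties decide.")
    print(repeated_ngrams(sample))
```

A real checker would also need to decide which repetitions "stand out", for instance by comparing counts against what is expected in general language; a raw count alone does not do that.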
Keywords
This program extracts 20 keywords characterising an input text.
Multi word terms
The program finds the most relevant adjective + noun combinations among the words in the text.
Test a combination of tools
The aforementioned tools can work together in different constellations.
Summarisation tool
The summarisation tool (DanSum) can be used for automatic summarisation of Danish newspaper and text documents.
Computational Lexicon for Danish
User interface to STO.
Anonymisation
IDentification and ANonymisation of NAmes (IDANNA).

Emil Holms Kanal 2, building 22, 3, DK-2300 Copenhagen S