Linguistic Engine EXTRAKT
The linguistic engine EXTRAKT is a complete set of functions for natural language processing (NLP). It serves as the basis for monolingual and multilingual (cross-lingual) applications in domains such as indexing, lemmatisation, language identification and text classification.
In most cases, EXTRAKT is used as an add-on for handling search requests in Internet search engines, library systems and shop systems.
Development began in 1990. Initially, EXTRAKT was the German component of the multilingual full-text retrieval system EMIR (European Multilingual Information Retrieval). EMIR was probably the first multilingual full-text retrieval system worldwide.
Since then, EXTRAKT has been extended to cover most European languages, so that a whole range of multilingual systems can be offered.
EXTRAKT's main components are dictionaries: most of its linguistic information is stored in them. Because this information can be accessed very quickly, EXTRAKT runs at high speed and can process even very large volumes of documents.
EXTRAKT-I is an EXTRAKT application that uses the EXTRAKT API directly; the suffix "-I" stands for "integrated".
EXTRAKT-I reads the content of an input file and processes it with the linguistic functions. The entire process is controlled by a configuration file. This version of EXTRAKT is about 2 to 10 times faster than the server version.
Within the EMIR project mentioned above, the decision was taken to use only full-form dictionaries (a decision due to Prof. Christian Fluhr of the French CEA). This was a rather uncommon choice at the time, but it proved to be the right one. A full-form dictionary lists every inflected form of a word as a separate entry, so a single dictionary look-up returns the desired information from a given dictionary.
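The full-form principle can be illustrated with a minimal Python sketch. The entries, the `Analysis` fields and the functions `analyse` and `lemmatise` are hypothetical and do not reflect EXTRAKT's actual dictionary format or API. The sketch only shows the idea that each inflected form is stored as its own key, so one look-up yields the lemma and grammatical information without any affix-stripping rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str      # base form of the word
    pos: str        # part of speech
    features: str   # inflectional features (hypothetical notation)


# Hypothetical toy full-form dictionary (not EXTRAKT's real format):
# every inflected form is a separate key, so no morphological rules are needed.
FULL_FORM_DICT: dict[str, list[Analysis]] = {
    "haus": [Analysis("Haus", "NOUN", "sg")],
    "hauses": [Analysis("Haus", "NOUN", "sg,gen")],
    "häuser": [Analysis("Haus", "NOUN", "pl")],
    "häusern": [Analysis("Haus", "NOUN", "pl,dat")],
    "ging": [Analysis("gehen", "VERB", "past,1sg/3sg")],
    "gingen": [Analysis("gehen", "VERB", "past,1pl/3pl")],
}


def analyse(token: str) -> list[Analysis]:
    """Return all analyses of a word form using a single dictionary look-up."""
    return FULL_FORM_DICT.get(token.lower(), [])


def lemmatise(text: str) -> list[str]:
    """Map each whitespace-separated token to its lemma; unknown forms are kept as-is."""
    lemmas = []
    for token in text.split():
        analyses = analyse(token)
        lemmas.append(analyses[0].lemma if analyses else token)
    return lemmas


if __name__ == "__main__":
    print(lemmatise("Die Häuser"))  # → ['Die', 'Haus']
```

The trade-off of this design is size: every inflected form must be listed, which for a highly inflected language such as German means far more entries than a stem dictionary would need. In return, each look-up is a constant-time hash access and no morphological rules have to be applied at run time.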