The Linguistic Engine EXTRAKT

The linguistic engine EXTRAKT is the main product of TEXTEC.

It performs morpho-syntactic analysis for most European languages. Besides the analysis function, EXTRAKT provides a whole series of further linguistic functions.

The system is based on dictionaries whose size ranges from approx. 500,000 to 1.8 million entries. Domain-specific dictionaries are also available.

A basic form is a word form that stands for all the different forms of the same word. For example, the German word Haus is the basic form for Haus, Hause, Hauses, Häuser and Häusern. The basic form go stands for all forms of this verb, i.e. go, goes, went and gone.

For German, a dictionary of about 1.5 million entries has been built. In addition, special dictionaries are available: one containing "ersatz" representations of specific characters (Haeuser instead of Häuser), one with split German compounds (approx. 1,500,000 entries), a French dictionary with accentless entries (such as methode for méthode), and others.
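The two kinds of alternative spellings mentioned above can be illustrated with a short Python sketch. It only shows how such forms relate to the original words; how EXTRAKT actually builds or stores these dictionaries is not described here, and the function names to_ersatz and strip_accents are hypothetical.

```python
import unicodedata

# Common German "ersatz" spellings for umlauts and sharp s.
# Assumption: this table is illustrative, not EXTRAKT's internal data.
ERSATZ = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def to_ersatz(word):
    """Replace each special character with its ersatz spelling."""
    return "".join(ERSATZ.get(ch, ch) for ch in word)

def strip_accents(word):
    """Remove accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(to_ersatz("Häuser"))       # Haeuser
print(strip_accents("méthode"))  # methode
```

Keeping such variants as dictionary entries, as the text describes, means that input typed without special characters can still be matched by lookup.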

For German, there is a dictionary of 120,000 synonyms, a dictionary of related word forms (derivations) with 150,000 entries, and a thesaurus.

There are 36,000 English and 25,000 French synonyms available.

The basic forms (lemmata) can be translated into another language, which makes multilingual (cross-lingual) search possible.
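As a rough illustration of this idea, the following Python sketch reduces a query word to its basic form and then adds the translations of that basic form to the search terms. Both toy dictionaries and the function name cross_lingual_terms are assumptions made for this example and do not reflect EXTRAKT's data or interface.

```python
# Toy monolingual dictionary: word form -> basic form (assumed layout).
BASIC_FORM = {
    "Haus": "Haus", "Hause": "Haus", "Hauses": "Haus",
    "Häuser": "Haus", "Häusern": "Haus",
}

# Toy bilingual dictionary: German basic form -> English basic forms.
DE_EN = {"Haus": ["house"]}

def cross_lingual_terms(query_word):
    """Return the basic form of the query word plus its translations."""
    lemma = BASIC_FORM.get(query_word, query_word)
    return [lemma] + DE_EN.get(lemma, [])

print(cross_lingual_terms("Häuser"))  # ['Haus', 'house']
```

A search system could then look up documents indexed under any of the returned terms, so a German query also finds English documents.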

EXTRAKT currently includes bilingual dictionaries with 50,000 to 170,000 entries each. These dictionaries are constantly updated.

Customers can create private dictionaries themselves and add them to the system.

EXTRAKT is equipped with the generation function GENERATE, which generates all forms belonging to a given word form. For example, if the word "Hauses" (genitive singular) is entered, GENERATE determines its basic form Haus and generates all inflected forms: Haus, Hause, Hauses, Häuser and Häusern.

GENERATE is available for all languages covered by the system, because the same monolingual dictionaries are used for both analysis and generation.
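The point that one dictionary can serve both analysis and generation can be sketched in a few lines of Python. The data layout, the variable names and the function generate below are hypothetical and use only the example words from this page; they are not EXTRAKT's GENERATE interface.

```python
# Toy monolingual dictionary: basic form -> all of its word forms.
# Assumption: this layout is for illustration only.
PARADIGMS = {
    "Haus": ["Haus", "Hause", "Hauses", "Häuser", "Häusern"],
    "go": ["go", "goes", "went", "gone"],
}

# The analysis direction (word form -> basic form) is derived from the same data.
FORM_TO_BASIC = {
    form: basic for basic, forms in PARADIGMS.items() for form in forms
}

def generate(word_form):
    """Return all forms that share a basic form with word_form, or [] if unknown."""
    basic = FORM_TO_BASIC.get(word_form)
    if basic is None:
        return []
    return list(PARADIGMS[basic])

print(generate("Hauses"))  # ['Haus', 'Hause', 'Hauses', 'Häuser', 'Häusern']
print(generate("went"))    # ['go', 'goes', 'went', 'gone']
```

Because the reverse index is built from the paradigm table, any language that has a paradigm table supports generation automatically in this sketch.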

EXTRAKT has a client/server architecture but is also provided as a DLL.

EXTRAKT is available as a simple C++ DLL, as a TCP/IP server (EXTRAKT - Server) and as a COM/DCOM server. The C++ DLL can be integrated directly into client programs. Any client system can communicate with the EXTRAKT - Server using a simple protocol in which requests are formulated as strings. In most installations, the server version is chosen.
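A string-based request over TCP/IP could look roughly like the following Python sketch. The actual EXTRAKT protocol is not specified on this page, so everything concrete here is an assumption: the function name send_request, the newline-terminated request and reply, the UTF-8 encoding, and any request text such as "GENERATE Hauses".

```python
import socket

def send_request(host, port, request, timeout=5.0):
    """Send one string request and return one reply line as a string.

    Assumptions (not taken from the EXTRAKT documentation): requests and
    replies are single UTF-8 lines terminated by a newline character.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((request + "\n").encode("utf-8"))
        # Wrap the socket in a text reader to read exactly one reply line.
        with conn.makefile("r", encoding="utf-8", newline="\n") as reader:
            return reader.readline().rstrip("\n")

# Hypothetical usage; host, port and request syntax are placeholders:
# reply = send_request("localhost", 9000, "GENERATE Hauses")
```

Since the request is an ordinary string, a client in any language with socket support could talk to the server in the same way.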

Furthermore, TEXTEC offers special interface modules for different platforms to make communication with the server (and integration of the EXTRAKT functions) as easy as possible. As a COM/DCOM component, EXTRAKT can be integrated easily into local and distributed systems.


The INDEX and GENERATE functions are available in Java.

The newest version of EXTRAKT is 5.0 b03 (June 2014).

