Once Connexor's metadata discovery has found the entities in a text, we can choose which types of entities we need for our purposes. We may be interested only in certain types, such as people, companies, genes, viruses, or chemical compounds. Connexor's software assigns every entity it finds to a category.
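Selecting entities by type can be sketched as a simple filter over categorized entities. This is only an illustration and not Connexor's API: the `Entity` class, the `filter_by_category` function, and the category labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A found entity with its assigned category (hypothetical structure)."""
    text: str
    category: str


def filter_by_category(entities, wanted):
    """Keep only entities whose category is in the wanted set, preserving order."""
    wanted = set(wanted)
    return [e for e in entities if e.category in wanted]


# Illustrative input: entities as an extraction step might have returned them.
found = [
    Entity("Atro Voutilainen", "person"),
    Entity("Connexor", "company"),
    Entity("University of Helsinki", "academic_institution"),
    Entity("1997", "time_expression"),
]

# We are only interested in people and companies.
people_and_companies = filter_by_category(found, {"person", "company"})
```

Here `people_and_companies` holds the two entities categorized as `person` or `company`, in their original order.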
Connexor (then called Conexor) was founded in 1997 by a group of young entrepreneurs with a background in research on natural language analysis and computing. The co-founders, Atro Voutilainen, Pasi Tapanainen and Timo Järvinen, set out to commercialize their successful academic research on morphology, morphosyntactic tagging and full-scale syntactic parsing, which they had carried out at the University of Helsinki through the 1990s and published at leading international conferences on natural language processing and through other leading publishers.
Information obtained from entity categorization is also useful for text type classification, because the presence of certain types of entities can identify the text type. Another factor used in document classification is the language in which the text is written.
A document can be assigned to multiple categories. The categories are either predefined by the user or derived by the software from other, similar documents.
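As a minimal sketch of this idea, assuming user-predefined categories expressed as simple rules, a document can be given every text type whose characteristic entity categories occur often enough. The rule table, the `min_hits` threshold, and the `classify` function are hypothetical and do not describe how Connexor's classifier works internally.

```python
from collections import Counter

# Hypothetical user-defined rules: text type -> entity categories that suggest it.
RULES = {
    "biomedical": {"gene", "virus", "chemical_compound"},
    "business": {"company", "person"},
}


def classify(entity_categories, language, rules=RULES, min_hits=2):
    """Return the language plus every text type supported by at least
    min_hits entities; a document may receive several text types."""
    counts = Counter(entity_categories)
    labels = [
        label
        for label, categories in rules.items()
        if sum(counts[c] for c in categories) >= min_hits
    ]
    return {"language": language, "text_types": labels}


# Entity categories found in one illustrative document.
doc_categories = ["gene", "gene", "virus", "company", "person"]
result = classify(doc_categories, "en")
```

In this example the document matches both rule sets, so `result["text_types"]` contains both `biomedical` and `business`, which illustrates the multiple-categories-per-document behavior.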
Connexor's software comes with over 150 predefined entity categories, although the default classification may not suit everyone's needs. In that case, users can specify their own custom ontologies, which include all the entities, categories and category hierarchies that they are interested in.
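A category hierarchy of this kind can be modeled, for illustration, as a mapping from each category to its parent. The format below is an assumption made for the example and is not Connexor's ontology format; the category names and the `path_to_root` and `is_a` helpers are hypothetical.

```python
# Hypothetical custom ontology: category -> parent category (None marks a root).
PARENT = {
    "organization": None,
    "company": "organization",
    "academic_institution": "organization",
    "biological_entity": None,
    "gene": "biological_entity",
    "virus": "biological_entity",
}


def path_to_root(category, parent=PARENT):
    """Return the category followed by its ancestors up to the root.
    Raises KeyError for a category that is not in the ontology."""
    chain = []
    while category is not None:
        chain.append(category)
        category = parent[category]
    return chain


def is_a(category, ancestor, parent=PARENT):
    """True if the category equals the ancestor or lies below it in the hierarchy."""
    return ancestor in path_to_root(category, parent)
```

With such a hierarchy, a request for all `organization` entities would also match entities categorized as `company` or `academic_institution`, because `is_a("company", "organization")` holds.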