|Title||Vector-based semantic analysis using random indexing and morphological analysis for cross-lingual information retrieval|
|Publication Type||Book Chapter|
|Year of Publication||2002|
|Authors||Karlgren J, Sahlgren M|
|Book Title||Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems, Darmstadt, Germany, September 3 - 4|
|Series Title||Lecture Notes In Computer Science|
Meaning, the main object of study in information access, is most decidedly situation-dependent. While much of meaning appears to achieve consistency across usage situations -- a term will seem to mean much the same thing in many of its contexts -- most everything can be negotiated on the go. Human processing appears to be flexible in this respect, and oriented towards learning from prototypes rather than learning by definition: learning new words, and adding new meanings or shades of meaning to an existing word does not need a formal re-training process. We have built a query expansion and translation tool for information retrieval systems. When used in one single language it will expand the terms of a query using a thesaurus built for that purpose; when used across languages it will provide numerous translations and near translations for the source language terms. The underlying technology we are testing is that of vector-based semantic analysis, an analysis method related to latent semantic indexing based on stochastic pattern computing. This paper will briefly describe how we acquired training data, aligned it, analyzed it using morphological analysis tools, and finally built a thesaurus using the data, but will concentrate on an overview of vector-based semantic analysis and how stochastic pattern computing differs from latent semantic indexing in its current form.