Vector-based semantic analysis using random indexing and morphological analysis for cross-lingual information retrieval

Title: Vector-based semantic analysis using random indexing and morphological analysis for cross-lingual information retrieval
Publication Type: Book Chapter
Year of Publication: 2002
Authors: Karlgren J, Sahlgren M
Book Title: Revised Papers from the Second Workshop of the Cross-Language Evaluation Forum on Evaluation of Cross-Language Information Retrieval Systems, Darmstadt, Germany, September 3 - 4
Series Title: Lecture Notes In Computer Science
Pagination: 169-176
Publisher: Springer
ISBN: 3-540-44042-9
Keywords: information retrieval
Abstract

Meaning, the main object of study in information access, is most decidedly situation-dependent. While much of meaning appears to achieve consistency across usage situations -- a term will seem to mean much the same thing in many of its contexts -- most everything can be negotiated on the go. Human processing appears to be flexible in this respect, and oriented towards learning from prototypes rather than learning by definition: learning new words, and adding new meanings or shades of meaning to an existing word does not need a formal re-training process. We have built a query expansion and translation tool for information retrieval systems. When used in one single language it will expand the terms of a query using a thesaurus built for that purpose; when used across languages it will provide numerous translations and near translations for the source language terms. The underlying technology we are testing is that of vector-based semantic analysis, an analysis method related to latent semantic indexing based on stochastic pattern computing. This paper will briefly describe how we acquired training data, aligned it, analyzed it using morphological analysis tools, and finally built a thesaurus using the data, but will concentrate on an overview of vector-based semantic analysis and how stochastic pattern computing differs from latent semantic indexing in its current form.

URL: http://www.ercim.eu/publication/ws-proceedings/CLEF2/karlgren.pdf