Principled Query Processing

Year of Publication: 2005
Authors: Karlgren, J., Sahlgren, M., Cöster, R.

This year, the SICS team decided to concentrate on query processing and on the internal topical structure of the query, which we have identified as one of the major bottlenecks for cross-lingual access systems. In previous years, the SICS team investigated, among other issues, how to translate compounds. Compound translation is non-trivial because of dependencies between compound elements, and it has been handled in various ways for compounding languages such as Swedish. This year we decided to investigate the topical dependencies between query terms, under the hypothesis that the difficulty of translating compounds is a special case of the more general problem of understanding the respective topicality of query terms.
The question under investigation is how much topicality each query term contributes in the documents of the collection under consideration. If a query term happens to be non-topical or noise, it should be discarded or given a low weight when ranking retrieved documents; if a query term shows high topicality, its weight should be boosted. We test the hypothesis that boosting topically active terms improves retrieval results by running our base system with two different enhancements.
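A minimal sketch of the weighting policy described above (discard noise terms, down-weight weakly topical ones, boost strongly topical ones), assuming each query term already has a topicality score in [0, 1]. The function names, thresholds, and weights are hypothetical illustrations, not values from the SICS system, and the ranking function is a plain weighted term-frequency sum chosen only for illustration.

```python
from collections import Counter


def reweight_query(terms, topicality, discard_below=0.1, low_below=0.3,
                   boost_at=0.7, low_weight=0.5, boost_weight=2.0):
    """Map each kept query term to a ranking weight.

    `topicality` is an assumed dict of term -> score in [0, 1];
    all thresholds and weights here are hypothetical defaults.
    """
    weights = {}
    for term in terms:
        score = topicality.get(term, 0.0)
        if score < discard_below:
            continue                      # noise: drop the term entirely
        elif score < low_below:
            weights[term] = low_weight    # weakly topical: low weight
        elif score >= boost_at:
            weights[term] = boost_weight  # topically active: boost
        else:
            weights[term] = 1.0           # neutral weight
    return weights


def score_document(doc_terms, weights):
    """Score a document as the weighted sum of query-term frequencies."""
    counts = Counter(doc_terms)
    return sum(w * counts[t] for t, w in weights.items())


query = ["documents", "nuclear", "power", "plant", "accident", "relevant"]
scores = {"documents": 0.2, "nuclear": 0.9, "power": 0.5,
          "plant": 0.6, "accident": 0.8, "relevant": 0.05}
weights = reweight_query(query, scores)
# {'documents': 0.5, 'nuclear': 2.0, 'power': 1.0, 'plant': 1.0, 'accident': 2.0}
```

The three-band policy is only one way to turn a score into a weight; a continuous mapping (for example, weight proportional to the score) would serve the same purpose.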
Both schemes are based on an analysis of the distributional character of query terms: one uses the similarity of occurrence contexts between query terms; the other uses the likelihood of individual terms to appear topically in text. These are two different avenues of analysis and are likely to yield different results if pursued beyond these initial experiments.
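The abstract does not say how either measure is computed, so the following is only a sketch built on common stand-ins, both of which are assumptions. For context similarity, each word gets a co-occurrence vector from a fixed token window, and a query term is scored by the mean cosine similarity of its vector to those of the other query terms. This is a simple word-space stand-in, not necessarily the model used in the experiments. For topical likelihood, the sketch uses burstiness, measured as collection frequency divided by document frequency: a term that recurs within the documents where it appears is more likely to be used topically. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(docs, window=2):
    """Map each word to a Counter of words co-occurring within +/- window tokens."""
    vectors = defaultdict(Counter)
    for doc in docs:
        for i, word in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][doc[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def context_coherence(query, vectors):
    """Mean context similarity of each query term to the other query terms."""
    scores = {}
    for t in query:
        sims = [cosine(vectors[t], vectors[o]) for o in query if o != t]
        scores[t] = sum(sims) / len(sims) if sims else 0.0
    return scores


def burstiness(term, docs):
    """Average frequency of `term` in the documents that contain it (cf / df)."""
    counts = [doc.count(term) for doc in docs]
    df = sum(1 for c in counts if c > 0)
    return sum(counts) / df if df else 0.0


docs = [
    "reactor leak at nuclear plant reactor shut".split(),
    "nuclear reactor plant inspection".split(),
    "the weather was fine the day".split(),
]
coherence = context_coherence(["nuclear", "reactor", "weather"], context_vectors(docs))
```

In this toy corpus, "weather" shares no context with the other two query terms and scores 0.0, while "nuclear" and "reactor" support each other. Either score would still need to be normalized before it could serve as a term weight.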
The boosting schemes delivered clearly improved results. These results provide impetus for further study of the translation of complex terms, the question that prompted this set of experiments in the first place.