Weighting Query Terms Based on Distributional Statistics

Title: Weighting Query Terms Based on Distributional Statistics
Publication Type: Book Chapter
Year of Publication: 2006
Authors: Karlgren J, Sahlgren M, Cöster R
Editor: Peters C, Gey FC, Gonzalo J, Müller H, Jones G, Kluck M, Magnini B, de Rijke M
Book Title: Accessing Multilingual Information Repositories
Series Title: Lecture Notes in Computer Science
City: Berlin / Heidelberg

This year, the SICS team concentrated on query processing and on the internal topical structure of the query, specifically compound translation, which is non-trivial because of dependencies between compound elements. We investigated topical dependencies between query terms: a query term that is non-topical or noise should be discarded or given a low weight when ranking retrieved documents, while a query term that shows high topicality should have its weight boosted. The two experiments described here are based on an analysis of the distributional character of query terms: one uses the similarity of occurrence contexts between query terms globally across the entire collection; the other uses the likelihood that an individual term appears topically in individual texts. The two boosting schemes are complementary, and both delivered improved results.
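
The first scheme, weighting a query term by how similar its occurrence contexts are to those of the other query terms, can be illustrated with a minimal sketch. The abstract does not specify the representation or the similarity measure, so everything here is an assumption made for illustration: context vectors are built from document-level co-occurrence counts, similarity is cosine, and the function names `context_vectors`, `cosine`, and `weight_query_terms` are hypothetical. It is not the chapter's implementation.

```python
import math
from collections import Counter


def context_vectors(docs):
    """Map each term to counts of the terms it shares a document with.

    Assumption: a term's "occurrence context" is the bag of other terms
    found in the same documents, across the whole collection.
    """
    vectors = {}
    for doc in docs:
        counts = Counter(doc.lower().split())
        for term in counts:
            vec = vectors.setdefault(term, Counter())
            for other, n in counts.items():
                if other != term:
                    vec[other] += n
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weight_query_terms(query, docs):
    """Weight each query term by its mean context similarity to the others.

    Terms whose contexts resemble those of the rest of the query get a
    higher weight; terms with unrelated contexts get a lower one.
    """
    vectors = context_vectors(docs)
    terms = query.lower().split()
    weights = {}
    for term in terms:
        others = [u for u in terms if u != term]
        if not others:
            weights[term] = 1.0  # single-term query: nothing to compare with
            continue
        sims = [
            cosine(vectors.get(term, Counter()), vectors.get(u, Counter()))
            for u in others
        ]
        weights[term] = sum(sims) / len(sims)
    return weights


if __name__ == "__main__":
    # Toy collection, invented for this example only.
    docs = [
        "solar panel energy grid",
        "energy grid solar power",
        "annual report finance budget",
    ]
    print(weight_query_terms("solar energy report", docs))
```

In this toy collection, "solar" and "energy" share contexts and receive a higher weight than "report", which occurs only in an unrelated document. A real system would need a proper tokenizer, a much larger collection, and a choice of how to combine these weights with the ranking function.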