Weighting Query Terms Based on Distributional Statistics

Title: Weighting Query Terms Based on Distributional Statistics
Publication Type: Book Chapter
Year of Publication: 2006
Authors: Karlgren, J, Sahlgren, M, Cöster, R
Editor: Peters, C, Gey, FC, Gonzalo, J, Müller, H, Jones, G, Kluck, M, Magnini, B, de Rijke, M
Book Title: Accessing Multilingual Information Repositories
Series Title: Lecture Notes in Computer Science
Volume: 4022
Pagination: 208-211
Publisher: Springer
City: Berlin / Heidelberg
ISBN: 978-3-540-45697-1
Abstract

This year, the SICS team has concentrated on query processing and on the internal topical structure of the query, specifically compound translation. Compound translation is non-trivial due to dependencies between compound elements. This year, we have investigated topical dependencies between query terms: if a query term happens to be non-topical or noise, it should be discarded or given a low weight when ranking retrieved documents; if a query term shows high topicality its weight should be boosted. The two experiments described here are based on the analysis of the distributional character of query terms: one using similarity of occurrence context between query terms globally across the entire collection; the other using the likelihood of individual terms to appear topically in individual texts. Both – complementary – boosting schemes tested delivered improved results.

DOI: 10.1007/11878773_24