Pre-processing text for web information retrieval purposes by splitting compounds into their morphemes

Title: Pre-processing text for web information retrieval purposes by splitting compounds into their morphemes
Publication Type: Conference Paper
Year of Publication: 2005
Authors: Abels S, Hahn A
Conference Name: Open Source Web Information Retrieval (OSWIR)

In web information retrieval, the interpretation of text is crucial. In this paper, we describe an approach to ease the interpretation of compound words (i.e., words that consist of other words, such as “handshake” or “blackboard”). We argue that the web information retrieval domain requires a fast decomposition that splits as many of those words as possible, and that a small error rate is acceptable in return. Our approach decomposes compounds within a reasonable amount of time, is language independent, and is currently available as an open source implementation.
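To illustrate what splitting a compound into its parts can look like, here is a minimal sketch of a dictionary-based splitter. It is not the authors' algorithm, which the abstract does not describe. The toy `LEXICON`, the function name `split_compound`, and the `min_len` parameter are all assumptions made for this example.

```python
from typing import List, Optional, Set

# Hypothetical toy lexicon; a real system would load a full word list
# for the target language.
LEXICON: Set[str] = {"hand", "shake", "black", "board", "book", "shelf"}


def split_compound(
    word: str, lexicon: Set[str] = LEXICON, min_len: int = 3
) -> Optional[List[str]]:
    """Return the lexicon words that make up `word`, or None if the word
    cannot be fully segmented into lexicon entries."""
    word = word.lower()
    if word in lexicon:
        return [word]
    # Try the longest candidate prefix first; every part must have at
    # least `min_len` characters to avoid spurious short splits.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon, min_len)
            if rest is not None:
                return [head] + rest
    return None


print(split_compound("handshake"))   # ['hand', 'shake']
print(split_compound("blackboard"))  # ['black', 'board']
print(split_compound("xyz"))         # None
```

A lookup-based splitter like this trades some accuracy for speed: it can mis-split words whose letters happen to line up with lexicon entries, which matches the kind of small error rate the abstract considers acceptable.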