
A vector space analysis of Swedish patent claims with different linguistic indices

Title: A vector space analysis of Swedish patent claims with different linguistic indices
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Andersson, L
Conference Name: 3rd International Workshop on Patent Information Retrieval
Publisher: ACM
Conference Location: Toronto, ON, Canada
ISBN Number: 978-1-4503-0384-2
Abstract

The purpose of this study was twofold: first, to examine whether a general automatic retrieval model, the Vector Space Model (VSM), can be used to discover similarities between Swedish patent claims; and second, to examine whether an additional morphological decompounding module at the pre-processing level improves the results. In the present study, three different topic sets consisting of patent claims were compared against the entire collection of 30,117 claims. The VSM was evaluated with and without additional morphological decompounding modules. The results indicate that decompounding has a positive effect on the performance of the retrieval model. However, the sublanguage of patent claims and the errors introduced during the Optical Character Recognition (OCR) process were harmful to the overall performance of the Natural Language Processing (NLP) applications as well as to the retrieval model.
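To illustrate why decompounding can help a vector space model match Swedish claims, here is a minimal sketch. The abstract does not specify the term weighting, tokenizer, or decompounding module used in the study, so this example assumes TF-IDF weighting with cosine similarity and a naive dictionary-based splitter. `LEXICON`, `decompound`, `claim_similarity`, and the three example claims are hypothetical illustrations, not the author's method or data. The key effect: a compound such as "motorfordon" (motor vehicle) shares no index term with "fordon ... motor" unless it is split into its parts.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon; the study's actual decompounding dictionary is not described.
LEXICON = {"motor", "fordon", "pump", "vatten", "hjul"}


def decompound(word: str) -> list[str]:
    """Naive two-part split: return [word, head, tail] if both parts are in LEXICON."""
    for i in range(2, len(word) - 1):
        head, tail = word[:i], word[i:]
        # Allow a Swedish linking "s" (e.g. "fordonshjul" -> fordon + hjul).
        if head.endswith("s") and head[:-1] in LEXICON:
            head = head[:-1]
        if head in LEXICON and tail in LEXICON:
            return [word, head, tail]  # keep the compound and add its parts
    return [word]


def tokens(text: str, split: bool) -> list[str]:
    """Lowercase, extract alphabetic tokens (including å, ä, ö), optionally decompound."""
    words = re.findall(r"[a-zåäö]+", text.lower())
    return [t for w in words for t in decompound(w)] if split else words


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Raw term frequency times smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(d).items()}
        for d in docs
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def claim_similarity(claims: list[str], i: int, j: int, split: bool) -> float:
    """Similarity of claims i and j, with IDF computed over the whole collection."""
    vecs = tfidf([tokens(c, split) for c in claims])
    return cosine(vecs[i], vecs[j])


# Invented example claims, for illustration only.
claims = [
    "Ett motorfordon med fyra hjul.",
    "Ett fordon som drivs av en motor.",
    "En pump för vatten.",
]

print("without decompounding:", round(claim_similarity(claims, 0, 1, split=False), 3))
print("with decompounding:   ", round(claim_similarity(claims, 0, 1, split=True), 3))
```

Keeping the original compound alongside its parts preserves exact compound matches while adding partial matches; whether the study kept, replaced, or weighted compounds differently is not stated in the abstract.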

DOI: 10.1145/1871888.1871898