|Title||A vector space analysis of Swedish patent claims with different linguistic indices|
|Publication Type||Conference Paper|
|Year of Publication||2010|
|Conference Name||3rd International Workshop on Patent Information Retrieval|
|Conference Location||Toronto, ON, Canada|
The purpose of this study was twofold: first, to examine whether a general automatic retrieval model, the Vector Space Model (VSM), can be used to discover similarities between Swedish patent claims; and second, to examine whether an additional morphological decompounding module at the pre-processing level improves the result. Three different topic sets consisting of patent claims were compared against an entire collection of 30,117 claims. The VSM was evaluated with and without additional morphological decompounding modules. The results indicate that decompounding has a positive effect on the performance of the retrieval model. However, the sublanguage of patent claims and the errors made during the Optical Character Recognition (OCR) process harmed the overall performance of the Natural Language Processing (NLP) applications as well as that of the retrieval model.
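The idea of comparing claims in a vector space, with and without decompounding, can be illustrated with a minimal sketch. This is not the system used in the study: the lexicon, the two-part splitting rule, the TF-IDF weighting, and all function names below are assumptions chosen for illustration only. A real setup would rely on a proper morphological analyser.

```python
import math
from collections import Counter

# Hypothetical toy lexicon of compound parts (an assumption for this sketch);
# a real system would use a morphological analyser instead.
LEXICON = {"patent", "krav"}


def decompound(token, lexicon=LEXICON):
    """Split a token into two lexicon words if such a split exists."""
    for i in range(1, len(token)):
        head, tail = token[:i], token[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [token]  # no known split: keep the token unchanged


def tokenize(text, split_compounds=False):
    """Lowercase and split on whitespace, optionally splitting compounds."""
    tokens = text.lower().split()
    if split_compounds:
        tokens = [part for tok in tokens for part in decompound(tok)]
    return tokens


def tfidf_vectors(docs):
    """Turn token lists into sparse {term: weight} vectors (tf * (idf + 1))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(text_a, text_b, split_compounds=False):
    """Similarity of two texts in a vector space built from just these two."""
    docs = [tokenize(t, split_compounds) for t in (text_a, text_b)]
    vec_a, vec_b = tfidf_vectors(docs)
    return cosine(vec_a, vec_b)


# The compound "patentkrav" shares no term with "krav patent" until it is split.
without_split = similarity("patentkrav", "krav patent")
with_split = similarity("patentkrav", "krav patent", split_compounds=True)
```

In this toy case the unsplit compound shares no index term with the second text, so the two are judged unrelated, while splitting the compound exposes the shared terms. This only illustrates why decompounding can matter for a compounding language such as Swedish; it says nothing about the size of the effect reported in the study.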