Identification of Similar Documents Using Coherent Chunks

Publication Type: Book Chapter
Year of Publication: 2009
Authors: Lalitha Devi S, Kuppan S, Venkataswamy K, Rao PRK
Editor: Lalitha Devi S, Branco A, Mitkov R
Book Title: Anaphora Processing and Applications
Series Title: Lecture Notes in Computer Science
City: Berlin / Heidelberg
ISBN Number: 978-3-642-04974-3

We focus on automatically finding similar documents using coherent chunks. The similarity between documents is determined by identifying the coherent chunks present in them. We apply linguistic rules to identify the coherent chunks and use the Vector Space Model (VSM) to determine the similarity among documents. We have taken patent documents from the USPTO for this work. This method of using coherent chunks to identify similar documents has shown encouraging results.
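
As a rough illustration of the VSM step only, the sketch below represents each document as a vector of chunk frequencies and compares two documents by cosine similarity. It assumes the coherent chunks have already been extracted; the abstract does not describe the linguistic rules, the term weighting, or the similarity measure actually used, so the raw-frequency weighting, the cosine measure, the function names, and the sample chunks are all hypothetical.

```python
import math
from collections import Counter


def chunk_vector(chunks):
    """Map a document's chunks to raw frequencies (assumed weighting)."""
    return Counter(chunks)


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse chunk-frequency vectors."""
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[chunk] * vec_b[chunk] for chunk in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical chunks standing in for the output of the chunking step.
doc_a = chunk_vector(["fuel cell", "membrane electrode", "fuel cell"])
doc_b = chunk_vector(["fuel cell", "catalyst layer"])
doc_c = chunk_vector(["antenna array", "signal amplifier"])

score_ab = cosine_similarity(doc_a, doc_b)
score_ac = cosine_similarity(doc_a, doc_c)
```

Under these assumptions, doc_a scores higher against doc_b, which shares the chunk "fuel cell", than against doc_c, which shares none. A fuller system would likely replace raw frequencies with a weighting such as TF-IDF.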