Measuring Lexical Cohesion: Beyond Word Repetition

Publication Type: Conference Paper
Year of Publication: 2014
Authors: Kazantseva A, Szpakowicz S
Conference Name: COLING 2014, the 25th International Conference on Computational Linguistics
Conference Location: Dublin, Ireland

This paper considers the problem of finding topical shifts in documents and, in particular, what information can be leveraged to identify them. Recent research on topical segmentation usually assumes that topical shifts in discourse are signalled by changes in vocabulary. This information, however, is not always a sufficient indicator of a topical shift, especially in certain genres. This paper explores an additional source of information. Our hypothesis is that the type of a referring expression indicates how accessible its antecedent is. The shorter and less informative the expression (e.g., a personal pronoun versus a lengthy post-modified noun phrase), the more accessible the antecedent is likely to be, and the more likely it is that the topic under discussion has remained constant between the two mentions. We explore how this information can be used to augment a lexically based topical segmenter. We test our hypothesis on two types of data: literary narratives and lecture notes. The results suggest that our similarity metric is useful: depending on the settings, it either slightly improves performance or leaves it unchanged. They also suggest that certain types of referring expressions are more useful than others.
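The hypothesis above can be illustrated with a toy sketch. It is not the authors' metric, because the abstract does not give the formula. The accessibility weights, the interpolation parameter `alpha`, and every function name below are hypothetical. The sketch assumes that coreference links between mentions are already available. It scores lexical overlap between two segments with bag-of-words cosine similarity. It then adds a referential bonus when the later segment refers back to an entity from the earlier one, and that bonus is larger for pronouns than for long noun phrases.

```python
import math
from collections import Counter

# Hypothetical accessibility weights (illustrative assumption, not from the paper):
# shorter, less informative referring expressions get higher weights.
ACCESSIBILITY = {
    "pronoun": 1.0,
    "proper_name": 0.6,
    "definite_np": 0.4,
    "long_np": 0.1,
}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def referential_score(mentions_b, entities_a) -> float:
    """Average accessibility weight over the mentions in segment B.

    mentions_b: list of (entity_id, expression_type) pairs (assumed input format).
    entities_a: set of entity ids mentioned in segment A.
    Mentions whose entity does not occur in A contribute 0.
    """
    if not mentions_b:
        return 0.0
    linked = [ACCESSIBILITY[kind] for ent, kind in mentions_b if ent in entities_a]
    return sum(linked) / len(mentions_b)


def combined_similarity(words_a, words_b, entities_a, mentions_b, alpha=0.5) -> float:
    """Hypothetical linear interpolation of lexical and referential similarity."""
    lexical = cosine(Counter(words_a), Counter(words_b))
    referential = referential_score(mentions_b, set(entities_a))
    return (1 - alpha) * lexical + alpha * referential


if __name__ == "__main__":
    a = "the old fisherman sailed out".split()
    b = "he caught nothing that day".split()
    # Same entity, referred to with a pronoun vs. a long noun phrase.
    print(combined_similarity(a, b, ["fisherman"], [("fisherman", "pronoun")]))
    print(combined_similarity(a, b, ["fisherman"], [("fisherman", "long_np")]))
```

With `alpha=0.5`, take two sentences that share no words. If the second refers back to the same entity with a pronoun, the pair scores 0.5. If the same link is made through a long noun phrase, the pair scores only 0.05. This mirrors the intuition that short referring expressions signal topic continuity more strongly. A segmenter could then place boundaries where this similarity drops.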