Simple or Not Simple? A Readability Question

Title: Simple or Not Simple? A Readability Question
Publication Type: Book Chapter
Year of Publication: 2015
Authors: Štajner S, Mitkov R, Corpas Pastor G
Editor: Gala N, Rapp R, Bel-Enguix G
Book Title: Language Production, Cognition, and the Lexicon
Series Title: Text, Speech and Language Technology
Publisher: Springer International Publishing
ISBN Number: 978-3-319-08042-0

Text Simplification (TS) has taken off as an important Natural Language Processing (NLP) application which promises to offer a significant societal impact in that it can be employed to the benefit of users with limited language comprehension skills such as children, foreigners who do not have a good command of a language, and readers struggling with a language disability. With the recent emergence of various TS systems, the question we are faced with is how to automatically evaluate their performance given that access to target users might be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae could be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted in a way that allows them to be computed automatically and then applied to two corpora of original and manually simplified texts. The first corpus has been compiled as part of the Simplext project targeting people with Down syndrome, and the second corpus as part of the FIRST project, where the users are people with autism spectrum disorder. The experiments show that there is a significant correlation between each of the readability indices and eighteen linguistically motivated features which might be seen as reading obstacles for various target populations, thus indicating the possibility of using those indices as a measure of the degree of simplification achieved by TS systems. Various ways in which they can be used in TS are further illustrated by comparing their values when applied to four different corpora.