Simple or Not Simple? A Readability Question

Title	Simple or Not Simple? A Readability Question
Publication Type	Book Chapter
Year of Publication	2015
Authors	Štajner S, Mitkov R, Corpas Pastor G
Editor	Gala N, Rapp R, Bel-Enguix G
Book Title	Language Production, Cognition, and the Lexicon
Series Title	Text, Speech and Language Technology
Volume	48
Pagination	379-398
Publisher	Springer International Publishing
ISBN Number	978-3-319-08042-0
Abstract	Text Simplification (TS) has taken off as an important Natural Language Processing (NLP) application which promises to offer a significant societal impact in that it can be employed to the benefit of users with limited language comprehension skills such as children, foreigners who do not have a good command of a language, and readers struggling with a language disability. With the recent emergence of various TS systems, the question we are faced with is how to automatically evaluate their performance given that access to target users might be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae could be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted in a way that allows them to be computed automatically and then applied to two corpora of original and manually simplified texts. The first corpus has been compiled as part of the Simplext project targeting people with Down syndrome, and the second corpus as part of the FIRST project, where the users are people with autism spectrum disorder. The experiments show that there is a significant correlation between each of the readability indices and eighteen linguistically motivated features which might be seen as reading obstacles for various target populations, thus indicating the possibility of using those indices as a measure of the degree of simplification achieved by TS systems. Various ways they can be used in TS are further illustrated by comparing their values when applied to four different corpora.
DOI	10.1007/978-3-319-08043-7_22
