|Title||What Can Readability Measures Really Tell Us About Text Complexity?|
|Publication Type||Conference Paper|
|Year of Publication||2012|
|Authors||Štajner, S, Evans, R, Orăsan, C, Mitkov, R|
|Conference Name||Natural Language Processing for Improving Textual Accessibility (NLP4ITA)|
|Publisher||European Language Resources Association (ELRA)|
|Conference Location||Istanbul, Turkey|
This study presents the results of the initial phase of a project that seeks to use text simplification technologies to convert texts into a more accessible form for people with autism spectrum disorders. Random samples of Simple Wikipedia articles are compared with texts from the News, Health, and Fiction genres using four standard readability indices (Kincaid, Flesch, Fog and SMOG) and sixteen linguistically motivated features. Comparing the readability indices across the four genres indicated that Fiction was relatively easy to read, whereas News was relatively difficult. The correlation between the four readability indices was also measured, revealing that they are almost perfectly linearly correlated and that this correlation does not depend on genre. Finally, the correlation between each of the sixteen linguistic features and the readability indices was measured. These experiments indicate that some of the linguistic features correlate well with the readability measures and that these correlations are genre dependent. The highest correlation was observed for Fiction.
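To make the four indices and the correlation analysis concrete, here is a minimal Python sketch. It assumes the standard published formulas (Flesch-Kincaid Grade Level, Flesch Reading Ease, Gunning Fog, SMOG) and Pearson's coefficient as the measure of linear correlation. The abstract does not specify the authors' tokenizer or syllable counter, so the sentence splitting and the vowel-group helper `count_syllables` are hypothetical heuristics, not the paper's tooling.

```python
import math
import re


# Hypothetical syllable heuristic (an assumption, not the paper's method):
# count runs of vowels and discount a silent final "e" (but not "-le").
def count_syllables(word: str) -> int:
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> dict:
    # Naive segmentation: sentences end in ., ! or ?; words are alphabetic runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": sum(syllables),
        # Words of 3+ syllables: "complex words" (Fog) / "polysyllables" (SMOG).
        "polysyllables": sum(1 for s in syllables if s >= 3),
    }


def readability(text: str) -> dict:
    c = text_counts(text)
    wps = c["words"] / c["sentences"]  # average sentence length in words
    spw = c["syllables"] / c["words"]  # average syllables per word
    return {
        # Grade-level style: higher means harder.
        "kincaid": 0.39 * wps + 11.8 * spw - 15.59,
        # Reading-ease style: higher means easier.
        "flesch": 206.835 - 1.015 * wps - 84.6 * spw,
        "fog": 0.4 * (wps + 100 * c["polysyllables"] / c["words"]),
        "smog": 1.0430 * math.sqrt(c["polysyllables"] * 30 / c["sentences"]) + 3.1291,
    }


def pearson(xs, ys) -> float:
    # Pearson product-moment correlation of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    samples = [
        "The cat sat. The dog ran.",
        "Readability formulas approximate difficulty using surface statistics.",
        "Simple words help. Long, complicated, multisyllabic terminology "
        "impedes comprehension considerably.",
    ]
    scores = [readability(t) for t in samples]
    kincaid = [s["kincaid"] for s in scores]
    flesch = [s["flesch"] for s in scores]
    print(f"Pearson(Kincaid, Flesch) = {pearson(kincaid, flesch):.3f}")
```

Flesch Reading Ease rises as text gets easier, while the other three indices rise with difficulty. A strong linear agreement among the indices therefore appears as a coefficient near -1 between Flesch and the others and near +1 among Kincaid, Fog and SMOG.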