Diachronic Changes in Text Complexity in 20th Century English Language: An NLP Approach

Title	Diachronic Changes in Text Complexity in 20th Century English Language: An NLP Approach
Publication Type	Conference Paper
Year of Publication	2012
Authors	Štajner S, Mitkov R
Conference Name	The Eighth International Conference on Language Resources and Evaluation (LREC)
Publisher	European Language Resources Association (ELRA)
Conference Location	Istanbul, Turkey
Abstract	A syntactically complex text may pose a problem both for human comprehension and for various NLP tasks. A large number of studies in text simplification address this problem, aiming to transform a given text into a simplified form that is accessible to a wider audience. In this study, we investigated the natural tendency of texts in 20th century English. Are they becoming syntactically more complex over the years, requiring a higher literacy level and greater effort from readers, or are they becoming simpler and easier to read? We examined several factors of text complexity (average sentence length, Automated Readability Index, sentence complexity and passive voice) in the 20th century for the two main English language varieties, British and American, using the ‘Brown family’ of corpora. In British English, we compared the complexity of texts published in 1931, 1961 and 1991, while in American English we compared the complexity of texts published in 1961 and 1992. Furthermore, we demonstrated how state-of-the-art NLP tools can be used for the automatic extraction of some complex features from the raw text version of the corpora.
URL	http://www.lrec-conf.org/proceedings/lrec2012/pdf/355_Paper.pdf
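
Two of the complexity factors named in the abstract, average sentence length and the Automated Readability Index (ARI), can be illustrated with a minimal sketch. This is not the authors' pipeline, which the record above does not describe. It assumes a naive regex-based sentence splitter and word tokenizer, and the function names are hypothetical. The ARI formula used is the standard one: 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (an assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize_words(sentence):
    """Naive word tokenizer (an assumption): runs of letters/digits, optional apostrophe part."""
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", sentence)


def average_sentence_length(text):
    """Mean number of words per sentence; 0.0 for empty input."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    word_count = sum(len(tokenize_words(s)) for s in sentences)
    return word_count / len(sentences)


def automated_readability_index(text):
    """Standard ARI: 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    sentences = split_sentences(text)
    words = [w for s in sentences for w in tokenize_words(s)]
    if not sentences or not words:
        return 0.0
    # Count only letters and digits as characters, ignoring apostrophes.
    char_count = sum(1 for w in words for ch in w if ch.isalnum())
    return 4.71 * (char_count / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43


if __name__ == "__main__":
    sample = "The cat sat. The dog ran fast."
    print(average_sentence_length(sample))
    print(automated_readability_index(sample))
```

A study at corpus scale would likely replace the regex splitter and tokenizer with a proper NLP toolkit, since abbreviations and quotations break the naive rules. The other two factors, sentence complexity and passive voice, require syntactic analysis and are outside the scope of this sketch.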
