CUCWeb: a Catalan corpus built from the Web

Publication Type: Conference Paper
Year of Publication: 2006
Authors: Boleda G, Bott S, Meza R, Castillo C, Badia T, Lopez V
Conference Name: 2nd Web as Corpus Workshop, held in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006)
Publisher: Association for Computational Linguistics
Conference Location: Trento, Italy

This paper presents CUCWeb, a 166-million-word corpus of Catalan built by crawling the Web. The corpus has been annotated with NLP tools and made available to language users through a flexible web interface. The architecture developed is general enough to be reused to build corpora for other languages.