|Title||CUCWeb: a Catalan corpus built from the Web|
|Publication Type||Conference Paper|
|Year of Publication||2006|
|Authors||Boleda, G, Bott, S, Meza, R, Castillo, C, Badia, T, Lopez, V|
|Conference Name||2nd Web as Corpus Workshop held in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006)|
|Publisher||Association for Computational Linguistics|
|Conference Location||Trento, Italy|
This paper presents CUCWeb, a 166-million-word corpus for Catalan built by crawling the Web. The corpus has been annotated with NLP tools and made available to language users through a flexible web interface. The architecture developed is general enough to be used to create corpora for other languages.