Title | CUCWeb: a Catalan corpus built from the Web |
Publication Type | Conference Paper |
Year of Publication | 2006 |
Authors | Boleda G, Bott S, Meza R, Castillo C, Badia T, Lopez V |
Conference Name | 2nd Web as Corpus Workshop held in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006) |
Publisher | Association for Computational Linguistics |
Conference Location | Trento, Italy |
Abstract | This paper presents CUCWeb, a 166 million word corpus for Catalan built by crawling the Web. The corpus has been annotated with NLP tools and made available to language users through a flexible web interface. The developed architecture is quite general, so that it can be used to create corpora for other languages. |
URL | http://aclweb.org/anthology/W/W06/W06-1704.pdf |