|Title||Annotated web as corpus|
|Publication Type||Conference Paper|
|Year of Publication||2006|
|Authors||Rayson, P, Walkerdine, J, Fletcher, WH, Kilgarriff, A|
|Conference Name||2nd Web as Corpus Workshop held in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006)|
|Conference Location||Trento, Italy|
This paper presents a proposal to facilitate the use of the annotated web as corpus by alleviating the annotation bottleneck for corpus data drawn from the web. To meet this need, we describe a framework for large-scale distributed corpus annotation using peer-to-peer (P2P) technology. We also propose to annotate a large reference corpus in order to evaluate this framework. The evaluation will allow us to investigate the affordances that distributed techniques offer for ensuring the replicability of linguistic research based on web-derived corpora.