Towards chunk-based translation memories

Publication Type: Journal Article
Year of Publication: 2008
Authors: Colominas, C

Most current Translation Memory systems are based on segments delimited by marks that in most cases correspond to a complete sentence. The problem with complete-sentence matching is that examples are often excluded from the matching candidates even though they probably contain one or more useful sub-segments that could be helpful for the translation. In view of these limitations, several proposals have been made in the literature for building Translation Memory systems that operate “below” the sentence level, that is, at a sub-sentential level. Existing work shows that sub-sentential segmentation of Translation Memories yields significantly better recall than sentential segmentation. Accepting these benefits, in this paper we consider different possibilities for sub-sentential segmentation and attempt an evaluation of the recall (covering sequences of chunks) and the precision (usability of these chunks) obtained by noun phrase (NP) chunk segmentation. Our experiments show that pre- or postmodified NPs turn out to be especially well suited to pretranslation tasks, as they offer maximum gain at minimum cost. In other words, their translation is, on the one hand, not trivial, as it often involves structural divergences between languages; on the other hand, they are context-independent enough to be reused without changes in most cases.
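The contrast between sentence-level and chunk-level lookup can be illustrated with a minimal Python sketch. It is not the system evaluated in the paper: the function names, the toy English-Spanish entries, the hand-made chunk segmentation, and the simple coverage ratio used as a stand-in for recall are all assumptions made for illustration.

```python
# Hypothetical toy memories; the entries are invented for illustration only.
sentence_tm = {
    "The annual report of the committee was approved.":
        "El informe anual del comité fue aprobado.",
}
chunk_tm = {
    "the annual report of the committee": "el informe anual del comité",
}


def sentence_match(tm, sentence):
    """Exact sentence-level lookup: the stored translation, or None."""
    return tm.get(sentence)


def chunk_matches(tm, chunks):
    """Return the chunks of the new sentence that are found in the memory."""
    return {chunk: tm[chunk] for chunk in chunks if chunk in tm}


def chunk_coverage(tm, chunks):
    """Assumed stand-in for recall: share of chunks covered by the memory."""
    if not chunks:
        return 0.0
    return sum(1 for chunk in chunks if chunk in tm) / len(chunks)


# A new sentence that differs from the stored one only in its verb.
new_sentence = "The annual report of the committee was rejected."
# Chunks are segmented by hand here; a real system would use an NP chunker.
new_chunks = ["the annual report of the committee", "was rejected"]

sentence_hit = sentence_match(sentence_tm, new_sentence)
chunk_hits = chunk_matches(chunk_tm, new_chunks)
coverage = chunk_coverage(chunk_tm, new_chunks)
```

In this toy case the sentence-level lookup finds nothing, while the chunk-level lookup still retrieves the postmodified NP, which is the kind of reusable unit the abstract singles out.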