Unsupervised segmentation of words using prior distributions of morph length and frequency

Title: Unsupervised segmentation of words using prior distributions of morph length and frequency
Publication Type: Conference Paper
Year of Publication: 2003
Authors: Creutz, M
Conference Name: ACL’03
Abstract

We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes use of relevant prior information on the length and frequency distributions of morphs in a language. Our algorithm is shown to outperform two competing algorithms, when evaluated on data from a language with agglutinative morphology (Finnish), and to perform well also on English data.

URL: http://lib.tkk.fi/Diss/2006/isbn9512282119/article2.pdf