Title | Unsupervised segmentation of words using prior distributions of morph length and frequency |
Publication Type | Conference Paper |
Year of Publication | 2003 |
Authors | Creutz M |
Conference Name | ACL’03 |
Abstract | We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes use of relevant prior information on the length and frequency distributions of morphs in a language. Our algorithm is shown to outperform two competing algorithms, when evaluated on data from a language with agglutinative morphology (Finnish), and to perform well also on English data. |
URL | http://lib.tkk.fi/Diss/2006/isbn9512282119/article2.pdf |