Abstract  This article discusses a language-independent framework for syntactic finite-state parsing. It presents the framework, a grammar formalism, a rule compiler, and a parser for grammars written in this formalism. As a substantial example, fragments from a non-trivial finite-state grammar of English are discussed.
The linguistic framework of the present approach is based on a surface-syntactic tagging scheme by F. Karlsson. This representation is slightly less powerful than phrase structure tree notation, which lets some ambiguous constructions be described more concisely. The finite-state rule compiler implements what was briefly sketched by Koskenniemi (1990). It is based on the calculus of finite-state machines and transforms rules into rule automata. At run time, the parser uses one of several alternative strategies to perform the effective intersection of the rule automata and the sentence automaton.
Fragments of a fairly comprehensive finite-state grammar of English are presented here, including samples from non-finite constructions, as a demonstration of the capacity of the present formalism, which goes far beyond plain disambiguation or part-of-speech tagging. The grammar itself is directly related to a parser and tagging system for English created as part of the SIMPR project using Karlsson's CG (Constraint Grammar) formalism.
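
The idea of intersecting rule automata with a sentence automaton can be illustrated with a small sketch. This is not the compiler or parser described in the article. It is a minimal Python illustration under several assumptions: the sentence automaton is simplified to a list of alternative tags per word, each rule is a hand-written deterministic automaton rather than one compiled from a rule, and the tag names (DET, N, V), the two rules, and the names RuleAutomaton and intersect are hypothetical. A missing transition means the rule rejects the reading.

```python
from typing import Dict, Hashable, List, Set, Tuple


class RuleAutomaton:
    """A deterministic automaton over tags; a missing transition rejects."""

    def __init__(self, start: Hashable, accepting: Set[Hashable],
                 delta: Dict[Tuple[Hashable, str], Hashable]) -> None:
        self.start = start
        self.accepting = accepting
        self.delta = delta

    def step(self, state: Hashable, tag: str):
        # Returns the next state, or None if the rule forbids this tag here.
        return self.delta.get((state, tag))


def intersect(sentence: List[Set[str]],
              rules: List[RuleAutomaton]) -> List[Tuple[str, ...]]:
    """Return the readings of the sentence accepted by every rule automaton.

    All rule automata are advanced in lockstep (a product construction),
    and partial readings that reach the same combined state are grouped.
    """
    frontier = {tuple(r.start for r in rules): [()]}
    for alternatives in sentence:
        next_frontier: Dict[tuple, list] = {}
        for states, paths in frontier.items():
            for tag in sorted(alternatives):
                new_states = tuple(r.step(s, tag) for r, s in zip(rules, states))
                if any(s is None for s in new_states):
                    continue  # at least one rule rejects this continuation
                next_frontier.setdefault(new_states, []).extend(
                    p + (tag,) for p in paths)
        frontier = next_frontier
    readings = []
    for states, paths in frontier.items():
        if all(s in r.accepting for r, s in zip(rules, states)):
            readings.extend(paths)
    return sorted(readings)


# Hypothetical rule 1: a determiner is never immediately followed by a verb.
no_det_verb = RuleAutomaton(
    start="ok",
    accepting={"ok", "after_det"},
    delta={("ok", "DET"): "after_det", ("ok", "N"): "ok", ("ok", "V"): "ok",
           ("after_det", "DET"): "after_det", ("after_det", "N"): "ok"},
)

# Hypothetical rule 2: the sentence contains exactly one verb.
one_verb = RuleAutomaton(
    start=0,
    accepting={1},
    delta={(0, "DET"): 0, (0, "N"): 0, (0, "V"): 1,
           (1, "DET"): 1, (1, "N"): 1},
)

# "the run ends": each word lists its alternative (ambiguous) tags.
sentence = [{"DET"}, {"N", "V"}, {"N", "V"}]
rules = [no_det_verb, one_verb]
print(intersect(sentence, rules))
```

In this toy setting, of the four possible readings only DET N V satisfies both rules. A real sentence automaton would encode full morphological and syntactic readings, and the choice of intersection order is exactly where alternative parsing strategies could differ.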
