Abstract

fst stands for Finite-State Toolkit. It is an enhanced version of the xfst tool described in the 2003 Beesley and Karttunen book Finite State Morphology. Like xfst, fst serves two purposes: it is a development tool for compiling finite-state networks and a runtime tool for applying networks to input strings or files. Whereas xfst is limited to morphological analysis and generation, fst can also be used for other applications. This paper describes the new features of the fst regular expression formalism and illustrates their use for named-entity recognition, relation extraction, tokenization, and parsing. The fst pattern matching algorithm (pmatch) operates on a single pattern network, but that network can be the union of any number of distinct pattern definitions, so many patterns can be matched simultaneously in one pass over a text. This is a distinct advantage of fst over the pattern matching facilities in languages such as Perl and Python.

Keywords

finite-state automata, tokenization, pattern matching

References

  1. Beesley, K.R., Karttunen, L.: Finite State Morphology. CSLI Publications, Palo Alto (2003)
  2. Karttunen, L.: Pattern Matching with FST – A Tutorial. Technical Report TR-2010-01. Palo Alto Research Center, Palo Alto, CA (2010)
  3. Woods, W.A.: Transition Network Grammars of Natural Language Analysis. Comm. ACM 13(10), 591–606 (1970)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Lauri Karttunen (1)
  1. Stanford University, Palo Alto, USA
