Beyond Morphology: Pattern Matching with FST

  • Lauri Karttunen
Conference paper

DOI: 10.1007/978-3-642-23138-4_1

Part of the Communications in Computer and Information Science book series (CCIS, volume 100)
Cite this paper as:
Karttunen L. (2011) Beyond Morphology: Pattern Matching with FST. In: Mahlow C., Piotrowski M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2011. Communications in Computer and Information Science, vol 100. Springer, Berlin, Heidelberg

Abstract

fst stands for Finite-State Toolkit. It is an enhanced version of the xfst tool described in the 2003 Beesley and Karttunen book Finite State Morphology. Like xfst, fst serves two purposes. It is a development tool for compiling finite-state networks and a runtime tool that applies networks to input strings or files. xfst is limited to morphological analysis and generation. fst can also be used for other applications. This paper describes the new features of the fst regular expression formalism and illustrates their use for named-entity recognition, relation extraction, tokenization and parsing. The fst pattern matching algorithm (pmatch) operates on a single pattern network but the network can be the union of any number of distinct pattern definitions. Many patterns can be matched simultaneously in one pass over a text. This is a distinct fst advantage over pattern matching facilities in languages such as Perl and Python.
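
The one-pass idea in the abstract can be illustrated with a minimal sketch. This is not fst's pmatch algorithm or its regular expression formalism: it assumes patterns are plain literal strings, uses a trie as a stand-in for the union of the pattern networks, and applies a leftmost-longest matching policy as a simplifying choice. The names `Node`, `build_union`, `match_all`, and the labels `PERSON` and `ORG` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


class Node:
    """One state of the union network (here simply a trie node)."""

    def __init__(self) -> None:
        self.next: Dict[str, "Node"] = {}
        self.labels: List[str] = []  # labels of patterns that end in this state


def build_union(patterns: Dict[str, List[str]]) -> Node:
    """Merge every labeled pattern definition into a single network."""
    root = Node()
    for label, phrases in patterns.items():
        for phrase in phrases:
            node = root
            for ch in phrase:
                node = node.next.setdefault(ch, Node())
            node.labels.append(label)
    return root


def match_all(root: Node, text: str) -> List[Tuple[int, int, str]]:
    """Scan the text left to right, taking the longest match at each position.

    All pattern definitions are tried at once because they share one network.
    """
    results: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(text):
        node = root
        best_end, best_labels = -1, []
        j = i
        while j < len(text) and text[j] in node.next:
            node = node.next[text[j]]
            j += 1
            if node.labels:  # remember the longest accepting position so far
                best_end, best_labels = j, node.labels
        if best_end > i:
            results.extend((i, best_end, label) for label in best_labels)
            i = best_end  # resume scanning after the match
        else:
            i += 1
    return results


if __name__ == "__main__":
    network = build_union({
        "PERSON": ["Lauri Karttunen"],
        "ORG": ["Stanford", "Stanford University"],
    })
    sentence = "Lauri Karttunen works at Stanford University."
    for start, end, label in match_all(network, sentence):
        print(label, sentence[start:end])
```

In this sketch, adding another pattern definition only adds paths to the shared network; the scanning loop itself does not change, which is the property the abstract attributes to matching against a union of pattern definitions.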

Keywords

finite-state automata, tokenization, pattern matching

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Lauri Karttunen (1)
  1. Stanford University, Palo Alto, USA
