Tiburon: A Weighted Tree Automata Toolkit

  • Jonathan May
  • Kevin Knight
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4094)


The availability of weighted finite-state string automata toolkits made possible great advances in natural language processing. However, the models arising from recent advances in syntax-based NLP are poorly suited to these toolkits. To address this problem, we introduce a weighted finite-state tree automata toolkit, which incorporates recent developments in weighted tree automata theory and is useful for natural language applications such as machine translation, sentence compression, and question answering, among others.


Natural Language Processing, Machine Translation, Statistical Machine Translation, Tree Automaton, Tree Language





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jonathan May (1)
  • Kevin Knight (1)
  1. Information Sciences Institute, University of Southern California, Marina Del Rey
