Journal of Logic, Language and Information, Volume 13, Issue 4, pp 439–455

Learning Local Transductions Is Hard

  • Martin Jansche
Original Article


Local deterministic string-to-string transductions arise in natural language processing (NLP) tasks such as letter-to-sound translation or pronunciation modeling. This class of transductions is a simple generalization of morphisms of free monoids; learning local transductions is essentially the same as inference of certain monoid morphisms. However, learning even a highly restricted class of morphisms, the so-called fine morphisms, leads to intractable problems: deciding whether a hypothesized fine morphism is consistent with observations is an NP-complete problem; and maximizing classification accuracy of the even smaller class of alphabetic substitution morphisms is APX-hard. These theoretical results provide some justification for using the kinds of heuristics that are commonly used for this learning task.
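The consistency problem mentioned above can be illustrated with a small sketch. It assumes the usual definition of a fine morphism as a map that sends each input letter to either a single output letter or the empty string, and it reads the decision problem as asking whether some such morphism reproduces every observed (input, output) pair. The function names `apply_morphism`, `is_consistent`, and `find_fine_morphism` are hypothetical, and the exhaustive search is only an illustration of the problem, not the article's method.

```python
from itertools import product


def apply_morphism(f, word):
    """Extend a letter-to-string map f to whole words: f(ab) = f(a) + f(b)."""
    return "".join(f[ch] for ch in word)


def is_consistent(f, observations):
    """Check that f maps every observed input string to its observed output."""
    return all(apply_morphism(f, x) == y for x, y in observations)


def find_fine_morphism(observations):
    """Brute-force search for a consistent fine morphism (assumed definition:
    each input letter maps to one output letter or to the empty string).

    The search space has (|Gamma| + 1) ** |Sigma| candidates, so this is
    exponential in the size of the input alphabet.
    """
    sigma = sorted({ch for x, _ in observations for ch in x})  # input letters
    gamma = sorted({ch for _, y in observations for ch in y})  # output letters
    images = gamma + [""]  # allowed images of a single letter
    for choice in product(images, repeat=len(sigma)):
        f = dict(zip(sigma, choice))
        if is_consistent(f, observations):
            return f
    return None  # no fine morphism fits the observations


if __name__ == "__main__":
    # Toy letter-to-sound style data (made up for illustration).
    data = [("knight", "nit"), ("night", "nit")]
    f = find_fine_morphism(data)
    print(f is not None and is_consistent(f, data))
```

Verifying one candidate map against the data takes time linear in the total length of the observations; it is the search over candidate maps that grows exponentially in this naive approach.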

Key words

Boolean satisfiability, combinatorial optimization, formal languages, letter-to-sound rules, machine learning, natural language processing, NP completeness, rational transductions





Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  1. Center for Computational Learning Systems, Columbia University, New York, U.S.A.
