Advertisement

On the Computation of Some Standard Distances Between Probabilistic Automata

  • Corinna Cortes
  • Mehryar Mohri
  • Ashish Rastogi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4094)

Abstract

The problem of the computation of a distance between two probabilistic automata arises in a variety of statistical learning problems. This paper presents an exhaustive analysis of the problem of computing the L p distance between two automata. We give efficient exact and approximate algorithms for computing these distances for p even and prove the problem to be NP-hard for all odd values of p, thereby completing previously known hardness results. We also give an efficient algorithm for computing the Hellinger distance between unambiguous probabilistic automata. Our results include a general algorithm for the computation of the norm of an unambiguous probabilistic automaton based on a monoid morphism and efficient algorithms for the specific case of the computation of the L p norm. Finally, we also describe an efficient algorithm for testing the equivalence of two arbitrary probabilistic automata A 1 and A 2 based on Schützenberger’s standardization with a running time complexity of O(|Σ| (|A 1| + |A 2|)3), a significant improvement over the previously best algorithm reported for this problem.

Keywords

Hellinger Distance Standard Distance Large Clique Probabilistic Automaton Intersection Automaton 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Berstel, J., Reutenauer, C.: Rational Series and Their Languages. Springer, Berlin (1988)MATHGoogle Scholar
  2. 2.
    Bloom, S., Ésik, Z.: Iteration Theories. Springer, Berlin (1991)Google Scholar
  3. 3.
    Cortes, C., Mohri, M., Rastogi, A., Riley, M.: Distances between Probabilistic Automata. Preparation journal version (2006)Google Scholar
  4. 4.
    Cortes, C., Mohri, M., Rastogi, A., Riley, M.D.: Efficient Computation of the Relative Entropy of Probabilistic Automata. In: Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 323–336. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  5. 5.
    Culik II, K., Kari, J.: Digital Images and Formal Languages. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 3, pp. 599–616. Springer, Heidelberg (1997)Google Scholar
  6. 6.
    Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.J.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)MATHCrossRefGoogle Scholar
  7. 7.
    Eilenberg, S.: Automata, Languages and Machines, vol. A–B. Academic Press, London (1974-1976)MATHGoogle Scholar
  8. 8.
    Eisner, J.: Expectation Semirings: Flexible EM for Finite-State Transducers. In: Proceedings of the ESSLLI Workshop on Finite-State Methods in NLP (2001)Google Scholar
  9. 9.
    Engebretsen, L., Holmerin, J.: Clique is hard to approximate within n 1 − o(1). In: Welzl, E., Montanari, U., Rolim, J.D.P. (eds.) ICALP 2000. LNCS, vol. 1853, pp. 2–12. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  10. 10.
    Kuich, W., Salomaa, A.: Semirings, Automata, Languages. In: EATCS Monographs on Theoretical Computer Science, vol. 5. Springer, Berlin (1986)Google Scholar
  11. 11.
    Mohri, M.: Finite-State Transducers in Language and Speech Processing. Computational Linguistics 23(2) (1997)Google Scholar
  12. 12.
    Mohri, M.: Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for Weighted Transducers. International Journal of Foundations of Computer Science 13(1), 129–143 (2002)MATHCrossRefMathSciNetGoogle Scholar
  13. 13.
    Mohri, M.: Semiring Frameworks and Algorithms for Shortest-Distance Problems. Journal of Automata, Languages and Combinatorics 7(3), 321–350 (2002)MATHMathSciNetGoogle Scholar
  14. 14.
    Paz, A.: Introduction to probabilistic automata. Academic Press, New York (1971)MATHGoogle Scholar
  15. 15.
    Lyngsø, R.B., Pederson, C.N.S.: The Consensus String Problem and the Complexity of Comparing Hidden Markov Models. Journal of Computer and System Sciences 65(3), 545–569 (2002)CrossRefMathSciNetGoogle Scholar
  16. 16.
    Salomaa, A., Soittola, M.: Automata-Theoretic Aspects of Formal Power Series. Springer, Heidelberg (1978)MATHGoogle Scholar
  17. 17.
    Schützenberger, M.-P.: On the definition of a family of automata. Information and Control 4 (1961)Google Scholar
  18. 18.
    Håstad, J.: Clique is hard to approximate within n 1 − ε. In: FOCS 1996: Proceedings of the 37th Annual Symposium on Foundations of Computer Science, Washington, DC, USA, p. 627. IEEE Computer Society, Los Alamitos (1996)Google Scholar
  19. 19.
    Tzeng, W.-G.: A Polynomial-Time Algorithm for the Equivalence of Probabilistic Automata. Foundations of Computer Science (FOCS), 216–227 (1992)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Corinna Cortes
    • 1
  • Mehryar Mohri
    • 1
    • 2
  • Ashish Rastogi
    • 2
  1. 1.Google ResearchNew YorkUSA
  2. 2.Courant Institute of Mathematical SciencesNew YorkUSA

Personalised recommendations