Machine Learning, Volume 96, Issue 1–2, pp 129–154

PAutomaC: a probabilistic automata and hidden Markov models learning competition

  • Sicco Verwer
  • Rémi Eyraud
  • Colin de la Higuera

Abstract

Approximating distributions over strings is a hard learning problem. Typical techniques model the target distribution with a finite state machine and attempt to learn it: the machine can either be built by hand, after which only its weights are estimated, or constructed by grammatical inference techniques, in which case the structure and the weights are learned simultaneously. The Probabilistic Automata learning Competition (PAutomaC), run in 2012, was the first grammatical inference challenge that allowed a comparison between these methods and algorithms. Its main goal was to provide an overview of the state-of-the-art techniques for this hard learning problem. Both artificial and real data were presented, and contestants had to estimate the probabilities of strings. The purpose of this paper is to describe some of the technical and intrinsic challenges such a competition has to face, to survey the state of the art in learning grammars and finite state machines along with the relevant literature, and to report the results of the competition together with a brief description and analysis of the approaches used by the main participants.
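To make the learning target concrete: a probabilistic finite automaton assigns each string a probability by multiplying transition probabilities along the path the string takes, times a stopping probability in the final state reached. The sketch below is a minimal illustration of this computation on a hypothetical two-state deterministic PFA (the states, symbols, and probabilities are invented for the example, not taken from the competition).

```python
# Hypothetical 2-state deterministic PFA over the alphabet {a, b}.
# initial[q]: probability of starting in state q
# trans[q][sym] = (next_state, prob): probability of reading sym in state q
# final[q]: probability of stopping in state q
# (outgoing probabilities plus the stopping probability sum to 1 per state)
initial = {0: 1.0}
trans = {
    0: {"a": (0, 0.5), "b": (1, 0.3)},
    1: {"a": (0, 0.2), "b": (1, 0.6)},
}
final = {0: 0.2, 1: 0.2}

def string_probability(string):
    """Probability the PFA above assigns to `string`."""
    # For a deterministic PFA each prefix reaches a single state,
    # so we just multiply probabilities along that unique path.
    total = 0.0
    for q0, p0 in initial.items():
        q, p = q0, p0
        for sym in string:
            step = trans[q].get(sym)
            if step is None:        # no transition: probability 0
                p = 0.0
                break
            q, prob = step
            p *= prob
        else:
            p *= final[q]           # stopping probability in the last state
        total += p
    return total
```

For instance, `string_probability("ab")` multiplies 1.0 (start in state 0), 0.5 (read `a`), 0.3 (read `b`), and 0.2 (stop in state 1), giving 0.03. Contestants in PAutomaC had to produce such probability estimates for a test set of strings.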

Keywords

Grammatical inference · Probabilistic automata · Hidden Markov models · Programming competition

Acknowledgements

We are very thankful to the members of the scientific committee for their help in designing this competition. We also want to thank all participants, and in particular Raphael Bailly, Cleo Billa, Mans Hulden, Fabio Kepler, David Llorens, Sergio Mergen, Chihiro Shibata, and Ryo Yoshinaka, for their help during the writing of this paper.

Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Sicco Verwer (1)
  • Rémi Eyraud (2)
  • Colin de la Higuera (3)

  1. Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
  2. QARMA team, Laboratoire d’Informatique Fondamentale de Marseille, Marseille, France
  3. TALN team, Laboratoire d’Informatique de Nantes Atlantique, Nantes University, Nantes Cedex 1, France
