Automating Formalization by Statistical and Semantic Parsing of Mathematics

  • Cezary Kaliszyk
  • Josef Urban
  • Jiří Vyskočil
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10499)


We discuss the progress in our project, which aims to automate formalization by combining natural language processing with deep semantic understanding of mathematical expressions. We introduce the overall motivation and ideas behind this project, and then propose a context-based parsing approach that combines efficient statistical learning of deep parse trees with their semantic pruning by type checking and large-theory automated theorem proving. We show that our learning method allows efficient use of a large amount of contextual information, which in turn significantly boosts the precision of the statistical parsing and also makes it more efficient. This leads to a large improvement over our first results in parsing theorems from the Flyspeck corpus.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Cezary Kaliszyk (1)
  • Josef Urban (2)
  • Jiří Vyskočil (2)
  1. University of Innsbruck, Innsbruck, Austria
  2. Czech Technical University in Prague, Prague, Czech Republic
