Learning to Parse on Aligned Corpora (Rough Diamond)

  • Cezary Kaliszyk
  • Josef Urban
  • Jiří Vyskočil
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9236)


One of the first big hurdles that mathematicians encounter when considering writing formal proofs is the need to become familiar with the formal terminology and the parsing mechanisms used in the large ITP libraries. This includes the large number of formal symbols, the grammars of the formal languages, and the advanced mechanisms that let proof assistants interpret formal expressions correctly in the presence of ubiquitous overloading.

In this work we start to address this problem by developing approximate probabilistic parsing techniques that learn disambiguation automatically from large corpora. Unlike in standard natural language processing, we can filter the resulting parse trees with strong semantic methods from interactive theorem proving (ITP) and automated reasoning (AR), such as typechecking and automated theorem proving, and can even let the probabilistic methods self-improve based on this semantic feedback. We describe the general motivation and our first experiments, and build an online system for parsing ambiguous formulas over the Flyspeck library.
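The abstract does not spell out the parsing algorithm, so the following is only a minimal Python sketch of the general idea, under stated assumptions: a probabilistic context-free grammar (PCFG) in Chomsky normal form whose start-symbol and rule probabilities are estimated by counting over a treebank of already disambiguated parse trees, a k-best probabilistic CYK parser, and a toy check that stands in for ITP typechecking. The grammar, the nonterminal names (`Int`, `Real`, `IntArg`, ...), the typing context and every function name are hypothetical illustrations, not the authors' Flyspeck system.

```python
import heapq
from collections import Counter, defaultdict
from itertools import product

# Hypothetical tree encoding: (label, child, ...) where a child is a subtree or a
# terminal string. Trees are assumed to be in Chomsky normal form (CNF): a node has
# either exactly one terminal child or exactly two subtree children.

def rules_of(tree):
    """Yield (lhs, rhs) for every grammar rule used in a CNF tree."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        yield label, (kids[0],)
    else:
        yield label, tuple(k[0] for k in kids)
        for k in kids:
            yield from rules_of(k)

def estimate_pcfg(treebank):
    """Relative-frequency estimates: P(root = A) and P(A -> rhs) = count(A -> rhs) / count(A)."""
    roots = Counter(t[0] for t in treebank)
    start = {a: c / len(treebank) for a, c in roots.items()}
    rule_counts = Counter(r for t in treebank for r in rules_of(t))
    lhs_counts = Counter()
    for (lhs, _), c in rule_counts.items():
        lhs_counts[lhs] += c
    rules = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
    return start, rules

def kbest_cyk(tokens, rules, k=5):
    """Probabilistic CYK keeping the k most probable trees per (span, nonterminal)."""
    n = len(tokens)
    lexical = defaultdict(list)  # terminal -> [(lhs, prob)]
    binary = defaultdict(list)   # (B, C) -> [(lhs, prob)]
    for (lhs, rhs), p in rules.items():
        (lexical[rhs[0]] if len(rhs) == 1 else binary[rhs]).append((lhs, p))
    chart = defaultdict(lambda: defaultdict(list))  # (i, j) -> A -> [(prob, tree)]
    for i, tok in enumerate(tokens):
        for lhs, p in lexical[tok]:
            chart[i, i + 1][lhs].append((p, (lhs, tok)))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = defaultdict(list)
            for m in range(i + 1, j):  # split point between the two children
                left, right = chart[i, m], chart[m, j]
                for b, c in product(list(left), list(right)):
                    for lhs, p in binary.get((b, c), []):
                        for (pb, tb), (pc, tc) in product(left[b], right[c]):
                            cell[lhs].append((p * pb * pc, (lhs, tb, tc)))
            for lhs, cands in cell.items():
                chart[i, j][lhs] = heapq.nlargest(k, cands, key=lambda x: x[0])
    return chart[0, n]

def well_typed(tree, ctx):
    """Toy stand-in for a type checker: every variable leaf must carry its declared type."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        return kids[0] not in ctx or ctx[kids[0]] == label
    return all(well_typed(kid, ctx) for kid in kids)

def parse_and_filter(tokens, start, rules, accept, k=5):
    """Rank whole-input parses by probability and return the best one that `accept` admits."""
    root = kbest_cyk(tokens, rules, k)
    ranked = sorted(((ps * p, t) for s, ps in start.items() for p, t in root.get(s, [])),
                    key=lambda x: x[0], reverse=True)
    return next(((p, t) for p, t in ranked if accept(t)), None)

if __name__ == "__main__":
    # Hypothetical toy treebank in which "+" is overloaded between Int and Real addition.
    int_sum = ("Int", ("Int", "x"), ("IntArg", ("PlusI", "+"), ("Int", "y")))
    real_sum = ("Real", ("Real", "x"), ("RealArg", ("PlusR", "+"), ("Real", "y")))
    treebank = [int_sum, int_sum, int_sum, real_sum]
    start, rules = estimate_pcfg(treebank)
    ctx = {"x": "Real", "y": "Real"}  # hypothetical typing context
    best = parse_and_filter(["x", "+", "y"], start, rules, lambda t: well_typed(t, ctx))
    print(best)  # the Int reading ranks higher, but only the Real reading typechecks
    treebank.append(best[1])  # crude semantic feedback: retrain on the accepted parse
    start, rules = estimate_pcfg(treebank)
    print(start)
```

Keeping only the k most probable trees per (span, nonterminal) cell still recovers the k most probable complete derivations, because each of them is built from sub-derivations that are themselves among the top k of their cells. Appending the accepted parse to the treebank and re-estimating is just one simple way to realize the semantic self-improvement loop mentioned above; the paper's own training and feedback procedure may differ.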



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Cezary Kaliszyk, University of Innsbruck, Innsbruck, Austria
  • Josef Urban, Radboud University Nijmegen, Nijmegen, The Netherlands
  • Jiří Vyskočil, Czech Technical University, Prague, Czech Republic
