Learning to Parse on Aligned Corpora (Rough Diamond)

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9236)

Abstract

One of the first major hurdles that mathematicians face when they consider writing formal proofs is the need to become acquainted with the formal terminology and the parsing mechanisms used in the large interactive theorem proving (ITP) libraries. This includes the large number of formal symbols, the grammar of the formal languages, and the advanced mechanisms that proof assistants rely on to interpret formal expressions correctly in the presence of ubiquitous overloading.

In this work we start to address this problem by developing approximate probabilistic parsing techniques that autonomously train disambiguation on large corpora. Unlike in standard natural language processing, we can filter the resulting parse trees with strong semantic methods from ITP and automated reasoning (AR), such as typechecking and automated theorem proving, and even let the probabilistic methods self-improve based on such semantic feedback. We describe the general motivation and our first experiments, and build an online system for parsing ambiguous formulas over the Flyspeck library.
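The idea of ranking parses probabilistically and then filtering them semantically can be illustrated with a minimal sketch. It is not the system described in the paper: the grammar, the rule probabilities, the symbol names (real_add, vector_add) and the toy typechecker below are all assumptions made up for illustration. A small CYK-style chart parser enumerates every parse of an ambiguous formula with its probability, and a type check then discards the ill-typed readings.

```python
from collections import defaultdict

# Hypothetical toy PCFG; all symbols and probabilities are invented.
# Lexical rules: surface token -> [(nonterminal, formal symbol, probability)].
LEXICON = {
    "x": [("Term", "x:real", 0.3)],
    "y": [("Term", "y:real", 0.3)],
    # "+" is overloaded: the made-up training counts prefer vector addition.
    "+": [("Op", "vector_add", 0.6), ("Op", "real_add", 0.4)],
}
# Binary rules in Chomsky normal form: (left, right) -> [(parent, probability)].
RULES = {
    ("Term", "OpArg"): [("Term", 0.4)],
    ("Op", "Term"): [("OpArg", 1.0)],
}
# Hypothetical signatures: operator -> (left type, right type, result type).
SIGNATURES = {
    "real_add": ("real", "real", "real"),
    "vector_add": ("vec", "vec", "vec"),
}


def parse(tokens):
    """Return every parse of `tokens` as (probability, tree), best first."""
    n = len(tokens)
    chart = defaultdict(list)  # (start, end, nonterminal) -> [(prob, tree)]
    for i, tok in enumerate(tokens):
        for nonterminal, symbol, p in LEXICON.get(tok, []):
            chart[i, i + 1, nonterminal].append((p, symbol))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):  # split point
                for (left, right), parents in RULES.items():
                    for pl, tl in chart.get((i, j, left), []):
                        for pr, tr in chart.get((j, k, right), []):
                            for parent, p in parents:
                                chart[i, k, parent].append((p * pl * pr, (tl, tr)))
    return sorted(chart.get((0, n, "Term"), []), key=lambda entry: -entry[0])


def type_of(tree):
    """Toy typechecker: return the type of a Term tree, or None if ill-typed."""
    if isinstance(tree, str):  # leaf such as "x:real"
        return tree.split(":")[1]
    left, (op, right) = tree  # Term -> Term OpArg, OpArg -> Op Term
    left_type, right_type, result = SIGNATURES[op]
    if type_of(left) == left_type and type_of(right) == right_type:
        return result
    return None


def parse_and_filter(tokens):
    """Keep only the parses that pass the semantic (type) check."""
    return [(p, tree) for p, tree in parse(tokens) if type_of(tree) is not None]


if __name__ == "__main__":
    for p, tree in parse("x + y".split()):
        print("candidate:", tree, "well-typed:", type_of(tree) is not None)
    print("kept:", [tree for _, tree in parse_and_filter("x + y".split())])
```

For the input x + y with both variables declared real, the vector_add reading is ranked first by the invented probabilities but is rejected by the type check, so only the real_add reading survives. In this sketch the surviving parses could in principle be fed back to re-estimate the rule probabilities, which is one simple way to picture the semantic feedback mentioned above.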

C. Kaliszyk—Supported by the Austrian Science Fund (FWF): P26201.

J. Urban—Supported by NWO grant nr. 612.001.208.

Notes

  1. Approximate results of an opinion poll run by the second author since 2000.

  2. http://colo12-c703.uibk.ac.at/hh/parse.html.

  3. More precisely, the theorems containing the substrings sin, cos and tan.

  4. http://nlp.stanford.edu/software/lex-parser.shtml.

  5. http://colo12-c703.uibk.ac.at/hh/parse.html.

  6. http://colo12-c703.uibk.ac.at/hh/parseimg.html.

  7. The exact list is at http://mizar.cs.ualberta.ca/~mptp/i2f/00proved2.

Author information

Corresponding author

Correspondence to Josef Urban.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kaliszyk, C., Urban, J., Vyskočil, J. (2015). Learning to Parse on Aligned Corpora (Rough Diamond). In: Urban, C., Zhang, X. (eds) Interactive Theorem Proving. ITP 2015. Lecture Notes in Computer Science, vol 9236. Springer, Cham. https://doi.org/10.1007/978-3-319-22102-1_15

  • DOI: https://doi.org/10.1007/978-3-319-22102-1_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22101-4

  • Online ISBN: 978-3-319-22102-1

  • eBook Packages: Computer Science, Computer Science (R0)
