Race Against the Teens – Benchmarking Mechanized Math on Pre-university Problems

  • Takuya Matsuzaki
  • Hidenao Iwane
  • Munehiro Kobayashi
  • Yiyang Zhan
  • Ryoya Fukasaku
  • Jumma Kudo
  • Hirokazu Anai
  • Noriko H. Arai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9706)


This paper introduces a benchmark problem library for mechanized math technologies, including computer algebra and automated theorem proving. The library consists of pre-university math problems taken from exercise books, university entrance exams, and the International Mathematical Olympiad. It thus covers various areas of pre-university math at a range of difficulty levels. Unlike existing benchmark libraries, this one contains problems formalized in such a way that they are obtainable as the result of mechanical translation of the original problems expressed in natural language. In other words, the library is designed to support the integration of mechanized math and natural language processing technologies toward the goal of end-to-end automatic math problem solving. The paper also presents preliminary experimental results obtained on the library with our prototype reasoning component of an end-to-end system. The library is publicly available on the Internet.
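
As an illustration of the kind of formalization described above, consider a hypothetical problem (not taken from the library, whose actual syntax may differ): "Find the range of the real number a such that x^2 + ax + 1 > 0 holds for every real x." A direct translation of the sentence yields a first-order formula over the reals, and quantifier elimination reduces it to a condition on a alone:

\[
\forall x \in \mathbb{R}\,\bigl(x^2 + a x + 1 > 0\bigr)
\;\Longleftrightarrow\; a^2 - 4 < 0
\;\Longleftrightarrow\; -2 < a < 2 .
\]

The first equivalence holds because a quadratic with positive leading coefficient is positive everywhere exactly when its discriminant is negative.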


Computer Algebra System, Automatic Reasoning, Satisfiability Modulo Theory, Math Problem, Peano Arithmetic
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Takuya Matsuzaki (1, 2)
  • Hidenao Iwane (2, 3)
  • Munehiro Kobayashi (4)
  • Yiyang Zhan (5)
  • Ryoya Fukasaku (6)
  • Jumma Kudo (6)
  • Hirokazu Anai (3, 7)
  • Noriko H. Arai (2)

  1. Nagoya University, Nagoya, Japan
  2. National Institute of Informatics, Chiyoda, Japan
  3. Fujitsu Laboratories, Ltd., Kawasaki, Japan
  4. University of Tsukuba, Tsukuba, Japan
  5. Université Paris Diderot, Paris, France
  6. Tokyo University of Science, Shinjuku, Japan
  7. Kyushu University, Fukuoka, Japan
