How to Do Maths with Words: Neural Machine Learning Applications to Mathematics and Their Philosophical Significance

  • Living reference work entry
Handbook of the History and Philosophy of Mathematical Practice


Recent years have seen a remarkable development of deep neural network techniques for data analysis, along with their increasing application in scientific research across different disciplines. The field of mathematics has not been exempt from this general trend. The present chapter surveys recent applications of neural models to mathematics and assesses their philosophical significance, in particular with respect to the role of language in mathematics.

  1.

    Wu’s talk was part of a Workshop on Machine-Assisted Proofs held from February 13 to 17, 2023. A recording of the talk is available on the website of the Institute for Pure and Applied Mathematics (IPAM).

  2.

    This does not mean that every task is solvable by means of neural models, a belief held by a significant part of the AI community. Many problems, perhaps most of them, do not lend themselves to being framed as a predictive task based on a finite number of supposedly similar cases expressible as vectors. Problems in politics, law, art, and many other domains of social life cannot, by their nature, be framed in such terms without transforming, if not wholly obliterating, that very nature.

  3.

    The reader should keep in mind that, despite our efforts to cover as much of the field as possible, what follows offers only a partial view of the state of the art, biased toward works of philosophical interest. A more complete picture can be obtained by consulting the “Related Work” sections usually included in the papers we present. Lu et al. (2023) also provide a useful systematic survey of deep neural network (DNN) models for mathematical reasoning.

  4.

    That is, generating text one word at a time, starting from an initially given sequence of words that is recursively extended with each word the model produces.

  5.

    Whereas BERT contained at most 340 million parameters in its original version, and the largest language model at the time of GPT-3’s release contained 17 billion, GPT-3 had more than ten times that number: 175 billion parameters. As for training data, GPT-3 was trained on 500 billion words (tokens), compared with 3.3 billion for BERT.

  6.

    For a comprehensive account of LLMs, see Bommasani et al. (2021). For a precise yet accessible presentation of the mechanisms behind GPT, see, for instance, Wolfram (2023).

  7.

    To the best of our knowledge, no work in this area of research has yet addressed the content of mathematical diagrams. From a different perspective, Sørensen and Johansen (2020) have used neural techniques to identify diagrams in mathematical texts.
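The autoregressive generation described in note 4 can be illustrated with a minimal Python sketch. It assumes a hypothetical hard-coded lookup table (`TOY_TRIGRAMS`) in place of a trained neural network, and `next_word` and `generate` are illustrative names, not an API of GPT or of any system cited in this chapter. Only the generation loop, in which each predicted word is appended to the input used for the next prediction, corresponds to the mechanism the note describes.

```python
from typing import Dict, List, Tuple

END = "<end>"

# Hypothetical toy "model": maps the last two words of the sequence to the
# next word. A real neural language model would instead output a probability
# distribution over its whole vocabulary, conditioned on the entire prefix.
TOY_TRIGRAMS: Dict[Tuple[str, ...], str] = {
    ("two", "plus"): "two",
    ("plus", "two"): "equals",
    ("two", "equals"): "four",
    ("equals", "four"): END,
}


def next_word(prefix: List[str]) -> str:
    """Predict one word from the current sequence (toy: last two words only)."""
    return TOY_TRIGRAMS.get(tuple(prefix[-2:]), END)


def generate(prompt: List[str], max_new_words: int = 10) -> List[str]:
    """Generate text one word at a time, feeding each output back as input."""
    sequence = list(prompt)
    for _ in range(max_new_words):
        word = next_word(sequence)
        if word == END:
            break
        sequence.append(word)  # the model's output extends its own input
    return sequence


print(" ".join(generate(["two", "plus"])))  # prints: two plus two equals four
```

A real model would choose each word from a learned distribution rather than a fixed table, but the feedback loop would have the same shape.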


  • Alemi AA, Chollet F, Een N, Irving G, Szegedy C, Urban J (2016) DeepMath – Deep sequence models for premise selection. In: Proceedings of the 30th international conference on neural information processing systems, NIPS’16. Curran Associates, Red Hook, pp 2243–2251

  • Avigad J (2008) Computers in mathematical inquiry. Oxford University Press, New York, Chap 11, pp 134–150

  • Avigad J (2015) Mathematics and language. In: Davis E, Davis P (eds) Mathematics, substance and surmise. Springer, Cham

  • Bansal K, Loos SM, Rabe MN, Szegedy C, Wilcox S (2019) HOList: An environment for machine learning of higher-order theorem proving (extended version). CoRR abs/1904.03241

  • Belinkov Y, Glass J (2019) Analysis methods in neural language processing: a survey. Trans Assoc Comput Ling 7:49–72.

  • Biggio L, Bendinelli T, Neitz A, Lucchi A, Parascandolo G (2021) Neural symbolic regression that scales. In: Meila M, Zhang T (eds) Proceedings of the 38th International Conference on Machine Learning, PMLR, Proceedings of Machine Learning Research, vol 139, pp 936–945.

  • Blechschmidt J, Ernst OG (2021) Three ways to solve partial differential equations with neural networks – a review. GAMM-Mitteilungen 44(2):e202100006.

  • Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, Bernstein MS, Bohg J, Bosselut A, Brunskill E, Brynjolfsson E, Buch S, Card D, Castellon R, Chatterji NS, Chen AS, Creel K, Davis JQ, Demszky D, Donahue C, Doumbouya M, Durmus E, Ermon S, Etchemendy J, Ethayarajh K, Fei-Fei L, Finn C, Gale T, Gillespie L, Goel K, Goodman ND, Grossman S, Guha N, Hashimoto T, Henderson P, Hewitt J, Ho DE, Hong J, Hsu K, Huang J, Icard T, Jain S, Jurafsky D, Kalluri P, Karamcheti S, Keeling G, Khani F, Khattab O, Koh PW, Krass MS, Krishna R, Kuditipudi R, et al (2021) On the opportunities and risks of foundation models. CoRR abs/2108.07258

  • Borwein JM, Bailey DH (2003) Mathematics by experiment – plausible reasoning in the 21st century. A K Peters, New York

  • Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. In: Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 33, pp 1877–1901.

  • Brunton SL, Kutz JN (2022) Data-driven science and engineering: machine learning, dynamical systems, and control, 2nd edn. Cambridge University Press, Cambridge.

  • Carifio J, Halverson J, Krioukov D, Nelson BD (2017) Machine learning in the string landscape. J High Energy Phys 2017(9):157.

  • Charton F (2021) Linear algebra with transformers. CoRR abs/2112.01898

  • Charton F (2022) What is my math transformer doing? – three results on interpretability and generalization. arXiv:2211.00170

  • Chemla K (2012) The history of mathematical proof in ancient traditions. Cambridge University Press, Cambridge

  • Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A, Barham P, Chung HW, Sutton C, Gehrmann S, Schuh P, Shi K, Tsvyashchenko S, Maynez J, Rao A, Barnes P, Tay Y, Shazeer N, Prabhakaran V, Reif E, Du N, Hutchinson B, Pope R, Bradbury J, Austin J, Isard M, Gur-Ari G, Yin P, Duke T, Levskaya A, Ghemawat S, Dev S, Michalewski H, Garcia X, Misra V, Robinson K, Fedus L, Zhou D, Ippolito D, Luan D, Lim H, Zoph B, Spiridonov A, Sepassi R, Dohan D, Agrawal S, Omernick M, Dai AM, Pillai TS, Pellat M, Lewkowycz A, Moreira E, Child R, Polozov O, Lee K, Zhou Z, Wang X, Saeta B, Diaz M, Firat O, Catasta M, Wei J, Meier-Hellstern K, Eck D, Dean J, Petrov S, Fiedel N (2022) PaLM: scaling language modeling with pathways. arXiv:2204.02311

  • Cobbe K, Kosaraju V, Bavarian M, Chen M, Jun H, Kaiser L, Plappert M, Tworek J, Hilton J, Nakano R, Hesse C, Schulman J (2021) Training verifiers to solve math word problems. CoRR abs/2110.14168

  • Conneau A, Kruszewski G, Lample G, Barrault L, Baroni M (2018) What you can cram into a single $&!#* vector: probing sentence embeddings for linguistic properties. In: Proceedings of the 56th annual meeting of the Association for Computational Linguistics (volume 1: long papers). Association for Computational Linguistics, Melbourne, pp 2126–2136.

  • d’Ascoli S, Kamienny P, Lample G, Charton F (2022) Deep symbolic regression for recurrent sequences. CoRR abs/2201.04600

  • Davies A, Veličković P, Buesing L, Blackwell S, Zheng D, Tomašev N, Tanburn R, Battaglia P, Blundell C, Juhász A, Lackenby M, Williamson G, Hassabis D, Kohli P (2021) Advancing mathematics by guiding human intuition with AI. Nature 600(7887):70–74.

  • Davis E (2019) The use of deep learning for symbolic integration: a review of (Lample and Charton, 2019). arXiv:1912.05752

  • Davis E (2021) Deep learning and mathematical intuition: a review of (Davies et al. 2021). CoRR abs/2112.04324

  • Davis E (2023) Mathematics, word problems, common sense, and artificial intelligence. arXiv:2301.09723

  • Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805

  • Ferreira D, Freitas A (2020) Premise selection in natural language mathematical texts. In: Proceedings of the 58th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, pp 7365–7374.

  • Ferreira D, Freitas A (2021) STAR: cross-modal [STA]tement [R]epresentation for selecting relevant mathematical premises. In: Proceedings of the 16th conference of the European chapter of the Association for Computational Linguistics: main volume. Association for Computational Linguistics, Online, pp 3234–3243.

  • Freivalds K, Liepins R (2017) Improving the neural GPU architecture for algorithm learning. CoRR abs/1702.08727

  • Ghahramani Z (2023) Introducing PaLM 2.

  • Giaquinto M (2008) Cognition of structure. Oxford University Press, New York, Chap 2, pp 43–64

  • Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge MA/London UK

  • Griffith K, Kalita J (2021) Solving arithmetic word problems with transformers and preprocessing of problem text. CoRR abs/2106.00893

  • Heal K, Kulkarni A, Sertöz EC (2020) Deep learning Gauss–Manin connections. CoRR abs/2007.13786

  • Hendrycks D, Burns C, Basart S, Zou A, Mazeika M, Song D, Steinhardt J (2021a) Measuring massive multitask language understanding. arXiv:2009.03300

  • Hendrycks D, Burns C, Kadavath S, Arora A, Basart S, Tang E, Song D, Steinhardt J (2021b) Measuring mathematical problem solving with the MATH dataset. In: Thirty-fifth conference on neural information processing systems datasets and benchmarks track (Round 2)

  • Herreman A (2000) La topologie et ses signes: Éléments pour une histoire sémiotique des mathématiques. L’Harmattan, Paris

  • Hewitt J, Liang P (2019) Designing and interpreting probes with control tasks. arXiv:1909.03368

  • Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366.

  • Hughes MC (2020) A neural network approach to predicting and computing knot invariants. J Knot Theory Ramif 29(03):2050005.

  • Jejjala V, Kar A, Parrikar O (2019) Deep learning the hyperbolic volume of a knot. Phys Lett B 799:135033.

  • Jiang AQ, Li W, Tworkowski S, Czechowski K, Odrzygóźdź T, Miłoś P, Wu Y, Jamnik M (2022) Thor: wielding hammers to integrate language models and automated theorem provers. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds) Advances in neural information processing systems, vol 35. Curran Associates, pp 8360–8373.

  • Jiang AQ, Welleck S, Zhou JP, Lacroix T, Liu J, Li W, Jamnik M, Lample G, Wu Y (2023) Draft, sketch, and prove: guiding formal theorem provers with informal proofs. In: The eleventh international conference on learning representations.

  • Kaiser Ł, Sutskever I (2016) Neural GPUs learn algorithms. arXiv:1511.08228

  • Kaliszyk C, Chollet F, Szegedy C (2017) HolStep: a machine learning dataset for higher-order logic theorem proving. CoRR abs/1703.00426

  • Kamienny PA, d’Ascoli S, Lample G, Charton F (2022) End-to-end symbolic regression with transformers. arXiv:2204.10532

  • Kim S, Lu PY, Mukherjee S, Gilbert M, Jing L, Ceperic V, Soljacic M (2019) Integration of neural network-based symbolic regression in deep learning for scientific discovery. IEEE Trans Neural Netw Learn Syst 32:4166–4177

  • Kohlhase A, Kohlhase M, Ouypornkochagorn T (2018) Discourse phenomena in mathematical documents. In: Rabe F, Farmer WM, Passmore GO, Youssef A (eds) Intelligent Computer Mathematics. Springer International Publishing, Cham, pp 147–163

  • Lample G, Charton F (2020) Deep learning for symbolic mathematics. In: International conference on learning representations.

  • Lample G, Lacroix T, Lachaux MA, Rodriguez A, Hayat A, Lavril T, Ebner G, Martinet X (2022) Hypertree proof search for neural theorem proving. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds) Advances in neural information processing systems, vol 35. Curran Associates, pp 26337–26349.

  • Lee D, Szegedy C, Rabe M, Loos S, Bansal K (2020) Mathematical reasoning in latent space. In: International conference on learning representations

  • Levitt JSF, Hajij M, Sazdanovic R (2019) Big data approaches to knot theory: understanding the structure of the Jones polynomial. arXiv:1912.10086

  • Lewkowycz A, Andreassen A, Dohan D, Dyer E, Michalewski H, Ramasesh V, Slone A, Anil C, Schlag I, Gutman-Solo T, Wu Y, Neyshabur B, Gur-Ari G, Misra V (2022) Solving quantitative reasoning problems with language models. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds) Advances in neural information processing systems, vol 35. Curran Associates, pp 3843–3857.

  • Lipton ZC (2018) The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue 16(3):31–57.

  • Loos SM, Irving G, Szegedy C, Kaliszyk C (2017) Deep network guided proof search. CoRR abs/1701.06972

  • Lu P, Qiu L, Yu W, Welleck S, Chang KW (2023) A survey of deep learning for mathematical reasoning. In: Rogers A, Boyd-Graber J, Okazaki N (eds) Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada, pp 14605–14631.

  • Madsen A, Reddy S, Chandar S (2021) Post-hoc interpretability for neural NLP: a survey

  • Mancosu P (2008) Mathematical Explanation: Why it Matters. Oxford University Press, New York, Chap 5, pp 134–150

  • Manning CD (2015) Computational linguistics and deep learning. Comput Linguist 41(4):701–707.

  • Manning CD, Clark K, Hewitt J, Khandelwal U, Levy O (2020) Emergent linguistic structure in artificial neural networks trained by self-supervision. Proc Natl Acad Sci.

  • Martius G, Lampert CH (2016) Extrapolation and learning equations. CoRR abs/1610.02995

  • McCarthy J, Minsky ML, Rochester N, Shannon CE (2006) A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Mag 27(4):12

  • Meng Y, Rumshisky A (2019) Solving math word problems with double-decoder transformer. CoRR abs/1908.10924

  • Netz R (1999) The shaping of deduction in Greek mathematics. Cambridge University Press, Cambridge

  • Newell A, Simon HA (1956) Plans for the Dartmouth summer research project on artificial intelligence. Typescript, Supplement to McCarthy, et al. (2006)

  • Nogueira R, Jiang Z, Lin J (2020) Document ranking with a pretrained sequence-to-sequence model. CoRR abs/2003.06713

  • Paliwal AS, Loos SM, Rabe MN, Bansal K, Szegedy C (2019) Graph representations for higher-order logic and theorem proving. In: AAAI conference on artificial intelligence

  • Petersen BK, Landajuela M, Mundhenk TN, Santiago CP, Kim SK, Kim JT (2021) Deep symbolic regression: recovering mathematical expressions from data via risk-seeking policy gradients. arXiv:1912.04871

  • Polu S, Sutskever I (2020) Generative language modeling for automated theorem proving. CoRR abs/2009.03393

  • Quine WVO (2013) Word and object, new edition, paperback edn. The MIT Press, London

  • Rabe MN, Szegedy C (2021) Towards the automatic mathematician. In: Platzer A, Sutcliffe G (eds) Automated deduction – CADE 28. Springer International Publishing, Cham, pp 25–37

  • Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2019) Exploring the limits of transfer learning with a unified text-to-text transformer. CoRR abs/1910.10683

  • Ravfogel S, Prasad G, Linzen T, Goldberg Y (2021) Counterfactual interventions reveal the causal effect of relative clause representations on agreement prediction. In: Proceedings of the 25th conference on computational natural language learning. Association for Computational Linguistics, Online, pp 194–209.

  • Saxton D, Grefenstette E, Hill F, Kohli P (2019) Analysing mathematical reasoning abilities of neural models. arXiv:1904.01557

  • Schlimm D (2018) Numbers through numerals. The constitutive role of external representations. In: Bangu S (ed) Naturalizing logico-mathematical knowledge: approaches from psychology and cognitive science. Routledge, New York. pp 195–217

  • Shen JT, Yamashita M, Prihar E, Heffernan NT, Wu X, Lee D (2021) MathBERT: a pre-trained language model for general NLP tasks in mathematics education. CoRR abs/2106.07340

  • Sloane NJA (2007) The on-line encyclopedia of integer sequences. In: Kauers M, Kerber M, Miner R, Windsteiger W (eds) Towards mechanized mathematical assistants. Springer Berlin Heidelberg, Berlin/Heidelberg, pp 130–130

  • Sørensen HK, Johansen MW (2020) Counting mathematical diagrams with machine learning. In: Pietarinen AV, Chapman P, Bosveld-de Smet L, Giardino V, Corter J, Linker S (eds) Diagrammatic representation and inference. Springer International Publishing, Cham, pp 26–33

  • Szegedy C (2020) A promising path towards autoformalization and general artificial intelligence. In: Benzmüller C, Miller B (eds) Intelligent computer mathematics. Springer International Publishing, Cham, pp 3–20

  • Toffoli SD, Giardino V (2013) Forms and roles of diagrams in knot theory. Erkenntnis 79(4):829–842.

  • Trask A, Hill F, Reed SE, Rae J, Dyer C, Blunsom P (2018) Neural arithmetic logic units. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems, vol 31. Curran Associates.

  • Turing A (1948/2004) Intelligent machinery (1948). In: The essential Turing. Oxford University Press, Oxford.

  • Udrescu SM, Tegmark M (2019) AI Feynman: a physics-inspired method for symbolic regression. arXiv:1905.11481

  • Valipour M, You B, Panju M, Ghodsi A (2021) SymbolicGPT: a generative transformer model for symbolic regression. ArXiv abs/2106.14131

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 30.

  • Wagner R (2019) Does mathematics need foundations? Springer International Publishing, Cham, pp 381–396.

  • Wagner AZ (2021) Constructions in combinatorics via neural networks. arXiv:2104.14516

  • Wang M, Deng J (2020) Learning to prove theorems by learning to generate theorems. In: Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H (eds) Advances in neural information processing systems, vol 33. Curran Associates, pp 18146–18157.

  • Waszek D (2018) Les représentations en mathématiques. PhD thesis in philosophy, Université Paris 1, supervised by Marco Panza

  • Welleck S, Liu J, Bras RL, Hajishirzi H, Choi Y, Cho K (2021) NaturalProofs: mathematical theorem proving in natural language. CoRR abs/2104.01112

  • Welleck S, West P, Cao J, Choi Y (2022) Symbolic brittleness in sequence models: on systematic generalization in symbolic mathematics. arXiv:2109.13986

  • Williamson G (2023) Is deep learning a useful tool for the pure mathematician? arXiv:2304.12602

  • Wittgenstein L (2009) Philosophical investigations, 4th edn. Wiley-Blackwell, Chichester

  • Wolfram S (2023) What is ChatGPT doing… and why does it work? Accessed 09 July 2023

  • Wu Y, Jiang AQ, Li W, Rabe MN, Staats CE, Jamnik M, Szegedy C (2022) Autoformalization with large language models. In: Oh AH, Agarwal A, Belgrave D, Cho K (eds) Advances in neural information processing systems.

  • Zheng K, Han JM, Polu S (2022) MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics. In: international conference on learning representations.

The author wishes to thank Deniz Sarikaya, Bharath Sriraman, David Waszek, Roy Wagner, and John Terilla for their patient support, precious feedback, and constant encouragement.

Funding information: This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 839730.

Author information

Corresponding author

Correspondence to Juan Luis Gastaldi.

Copyright information

© 2023 Springer Nature Switzerland AG

About this entry

Cite this entry

Gastaldi, J.L. (2023). How to Do Maths with Words: Neural Machine Learning Applications to Mathematics and Their Philosophical Significance. In: Sriraman, B. (eds) Handbook of the History and Philosophy of Mathematical Practice. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-19071-2

  • Online ISBN: 978-3-030-19071-2

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering
