Making AI meaningful again


Artificial intelligence (AI) research enjoyed an initial period of enthusiasm in the 1970s and 80s, but this enthusiasm was tempered by a long interlude of frustration when genuinely useful AI applications failed to materialise. Today we are once again experiencing a period of enthusiasm, fired above all by the successes of the technology of deep neural networks or deep machine learning. In this paper we draw attention to what we take to be serious problems underlying current views of artificial intelligence encouraged by these successes, especially in the domain of language processing. We then outline an alternative approach to language-centric AI, in which we identify a role for philosophy.



  1. Also referred to as ‘brute force’ learning.

  2. Increasing efficiency means reducing unit production costs; increasing effectiveness means achieving higher quality per production unit.

  3.

  4. This is the official name of Google’s AI department. While Google’s machine-learning engineers are certainly among the world’s leading representatives of their craft, the name nonetheless reveals a certain hubris.

  5. This encoding approach is used (with variations in how the vector is created) by all dNNs since “word2vec” (Mikolov et al. 2013).
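The idea behind this encoding can be sketched in miniature as follows. This is a hypothetical illustration with hand-set three-dimensional vectors; real word2vec embeddings are learned from corpus co-occurrence statistics and typically have hundreds of dimensions, so the words and values below are assumptions made purely for illustration:

```python
import math

# Toy dense word vectors (hand-set for illustration; word2vec would
# learn such vectors from co-occurrence statistics in a large corpus).
vectors = {
    "flour": [0.9, 0.1, 0.0],
    "bread": [0.8, 0.2, 0.1],
    "run":   [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity: the standard proximity measure on embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Distributionally similar words end up geometrically close:
assert cosine(vectors["flour"], vectors["bread"]) > cosine(vectors["flour"], vectors["run"])
```

Note that proximity in such a space reflects co-occurrence statistics only, not meaning in the sense used in this text.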

  6. Throughout this text, “meaning” signifies the relevance for action and thought that humans attribute to the stimuli they encounter in sensation. For a non-English speaker, an English sentence, too, is a series of meaningless stimuli. For an English speaker, in contrast, the sentence is immediately interpreted as meaningful.

  7. For example, the algorithm learns to translate the German word ‘Mehl’ into ‘flour’ because this pair recurs many times in training sentences. But it fails to translate “Wir haben Mehl Befehl gegeben zu laufen” into the adequate “We ordered Mehl to run”; it instead outputs the nonsensical “We have ordered flour to run” (result produced on Jan. 7, 2019). The translation fails because there are not enough training examples from which to learn the martial usage of surnames without title.

  8. To a reader without knowledge of the Bible this sentence (John 1:1) will seem strange or unintelligible. It is impossible to enumerate all such contextual constellations and include them, as annotated features in training sets for stochastic models, in amounts sufficient for machine learning.

  9. To illustrate the limitations of the approach, Hofstadter used input sentences with a high degree of cross-contextualisation (see “The Shallowness of Google Translate”, The Atlantic, January 30, 2018).

    Text by Hofstadter: In their house, everything comes in pairs. There’s his car and her car, his towels and her towels, and his library and hers.

    Google Translate: Dans leur maison, tout vient en paires. Il y a sa voiture et sa voiture, ses serviettes et ses serviettes, sa bibliothèque et les siennes.

    Translated back into English by Google: In their house everything comes in pairs. There is his car and his car, their napkins and their napkins, his library and their’s.

  10. So-called deterministic AI models (Russell and Norvig 2014) do not generalize either, but they report their failures.

  11. Indeed they were criticised 200 years earlier, in 1781, by Immanuel Kant in his Critique of Pure Reason.

  12. One example described in Feng et al. (2018) rests on the input: “In 1899, John Jacob Astor IV invested $100,000 for Tesla to further develop and produce a new lighting system. Instead, Tesla used the money to fund his Colorado Springs experiments”. The described system correctly answers the question “What did Tesla spend Astor’s money on?” with a confidence of 0.78 (where 1 is the maximum). The problem is that it gives exactly the same answer, with a similar degree of confidence, in response to the nonsensical question: “did?”

  13. The \(F_1\)-score of 0.52 reported by Zheng et al. (2017) seems quite high; but most of the training material is synthetic, and the reported outcome concerns only information triples, which cannot be used for applied IE. The example is poignant because the paper in question won the 2017 prize for information extraction of the Association for Computational Linguistics, globally the most important meeting in the language-AI field.

  14.

  15. Currently, prior knowledge is used mainly for the selection or creation of training data for end-to-end dNN applications.

  16. The improvements provided by this approach are very modest, and no higher than those achieved by other tweaks of dNNs such as optimised embeddings or changes in the layering architecture.

  17. An excellent summary can be found in Russell and Norvig (2014).

  18. Hayes’ conception of an ontology as the formalization of our knowledge of reality continues today in the work of Tom Gruber, whose Siri application, implemented by Apple in the iPhone, is built around a set of continuously evolving ontologies representing simple domains of reality such as restaurants, movies, and so forth.

  19. BFO is currently under review as a standard of the International Organization for Standardization, as ISO/IEC 21838-1 (Top-Level Ontologies: Requirements) and ISO/IEC 21838-2 (BFO).

  20. This is a non-compact and non-complete k-order intensional logic; ‘k-order’ means that predicates of the logic can predicate over other predicates arbitrarily often, and ‘intensional’ means that the range of predication in the logic is not restricted to existing entities (Gamut 1991).

  21. An overview is given in Boolos et al. (2007).

  22. This software is in production at carexpert GmbH, Walluf, Germany, a claims-validation service provider processing some 70 thousand automobile glass repair bills a year with a total reimbursement sum of over EUR 50 million. The bill-validation process, performed by a car-mechanics expert in 7–10 min, is completed by the system in 250 ms.

  23. Compare footnote 6 above.


  1. Arp, R., Smith, B., & Spear, A. (2015). Building ontologies with Basic Formal Ontology. Cambridge, MA: MIT Press.

  2. Ashburner, M., et al. (2000). Gene Ontology: Tool for the unification of biology. Nature Genetics, 25, 25–29.

  3. Boolos, G. S., Burgess, J. P., & Jeffrey, R. C. (2007). Computability and logic. Cambridge: Cambridge University Press.

  4. Carey, S., & Xu, F. (2001). Infants’ knowledge of objects: Beyond object files and object tracking. Cognition, 80, 179–213.

  5. Chen, Y., Gilroy, S., Knight, K., & May, J. (2017). Recurrent neural networks as weighted language recognizers. CoRR, arXiv:1711.05408.

  6. Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2, 113–124.

  7. Cooper, S. B. (2004). Computability theory. London: Chapman & Hall/CRC.

  8. Dummett, M. (1996). Origins of analytical philosophy. Boston, MA: Harvard University Press.

  9. Feng, S., Wallace, E., Iyyer, M., Rodriguez, P., Grissom II, A., & Boyd-Graber, J. L. (2018). Right answer for the wrong reason: Discovery and mitigation. CoRR, arXiv:1804.07781.

  10. Finkel, J. R., Kleeman, A., & Manning, C. D. (2008). Efficient, feature-based, conditional random field parsing. In Proceedings of ACL-08: HLT (pp. 959–967). Association for Computational Linguistics.

  11. Gamut, L. T. F. (1991). Logic, language and meaning (Vol. 2). Chicago, London: The University of Chicago Press.

  12. Gelman, S. A. (2003). The essential child: Origins of essentialism in everyday thought. Oxford Series in Cognitive Development. London: Oxford University Press.

  13. Gelman, S. A., & Byrnes, J. P. (1991). Perspectives on language and thought. Cambridge: Cambridge University Press.

  14. Gelman, S. A., & Wellman, H. M. (1991). Insides and essences: Early understandings of the non-obvious. Cognition, 38(3), 213–244.

  15. Gibson, J. J. (1979). The ecological approach to visual perception. Boston, MA: Houghton Mifflin.

  16. Gopnik, A. (2000). Explanation as orgasm and the drive for causal understanding. In F. Keil & R. Wilson (Eds.), Cognition and explanation. Cambridge, MA: MIT Press.

  17. Gutierrez-Basulto, V., & Schockaert, S. (2018). From knowledge graph embedding to ontology embedding? An analysis of the compatibility between vector space representations and rules. In Principles of knowledge representation and reasoning: Proceedings of the sixteenth international conference, KR 2018, Tempe, Arizona, 30 October–2 November 2018 (pp. 379–388).

  18. Hastie, T., Tibshirani, R., & Friedman, J. (2008). The elements of statistical learning (2nd ed.). Berlin: Springer.

  19. Hayes, P. J. (1985). The second naive physics manifesto. In J. R. Hobbs & R. C. Moore (Eds.), Formal theories of the common-sense world. Norwood: Ablex Publishing Corporation.

  20. Honnibal, M., & Montani, I. (2018). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing (in press).

  21. Jaderberg, M., & Czarnecki, W. M. (2018). Human-level performance in first-person multiplayer games with population-based deep reinforcement learning.

  22. Jo, J., & Bengio, Y. (2017). Measuring the tendency of CNNs to learn surface statistical regularities. CoRR, arXiv:1711.11561.

  23. Keil, F. (1989). Concepts, kinds, and cognitive development. Cambridge, MA: MIT Press.

  24. Keil, F. (1995). The growth of causal understanding of natural kinds. In D. Premack & J. Premack (Eds.), Causal cognition. London: Oxford University Press.

  25. Kim, I. K., & Spelke, E. S. (1999). Perception and understanding of effects of gravity and inertia on object motion. Developmental Science, 2(3), 339–362.

  26. Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. Cambridge, MA: MIT Press.

  27. Kowsari, K., Brown, D. E., Heidarysafa, M., Meimandi, K. J., Gerber, M. S., & Barnes, L. E. (2017). HDLTex: Hierarchical deep learning for text classification. CoRR, arXiv:1709.08267.

  28. Leslie, A. (1979). The representation of perceived causal connection in infancy. Oxford: University of Oxford.

  29. Marcus, G. (2018). Deep learning: A critical appraisal. CoRR, arXiv:1801.00631.

  30. McCarthy, J., & Hayes, P. J. (1969). Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence, 4, 463–502.

  31. Medin, D., & Ross, B. H. (1989). The specific character of abstract thought: Categorization, problem solving, and induction. In Advances in the psychology of human intelligence (Vol. 5).

  32. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (Vol. 26, pp. 3111–3119). Red Hook: Curran Associates Inc.

  33. Millikan, R. (2001). On clear and confused ideas. Cambridge Studies in Philosophy. Cambridge: Cambridge University Press.

  34. Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., & Frossard, P. (2016). Universal adversarial perturbations. CoRR, arXiv:1610.08401.

  35. Nienhuys-Cheng, S.-H., & de Wolf, R. (2008). Foundations of inductive logic programming. Berlin: Springer.

  36. Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of ACL 2002 (pp. 311–318). Association for Computational Linguistics.

  37. Poplin, R., Varadarajan, A. V., & Blumer, K. (2018). Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nature Biomedical Engineering, 2, 158–164.

  38. Povinelli, D. J. (2000). Folk physics for apes: The chimpanzee’s theory of how the world works. London: Oxford University Press.

  39. Rehder, B. (1999). A causal model theory of categorization. In Proceedings of the 21st annual meeting of the Cognitive Science Society (pp. 595–600).

  40. Robinson, A., & Voronkov, A. (2001). Handbook of automated reasoning. Cambridge, MA: Elsevier Science.

  41. Russell, S., & Norvig, P. (2014). Artificial intelligence: A modern approach. Harlow, Essex: Pearson Education.

  42. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.

  43. Smith, B. (2003). Ontology. In Blackwell guide to the philosophy of computing and information (pp. 155–166). Oxford: Blackwell.

  44. Solomon, K. O., Medin, D., & Lynch, E. (1999). Concepts do more than categorize. Trends in Cognitive Sciences, 3, 99–105.

  45. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. Cambridge, MA: The MIT Press.

  46. Tenenbaum, J. B. (1999). A Bayesian framework for concept learning. Cambridge, MA: Massachusetts Institute of Technology.

  47. Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24(4), 629–640.

  48. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. CoRR, arXiv:1706.03762.

  49. Zheng, S., Wang, F., Bao, H., Hao, Y., Zhou, P., & Xu, B. (2017). Joint extraction of entities and relations based on a novel tagging scheme. CoRR, arXiv:1706.05075.



Acknowledgements

We would like to thank Prodromos Kolyvakis, Kevin Keane, James Llinas and Kirsten Gather for helpful comments.

Author information



Corresponding author

Correspondence to Jobst Landgrebe.


Appendix: A real-world example


To represent in logical form the full meaning of a complex natural language expression E as used in a given domain and for a given purpose, we need a set of domain-specific ontologies, together with algorithms which, given E, can generate a logical formula that uses ontology terms which are counterparts of the constituent simple expressions in E and that expresses the relations between these terms. Such algorithms should then allow the representation in machine-readable form not merely of single expressions but of entire texts, even of entire corpora of texts, in which domain-specific knowledge is communicated in natural language form.

To see how philosophy is already enabling applied science-based production along these lines, let us look at a real-world example of an AI automaton used to generate expert technical appraisals for insurance claims automatically (footnote 22). Today, such claims are validated by mid-level clerical staff, whose job is to compare the content of each claim—for example the line items in a car repair or cardiologist bill—with the standards legally and technically valid for the context at issue (also referred to as ‘benchmarks’). When humans detect deviations from a benchmark, the corresponding amounts are subtracted from the indemnity sum, with a written justification for the reduction. Digitalization has advanced sufficiently far in the insurance world that claims data can be made available in structured digital form (the lines of a bill are stored as separate attributes in a table in a relational database). However, the relevant standards specifying benchmarks and how they are to be treated in claims processing have until recently been represented only as free-text strings. Now, by using technology along the lines described above, it is possible to automate both the digital representation of these standards and the corresponding comparisons between standards and claims data.

To this end, we developed an application that combines stochastic models with a multi-facetted version of logic-based AI to achieve the following steps:

  1. Compute a mathematical representation (vector) of the contents of the bill, using logic for both textual and quantitative data. The text is vectorised using the procedure shown in equations (1)–(3) of Sect. 3.3.1 above, while the quantitative content is simply inserted into the vector.

  2. Recognise the exact type of bill and understand the context in which it was generated. This is done using the logical representation of the text, which is taken as input for deterministic or stochastic classification of the bill type (for example, car glass damage) and subtype (for example, rear window).

  3. Identify the appropriate repair instructions (‘benchmark’) for the bill by querying the corresponding claims knowledge base for the benchmark most closely matching the bill in question. Standard sets of benchmarks are provided by the original equipment manufacturers, or they are created from historic bills using unsupervised pattern identification in combination with human curation. The benchmark texts are transformed into mathematical logic.

  4. Compare the bill to its benchmark by identifying matching lines using combinatorial optimisation. The matches are established by computing the logical equivalence of the matching line items using entailment in both directions: given a bill line (or line group) p and its candidate match (or group) q, compute \(p \vdash q\) and \(q \vdash p\) to establish the match.

  5. Subtract the value of the items on the bill that do not match the benchmark from the reimbursement sum.

  6. Output the justification for the subtractions, using textual formulations from the appropriate standard documents.
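Steps 4 and 5 above can be sketched in miniature as follows. This is a hypothetical simplification in which a bill line is a set of ground facts and entailment is reduced to set inclusion; the real system computes \(p \vdash q\) with a theorem prover over a k-order intensional logic, so the names `entails`, `match`, and `validate` and the fact encoding are assumptions made for illustration:

```python
def entails(p, q):
    """Toy stand-in for the theorem-prover call p |- q:
    p proves q iff every fact in q follows from p
    (here simply: is contained in p)."""
    return q <= p

def match(line, candidate):
    """Logical equivalence established via entailment in both directions."""
    return entails(line, candidate) and entails(candidate, line)

def validate(bill, benchmark):
    """Return the bill lines with no logically equivalent benchmark
    line; their value would be subtracted from the reimbursement sum."""
    return [line for line in bill if not any(match(line, b) for b in benchmark)]

bill = [
    frozenset({"part:rear_window", "action:replace"}),
    frozenset({"part:rain_sensor", "action:recalibrate"}),
]
benchmark = [frozenset({"part:rear_window", "action:replace"})]

# Only the rain-sensor line lacks a benchmark counterpart:
print(validate(bill, benchmark))
```

In the deployed system the pairing of bill lines with candidate benchmark lines is itself found by combinatorial optimisation over line groups; the naive scan above ignores grouping for brevity.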

To achieve comparable results, an end-to-end dNN-based algorithm would require billions of bills with standardised appraisal results. Yet the entire German car market yields only some 2–3 million car glass damage repair bills in any given year, and the appraisals are not standardised.

The technology is used for the automation of typical mid-level office tasks. It detects non-processable input, for example language resulting in a non-resolvable set of logical formulae, and passes on the cases it cannot process for human inspection. This is a core feature of our technology which may not match the expectations of an AI purist. However, applications of the sort described have the potential to automate millions of office jobs in the German-speaking countries alone.

Human beings, when properly trained, are able to perform the classification described under step 2 spontaneously. They can do this both for entire artefacts such as bills and for the single lines which are their constituents. Humans live in a world which is meaningful in precisely this respect (footnote 23). The ability to classify types of entities in given contexts can be replicated in machines only if they store a machine-adequate representation of the background knowledge that humans use in guiding their actions. This is realised in the described system by means of ontologies covering both the entities to which reference is made in given textual inputs and the contexts and information artefacts associated therewith. The ontologies also incorporate formal definitions of the relevant characteristics of these objects, of the terms used in the relevant insurance rules, and so forth. The ontologies are built by hand, but require only a minimal amount of effort from those with expertise in the relevant domain (here: the contents of repair bills and insurance rules). These definitions are entailed by the bill and benchmark texts, and the latter are automatically processed into logical representations in the ontology framework without human intervention.

Resulting system properties

This philosophy-driven AI application uses stochastic models and parsers as well as automated theorem provers. It meets the requirements listed in Table 1, including:

  • Exactness—it has an error rate of below 0.3% (relative to the gold standard obtained by a consortium of human experts), which is below the best human error rate of 0.5%. Such low levels of error are achieved only because, unlike a stand-alone stochastic model, the system will detect if it cannot perform any of the essential inference steps and route the case to a human being.

  • Information security—the system is secure because any erroneous reactions of its stochastic models to perturbing input are detected by the logical model operating in the immediately subsequent step.

  • Robustness—it is robust since it will detect when it cannot interpret a given context properly, and issue a corresponding alert.

  • Data parsimony—it requires very little data for training, since unlike the sorts of suboptimally separating agnostic spaces resulting from stochastic embeddings, it induces what we can call a semantic space that separates data points very effectively.

  • Semantic fidelity—the system not only allows inference but is in fact based on it, and so it can easily use prior and world knowledge in both stochastic (Bayesian net) and deterministic (logical) form.


Landgrebe, J., Smith, B. Making AI meaningful again. Synthese (2019).



  • Artificial intelligence
  • Deep neural networks
  • Semantics
  • Logic
  • Basic formal ontology (BFO)