Natural Language Inference in Coq

Journal of Logic, Language and Information


In this paper we propose a way to deal with natural language inference (NLI) by implementing Modern Type Theoretical Semantics in the proof assistant Coq. The paper is a first attempt to deal with NLI, and natural language reasoning in general, using proof assistant technology. Valid NLIs are treated as theorems, and the adequacy of our account is tested by trying to prove them. We use Luo’s Modern Type Theory (MTT) with coercive subtyping as the formal language into which we translate natural language semantics, and we further implement these semantics in the Coq proof assistant. It is shown that the use of an MTT with an adequate subtyping mechanism can give us a number of promising results as regards NLI. Specifically, it is shown that a number of inference cases, involving quantifiers, adjectives, conjoined noun phrases and temporal reference among other things, can be successfully dealt with. It is then shown that, even though Coq is an interactive and not an automated theorem prover, automation of all of the test examples is possible by introducing user-defined automated tactics. Lastly, the paper offers a number of innovative approaches to NL phenomena like adjectives, collective predication, comparatives and factive verbs among other things, contributing in this respect to the theoretical study of formal semantics using MTTs.


  1. pCIC is a type theory that is rather similar to Luo’s UTT, especially since its universe \(Set\) became predicative in Coq 8.0. A main difference is that UTT does not have co-inductive types. The interested reader is directed to Goguen’s Ph.D. thesis (1994) as regards the meta-theory of UTT.

  2. See next section for the definition of \(\varSigma \) types.

  3. This is similar to simple type theory where a type \(t\) of truth values exists.

  4. Of course, the need for fine-grained types is not an uncontroversial claim. As one of the reviewers notes, there is considerable literature claiming that this type of ‘sortal’ incorrectness is due to pragmatic factors. However, there is also a large body of literature claiming the contrary. This paper takes the stance that fine-grained types are indeed needed, following in this respect researchers like Pustejovsky (1995), Asher (2011), Ranta (1994), Fox and Lappin (1990) and Bassac et al. (2010), among many others.

  5. See the discussion in Sect. 2.4 for details w.r.t \(\varSigma \) types.

  6. See Luo (2012) for more details on this.

  7. For the notion of belief using MTT semantics, see Ranta (1994) and Chatzikyriakidis and Luo (2013) among others.

  8. Another anonymous reviewer asks what we do in cases of words like work, where an indication of two different senses exists, i.e. \(work: [\![human]\!] \rightarrow Prop\) and \(work: [\![method]\!] \rightarrow Prop\). Even though we have not looked at the problem in its full generality, the second author has proposed the use of overloading with Unit types for these cases, encoding the different senses of the same verb (Luo 2011). Furthermore, on the level of CNs there is considerable work by the authors and colleagues on dot-types. The interested reader should consult Luo (2011), Chatzikyriakidis and Luo (2012), Xue and Luo (2012) and Asher and Luo (2012) for more details.

  9. It is worth mentioning that subsumptive subtyping, i.e. the traditional notion of subtyping that adopts the subsumption rule (if \(A\le B\), then every object of type \(A\) is also of type \(B\)), is inadequate for MTTs in the sense that it would destroy some important metatheoretical properties of MTTs [see, for example, §4 of Luo et al. (2012) for details].

  10. This kind of inference can be straightforwardly proven in Coq by using a standard analysis for the quantifier some plus the subtyping relation \([\![man]\!] < [\![human]\!]\).

  11. See Luo (2011) for more details on this proposal as well as Xue and Luo (2012) for an implementation of dot-types in the proof assistant Plastic.

  12. \(\varSigma \)-types can also provide the tools for the proper semantic interpretation of the so-called ‘Donkey-sentences’ (Sundholm 1986).

  13. For a recent approach to anaphora using dependent typing see Grudzinska and Zawadowski (2014).

  14. This was proposed for the first time in Luo (2011).

  15. There is quite a long discussion on what these universes should be like. In particular, the debate is largely concentrated on whether a universe should be predicative or impredicative. A strongly impredicative universe \(U\) of all types (with \(U:U\) and \(\varPi \)-types) is shown to be paradoxical (Girard 1971) and as such logically inconsistent. The theory UTT we use here has only one impredicative universe \(Prop\) (representing the world of logical formulas) together with infinitely many predicative universes, and as such avoids Girard’s paradox [see Luo (1994) for more details].

  16. An anonymous reviewer questioned the use of MTTs and their advantages over other systems for formal semantics, i.e. Montague Grammar, DRT and Davidsonian semantics. In this paper, we argue for rich type theories instead of simply typed ones. DRT, Davidsonian semantics as well as other systems for formal semantics are not typed systems. However, fusions of DRT with simple type theory have been attempted successfully (Muskens 2005). In principle, fusions of MTTs with DRT are possible. A discussion on whether MTTs constitute a better alternative than any preceding formal semantics system is, we are afraid, out of the scope of this paper.

  17. In Luo’s MTT, \(\textsc {cn}\) is the universe containing the names that interpret CNs. Since introducing new universes is not an option in Coq, we approximate this idea by having CN be of type Set.

  18. Note that ambiguous paths are not allowed and as such, given two types \(A\) and \(B\) (with \(A, B :CN\)), there is no possibility of defining both \(A<B\) and \(B<A\).

  19. See the discussion on automation in Sect. 5.

  20. The same modification can be also found in MacCartney (2009). In general, if one uses a theorem prover to deal with NL inference [e.g. analyses in the style of Blackburn and Bos (2005), Bos and Markert (2005, 2006)] such modifications are necessary.

  21. The source code can be obtained by sending an email request to the first author.

  22. See however Chatzikyriakidis (2014) for a more thorough look at adverbs from an MTT perspective.

  23. For this first example, we shall detail its formal semantics in a type theory. You can also find the Coq implementation of this in Appendix “An Example from the FraCas Test Suite”. For the examples later on, we shall omit such details.

  24. This is the type for VP adverbs used in Luo (2011). We will see later on that a slightly modified type will be used for VP adverbs. For the moment, we keep this type given that it does not play any role whatsoever in proving the inference.

  25. Note that \([\![finish]\!]:[\![human]\!]\rightarrow Prop < [\![delegate]\!]\rightarrow Prop\).

  26. A note about Coq is in order here: Coq does not allow the user to build a new universe. Instead, we shall use an existing universe in Coq in conducting our examples for coordination.

  27. This is something that has been noted in the literature, see Partee (2007). Note that in the FraCas test suite, the inference in (46) is valid.

  28. In the case of fake, Partee (2010) tried to provide an account where fake is treated as a subsective adjective, i.e. affirmative in the classification given in the FraCas test suite, via using the disjoint union type.

  29. Of course, depending on context more fine grained distinctions might be needed but the idea is applicable to these cases as well.

  30. \(Small\) is defined after \(Large\) has been declared. The opposite is also possible, i.e. defining \(Large\) after \(Small\) has been declared first. This might seem strange from a theoretical point of view, but for implementation purposes it is not.

  31. This is based on the authors’ analysis of subsective adjectives (Chatzikyriakidis and Luo 2013).

  32. The interested reader is directed to Chatzikyriakidis and Luo (2013) for more information on the treatment of subsective as well as the other types of adjectives in MTT with coercive subtyping.

  33. In Coq, we cannot have the first projection as a general coercion. Instead, we have to declare it for the instances we need. This is a weakness of Coq that does not allow us to implement the more general treatment. Such a general coercion can be declared in Plastic, an interactive theorem prover that implements Luo’s UTT and coercive subtyping (Callaghan and Luo 2001).

  34. Here we do not spell out the type \(Height\). One might take \(Height\) to be the type of natural numbers and use \(170\) to stand for \(1.70\), etc.

  35. The transitive properties of comparatives are not encoded in this example for reasons of simplicity. One may very well do so having as a guide the previous entry without measures.

  36. This is a bi-implication, given that if the height of human \(x\) is less than the height of another human \(y\), then it is also the case that \(x\) is shorter than \(y\). The definition also works as an implication.

  37. One may even employ this model to capture composite tenses like the past perfect, but we do not discuss this here. See Ranta (1994) for an idea of how this can be done within such a framework.

  38. The assumption that verbs involve an event/situation argument goes back at least to Davidson (1967). See Davidson (1967) and references therein for a history of events/situations in linguistic theory.

  39. An inductive type is specified by a number of constructors whose types must be strictly positive [see, for example, Chapter 9 of Luo (1994) for formal details]. \(Time\) as an inductive type may have other constructors but we only detail \(date\) here.

  40. Note that, in detail, the range of days depends on the year and month. This can be represented by means of dependent types: the type \(Day(y,m)\) depends on \(y:Year\) and \(m:Month\). For example, because there are only 28 days in February 1970, \(Day(1970,Feb) = \{1, 2, \ldots , 28\}\), the enumeration type consisting of 1, 2, ..., 28 only. Formally, \(DATE\) can be defined as \(\varSigma y:Year. \varSigma m:Month.\ Day(y,m)\).

  41. Another approach to dealing with such adjectives is to follow Partee (2007) and assume that former behaves similarly to privative adjectives like fake or imaginary. If so, one may follow the proposed MTT-interpretation by the authors to use the disjoint union type to interpret former. See Chatzikyriakidis and Luo (2013) for details.

  42. For the benefit of readers who are unfamiliar with MTTs, we abuse the notation here, using \(\lnot A\) to stand for \(A\rightarrow \emptyset \), \(\wedge \) for \(\times \) and \(\exists \) for \(\varSigma \). One may ignore these formal details.

  43. In Coq this is translated as \((Time\rightarrow \textsc {cn})\rightarrow Prop\) given that definitions always end in Prop.

  44. \(Vec(A,n)\) can be seen as a collection of elements of type \(A\) with an explicit \(nat\) argument counting the elements.

  45. Note that reciprocal predicates can be seen as cases after the functor each_other has been applied. In a sense, the semantics of reciprocals are similar to regular transitive predicates after each_other has been applied. See the following discussion on each_other.

  46. On the assumption that meet and respectively also involve extra information, in the same vein as each other.

  47. Also, this does not mean that the Vector-treatments are superior as compared to some existing constructive semantic accounts of these quantifiers [see, for example, Sundholm (1989)].

  48. There are a number of details as to how the regular entry for something like walk (\(Animal \rightarrow Prop\)) and its plural version (\(\varPi n:nat, vector Animal n \rightarrow Prop\)) are related but this is something that we cannot discuss here. See however the discussion in Chatzikyriakidis and Luo (2012).

  49. However, see Sect. 4.9.1 for a first way in which this can be used to deal with inferences involving this kind of quantifier.

  50. For example we have skipped examples 3.68 and 3.69 in the FraCas test suite given the similarity with 3.67.

  51. It can be defined compositionally as a \(\varSigma \) type.

  52. This is in fact the only case of those tested where the prover finds a proof where it should not have.

  53. For example, the Montagovian meaning postulates for the different kinds of adjectives have to be defined as axioms [see e.g. Pulman (2013)]. In our case, and at least for intersective and subsective adjectives, their inferential properties are derived via typing only [see Chatzikyriakidis and Luo (2013) for more information].

  54. Even though we do not use \(\varSigma \) types to represent existential quantification.

  55. E.g. the treatment of anaphora, which is lacking in our account, or the treatment of collective predication and temporal reference, which is lacking in deep approaches like Bos and Markert (2005, 2006) and Pulman (2013).

  56. The account proposed in MacCartney (2009), as already mentioned, is a kind of hybrid approach with both a shallow and a deep component. It is out of the scope of this paper to look at the state-of-the-art shallow approaches to NLI. However, the interested reader is directed to MacCartney (2009) and references therein for more information on this type of approach.

  57. This is one of the core ideas of GF parsers, i.e. defining one abstract syntax that corresponds to multiple concrete ones.

  58. Dealing with ellipsis successfully is of course largely dependent on the adequacy of the parser: if the parser succeeds in parsing elliptical constructions, it will then linearize these structures into the Coq language, where the elided information will be present. From this stage on, inferences are easy to prove. However, this issue is left for future work.

  59. Jauto is part of the LibTactics library, containing extra tactics beyond the standard ones.
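
Several of the notes above describe Coq encodings that can be illustrated briefly. The following is only a minimal sketch under stated assumptions, not the authors' code: the identifiers Man, Human, mh, walk, some, height, shorter_than, Year, Month, days_in, Day and DATE are all chosen here for illustration. It shows CN approximated by Set (note 17), a subtyping relation declared as a coercion and used to prove an inference with some (note 10), heights as natural numbers (notes 34 and 36), and dates as a dependent \(\varSigma \)-type (note 40).

```coq
(* Note 17: the universe of common nouns, approximated by Set. *)
Definition CN := Set.
Parameters Man Human : CN.

(* Note 10: the subtyping Man < Human, declared as a coercion. *)
Parameter mh : Man -> Human.
Coercion mh : Man >-> Human.

Parameter walk : Human -> Prop.
Definition some (A : CN) (P : A -> Prop) : Prop := exists x : A, P x.

(* "Some man walks" entails "some human walks". *)
Lemma some_man_some_human :
  some Man (fun x : Man => walk x) -> some Human (fun x : Human => walk x).
Proof.
  unfold some. intros [x Hx].
  exists x.   (* x : Man is accepted as a Human via the coercion mh *)
  exact Hx.
Qed.

(* Notes 34 and 36: heights as natural numbers (170 standing for 1.70),
   with the comparative defined from the ordering on heights. *)
Parameter height : Human -> nat.
Definition shorter_than (x y : Human) : Prop := height x < height y.

Lemma shorter_iff_height :
  forall x y : Human, shorter_than x y <-> height x < height y.
Proof. intros x y. unfold shorter_than. tauto. Qed.

(* Note 40: the range of days depends on year and month. *)
Definition Year := nat.
Parameter Month : Set.
Parameter days_in : Year -> Month -> nat.
Definition Day (y : Year) (m : Month) := { d : nat | 1 <= d <= days_in y m }.
Definition DATE := { y : Year & { m : Month & Day y m } }.
```

Here shorter_than is given as a definition, so the bi-implication of note 36 holds trivially; stating it as an axiom over an undefined predicate would be an equally reasonable choice.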


  • Asher, N. (2011). Lexical meaning in context: A web of words. Cambridge: Cambridge University Press.

  • Asher, N., & Luo, Z. (2012). Formalisation of coercions in lexical semantics. Sinn und Bedeutung 17, Paris 223.

  • Bassac, C., Mery, M., & Retoré, C. (2010). Towards a type-theoretical account of lexical semantics. Journal of Logic Language and Information, 19, 229–245.

  • Blackburn, P., & Bos, J. (2005). Representation and inference for natural language. Stanford: CSLI Publications.

  • Blazy, S., Dargaye, Z., & Leroy, X. (2006). Formal verification of a C compiler front-end. In FM 2006: International symposium on formal methods. Lecture Notes in Computer Science (Vol. 4085, pp. 460–475). Berlin: Springer.

  • Bos, J., & Markert, K. (2005). Recognising textual entailment with logical inference. In Proceedings of the 2005 conference on empirical methods in natural language processing (EMNLP) (pp. 98–103).

  • Bos, J., & Markert, K. (2006). When logical inference helps determining textual entailment (and when it doesn’t). In Proceedings of the 2nd PASCAL challenges workshop on recognising textual entailment.

  • Callaghan, P., & Luo, Z. (2001). An implementation of LF with coercive subtyping and universes. Journal of Automated Reasoning, 27(1), 3–27.

  • Chatzikyriakidis, S. (2014). Adverbs in a modern type theory. In N. Asher & S. Soloviev (Eds.), Proceedings of LACL2014. LNCS 8535 (pp. 44–56).

  • Chatzikyriakidis, S., & Luo, Z. (2012). An account of natural language coordination in type theory with coercive subtyping. In Y. Parmentier & D. Duchier (Eds.), Proceedings of constraint solving and language processing (CSLP12) (pp. 31–51). LNCS 8114, Orleans.

  • Chatzikyriakidis, S., & Luo, Z. (2013). Adjectives in a modern type-theoretical setting. In G. Morrill & J. Nederhof (Eds.), Proceedings of formal grammar 2013 (pp. 159–174). LNCS 8036.

  • Church, A. (1940). A formulation of the simple theory of types. The Journal of Symbolic Logic, 5(1), 56–68.

  • Cooper, R., Crouch, D., van Eijck, J., Fox, C., van Genabith, J., Jaspars, J., Kamp, H., Milward, D., Pinkal, M., Poesio, M., & Pulman, S. (1996). Using the framework. Technical Report LRE 62–051r.

  • Dagan, I., Glickman, D., & Magnini, B. (2006). The PASCAL recognising textual entailment challenge. In J. Quiñonero-Candela, I. Dagan, B. Magnini & F. d’Alché-Buc (Eds.), Machine learning challenges (pp. 177–190). LNCS 3944.

  • Davidson, D. (1967). The logical form of action sentences. In N. Rescher (Ed.), The logic of decision and action (pp. 81–95). Pittsburgh: University of Pittsburgh Press.

  • Fox, C., & Lappin, S. (1990). Foundations of intensional semantics. Oxford: Oxford University Press.

  • Girard, J. Y. (1971). Une extension de l’interprétation fonctionelle de Gödel à l’analyse et son application à l’élimination des coupures dans l’analyse et la théorie des types. In Proceedings of the 2nd Scandinavian logic symposium, North-Holland.

  • Goguen, H. (1994). A typed operational semantics for type theory. Ph.D. thesis, University of Edinburgh.

  • Gonthier, G. (2005). A computer-checked proof of the four colour theorem.

  • Grudzinska, J., & Zawadowski, M. (2014). System with generalized quantifiers on dependent types for anaphora. In Proceedings of EACL 2014.

  • Kamp, H. (1975). Two theories about adjectives. In E. Keenan (Ed.), Formal semantics of natural language (pp. 123–155). Cambridge: Cambridge University Press.

  • Lappin, S. (To appear). Curry typing, polymorphism, and fine-grained intensionality. In S. Lappin & C. Fox (Eds.), Handbook of contemporary semantic theory. Oxford: Blackwell.

  • Ljunglöf, P., & Siverbo, M. (2011). A bilingual treebank for the FraCas test suite. CLT project report, University of Gothenburg.

  • Luo, Z. (1994). Computation and reasoning: A type theory for computer science. Oxford: Oxford University Press.

  • Luo, Z. (1999). Coercive subtyping. Journal of Logic and Computation, 9(1), 105–130.

  • Luo, Z. (2010). Type-theoretical semantics with coercive subtyping. Semantics and Linguistic Theory 20 (SALT20), Vancouver, 84(2), 28–56.

  • Luo, Z. (2011). Contextual analysis of word meanings in type-theoretical semantics. In Logical aspects of computational linguistics (LACL’2011) (pp. 159–174). LNAI 6736.

  • Luo, Z. (2012). Common nouns as types. In D. Bechet & A. Dikovsky (Eds.), Logical aspects of computational linguistics (LACL’2012) (pp. 173–185). LNCS 7351.

  • Luo, Z. (2012). Formal semantics in modern type theories with coercive subtyping. Linguistics and Philosophy, 35(6), 491–513.

  • Luo, Z., Soloviev, S., & Xue, T. (2012). Coercive subtyping: Theory and implementation. Information and Computation, 223, 18–42.

  • MacCartney, B. (2009). Natural language inference. Ph.D. thesis, Stanford University.

  • Maienborn, C., & Schafer, M. (2011). Adverbs and adverbials. In C. Maienborn, K. von Heusinger, & P. Portner (Eds.), Semantics: An international handbook of natural language meaning (pp. 1390–1420). Mouton: De Gruyter.

  • Martin-Löf, P. (1975). An intuitionistic theory of types: predicative part. In H. Rose & J. C. Shepherdson (Eds.), Logic Colloquium’73.

  • Martin-Löf, P. (1984). Intuitionistic type theory. Naples: Bibliopolis.

  • Montague, R. (1973). The proper treatment of quantification in ordinary English. In J. Hintikka, J. Moravcsik & P. Suppes (Eds.), Approaches to natural languages (pp. 221–242).

  • Montague, R. (1974). Formal philosophy. New Haven: Yale University Press.

  • Muskens, R. (2005). Sense and the computation of reference. Linguistics and Philosophy, 28(4), 473–504.

  • Partee, B. (2007). Compositionality and coercion in semantics: The semantics of adjective meaning. In G. Bouma, I. Krämer & J. Zwarts (Eds.), Cognitive foundations of interpretation (pp. 145–161). Royal Netherlands Academy of Arts and Sciences.

  • Partee, B. (2010). Privative adjectives: Subsective plus coercion. In R. Bauerle & U. Reyle (Eds.), Presuppositions and discourse: Essays offered to Hans Kamp (pp. 123–155). Bingley: Emerald Group Publishing.

  • Pulman, S. (2013). Second order inference in NL semantics. Talk given at the KCL Language and Cognition seminar, London.

  • Pustejovsky, J. (1995). The generative lexicon. Cambridge: MIT.

  • Ranta, A. (1994). Type-theoretical grammar. Oxford: Oxford University Press.

  • Ranta, A. (2011). Grammatical framework: Programming with multilingual grammar. Stanford: CSLI Publications.

  • Sundholm, G. (1986). Proof theory and meaning. In D. Gabbay & F. Guenthner (Eds.), Handbook of philosophical logic III: Alternatives to classical logic (pp. 471–506). Reidel.

  • Sundholm, G. (1989). Constructive generalized quantifiers. Synthese, 79(1), 1–12.

  • The Coq Development Team. (2007). The Coq Proof Assistant Reference Manual (Version 8.1), INRIA.

  • Wilson, S., Fleuriot, A., & Smaill, A. (2010). Inductive proof automation for Coq. In Proceedings of the 2nd Coq Workshop. EPTCS.

  • Xue, T., & Luo, Z. (2012). Dot-types and their implementation. In Logical aspects of computational linguistics (LACL 2012). LNCS 7351.


This work is supported by the Grant F/07-537/AJ of the Leverhulme Trust in the U.K. We also thank two anonymous reviewers for providing detailed and insightful comments and suggestions on an earlier draft of this paper.

Author information

Corresponding author

Correspondence to Stergios Chatzikyriakidis.

Appendix: Coq Code of Examples

All of the examples in this paper have been tried in the Coq proof assistant. The source code can be obtained by sending an email request to the first author. Here, we shall give an example with Coq tactics (Appendix “A More Advanced Example: Proving Peirce’s Law”) and some examples in linguistic semantics (Appendix “An Example from the FraCas Test Suite”).

1.1 A More Advanced Example: Proving Peirce’s Law

We want to prove that if the law of the excluded middle holds then so does Peirce’s law.

figure cp

We unfold the definitions, apply intros and elim H:

figure cq

Then, intro, assumption and intro again:

figure cr

We use apply H0 and now we have to prove \(A\rightarrow B\). We apply intro:

figure cs

We use absurd A and now we need to prove \(A\) and \(\lnot A\), which can be done via two applications of assumption.
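
The interactive steps described in this walkthrough can be replayed with a script along the following lines. This is a minimal sketch rather than the original listing: the names lem, peirce and lem_to_peirce and the hypothesis names are assumptions, and the excluded-middle hypothesis is instantiated explicitly with elim (H A) instead of the bare elim H mentioned in the text.

```coq
(* Assumed formulations of the two laws; the names are illustrative. *)
Definition lem := forall A : Prop, A \/ ~ A.
Definition peirce := forall A B : Prop, ((A -> B) -> A) -> A.

Theorem lem_to_peirce : lem -> peirce.
Proof.
  unfold lem, peirce.
  intros H A B H0.          (* H : excluded middle, H0 : (A -> B) -> A *)
  elim (H A).               (* case analysis on A \/ ~ A *)
  - intro HA. assumption.   (* first case: A holds *)
  - intro HnA.              (* second case: ~ A holds *)
    apply H0.               (* it remains to prove A -> B *)
    intro HA.
    absurd A; assumption.   (* both A and ~ A are among the assumptions *)
Qed.
```

Naming the hypotheses explicitly keeps the script independent of the names Coq would generate automatically.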

The above can be proved automatically as well, using automated user-defined tactics. For this case, we can define a tactic which unfolds all the definitions and then applies tauto, a decision procedure for intuitionistic propositional tautologies:

figure ct

This suffices to prove our example automatically.
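
A tactic of this kind might look as follows. This is a hedged sketch, not the tactic from the original listing: the names lem_at, peirce_at and unfold_tauto are hypothetical, and the two laws are stated for fixed propositions A and B (as section variables) so that the goal is purely propositional, which is the fragment tauto decides.

```coq
Section PeirceAuto.
  (* Assumption: both laws are stated for two fixed propositions. *)
  Variables A B : Prop.

  Definition lem_at := A \/ ~ A.
  Definition peirce_at := ((A -> B) -> A) -> A.

  (* Unfold the definitions, then hand the goal to tauto. *)
  Ltac unfold_tauto := unfold lem_at, peirce_at; tauto.

  Lemma lem_at_to_peirce_at : lem_at -> peirce_at.
  Proof. unfold_tauto. Qed.
End PeirceAuto.
```

If excluded middle is instead stated with a universal quantifier over propositions, tauto alone is not expected to instantiate it, and a first-order tactic or an explicit instantiation would be needed.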

1.2 An Example from the FraCas Test Suite

FraCas example 3.55

figure cu
figure cv

We unfold the definitions for a and move the premise to the assumptions via intro and we apply the elimination tactic elim:

figure cw

We apply intros:

figure cx

With \(x :Irishdelegate\) as an assumption, we can now substitute \(x_0\) in the conclusion with \(x\) thanks to the subtyping mechanism:

figure cy

We apply assumption and the proof is over. The above can be proved using automated tactics as well. For the purposes of this paper the following tactic has been defined:

figure cz

The above unfolds all definitions and then tries all intuitionistic first-order tautologies (intuition). Next, congruence deals with any equalities (there are none in the example in question). Then jauto is applied, which is basically Coq’s predefined auto tactic along with some pre-processing of the goal. Finally, intuition is applied again. This automated tactic can prove what we want (and much more).
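
The whole example can be sketched in a self-contained form as follows. This is a reconstruction under stated assumptions, not the original listings. It takes the inference to be from “Some Irish delegates finished the survey on time” to “Some delegates finished the survey on time”. Every identifier (dh, Irish, del, the_survey, finish, on_time, some, auto_sketch) is chosen here for illustration, with some playing the role of the determiner that the text unfolds. Because jauto comes from the external LibTactics library, the automated tactic replaces it with an explicit destruction of existential hypotheses followed by the standard eauto.

```coq
Definition CN := Set.
Parameters Human Delegate Survey : CN.

(* Subtyping Delegate < Human, declared as a coercion. *)
Parameter dh : Delegate -> Human.
Coercion dh : Delegate >-> Human.

(* "Irish delegate" as a Sigma-like record whose first projection is
   declared as a coercion for this instance (see note 33). *)
Parameter Irish : Delegate -> Prop.
Record Irishdelegate : CN :=
  mkIrishdelegate { del :> Delegate; irish_del : Irish del }.

Parameter the_survey : Survey.
Parameter finish : Survey -> Human -> Prop.
(* A VP adverb, polymorphic over common nouns. *)
Parameter on_time : forall A : CN, (A -> Prop) -> (A -> Prop).

Definition some (A : CN) (P : A -> Prop) : Prop := exists x : A, P x.

(* Interactive proof following the steps of the walkthrough. *)
Theorem irish_delegates_manual :
  some Irishdelegate
       (fun x : Irishdelegate => on_time Human (finish the_survey) x) ->
  some Delegate
       (fun x : Delegate => on_time Human (finish the_survey) x).
Proof.
  unfold some. intro H. elim H. intros x Hx.
  exists x.     (* x : Irishdelegate is coerced to Delegate *)
  assumption.
Qed.

(* Stand-in for the automated tactic: jauto is replaced by destructing
   existential hypotheses and calling eauto. *)
Ltac auto_sketch :=
  unfold some in *; intuition; try congruence;
  repeat match goal with H : ex _ |- _ => destruct H end;
  eauto; intuition.

Theorem irish_delegates_auto :
  some Irishdelegate
       (fun x : Irishdelegate => on_time Human (finish the_survey) x) ->
  some Delegate
       (fun x : Delegate => on_time Human (finish the_survey) x).
Proof. auto_sketch. Qed.
```

The predicates are written as explicit lambdas so that the coercions are inserted at ordinary argument positions; the more compact form without the lambdas relies on Coq coercing between function types.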

Cite this article

Chatzikyriakidis, S., Luo, Z. Natural Language Inference in Coq. J of Log Lang and Inf 23, 441–480 (2014).
