
Parsing as semantically guided constraint solving: the role of ontologies

Annals of Mathematics and Artificial Intelligence

Abstract

In the Parsing-as-Constraint-Solving model of language processing, grammar syntax is described modularly through independent constraints among the direct constituents of a phrase, such as "in verb phrases, a verb must precede its complements" or "in noun phrases, a noun requires a determiner". Parsing reduces to verifying the constraints relevant to an input phrase, but instead of the typical hierarchical (i.e., parse tree) representation of a successful parse (and the equally typical complete silence upon unsuccessful parses), the main result is a list of satisfied constraints and, if the input is not totally conforming, also a list of unsatisfied constraints. The latter can serve various purposes beyond plain parsing, such as guiding the correction of any imperfections found in the input, and we can still construct a parse tree if needed, as a side effect. While almost purely syntax-based, the Parsing-as-Constraint-Solving model lends itself well to accommodating interactions with other levels of analysis. These, however, have been little explored. In this position paper we discuss how to extend this model to incorporate semantic information, in particular from ontologies, with particular guidance from unsatisfied constraints. This departs from more typical constraint-solving schemes, where failed constraints are simply listed and do not actively contribute to the parse. By giving failed constraints a more active role, we can arrive at more precise analyses and at more appropriate corrections of flawed input. Because even not totally conforming sentences can be parsed more precisely, we gain in expressivity with respect to both the classical, strictly stratified approach to NLP and the less precise and less reliable statistically based methods.


References

  1. Adebara, I.: Using womb grammars for inducing the grammar of a subset of Yoruba noun phrases. TRIANGLE, vol. 14 (2018). ISSN 2013-939X

  2. Adebara, I., Dahl, V.: Grammar induction as automated transformation between constraint solving models of language. In: Proceedings of the Workshop on Knowledge-Based Techniques for Problem Solving and Reasoning Co-located with 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), New York City, USA, July 10, 2016 (2016). http://ceur-ws.org/Vol-1648/paper6.pdf

  3. Adebara, I., Dahl, V., Tessaris, S.: Completing mixed language grammars through womb grammars plus ontologies. In: Christiansen, H., Jiménez-López, M.D., Loukanova, R. (eds.) Proceedings of the International Workshop on Partiality, Underspecification and Natural Language Processing, pp. 32–40 (2015)

  4. Becerra, L., Dahl, V., Miralles, E.: On second language tutoring through womb grammars. In: IWANN 2013, June 12–14, Tenerife, Spain (2013)

  5. Becerra, L., Dahl, V., Jiménez-López, M.D.: Womb grammars as a bio-inspired model for grammar induction. In: Trends in Practical Applications of Heterogeneous Multi-agent Systems. The PAAMS Collection, pp. 79–86. Springer International Publishing (2014)

  6. Blache, P.: Property grammars: a fully constraint-based theory. In: Proceedings of the First International Conference on Constraint Solving and Language Processing, CSLP’04. https://doi.org/10.1007/11424574_1, pp 1–16. Springer, Berlin (2005)

  7. Blache, P., Morawietz, F.: A non-generative constraint-based formalism. Travaux Interdisciplinaires du Laboratoire Parole et Langage d’Aix-en-Provence (TIPA) 19, 11–26 (2000). https://hal.archives-ouvertes.fr/hal-00283724

  8. Christiansen, H.: CHR grammars. TPLP 5(4–5), 467–501 (2005)


  9. Clark, M., Kim, Y., Kruschwitz, U., Song, D., Albakour, D., Dignum, S., Beresi, U.C., Fasli, M., De Roeck, A.: Automatically structuring domain knowledge from text: an overview of current research. Inf. Process. Manag. 48(3), 552–568 (2012)


  10. Dahl, V., Blache, P.: Directly executable constraint based grammars. In: Proceedings of Journees Francophones de Programmation en Logique avec Contraintes (2004)

  11. Dahl, V., Gu, B.: A CHRG analysis of ambiguity in biological texts. In: Proceedings of Fourth International Workshop on Constraints and Language Processing (CSLP) (2007)

  12. Dahl, V., Miralles, J.E.: Womb grammars: constraint solving for grammar induction. In: Sneyers, J., Frühwirth, T. (eds.) Proceedings of the 9th Workshop on Constraint Handling Rules, vol. Technical Report CW 624, pp 32–40. Department of Computer Science, K.U. Leuven (2012)

  13. Dahl, V., Miralles, E., Becerra, L.: On language acquisition through womb grammars. In: 7th International Workshop on Constraint Solving and Language Processing, pp 99–105 (2012)

  14. Dahl, V., Egilmez, S., Martins, J., Miralles, J.E.: On failure-driven constraint-based parsing through CHRG. In: CHR 2013, Proceedings of the 10th International Workshop on Constraint Handling Rules, p. 13 (2013)

  15. Dahl, V., Gu, B., Miralles, E.: Semantic properties in constraint-based grammars. In: Blache, P., Christiansen, H., Dahl, V., Duchier, D., Villadsen, J.E. (eds.). Constraints and Language (2014)

  16. Duchier, D., Dao, T.B.H., Parmentier, Y.: Model-Theory and Implementation of Property Grammars with Features. Journal of Logic and Computation, Oxford University Press (OUP) 24(2), 491–509 (2014)


  17. Foth, K., Daum, M., Menzel, W.: Parsing unrestricted German text with defeasible constraints. In: Christiansen, H., Skadhauge, P., Villadsen, J. (eds.) Constraint Solving and Language Processing, Lecture Notes in Computer Science, vol. 3438. https://doi.org/10.1007/11424574_9, pp 140–157. Springer, Berlin (2005)

  18. Frühwirth, T.W.: Theory and practice of constraint handling rules. J. Log. Program. 37(1–3), 95–138 (1998)


  19. Gottlob, G.: Computer science as the continuation of logic by other means. Keynote at the European Computer Science Summit 2009, Paris (2009). http://www.informatics-europe.org/ecss/ecss-2009/conference-program.html

  20. Gupta, A., Oates, T.: Using ontologies and the web to learn lexical semantics. In: Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI’07, pp 1618–1623. Morgan Kaufmann Publishers Inc., San Francisco (2007). http://dl.acm.org/citation.cfm?id=1625275.1625537

  21. Ritchie, G.: The Linguistic Analysis of Jokes. Routledge, Evanston (2004)


  22. Snow, R., Jurafsky, D., Ng, A.Y.: Semantic taxonomy induction from heterogenous evidence. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, ACL-44. https://doi.org/10.3115/1220175.1220276, pp 801–808. Association for Computational Linguistics, Stroudsburg (2006)

  23. Turner, N.: Ancient Pathways, Ancestral Knowledge: Ethnobotany and Ecological Wisdom of Indigenous Peoples of Northwestern North America. McGill-Queen’s University Press, Montreal (2014)


  24. van Rullen, T.: Vers une analyse syntaxique à granularité variable. Ph.D. thesis, Université de Provence (2005)

  25. Vossen, P.: EuroWordNet: a multilingual database of autonomous and language-specific wordnets connected via an inter-lingual index. Int. J. Lexicogr. 17(2), 161–173 (2004). https://doi.org/10.1093/ijl/17.2.161



Acknowledgments

Veronica Dahl is thankful for the support provided for this work by NSERC’s Discovery grant 31611024. Mariano De Sousa Bispo wishes to thank the government of Canada, and in particular Foreign Affairs, Trade and Development Canada, for the support that made this work possible, in the form of his scholarship within the Emerging Leaders in the Americas Program. Thanks are also due to Saskia Wolsak for pointing us to Turner’s work on ethnobotany, and to the anonymous referees for their very useful comments on a first draft of this paper.

Author information


Corresponding author

Correspondence to Sergio Tessaris.

Appendices

Appendix A: Some parsing details

We next show, through the examples of uniqueness and obligation, that checking for violations of different constraints may need to take place at different stages of the analysis.

Uniqueness

Violations of uniqueness constraints are checked by a CHRG rule that finds two words of the same category C within the bounds of a phrase of category Cat being parsed, where C has been constrained to appear only once within that phrase. Should such words be found, the information that uniqueness of category C under Cat has been violated in the range these categories cover is added as a CHR constraint (representing a fact), and appropriate remedial actions can then be taken, such as deleting one occurrence of the repeated category (if both are the same, as in “the the book”), or keeping both if they differ, so that further considerations can be made before choosing one over the other. In slightly simplified form, our CHRG rule that checks for the violation of uniqueness (Cat : C!) looks as follows:

figure a

The first line in the above code finds a category C between word boundaries N1 and N2, with attributes Attr1 and parse tree Tree1. The three dots indicate a skipped substring after N2, before another instance of the same category C is found between the word boundaries N3 and N4. The Prolog calls (between curly brackets) and the guard find a category Cat that dominates both instances of C, and a uniqueness property that is required between a phrase Cat and its immediate daughter C (i.e., a requirement that C appear no more than once as an immediate daughter of a phrase of category Cat). Once all this is checked, a grammar symbol (failed/1) is added to the constraint store, stating that uniqueness of C within Cat is falsified between word boundaries N1 and N4 (since grammar rules are compiled into CHR rules, what will actually appear in the constraint store is the equivalent CHR constraint (failed/3), namely failed(N1,N4,uniqueness(Cat,Attr,Tree,C))).

This rule can fire even if a phrasal category hasn’t been fully expanded, because adding more components to an iCat cannot undo a uniqueness violation.
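The logic of this check can be sketched in Python (a minimal illustration with hypothetical data structures, not the CHRG implementation): each parsed constituent is a (category, start boundary, end boundary) triple, and a violation record is emitted whenever a category constrained to uniqueness occurs more than once.

```python
def uniqueness_violations(constituents, phrase_cat, unique_cats):
    """Return failed-uniqueness records for categories constrained to
    appear at most once as a direct daughter of phrase_cat.

    Illustrative sketch only; the paper's actual check is a CHRG rule."""
    failed = []
    for cat in unique_cats:
        spans = [(n1, n2) for c, n1, n2 in constituents if c == cat]
        if len(spans) > 1:
            # The violation covers from the start of the first occurrence
            # to the end of the last one (N1..N4 in the CHRG rule).
            failed.append(("failed", "uniqueness", phrase_cat, cat,
                           spans[0][0], spans[-1][1]))
    return failed

# "the the book": two determiners inside a noun phrase
np = [("det", 0, 1), ("det", 1, 2), ("noun", 2, 3)]
print(uniqueness_violations(np, "np", ["det", "noun"]))
```

As in the CHRG rule, the check can run before the phrase is fully expanded, since later additions cannot undo a repetition already found.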

Obligation

A constraint such as obligation, in contrast, is designed to fire only after the phrasal category has been fully expanded, since adding one more component to the phrase can result in the constraint becoming satisfied (namely, when the component added is the obligatory one). We now show the CHRG rule that checks for failed obligation:

figure b

The obligation rule checks a given category for its direct daughters. Since iCats are retracted when they fail, this check only fires after the iCat has been fully expanded.
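The contrast with uniqueness can be sketched the same way (again an illustrative Python model, not the CHRG code): the obligation check refuses to report anything until the phrase is known to be fully expanded.

```python
def obligation_violations(daughters, phrase_cat, obligatory_cats, fully_expanded):
    """Report a missing obligatory daughter only once no further
    components can be added to the phrase.

    Illustrative sketch; in CHRG this staging falls out of iCat retraction."""
    if not fully_expanded:
        return []  # one more component might still satisfy the constraint
    present = {cat for cat, *_ in daughters}
    return [("failed", "obligation", phrase_cat, cat)
            for cat in obligatory_cats if cat not in present]

np = [("det", 0, 1), ("adj", 1, 2)]  # "the blue" -- no noun (yet)
print(obligation_violations(np, "np", ["noun"], fully_expanded=False))
print(obligation_violations(np, "np", ["noun"], fully_expanded=True))
```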

Appendix B: Implementing error correction through hypernym requirements

In this brief example concerning the medical domain, we show how hypernym requirements enable the system to detect errors, and to correct them through ontologies. Suppose that instead of the correct phrase “salmonella causes typhoid fever”, a speech recognition system hears “salmon causes typhoid fever”, a plausible scenario in the presence of noise.

We consider that causes(X:germ,Y:symptom) properly represents the verb and its typed parameters. Therefore, consistency can be checked between the subject’s type in the noun phrase and the X parameter’s type, and between the direct object’s type and the Y parameter’s. If the consistency check fails, an ontology must be called to resolve the issue. Although this example only corrects errors in the sentence’s subject, it can be extended to direct objects fairly easily.

The iCat relationship presented in this paper has been simplified in this example for clarity (we only show one argument of iCat: the one that contains the category’s parse tree).

There are two possible ways of preserving typing information: within the syntactic tree or in a separate system. The latter has been chosen, and we assume the typing information is in a predicate semanticType(Word,Type), which defines each word’s type, as in semanticType(salmonella,germ). A priori, we believe this approach scales better, since the typing component remains agnostic to the syntactic tree representation. Moreover, this approach can be refined to avoid unnecessary type calculations.
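The consistency check between the verb’s typed parameters and its arguments can be sketched as follows (illustrative Python: a dictionary stands in for the semanticType predicate, and the verb signature table is a hypothetical stand-in for the typed verb representation):

```python
# Stand-in for semanticType(Word, Type) facts.
SEMANTIC_TYPE = {
    "salmonella": "germ",
    "salmon": "food",
    "typhoid fever": "symptom",
}

# Stand-in for the typed verb representation causes(X:germ, Y:symptom).
VERB_SIGNATURE = {"causes": ("germ", "symptom")}

def type_mismatch(verb, subject, direct_object):
    """Return (role, word, expected_type) for the first mismatch, or None.

    A non-None result is what would trigger the ontology lookup."""
    expected_subj, expected_obj = VERB_SIGNATURE[verb]
    if SEMANTIC_TYPE.get(subject) != expected_subj:
        return ("subject", subject, expected_subj)
    if SEMANTIC_TYPE.get(direct_object) != expected_obj:
        return ("object", direct_object, expected_obj)
    return None

print(type_mismatch("causes", "salmon", "typhoid fever"))
```

With the misheard subject, the check reports a mismatch on the subject role; with “salmonella” it returns None and no ontology call is needed.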

The rule’s guard comprises four conditions to be satisfied before the rule applies. If these are satisfied, because this is a simplification rule, the iCat in question will be replaced by the new iCat that appears after the guard, thus correcting it. The rule knows that the noun phrase and verb phrase are consecutive because of the word boundaries in each (the noun phrase stretches between N1 and N2, and the verb phrase between N2 and N3).

The second rule below is similar, but covers the case in which there is no semantic mismatch. The first two tests performed in the guard are the same as in the first rule, so we indicate them by “...” rather than recopying them. We now use a propagation rule rather than a simplification one: since there is no semantic mismatch, the iCat in question does not need to be replaced, as it is already correct. The rule therefore expands the noun phrase and verb phrase into a sentence with parse tree s(iCat(n(Noun)), iCat(vp(VerbTree,DirectObjectTree))).

figure c

The rule that follows this one is identical except that after establishing that semantic types are as expected, it expands the existing verb phrase with no corrections needed:

figure d

Whenever there is an error, there are several heuristics the ontology can use to find the “closest” word; which one is appropriate depends on the input generation context. For example, if the input is being generated from scanned pages, Hamming distance could yield good results. Experimenting with different scenarios can help create a selection of heuristics appropriate for the domain we are working with.
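As one concrete instance of such a heuristic, the following Python sketch picks, among the ontology entries of the required type, the candidate closest to the misheard word by Levenshtein edit distance (Hamming distance, mentioned above, only applies to equal-length strings; the candidate list here is illustrative, not drawn from a real ontology):

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def closest_word(word, candidates):
    """Pick the candidate of the required type closest to the input word."""
    return min(candidates, key=lambda c: edit_distance(word, c))

germs = ["salmonella", "streptococcus", "listeria"]  # entries of type germ
print(closest_word("salmon", germs))
```

In the running example this recovers “salmonella” from “salmon”; a speech-oriented scenario might instead prefer a phonetic similarity measure.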

It is worth noting that if no type mismatch is generated, CHRG will connect the subtrees and continue its execution without consulting any ontology.

Appendix C: Implementation considerations for the incorporation of semantics

In terms of implementation, extending WGs to incorporate semantics as described here could be done by adding a sixth argument to our categories, in which to store the lambda expression that represents the partial semantics associated with that category:

figure e

Since we have defined each word’s meaning in a ternary predicate called “meaning”, in order to generate that sixth argument all we have to do is call meaning/3 for the category being expanded into a 6-ary iCat.

Next we need to combine the meanings at appropriate points in our parsing process. We postulate that the appropriate point is every time a phrase is completed (i.e., cannot be further expanded). For our example “no bird sings”, once “no bird” has been analyzed into a noun phrase, this noun phrase cannot be further expanded, since “sings” is not allowable as a noun phrase’s direct daughter. At this point the parser looks at the parse tree, at any failed properties associated with it, and consults any ontological information needed to take into account the failed properties, in order to construct the meaning representation of the noun phrase.
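The composition step can be illustrated in Python, with ordinary functions standing in for lambda expressions and a dictionary standing in for meaning/3 (the representation below is a hypothetical sketch for “no bird sings”, not the paper’s actual formalism):

```python
# Hypothetical word meanings: the determiner "no" is a function from a
# noun meaning to a function from a verb meaning to a logical form.
MEANING = {
    "no":    lambda noun: lambda verb: ("forall", "x",
                 ("implies", (noun, "x"), ("not", (verb, "x")))),
    "bird":  "bird",
    "sings": "sings",
}

def np_meaning(det, noun):
    # Combine once the noun phrase is complete (cannot be further expanded).
    return MEANING[det](MEANING[noun])

def s_meaning(np, verb):
    # Combine once the sentence is complete.
    return np(MEANING[verb])

print(s_meaning(np_meaning("no", "bird"), "sings"))
```

The key point mirrored here is the timing: each combination happens exactly when a phrase is completed, which is also the point at which failed properties and ontological information would be consulted.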

The modularity obtained by isolating meaning representations in a reserved extra argument of iCat makes it easy to change the meaning representation formalism if desired, for instance for linguistic experimentation purposes: all we have to do is redefine meaning/3.

Appendix D: WGs for grammar induction—the intuitive idea

Let LS be the source language, \(L^{S}_{syntax}\) its syntactic component, and \(L^{S}_{lex}\) its lexical component.

Let LT be the target language, of which we only have access to the lexicon (\(L^{T}_{lex}\)). We can feed a sufficiently representative corpus of sentences in LT that are known to be correct to a hybrid parser consisting of \(L^{S}_{syntax}\) and \(L^{T}_{lex}\). This will result in some of the sentences being marked as incorrect by the parser. An analysis of the constraints these “incorrect” sentences violate can subsequently reveal how to transform \(L^{S}_{syntax}\) so that it accepts as correct the sentences in the corpus of LT, i.e., how to transform it into \(L^{T}_{syntax}\). For instance, let us assume that LS = English, LT = French, and that English adjectives always precede the noun they modify, while in French they always follow it (an oversimplification, just for illustration purposes). Then “the blue book” is correct English, whereas in French we would more readily say “le livre bleu”.

If we plug the French lexicon and the English syntax constraints into our Womb Grammar parser, and run a representative corpus of (correct) French noun phrases by the resulting hybrid parser, the said precedence property will be declared unsatisfied when hitting phrases such as “le livre bleu”. The grammar repairing module of WG can then look at the entire list of unsatisfied constraints, and produce the missing syntactic component of LT’s parser by modifying the constraints in \(L^{S}_{syntax}\) so that none are violated by the corpus sentences.
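The repair step can be sketched as follows (a hypothetical Python encoding of precedence constraints, for illustration only): a precedence property that the corpus consistently contradicts is flipped, while attested properties are kept.

```python
def repair_precedence(constraints, violations, corpus_evidence):
    """Flip a precedence constraint when the corpus never attests it;
    keep it otherwise.

    constraints:     set of ("precedes", a, b) tuples (source syntax)
    violations:      (a, b) pairs reported unsatisfied on the corpus
    corpus_evidence: maps (a, b) to how often a preceded b in the corpus
    """
    repaired = set(constraints)
    for (a, b) in violations:
        if corpus_evidence.get((a, b), 0) == 0:  # a < b never attested
            repaired.discard(("precedes", a, b))
            repaired.add(("precedes", b, a))
    return repaired

english_np = {("precedes", "adj", "noun")}
# French corpus ("le livre bleu", ...): adjectives always follow the noun.
print(repair_precedence(english_np, [("adj", "noun")], {("adj", "noun"): 0}))
```

A real WG repair module would weigh the full list of unsatisfied constraints and could relax rather than flip a property when the corpus attests both orders; this sketch covers only the clear-cut case in the example.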


Cite this article

Dahl, V., Tessaris, S. & De Sousa Bispo, M. Parsing as semantically guided constraint solving: the role of ontologies. Ann Math Artif Intell 82, 161–185 (2018). https://doi.org/10.1007/s10472-018-9573-2
