Predicate Argument Structures for Information Extraction from Dependency Representations: Null Elements are Missing

  • Rodolfo Delmonte
Part of the Studies in Computational Intelligence book series (SCI, volume 515)


State-of-the-art parsers are currently trained on versions of the Penn Treebank converted into dependency representations, which however do not include null elements. This is done to facilitate structural learning and to prevent the probabilistic engine from postulating the existence of deprecated null elements everywhere (see [15]). However, the semantics of the representation used and produced at runtime is then inconsistent, which dramatically reduces its usefulness in real-life applications such as Information Extraction, Q/A and other semantically driven fields, because it hampers the mapping of a complete logical form. What systems have come up with instead are “Quasi”-logical forms, or partial logical forms, mapped directly from the surface representation in dependency structure. We show the most common problems deriving from the conversion and then describe an algorithm that we have implemented and applied to our converted Italian Treebank; it can be used on any CoNLL-style treebank or representation to produce an “almost complete”, semantically consistent dependency treebank.
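
The kind of gap described above can be illustrated with a minimal sketch. It is not the algorithm of the chapter: the five-column row layout, the Stanford-style labels `nsubj` and `xcomp`, the sample sentence, and the helper names `parse_rows` and `verbs_missing_subject` are all assumptions made for illustration. In "John wants to leave", the understood subject of "leave" would be a null element in the Penn Treebank; once it is dropped in conversion, the embedded predicate has no subject argument to map into a logical form.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based token position
    form: str     # surface word
    pos: str      # Penn-style part-of-speech tag
    head: int     # index of the governing token, 0 for the root
    deprel: str   # dependency label


# Hypothetical simplified CoNLL-style rows: ID FORM POS HEAD DEPREL
SAMPLE = """\
1 John NNP 2 nsubj
2 wants VBZ 0 root
3 to TO 4 aux
4 leave VB 2 xcomp
"""


def parse_rows(text):
    """Read whitespace-separated rows into Token records."""
    tokens = []
    for line in text.strip().splitlines():
        idx, form, pos, head, deprel = line.split()
        tokens.append(Token(int(idx), form, pos, int(head), deprel))
    return tokens


def verbs_missing_subject(tokens, subject_labels=("nsubj",)):
    """Return the forms of verbs that govern no subject dependent."""
    heads_with_subject = {t.head for t in tokens if t.deprel in subject_labels}
    return [
        t.form
        for t in tokens
        if t.pos.startswith("VB") and t.idx not in heads_with_subject
    ]


tokens = parse_rows(SAMPLE)
missing = verbs_missing_subject(tokens)
print(missing)  # ['leave']
```

A check of this sort only flags candidate positions; deciding which antecedent, if any, fills each position is the harder step and is not attempted here.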


Predicate argument structures; Dependency structures; Null elements; Logical form; Information extraction for question answering and text understanding



This work has been partially funded by the PARLI Project (Portale per l’Accesso alle Risorse Linguistiche per l’Italiano—MIUR—PRIN 2008).


  1. Bies, A., Ferguson, M., Katz, K., MacIntyre, R., Tredinnick, V., Kim, G., Marcinkiewicz, M.A., Schasberger, B.: Bracketing guidelines for Treebank II style Penn ~dm/07/autumn/795.10/ptb-annotation-guide/root.html (1995)
  2. Cahill, A., McCarthy, M., van Genabith, J., Way, A.: Automatic annotation of the Penn-Treebank with LFG f-structure information. In: LREC: Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Language Data. Las Palmas (2002)
  3. Cahill, A., McCarthy, M., van Genabith, J., Way, A.: Quasi-logical forms for the Penn Treebank. In: Bunt, H., van der Sluis, I., Morante, R. (eds.) Proceedings of the Fifth International Workshop on Computational Semantics, IWCS-05, pp. 55–71. Tilburg (2003)
  4. Cai, S., Chiang, D., Goldberg, Y.: Language-independent parsing with empty elements. In: Proceedings of the 49th Annual Meeting of the ACL, pp. 212–216 (2011)
  5. Campbell, R.: Using linguistic principles to recover empty categories. In: Proceedings of ACL (2004)
  6. Chung, T., Gildea, D.: Effects of empty categories on machine translation. In: Proceedings of EMNLP (2010)
  7. Choi, J.D., Palmer, M.: Robust constituent-to-dependency conversion for English. In: Proceedings of the 9th International Workshop on Treebanks and Linguistic Theories (TLT’9), pp. 55–66. Tartu (2010)
  8. Clark, S., Curran, J.R.: Comparing the accuracy of CCG and Penn Treebank parsers. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 53–56. Suntec, Singapore (2009)
  9. De Marneffe, M.C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC, pp. 449–454 (2006/5)
  10. Delmonte, R., Bristot, A., Tonelli, S.: VIT - Venice Italian Treebank: Syntactic and Quantitative Features. In: De Smedt, K., Hajic, J., Kübler, S. (eds.) Proceedings of Sixth International Workshop on TLT, vol. 1, pp. 43–54. Nealt Proceeding Series (2007)
  11. Delmonte, R., Bianchi, D.: Semantic web, RDFs and NLP for QA. In: Calzolari, N., Magnini, B. (eds.) Proceedings of the Workshop on “Topics and Perspectives of NLP in Italy”, Università di Pisa, AI*IA, pp. 67–75 (2003)
  12. Dienes, P., Dubey, A.: Antecedent recovery: experiments with a trace tagger. In: Proceedings of EMNLP (2003a)
  13. Dienes, P., Dubey, A.: Deep processing by combining shallow methods. In: Proceedings of ACL (2003b)
  14. Gabbard, R., Marcus, M., Kulick, S.: Fully parsing the Penn Treebank. In: Proceedings of the HLT Conference of the North American Chapter of the ACL, pp. 184–191 (2006)
  15. Gaizauskas, R.: Investigations into the Grammar Underlying the Penn Treebank II, Technical Report CS-95-25. University of Sheffield, Department of Computer Science (1995)
  16. Guo, Y., van Genabith, J., Wang, H.: Treebank-based acquisition of LFG resources for Chinese. In: Lexical Functional Grammar, pp. 28–30. California (2007)
  17. Johnson, M.: A simple pattern-matching algorithm for recovering empty nodes and their antecedents. In: Proceedings of the 39th Annual Meeting of the ACL, pp. 136–143. Toulouse, France (2001)
  18. Johansson, R., Nugues, P.: Extended constituent-to-dependency conversion for English. In: Proceedings of NODALIDA 2007, Tartu (2007)
  19. Katz, B.: Annotating the World Wide Web using natural language. In: RIAO ’97 (1997)
  20. Liakata, M., Pulman, S.: From Trees to Predicate-Argument Structures. In: Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), pp. 563–569. Taipei (2002)
  21. Litkowski, K.C.: Syntactic clues and Lexical resources in question-answering. In: Voorhees, E.M., Harman, D.K. (eds.) The Ninth Text Retrieval Conference (TREC-9). NIST Special Publication 500–249, Gaithersburg, pp. 157–166 (2001)
  22. Marcus, M., Kim, G., Marcinkiewicz, M.A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: annotating predicate argument structure. In: ARPA Human Language Technology Workshop, pp. 114–119 (1994)
  23. Sagae, K., Tsujii, J.: Shift-reduce dependency DAG parsing. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester (2008)
  24. Schmid, H.: Trace prediction and recovery with unlexicalized PCFGs and slash features. In: Proceedings of COLING-ACL (2006)
  25. Tonelli, S., Delmonte, R., Bristot, A.: Enriching the Venice Italian Treebank with dependency and grammatical relations. In: LREC 2008 (2008)
  26. Xue, N., Xia, F., Chiou, F.-D., Palmer, M.: The Penn Chinese TreeBank: phrase structure annotation of a large corpus. Nat. Lang. Eng. 11(2), 207–238 (2005)
  27. Yang, Y., Xue, N.: Chasing the ghost: recovering empty categories in the Chinese Treebank. In: Proceedings of COLING (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Department of Linguistic Studies and Comparative Cultures and Department of Computer Science, Ca’ Foscari University, Venice, Italy
