DILUCT: An Open-Source Spanish Dependency Parser Based on Rules, Heuristics, and Selectional Preferences

  • Hiram Calvo
  • Alexander Gelbukh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3999)

Abstract

A method for recognizing syntactic patterns in Spanish is presented. The method is based on dependency parsing: heuristic rules infer dependency relations between words, and word co-occurrence statistics, learnt in an unsupervised manner, resolve ambiguities such as prepositional phrase attachment. If a complete parse cannot be produced, a partial structure is built with some (if not all) dependency relations identified. Evaluation shows that, in spite of its simplicity, the parser's accuracy is superior to that of the existing parsers available for Spanish. Though certain grammar rules, as well as the lexical resources used, are specific to Spanish, the suggested approach is language-independent.
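
The role of co-occurrence statistics in prepositional phrase attachment can be illustrated with a minimal sketch. It is not the parser's actual implementation: the counts, the function name attach_pp, and the rule for ties are all hypothetical, and the comparison of raw counts stands in for whatever scoring the parser really uses. The sketch only shows the general idea of choosing the head (verb or noun) that co-occurs more often with the prepositional phrase.

```python
from collections import Counter

# Hypothetical toy counts of (head, preposition, noun) triples.
# A real system would collect these from a large unannotated corpus.
COUNTS = Counter({
    ("comer", "con", "tenedor"): 12,  # "to eat with a fork"
    ("pasta", "con", "tenedor"): 1,
    ("comer", "con", "salsa"): 2,
    ("pasta", "con", "salsa"): 15,    # "pasta with sauce"
})


def attach_pp(verb, noun, prep, pp_noun, counts=COUNTS):
    """Pick the attachment site for a prepositional phrase.

    Returns "verb" if the verb co-occurs with (prep, pp_noun) more often
    than the noun does, otherwise "noun". Ties, including the case where
    neither triple was seen, fall back to "noun" (an assumed default).
    """
    verb_score = counts[(verb, prep, pp_noun)]  # Counter gives 0 if unseen
    noun_score = counts[(noun, prep, pp_noun)]
    return "verb" if verb_score > noun_score else "noun"


# "comer pasta con tenedor" vs. "comer pasta con salsa"
print(attach_pp("comer", "pasta", "con", "tenedor"))
print(attach_pp("comer", "pasta", "con", "salsa"))
```

With these toy counts the first phrase attaches to the verb and the second to the noun. Raw counts are used here only for brevity; normalizing by word frequency, or backing off to word classes when a triple is unseen, would be natural refinements.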

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hiram Calvo
    • 1
  • Alexander Gelbukh
    • 1
  1. Natural Language Processing Laboratory, Center for Computing Research, National Polytechnic Institute, Mexico City, Mexico
