Syntactic Analysis Using Finite Patterns: A New Parsing System for Czech

  • Vojtěch Kovář
  • Aleš Horák
  • Miloš Jakubíček
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6562)

Abstract

Syntactic analysis of natural languages is considered one of the basic steps towards advanced natural language processing, such as logical analysis or information retrieval from natural language texts. Czech can be characterized as a morphologically rich language with relatively free word order, which further complicates the problem of syntactic analysis. Current parsing systems for Czech suffer from many problems, including low precision and high ambiguity of the parser output. In this paper, we present a new approach to the syntactic analysis of free-word-order languages, based on the idea of pattern-matching linking rules. The system, named SET, is currently being developed and tested on Czech as a representative of free-word-order languages with a very rich morphological system. We briefly review current approaches and parsing systems for Czech. We then describe the basic ideas, as well as the details, of SET's prototype implementation of the pattern-matching approach to syntactic analysis. We also offer a preliminary analysis of the system's parsing precision and discuss the advantages and disadvantages of this approach.


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Vojtěch Kovář
    • 1
  • Aleš Horák
    • 1
  • Miloš Jakubíček
    • 1
  1. Faculty of Informatics, Masaryk University, Brno, Czech Republic
