Mining Phrases from Syntactic Analysis

  • Miloš Jakubíček
  • Aleš Horák
  • Vojtěch Kovář
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5729)


In this paper we describe how the syntactic parser synt can be used to obtain information about syntactic structures (such as noun or verb phrases) in common Czech sentences. From the point of view of the analysis, these phrases usually correspond to nonterminals of the grammar that the parser uses to find possible valid derivations of a given sentence. The parser has been extended so that its highly ambiguous output can be used to mine these phrases unambiguously, and it offers several ways of identifying them. To achieve this, some previously unused results of the syntactic analysis have been exploited, leading to more precise morphological analysis and hence to a finer distinction among various syntactic (sub)structures. Finally, an application to shallow valency extraction and punctuation correction is presented.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Miloš Jakubíček
    • 1
  • Aleš Horák
    • 1
  • Vojtěch Kovář
    • 1
  1. NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic
