An Experiment with Theme–Rheme Identification

  • Karel Pala
  • Ondřej Svoboda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8655)

Abstract

In this paper we start from the theory of Functional Sentence Perspective developed primarily by Firbas [1], Svoboda [12] and also later by Sgall et al. [9].

We attempt to formulate and implement a procedure for Czech that automatically recognizes which sentence constituents carry contextually dependent information that is thus known to the addressee (theme), which contain new information (rheme), and which bear information that is neither thematic nor rhematic (transition).

The experimental implementation of the procedure uses tools developed at the NLP Centre, FI MU, in particular the morphological analyzer Majka [17], the disambiguator DESAMB [16] and the parser SET [5].

As an initial data resource we use a small corpus of 120 Czech sentences, which at present does not include free continuous text. The reason is that we do not use syntactically pre-tagged text but perform syntactic analysis directly with the SET parser. We therefore offer only a very basic evaluation, which captures the main FSP phenomena and shows that the task is feasible.

The toolset developed for the experiment consists of two parts: a chunker, which determines word-order positions from the parse tree of a sentence, and an FSP tagger, which implements the procedure and labels the chunks with tags for what we call functional elements (e.g. theme proper, transition, rheme proper). An experimental version is available at http://nlp.fi.muni.cz/~xsvobo15/fsp/fsp.html .
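The two-stage design can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Chunk` class, the functions `chunk_sentence` and `tag_fsp`, and the tagging heuristic itself are assumptions made for illustration only. The heuristic assumes unmarked word order, treating the finite verb as transition, the first non-verbal chunk as theme proper and the last non-verbal chunk as rheme proper. The input is a hand-written list of constituents rather than a parse tree produced by SET.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    text: str                  # surface form of the constituent
    position: int              # word-order position in the sentence (0-based)
    is_verb: bool = False      # True for the finite-verb chunk
    tag: Optional[str] = None  # functional element assigned by the tagger


def chunk_sentence(constituents: List[Tuple[str, bool]]) -> List[Chunk]:
    """Hypothetical chunker: numbers constituents in surface order.

    A real chunker would derive the constituents from a parse tree;
    here they are supplied directly as (text, is_verb) pairs.
    """
    return [Chunk(text, i, is_verb) for i, (text, is_verb) in enumerate(constituents)]


def tag_fsp(chunks: List[Chunk]) -> List[Chunk]:
    """Hypothetical FSP tagger using an assumed word-order heuristic."""
    for c in chunks:
        if c.is_verb:
            c.tag = "transition"
    non_verbal = sorted((c for c in chunks if not c.is_verb), key=lambda c: c.position)
    if non_verbal:
        non_verbal[0].tag = "theme proper"
    if len(non_verbal) > 1:
        non_verbal[-1].tag = "rheme proper"
    # Any remaining chunks stay untagged in this sketch.
    return chunks


if __name__ == "__main__":
    # "Petr koupil auto" ("Petr bought a car")
    sentence = [("Petr", False), ("koupil", True), ("auto", False)]
    for chunk in tag_fsp(chunk_sentence(sentence)):
        print(chunk.position, chunk.text, chunk.tag)
```

Keeping the chunker and the tagger as separate functions mirrors the split described above: the tagging rules only see chunks and their positions, so they could be revised without touching the parsing side.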

Keywords

rule-based parsing chunking functional sentence perspective 

References

  1. Firbas, J.: On the problem of non-thematic subjects in contemporary English (English summary of “K otázce nezákladových podmětů v současné angličtině”, ib. pp. 22–42 and 165–173). Časopis pro moderní filologii 39, 171–173 (1957)
  2. Firbas, J.: Functional sentence perspective in written and spoken communication. Cambridge University Press (1992) (reprinted 1995)
  3. Hajičová, E., Sgall, P., Skoumalová, H.: An automatic procedure for topic-focus identification. Journal of Computational Linguistics 21(1), 81–94 (1995)
  4. Karlík, P., Svoboda, A.: Skladba češtiny pro cizince (Czech Syntax for Foreigners). Univerzita J.E. Purkyně, Faculty of Arts, Brno (1982)
  5. Kovář, V., Horák, A., Jakubíček, M.: Syntactic analysis using finite patterns: A new parsing system for Czech. In: Human Language Technology: Challenges for Computer Science and Linguistics, pp. 161–171 (2011)
  6. Mathesius, V.: O tak zvaném aktuálním členění větném (On the so-called functional sentence perspective). Slovo a slovesnost 5, 171–174 (1939)
  7. Mikulová, M., Bémová, A., Hajič, J., Hajičová, E., Havelka, J., Kolářová-Řezníčková, V., Kučová, L., Lopatková, M., Pajas, P., Panevová, J., Razímová, M., Sgall, P., Štěpánek, J., Urešová, Z., Veselá, K., Žabokrtský, Z.: Annotation on the tectogrammatical layer in the Prague Dependency Treebank. Tech. rep., ÚFAL MFF UK, Prague, Czech Republic (2005), http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/t-layer/html/index.html
  8. Pala, K., Svoboda, O.: Semi-automatic theme-rheme identification. In: Proceedings of the Raslan Workshop, pp. 39–48. Karlova Studánka (2013)
  9. Sgall, P.: Towards a definition of focus and topic. Prague Bulletin of Mathematical Linguistics 31, 32, 3–25, 24–32 (1979, 1980)
  10. Steinberger, R., Bennett, P.: Automatic recognition of theme, focus and contrastive stress. In: Proceedings of the Conference Focus and NLP (1994)
  11. Svoboda, A.: České slovosledné pozice z pohledu aktuálního členění. Slovo a slovesnost 45, 22–34, 88–103 (1984), http://kramerius.lib.cas.cz/search/i.jsp?pid=uuid:c9de3a32-530d-11e1-1418-001143e3f55c
  12. Svoboda, A.: Kapitoly z funkční syntaxe. In: Spisy pedagogické fakulty v Ostravě, vol. 66 (1989)
  13. Veselá, K., Havelka, J.: Anotování aktuálního členění věty v pražském závislostním korpusu, ÚFAL/CKL TR-2003-20 (2003), http://ufal.mff.cuni.cz/pdt2.0/publications/VeselaHavelkaTR2003.pdf
  14. Zikánová, Š., Týnovský, M.: Identification of topic and focus in Czech: Comparative evaluation on Prague Dependency Treebank. In: Studies in Formal Slavic Phonology, Morphology, Syntax, Semantics and Information Structure (Formal Description of Slavic Languages 7), pp. 343–353. Peter Lang, Frankfurt am Main (2009)
  15. Zikánová, Š., Týnovský, M., Havelka, J.: Identification of topic and focus in Czech: Evaluation of manual parallel annotations. The Prague Bulletin of Mathematical Linguistics (87), 61–70 (2007)
  16. Šmerk, P.: Unsupervised learning of rules for morphological disambiguation. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2004. LNCS (LNAI), vol. 3206, pp. 211–216. Springer, Heidelberg (2004)
  17. Šmerk, P.: Majka – fast morphological analyzer. In: Proceedings of the Raslan Workshop, pp. 13–16. Masarykova Univerzita, Brno (2009)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Karel Pala
    • 1
  • Ondřej Svoboda
    • 1
  1. Natural Language Processing Centre, Faculty of Informatics, Faculty of Arts, Masaryk University, Brno, Czech Republic
