Abstract

The aim of this paper is to present the design of a partial syntactic annotation of the IPI PAN Corpus of Polish [22] and the corresponding extension of the corpus search engine Poliqarp [25,12], developed at the Institute of Computer Science PAS and currently employed in Polish and Portuguese corpora projects. In particular, we will argue for the need to distinguish between, and represent both, syntactic and semantic heads, and we will sketch the representation of coordination, an area traditionally controversial both in theoretical and in computational linguistics. The annotation is designed to maximise the usefulness of the resulting corpus for the task of automatic valence acquisition.

Keywords

Nominal Group; Coordinate Structure; Measure Phrase; CSLI Publication; Nominal Phrase
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Barreto, F., Branco, A., Ferreira, E., Mendes, A., Nascimento, M.F., Nunes, F., Silva, J.: Open resources and tools for the shallow processing of Portuguese: The TagShare project. In: Proceedings of LREC (2006)
  2. Beavers, J., Sag, I.A.: Coordinate ellipsis and apparent non-constituent coordination. In: Müller, S. (ed.) Proceedings of the HPSG04 Conference, pp. 48–69. CSLI Publications, Stanford (2004)
  3. Bloomfield, L.: Language. Holt, New York (1933)
  4. Böhmová, A., Hajič, J., Hajičová, E., Hladká, B.: The Prague Dependency Treebank: Three-level annotation scenario. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora, pp. 103–127. Kluwer, Dordrecht (2003)
  5. Christ, O.: A modular and flexible architecture for an integrated corpus query system. In: COMPLEX’94, Budapest (1994)
  6. Covington, M.A.: A 700-year-old argument for a syntactic transformation. http://www.ai.uga.edu/mc/trans700.html
  7. Fast, J., Przepiórkowski, A.: Automatic extraction of Polish verb subcategorization: An evaluation of common statistics. In: Vetulani, Z. (ed.) Proceedings of the 2nd Language & Technology Conference, Poznań, Poland, pp. 191–195 (2005)
  8. Fillmore, C.J., Baker, C.F., Sato, H.: Seeing arguments through transparent structures. In: Proceedings of LREC 2002, Las Palmas, Canary Islands, Spain, pp. 787–791. ELRA (2002)
  9. Fillmore, C.J., Johnson, C.R., Petruck, M.R.L.: Background to FrameNet. International Journal of Lexicography 16(3), 235–250 (2003)
  10. Huang, C.-R., Keh-Jiann, C., Feng-Yi, C., Keh-Jiann, C., Zhao-Ming, G., Kuang-Yu, C.: Sinica treebank: Design criteria, annotation guidelines, and on-line interface. In: Proceedings of 2nd Chinese Language Processing Workshop (Held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, ACL-2000), Hong Kong, pp. 29–37 (2000)
  11. Ide, N., Bonhomme, P., Romary, L.: XCES: An XML-based standard for linguistic corpora. In: Proceedings of the Linguistic Resources and Evaluation Conference, Athens, Greece, pp. 825–830 (2000)
  12. Janus, D., Przepiórkowski, A.: Poliqarp 1.0: Some technical aspects of a linguistic search engine for large corpora. In: Waliński, J., Kredens, K., Goźdź-Roszkowski, S. (eds.) The proceedings of Practical Applications of Linguistic Corpora 2005. Peter Lang, Frankfurt am Main (2006)
  13. Kallas, K.: Składnia współczesnych polskich konstrukcji współrzędnych. Wydawnictwo Uniwersytetu Mikołaja Kopernika, Toruń (1993)
  14. Kosek, I.: Przyczasownikowe frazy przyimkowo-nominalne w zdaniach współczesnego języka polskiego. Wydawnictwo Uniwersytetu Warmińsko-Mazurskiego, Olsztyn (1999)
  15. Lezius, W.: TIGERSearch – ein Suchwerkzeug für Baumbanken. In: Busemann, S. (ed.) Proceedings der 6. Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2002), Saarbrücken (2002)
  16. Mel’čuk, I.A.: Levels of dependency in linguistic description: concepts and problems. In: Ágel, V., Eichinger, L., Eroms, H.-W., Hellwig, P., Heringer, H.-J., Lobin, H. (eds.) Dependenz und Valenz: Ein internationales Handbuch der zeitgenössischen Forschung, pp. 188–229. De Gruyter, Berlin (2003)
  17. Monz, C., de Rijke, M.: Tequesta: The University of Amsterdam’s textual question answering system. In: Proceedings of Tenth Text Retrieval Conference (TREC-10), pp. 513–522 (2001)
  18. Nivre, J.: Theory-supporting treebanks. In: Nivre, J., Hinrichs, E. (eds.) Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT2003), Växjö, Sweden, pp. 117–128 (2003)
  19. Pollard, C., Sag, I.A.: Information-Based Syntax and Semantics, vol. 1: Fundamentals. CSLI Lecture Notes, vol. 13. CSLI Publications, Stanford (1987)
  20. Pollard, C., Sag, I.A.: Head-driven Phrase Structure Grammar. Chicago University Press, Chicago (1994)
  21. Przepiórkowski, A.: Case Assignment and the Complement-Adjunct Dichotomy: A Non-Configurational Constraint-Based Approach. Ph.D. dissertation, Universität Tübingen (1999)
  22. Przepiórkowski, A.: The IPI PAN Corpus: Preliminary version. Institute of Computer Science, Polish Academy of Sciences, Warsaw (2004)
  23. Przepiórkowski, A.: On heads and coordination in a partial treebank. In: Hajič, J., Nivre, J. (eds.) Proceedings of the TLT 2006, Prague, pp. 163–174 (2006)
  24. Przepiórkowski, A., Fast, J.: Baseline experiments in the extraction of Polish valence frames. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining, Advances in Soft Computing, pp. 511–520. Springer, Berlin (2005)
  25. Przepiórkowski, A., Krynicki, Z., Dębowski, Ł., Woliński, M., Janus, D., Bański, P.: A search tool for corpora with positional tagsets and ambiguities. In: Proceedings of LREC 2004, Lisbon, pp. 1235–1238. ELRA (2004)
  26. Przepiórkowski, A., Woliński, M.: A flexemic tagset for Polish. In: Proceedings of Morphological Processing of Slavic Languages, EACL 2003, Budapest, pp. 33–40 (2003)
  27. Przepiórkowski, A., Woliński, M.: The unbearable lightness of tagging: A case study in morphosyntactic tagging of Polish. In: Proceedings of the LINC-03, EACL 2003, pp. 109–116 (2003)
  28. Sag, I.A., Gazdar, G., Wasow, T., Weisler, S.: Coordination and how to distinguish categories. Natural Language and Linguistic Theory 3, 117–171 (1985)
  29. Saloni, Z., Świdziński, M.: Składnia współczesnego języka polskiego, 4th (changed) edn. Wydawnictwo Naukowe PWN, Warsaw (1998)
  30. Sgall, P., Hajičová, E., Panevová, J.: The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Reidel, Dordrecht (1986)
  31. Silberztein, M.: Finite-state description of the French determiner system. French Language Studies 13, 221–246 (2003)
  32. Świdziński, M.: Realizacje zdaniowe podmiotu-mianownika, czyli o strukturalnych ograniczeniach selekcyjnych. In: Markowski, A. (ed.) Opisać słowa, pp. 188–201. Dom Wydawniczy Elipsa (1992)
  33. Tesnière, L.: Éléments de Syntaxe Structurale. Klincksieck, Paris (1959)
  34. Watson, R., Carroll, J., Briscoe, T.: Efficient extraction of grammatical relations. In: Proceedings of the Ninth International Workshop on Parsing Technology, Vancouver, British Columbia, pp. 160–170. Association for Computational Linguistics (2005)
  35. Wright, A., Kathol, A.: When a head is not a head: A constructional approach to exocentricity in English. In: Kim, J.-B., Wechsler, S. (eds.) Proceedings of the 9th International Conference on Head-Driven Phrase Structure Grammar, pp. 373–389. CSLI Publications, Stanford (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Adam Przepiórkowski (1)
  1. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
