Abstract
Law texts, including the constitution, acts, public notices and court judgements, form a huge database of texts. As in many texts from narrow domains, the sublanguage used is partially restricted and differs from the general language (Czech). As a starting collection of data, we have chosen the legal database Lexis, which contains approx. 50,000 Czech law documents. Our attention is concentrated mostly on noun groups, which are the main candidates for law terms. We were able to recognize 3992 distinct noun groups in the selected text samples. The paper also presents results of the morphological analysis, lemmatization, tagging, disambiguation, and basic syntactic analysis of Czech law texts, as these tasks are crucial for any further sophisticated natural language processing. Verbs in legal texts have been given a preliminary exploration as well. In this respect, we explore how linguistic analysis can help in identifying the semantic nature of law terms.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pala, K., Rychlý, P., Šmerk, P. (2010). Automatic Identification of Legal Terms in Czech Law Texts. In: Francesconi, E., Montemagni, S., Peters, W., Tiscornia, D. (eds) Semantic Processing of Legal Texts. Lecture Notes in Computer Science, vol 6036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12837-0_5
DOI: https://doi.org/10.1007/978-3-642-12837-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12836-3
Online ISBN: 978-3-642-12837-0
eBook Packages: Computer Science, Computer Science (R0)