Automatic Identification of Legal Terms in Czech Law Texts

Pala, Karel; Rychlý, Pavel; Šmerk, Pavel

doi:10.1007/978-3-642-12837-0_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6036)

Abstract

Law texts, including the constitution, acts, public notices and court judgements, form a huge database of texts. As in many texts from narrow domains, the sublanguage used is partially restricted and differs from general Czech. As the starting data collection we chose the legal database Lexis, which contains approximately 50,000 Czech law documents. We concentrate mostly on noun groups, which are the main candidates for law terms; we recognized 3992 distinct noun groups in the selected text samples. The paper also presents results of morphological analysis, lemmatization, tagging, disambiguation and basic syntactic analysis of Czech law texts, since these tasks are crucial for any more sophisticated natural language processing. Verbs in legal texts have been given a preliminary examination as well. In this respect, we explore how linguistic analysis can help identify the semantic nature of law terms.
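
To make the idea of collecting noun groups as term candidates concrete, here is a minimal Python sketch. It is not the authors' tooling: the token format `(word, lemma, POS)`, the coarse tags `ADJ` and `NOUN`, the function name `noun_groups`, and the hand-tagged sample sentence are all assumptions made for illustration. It treats a noun group simply as a run of adjectives followed by one or more nouns, which is far cruder than a real Czech noun-group grammar (it ignores agreement and genitive postmodifiers).

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical token format: (word form, lemma, coarse POS tag).
Token = Tuple[str, str, str]


def noun_groups(tokens: Iterable[Token]) -> List[Tuple[str, ...]]:
    """Collect runs of adjectives followed by at least one noun, as lemma tuples."""
    groups: List[Tuple[str, ...]] = []
    current: List[str] = []  # lemmas of the run being built
    seen_noun = False        # True once the current run contains a noun
    for _word, lemma, pos in tokens:
        if pos == "NOUN":
            current.append(lemma)
            seen_noun = True
        elif pos == "ADJ" and not seen_noun:
            current.append(lemma)
        else:
            # Any other token (or an adjective after a noun) closes the run.
            if seen_noun:
                groups.append(tuple(current))
            current = [lemma] if pos == "ADJ" else []
            seen_noun = False
    if seen_noun:
        groups.append(tuple(current))
    return groups


# Hand-tagged toy input (an assumption, not data from the paper).
sample: List[Token] = [
    ("Ústavní", "ústavní", "ADJ"),
    ("soud", "soud", "NOUN"),
    ("zrušil", "zrušit", "VERB"),
    ("napadené", "napadený", "ADJ"),
    ("rozhodnutí", "rozhodnutí", "NOUN"),
    ("ústavního", "ústavní", "ADJ"),
    ("soudu", "soud", "NOUN"),
]

# Counting over lemmas merges inflected variants of the same candidate term.
counts = Counter(noun_groups(sample))
distinct_candidates = len(counts)
```

Grouping by lemma rather than by surface form is what makes a count of "distinct" noun groups meaningful in a highly inflected language such as Czech, which is one reason lemmatization and disambiguation come before term extraction.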

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Karel Pala, Pavel Rychlý & Pavel Šmerk

Editor information

Editors and Affiliations

Institute of Legal Information, Theory and Techniques, ITTIG-CNR, Via dei Barucci 20, 50127, Florence, Italy
Enrico Francesconi & Daniela Tiscornia
Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC) - CNR, Area della Ricerca di Pisa, Via Moruzzi 1, 56124, Pisa, Italy
Simonetta Montemagni
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, S1 4DP, Sheffield, UK
Wim Peters

About this chapter

Cite this chapter

Pala, K., Rychlý, P., Šmerk, P. (2010). Automatic Identification of Legal Terms in Czech Law Texts. In: Francesconi, E., Montemagni, S., Peters, W., Tiscornia, D. (eds) Semantic Processing of Legal Texts. Lecture Notes in Computer Science, vol 6036. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12837-0_5

DOI: https://doi.org/10.1007/978-3-642-12837-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12836-3
Online ISBN: 978-3-642-12837-0
eBook Packages: Computer Science, Computer Science (R0)
