Text Mining of Supreme Administrative Court Jurisdictions

  • Ingo Feinerer
  • Kurt Hornik
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Within the last decade, text mining, i.e., extracting sensitive information from text corpora, has become a major factor in business intelligence. The automated textual analysis of law corpora is highly valuable because of its impact on a company's legal options and the sheer amount of available jurisdiction. The study of supreme court jurisdiction and international law corpora is equally important due to its effects on business sectors.

In this paper we use text mining methods to investigate Austrian supreme administrative court jurisdictions concerning dues and taxes. We analyze the law corpora using R with the new text mining package tm. Applications include clustering the jurisdiction documents into groups modeling tax classes (like income or value-added tax) and identifying jurisdiction properties. The findings are compared to results obtained by law experts.
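The clustering step mentioned above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in R with tm as used in the paper, and it swaps in a plain TF-IDF representation with a spherical k-means loop; the paper's actual pipeline may differ. The four short English documents, the seed choice, and all function names are hypothetical and serve only as an illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors with a smoothed IDF weight."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    return [
        normalize({t: tf * math.log((1 + n) / (1 + df[t])) for t, tf in c.items()})
        for c in counts
    ]


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster(vectors, seed_indices, max_iter=20):
    """Spherical k-means: assign each document to its most similar centroid."""
    centroids = [vectors[i] for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda k: dot(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                total = Counter()
                for m in members:
                    total.update(m)  # adds the weights term by term
                centroids[k] = normalize(total)
    return labels


# Hypothetical toy documents standing in for court decisions.
docs = [
    "income tax assessment of wages and salary income",
    "value added tax on delivery of goods and input tax deduction",
    "income tax deduction for salary and employment income",
    "value added tax exemption for export of goods",
]
labels = cluster(tfidf_vectors(docs), seed_indices=[0, 1])
print(labels)
```

With the first two documents as seeds, the income tax texts and the value-added tax texts end up in separate groups. On a real corpus, the seeds would normally be chosen at random or by a dedicated initialization scheme, and the resulting groups would be compared against expert-assigned tax classes.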




  1. ACHATZ, M., KAMPER, K., and RUPPE, H. (1987): Die Rechtssprechung des VwGH in Abgabensachen. Orac Verlag, Wien.
  2. CONRAD, J., AL-KOFAHI, K., ZHAO, Y. and KARYPIS, G. (2005): Effective Document Clustering for Large Heterogeneous Law Firm Collections. In: 10th International Conference on Artificial Intelligence and Law (ICAIL). 177-187.
  3. FEINERER, I. (2007): tm: Text Mining Package, R package version 0.1-2.
  4. HORNIK, K. (2007): Snowball: Snowball Stemmers, R package version 0.0-1.
  5. KARATZOGLOU, A. and FEINERER, I. (2007): Text Clustering with String Kernels in R. In: Advances in Data Analysis (Proceedings of the 30th Annual Conference of the GfKl). 91-98. Springer-Verlag.
  6. KARATZOGLOU, A., SMOLA, A. and HORNIK, K. (2006): kernlab: Kernel-based machine learning methods including support vector machines, R package version 0.9-1.
  7. KARATZOGLOU, A., SMOLA, A., HORNIK, K. and ZEILEIS, A. (2004): kernlab - An S4 Package for Kernel Methods in R. Journal of Statistical Software, 11(9), 1-20.
  8. LODHI, H., SAUNDERS, C., SHAWE-TAYLOR, J., WATKINS, C., and CRISTIANINI, N. (2002): Text classification using string kernels. Journal of Machine Learning Research, 2, 419-444.
  9. NAGEL, H. and MAMUT, M. (2006): Rechtsprechung des VwGH in Abgabensachen 2000-2004.
  10. PORTER, M. (1980): An algorithm for suffix stripping. Program, 14(3), 130-137.
  11. R DEVELOPMENT CORE TEAM (2006): R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
  12. SCHWEIGHOFER, E. (1999): Legal Knowledge Representation, Automatic Text Analysis in Public International and European Law. Kluwer Law International, Law and Electronic Commerce, Volume 7, The Hague. ISBN 9041111484.
  13. TEMPLE LANG, D. (2006): Rstem: Interface to Snowball implementation of Porter's word stemming algorithm, R package version 0.3-1.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ingo Feinerer (1)
  • Kurt Hornik (1)
  1. Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Wien, Austria
