Unsupervised Keyword Extraction from Polish Legal Texts

  • Michał Jungiewicz
  • Michał Łopuszyński
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8686)

Abstract

In this work, we present an application of the recently proposed unsupervised keyword extraction algorithm RAKE to a corpus of Polish legal texts from the field of public procurement. RAKE is essentially a language- and domain-independent method. Its only language-specific input is a stoplist containing a set of non-content words. The performance of the method depends heavily on the choice of this stoplist, which should be adapted to the domain. Therefore, we complement the RAKE algorithm with an automatic approach to selecting non-content words, based on the statistical properties of term distribution.
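To make the role of the stoplist concrete, the following is a minimal sketch of RAKE-style scoring as the algorithm is commonly described: text is cut into candidate phrases at stopwords and punctuation, each word is scored by the ratio of its degree (total length of the phrases it appears in) to its frequency, and a phrase scores the sum of its word scores. The function name `rake`, the punctuation regex, and the hand-written stoplist in the usage note are illustrative assumptions. This is not the authors' implementation, and it does not cover their automatic stoplist selection.

```python
import re
from collections import defaultdict


def rake(text, stoplist):
    """Return (phrase, score) pairs ranked by a RAKE-style score, best first.

    Illustrative sketch only: the tokenization and tie-breaking are assumptions.
    """
    stop = {w.lower() for w in stoplist}
    candidates = []

    # Punctuation always ends a candidate phrase (hyphens are kept inside words).
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        phrase = []
        for word in fragment.split():
            if word in stop:
                # A stopword closes the current candidate phrase.
                if phrase:
                    candidates.append(tuple(phrase))
                phrase = []
            else:
                phrase.append(word)
        if phrase:
            candidates.append(tuple(phrase))

    freq = defaultdict(int)    # number of occurrences of each word
    degree = defaultdict(int)  # summed length of the phrases containing the word
    for phrase in candidates:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # Phrase score = sum of degree/frequency ratios of its words.
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(candidates)
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A call such as `rake(document_text, ["the", "of", "to"])` returns the ranked candidates. Because the stoplist alone decides where phrases are cut, changing it changes both the candidate set and the scores, which is why a domain-adapted stoplist matters for the results.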

Keywords

Keyword extraction, unsupervised learning, legal texts

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Michał Jungiewicz (1, 2)
  • Michał Łopuszyński (1)
  1. Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
  2. Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland