Searching Case Law Judgments by Using Other Judgments as a Query

Sarsa, Sami; Hyvönen, Eero

doi:10.1007/978-3-030-59082-6_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1292)

Included in the following conference series:

Conference on Artificial Intelligence and Natural Language

Abstract

This paper presents an effective method for case law retrieval based on semantic document similarity, together with a web application for querying Finnish case law. The novelty of the work lies in using legal documents, such as case law judgments, legal case descriptions, or other texts, to formulate the query automatically. The query documents may be in various formats, including image files with text content. This approach allows efficient search for similar documents without the need to specify a query string or keywords, which can be difficult in this use case. The application leverages two traditional word-frequency-based methods, TF-IDF and LDA, alongside two modern neural network methods, Doc2Vec and Doc2VecC. The effectiveness of the approach for document relevance ranking has been evaluated using a gold standard set of inter-document similarities. We show that a linear combination of similarities derived from the individual models provides a robust automatic similarity assessment for ranking the case law documents for retrieval.
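
The idea of ranking by a linear combination of per-model similarities can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure, the weights, or the vector dimensions, so cosine similarity, the equal weights, the toy two-dimensional vectors, and all function and variable names below are assumptions made purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(query_vecs, doc_vecs, weights):
    """Weighted sum of per-model cosine similarities.

    query_vecs and doc_vecs map a model name to that model's vector
    for the query document and the candidate document, respectively.
    """
    return sum(w * cosine(query_vecs[m], doc_vecs[m]) for m, w in weights.items())


def rank(query_vecs, corpus, weights):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, combined_similarity(query_vecs, vecs, weights))
        for doc_id, vecs in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: two models only, two-dimensional vectors.
weights = {"tfidf": 0.5, "doc2vec": 0.5}  # assumed weights, not from the paper
query = {"tfidf": [1.0, 0.0], "doc2vec": [0.0, 1.0]}
corpus = {
    "A": {"tfidf": [1.0, 0.0], "doc2vec": [0.0, 1.0]},  # agrees in both models
    "B": {"tfidf": [0.0, 1.0], "doc2vec": [0.0, 1.0]},  # agrees in one model
    "C": {"tfidf": [0.0, 1.0], "doc2vec": [1.0, 0.0]},  # agrees in neither
}

ranking = rank(query, corpus, weights)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this sketch a document that is close to the query under only one model still receives partial credit, which is one plausible reason a combination could be more robust than any single model; in practice the weights would be tuned against a gold standard such as the one the abstract mentions.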

Notes

1. https://data.finlex.fi.
2. https://finto.fi/oiko/en/.
3. https://finto.fi/koko/en/.
4. http://casetext.com.
5. http://fastcase.com.
6. Cf., e.g., http://www.acclaimip.com/.

Acknowledgements

We thank the following for their collaboration: Aki Hietanen, Saara Packalen, Tiina Husso, and Oili Salminen at the Ministry of Justice, Finland; Minna Tamper and Jouni Tuominen at Aalto University and the University of Helsinki; and Jari Linhala, Arttu Oksanen, and Risto Talo at Edita Publishing Ltd. This work is part of the Anoppi project funded by the Finnish Ministry of Justice (https://oikeusministerio.fi/en/project?tunnus=OM042:00/2018).

Author information

Authors and Affiliations

Semantic Computing Research Group (SeCo), Aalto University, Espoo, Finland
Sami Sarsa & Eero Hyvönen
HELDIG – Helsinki Centre for Digital Humanities, University of Helsinki, Helsinki, Finland
Eero Hyvönen

Corresponding author

Correspondence to Sami Sarsa.

Editor information

Editors and Affiliations

ITMO University, St. Petersburg, Russia
Andrey Filchenkov
Haaga-Helia University of Applied Sciences, Helsinki, Finland
Janne Kauttonen
University of Helsinki, Helsinki, Finland
Lidia Pivovarova

Cite this paper

Sarsa, S., Hyvönen, E. (2020). Searching Case Law Judgments by Using Other Judgments as a Query. In: Filchenkov, A., Kauttonen, J., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2020. Communications in Computer and Information Science, vol 1292. Springer, Cham. https://doi.org/10.1007/978-3-030-59082-6_11

DOI: https://doi.org/10.1007/978-3-030-59082-6_11
Published: 30 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59081-9
Online ISBN: 978-3-030-59082-6
eBook Packages: Computer Science, Computer Science (R0)