Abstract
This paper presents an effective method for case law retrieval based on semantic document similarity, together with a web application for querying Finnish case law. The novelty of the work lies in using legal documents themselves, such as case law judgments, legal case descriptions, or other texts, to formulate the query automatically. The query documents may be in various formats, including image files with text content. This approach allows efficient search for similar documents without the need to specify a query string or keywords, which can be difficult in this use case. The application leverages two traditional word-frequency-based methods, TF-IDF and LDA, alongside two modern neural network methods, Doc2Vec and Doc2VecC. The effectiveness of the approach for document relevance ranking was evaluated using a gold standard set of inter-document similarities. We show that a linear combination of similarities derived from the individual models provides a robust automatic similarity assessment for ranking the case law documents for retrieval.
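The linear combination of per-model similarities mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the weights, the similarity measure, or any normalisation, so the equal weights, the document identifiers, and the score values below are all hypothetical placeholders, not the authors' configuration.

```python
from typing import Dict, List, Tuple

# Hypothetical similarity scores between one query document and three
# candidate judgments, one score table per model. Values are invented
# for illustration only.
scores: Dict[str, Dict[str, float]] = {
    "tfidf":    {"doc_a": 0.9, "doc_b": 0.2, "doc_c": 0.1},
    "lda":      {"doc_a": 0.8, "doc_b": 0.4, "doc_c": 0.1},
    "doc2vec":  {"doc_a": 0.7, "doc_b": 0.9, "doc_c": 0.2},
    "doc2vecc": {"doc_a": 0.6, "doc_b": 0.5, "doc_c": 0.2},
}

# Assumed equal weights; the source does not specify how weights are chosen.
weights: Dict[str, float] = {
    "tfidf": 0.25, "lda": 0.25, "doc2vec": 0.25, "doc2vecc": 0.25,
}


def combine_similarities(
    scores: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> Dict[str, float]:
    """Return the weighted sum of per-model similarities for each document."""
    combined: Dict[str, float] = {}
    for model, per_doc in scores.items():
        w = weights[model]
        for doc_id, sim in per_doc.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + w * sim
    return combined


def rank(combined: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort documents by combined similarity, most similar first."""
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


combined = combine_similarities(scores, weights)
ranked = rank(combined)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

In this sketch a document that only one model rates highly (doc_b under Doc2Vec) is ranked below a document that all models rate consistently well, which is one plausible reading of why a combination could be more robust than any single model.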
Acknowledgements
We thank Aki Hietanen, Saara Packalen, Tiina Husso, and Oili Salminen at the Ministry of Justice, Finland; Minna Tamper and Jouni Tuominen at Aalto University and the University of Helsinki; and Jari Linhala, Arttu Oksanen, and Risto Talo at Edita Publishing Ltd. for their collaboration. This work is part of the Anoppi project funded by the Finnish Ministry of Justice (https://oikeusministerio.fi/en/project?tunnus=OM042:00/2018).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sarsa, S., Hyvönen, E. (2020). Searching Case Law Judgments by Using Other Judgments as a Query. In: Filchenkov, A., Kauttonen, J., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2020. Communications in Computer and Information Science, vol 1292. Springer, Cham. https://doi.org/10.1007/978-3-030-59082-6_11
DOI: https://doi.org/10.1007/978-3-030-59082-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59081-9
Online ISBN: 978-3-030-59082-6
eBook Packages: Computer Science, Computer Science (R0)