Abstract
The judicial system has evolved considerably in recent years. Thousands of cases are registered daily and stored in the form of documents, which lawyers consult whenever required. Lawyers are important stakeholders in the judicial system and routinely study multiple cases in the course of their work. Manually retrieving this information from a collection is very difficult, and this is where an information retrieval system comes into the picture. This article presents a brief comparison of several information retrieval models currently in use: the Boolean model, the TF-IDF model, the vector space model, the Okapi BM25 model and fuzzy search models. Each model was tested on three datasets and its results were recorded. The experimental results show that the Okapi BM25 model outperformed the other models in the case study. They also show that document pre-processing plays an important role in the effectiveness of query-document matching.
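To make the query-document matching mentioned above concrete, the following is a minimal sketch of Okapi BM25 scoring with a simple pre-processing step. It is not the chapter's implementation. The function names, the toy documents, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed pre-processing: lowercase and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection, not drawn from the chapter's datasets.
docs = [
    "The court dismissed the appeal in the property dispute.",
    "Tax appeal filed before the tribunal.",
    "Weather report for the coastal region.",
]
scores = bm25_scores("property appeal", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this sketch the first document matches both query terms and ranks first, while the third matches neither and scores zero. Changing `tokenize` (for example, adding stop-word removal or stemming) changes which terms match, which is one way to see why pre-processing affects retrieval effectiveness.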
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Desai, D., Ghadge, A., Wazare, R., Bagade, J. (2022). A Comparative Study of Information Retrieval Models for Short Document Summaries. In: Smys, S., Bestak, R., Palanisamy, R., Kotuliak, I. (eds) Computer Networks and Inventive Communication Technologies. Lecture Notes on Data Engineering and Communications Technologies, vol 75. Springer, Singapore. https://doi.org/10.1007/978-981-16-3728-5_42
DOI: https://doi.org/10.1007/978-981-16-3728-5_42
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3727-8
Online ISBN: 978-981-16-3728-5
eBook Packages: Engineering, Engineering (R0)