
A Comparative Study of Information Retrieval Models for Short Document Summaries

  • Conference paper
Computer Networks and Inventive Communication Technologies

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 75)

Abstract

The judicial system has evolved tremendously over the past years. Thousands of cases are registered daily and stored as documents, which lawyers consult whenever required. Lawyers are important stakeholders in the judicial system and constantly study multiple cases during their work, yet manually retrieving the relevant information from such a collection is very difficult. This is where information retrieval systems come into the picture. This article briefly compares several information retrieval models currently in use: the Boolean model, the TF-IDF model, the vector space model, the Okapi BM25 model, and fuzzy search models. Each model is tested on three datasets, and the results are recorded. The experimental results reveal that the Okapi BM25 model outperformed the other models in the case study. The results also show that document pre-processing plays an important role in the effectiveness of query-document matching.
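The abstract names the compared models without showing how each one scores a document. The Python sketch below is a toy illustration under our own assumptions, not the authors' implementation. The stop-word list, the regex tokenizer, the combination of TF-IDF weighting with the vector space model in one function, the BM25 defaults k1 = 1.2 and b = 0.75, the `log(1 + ...)` BM25 IDF variant, and the use of Python's `difflib.get_close_matches` as a stand-in for a fuzzy search model are all choices made here. The three-document corpus and the queries are hypothetical.

```python
import difflib
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def preprocess(text, remove_stop_words=True):
    """Lower-case, split into alphanumeric tokens, optionally drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def boolean_and(query_terms, docs_tokens):
    """Boolean model: indices of documents that contain every query term."""
    wanted = set(query_terms)
    return [i for i, toks in enumerate(docs_tokens) if wanted <= set(toks)]


def tfidf_cosine(query_terms, docs_tokens):
    """Vector space model: TF-IDF vectors ranked by cosine similarity."""
    n = len(docs_tokens)
    df = Counter(t for toks in docs_tokens for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}

    def vectorize(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def norm(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    q = vectorize(query_terms)
    scores = []
    for toks in docs_tokens:
        d = vectorize(toks)
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        denom = norm(q) * norm(d)
        scores.append(dot / denom if denom else 0.0)
    return scores


def bm25(query_terms, docs_tokens, k1=1.2, b=0.75):
    """Okapi BM25; uses the log(1 + ...) IDF variant so term weights stay positive."""
    n = len(docs_tokens)
    avgdl = sum(len(toks) for toks in docs_tokens) / n
    df = Counter(t for toks in docs_tokens for t in set(toks))
    scores = []
    for toks in docs_tokens:
        tf = Counter(toks)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue  # term absent from this document contributes nothing
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * length_norm)
        scores.append(score)
    return scores


def fuzzy_expand(query_terms, docs_tokens, cutoff=0.8):
    """Stand-in for fuzzy search: map each query term to its closest vocabulary term."""
    vocab = sorted({t for toks in docs_tokens for t in toks})
    expanded = []
    for t in query_terms:
        expanded.extend(difflib.get_close_matches(t, vocab, n=1, cutoff=cutoff) or [t])
    return expanded


# Hypothetical three-document corpus in the legal domain.
docs = [
    "The court dismissed the appeal of the tenant.",
    "Appeal granted: the court ordered a retrial.",
    "Tenant rights in rental disputes.",
]
docs_tokens = [preprocess(d) for d in docs]
query = preprocess("tenant appeal")

print(boolean_and(query, docs_tokens))
print(tfidf_cosine(query, docs_tokens))
print(bm25(query, docs_tokens))
print(fuzzy_expand(preprocess("tenent apeal"), docs_tokens))
```

On this toy corpus the Boolean AND query returns only the first document, while TF-IDF cosine and BM25 also give partial credit to documents matching one query term, with the first document still ranked highest. Pre-processing changes the outcome as well: with stop words removed, `boolean_and(preprocess("the tenant"), docs_tokens)` matches documents 0 and 2, but if stop words are kept in both query and documents, the word "the" becomes a required term and only document 0 matches.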

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Desai, D., Ghadge, A., Wazare, R., Bagade, J. (2022). A Comparative Study of Information Retrieval Models for Short Document Summaries. In: Smys, S., Bestak, R., Palanisamy, R., Kotuliak, I. (eds) Computer Networks and Inventive Communication Technologies. Lecture Notes on Data Engineering and Communications Technologies, vol 75. Springer, Singapore. https://doi.org/10.1007/978-981-16-3728-5_42

  • DOI: https://doi.org/10.1007/978-981-16-3728-5_42

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-3727-8

  • Online ISBN: 978-981-16-3728-5

  • eBook Packages: Engineering, Engineering (R0)
