Information retrieval methodology for aiding scientific database search

  • Focus
Soft Computing

Abstract

During literature reviews, and especially systematic literature reviews, finding and screening relevant papers can involve managing and processing large amounts of unstructured text. When the search topic is difficult to delimit or has fuzzy boundaries, researchers need to broaden the scope of the search and, as a consequence, the data retrieved from scientific publications may become very large and loosely related. However, suitable analysis of these data may allow the researcher to discover knowledge hidden within the search output, and thus to explore the limits of the search and extend the scope of the review. To that end, this paper presents an iterative methodology that applies text mining and machine learning techniques to a corpus of abstracts downloaded from scientific databases. It combines automatic processing algorithms with tools for supervised decision-making in an iterative process that relies on the researchers’ judgement to adapt, screen and tune the search output. The paper closes with a working example that uses a set of scripts developed to implement the different stages of the proposed methodology.
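
As an illustration only, one round of such an iterative screening loop might look like the sketch below. It is not the authors' scripts: the TF-IDF weighting, the cosine ranking against reviewer-approved abstracts, the function names and the four toy abstracts are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF (an assumption): terms present in every document get weight 0.
        vectors.append({
            term: (count / total) * math.log((1 + n_docs) / (1 + df[term]))
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_unlabelled(abstracts, relevant_ids):
    """Rank abstracts not yet judged by similarity to those marked relevant."""
    vectors = tfidf_vectors(abstracts)
    # Sum of the relevant vectors; cosine similarity ignores the scale,
    # so this behaves like their centroid.
    centroid = defaultdict(float)
    for i in relevant_ids:
        for term, weight in vectors[i].items():
            centroid[term] += weight
    scores = [
        (cosine(vectors[i], centroid), i)
        for i in range(len(abstracts))
        if i not in relevant_ids
    ]
    return [i for _, i in sorted(scores, reverse=True)]


# Hypothetical toy corpus; index 0 is the abstract the reviewer accepted first.
abstracts = [
    "Text mining supports citation screening in systematic reviews",
    "Support vector machines for text categorization of documents",
    "Soil moisture sensors for precision agriculture",
    "Machine learning for screening studies in systematic literature reviews",
]
ranking = rank_unlabelled(abstracts, {0})
print(ranking)
```

In each round the reviewer would inspect the top-ranked abstracts, add the accepted ones to `relevant_ids`, and call `rank_unlabelled` again, so the automatic ranking is steered by human judgement rather than replacing it.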

Acknowledgements

This work has been partially funded by the Spanish Government Ministry of Economy and Competitiveness through the DEFINES project (Ref. TIN2016-80172-R) and by the Ministry of Education of the Junta de Castilla y Leon (Spain) through the T-CUIDA project (Ref. SA061P17).

Author information

Corresponding author

Correspondence to Samuel Marcos-Pablos.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by B. B. Gupta.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Marcos-Pablos, S., García-Peñalvo, F.J. Information retrieval methodology for aiding scientific database search. Soft Comput 24, 5551–5560 (2020). https://doi.org/10.1007/s00500-018-3568-0

  • DOI: https://doi.org/10.1007/s00500-018-3568-0