Abstract
During literature reviews, and especially systematic literature reviews, finding and screening relevant papers may involve managing and processing large amounts of unstructured text data. When the search topic is difficult to delimit or has fuzzy boundaries, researchers need to broaden the scope of the search and, as a consequence, the data retrieved from scientific publications may become very large and largely uncorrelated. However, a suitable analysis of these data may allow the researcher to discover knowledge hidden within the search output, thereby exploring the limits of the search and broadening the scope of the review. To that end, this paper presents an iterative methodology that applies text mining and machine learning techniques to a corpus of abstracts downloaded from scientific databases. It combines automatic processing algorithms with tools for supervised decision-making, in a process guided by the researchers’ judgement, so as to adapt, screen and tune the search output. The paper closes with a working example that uses a set of scripts developed to implement the different stages of the proposed methodology.
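One iteration of such a screening loop might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the algorithms used, so TF-IDF weighting and cosine similarity stand in for the text mining step, and the function names, the sample abstracts and the set of indices marked as relevant are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_unlabelled(abstracts, relevant_ids):
    """Rank abstracts not yet judged by similarity to those judged relevant.

    relevant_ids holds the indices a researcher has marked as relevant so far
    (a hypothetical stand-in for the supervised decision step).
    """
    vectors = tfidf_vectors(abstracts)
    centroid = Counter()
    for i in relevant_ids:
        for term, weight in vectors[i].items():
            centroid[term] += weight
    candidates = [i for i in range(len(abstracts)) if i not in relevant_ids]
    return sorted(candidates,
                  key=lambda i: cosine(vectors[i], centroid),
                  reverse=True)


# Hypothetical mini-corpus; in practice this would be the downloaded abstracts.
abstracts = [
    "Text mining supports citation screening in systematic reviews.",
    "Support vector machines for text categorization of documents.",
    "Soil erosion rates in alpine catchments after heavy rainfall.",
    "Machine learning for screening abstracts in systematic literature reviews.",
]
relevant_ids = {0}  # assumed researcher judgement after a first pass
ranking = rank_unlabelled(abstracts, relevant_ids)
print(ranking)
```

In this toy corpus the abstract on machine learning for screening is ranked first and the soil erosion abstract last. In an iterative setting, the researcher would inspect the top of the ranking, update `relevant_ids`, and rerun the ranking until the search output stabilises.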
Acknowledgements
This work has been partially funded by the Spanish Government Ministry of Economy and Competitiveness through the DEFINES project (Ref. TIN2016-80172-R) and by the Ministry of Education of the Junta de Castilla y León (Spain) through the T-CUIDA project (Ref. SA061P17).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by B. B. Gupta.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Marcos-Pablos, S., García-Peñalvo, F.J. Information retrieval methodology for aiding scientific database search. Soft Comput 24, 5551–5560 (2020). https://doi.org/10.1007/s00500-018-3568-0
DOI: https://doi.org/10.1007/s00500-018-3568-0