Information retrieval methodology for aiding scientific database search
During literature reviews, and especially systematic literature reviews, finding and screening relevant papers may involve managing and processing large amounts of unstructured text data. When the search topic is difficult to establish or has fuzzy limits, researchers need to broaden the scope of the search and, as a consequence, the data retrieved from scientific publications may become very large and loosely related. However, a suitable analysis of these data may allow the researcher to discover new knowledge hidden within the search output, thus exploring the limits of the search and enhancing the scope of the review. With that aim, this paper presents an iterative methodology that applies text mining and machine learning techniques to a corpus of abstracts downloaded from scientific databases. It combines automatic processing algorithms with tools for supervised decision-making in an iterative process guided by the researchers' judgement, so as to adapt, screen and tune the search output. The paper ends with a working example that uses a set of scripts developed to implement the different stages of the proposed methodology.
Keywords: Information retrieval; Systematic literature review; Text mining; Vector space model; Support vector machine
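As a rough illustration of the vector space model named in the keywords, the following minimal sketch represents each abstract as a TF-IDF vector and ranks the remaining abstracts by cosine similarity to one that the researcher has already judged relevant. It is not the authors' scripts: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_by_similarity`), the tokenization rule and the particular TF-IDF weighting are assumptions made for illustration, and the support vector machine stage is not covered.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Relative term frequency times log inverse document frequency.
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(abstracts, seed_index):
    """Rank all other abstracts by similarity to the abstract at seed_index."""
    vectors = tfidf_vectors(abstracts)
    seed = vectors[seed_index]
    scores = [
        (i, cosine(seed, v)) for i, v in enumerate(vectors) if i != seed_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

In an iterative screening loop of the kind described in the abstract, the researcher could inspect the top-ranked abstracts, mark them as relevant or not, and repeat the ranking with the updated set; how the published methodology actually performs that step is not specified here.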
This work has been partially funded by the Spanish Government Ministry of Economy and Competitiveness through the DEFINES project (Ref. TIN2016-80172-R) and by the Ministry of Education of the Junta de Castilla y Leon (Spain) through the T-CUIDA project (Ref. SA061P17).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.