Abstract
Literature-based discovery (LBD) aims to discover valuable latent relationships between disparate sets of literature. This paper presents the first inclusive scientometric overview of LBD research. We use a comprehensive scientometric approach incorporating CiteSpace to systematically analyze the LBD literature from the last four decades (1986–2020). After manual cleaning, we retrieved a total of 409 documents from six bibliographic databases and two preprint servers. Based on the number of papers published per year, the 35-year history of LBD can be partitioned into three phases: incubation (1986–2003), developing (2004–2008), and mature (2009–2020). The annual production of publications follows Price’s law. The co-authorship network exhibits many subnetworks, indicating that LBD research is composed of many small and medium-sized groups with little collaboration among them. Science mapping reveals that mainstream research in LBD shifted from baseline co-occurrence approaches to semantic-based methods at the beginning of the new millennium. In the last decade, LBD has leaned towards modern network science ideas. In applied terms, LBD is increasingly used for predicting adverse drug reactions and for drug repurposing. Besides theoretical considerations, researchers have put considerable effort into developing Web-based LBD applications. LBD is becoming increasingly interdisciplinary and involves methods from information science, scientometrics, and machine learning. Unfortunately, LBD is still mainly limited to the biomedical domain. Cascading citation expansion points to deep learning and explainable artificial intelligence as emerging topics in LBD. The results indicate that LBD is still growing and evolving.
Availability of data and materials
The data set discussed in this paper has been deposited in the public repository Zenodo (https://doi.org/10.5281/zenodo.3884422) and is freely available to the research community.
Code availability
R code to replicate the results of the study is accessible on the author’s GitHub page (https://github.com/akastrin/lbd-review).
Notes
A 2-generation forward expansion collects all papers connecting to the seed paper with two-step citation paths.
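The expansion described in this note can be illustrated with a breadth-first traversal over "cited by" links. The following is a minimal sketch, not the CiteSpace implementation: the function name `forward_expansion`, the `cited_by` dictionary, and the toy paper labels are all hypothetical, and the sketch assumes that first-generation citers are collected together with second-generation ones.

```python
from collections import deque


def forward_expansion(cited_by, seed, generations=2):
    """Collect papers reachable from `seed` by following 'cited by' links
    for at most `generations` steps. The seed itself is not returned."""
    seen = {seed}
    collected = set()
    frontier = deque([(seed, 0)])
    while frontier:
        paper, depth = frontier.popleft()
        if depth == generations:
            continue  # do not expand beyond the requested generation
        for citer in cited_by.get(paper, ()):
            if citer not in seen:
                seen.add(citer)
                collected.add(citer)
                frontier.append((citer, depth + 1))
    return collected


# Hypothetical toy data: key = paper, value = papers that cite it.
cited_by = {
    "seed": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
    "D": ["E"],  # E is three steps from the seed, so it is not collected
}
result = forward_expansion(cited_by, "seed", generations=2)
```

With this toy data, `result` contains A and B (first generation) plus C and D (second generation), while E is left out because it lies three citation steps from the seed.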
The Journal of Biomedical Discovery and Collaboration (DISCO) was an open access online journal that encompassed all aspects of scientific information management and studies of scientific practice. The journal connected disparate perspectives (e.g., informatics, computer science, sociology, cognitive psychology, scientometrics, public policy, technology innovation, and history and philosophy of science) and published several papers directly related to LBD. DISCO was published by BioMed Central from 2006 to 2008.
The sigma (\(\Sigma\)) index characterizes scientific novelty by combining centrality and burstness, two criteria of transformative discovery (Chen et al. 2009). Sigma is defined as \((\textit{centrality} + 1)^{\textit{burstness}}\).
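As a worked illustration of this definition, the short sketch below evaluates the formula directly. The function name `sigma` and the input values are hypothetical, and the check for non-negative inputs is an assumption of this sketch rather than part of the published definition.

```python
def sigma(centrality, burstness):
    """Sigma index as defined in the note: (centrality + 1) ** burstness."""
    if centrality < 0 or burstness < 0:
        # Assumption of this sketch: both inputs are non-negative scores.
        raise ValueError("centrality and burstness must be non-negative")
    return (centrality + 1.0) ** burstness


# Hypothetical values chosen only to show how the two factors interact.
no_burst = sigma(0.5, 0.0)       # no burst: sigma stays at its minimum of 1
no_centrality = sigma(0.0, 5.0)  # zero centrality: sigma is also 1
both = sigma(0.5, 2.0)           # 1.5 ** 2 = 2.25
```

Because the base is at least 1, a reference scores above 1 only when it has both non-zero centrality and non-zero burstness, which is why the index rewards nodes that are structurally and temporally prominent at the same time.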
To our knowledge, two LBD events have been organized in the past decade. The First International Workshop on the role of Semantic Web in Literature-Based Discovery (SWLBD 2012) was co-organized with the IEEE International Conference on Bioinformatics and Biomedicine in Philadelphia, USA (http://www.ischool.drexel.edu/ieeebibm/bibm12). At the time of this writing, Smalheiser and Sebastian organized the First International Workshop on Literature-Based Discovery (LBD2020), co-located with the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining in Singapore (http://scientificarbitrage.com/lbd-2020).
At the time of submission, we came across a new paper (Crichton et al. 2020) from Korhonen’s group that discusses the implementation of graph-based neural network methodology for open and closed LBD.
References
Abramo, G., D’Angelo, C. A., & Solazzi, M. (2011). The relationship between scientists’ research performance and the degree of internationalization of their research. Scientometrics, 86(3), 629–643. https://doi.org/10.1007/s11192-010-0284-7.
Ahlers, C. B., Hristovski, D., Kilicoglu, H., & Rindflesch, T. C. (2007). Using the literature-based discovery paradigm to investigate drug mechanisms. In AMIA annual symposium proceedings (pp. 6–10).
Ahmed, A. (2016). Literature-based discovery: Critical analysis and future directions. International Journal of Computer Science and Network Security, 16(7), 11–26.
Andronis, C., Sharma, A., Virvilis, V., Deftereos, S., & Persidis, A. (2011). Literature mining, ontologies and information visualization for drug repurposing. Briefings in Bioinformatics, 12(4), 357–368. https://doi.org/10.1093/bib/bbr005.
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007.
Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., et al. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012.
Bekhuis, T. (2006). Conceptual biology, hypothesis discovery, and text mining: Swanson’s legacy. Biomedical Digital Libraries. https://doi.org/10.1186/1742-5581-3-2.
Berthold, M. R. (Ed.). (2012). Bisociative knowledge discovery: An introduction to concept, algorithms, tools, and applications. In Lecture Notes in Artificial Intelligence. Berlin: Springer. https://doi.org/10.1007/978-3-642-31830-6.
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 10, P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008.
Bodenreider, O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research, 32(Database issue), D267–270. https://doi.org/10.1093/nar/gkh061.
Bornmann, L., & Mutz, R. (2015). Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology, 66(11), 2215–2222. https://doi.org/10.1002/asi.23329.
Bradford, S. C. (1934). Sources of information on specific subjects. Engineering, 137, 85–86.
Bruza, P., & Weeber, M. (Eds.). (2008). Literature-based discovery. Berlin: Springer. https://doi.org/10.1007/978-3-540-68690-3.
Callon, M., Courtial, J. P., Turner, W. A., & Bauin, S. (1983). From translations to problematic networks: An introduction to co-word analysis. Information (International Social Science Council), 22(2), 191–235. https://doi.org/10.1177/053901883022002003.
Cameron, D., Bodenreider, O., Yalamanchili, H., Danh, T., Vallabhaneni, S., Thirunarayan, K., et al. (2013). A graph-based recovery and decomposition of Swanson’s hypothesis using semantic predications. Journal of Biomedical Informatics, 46(2), 238–251. https://doi.org/10.1016/j.jbi.2012.09.004.
Cameron, D., Kavuluru, R., Rindflesch, T. C., Sheth, A. P., Thirunarayan, K., & Bodenreider, O. (2015). Context-driven automatic subgraph creation for literature-based discovery. Journal of Biomedical Informatics, 54, 141–157. https://doi.org/10.1016/j.jbi.2015.01.014.
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377. https://doi.org/10.1002/asi.20317.
Chen, C. (2013). Mapping scientific frontiers: The quest for knowledge visualization. London: Springer. https://doi.org/10.1007/978-1-4471-5128-9.
Chen, C. (2018). Cascading citation expansion. Journal of Information Science Theory and Practice, 6(2), 6–23.
Chen, C., Chen, Y., Horowitz, M., Hou, H., Liu, Z., & Pellegrino, D. (2009). Towards an explanatory and computational theory of scientific discovery. Journal of Informetrics, 3(3), 191–209. https://doi.org/10.1016/j.joi.2009.03.004.
Chen, C., Ibekwe-SanJuan, F., & Hou, J. (2010). The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis. Journal of the American Society for Information Science and Technology, 61(7), 1386–1409. https://doi.org/10.1002/asi.21309.
Chen, C., & Leydesdorff, L. (2014). Patterns of connections and movements in dual-map overlays: A new method of publication portfolio analysis. Journal of the Association for Information Science and Technology, 65(2), 334–351.
Chen, C., & Song, M. (2017). Representing scientific knowledge: The role of uncertainty. New York: Springer. https://doi.org/10.1007/978-3-319-62543-0.
Chen, C., & Song, M. (2019). Visualizing a field of research: A methodology of systematic scientometric reviews. PLoS ONE, 14(10), e0223994. https://doi.org/10.1371/journal.pone.0223994.
Chen, H., & Sharp, B. M. (2004). Content-rich biological network constructed by mining PubMed abstracts. BMC Bioinformatics, 5(1), 147. https://doi.org/10.1186/1471-2105-5-147.
Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2012). SciMAT: A new science mapping analysis software tool. Journal of the American Society for Information Science and Technology, 63(8), 1609–1630. https://doi.org/10.1002/asi.22688.
Cohen, A. M. (2005). A survey of current work in biomedical text mining. Briefings in Bioinformatics, 6(1), 57–71. https://doi.org/10.1093/bib/6.1.57.
Cohen, T., & Widdows, D. (2017). Embedding of semantic predications. Journal of Biomedical Informatics, 68, 150–166. https://doi.org/10.1016/j.jbi.2017.03.003.
Cohen, T., Widdows, D., Schvaneveldt, R. W., Davies, P., & Rindflesch, T. C. (2012). Discovering discovery patterns with predication-based semantic indexing. Journal of Biomedical Informatics, 45(6), 1049–1065. https://doi.org/10.1016/j.jbi.2012.07.003.
Cory, K. A. (1997). Discovering hidden analogies in an online humanities database. Computers and the Humanities, 31(1), 1–12. https://doi.org/10.1023/A:1000422220677.
Crichton, G., Baker, S., Guo, Y., & Korhonen, A. (2020). Neural networks for open and closed literature-based discovery. PLoS ONE, 15(5), e0232891. https://doi.org/10.1371/journal.pone.0232891.
Crichton, G., Guo, Y., Pyysalo, S., & Korhonen, A. (2018). Neural networks for link prediction in realistic biomedical graphs: A multi-dimensional evaluation of graph embedding-based approaches. BMC Bioinformatics. https://doi.org/10.1186/s12859-018-2163-9.
Davies, R. (1989). The creation of new knowledge by information retrieval and classification. Journal of Documentation, 45(4), 273–301. https://doi.org/10.1108/eb026846.
Davies, R. (1990). Generating new knowledge by retrieving information. Journal of Documentation, 46(4), 368–372. https://doi.org/10.1108/eb026868.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407. https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9.
Deftereos, S. N., Andronis, C., Friedla, E. J., Persidis, A., & Persidis, A. (2011). Drug repurposing and adverse event prediction using high-throughput literature analysis. WIREs Systems Biology and Medicine, 3(3), 323–334. https://doi.org/10.1002/wsbm.147.
DeShazo, J. P., LaVallie, D. L., & Wolf, F. M. (2009). Publication trends in the medical informatics literature: 20 years of “Medical Informatics” in MeSH. BMC Medical Informatics and Decision Making, 9(1), 7. https://doi.org/10.1186/1472-6947-9-7.
DiGiacomo, R. A., Kremer, J. M., & Shah, D. M. (1989). Fish-oil dietary supplementation in patients with Raynaud’s phenomenon: A double-blind, controlled, prospective study. The American Journal of Medicine, 86(2), 158–164. https://doi.org/10.1016/0002-9343(89)90261-1.
Ding, Y., Song, M., Han, J., Yu, Q., Yan, E., Lin, L., et al. (2013). Entitymetrics: Measuring the impact of entities. PLoS ONE, 8(8), e71416. https://doi.org/10.1371/journal.pone.0071416.
Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69(1), 131–152.
Eijk, C. C. v. d., Mulligen, E. M. v., Kors, J. A., Mons, B., & Berg, J. v.d. (2004). Constructing an associative concept space for literature-based discovery. Journal of the American Society for Information Science and Technology, 55(5), 436–444. https://doi.org/10.1002/asi.10392.
Frijters, R., van Vugt, M., Smeets, R., van Schaik, R., de Vlieg, J., & Alkema, W. (2010). Literature mining for the discovery of hidden connections between drugs, genes and diseases. PLOS Computational Biology. https://doi.org/10.1371/journal.pcbi.1000943.
Fuller, S. S., Revere, D., Bugni, P. F., & Martin, G. M. (2004). A knowledgebase system to enhance scientific discovery: Telemakus. Biomedical Digital Libraries, 1(1), 2. https://doi.org/10.1186/1742-5581-1-2.
Garfield, E. (1955). Citation indexes for science: A new dimension in documentation through association of ideas. Science, 122(3159), 108–111.
Godin, B. (2006). On the origins of bibliometrics. Scientometrics, 68(1), 109–133. https://doi.org/10.1007/s11192-006-0086-0.
Gopalakrishnan, V., Jha, K., Jin, W., & Zhang, A. (2019). A survey on literature based discovery approaches in biomedical domain. Journal of Biomedical Informatics, 93, 103141. https://doi.org/10.1016/j.jbi.2019.103141.
Gordon, M. D., & Dumais, S. (1998). Using latent semantic indexing for literature based discovery. Journal of the American Society for Information Science, 49(8), 674–685. https://doi.org/10.1002/(SICI)1097-4571(199806)49:8<674::AID-ASI2>3.0.CO;2-T.
Gordon, M. D., & Lindsay, R. K. (1996). Toward discovery support systems: A replication, re-examination, and extension of Swanson’s work on literature-based discovery of a connection between Raynaud’s and fish oil. Journal of the American Society for Information Science, 47(2), 116–128. https://doi.org/10.1002/(SICI)1097-4571(199602)47:2<116::AID-ASI3>3.0.CO;2-1.
Gould, R. V., & Fernandez, R. M. (1989). Structures of mediation: A formal approach to brokerage in transaction networks. Sociological Methodology, 19, 89–126. https://doi.org/10.2307/270949.
Henry, S., & McInnes, B. T. (2017). Literature based discovery: Models, methods, and trends. Journal of Biomedical Informatics, 74, 20–32. https://doi.org/10.1016/j.jbi.2017.08.011.
Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences, 102(46), 16569–16572. https://doi.org/10.1073/pnas.0507655102.
Hou, J., Yang, X., & Chen, C. (2018). Emerging trends and new developments in information science: A document co-citation analysis (2009–2016). Scientometrics, 115(2), 869–892. https://doi.org/10.1007/s11192-018-2695-9.
Hristovski, D., Friedman, C., Rindflesch, T. C., & Peterlin, B. (2006). Exploiting semantic relations for literature-based discovery. In AMIA annual symposium proceedings (pp. 349–353).
Hristovski, D., Kastrin, A., & Rindflesch, T. C. (2015). Semantics-based cross-domain collaboration recommendation in the life sciences: Preliminary results. In J. Pei, F. Silvestri, & J. Tang (Eds.), Proceedings of the 2015 IEEE/ACM international conference on advances in social networks analysis and mining 2015, association for computing machinery, Paris, France, ASONAM ’15 (pp. 805–806). https://doi.org/10.1145/2808797.2809300.
Hristovski, D., Peterlin, B., Mitchell, J. A., & Humphrey, S. M. (2005). Using literature-based discovery to identify disease candidate genes. International Journal of Medical Informatics, 74(2), 289–298. https://doi.org/10.1016/j.ijmedinf.2004.04.024.
Hristovski, D., Rindflesch, T., & Peterlin, B. (2013). Using literature-based discovery to identify novel therapeutic approaches. Cardiovascular & Hematological Agents in Medicinal Chemistry, 11(1), 14–24.
Hristovski, D., Stare, J., Peterlin, B., & Džeroski, S. (2001). Supporting discovery in medicine by association rule mining in Medline and UMLS. Studies in Health Technology and Informatics, 84(Pt 2), 1344–1348.
Hui, W., & Lau, W. K. (2019). Application of literature-based discovery in nonmedical disciplines: A survey. In Proceedings of the 2nd international conference on computing and big data, association for computing machinery, Taichung, Taiwan, ICCBD 2019 (pp. 7–11). https://doi.org/10.1145/3366650.3366660.
Jensen, L. J., Saric, J., & Bork, P. (2006). Literature mining for the biologist: From information retrieval to biological discovery. Nature Reviews Genetics, 7(2), 119–129. https://doi.org/10.1038/nrg1768.
Jenssen, T. K., Lægreid, A., Komorowski, J., & Hovig, E. (2001). A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics, 28(1), 21–28. https://doi.org/10.1038/ng0501-21.
Jha, K., & Jin, W. (2016). Mining hidden knowledge from the counterterrorism dataset using graph-based approach. In E. Métais, F. Meziane, M. Saraee, V. Sugumaran, & S. Vadera (Eds.), Natural language processing and information systems (pp. 310–317). Cham: Springer. https://doi.org/10.1007/978-3-319-41754-7_29.
Kastrin, A., Rindflesch, T. C., & Hristovski, D. (2016). Link prediction on a network of co-occurring MeSH terms: Towards literature-based discovery. Methods of Information in Medicine, 55(4), 340–346. https://doi.org/10.3414/ME15-01-0108.
Katukuri, J. R., Xie, Y., Raghavan, V. V., & Gupta, A. (2012). Hypotheses generation as supervised link discovery with automated class labeling on large-scale biomedical concept networks. BMC Genomics, 13(3), S5. https://doi.org/10.1186/1471-2164-13-S3-S5.
Kilicoglu, H., Shin, D., Fiszman, M., Rosemblat, G., & Rindflesch, T. C. (2012). SemMedDB: A PubMed-scale repository of biomedical semantic predications. Bioinformatics, 28(23), 3158–3160. https://doi.org/10.1093/bioinformatics/bts591.
Kleinberg, J. (2003). Bursty and hierarchical structure in streams. Data Mining and Knowledge Discovery, 7(4), 373–397. https://doi.org/10.1023/A:1024940629314.
Kostoff, R. N. (2014). Literature-related discovery: Common factors for Parkinson’s disease and Crohn’s disease. Scientometrics, 100(3), 623–657. https://doi.org/10.1007/s11192-014-1298-3.
Kostoff, R. N., & Briggs, M. B. (2008). Literature-related discovery (LRD): Potential treatments for Parkinson’s disease. Technological Forecasting and Social Change, 75(2), 226–238. https://doi.org/10.1016/j.techfore.2007.11.007.
Kozomara, A., & Griffiths-Jones, S. (2011). miRBase: Integrating microRNA annotation and deep-sequencing data. Nucleic Acids Research, 39(suppl-1), D152–D157. https://doi.org/10.1093/nar/gkq1027.
Lee, S., & Bozeman, B. (2005). The impact of research collaboration on scientific productivity. Social Studies of Science, 35(5), 673–702. https://doi.org/10.1177/0306312705052359.
Lever, J., Gakkhar, S., Gottlieb, M., Rashnavadi, T., Lin, S., Siu, C., et al. (2017). A collaborative filtering-based approach to biomedical knowledge discovery. Bioinformatics, 34(4), 652–659. https://doi.org/10.1093/bioinformatics/btx613.
Li, J., Yin, Y., Fortunato, S., & Wang, D. (2019). Nobel laureates are almost the same as us. Nature Reviews Physics, 1, 301–303. https://doi.org/10.1038/s42254-019-0057-z.
Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P., et al. (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. PLOS Medicine, 6(7), e1000100. https://doi.org/10.1371/journal.pmed.1000100.
Lindsay, R. K., & Gordon, M. D. (1999). Literature-based discovery by lexical statistics. Journal of the American Society for Information Science, 50(7), 574–587. https://doi.org/10.1002/(SICI)1097-4571(1999)50:7<574::AID-ASI3>3.0.CO;2-Q.
Mower, J., Subramanian, D., Shang, N., & Cohen, T. (2017). Classification-by-analogy: Using vector representations of implicit relationships to identify plausibly causal drug/side-effect relationships. In AMIA annual symposium proceedings (pp. 1940–1949).
Nadri, H., Rahimi, B., Timpka, T., & Sedghi, S. (2017). The top 100 articles in the medical informatics: A bibliometric analysis. Journal of Medical Systems, 41(10), 150. https://doi.org/10.1007/s10916-017-0794-4.
Noyons, E. C., Moed, H. F., & Luwel, M. (1999). Combining mapping and citation analysis for evaluative bibliometric purposes: A bibliometric study. Journal of the American Society for Information Science, 50(2), 115–131. https://doi.org/10.1002/(SICI)1097-4571(1999)50:2<115::AID-ASI3>3.0.CO;2-J.
Petrič, I., Cestnik, B., Lavrač, N., & Urbančič, T. (2012). Outlier detection in cross-context link discovery for creative literature mining. The Computer Journal, 55(1), 47–61. https://doi.org/10.1093/comjnl/bxq074.
Petrič, I., Urbančič, T., Cestnik, B., & Macedoni-Lukšič, M. (2009). Literature mining method RaJoLink for uncovering relations between biomedical concepts. Journal of Biomedical Informatics, 42(2), 219–227. https://doi.org/10.1016/j.jbi.2008.08.004.
Pratt, W., & Yetisgen-Yildiz, M. (2003). LitLinker: Capturing connections across the biomedical literature. In Proceedings of the 2nd international conference on Knowledge capture, Association for Computing Machinery, Sanibel Island, FL, USA, K-CAP ’03 (pp. 105–112). https://doi.org/10.1145/945645.945662.
Price, D. J. D. S. (1963). Little science, big science. New York, NY: Columbia University Press.
Price, D. J. D. S. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
Pritchard, A. (1969). Statistical bibliography or bibliometrics. Journal of Documentation, 25(4), 348–349.
Pyysalo, S., Baker, S., Ali, I., Haselwimmer, S., Shah, T., Young, A., et al. (2019). LION LBD: A literature-based discovery system for cancer biology. Bioinformatics, 35(9), 1553–1561. https://doi.org/10.1093/bioinformatics/bty845.
Rindflesch, T. C., & Fiszman, M. (2003). The interaction of domain knowledge and linguistic structure in natural language processing: Interpreting hypernymic propositions in biomedical text. Journal of Biomedical Informatics, 36(6), 462–477. https://doi.org/10.1016/j.jbi.2003.11.003.
Sang, S., Yang, Z., Liu, X., Wang, L., Zhang, Y., Lin, H., et al. (2018). A knowledge graph based bidirectional recurrent neural network method for literature-based discovery. In H. J. Zheng, Z. Callejas, D. Griol, H. Wang, X. Hu, H. Schmidt, J. Baumbach, J. Dickerson, & L. Zhang (Eds.), 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE (pp. 751–752). https://doi.org/10.1109/BIBM.2018.8621423.
Schuemie, M., Talmon, J., Moorman, P., & Kors, J. (2009). Mapping the domain of medical informatics. Methods of Information in Medicine, 48(01), 76–83. https://doi.org/10.3414/ME0576.
Sebastian, Y., Siew, E. G., & Orimaye, S. O. (2017a). Emerging approaches in literature-based discovery: Techniques and performance review. The Knowledge Engineering Review. https://doi.org/10.1017/S0269888917000042.
Sebastian, Y., Siew, E. G., & Orimaye, S. O. (2017b). Learning the heterogeneous bibliographic information network for literature-based discovery. Knowledge-Based Systems, 115, 66–79. https://doi.org/10.1016/j.knosys.2016.10.015.
Shang, N., Xu, H., Rindflesch, T. C., & Cohen, T. (2014). Identifying plausible adverse drug reactions using knowledge extracted from the literature. Journal of Biomedical Informatics, 52, 293–310. https://doi.org/10.1016/j.jbi.2014.07.011.
Shneider, A. M. (2009). Four stages of a scientific discipline; four types of scientist. Trends in Biochemical Sciences, 34(5), 217–223. https://doi.org/10.1016/j.tibs.2009.02.002.
Smalheiser, N., & Swanson, D. (1994). Assessing a gap in the biomedical literature: Magnesium deficiency and neurologic disease. Neuroscience Research Communications, 15(1), 1–9.
Smalheiser, N. R. (2012). Literature-based discovery: Beyond the ABCs. Journal of the American Society for Information Science and Technology, 63(2), 218–224. https://doi.org/10.1002/asi.21599.
Smalheiser, N. R. (2017). Rediscovering Don Swanson: The past, present and future of literature-based discovery. Journal of Data and Information Science, 2(4), 43–64. https://doi.org/10.1515/jdis-2017-0019.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265–269. https://doi.org/10.1002/asi.4630240406.
Song, D., & Bruza, P. (2006). Text based knowledge discovery with information flow analysis. In D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern, J. C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Sudan, D. Terzopoulos, D. Tygar, M. Y. Vardi, G. Weikum, X. Zhou, J. Li, H. T. Shen, M. Kitsuregawa, & Y. Zhang (Eds.), Frontiers of WWW Research and Development—APWeb 2006 (Vol. 3841, pp. 692–701). Berlin: Springer. https://doi.org/10.1007/11610113_60.
Srinivasan, P. (2004). Text mining: Generating hypotheses from MEDLINE. Journal of the American Society for Information Science and Technology, 55(5), 396–413. https://doi.org/10.1002/asi.10389.
Stapley, B. J., & Benoit, G. (2000). Biobibliometrics: Information retrieval and visualization from co-occurrences of gene names in Medline abstracts. In Pacific Symposium on Biocomputing (pp. 529–540). https://doi.org/10.1142/9789814447331_0050.
Swanson, D. R. (1986a). Fish oil, Raynaud’s syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1), 7–18. https://doi.org/10.1353/pbm.1986.0087.
Swanson, D. R. (1986b). Undiscovered public knowledge. The Library Quarterly, 56(2), 103–118. https://doi.org/10.1086/601720.
Swanson, D. R. (1988). Migraine and magnesium: Eleven neglected connections. Perspectives in Biology and Medicine, 31(4), 526–557. https://doi.org/10.1353/pbm.1988.0009.
Swanson, D. R. (2011). Literature-based resurrection of neglected medical discoveries. Journal of Biomedical Discovery and Collaboration, 6, 34–47. https://doi.org/10.5210/disco.v6i0.3515.
Swanson, D. R., & Smalheiser, N. R. (1997). An interactive system for finding complementary literatures: A stimulus to scientific discovery. Artificial Intelligence, 91(2), 183–203. https://doi.org/10.1016/S0004-3702(97)00008-8.
Swanson, D. R., & Smalheiser, N. R. (1999). Implicit text linkages between Medline records: Using Arrowsmith as an aid to scientific discovery. Library Trends, 48(1), 48–59.
Sybrandt, J., Carrabba, A., Herzog, A., & Safro, I. (2018a). Are abstracts enough for hypothesis generation? In N. Abe, H. Liu, C. Pu, X. Hu, N. Ahmed, M. Qiao, Y. Song, D. Kossmann, B. Liu, K. Lee, J. Tang, J. He, & J. Saltz (Eds.), 2018 IEEE International Conference on Big Data (Big Data) (pp. 1504–1513). https://doi.org/10.1109/BigData.2018.8621974.
Sybrandt, J., Shtutman, M., & Safro, I. (2018b). Large-scale validation of hypothesis generation systems via candidate ranking. In N. Abe, H. Liu, C. Pu, X. Hu, N. Ahmed, M. Qiao, Y. Song, D. Kossmann, B. Liu, K. Lee, J. Tang, J. He, & J. Saltz (Eds.), 2018 IEEE International Conference on Big Data (Big Data) (pp. 1494–1503). https://doi.org/10.1109/BigData.2018.8622637.
Thilakaratne, M., Falkner, K., & Atapattu, T. (2019a). A systematic review on literature-based discovery: General overview, methodology & statistical analysis. ACM Computing Surveys, 52(6), 129:1–129:34. https://doi.org/10.1145/3365756.
Thilakaratne, M., Falkner, K., & Atapattu, T. (2019b). A systematic review on literature-based discovery workflow. PeerJ Computer Science, 5, e235. https://doi.org/10.7717/peerj-cs.235.
Thonon, F., Boulkedid, R., Delory, T., Rousseau, S., Saghatchian, M., van Harten, W., et al. (2015). Measuring the outcome of biomedical research: A systematic literature review. PLoS ONE, 10(4), e0122239. https://doi.org/10.1371/journal.pone.0122239.
Uzzi, B., Mukherjee, S., Stringer, M., & Jones, B. (2013). Atypical combinations and scientific impact. Science, 342(6157), 468–472. https://doi.org/10.1126/science.1240474.
van Eck, N., & Waltman, L. (2009). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538. https://doi.org/10.1007/s11192-009-0146-3.
Weeber, M., Klein, H., de Jong-van den Berg, L. T., & Vos, R. (2001). Using concepts in literature-based discovery: Simulating Swanson’s Raynaud-fish oil and migraine-magnesium discoveries. Journal of the American Society for Information Science and Technology, 52(7), 548–557. https://doi.org/10.1002/asi.1104.
Weeber, M., Kors, J. A., & Mons, B. (2005). Online tools to support literature-based discovery in the life sciences. Briefings in Bioinformatics, 6(3), 277–286. https://doi.org/10.1093/bib/6.3.277.
Weeber, M., Vos, R., Klein, H., de Berg, L. T. W., Aronson, A. R., & Molema, G. (2003). Generating hypotheses by discovering implicit associations in the literature: A case report of a search for new potential therapeutic uses for thalidomide. Journal of the American Medical Informatics Association, 10(3), 252–259. https://doi.org/10.1197/jamia.M1158.
Wei, C. H., Allot, A., Leaman, R., & Lu, Z. (2019). PubTator central: Automated concept annotation for biomedical full text articles. Nucleic Acids Research, 47(W1), W587–W593. https://doi.org/10.1093/nar/gkz389.
Widdows, D., & Cohen, T. (2015). Reasoning with vectors: A continuous model for fast robust inference. Logic Journal of the IGPL, 23(2), 141–173. https://doi.org/10.1093/jigpal/jzu028.
Wilkowski, B., Fiszman, M., Miller, C. M., Hristovski, D., Arabandi, S., Rosemblat, G., & Rindflesch, T. C. (2011). Graph-based methods for discovery browsing with semantic predications. In AMIA annual symposium proceedings (pp. 1514–1523).
Wren, J. D. (2004). Extending the mutual information measure to rank inferred literature relationships. BMC Bioinformatics, 5(1), 145. https://doi.org/10.1186/1471-2105-5-145.
Wren, J. D., Bekeredjian, R., Stewart, J. A., Shohet, R. V., & Garner, H. R. (2004). Knowledge discovery by automated identification and ranking of implicit relationships. Bioinformatics, 20(3), 389–398. https://doi.org/10.1093/bioinformatics/btg421.
Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop and small teams disrupt science and technology. Nature, 566(7744), 378. https://doi.org/10.1038/s41586-019-0941-9.
Yang, H. T., Ju, J. H., Wong, Y. T., Shmulevich, I., & Chiang, J. H. (2017). Literature-based discovery of new candidates for drug repurposing. Briefings in Bioinformatics, 18(3), 488–497. https://doi.org/10.1093/bib/bbw030.
Ying, R., Bourgeois, D., You, J., Zitnik, M., & Leskovec, J. (2019). GNNExplainer: Generating explanations for graph neural networks. Advances in Neural Information Processing Systems, 32, 9240–9251.
Zhao, S., Su, C., Lu, Z., & Wang, F. (2020). Recent advances in biomedical literature mining. Briefings in Bioinformatics. https://doi.org/10.1093/bib/bbaa057.
Zitnik, M., Nguyen, F., Wang, B., Leskovec, J., Goldenberg, A., & Hoffman, M. M. (2019). Machine learning for integrating data in biology and medicine: Principles, practice, and opportunities. Information Fusion, 50, 71–91. https://doi.org/10.1016/j.inffus.2018.09.012.
Acknowledgements
The authors thank Petra Hrovat Hristovski for proof-reading the manuscript. The authors also thank Halil Kilicoglu for his helpful comments and suggestions.
Funding
Authors were supported by the Slovenian Research Agency (Grant Nos. Z5-9352 (AK) and J5-1780 (DH)).
Author information
Contributions
AK conceived the study, collected the data, performed data analysis, and wrote the manuscript. DH contributed with critical revisions of the manuscript. Both authors read and approved the final version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Kastrin, A., Hristovski, D. Scientometric analysis and knowledge mapping of literature-based discovery (1986–2020). Scientometrics 126, 1415–1451 (2021). https://doi.org/10.1007/s11192-020-03811-z
DOI: https://doi.org/10.1007/s11192-020-03811-z