Making Biomedical Sciences publications more accessible for machines

Abstract

With the rapidly expanding catalogue of scientific publications, especially within the Biomedical Sciences field, it is becoming increasingly difficult for researchers to search for, read or even interpret emerging scientific findings. PubMed, just one of the current biomedical data repositories, comprises over 33 million citations for biomedical research, and over 2500 publications are added each day. To further strengthen the impact of biomedical research, we suggest that there should be more synergy between publications and machines. By bringing machines into the realm of research and publication, we can greatly augment the assessment, investigation and cataloging of the biomedical literary corpus. The effective application of machine-based manuscript assessment and interpretation is now crucial, and potentially stands as the most effective way for researchers to comprehend and process the tsunami of biomedical data and literature. Many biomedical manuscripts are currently published online in poorly searchable document types, with figures and data presented in formats that are partially inaccessible to machine-based approaches. The structure and format of biomedical manuscripts should be adapted to facilitate machine-assisted interrogation of this important literary corpus. In this context, it is important to embrace the concept that biomedical scientists should also write manuscripts that can be read by machines. Closer human–machine synergy in reading biomedical publications is likely to improve biomedical data retrieval substantially and reveal novel insights into complex datasets.

Funding

This work is supported by Fonds Wetenschappelijk Onderzoek: 42/FA010100/32/6484, 1198020N.

Author information

Corresponding author

Correspondence to Stuart Maudsley.

Ethics declarations

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could result in a conflict of interest.

About this article

Cite this article

Van Meenen, J., Leysen, H., Chen, H. et al. Making Biomedical Sciences publications more accessible for machines. Med Health Care and Philos 25, 179–190 (2022). https://doi.org/10.1007/s11019-022-10069-0

  • DOI: https://doi.org/10.1007/s11019-022-10069-0

Keywords

  • Interoperability
  • Machine
  • Research
  • Open access
  • Reproducibility