Nonhypothesis-Driven Research: Data Mining and Knowledge Discovery

Clinical Research Informatics

Abstract

Clinical information, stored over time and increasingly linked to other types of information such as environmental and social determinants of health and healthcare claims, is a potentially rich data source for clinical research. Knowledge discovery in databases (KDD) is a process for pattern discovery and predictive modeling in large databases. KDD encompasses and makes extensive use of data-mining methods—automated processes and algorithms that enable pattern recognition and classification. Characteristically, KDD involves the use of machine learning methods developed in the domain of artificial intelligence and information retrieval. These methods, which include both structure learning and parameter learning, have been applied to healthcare and biomedical data for various purposes with good success and potential or realized clinical translation. We introduce the Fayyad model of knowledge discovery in databases and describe the steps of the process, providing select examples from clinical research informatics. These steps range from initial data selection and preparation to interpretation and evaluation. Commonly used data-mining methods are surveyed: artificial neural networks, decision-tree induction, support vector machines (kernel methods), association-rule induction, k-nearest neighbor, and probabilistic methods such as Bayesian networks. We link methods for evaluating the models that result from the KDD process to methods used in diagnostic medicine, spotlighting measures derived from a confusion matrix and receiver operating characteristic curve analysis and, more recently, uncertainty quantification and conformal prediction. Throughout the chapter, we discuss salient aspects of biomedical data management and use, including applications, the use of FAIR principles, pipelines and infrastructure for KDD, and future directions.
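The link the abstract draws between model evaluation and diagnostic medicine can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the chapter: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions chosen for the example. It derives sensitivity, specificity, and predictive values from a confusion matrix, and computes the area under the ROC curve using its standard interpretation as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely; an undefined measure is reported as NaN."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_measures(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Measures familiar from diagnostic test evaluation."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (assumed for illustration): 1 = condition present, 0 = absent.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    print(diagnostic_measures(labels, predictions))
    print(roc_auc(labels, scores))
```

The pairwise AUC computation is quadratic in the number of cases, which is fine for a demonstration but would normally be replaced by a rank-based calculation or a library routine on large clinical datasets.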

Author information

Corresponding author

Correspondence to Mollie R. Cummins.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Cummins, M.R., Nachimuthu, S.K., Abdelrahman, S.E., Facelli, J.C., Gouripeddi, R. (2023). Nonhypothesis-Driven Research: Data Mining and Knowledge Discovery. In: Richesson, R.L., Andrews, J.E., Fultz Hollis, K. (eds) Clinical Research Informatics. Health Informatics. Springer, Cham. https://doi.org/10.1007/978-3-031-27173-1_20

  • DOI: https://doi.org/10.1007/978-3-031-27173-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27172-4

  • Online ISBN: 978-3-031-27173-1

  • eBook Packages: Medicine, Medicine (R0)
