European Journal of Epidemiology, Volume 33, Issue 3, pp 245–257

Epidemiology in wonderland: Big Data and precision medicine

  • Rodolfo Saracci


Big Data and precision medicine, two major contemporary challenges for epidemiology, are critically examined from two different angles. In Part 1, Big Data collected for research purposes (Big research Data) and Big Data used for research although collected for other primary purposes (Big secondary Data) are discussed in the light of the fundamental common requirement of data validity, which prevails over “bigness”. Precision medicine is discussed by developing the key point that high relative risks are, as a rule, required to make a variable or combination of variables suitable for predicting disease occurrence, outcome or response to treatment; the commercial proliferation of allegedly predictive tests of unknown or poor validity is also commented on. Part 2 proposes a “wise epidemiology” approach to: (a) choosing, in a context imprinted by Big Data and precision medicine, epidemiological research projects actually relevant to population health, (b) training epidemiologists, (c) investigating the impact on clinical practices and the doctor-patient relation of the influx of Big Data and computerized medicine and (d) clarifying whether today “health” may be redefined, as some maintain, in purely technological terms.
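The point about high relative risks can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes a single binary marker, treats the relative odds of disease between marker carriers and non-carriers as the measure of association (close to the relative risk when the disease is rare), and uses the hypothetical function name `detection_rate`. Given the share of unaffected people who carry the marker (the false-positive rate), it returns the share of affected people who carry it (the detection rate).

```python
def detection_rate(odds_ratio: float, false_positive_rate: float) -> float:
    """Share of affected people carrying a binary marker (sensitivity).

    Assumption (illustrative, not from the article): the marker is binary and
    the disease odds ratio equals the exposure odds ratio, so the odds of
    carrying the marker among the affected are `odds_ratio` times the odds of
    carrying it among the unaffected.
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must lie strictly between 0 and 1")

    # Odds of carrying the marker among unaffected people.
    odds_unaffected = false_positive_rate / (1.0 - false_positive_rate)
    # Odds of carrying the marker among affected people.
    odds_affected = odds_ratio * odds_unaffected
    # Convert the odds back to a proportion.
    return odds_affected / (1.0 + odds_affected)


# Detection rate at a fixed 5% false-positive rate for increasing odds ratios.
for odds_ratio in (2, 10, 100):
    rate = detection_rate(odds_ratio, 0.05)
    print(f"odds ratio {odds_ratio:>3}: detection rate {rate:.1%}")
```

Under these assumptions, a marker carried by 5% of unaffected people identifies only about 10% of future cases at an odds ratio of 2 and about 34% at an odds ratio of 10; it takes an odds ratio near 100 to reach about 84%. An association that is strong by aetiological standards can therefore still be a weak predictor for individuals.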


Keywords: Big data; Datome; Doctor-patient relation; Epidemiological research; Epidemiology training; Health definition; Population health; Precision for commerce; Precision medicine; Validity; Wise epidemiology



I wish to thank Albert Hofman for his invitation to write this essay and for his patience in waiting for it.



Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  1. Lyon, France
