Biomedical innovation and translation increasingly emphasize research using “big data.” The hope is that big data methods will both speed up research and make its results more applicable to “real-world” patients and health services. While big data research has been embraced by scientists, politicians, industry, and the public, numerous ethical, organizational, and technical/methodological concerns have also been raised. One view holds that the technical and methodological concerns will be resolved through sophisticated information technologies, predictive algorithms, and data analysis techniques. While such advances will likely go some way towards resolving these issues, we believe that the epistemological issues raised by big data research have important ethical implications and raise questions about the very possibility of big data research achieving its goals.
We would like to thank Associate Professor Ainsley Newson for her helpful guidance on an earlier version of this article.
Research related to this article has been funded by the National Health and Medical Research Council (Career Development Fellowship APP1036539 and Project Grant APP1083980).
Conflict of Interest
The authors have no conflicts of interest.
Cite this article
Lipworth, W., Mason, P.H., Kerridge, I. et al. Ethics and Epistemology in Big Data Research. Bioethical Inquiry 14, 489–500 (2017). https://doi.org/10.1007/s11673-017-9771-3