Assembling Biomedical Big Data

  • Sabina Leonelli


This chapter examines the challenges involved in disseminating, integrating and analyzing large datasets collected within both clinical and research settings. I highlight the technical, ethical and epistemic concerns underlying attempts to portray and use big data as revolutionary tools for producing biomedical knowledge and related interventions. When data collected on human subjects are brought together with data collected from other organisms, significant differences emerge between the experimental cultures of biologists and clinicians; if left unchallenged, these differences may compromise the quality and validity of large-scale, cross-species data integration. The study of data integration calls attention to the fragmented, localized and inherently translational nature of biomedical research, and to the challenges underlying the assemblage and interpretation of big data in this domain.



Some of the material in this chapter is based on the following paper: Leonelli, S. (2012) When Humans Are the Exception: Cross-Species Databases at the Interface of Clinical and Biological Research. Social Studies of Science 42(2): 214–236. The empirical research for that paper was funded by the UK Economic and Social Research Council, as part of the ESRC Centre for Genomics in Society; the research used to reframe and update that work was funded by the European Research Council grant award 335925 (“The Epistemology of Data-Intensive Science”). I am grateful to Alberto Cambrosio, four anonymous referees and my colleagues in Egenis for their feedback.



Copyright information

© The Author(s) 2018

Authors and Affiliations

  • Sabina Leonelli, Exeter University, Exeter, UK