Big Data-Revolution oder Datenhybris?

Überlegungen zum Datenpositivismus der Molekularbiologie
  • Gabriele Gramelsberger
Articles

Summary

Genome data, the centerpiece of the big data revolution in biology proclaimed in 2008, are sequenced and analyzed fully automatically. The shift from the manual laboratory practice of electrophoresis sequencing to DNA sequencing machines and software-based analysis programs took place between 1982 and 1992. Only this shift made possible the flood of data, which is growing considerably with the second and third generations of DNA sequencers. Yet with this shift, the validation strategies for genome data are changing as well. The article examines both automation and the validation culture associated with it, in order to give a picture of the complexity of data generation and its data positivism. The guiding question is whether this data positivism is the foundation of the currently announced big data revolution in molecular biology or its data hubris.

Keywords

Gene sequencing, automation, validation, Human Genome Project, base-calling algorithms, big data

Big Data Revolution or Data Hubris?

On the Data Positivism of Molecular Biology

Abstract

Genome data, the core of the big data revolution in biology proclaimed in 2008, are generated and analyzed automatically. The transition from the manual laboratory practice of electrophoresis sequencing to automated DNA sequencing machines and software-based analysis programs took place between 1982 and 1992. This transition enabled the first data deluge, which grew considerably with the second and third generations of DNA sequencers during the 2000s. However, the strategies for evaluating sequence data were transformed along with it. The paper explores both the computational strategies of automation and the data evaluation culture connected with them, in order to provide a complete picture of the complexity of today’s data generation and its intrinsic data positivism. It is guided by the question of whether this data positivism is the basis of the big data revolution in molecular biology announced today, or whether it marks the beginning of its data hubris.

Keywords

genome sequencing, automation, validation, human genome project, base-calling algorithms, big data

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Zentrum für interdisziplinäre Wissenschafts- und Technikforschung, RWTH Aachen, Aachen, Germany