Abstract
We live in a data-rich world with rapidly growing databases holding zettabytes of data. Innovation, computation, and technological advances have now tremendously accelerated the pace of discovery, providing driverless cars, robotic devices, expert healthcare systems, precision medicine, and automated discovery, to mention a few. Even though the definition of the term data science continues to evolve, the sweeping impact it has already produced on society is undeniable. We are at a point when new discoveries through data science have enormous potential to advance progress but also to be used maliciously, with harmful ethical and social consequences. Perhaps nowhere is this more clearly exemplified than in the biological and medical sciences. The confluence of (1) machine learning, (2) mathematical modeling, (3) computation/simulation, and (4) big data has moved us from the sequencing of genomes to gene editing and individualized medicine; yet, unsettled policies regarding data privacy and ethical norms could potentially open doors for serious negative repercussions. The data science revolution has amplified the urgent need for a paradigm shift in undergraduate biology education. It has reaffirmed that data science education interacts with and enhances mathematical education in advancing quantitative conceptual and skill development for the new generation of biologists. These connections encourage us to strive to cultivate a broadly skilled workforce of technologically savvy problem-solvers, skilled at handling the unique challenges pertaining to biological data, and capable of collaborating across various disciplines in the sciences, the humanities, and the social sciences. To accomplish this, we suggest development of open curricula that extend beyond the job certification rhetoric and combine data acumen with modeling, experimental, and computational methods through engaging projects, while also providing awareness and deep exploration of their societal implications.
This process would benefit from embracing the pedagogy of experiential learning and involving students in open-ended explorations derived from authentic inquiries and ongoing research. On this foundation, we encourage development of flexible data science initiatives for the education of life science undergraduates within and across existing models.
Additional information
The first author was partially supported by the Karl Peace Fellowship in Mathematics, Randolph-Macon College, VA. The third author was supported by NSF Award #DBI-1300426 to the University of Tennessee.
Cite this article
Robeva, R.S., Jungck, J.R. & Gross, L.J. Changing the Nature of Quantitative Biology Education: Data Science as a Driver. Bull Math Biol 82, 127 (2020). https://doi.org/10.1007/s11538-020-00785-0
DOI: https://doi.org/10.1007/s11538-020-00785-0