Abstract
Data science has advanced significantly in recent years, allowing scientists to apply large-scale data analysis techniques using open-source coding frameworks. It is a tool that science and engineering students should be taught alongside their chosen domain knowledge. An applied data science minor gives students an understanding of data and data handling, as well as of statistics and model development. Teaching data science in this way will improve the reproducibility and openness of research, allow for greater interdisciplinarity, and enable more analyses focused on critical scientific challenges.
Cite this article
French, R.H., Bruckman, L.S. Learnings from developing an applied data science curricula for undergraduate and graduate students. MRS Advances 5, 347–353 (2020). https://doi.org/10.1557/adv.2020.135
DOI: https://doi.org/10.1557/adv.2020.135