Alivisatos, P. (2017). STEM and computer science education: Preparing the 21st century workforce. Research and Technology Subcommittee, House Committee on Science, Space, and Technology.
Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired Magazine, 16(7).
Aravkin, A., & Davis, D. (2016). A smart stochastic algorithm for nonconvex optimization with applications to robust machine learning. arXiv preprint arXiv:1610.01101.
American Statistical Association. (2014). Curriculum guidelines for undergraduate programs in statistical science. Retrieved March 3, 2009, from http://www.amstat.org/education/curriculumguidelines.cfm.
Barnes, N. (2010). Publish your computer code: It is good enough. Nature News, 467(7317), 753.
Barocas, S., Boyd, D., Friedler, S., & Wallach, H. (2017). Social and technical trade-offs in data science.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
Bhardwaj, A. (2017). What is the difference between data science and statistics? https://priceonomics.com/whats-the-difference-between-data-science-and/.
Blei, D. M., & Smyth, P. (2017). Science and data science. Proceedings of the National Academy of Sciences, 114(33), 8689–8692.
Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In: Advances in Neural Information Processing Systems (pp. 4349–4357).
Bottou, L., Curtis, F. E., & Nocedal, J. (2016). Optimization methods for large-scale machine learning. arXiv preprint arXiv:1606.04838.
Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231.
Buckheit, J. B., & Donoho, D. L. (1995). WaveLab and reproducible research. In Wavelets and statistics (pp. 55–81). Springer.
Bühlmann, P., & van de Geer, S. (2018). Statistics for big data: A perspective. Statistics and Probability Letters.
Bühlmann, P., & Meinshausen, N. (2016). Magging: maximin aggregation for inhomogeneous large-scale data. Proceedings of the IEEE, 104(1), 126–135.
Bühlmann, P., & Stuart, A. M. (2016). Mathematics, statistics and data science. EMS Newsletter, 100, 28–30.
Chambers, J. M. (1993). Greater or lesser statistics: A choice for future research. Statistics and Computing, 3(4), 182–184.
Cleveland, W. S. (2001). Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review, 69(1), 21–26.
Conway, D. (2010). The data science Venn diagram.
Crawford, K. (2017). The trouble with bias. Invited talk, Conference on Neural Information Processing Systems.
De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., et al. (2017). Curriculum guidelines for undergraduate programs in data science. Annual Review of Statistics and Its Application, 4, 15–30.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745–766.
Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
Efron, B., & Hastie, T. (2016). Computer age statistical inference (Vol. 5). Cambridge: Cambridge University Press.
Eick, S. G., Graves, T. L., Karr, A. F., Marron, J., & Mockus, A. (2001). Does code decay? Assessing the evidence from change management data. IEEE Transactions on Software Engineering, 27(1), 1–12.
Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (1996). Advances in knowledge discovery and data mining (Vol. 21). Menlo Park: AAAI Press.
Felder, R. M., & Brent, R. (2016). Teaching and learning STEM: A practical guide. Hoboken: Wiley.
Freitas, A. A. (2014). Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter, 15(1), 1–10.
Gentleman, R., Carey, V., Huber, W., Irizarry, R., & Dudoit, S. (2006). Bioinformatics and computational biology solutions using R and Bioconductor. Berlin: Springer.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Book in preparation for MIT Press. http://www.deeplearningbook.org.
Graves, T. L., Karr, A. F., Marron, J., & Siy, H. (2000). Predicting fault incidence using software change history. IEEE Transactions on Software Engineering, 26(7), 653–661.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (2011). Robust statistics: The approach based on influence functions (Vol. 114). Hoboken: Wiley.
Hand, D. J. (2006). Classifier technology and the illusion of progress. Statistical Science, 21(1), 1–14.
Hardin, J., Hoerl, R., Horton, N. J., Nolan, D., Baumer, B., Hall-Holt, O., et al. (2015). Data science in statistics curricula: Preparing students to “think with data”. The American Statistician, 69(4), 343–353.
Hicks, S. C., & Irizarry, R. A. (2017). A guide to teaching data science. The American Statistician (just accepted).
Hooker, G., & Hooker, C. (2017). Machine learning and the future of realism. arXiv preprint arXiv:1704.04688.
Huber, P. J. (2011). Robust statistics. In International Encyclopedia of Statistical Science (pp. 1248–1251). Springer.
Doumont, J.-L. (2009). Trees, maps, and theorems. Brussels: Principiae.
Kiar, G., Bridgeford, E., Chandrashekhar, V., Mhembere, D., Burns, R., Roncal, W. G., et al. (2017). A comprehensive cloud framework for accurate and reliable human connectome estimation and meganalysis. bioRxiv, 188706.
Knuth, D. E. (1984). Literate programming. The Computer Journal, 27(2), 97–111.
Kross, S., Peng, R. D., Caffo, B. S., Gooding, I., & Leek, J. T. (2017). The democratization of data science education. PeerJ Preprints.
Leek, J. T., & Peng, R. D. (2015). Opinion: Reproducible research can still be wrong: Adopting a prevention approach. Proceedings of the National Academy of Sciences, 112(6), 1645–1646.
Lipton, Z. C. (2016). The mythos of model interpretability. arXiv preprint arXiv:1606.03490.
Lu, X., Marron, J., & Haaland, P. (2014). Object-oriented data analysis of cell images. Journal of the American Statistical Association, 109(506), 548–559.
Maronna, R., Martin, R. D., & Yohai, V. (2006). Robust statistics (Vol. 1). Chichester: Wiley.
Marron, J. (1999). Effective writing in mathematical statistics. Statistica Neerlandica, 53(1), 68–75.
Marron, J. (2017). Big data in context and robustness against heterogeneity. Econometrics and Statistics, 2, 73–80.
Marron, J., & Alonso, A. M. (2014). Overview of object oriented data analysis. Biometrical Journal, 56(5), 732–753.
R Project Members. (2017). The R Project for Statistical Computing. https://www.r-project.org/.
Naur, P. (1974). Concise survey of computer methods.
Cancer Genome Atlas Network. (2012). Comprehensive molecular characterization of human colon and rectal cancer. Nature, 487(7407), 330–337.
Nolan, D., & Temple Lang, D. (2010). Computing in the statistics curricula. The American Statistician, 64(2), 97–107.
O’Neil, C. (2017). Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.
Patil, D. (2011). Building data science teams. O’Reilly Media, Inc.
Patil, P., Peng, R. D., & Leek, J. (2016). A statistical definition for reproducibility and replicability. bioRxiv, 066803.
Peng, R. D. (2011). Reproducible research in computational science. Science, 334(6060), 1226–1227.
Perez, F., & Granger, B. E. (2015). Project Jupyter: Computational narratives as the engine of collaborative data science. Technical report, Project Jupyter.
Pizer, S. M., & Marron, J. (2017). Object statistics on curved manifolds. In Statistical Shape and Deformation Analysis: Methods, Implementation and Applications (p. 137).
Reid, N. (2018). Statistical science in the world of big data. Statistics and Probability Letters.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144). ACM.
Russell, S., & Norvig, P. (2009). Artificial intelligence: A modern approach. Englewood Cliffs: Prentice Hall.
Sandve, G. K., Nekrutenko, A., Taylor, J., & Hovig, E. (2013). Ten simple rules for reproducible computational research. PLoS Computational Biology, 9(10), e1003285.
Smith, M. T., Zwiessele, M., & Lawrence, N. D. (2016). Differentially private Gaussian processes. arXiv preprint arXiv:1606.00720.
Sonnenburg, S., Braun, M. L., Ong, C. S., Bengio, S., Bottou, L., Holmes, G., et al. (2007). The need for open source software in machine learning. Journal of Machine Learning Research, 8(Oct), 2443–2466.
Staudte, R. G., & Sheather, S. J. (2011). Robust estimation and testing (Vol. 918). Hoboken: Wiley.
Stodden, V. (2012). Reproducible research for scientific computing: Tools and strategies for changing the culture. Computing in Science and Engineering, 14(4), 13–17.
Tao, T. (2007). What is good mathematics? Bulletin of the American Mathematical Society, 44(4), 623–634.
Tukey, J. W. (1962). The future of data analysis. The Annals of Mathematical Statistics, 33(1), 1–67.
Wang, H., & Marron, J. (2007). Object oriented data analysis: Sets of trees. The Annals of Statistics, 1849–1873.
Wasserman, L. (2014). Rise of the machines. In Past, present, and future of statistical science (pp. 1–12).
Wickham, H. (2015). R packages: Organize, test, document, and share your code. O’Reilly Media, Inc.
Wilson, G., Aruliah, D. A., Brown, C. T., Hong, N. P. C., Davis, M., Guy, R. T., et al. (2014). Best practices for scientific computing. PLoS Biology, 12(1), e1001745.
Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., & Teal, T. K. (2017). Good enough practices in scientific computing. PLoS Computational Biology, 13(6), e1005510.
Wu, C. (1998). Statistics = data science? http://www2.isye.gatech.edu/~jeffwu/presentations/datascience.pdf.
Xie, Y. (2015). Dynamic Documents with R and knitr (Vol. 29). Boca Raton: CRC Press.
Yu, B. (2014). IMS presidential address: Let us own data science. http://bulletin.imstat.org/2014/10/ims-presidential-address-let-us-own-data-science/.
Zarsky, T. (2016). The trouble with algorithmic decisions: An analytic road map to examine efficiency and fairness in automated and opaque decision making. Science, Technology, and Human Values, 41(1), 118–132.