Statistical Computing and Data Science in Introductory Statistics

  • Karsten Lübke
  • Matthias Gehrke
  • Norman Markgraf
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


In recent years, there has been a movement towards simulation-based inference (e.g. bootstrapping and randomization tests) to improve students’ understanding of statistical reasoning, as well as a call to introduce statistical computing and reproducible analysis into the curriculum. With the help of the R package mosaic and the concept of minimal R, we were able to include all of this in an introductory statistics course for students who pursue a business-related major while working. Moreover, this paves the way towards methods and concepts such as data wrangling and algorithmic modelling, which are more closely related to data science than to classical statistics.



We thank Oliver Gansser, Bianca Krol, Sebastian Sauer, and numerous other colleagues for their contributions to the proposed change of the curriculum and for helpful comments that improved the teaching materials. We also thank Nathan Tintle for his support with the CAOS inventory. The remarks of Nicholas Horton, Randall Pruim, and two reviewers greatly improved this paper. We gratefully acknowledge that our work was supported by an internal teaching innovation grant from our institution.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Karsten Lübke (1)
  • Matthias Gehrke (2)
  • Norman Markgraf (3)

  1. FOM University of Applied Sciences, Dortmund, Germany
  2. FOM University of Applied Sciences, Frankfurt, Germany
  3. FOM University of Applied Sciences, Essen, Germany
