Data science pedagogical tools and practices: A systematic literature review

Education and Information Technologies

Abstract

The development of data science curricula has gained attention in academia and industry. Yet, less is known about the pedagogical practices and tools employed in data science education. Through a systematic literature review, we summarize the pedagogical practices and tools used in data science initiatives at the higher education level. Following the Technological Pedagogical Content Knowledge (TPACK) framework, we aim to characterize the technological and pedagogical knowledge quality of the reviewed studies, as we find the content presented to be diverse and incomparable. TPACK is a well-established framework for teaching with information and communication technology, yet it is seldom used to analyze data science pedagogy. To apply this framework in a more structured way, we list the tools employed in each reviewed study to summarize technological knowledge quality. We further examine whether each study meets the requirements of Cognitive Apprenticeship theory to summarize its pedagogical knowledge quality. Of the 23 reviewed studies, 14 met the requirements of Cognitive Apprenticeship theory: they include hands-on experiences, promote students’ active learning, encourage students to seek guidance from the instructor as a coach, introduce students to the real-world industry demands of data and data scientists, and provide meaningful learning resources and feedback across various stages of their data science initiatives. While each study presents at least one tool to teach data science, we found the technological knowledge of data science initiatives difficult to assess, because the studies fall short of explaining how students come to learn the operation of tools and become proficient in using them throughout a course or program. Our review aims to highlight implications for practices and tools used in data science pedagogy for future research.

Data availability

Data sharing does not apply to this article as no datasets were generated or analyzed during the current study.

Acknowledgements

This study was funded by the Canada Research Chair Program and the Canada Foundation for Innovation.

Author information

Corresponding author

Correspondence to Bahar Memarian.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Memarian, B., Doleck, T. Data science pedagogical tools and practices: A systematic literature review. Educ Inf Technol (2023). https://doi.org/10.1007/s10639-023-12102-y

  • DOI: https://doi.org/10.1007/s10639-023-12102-y
