Abstract
Data visualization technologies are powerful tools for telling evidence-based narratives about oneself and the world. This paper contributes to the literature on data science education by examining the sociotechnical practices of data wrangling, that is, strategies for selecting and managing large, aggregated datasets to produce a model and story. We examined the learning opportunities related to data wrangling practices by investigating youth’s talk-in-interaction as they assembled models and stories about family migration using interactive data visualization tools and large socioeconomic datasets. We first identified ten sociotechnical practices that characterize youth’s interaction with tools and collaboration in data wrangling. We then suggest four categories of activities that describe patterns of learning related to these practices: addressing missing data, understanding data aggregation, exploring the social or historical events that underlie the formation of data patterns, and varying the visual encoding of data for storytelling. Understanding these practices and activities is important for supporting future data science education opportunities that facilitate learning and discussion about scientific and socioeconomic issues. This study also sheds light on how the family migration modeling context positions youth as having agency and authority over the data, and it contributes to the design of CSCL environments that tackle the challenges of data wrangling.
Notes
All participant names are pseudonyms.
Transcript conventions: CAPITALS indicate emphasis; (Observer notes) indicate significant gesture; ? indicates rising intonation; ! indicates exclamations; [ indicates overlapping talk; , or . indicates pauses of less than a half-second; … indicates pauses longer than a half-second.
Acknowledgements
This work was supported by the National Science Foundation under grant number 1341882. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Jiang, S., Kahn, J. Data wrangling practices and collaborative interactions with aggregated data. Intern. J. Comput.-Support. Collab. Learn 15, 257–281 (2020). https://doi.org/10.1007/s11412-020-09327-1
DOI: https://doi.org/10.1007/s11412-020-09327-1