Statistical and Stochastic Analysis of Sequence Data

Part of the book series: Computer-Supported Collaborative Learning Series (CULS, volume 19)

Abstract

Two common CSCL questions regarding analyses of temporal data, such as event sequences, are: (i) What variables are related to event attributes? and (ii) What is the process (or what are the processes) that generated the events? The first question is best answered with statistical methods, the second with stochastic or deterministic process modeling methods. This chapter provides an overview of statistical and stochastic methods of direct relevance to CSCL research. Many of the statistical analyses are integrated into statistical discourse analysis. From the stochastic modeling repertoire, the basic hidden Markov model and its recent extensions are introduced, ending with dynamic Bayesian models as the current best integration. Looking into the near future, we identify opportunities for a closer alignment of qualitative with quantitative methods for temporal analysis, afforded by developments such as the automation of quantitative methods and advances in computational modeling.

Further Readings

  • Abrahamson, D., Blikstein, P., & Wilensky, U. (2007). Classroom model, model classroom: Computer-supported methodology for investigating collaborative-learning pedagogy. In C. Chinn, G. Erkens, & S. Puntambekar (Eds.), Proceedings of the eighth International Conference on Computer Supported Collaborative Learning (CSCL) (Vol. 8, Part 1, pp. 49–58). International Society of the Learning Sciences. A powerful demonstration of how (deterministic) computational modeling can interact with empirical (classroom) research. Using the agent-based modeling tool NetLogo, the authors analyze the mechanisms that lead to the emergence of stratified learning zones in a prototypical collaborative classroom activity. Also important because it highlights the tension between solving problems collaboratively and learning from collaboration.

  • Bergner, Y., Walker, E., & Ogan, A. (2017). Dynamic Bayesian Network models for peer tutoring interactions. In A. A. von Davier, M. Zhu, & P. C. Kyllonen (Eds.), Innovative assessment of collaboration (pp. 249–268). Springer. This chapter provides a nice illustration of the use of modern HMM approaches to analyzing (peer) tutorial dialogue. While an important area of collaborative learning, research on tutor–tutee dialogue is only partially reflected in the CSCL literature, with this chapter providing a welcome connection between CSCL, AI in Education, and assessment research. It includes an application in the context of an empirical study.


  • Chiu, M. M. (2008). Flowing toward correct contributions during groups’ mathematics problem solving: A statistical discourse analysis. Journal of the Learning Sciences, 17(3), 415–463. This empirical study applied statistical discourse analysis to test whether (a) groups that created more correct, new ideas (micro-creativity) were more likely to solve a problem and (b) students’ recent actions (microtime context of evaluations, questions, justifications, politeness, and status differences) increased subsequent micro-creativity.


  • Chiu, M. M., & Lehmann-Willenbrock, N. (2016). Statistical discourse analysis: Modeling sequences of individual behaviors during group interactions across time. Group Dynamics: Theory, Research, and Practice, 20(3), 242–258. This article showcases statistical discourse analysis, a method that integrates most of the above methods (parallel chats, trees, group/individual differences, pivotal events, time periods, multiple target events, indirect effects, later group outcomes) and addresses related issues (e.g., missing data, inter-rater reliability, false positives).

  • Reimann, P. (2009). Time is precious: Variable- and event-centred approaches to process analysis in CSCL research. International Journal of Computer-Supported Collaborative Learning, 4, 239–257. This methodological paper provides an overview of qualitative, quantitative, and computational methods for analyzing temporal data in CSCL. It argues that there is a fundamental difference between explaining collaboration over time in terms of variables and explaining it in terms of events. Implications for doing temporal analysis are discussed.

Author information

Corresponding author

Correspondence to Ming Ming Chiu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Chiu, M.M., Reimann, P. (2021). Statistical and Stochastic Analysis of Sequence Data. In: Cress, U., Rosé, C., Wise, A.F., Oshima, J. (eds) International Handbook of Computer-Supported Collaborative Learning. Computer-Supported Collaborative Learning Series, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-030-65291-3_29

  • DOI: https://doi.org/10.1007/978-3-030-65291-3_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65290-6

  • Online ISBN: 978-3-030-65291-3

  • eBook Packages: Education, Education (R0)
