ProcData: An R Package for Process Data Analysis

  • Application Reviews and Case Studies
Psychometrika

Abstract

Process data refer to data recorded in the log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents’ problem-solving behaviors. Process data analysis aims to improve the accuracy of educational assessment and to serve other assessment purposes by using the rich information contained in response processes. The R package ProcData presented in this article provides tools for inspecting, processing, and analyzing process data. We define an S3 class ‘proc’ for organizing process data and extend the generic methods summary and print to ‘proc’ objects. Feature extraction methods implemented in the package compress the information in irregular response processes into regular numeric vectors. ProcData also provides functions for making predictions from neural-network-based sequence models. In addition, the package includes a real dataset of response processes from the climate control item in the 2012 Programme for International Student Assessment.

Notes

  1. ProcData 0.2.5 is used to produce the results in the case study. This version is available on CRAN at https://cran.r-project.org/src/contrib/Archive/ProcData/ or on GitHub at https://github.com/xytangtang/ProcData/tree/95a3658?.

  2. http://www.oecd.org/pisa/test-2012/testquestions/question3/.

Acknowledgements

This research is supported in part by NSF 1826540 and NSF 2119938.

Author information

Corresponding author

Correspondence to Jingchen Liu.

Appendix

This appendix contains the output of print() in Sect. 4.3.

Cite this article

Tang, X., Zhang, S., Wang, Z. et al. ProcData: An R Package for Process Data Analysis. Psychometrika 86, 1058–1083 (2021). https://doi.org/10.1007/s11336-021-09798-7
