ProcData: An R Package for Process Data Analysis

Tang, Xueying; Zhang, Susu; Wang, Zhi; Liu, Jingchen; Ying, Zhiliang

doi:10.1007/s11336-021-09798-7

Application Reviews and Case Studies
Published: 11 August 2021

Psychometrika, Volume 86, pages 1058–1083 (2021)

Xueying Tang¹,
Susu Zhang²,
Zhi Wang³,
Jingchen Liu ORCID: orcid.org/0000-0002-4937-2601³ &
…
Zhiliang Ying³

Abstract

Process data refer to data recorded in the log files of computer-based items. These data, represented as timestamped action sequences, keep track of respondents’ problem-solving behaviors. Process data analysis aims to improve the accuracy of educational assessment and to serve other assessment purposes by using the rich information contained in response processes. The R package ProcData presented in this article provides tools for inspecting, processing, and analyzing process data. We define an S3 class ‘proc’ for organizing process data and extend the generic methods summary and print to ‘proc’ objects. The package implements feature extraction methods that compress the information in irregular response processes into regular numeric vectors. ProcData also provides functions for fitting neural-network-based sequence models and making predictions from them. In addition, the package includes a real dataset of response processes from the climate control item in the 2012 Programme for International Student Assessment.
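To make the idea of compressing irregular response processes into regular numeric vectors concrete, the following is a minimal sketch in Python. It is not ProcData's API and not the package's extraction method: it uses simple unigram and bigram counts as a stand-in technique, and the respondent ids, action names, timestamps, and the helper names `ngrams` and `extract_features` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy log: respondent id -> list of (timestamp_seconds, action).
# Action names are invented for illustration, not taken from the PISA item.
logs = {
    "r1": [(0.0, "start"), (1.2, "move_top"), (3.5, "apply"), (6.0, "end")],
    "r2": [(0.0, "start"), (0.8, "move_top"), (1.9, "move_mid"),
           (2.7, "apply"), (4.1, "reset"), (9.3, "end")],
    "r3": [(0.0, "start"), (5.0, "end")],
}


def ngrams(actions, n):
    """Return all runs of n consecutive actions as tuples."""
    return [tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)]


def extract_features(logs, n=2):
    """Map variable-length action sequences to equal-length numeric vectors.

    Stand-in technique (an assumption of this sketch): counts of unigrams
    and n-grams, plus sequence length and total response time.
    """
    orders = sorted({1, n})  # avoid double counting when n == 1
    action_seqs = {rid: [a for _, a in seq] for rid, seq in logs.items()}

    # Shared vocabulary across respondents, sorted for a stable column order.
    vocab = sorted({g for seq in action_seqs.values()
                    for k in orders for g in ngrams(seq, k)})

    features = {}
    for rid, seq in action_seqs.items():
        counts = Counter(g for k in orders for g in ngrams(seq, k))
        times = [t for t, _ in logs[rid]]
        total_time = times[-1] - times[0]
        # Count columns first, then sequence length and total time.
        features[rid] = [counts[g] for g in vocab] + [len(seq), total_time]
    return vocab, features


vocab, features = extract_features(logs)
for rid, vec in features.items():
    print(rid, vec)
```

Every respondent ends up with a vector of the same length regardless of how many actions were logged, which is the property that lets standard regression or classification tools be applied afterwards.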

Notes

ProcData 0.2.5 was used to produce the results in the case study. This version is available on CRAN at https://cran.r-project.org/src/contrib/Archive/ProcData/ or on GitHub at https://github.com/xytangtang/ProcData/tree/95a3658?.
http://www.oecd.org/pisa/test-2012/testquestions/question3/.

Acknowledgements

This research is supported in part by NSF 1826540 and NSF 2119938.

Author information

Authors and Affiliations

University of Arizona, Tucson, USA
Xueying Tang
University of Illinois at Urbana-Champaign, Urbana-Champaign, USA
Susu Zhang
Columbia University, New York, NY, USA
Zhi Wang, Jingchen Liu & Zhiliang Ying

Corresponding author

Correspondence to Jingchen Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix contains the output of print() in Sect. 4.3.

About this article

Cite this article

Tang, X., Zhang, S., Wang, Z. et al. ProcData: An R Package for Process Data Analysis. Psychometrika 86, 1058–1083 (2021). https://doi.org/10.1007/s11336-021-09798-7

Received: 16 August 2020
Revised: 29 June 2021
Accepted: 12 July 2021
Published: 11 August 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s11336-021-09798-7
