Latent Feature Extraction for Process Data via Multidimensional Scaling

  • Theory and Methods
Psychometrika

Abstract

Computer-based interactive items have become prevalent in recent educational assessments. In such items, the detailed human–computer interaction process, known as the response process, is recorded in a log file. The recorded response processes provide great opportunities to understand individuals’ problem-solving processes. However, these data are difficult to analyze because they are high-dimensional sequences in a nonstandard format. This paper aims to extract useful information from response processes. In particular, we consider an exploratory analysis that extracts latent variables from process data through a multidimensional scaling framework. A dissimilarity measure is described to quantify the discrepancy between two response processes. The proposed method is applied to both simulated data and real process data from 14 PSTRE items in PIAAC 2012. A prediction procedure is used to examine the information contained in the extracted latent variables. We find that the extracted latent variables preserve a substantial amount of the information in the response processes and have reasonable interpretability. We also show empirically that process data contain more information than classic binary item responses in terms of out-of-sample prediction of many variables.

Author information

Correspondence to Jingchen Liu.

Additional information

Liu and Ying’s research is supported by National Science Foundation Grants SES-1826540 and IIS-1633360. He’s research is supported by National Science Foundation Grant IIS-1633353. The authors would like to thank Educational Testing Service for providing the data, and Hok Kan Ling for cleaning it.

Appendix

To compare the prediction performance of features extracted from different dissimilarity measures, we compute the Levenshtein distance matrix of the action sequences for each item and extract features using Procedure 1, with the number of features K chosen by five-fold cross-validation. With these newly extracted features, we repeat the score prediction experiment using multiple items (Section 4.5.2). All other settings are the same as before. Figure 12 compares the resulting prediction performance with the results in the main text. Although the \(\text{OSR}^2\) for the Levenshtein distance features is lower than that for the features extracted previously, it is still higher than that of the baseline model, and the general trend of \(\text{OSR}^2\) as the number of items increases is similar.
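As a rough illustration of the steps described in this appendix, the following Python sketch computes a Levenshtein distance matrix for a few toy action sequences and extracts K features from it. It is a minimal sketch under stated assumptions, not the paper's implementation: the action names are hypothetical, classical (Torgerson) MDS via eigendecomposition stands in for Procedure 1, K is fixed at 2 rather than chosen by cross-validation, and `osr2` uses a commonly used definition of out-of-sample R-squared that may differ from the one in the main text.

```python
import numpy as np


def levenshtein(a, b):
    """Edit distance between two action sequences (any indexable sequences)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def distance_matrix(sequences):
    """Symmetric matrix of pairwise Levenshtein distances."""
    n = len(sequences)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = levenshtein(sequences[i], sequences[j])
    return D


def classical_mds(D, k):
    """Classical (Torgerson) MDS: an assumed stand-in for Procedure 1."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    # Edit distances need not be Euclidean, so clip negative eigenvalues.
    lam = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(lam)


def osr2(y_test, y_pred, y_train_mean):
    """Out-of-sample R^2 relative to predicting the training mean (assumed form)."""
    sse = np.sum((y_test - y_pred) ** 2)
    sst = np.sum((y_test - y_train_mean) ** 2)
    return 1.0 - sse / sst


# Hypothetical action sequences for one item.
sequences = [
    ["start", "click_a", "next"],
    ["start", "click_b", "next"],
    ["start", "next"],
    ["start", "click_a", "click_a", "next"],
]
D = distance_matrix(sequences)
X = classical_mds(D, k=2)  # one row of K = 2 features per respondent
```

Each row of `X` can then be used as a feature vector in a downstream prediction model. In practice K would be tuned, for example by cross-validation as described above, rather than fixed in advance.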

Fig. 12. Comparison of score prediction for features extracted based on different dissimilarity measures. “OSS” and “L” stand for the measure used in the main text and the Levenshtein distance, respectively.

Cite this article

Tang, X., Wang, Z., He, Q. et al. Latent Feature Extraction for Process Data via Multidimensional Scaling. Psychometrika 85, 378–397 (2020). https://doi.org/10.1007/s11336-020-09708-3

  • DOI: https://doi.org/10.1007/s11336-020-09708-3
