Automated Item Generation with Recurrent Neural Networks

von Davier, Matthias

doi:10.1007/s11336-018-9608-y

Published: 12 March 2018

Psychometrika, Volume 83, pages 847–857 (2018)

Matthias von Davier (ORCID: orcid.org/0000-0003-1298-9701)

Abstract

Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource, and testing agencies incur high costs in the continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types, such as those found in language-free intelligence tests (e.g., Raven’s Progressive Matrices), or on an extensive analysis of task components and the derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
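To make the notion of a probabilistic language model concrete, the following is a minimal sketch and not the method of the article. It swaps the recurrent neural network for a much simpler character-level bigram model, which only conditions on the single preceding character, whereas a recurrent network conditions on the whole preceding sequence. The toy corpus, the sentinel symbols, and all function names are assumptions introduced for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus of personality-style statements (invented for
# illustration; not taken from the article or from any item pool).
CORPUS = [
    "I enjoy meeting new people.",
    "I enjoy quiet evenings at home.",
    "I make plans and stick to them.",
    "I worry about small things.",
]

START, END = "^", "$"  # sentinel symbols marking item boundaries


def fit_bigram_model(items):
    """Count character-to-next-character transitions over all items."""
    counts = defaultdict(Counter)
    for item in items:
        symbols = [START] + list(item) + [END]
        for prev, nxt in zip(symbols, symbols[1:]):
            counts[prev][nxt] += 1
    return counts


def next_char_probs(counts, prev):
    """Estimate P(next character | previous character) from the counts."""
    total = sum(counts[prev].values())
    return {ch: n / total for ch, n in counts[prev].items()}


def generate(counts, rng, max_len=80):
    """Sample one character at a time until the end symbol is drawn."""
    out, prev = [], START
    for _ in range(max_len):
        probs = next_char_probs(counts, prev)
        chars = list(probs)
        prev = rng.choices(chars, weights=[probs[c] for c in chars])[0]
        if prev == END:
            break
        out.append(prev)
    return "".join(out)


model = fit_bigram_model(CORPUS)
sample = generate(model, random.Random(0))
print(sample)
```

With so little context the sampled text is mostly gibberish. The sketch is only meant to show the two ingredients shared with larger models: an estimated conditional distribution over the next symbol, and generation by repeated sampling from it.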

Notes

Thanks to an excellent suggestion from reviewers of the first draft, it was decided to collect actual response data using automatically generated items and to compare these with response data from published human-generated personality items. Additional experiments with dropout, another reviewer suggestion that allows networks to be trained with a form of regularization, will be conducted in future research.

Author information

Authors and Affiliations

National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA, 19104-3102, USA
Matthias von Davier

Corresponding author

Correspondence to Matthias von Davier.

About this article

Cite this article

von Davier, M. Automated Item Generation with Recurrent Neural Networks. Psychometrika 83, 847–857 (2018). https://doi.org/10.1007/s11336-018-9608-y

Received: 23 March 2017
Revised: 27 December 2017
Published: 12 March 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s11336-018-9608-y
