Abstract
Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource, and testing agencies incur high costs in continuously renewing the item banks that sustain their testing programs. Using algorithms instead holds the promise of providing an unlimited resource for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types, such as those found in language-free intelligence tests (e.g., Raven’s progressive matrices), or on an extensive analysis of task components and the derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. Researchers utilizing these previous approaches may be unlikely to view the proposed approach favorably; however, recent applications of machine learning have succeeded in solving tasks that seemed impossible for machines not long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike those that Google Brain and Amazon Alexa use for language processing and generation.
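To make the phrase "probabilistic language model" concrete: such a model assigns a probability to each possible next character given the text so far, and new text is generated by sampling one character at a time from those distributions. The sketch below illustrates only that sampling loop. As a stated simplification, it swaps the deep recurrent network for a count-based character bigram model; the function names, the `temperature` parameter, and the sample sentences are illustrative assumptions, not the article's implementation.

```python
import math
import random
from collections import defaultdict


def train_bigram(corpus):
    """Count character bigrams; '^' marks the start and '$' the end of an item."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in corpus:
        chars = ["^"] + list(line) + ["$"]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def next_char_distribution(counts, prev, temperature=1.0):
    """P(next char | prev), with probabilities proportional to count ** (1 / temperature)."""
    options = counts[prev]
    logits = {c: math.log(n) / temperature for c, n in options.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    weights = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def sample_item(counts, rng, temperature=1.0, max_len=80):
    """Generate one item by repeatedly sampling the next character until '$' or max_len."""
    out, prev = [], "^"
    while len(out) < max_len:
        dist = next_char_distribution(counts, prev, temperature)
        chars = list(dist)
        nxt = rng.choices(chars, weights=[dist[c] for c in chars], k=1)[0]
        if nxt == "$":
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Illustrative, made-up personality-style statements (not items from the article).
corpus = [
    "I enjoy meeting new people.",
    "I worry about things.",
    "I keep my room tidy.",
]
model = train_bigram(corpus)
rng = random.Random(0)
for _ in range(3):
    print(sample_item(model, rng, temperature=0.8))
```

Lowering `temperature` concentrates probability on the most frequent continuations; raising it flattens the distribution and yields more varied but less well-formed text. A recurrent network would replace the single-character context `prev` with a hidden state that summarizes the entire prefix.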
Notes
Thanks to an excellent suggestion from reviewers of the first draft, it was decided to collect actual response data using automatically generated items and to compare these with response data from published human-generated personality items. Additional experiments with dropout, another reviewer suggestion that allows networks to be trained with a form of regularization, will be conducted in future research.
Cite this article
von Davier, M. Automated Item Generation with Recurrent Neural Networks. Psychometrika 83, 847–857 (2018). https://doi.org/10.1007/s11336-018-9608-y