The Curse of Explanation: Model Selection in Language Testing Research

Choi, Ikkyu

doi:10.1007/978-981-15-8952-2_9

Abstract

Language testing researchers often use statistical models to approximate and study a true model (i.e., the underlying system that is responsible for generating data). Building a model that successfully approximates the true model is not an easy task and typically involves data-driven model selection. However, available tools for model selection cannot guarantee successful reproduction of the true model. Moreover, there are consequences of model selection that affect the quality of inferences. Introducing and illustrating some of these issues related to model selection is the goal of this chapter. In particular, I focus on three issues: (1) uncertainty due to model selection in statistical inference, (2) successful approximations of data with an incorrect model, and (3) existence of substantively different models whose statistical counterparts are highly comparable. I conclude with a call for explicitly acknowledging and justifying model selection processes, as laid out in Bachman’s research use argument framework (2006, 2009).
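
The first issue, uncertainty due to model selection, can be illustrated with a small simulation. The sketch below is not taken from the chapter; it is a minimal illustration under stated assumptions: the outcome is pure noise, the candidate predictors are independent standard normal variables, and the function name, sample sizes, and the hard-coded critical value (roughly the two-sided 5% cutoff for a t distribution with 98 degrees of freedom) are all hypothetical choices made for this example. It compares a test on a predictor fixed in advance with the same test applied to whichever predictor looks best in the data.

```python
import numpy as np


def naive_rejection_rates(n=100, p=10, reps=2000, t_crit=1.9845, seed=0):
    """Estimate how often a naive 5% t test rejects when y is unrelated to X.

    Returns a pair of rates:
      - for a predictor chosen before seeing the data (column 0), and
      - for the predictor selected because it correlates most with y.
    t_crit is an assumed two-sided 5% critical value for n - 2 = 98 df.
    """
    rng = np.random.default_rng(seed)
    fixed_hits = 0
    selected_hits = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # generated independently of X
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        # Pearson correlation of y with each column of X
        r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        # t statistic for each simple regression slope
        t = r * np.sqrt((n - 2) / (1.0 - r**2))
        fixed_hits += int(abs(t[0]) > t_crit)
        selected_hits += int(np.max(np.abs(t)) > t_crit)
    return fixed_hits / reps, selected_hits / reps


if __name__ == "__main__":
    fixed, selected = naive_rejection_rates()
    print(f"pre-specified predictor: {fixed:.3f}")
    print(f"data-selected predictor: {selected:.3f}")
```

Under these assumptions, the pre-specified test should reject at close to the nominal rate, while the test on the selected predictor should reject far more often, even though no predictor is related to the outcome. The selection step itself is a source of uncertainty that the naive test ignores.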

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Bachman, L. F. (1982). The trait structure of cloze test scores. TESOL Quarterly, 16, 61–70.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing, 17, 1–42.
Bachman, L. F. (2006). Generalizability: A journey into the nature of empirical research in applied linguistics. In M. Chalhoub-Deville, C. A. Chapelle, & P. Duff (Eds.), Inference and generalizability in applied linguistics: Multiple perspectives (pp. 165–207). Dordrecht, The Netherlands: John Benjamins.
Bachman, L. F. (2009). Generalizability and research use arguments. In K. Ercikan & W-M. Roth (Eds.), Generalizing from educational research (pp. 127–148). New York, NY: Taylor & Francis.
Bachman, L. F. (2013). Ongoing challenges in language assessment. In A. J. Kunnan (Ed.), The companion to language assessment. Hoboken, NJ: Wiley-Blackwell.
Bachman, L. F., & Palmer, A. S. (1981). The construct validation of the FSI oral interview. Language Learning, 31, 67–86.
Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative proficiency. TESOL Quarterly, 16, 444–465.
Bae, J., & Bachman, L. F. (1998). A latent variable approach to listening and reading: Testing factorial invariance across two groups of children in the Korean/English two-way immersion program. Language Testing, 15, 380–414.
Bae, J., & Bachman, L. F. (2010). An investigation of four writing traits and two tasks across two languages. Language Testing, 27, 213–234.
Bellman, R. E. (1961). Adaptive control processes. Princeton, NJ: Princeton University Press.
Berk, R. A. (2016). Statistical learning from a regression perspective (2nd ed.). New York, NY: Springer.
Berk, R. A., Brown, L., Buja, A., Zhang, K., & Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics, 41, 802–837.
Berk, R. A., Brown, L., & Zhao, L. (2010). Statistical inference after model selection. Journal of Quantitative Criminology, 26, 217–236.
Berk, R. A., & Freedman, D. A. (2003). Statistical assumptions as empirical commitments. In T. G. Blomberg & S. Cohen (Eds.), Law, punishment, and social control: Essays in honor of Sheldon Messinger (pp. 235–254). New York, NY: Aldine de Gruyter.
Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.
Breiman, L. (2001a). Statistical modeling: The two cultures. Statistical Science, 16, 199–231.
Breiman, L. (2001b). Random forests. Machine Learning, 45, 5–32.
Brown, L. D. (1967). The conditional level of Student’s t test. The Annals of Mathematical Statistics, 38, 1068–1071.
Buehler, R. J., & Feddersen, A. P. (1963). Note on a conditional property of Student’s t. The Annals of Mathematical Statistics, 34, 1098–1100.
Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A, 158, 419–466.
Cox, D. R., & Snell, E. J. (1974). The choice of variables in observational studies. Journal of the Royal Statistical Society, Series C, 23, 51–59.
Cudeck, R., & Henly, S. J. (1991). Model selection in covariance structures analysis and the “problem” of sample size: A clarification. Psychological Bulletin, 109, 512–519.
Educational Testing Service. (2019). About the TOEFL iBT® test. https://www.ets.org/toefl/ibt/about.
Faraway, J. J. (2016). Does data splitting improve prediction? Statistics and Computing, 26, 40–60.
Fouly, K., Bachman, L. F., & Cziko, G. (1990). The divisibility of language competence: A confirmatory approach. Language Learning, 40, 1–21.
Gelman, A., & Nolan, D. (2002). Teaching statistics: A bag of tricks. Oxford: Oxford University Press.
Kabaila, P. (1998). Valid confidence intervals in regression after variable selection. Econometric Theory, 14, 463–482.
Kadane, J. B., & Lazar, N. A. (2004). Methods and criteria for model selection. Journal of the American Statistical Association, 99, 279–290.
Lee, S., & Hershberger, S. (1990). A simple rule for generating equivalent models in covariance structure modeling. Multivariate Behavioral Research, 25, 313–334.
Lee, T., MacCallum, R. C., & Browne, M. W. (2018). Fungible parameter estimates in structural equation modeling. Psychological Methods, 23, 58–75.
Leeb, H., & Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory, 21, 21–59.
Leeb, H., & Pötscher, B. M. (2006). Can one estimate the conditional distribution of post-model-selection estimators? The Annals of Statistics, 34, 2554–2591.
Leeb, H., & Pötscher, B. M. (2008). Model selection. In T. G. Andersen, R. A. Davis, J.-P. Kreiss, & T. Mikosch (Eds.), The handbook of financial time series (pp. 785–821). New York, NY: Springer.
MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185–199.
MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, 4, 415–447.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.
Meehl, P. E., & Waller, N. G. (2002). The path analysis controversy: A new statistical approach to strong appraisal of verisimilitude. Psychological Methods, 7, 283–300.
R Core Team. (2019). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing.
Schapire, R. E. (1999). A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (pp. 1401–1406). Stockholm, Sweden.
Sen, P. K. (1979). Asymptotic properties of maximum likelihood estimators based on conditional specification. The Annals of Statistics, 7, 1019–1033.
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25, 289–310.
Waller, N. G. (2008). Fungible weights in multiple regression. Psychometrika, 73, 691–703.
Waller, N. G., & Jones, J. A. (2009). Locating the extrema of fungible regression weights. Psychometrika, 74, 589–602.

Author information

Authors and Affiliations

Educational Testing Service, Princeton, New Jersey, USA
Ikkyu Choi

Corresponding author

Correspondence to Ikkyu Choi.

Editor information

Editors and Affiliations

Iowa State University, Ames, IA, USA
Gary J. Ockey
Brigham Young University–Hawaii, Laie, HI, USA
Brent A. Green

About this chapter

Cite this chapter

Choi, I. (2020). The Curse of Explanation: Model Selection in Language Testing Research. In: Ockey, G.J., Green, B.A. (eds) Another Generation of Fundamental Considerations in Language Assessment. Springer, Singapore. https://doi.org/10.1007/978-981-15-8952-2_9

DOI: https://doi.org/10.1007/978-981-15-8952-2_9
Published: 24 November 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8951-5
Online ISBN: 978-981-15-8952-2
eBook Packages: Education, Education (R0)
