The Curse of Explanation: Model Selection in Language Testing Research

  • Chapter
Another Generation of Fundamental Considerations in Language Assessment

Abstract

Language testing researchers often use statistical models to approximate and study a true model (i.e., the underlying system responsible for generating the data). Building a model that successfully approximates the true model is not easy and typically involves data-driven model selection. However, available model selection tools cannot guarantee that the true model will be recovered. Moreover, model selection itself has consequences that affect the quality of inferences. The goal of this chapter is to introduce and illustrate some of these issues. In particular, I focus on three: (1) the uncertainty that model selection introduces into statistical inference, (2) the possibility of approximating data successfully with an incorrect model, and (3) the existence of substantively different models whose statistical counterparts are highly comparable. I conclude with a call for explicitly acknowledging and justifying model selection processes, as laid out in Bachman’s research use argument framework (2006, 2009).
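Issue (1) can be made concrete with a small simulation. What follows is a minimal Python sketch under stated assumptions, not the chapter's own analysis. The outcome is generated independently of every candidate predictor, so the null hypothesis is true. Data-driven selection is crudely represented by keeping the candidate with the smallest p-value. The p-values come from Fisher's z approximation rather than an exact t test. The function names `pearson_r`, `p_value`, and `rejection_rates`, and all parameter values, are hypothetical. If the nominal 5% level survived selection, both rates would sit near 0.05. With five roughly independent candidates, however, the post-selection rate should instead approach 1 - 0.95^5 ≈ 0.23.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0.

    Assumption: uses Fisher's z transform, atanh(r) * sqrt(n - 3) ~ N(0, 1),
    instead of an exact t test.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rates(n_reps=2000, n=50, k=5, alpha=0.05, seed=1):
    """Simulate a null world in which y is unrelated to all k candidate predictors.

    Returns (pre-specified rate, post-selection rate):
      - pre-specified: test only predictor 0, chosen before seeing the data;
      - post-selection: test all k predictors, keep the one with the smallest
        p-value (a hypothetical stand-in for data-driven model selection),
        and report that p-value as if no selection had taken place.
    """
    rng = random.Random(seed)
    pre_hits = 0
    post_hits = 0
    for _ in range(n_reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        pvals = []
        for _ in range(k):
            x = [rng.gauss(0, 1) for _ in range(n)]
            pvals.append(p_value(pearson_r(x, y), n))
        pre_hits += int(pvals[0] < alpha)
        post_hits += int(min(pvals) < alpha)
    return pre_hits / n_reps, post_hits / n_reps


pre, post = rejection_rates()
print(f"pre-specified: {pre:.3f}  after selection: {post:.3f}")
```

Reporting the selected predictor's nominal p-value ignores the search that produced it. This is the kind of post-selection problem treated in the references on inference after model selection, such as Berk et al. (2013) and Leeb and Pötscher (2005).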

References

  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.

  • Bachman, L. F. (1982). The trait structure of cloze test scores. TESOL Quarterly, 16, 61–70.

  • Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

  • Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing, 17, 1–42.

  • Bachman, L. F. (2006). Generalizability: A journey into the nature of empirical research in applied linguistics. In M. Chalhoub-Deville, C. A. Chapelle, & P. Duff (Eds.), Inference and generalizability in applied linguistics: Multiple perspectives (pp. 165–207). Amsterdam, The Netherlands: John Benjamins.

  • Bachman, L. F. (2009). Generalizability and research use arguments. In K. Ercikan & W.-M. Roth (Eds.), Generalizing from educational research (pp. 127–148). New York, NY: Taylor & Francis.

  • Bachman, L. F. (2013). Ongoing challenges in language assessment. In A. J. Kunnan (Ed.), The companion to language assessment. Hoboken, NJ: Wiley-Blackwell.

  • Bachman, L. F., & Palmer, A. S. (1981). The construct validation of the FSI oral interview. Language Learning, 31, 67–86.

  • Bachman, L. F., & Palmer, A. S. (1982). The construct validation of some components of communicative proficiency. TESOL Quarterly, 16, 444–465.

  • Bae, J., & Bachman, L. F. (1998). A latent variable approach to listening and reading: Testing factorial invariance across two groups of children in the Korean/English two-way immersion program. Language Testing, 15, 380–414.

  • Bae, J., & Bachman, L. F. (2010). An investigation of four writing traits and two tasks across two languages. Language Testing, 27, 213–234.

  • Bellman, R. E. (1961). Adaptive control processes. Princeton, NJ: Princeton University Press.

  • Berk, R. A. (2016). Statistical learning from a regression perspective (2nd ed.). New York, NY: Springer.

  • Berk, R. A., Brown, L., Buja, A., Zhang, K., & Zhao, L. (2013). Valid post-selection inference. The Annals of Statistics, 41, 802–837.

  • Berk, R. A., Brown, L., & Zhao, L. (2010). Statistical inference after model selection. Journal of Quantitative Criminology, 26, 217–236.

  • Berk, R. A., & Freedman, D. A. (2003). Statistical assumptions as empirical commitments. In T. G. Blomberg & S. Cohen (Eds.), Law, punishment, and social control: Essays in honor of Sheldon Messinger (pp. 235–254). New York, NY: Aldine de Gruyter.

  • Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.

  • Breiman, L. (2001a). Statistical modeling: The two cultures. Statistical Science, 16, 199–231.

  • Breiman, L. (2001b). Random forests. Machine Learning, 45, 5–32.

  • Brown, L. D. (1967). The conditional level of Student’s t test. The Annals of Mathematical Statistics, 38, 1068–1071.

  • Buehler, R. J., & Feddersen, A. P. (1963). Note on a conditional property of Student’s t. The Annals of Mathematical Statistics, 34, 1098–1100.

  • Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A, 158, 419–466.

  • Cox, D. R., & Snell, E. J. (1974). The choice of variables in observational studies. Journal of the Royal Statistical Society, Series C, 23, 51–59.

  • Cudeck, R., & Henly, S. J. (1991). Model selection in covariance structures analysis and the “problem” of sample size: A clarification. Psychological Bulletin, 109, 512–519.

  • Educational Testing Service. (2019). About the TOEFL iBT® test. https://www.ets.org/toefl/ibt/about.

  • Faraway, J. J. (2016). Does data splitting improve prediction? Statistics and Computing, 26, 40–60.

  • Fouly, K., Bachman, L. F., & Cziko, G. (1990). The divisibility of language competence: A confirmatory approach. Language Learning, 40, 1–21.

  • Gelman, A., & Nolan, D. (2002). Teaching statistics: A bag of tricks. Oxford: Oxford University Press.

  • Kabaila, P. (1998). Valid confidence intervals in regression after variable selection. Econometric Theory, 14, 463–482.

  • Kadane, J. B., & Lazar, N. A. (2004). Methods and criteria for model selection. Journal of the American Statistical Association, 99, 279–290.

  • Lee, S., & Hershberger, S. (1990). A simple rule for generating equivalent models in covariance structure modeling. Multivariate Behavioral Research, 25, 313–334.

  • Lee, T., MacCallum, R. C., & Browne, M. W. (2018). Fungible parameter estimates in structural equation modeling. Psychological Methods, 23, 58–75.

  • Leeb, H., & Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory, 21, 21–59.

  • Leeb, H., & Pötscher, B. M. (2006). Can one estimate the conditional distribution of post-model-selection estimators? The Annals of Statistics, 34, 2554–2591.

  • Leeb, H., & Pötscher, B. M. (2008). Model selection. In T. G. Andersen, R. A. Davis, J.-P. Kreiß, & T. Mikosch (Eds.), The handbook of financial time series (pp. 785–821). New York, NY: Springer.

  • MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185–199.

  • MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, 4, 415–447.

  • McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.

  • Meehl, P. E., & Waller, N. G. (2002). The path analysis controversy: A new statistical approach to strong appraisal of verisimilitude. Psychological Methods, 7, 283–300.

  • R Core Team. (2019). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing.

  • Schapire, R. E. (1999). A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (pp. 1401–1406). Stockholm, Sweden.

  • Sen, P. K. (1979). Asymptotic properties of maximum likelihood estimators based on conditional specification. The Annals of Statistics, 7, 1019–1033.

  • Shmueli, G. (2010). To explain or to predict? Statistical Science, 25, 289–310.

  • Waller, N. G. (2008). Fungible weights in multiple regression. Psychometrika, 73, 691–703.

  • Waller, N. G., & Jones, J. A. (2009). Locating the extrema of fungible regression weights. Psychometrika, 74, 589–602.

Author information

Correspondence to Ikkyu Choi.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Choi, I. (2020). The Curse of Explanation: Model Selection in Language Testing Research. In: Ockey, G.J., Green, B.A. (eds) Another Generation of Fundamental Considerations in Language Assessment. Springer, Singapore. https://doi.org/10.1007/978-981-15-8952-2_9

  • DOI: https://doi.org/10.1007/978-981-15-8952-2_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-8951-5

  • Online ISBN: 978-981-15-8952-2

  • eBook Packages: Education, Education (R0)
