
National and International Educational Achievement Testing: A Case of Multi-level Validation Framed by the Ecological Model of Item Responding

  • Chapter
Understanding and Investigating Response Processes in Validation Research

Part of the book series: Social Indicators Research Series (SINS, volume 69)

Abstract

The results of large-scale student assessments are increasingly being used to rank nations, states, and schools and to inform policy decisions. These uses often rely on aggregated student test score data and imply inferences about multilevel constructs. Validating uses and interpretations of these multilevel constructs requires appropriate multilevel validation techniques. This chapter combines multilevel data analysis techniques with an explanatory view of validity to develop explanations of score variation that can be used to evaluate multilevel measurement inferences. We use country-level mathematics scores from the Trends in International Mathematics and Science Study (TIMSS) to illustrate the integration of these techniques. The explanation-focused view of validity, accompanied by the ecological model of item responding, situates conventional response process research in a multilevel construct setting and moves response process studies beyond the traditional focus on individual test-takers’ behaviors.


Notes

  1.

    Domain scores were used in the multilevel confirmatory factor analyses because they can be treated as continuous observed variables, so conventional fit statistics are available to assess model fit, and because the computational ease of using continuous scores substantially reduces computing time. Ours is a variation on the use of item parcels; in our case, however, the parcels are theoretically driven and confirmed to be unidimensional. As further support for the use of the four domain scores in subsequent analyses, we fit a multilevel exploratory item response theory analysis for all 29 items simultaneously. The first three eigenvalues of the within-level polychoric correlation matrix were 10.0, 1.5, and 1.3, and the first three eigenvalues of the between-level correlation matrix were 22.4, 1.5, and 1.0. Clearly, the eigenvalues point toward one between and one within latent variable even when the items are the focus of analysis. For the model with one factor within and one factor between, CFI = 0.92, RMSEA = 0.03, SRMR (within) = 0.07, and SRMR (between) = 0.06. As an example of the computational burden of the item-level analyses, the 29-item analysis described in this footnote required over 6 hours of computing time, whereas the domain models each completed in less than 5 minutes. All of this evidence lends further support for the use of the domain scores in the subsequent analyses.
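
    For readers who want to see the logic behind the within- and between-level eigenvalues reported in this note, the sketch below illustrates the general decomposition used in multilevel covariance structure analysis (e.g., Muthén, 1994): responses are split into pooled within-group deviations and between-group means, and the eigenvalues of the resulting correlation matrices are inspected. This is a minimal illustration on synthetic data, not the chapter's code; the data frame, the column names, and the use of Pearson rather than polychoric correlations are simplifying assumptions.

```python
# A minimal sketch (not the authors' code) of the within/between eigenvalue
# evidence described in the footnote, following the general logic of multilevel
# covariance structure analysis (e.g., Muthén, 1994). The data frame `items`,
# the grouping column "country", and the item columns are hypothetical
# placeholders; the chapter's actual item-level analysis used polychoric
# correlations of the 29 TIMSS items in specialized software.

import numpy as np
import pandas as pd


def within_between_eigenvalues(df: pd.DataFrame, group_col: str):
    """Return eigenvalues of the pooled-within and between-group correlation matrices."""
    item_cols = [c for c in df.columns if c != group_col]
    grand_mean = df[item_cols].mean()

    # Pooled within-group deviations: each response minus its own group's mean.
    group_means = df.groupby(group_col)[item_cols].transform("mean")
    within_dev = df[item_cols] - group_means

    # Between-group deviations: group means minus the grand mean (one row per group).
    between_dev = df.groupby(group_col)[item_cols].mean() - grand_mean

    # Pearson correlations are used here for simplicity; the footnote's analysis
    # is based on polychoric correlations because the items are ordinal.
    s_within = np.corrcoef(within_dev.T)
    s_between = np.corrcoef(between_dev.T)

    eig_within = np.sort(np.linalg.eigvalsh(s_within))[::-1]
    eig_between = np.sort(np.linalg.eigvalsh(s_between))[::-1]
    return eig_within, eig_between


if __name__ == "__main__":
    # Synthetic illustration: 50 "countries" x 300 "students" x 29 items,
    # generated from a single common factor plus a country-level effect.
    rng = np.random.default_rng(0)
    n_groups, n_per_group, n_items = 50, 300, 29
    country_effect = np.repeat(rng.normal(size=n_groups), n_per_group)
    theta = country_effect + rng.normal(size=n_groups * n_per_group)
    loadings = rng.uniform(0.5, 1.0, size=n_items)
    responses = np.outer(theta, loadings) + rng.normal(size=(n_groups * n_per_group, n_items))
    items = pd.DataFrame(responses, columns=[f"item{i + 1}" for i in range(n_items)])
    items["country"] = np.repeat(np.arange(n_groups), n_per_group)

    ew, eb = within_between_eigenvalues(items, "country")
    print("First three within-level eigenvalues:", np.round(ew[:3], 2))
    print("First three between-level eigenvalues:", np.round(eb[:3], 2))
```

    With this single-factor data-generating model, both sets of eigenvalues should show one dominant value, mirroring the pattern the footnote reports for the 29 TIMSS items.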

References

  • Chen, G., Mathieu, J. E., & Bliese, P. D. (2004a). A framework for conducting multilevel construct validation. In F. J. Yammarino & F. Dansereau (Eds.), Research in multilevel issues: Multilevel issues in organizational behavior and processes (Vol. 3, pp. 273–303). Oxford, UK: Elsevier.

  • Chen, G., Mathieu, J. E., & Bliese, P. D. (2004b). Validating frogs and ponds in multilevel contexts: Some afterthoughts. In F. J. Yammarino & F. Dansereau (Eds.), Research in multilevel issues: Multilevel issues in organizational behavior and processes (Vol. 3, pp. 335–343). Oxford, UK: Elsevier.

  • Dansereau, F., & Yammarino, F. J. (2000). Within and between analysis: The variant paradigm as an underlying approach to theory building and testing. In K. J. Klein & S. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations (pp. 425–466). San Francisco, CA: Jossey-Bass.

  • Forer, B., & Zumbo, B. D. (2011). Validation of multilevel constructs: Validation methods and empirical findings for the EDI. Social Indicators Research: An International Interdisciplinary Journal for Quality of Life Measurement, 103, 231–265. doi:10.1007/s11205-011-9844-3

  • Goldstein, H., & McDonald, R. P. (1988). A general model for the analysis of multilevel data. Psychometrika, 53, 455–467.

  • Hofmann, D. A., & Jones, L. M. (2004). Some foundational and guiding questions for multilevel construct validation. In F. Yammarino & F. Dansereau (Eds.), Multi-level issues in organizational behavior and processes. Amsterdam: Elsevier.

  • Kaplan, D., & Elliott, P. R. (1997). A didactic example of multilevel structural equation modeling applicable to the study of organizations. Structural Equation Modeling, 4, 1–24.

  • Klein, K. J., Dansereau, F., & Hall, R. J. (1994). Levels issues in theory development, data collection, and analysis. Academy of Management Review, 19, 195–229.

  • Lee, S.-Y. (1990). Multilevel analysis of structural equation models. Biometrika, 77, 763–772.

  • Longford, N. T., & Muthén, B. O. (1992). Factor analysis for clustered observations. Psychometrika, 57, 581–597.

  • Morgeson, F. P., & Hofmann, D. A. (1999). The structure and function of collective constructs: Implications for multilevel research and theory development. Academy of Management Review, 24, 249–265.

  • Mullis, I. V. S., Martin, M. O., Ruddock, G. J., O’Sullivan, C. Y., Arora, A., & Erberber, E. (2005). TIMSS 2007 assessment frameworks. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College. http://timss.bc.edu/timss2007/PDF/T07_AF.pdf

  • Muthén, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22, 376–398.

  • Muthén, B. O., & Satorra, A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267–316.

  • Raudenbush, S. W., Rowan, B., & Kang, S. J. (1991). A multilevel, multivariate model for studying school climate with estimation via the EM algorithm and application to U.S. high-school data. Journal of Educational Statistics, 16, 295–330.

  • Stone, J., & Zumbo, B. D. (2016). Validity as a pragmatist project: A global concern with local application. In V. Aryadoust & J. Fox (Eds.), Trends in language assessment research and practice (pp. 555–573). Newcastle, UK: Cambridge Scholars Publishing.

  • Watkins, K. (2007). Human development report 2007/2008, fighting climate change: Human solidarity in a divided world. New York, NY: United Nations Development Programme.

  • Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, Vol. 26: Psychometrics (pp. 45–79). Amsterdam, The Netherlands: Elsevier Science B.V.

  • Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65–82). Charlotte, NC: IAP – Information Age Publishing, Inc.

  • Zumbo, B. D., & Forer, B. (2011). Testing and measurement from a multilevel view: Psychometrics and validation. In J. A. Bovaird, K. F. Geisinger, & C. W. Buckendahl (Eds.), High stakes testing in education – Science and practice in K-12 settings (pp. 177–190). Washington, DC: American Psychological Association Press.

  • Zumbo, B. D., & Gelin, M. N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5, 1–23.

  • Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Astivia, O. L. O., & Ark, T. K. (2015). A methodology for Zumbo’s Third Generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12, 136–151.


Acknowledgment

The authors would like to thank Professor Fred Dansereau for his generous guidance and feedback on the WABA analyses, and Professor Bob Linn for his encouragement to publish this paper. An earlier version of this paper was presented at the symposium “A Multilevel View of Test Validity” at the 2010 Annual Meeting of the American Educational Research Association, Denver, CO.

Author information

Corresponding author

Correspondence to Bruno D. Zumbo.

Appendices

Appendix A: Countries Involved in the Study and Sample Size

Nation: Number of students

Algeria: 384
Armenia: 277
Australia: 294
Bahrain: 303
Bosnia and Herzegovina: 301
Botswana: 298
Bulgaria: 288
Chinese Taipei: 287
Colombia: 347
Cyprus: 314
Czech Republic: 349
Egypt: 466
England: 299
Georgia: 306
Ghana: 377
Hong Kong, SAR: 249
Hungary: 285
Indonesia: 305
Iran, Islamic Republic of: 291
Israel: 234
Italy: 315
Japan: 307
Jordan: 370
Korea, Republic of: 306
Kuwait: 284
Lebanon: 267
Lithuania: 287
Malaysia: 321
Malta: 337
Mongolia: 317
Norway: 326
Oman: 322
Palestinian National Authority: 315
Qatar: 516
Romania: 303
Russian Federation: 320
Saudi Arabia: 307
Scotland: 290
Serbia: 288
Singapore: 328
Slovenia: 292
Sweden: 369
Syria, Arab Republic of: 327
Thailand: 390
Tunisia: 292
Turkey: 314
Ukraine: 321
United States: 544

Appendix B: Listing of the National Level Curriculum Explanatory Variables

Each entry gives the variable name, its description, and the data coding.

1. Calculator: Does the national curriculum contain statements/policies about the use of calculators in grade 8 mathematics? (Binary 0/1; Yes = 1)

2. Computer: Does the national curriculum contain statements/policies about the use of computers in grade 8 mathematics? (Binary 0/1; Yes = 1)

3a–3g. How much emphasis does the national mathematics curriculum place on the following? Each item is coded on a 4-point scale (None = 0, Very Little = 1, Some = 2, A lot = 3):
  3a. Basic: (a) Mastering basic skills and procedures
  3b. Concept: (b) Understanding mathematical concepts and principles
  3c. Real life: (c) Applying mathematics in real-life contexts
  3d. Communicate: (d) Communicating mathematically
  3e. Reason: (e) Reasoning mathematically
  3f. Integrating: (f) Integrating mathematics with other subjects
  3g. Proof: (g) Deriving formal proofs

4a & 4b. DFlevel and DFcur: Which best describes how the mathematics curriculum addresses the issue of students with different levels of ability? The response is coded with two dummy variables (design matrix; see the illustrative sketch following this table):
  Different curricula are prescribed for students of different ability levels: DFlevel = 0, DFcur = 1
  The same curriculum is prescribed for students of different ability levels, but at different levels of difficulty: DFlevel = 1, DFcur = 0
  The same curriculum is prescribed for all students: DFlevel = 0, DFcur = 0

5. Remedial: Is there an official policy to provide remedial mathematics instruction at the eighth grade of formal schooling? (Binary 0/1; Yes = 1)

6. Degree: Which are the current requirements for being a middle/lower secondary grade teacher? A degree from a teacher education program. (Binary 0/1; Yes = 1)

7. Exam: Across grades K-12, does an education authority in your country (e.g., National Ministry of Education) administer examinations in mathematics that have consequences for individual students, such as determining grade promotion, entry to a higher school system, entry to a university, and/or exiting or graduating from high school? (Binary 0/1; Yes = 1)
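
The brief sketch below (hypothetical code, not from the chapter) makes the design-matrix coding for item 4a & 4b concrete: the three possible national responses about ability grouping are mapped to the two dummy variables DFlevel and DFcur. The function name and response labels are illustrative placeholders.

```python
# Illustrative (hypothetical) mapping of the three curriculum-differentiation
# responses in Appendix B, item 4a & 4b, onto the dummy variables DFlevel and
# DFcur. The response labels are shortened paraphrases of the questionnaire
# options, not the exact TIMSS wording.

def code_ability_grouping(response: str) -> tuple:
    """Map a curriculum-differentiation response to (DFlevel, DFcur)."""
    design_matrix = {
        "different curricula for different ability levels": (0, 1),
        "same curriculum at different levels of difficulty": (1, 0),
        "same curriculum for all students": (0, 0),  # reference category
    }
    return design_matrix[response]


if __name__ == "__main__":
    for resp in (
        "different curricula for different ability levels",
        "same curriculum at different levels of difficulty",
        "same curriculum for all students",
    ):
        dflevel, dfcur = code_ability_grouping(resp)
        print(f"{resp}: DFlevel = {dflevel}, DFcur = {dfcur}")
```

With this coding, the (0, 0) pattern serves as the reference category, so effects estimated for DFlevel and DFcur at the national level can be read as contrasts against the policy of a single curriculum for all students.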


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Zumbo, B.D., Liu, Y., Wu, A.D., Forer, B., Shear, B.R. (2017). National and International Educational Achievement Testing: A Case of Multi-level Validation Framed by the Ecological Model of Item Responding. In: Zumbo, B., Hubley, A. (eds) Understanding and Investigating Response Processes in Validation Research. Social Indicators Research Series, vol 69. Springer, Cham. https://doi.org/10.1007/978-3-319-56129-5_18


  • DOI: https://doi.org/10.1007/978-3-319-56129-5_18


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56128-8

  • Online ISBN: 978-3-319-56129-5

