Selecting a Data Collection Design for Linking in Educational Measurement: Taking Differential Motivation into Account

Conference paper in Quantitative Psychology Research

Abstract

In educational measurement, multiple test forms are often constructed to measure the same construct. Linking procedures can be used to disentangle differences in test form difficulty from differences in the proficiency of examinees, so that scores from different test forms can be used interchangeably. Several data collection designs can be used to gather the data needed for linking. Differential motivation refers to the difference in test-taking motivation between high-stakes and low-stakes administration conditions: in a high-stakes condition, an examinee is expected to work harder and strive for maximum performance, whereas a low-stakes condition elicits typical, rather than maximum, performance. Differential motivation is therefore a potential confounding variable when choosing a data collection design. We discuss the suitability of different data collection designs, as typically implemented in practice, with respect to the effect of differential motivation. An example using data from the Eindtoets Basisonderwijs (End of Primary School Test) highlights the need to take differential motivation into account.
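To make the confounding concrete, the following sketch simulates a common-item (anchor) link under the Rasch model and shows how reduced effort on a low-stakes anchor can distort the linking constant. This is a minimal, hypothetical illustration, not the authors' analysis: the simulated ability distributions, the classical logit difficulty estimates, and the 0.4-logit effort deficit are assumptions chosen only for demonstration.

```python
# Hypothetical sketch: how differential motivation on a low-stakes anchor
# can bias a common-item link under the Rasch model. All numbers are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(1)

def simulate_rasch(theta, b):
    """0/1 responses under the Rasch model: P(X = 1) = logistic(theta - b)."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def crude_difficulty(resp):
    """Classical logit difficulties; crude, but enough to show the bias."""
    p = resp.mean(axis=0)
    return -np.log(p / (1.0 - p))

b_anchor = np.linspace(-1.5, 1.5, 11)   # anchor items shared by both groups
theta_x = rng.normal(0.0, 1.0, 5000)    # reference group (high stakes)
theta_y = rng.normal(0.3, 1.0, 5000)    # focal group, truly 0.3 logits abler

for deficit in (0.0, 0.4):  # assumed effort loss on a low-stakes anchor
    resp_x = simulate_rasch(theta_x, b_anchor)
    resp_y = simulate_rasch(theta_y - deficit, b_anchor)
    # Mean-mean style linking constant: average anchor-difficulty shift.
    shift = crude_difficulty(resp_y).mean() - crude_difficulty(resp_x).mean()
    print(f"effort deficit {deficit:.1f}: anchor difficulty shift = {shift:+.2f}")

# With full effort the shift is negative, reflecting the focal group's true
# 0.3-logit advantage (attenuated by the crude estimator). With the effort
# deficit, the anchor items look harder for the focal group, the shift moves
# toward (here past) zero, and the linking constant absorbs the motivation
# effect instead of the proficiency difference.
```

In an operational link one would calibrate with an IRT program and apply a mean-mean or mean-sigma transformation; the crude estimator above merely makes visible how a motivation deficit on the anchor is absorbed into the linking constant.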


Notes

  1. Despite the theoretical difference between linking and equating, the same statistical methods are used in both procedures; the terms are therefore used interchangeably in this paper.


Author information

Correspondence to Marie-Anne Mittelhaëuser.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mittelhaëuser, M.-A., Béguin, A.A., Sijtsma, K. (2015). Selecting a Data Collection Design for Linking in Educational Measurement: Taking Differential Motivation into Account. In: Millsap, R., Bolt, D., van der Ark, L.A., Wang, W.-C. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 89. Springer, Cham. https://doi.org/10.1007/978-3-319-07503-7_11
