
Scoring of Complex Multiple Choice Items in NEPS Competence Tests

Chapter in: Methodological Issues of Longitudinal Surveys

Abstract

To precisely assess students' cognitive achievement and abilities, competence tests often use different item types. In the National Educational Panel Study (NEPS), the test instruments likewise combine different response formats, mainly simple multiple choice (MC) items, in which one answer out of four is correct, and complex multiple choice (CMC) items, which comprise several dichotomous “yes/no” subtasks. The subtasks of a CMC item are usually aggregated into a polytomous variable and analyzed with a partial credit model. When developing an appropriate scaling model for the NEPS competence tests, two questions arose concerning the response formats in the partial credit model: how the response categories of the polytomous CMC variables should be scored, and how the different item formats should be weighted. To examine which aggregation of response categories and which item format weighting best model the two response formats, different aggregation and weighting procedures were analyzed on NEPS data, and their appropriateness was evaluated using item fit and test fit indices. The results suggest that a differentiated scoring without aggregating the categories of CMC items best discriminates between persons. In addition, for the NEPS competence data, an item format weighting of one point for MC items and half a point for each subtask of CMC items yields the best item fit for both MC and CMC items. In this paper, we summarize important results of the research on implementing different response formats in the NEPS.
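The scoring and weighting rules described above can be made concrete with a short sketch. The following Python snippet is a minimal illustration under assumed, simplified data structures; the class names, helper functions, and example data are hypothetical and do not reproduce the NEPS scaling implementation. It shows the differentiated (non-aggregated) scoring of CMC subtasks as a polytomous category and the item format weighting of one point per MC item and half a point per CMC subtask.

```python
from dataclasses import dataclass
from typing import List

# Illustrative sketch only; item layout and names are assumptions,
# not the NEPS scaling code.

@dataclass
class McItem:
    correct: bool          # simple multiple choice: one answer out of four is correct

@dataclass
class CmcItem:
    subtasks: List[bool]   # dichotomous "yes/no" subtasks, True = solved

def cmc_category(item: CmcItem) -> int:
    """Differentiated scoring: the polytomous category is the number of
    solved subtasks, without collapsing adjacent categories."""
    return sum(item.subtasks)

def weighted_score(mc_items: List[McItem], cmc_items: List[CmcItem]) -> float:
    """Item format weighting reported as best-fitting in the NEPS analyses:
    one point per correct MC item, half a point per solved CMC subtask."""
    mc_points = sum(1.0 for it in mc_items if it.correct)
    cmc_points = sum(0.5 * cmc_category(it) for it in cmc_items)
    return mc_points + cmc_points

# Example: three MC items (two correct) and one CMC item with four subtasks (three solved)
mc = [McItem(True), McItem(False), McItem(True)]
cmc = [CmcItem([True, True, True, False])]
print(weighted_score(mc, cmc))  # 2.0 + 0.5 * 3 = 3.5
```

In the actual NEPS scaling, such category scores and weights enter a partial credit model rather than a simple sum score; the sketch only makes the category formation and the relative weighting of the two formats explicit.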




Author information

Correspondence to Kerstin Haberkorn.


Copyright information

© 2016 Springer Fachmedien Wiesbaden

About this chapter

Cite this chapter

Haberkorn, K., Pohl, S., Carstensen, C. (2016). Scoring of Complex Multiple Choice Items in NEPS Competence Tests. In: Blossfeld, HP., von Maurice, J., Bayer, M., Skopek, J. (eds) Methodological Issues of Longitudinal Surveys. Springer VS, Wiesbaden. https://doi.org/10.1007/978-3-658-11994-2_29


  • DOI: https://doi.org/10.1007/978-3-658-11994-2_29

  • Publisher Name: Springer VS, Wiesbaden

  • Print ISBN: 978-3-658-11992-8

  • Online ISBN: 978-3-658-11994-2

  • eBook Packages: Education, Education (R0)
