
Scoring of Complex Multiple Choice Items in NEPS Competence Tests

Chapter in: Methodological Issues of Longitudinal Surveys

Abstract

To precisely assess students' cognitive achievement and abilities, competence tests often use different item types. In the National Educational Panel Study (NEPS), the test instruments likewise combine different response formats, mainly simple multiple choice (MC) items, in which one answer out of four is correct, and complex multiple choice (CMC) items, which comprise several dichotomous “yes/no” subtasks. The subtasks of a CMC item are usually aggregated into a polytomous variable and analyzed with a partial credit model. When developing an appropriate scaling model for the NEPS competence tests, two questions arose concerning the response formats in the partial credit model: how the response categories of the polytomous CMC variables should be scored, and how the different item formats should be weighted. To examine which aggregation of response categories and which item format weighting best model the two response formats, different aggregation and weighting procedures were analyzed on NEPS data, and their appropriateness was evaluated using item fit and test fit indices. The results suggest that a differentiated scoring without aggregating the categories of CMC items best discriminates between persons. In addition, for the NEPS competence data, an item format weighting of one point for MC items and half a point for each subtask of CMC items yields the best item fit for both MC and CMC items. In this paper, we summarize important results of the research on implementing different response formats in the NEPS.
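The scoring and weighting rules described above can be made concrete with a short sketch. The following Python snippet is a minimal illustration under assumed, simplified data structures; the class names, helper functions, and example data are hypothetical and do not reproduce the NEPS scaling implementation. It shows the differentiated (non-aggregated) scoring of CMC subtasks as a polytomous category and the item format weighting of one point per MC item and half a point per CMC subtask.

```python
from dataclasses import dataclass
from typing import List

# Illustrative sketch only; item layout and names are assumptions,
# not the NEPS scaling code.

@dataclass
class McItem:
    correct: bool          # simple multiple choice: one answer out of four is correct

@dataclass
class CmcItem:
    subtasks: List[bool]   # dichotomous "yes/no" subtasks, True = solved

def cmc_category(item: CmcItem) -> int:
    """Differentiated scoring: the polytomous category is the number of
    solved subtasks, without collapsing adjacent categories."""
    return sum(item.subtasks)

def weighted_score(mc_items: List[McItem], cmc_items: List[CmcItem]) -> float:
    """Item format weighting reported as best-fitting in the NEPS analyses:
    one point per correct MC item, half a point per solved CMC subtask."""
    mc_points = sum(1.0 for it in mc_items if it.correct)
    cmc_points = sum(0.5 * cmc_category(it) for it in cmc_items)
    return mc_points + cmc_points

# Example: three MC items (two correct) and one CMC item with four subtasks (three solved)
mc = [McItem(True), McItem(False), McItem(True)]
cmc = [CmcItem([True, True, True, False])]
print(weighted_score(mc, cmc))  # 2.0 + 0.5 * 3 = 3.5
```

In the actual NEPS scaling, such category scores and weights enter a partial credit model rather than a simple sum score; the sketch only makes the category formation and the relative weighting of the two formats explicit.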




Author information

Correspondence to Kerstin Haberkorn.


Copyright information

© 2016 Springer Fachmedien Wiesbaden

About this chapter

Cite this chapter

Haberkorn, K., Pohl, S., Carstensen, C. (2016). Scoring of Complex Multiple Choice Items in NEPS Competence Tests. In: Blossfeld, HP., von Maurice, J., Bayer, M., Skopek, J. (eds) Methodological Issues of Longitudinal Surveys. Springer VS, Wiesbaden. https://doi.org/10.1007/978-3-658-11994-2_29


  • DOI: https://doi.org/10.1007/978-3-658-11994-2_29

  • Publisher Name: Springer VS, Wiesbaden

  • Print ISBN: 978-3-658-11992-8

  • Online ISBN: 978-3-658-11994-2

  • eBook Packages: Education, Education (R0)
