Abstract
The National Educational Panel Study (NEPS) aims at investigating the development of competencies across the whole lifespan. Competencies are assessed via tests, and competence scores are estimated based on models of Item Response Theory (IRT). IRT allows a comparison of test scores—and, thus, the investigation of change across time and of differences between cohorts—even when the respective competence is measured with different items. Since retest effects are assumed for most of the competencies in NEPS, linking is done via additional link studies in which the tests for two age groups are administered to a separate sample of participants. However, in order to link the test results of two different measurement occasions, certain assumptions need to hold, for example, that the measures are invariant across samples and that the tests measure the same construct. These assumptions are challenging when linking competencies across the whole lifespan. Before linking the NEPS reading tests for different age cohorts in secondary school as well as in adulthood, we therefore investigated the unidimensionality of the items for the different cohorts as well as measurement invariance across samples. Our results show that the tests for different age groups do measure a unidimensional construct within the same sample. However, measurement invariance of the same test across different samples does not hold for all age groups; thus, the same test exhibits a different measurement model in different samples. Based on our results, linking may well be justified within secondary school, whereas linking test scores in secondary school with those in adulthood is threatened by differences in the measurement model. Possible reasons for these results are discussed, and implications for the design of longitudinal studies as well as for possible analysis strategies are drawn.
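The linking and invariance check described in the abstract can be illustrated with a minimal sketch. It uses mean/mean linking of Rasch item difficulties from two samples and flags items whose difficulties still disagree after linking (a crude screen for differential item functioning). All numbers and the 0.5-logit threshold are made up for illustration; they are not NEPS estimates.

```python
# Hypothetical item difficulty estimates (Rasch logits) for the same
# reading items administered to two samples, e.g. a school cohort and
# an adult link sample. All values are invented for illustration.
import statistics

b_school = [-1.2, -0.4, 0.1, 0.6, 1.3]
b_adult = [-0.9, -0.1, 0.8, 0.9, 1.6]

# Mean/mean linking: shift the adult estimates onto the school scale.
shift = statistics.mean(b_school) - statistics.mean(b_adult)
b_adult_linked = [b + shift for b in b_adult]

# Crude invariance screen: after linking, item difficulties should
# agree up to estimation error; large residuals suggest the item
# functions differently in the two samples (DIF).
residuals = [a - s for a, s in zip(b_adult_linked, b_school)]
flagged = [i for i, r in enumerate(residuals) if abs(r) > 0.5]
print(shift, flagged)
```

In practice such checks are carried out within a full IRT calibration with standard errors, not by thresholding raw residuals; the sketch only conveys the logic of the invariance assumption.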
Notes
1. The newborn cohort started in 2012, and the adult sample was continued from the former ALWA study.
2. This is possible if measurement invariance of the instruments can be assumed for comparisons between cohorts. This design also allows cohort effects to be investigated and accounted for.
3. Test targeting is good when the difficulties of the test items fit well to the ability levels of the specific target group. Good test targeting enhances the reliability of the ability measurement.
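The point of note 3 can be made concrete with the Rasch model: under the Rasch model, an item's Fisher information is p(1 - p), which is maximal when the item's difficulty matches the test taker's ability. The sketch below, with arbitrary illustrative values, shows why well-targeted items yield more reliable ability estimates.

```python
import math

def rasch_p(theta, b):
    # Probability of a correct response under the Rasch model,
    # given ability theta and item difficulty b (both in logits).
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    # Fisher information of a Rasch item: p * (1 - p).
    # It peaks at 0.25 when difficulty b equals ability theta.
    p = rasch_p(theta, b)
    return p * (1.0 - p)

# A well-targeted item (b == theta) is most informative;
# a badly targeted one contributes little to measurement precision.
print(item_information(0.0, 0.0))  # 0.25
print(item_information(0.0, 2.0))  # much smaller
```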
Acknowledgement
This research used data from the National Educational Panel Study (NEPS). From 2008 to 2013, NEPS data were collected as part of the Framework Programme for the Promotion of Empirical Educational Research funded by the German Federal Ministry of Education and Research (BMBF). As of 2014, the NEPS survey is carried out by the Leibniz Institute for Educational Trajectories (LIfBi) at the University of Bamberg in cooperation with a nationwide network.
This research is based on the dedicated work of professors and research assistants within the NEPS. We especially thank Karin Gehrer, Stefan Zimmermann, Cordula Artelt, and Sabine Weinert for developing the reading competence tests that form the basis of our research, and Maike Krannich, Michael Wenzler, Theresa Rohm, and Odin Jost for their valuable assistance in analyzing the data. Our thanks also go to the staff of the NEPS administration of surveys and to the methods group.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Pohl, S., Haberkorn, K., Carstensen, C.H. (2015). Measuring Competencies across the Lifespan - Challenges of Linking Test Scores. In: Stemmler, M., von Eye, A., Wiedermann, W. (eds) Dependent Data in Social Sciences Research. Springer Proceedings in Mathematics & Statistics, vol 145. Springer, Cham. https://doi.org/10.1007/978-3-319-20585-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20584-7
Online ISBN: 978-3-319-20585-4
eBook Packages: Mathematics and Statistics