Abstract
The National Educational Panel Study (NEPS) aims at investigating the development of competencies across the whole lifespan. Competencies are assessed via tests, and competence scores are estimated based on models of Item Response Theory (IRT). IRT allows a comparison of test scores—and, thus, the investigation of change across time and of differences between cohorts—even when the respective competence is measured with different items. Since retest effects are assumed for most of the competencies in NEPS, linking is done via additional link studies in which the tests for two age groups are administered to a separate sample of participants. However, in order to link the test results of two different measurement occasions, certain assumptions need to hold, for example, that the measures are invariant across samples and that the tests measure the same construct. These assumptions are challenging when linking competencies across the whole lifespan. Before linking the NEPS reading tests for different age cohorts in secondary school as well as in adulthood, we therefore investigated the unidimensionality of the items for the different cohorts as well as measurement invariance across samples. Our results show that the tests for different age groups do measure a unidimensional construct within the same sample. However, measurement invariance of the same test across different samples does not hold for all age groups; thus, the same test exhibits a different measurement model in different samples. Based on our results, linking may well be justified within secondary school, whereas linking test scores in secondary school with those in adulthood is threatened by differences in the measurement model. Possible reasons for these results are discussed, and implications for the design of longitudinal studies as well as for possible analysis strategies are drawn.
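The linking and invariance check described in the abstract can be illustrated with a minimal sketch. It uses mean/mean linking of Rasch item difficulties from two samples and flags items whose difficulties still disagree after linking (a crude screen for differential item functioning). All numbers and the 0.5-logit threshold are made up for illustration; they are not NEPS estimates.

```python
# Hypothetical item difficulty estimates (Rasch logits) for the same
# reading items administered to two samples, e.g. a school cohort and
# an adult link sample. All values are invented for illustration.
import statistics

b_school = [-1.2, -0.4, 0.1, 0.6, 1.3]
b_adult = [-0.9, -0.1, 0.8, 0.9, 1.6]

# Mean/mean linking: shift the adult estimates onto the school scale.
shift = statistics.mean(b_school) - statistics.mean(b_adult)
b_adult_linked = [b + shift for b in b_adult]

# Crude invariance screen: after linking, item difficulties should
# agree up to estimation error; large residuals suggest the item
# functions differently in the two samples (DIF).
residuals = [a - s for a, s in zip(b_adult_linked, b_school)]
flagged = [i for i, r in enumerate(residuals) if abs(r) > 0.5]
print(shift, flagged)
```

In practice such checks are carried out within a full IRT calibration with standard errors, not by thresholding raw residuals; the sketch only conveys the logic of the invariance assumption.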
Notes
1. The newborn cohort started in 2012, and the adult sample was continued from the former ALWA study.
2. This is possible if measurement invariance of the instruments can be assumed for comparisons between cohorts. This design also allows cohort effects to be investigated and accounted for.
3. Test targeting is good when the difficulties of the test items fit well to the ability levels of the specific target group. Good test targeting enhances the reliability of the ability measurement.
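The point of note 3 can be made concrete with the Rasch model: under the Rasch model, an item's Fisher information is p(1 - p), which is maximal when the item's difficulty matches the test taker's ability. The sketch below, with arbitrary illustrative values, shows why well-targeted items yield more reliable ability estimates.

```python
import math

def rasch_p(theta, b):
    # Probability of a correct response under the Rasch model,
    # given ability theta and item difficulty b (both in logits).
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    # Fisher information of a Rasch item: p * (1 - p).
    # It peaks at 0.25 when difficulty b equals ability theta.
    p = rasch_p(theta, b)
    return p * (1.0 - p)

# A well-targeted item (b == theta) is most informative;
# a badly targeted one contributes little to measurement precision.
print(item_information(0.0, 0.0))  # 0.25
print(item_information(0.0, 2.0))  # much smaller
```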
Acknowledgement
This research used data from the National Educational Panel Study (NEPS). From 2008 to 2013, NEPS data were collected as part of the Framework Programme for the Promotion of Empirical Educational Research funded by the German Federal Ministry of Education and Research (BMBF). As of 2014, the NEPS survey is carried out by the Leibniz Institute for Educational Trajectories (LIfBi) at the University of Bamberg in cooperation with a nationwide network.
This research is based on the dedicated work of professors and research assistants within the NEPS. We especially thank Karin Gehrer, Stefan Zimmermann, Cordula Artelt, and Sabine Weinert for developing the reading competence tests that form the basis of our research, and Maike Krannich, Michael Wenzler, Theresa Rohm, and Odin Jost for their valuable assistance in analyzing the data. Our thanks also go to the staff of the NEPS administration of surveys and to the methods group.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Pohl, S., Haberkorn, K., Carstensen, C.H. (2015). Measuring Competencies across the Lifespan - Challenges of Linking Test Scores. In: Stemmler, M., von Eye, A., Wiedermann, W. (eds) Dependent Data in Social Sciences Research. Springer Proceedings in Mathematics & Statistics, vol 145. Springer, Cham. https://doi.org/10.1007/978-3-319-20585-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20584-7
Online ISBN: 978-3-319-20585-4
eBook Packages: Mathematics and Statistics