
Measuring Competencies across the Lifespan - Challenges of Linking Test Scores

  • Conference paper

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 145)

Abstract

The National Educational Panel Study (NEPS) aims at investigating the development of competencies across the whole lifespan. Competencies are assessed via tests, and competence scores are estimated based on models of Item Response Theory (IRT). IRT allows test scores to be compared, and thus change across time and differences between cohorts to be investigated, even when the respective competence is measured with different items. Because retest effects are assumed for most of the competencies in NEPS, linking is done via additional link studies in which the tests for two age groups are administered to a separate sample of participants. However, in order to link the test results of two different measurement occasions, certain assumptions need to hold, for example that the measures are invariant across samples and that the tests measure the same construct. These assumptions are challenging when linking competencies across the whole lifespan. Before linking the NEPS reading tests for different age cohorts in secondary school as well as in adulthood, we therefore investigated the unidimensionality of the items for different cohorts as well as measurement invariance across samples. Our results show that the tests for different age groups measure a unidimensional construct within the same sample. However, measurement invariance of the same test across different samples does not hold for all age groups; the same test thus exhibits a different measurement model in different samples. Based on our results, linking may well be justified within secondary school, while linking test scores in secondary school to those in adulthood is threatened by differences in the measurement model. Possible reasons for these results are discussed, and implications for the design of longitudinal studies as well as for possible analysis strategies are drawn.
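To make the central assumption concrete, the following is a minimal sketch (not taken from the paper itself) of the kind of measurement model typically used in NEPS scaling, assuming dichotomous items scaled with a Rasch model. The probability that person v with ability \theta_v solves item i with difficulty \beta_i is

P(X_{vi} = 1 \mid \theta_v, \beta_i) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}.

Measurement invariance across two samples A and B (for example, a main study cohort and a link study sample) then requires the item difficulties to agree up to a common shift,

\beta_i^{(A)} = \beta_i^{(B)} + c \quad \text{for all items } i,

so that sample differences are absorbed entirely by the ability distribution rather than by item-level parameters. If this condition fails for some items, the same test effectively follows different measurement models in the two samples, which is the threat to linking described in the abstract.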


Notes

  1. The newborn cohort started in 2012, and the adult sample was continued from the former ALWA study.

  2. This is possible if measurement invariance of the instruments can be assumed for comparisons between cohorts. One may also investigate and account for cohort effects with this design.

  3. Test targeting is good when the difficulties of the test items fit the ability levels of the specific target group well. Good test targeting enhances the reliability of the ability measurement (see the illustration below).
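As an illustration of why targeting matters (a sketch under a Rasch model, not part of the original note): the information that item i contributes to measuring a person with ability \theta is

I_i(\theta) = P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr), \qquad P_i(\theta) = \frac{\exp(\theta - \beta_i)}{1 + \exp(\theta - \beta_i)},

which is maximal (0.25) when \theta = \beta_i and decreases as |\theta - \beta_i| grows; for instance, an item with \beta_i = 0 provides only about 0.10 for a person with \theta = 2. Because measurement precision is driven by the sum of this information over items, a test whose item difficulties match the ability distribution of the target group measures that group more reliably.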


Acknowledgement

This research used data from the National Educational Panel Study (NEPS). From 2008 to 2013, NEPS data were collected as part of the Framework Programme for the Promotion of Empirical Educational Research funded by the German Federal Ministry of Education and Research (BMBF). As of 2014, the NEPS survey is carried out by the Leibniz Institute for Educational Trajectories (LIfBi) at the University of Bamberg in cooperation with a nationwide network.

This research is based on the dedicated work of professors and research assistants within the NEPS. We especially thank Karin Gehrer, Stefan Zimmermann, Cordula Artelt, and Sabine Weinert for developing the reading competence tests that are the basis of our research, and Maike Krannich, Michael Wenzler, Theresa Rohm, and Odin Jost for their valuable assistance in analyzing the data. Our thanks also go to the NEPS survey administration staff and to the methods group.

Author information

Corresponding author

Correspondence to Steffi Pohl.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Pohl, S., Haberkorn, K., Carstensen, C.H. (2015). Measuring Competencies across the Lifespan - Challenges of Linking Test Scores. In: Stemmler, M., von Eye, A., Wiedermann, W. (eds) Dependent Data in Social Sciences Research. Springer Proceedings in Mathematics & Statistics, vol 145. Springer, Cham. https://doi.org/10.1007/978-3-319-20585-4_12
