Psychometrika, Volume 75, Issue 2, pp 309–327

The Missing Data Assumptions of the NEAT Design and their Implications for Test Equating

  • Sandip Sinharay
  • Paul W. Holland


The Non-Equivalent groups with Anchor Test (NEAT) design involves data that are missing by design. Three nonlinear observed-score equating methods used with a NEAT design are frequency estimation equipercentile equating (FEEE), chain equipercentile equating (CEE), and item-response-theory observed-score equating (IRT OSE). Each of these methods makes different assumptions about the missing data in the NEAT design. The FEEE method assumes that the conditional distribution of the test score given the anchor test score is the same in the two examinee groups. The CEE method assumes that the equipercentile functions equating the test score to the anchor test score are the same in the two examinee groups. The IRT OSE method assumes that the IRT model employed fits the data adequately and that the items in the tests and the anchor test do not exhibit differential item functioning across the two examinee groups. This paper first describes the missing data assumptions of the three equating methods. It then describes how the missing data in the NEAT design can be filled in so that the result is coherent with the assumptions made by each of these equating methods. Implications for equating are also discussed.
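
To make the FEEE assumption concrete, here is a minimal Python sketch of how the missing score distribution could be filled in under it. It assumes that group P took test X and the anchor A, while group Q took only the anchor (and a different test). The function name, variable names, and toy counts are hypothetical and do not come from the paper. The sketch also leaves out any smoothing of the score distributions, which an operational analysis would normally include.

```python
def fill_in_marginal(joint_counts_p, anchor_counts_q):
    """Estimate the distribution of test score X in group Q.

    Assumes (FEEE) that the conditional distribution of X given the
    anchor score A is the same in groups P and Q.

    joint_counts_p[x][a]: count of group-P examinees with X = x and A = a.
    anchor_counts_q[a]:   count of group-Q examinees with A = a.
    """
    n_x = len(joint_counts_p)
    n_a = len(anchor_counts_q)
    total_q = sum(anchor_counts_q)
    marginal_q = [0.0] * n_x
    for a in range(n_a):
        # Number of group-P examinees observed at anchor score a.
        col_total = sum(joint_counts_p[x][a] for x in range(n_x))
        if col_total == 0:
            # No group-P data at this anchor score, so the conditional
            # distribution cannot be estimated here; this sketch skips it.
            continue
        weight_q = anchor_counts_q[a] / total_q  # P(A = a) in group Q
        for x in range(n_x):
            # P(X = x | A = a) from group P, reweighted by group Q's anchor distribution.
            marginal_q[x] += weight_q * joint_counts_p[x][a] / col_total
    return marginal_q


# Toy data (hypothetical): X takes values 0..2, A takes values 0..1.
joint_counts_p = [[4, 1], [4, 3], [2, 6]]
anchor_counts_q = [3, 7]
estimated_x_in_q = fill_in_marginal(joint_counts_p, anchor_counts_q)
```

The estimated distribution of X in group Q could then be paired with the corresponding filled-in distribution of the other test to form an equipercentile equating function on a common target population. That step is not shown here.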


Keywords: chain equating, frequency estimation, IRT observed-score equating, post-stratification equating, raking, simulation, true equating function





Copyright information

© The Psychometric Society 2010

Authors and Affiliations

  1. ETS, Princeton, USA
