
Optimal appropriateness measurement

Psychometrika

Abstract

The test-taking behavior of some examinees may be so idiosyncratic that their test scores may not be comparable to the scores of more typical examinees. Appropriateness measurement attempts to use answer patterns to recognize atypical examinees. In this report appropriateness measurement procedures are viewed as statistical tests for choosing between a null hypothesis of normal test-taking behavior and an alternative hypothesis of atypical test-taking behavior. Most powerful tests for inappropriateness are described together with methods for computing their power. A recursion greatly simplifying the calculation of optimal test statistics is described and illustrated.
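The hypothesis-testing view described above can be illustrated with a small sketch. By the Neyman-Pearson lemma, the most powerful test of a simple null against a simple alternative rejects the null when the likelihood ratio of the observed answer pattern is large. Everything concrete below is an assumption made for illustration and is not taken from the article: the two-parameter logistic item model, the item parameters, the discrete ability grid and its weights, and the "partial guessing" form of atypical behavior. It does not reproduce the recursion mentioned in the abstract.

```python
import math

# Hypothetical item parameters (discrimination a, difficulty b), easy to hard.
ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 2.0)]
# Hypothetical discrete ability distribution: grid points and their weights.
THETAS = [-2.0, -1.0, 0.0, 1.0, 2.0]
WEIGHTS = [0.1, 0.2, 0.4, 0.2, 0.1]


def normal_prob(i, theta):
    """P(correct) on item i under normal behavior (assumed 2PL model)."""
    a, b = ITEMS[i]
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def aberrant_prob(i, theta, guess_rate=0.5, chance=0.2):
    """P(correct) under an assumed alternative: on each item the examinee
    guesses with probability guess_rate and is then right at chance level."""
    return (1.0 - guess_rate) * normal_prob(i, theta) + guess_rate * chance


def pattern_likelihood(responses, item_prob):
    """Marginal probability of a 0/1 answer pattern, summing the conditional
    pattern probability over the ability grid."""
    total = 0.0
    for theta, weight in zip(THETAS, WEIGHTS):
        like = 1.0
        for i, u in enumerate(responses):
            p = item_prob(i, theta)
            like *= p if u == 1 else 1.0 - p
        total += weight * like
    return total


def likelihood_ratio(responses):
    """Likelihood of the pattern under the atypical model divided by its
    likelihood under the normal model; large values favor 'atypical'."""
    return (pattern_likelihood(responses, aberrant_prob)
            / pattern_likelihood(responses, normal_prob))


if __name__ == "__main__":
    for pattern in ([1, 1, 1, 0], [0, 0, 0, 1]):
        print(pattern, round(likelihood_ratio(pattern), 3))
```

Under these assumed parameters, a pattern that misses the three easier items but answers the hardest one correctly yields a ratio above 1, while the reverse pattern yields a ratio below 1. A test would flag examinees whose ratio exceeds a cutoff chosen to give the desired false-alarm rate under the null hypothesis.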



Additional information

The work reported in this article was supported by United States Office of Naval Research contracts N00014-79C-0752, NR 154-445 and N00014-83K-0397, NR 150-518, Michael V. Levine, Principal Investigator.

About this article

Cite this article

Levine, M.V., Drasgow, F. Optimal appropriateness measurement. Psychometrika 53, 161–176 (1988). https://doi.org/10.1007/BF02294130


  • DOI: https://doi.org/10.1007/BF02294130
