Abstract
The test-taking behavior of some examinees may be so idiosyncratic that their test scores may not be comparable to the scores of more typical examinees. Appropriateness measurement attempts to use answer patterns to recognize atypical examinees. In this report, appropriateness measurement procedures are viewed as statistical tests for choosing between a null hypothesis of normal test-taking behavior and an alternative hypothesis of atypical test-taking behavior. Most powerful tests for inappropriateness are described, together with methods for computing their power. A recursion that greatly simplifies the calculation of optimal test statistics is described and illustrated.
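The hypothesis-testing view in the abstract can be illustrated with a small sketch. By the Neyman-Pearson lemma, the most powerful test of a simple null against a simple alternative rejects for large values of the likelihood ratio of the observed answer pattern. The code below is only an illustration under stated assumptions and is not the recursion the article describes: the two-parameter logistic item model, the standard normal ability density on a fixed grid, the "random guessing" alternative, and the names `mix` and `guess` are all hypothetical choices made for this example.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def marginal_likelihood(responses, prob_fn, grid, weights):
    """Probability of an answer pattern, averaged over the ability grid."""
    total = 0.0
    for theta, w in zip(grid, weights):
        like = 1.0
        for i, u in enumerate(responses):
            p = prob_fn(i, theta)
            like *= p if u == 1 else 1.0 - p
        total += w * like
    return total

def likelihood_ratio(responses, items, grid, weights, mix=0.3, guess=0.2):
    """Ratio of the pattern's likelihood under the atypical and normal models.

    Hypothetical alternative: on each item the examinee guesses with
    probability `mix`, succeeding with probability `guess`.
    """
    def normal(i, theta):
        a, b = items[i]
        return p_correct(theta, a, b)

    def atypical(i, theta):
        return (1.0 - mix) * normal(i, theta) + mix * guess

    numerator = marginal_likelihood(responses, atypical, grid, weights)
    denominator = marginal_likelihood(responses, normal, grid, weights)
    return numerator / denominator

# Discretized standard normal ability density (assumed for illustration).
grid = [-4.0 + 0.1 * k for k in range(81)]
density = [math.exp(-0.5 * t * t) for t in grid]
norm = sum(density)
weights = [d / norm for d in density]

# Five hypothetical items (discrimination, difficulty), easiest first.
items = [(1.0, b) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]

# Easy items right and hard items wrong, versus the reverse pattern.
lr_typical = likelihood_ratio([1, 1, 1, 0, 0], items, grid, weights)
lr_odd = likelihood_ratio([0, 0, 1, 1, 1], items, grid, weights)
print(lr_typical, lr_odd)
```

In this sketch a pattern that misses easy items while answering hard ones correctly yields a larger ratio than the reverse pattern, so it would be flagged first as the rejection threshold is lowered. The threshold itself would be chosen to fix the false-alarm rate under the normal model.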
Additional information
The work reported in this article was supported by United States Office of Naval Research contracts N00014-79C-0752, NR 154-445 and N00014-83K-0397, NR 150-518, Michael V. Levine, Principal Investigator.
About this article
Cite this article
Levine, M.V., Drasgow, F. Optimal appropriateness measurement. Psychometrika 53, 161–176 (1988). https://doi.org/10.1007/BF02294130
DOI: https://doi.org/10.1007/BF02294130