Optimal appropriateness measurement

Levine, Michael V.; Drasgow, Fritz

doi:10.1007/BF02294130

Published: June 1988

Psychometrika, Volume 53, pages 161–176 (1988)

Abstract

The test-taking behavior of some examinees may be so idiosyncratic that their test scores may not be comparable to the scores of more typical examinees. Appropriateness measurement attempts to use answer patterns to recognize atypical examinees. In this report appropriateness measurement procedures are viewed as statistical tests for choosing between a null hypothesis of normal test-taking behavior and an alternative hypothesis of atypical test-taking behavior. Most powerful tests for inappropriateness are described together with methods for computing their power. A recursion greatly simplifying the calculation of optimal test statistics is described and illustrated.
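
The hypothesis-testing view in the abstract can be illustrated with a small sketch. By the Neyman-Pearson lemma, the most powerful test of a simple null against a simple alternative at a fixed false-alarm rate rejects the null for large values of the likelihood ratio. The code below is only an illustration under stated assumptions, not the authors' procedure: the three-parameter logistic item model, the item parameters in `ITEMS`, the standard normal ability grid, and the hypothetical "partial random guessing" alternative in `p_aberrant` are all choices made here for the example. It does not implement the recursion mentioned in the abstract.

```python
import math

def p_correct(theta, a, b, c):
    """Null model (assumed): three-parameter logistic probability of a correct answer."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def p_aberrant(theta, a, b, c, mix=0.5, guess=0.2):
    """Hypothetical alternative: with probability `mix` the examinee guesses at random."""
    return (1.0 - mix) * p_correct(theta, a, b, c) + mix * guess

def ability_grid(n_nodes=41, lo=-4.0, hi=4.0):
    """Equally spaced ability nodes with normalized standard normal weights."""
    step = (hi - lo) / (n_nodes - 1)
    nodes = [lo + i * step for i in range(n_nodes)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def marginal_likelihood(responses, items, item_prob, nodes, weights):
    """Probability of a 0/1 answer pattern, averaged over the ability grid."""
    total = 0.0
    for theta, w in zip(nodes, weights):
        like = 1.0
        for u, (a, b, c) in zip(responses, items):
            p = item_prob(theta, a, b, c)
            like *= p if u == 1 else 1.0 - p
        total += w * like
    return total

def likelihood_ratio(responses, items, nodes, weights):
    """Likelihood of the pattern under the alternative divided by that under the null."""
    num = marginal_likelihood(responses, items, p_aberrant, nodes, weights)
    den = marginal_likelihood(responses, items, p_correct, nodes, weights)
    return num / den

# Illustrative (a, b, c) item parameters: one easy, one medium, one hard item.
ITEMS = [(1.0, -1.5, 0.2), (1.2, 0.0, 0.2), (0.8, 1.5, 0.2)]
NODES, WEIGHTS = ability_grid()

if __name__ == "__main__":
    for pattern in ([1, 1, 0], [0, 0, 1]):
        print(pattern, likelihood_ratio(pattern, ITEMS, NODES, WEIGHTS))
```

In use, an answer pattern would be classified as atypical when its likelihood ratio exceeds a cutoff chosen so that the false-alarm rate under the null model equals the desired level; the power of the test is then the probability of exceeding that cutoff under the alternative.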

Author information

Authors and Affiliations

210 Education Building, University of Illinois, 1310 South Sixth St., Champaign, IL 61820
Michael V. Levine & Fritz Drasgow

Additional information

The work reported in this article was supported by United States Office of Naval Research contracts N00014-79C-0752, NR 154-445 and N00014-83K-0397, NR 150-518, Michael V. Levine, Principal Investigator.

About this article

Cite this article

Levine, M.V., Drasgow, F. Optimal appropriateness measurement. Psychometrika 53, 161–176 (1988). https://doi.org/10.1007/BF02294130

Received: 26 March 1985
Revised: 17 February 1987
Issue Date: June 1988
DOI: https://doi.org/10.1007/BF02294130
