Best Practices in Detecting Bias in Nonverbal Tests

Chapter in Handbook of Nonverbal Assessment

Abstract

Group comparisons of performance on intelligence tests have been advanced as evidence of real similarities or differences in intellectual ability by Jensen (1980) and, more recently, by Herrnstein and Murray (1994). This purported evidence includes the mean intelligence score differences that have been reported for various ethnic groups (e.g., Jensen, 1969, 1980; Loehlin, Lindzey, & Spuhler, 1975; Lynn, 1977; Munford & Munoz, 1980) or genders (e.g., Feingold, 1993; Nelson, Arthur, Lautiger, & Smith, 1994; Smith, Edmonds, & Smith, 1989; Vance, Hankins, & Brown, 1988; Wessel & Potter, 1994; Wilkinson, 1993).

References

  • Alwin, D. F., & Jackson, D. J. (1981). Applications of simultaneous factor analysis to issues of factorial invariance. In D. Jackson & E. Borgatta (Eds.), Factor analysis and measurement in sociological research: A multi-dimensional perspective (pp. 249–279). Beverly Hills, CA: Sage.

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

  • Anderson, R. J., & Sisco, F. H. (1977). Standardization of the WISC-R performance scale for deaf children (Office of Demographic Studies Publication Series T, No. 1). Washington, DC: Gallaudet College.

  • Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3–24). Hillsdale, NJ: Erlbaum.

  • Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10, 95–105.

  • Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.

  • Bentler, P. M. (1992). On the fit of models to covariances and methodology in the Bulletin. Psychological Bulletin, 112, 400–404.

  • Berk, R. A. (Ed.). (1982). Handbook of methods for detecting test bias. Baltimore: Johns Hopkins University Press.

  • Bock, R. D. (1997). The nominal categories model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 33–49). New York: Springer.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.

  • Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

  • Bracken, B. A., & McCallum, R. S. (1998). Universal Nonverbal Intelligence Test: Examiner’s manual. Itasca, IL: Riverside Publishing.

  • Braden, J. P. (1999). Straight talk about assessment and diversity: What do we know? School Psychology Quarterly, 14, 343–355.

  • Breslow, N. E., & Day, N. E. (1980). Statistical methods in cancer research, Volume 1: The analysis of case-control studies. Lyon: International Agency for Research on Cancer.

  • Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.

  • Bryk, A. (1980). Review of Bias in mental testing. Journal of Educational Measurement, 17, 369–374.

  • Camilli, G. (1979). A critique of the chi-square method for assessing item bias. Unpublished paper, Laboratory of Educational Research, University of Colorado.

  • Camilli, G., & Shepard, L. A. (1987). The inadequacy of ANOVA for detecting test bias. Journal of Educational Statistics, 12, 87–89.

  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.

  • Cattell, R. B. (1978). The scientific use of factor analysis in behavioral and life sciences. New York: Plenum.

  • Clauser, B. E., Nungester, R. J., & Swaminathan, H. (1996). Improving the matching for DIF analysis by conditioning on both test score and an educational background variable. Journal of Educational Measurement, 33, 453–464.

  • Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115–124.

  • Cohen, A. S., Kim, S., & Wollack, J. A. (1996). An investigation of the likelihood ratio test for detection of differential item functioning. Applied Psychological Measurement, 20, 15–26.

  • Cotter, D. E., & Berk, R. A. (1981, April). Item bias in the WISC-R using Black, White, and Hispanic learning disabled children. Paper presented at the annual meeting of the American Educational Research Association, Los Angeles (ERIC Document Reproduction Service ED 206 631).

  • Diana v. the California State Board of Education. Case No. C-70–37 RFP (N.D. Cal. 1970).

  • Dorans, N. J. (1989). Two new approaches to assessing differential item functioning: Standardization and the Mantel-Haenszel method. Applied Measurement in Education, 2, 217–233.

  • Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach (Research Rep. No. 83–9). Princeton, NJ: Educational Testing Service.

  • Engelhard, G., Hansche, L., & Rutledge, K. E. (1990). Accuracy of bias review judges in identifying differential item functioning on teacher certification tests. Applied Measurement in Education, 3, 347–360.

  • Feingold, A. (1993). Cognitive gender differences: A developmental perspective. Sex Roles, 29, 91–112.

  • Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). New York: American Council on Education & Macmillan.

  • Frisby, C. L. (1998). Poverty and socioeconomic status. In J. L. Sandoval, C. L. Frisby, K. F. Geisinger, J. D. Scheuneman, & J. R. Grenier (Eds.), Test interpretation and diversity: Achieving equity in assessment (pp. 241–270). Washington, DC: American Psychological Association.

  • Green, B. F., Crone, C. R., & Folk, V. G. (1989). A method for studying differential distractor functioning. Journal of Educational Measurement, 26, 147–160.

  • Hammill, D. D., Pearson, N. A., & Wiederholt, J. L. (1997). Comprehensive Test of Nonverbal Intelligence. Austin, TX: PRO-ED.

  • Herrnstein, R. J., & Murray, C. (1994). The bell curve. New York: Free Press.

  • Hills, J. R. (1989). Screening for potentially biased items in testing programs. Educational Measurement: Issues and Practice, 8(4), 5–11.

  • Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.

  • Hu, L., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.

  • Ilai, D., & Willerman, L. (1989). Sex differences in WAIS-R item performance. Intelligence, 13, 225–234.

  • Jastak, J. E., & Jastak, S. R. (1964). Short forms of the WAIS and WISC vocabulary subtests. Journal of Clinical Psychology, 20, 167–199.

  • Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review, 39, 1–123.

  • Jensen, A. R. (1974). How biased are culture-loaded tests? Genetic Psychology Monographs, 90, 185–224.

  • Jensen, A. R. (1976). Test bias and construct validity. Phi Delta Kappan, 58, 340–346.

  • Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.

  • Joint Committee on Testing Practices. (1988). Code of fair testing practices in education. Washington, DC: National Council on Measurement in Education.

  • Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36, 409–426.

  • Jöreskog, K. G., & Sörbom, D. (1989). LISREL 7: A guide to the program and applications (2nd ed.). Chicago: SPSS.

  • Koh, T., Abbatiello, A., & McLoughlin, C. S. (1984). Cultural bias in WISC subtest items: A response to Judge Grady’s suggestions in relation to the PASE case. School Psychology Review, 13, 89–94.

  • Kromrey, J. D., & Parshall, C. G. (1991, November). Screening items for bias: An empirical comparison of the performance of three indices in small samples of examinees. Paper presented at the annual meeting of the Florida Educational Research Association, Clearwater, FL.

  • Larry P. et al. v. Wilson Riles, Superintendent of Public Instruction for the State of California, et al., Case No. C-71–2270 (N.D. Cal. 1979).

  • Lawrence, I. M., & Curley, W. E. (1989, March). Differential item functioning of SAT-Verbal reading subscore items for males and females: Follow-up study. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

  • Lawrence, I. M., Curley, W. E., & McHale, F. J. (1988, April). Differential item functioning of SAT-Verbal reading subscore items for male and female examinees. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.

  • Linn, R. L., Levine, M. V., Hastings, C. N., & Waldrop, J. L. (1980). An investigation of item bias in a test of reading comprehension (Technical Rep. No. 163). Urbana: Center for the Study of Reading, University of Illinois at Urbana-Champaign.

  • Loehlin, J. C., Lindzey, G., & Spuhler, J. N. (1975). Race differences in intelligence. San Francisco: W. H. Freeman.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

  • Lynn, R. (1977). The intelligence of the Japanese. Bulletin of the British Psychological Society, 30, 69–72.

  • Maller, S. J. (1996). WISC-III Verbal item invariance across samples of deaf and hearing children of similar measured ability. Journal of Psychoeducational Assessment, 14, 152–165.

  • Maller, S. J. (2000). Item invariance in four subtests of the Universal Nonverbal Intelligence Test across groups of deaf and hearing children. Journal of Psychoeducational Assessment, 18, 240–254.

  • Maller, S. J. (2001). Differential item functioning in the WISC-III: Item parameters for boys and girls in the national standardization sample. Educational and Psychological Measurement, 61, 793–817.

  • Maller, S. J., & Ferron, J. (1997). WISC-III factor invariance across deaf and standardization samples. Educational and Psychological Measurement, 57, 987–994.

  • Maller, S. J., Konold, T. R., & Glutting, J. J. (1998). WISC-III factor invariance across samples of children displaying appropriate and inappropriate test-taking behavior. Educational and Psychological Measurement, 58, 467–475.

  • Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.

  • McGaw, B., & Jöreskog, K. G. (1971). Factorial invariance of ability measures in groups differing in intelligence and socio-economic status. British Journal of Mathematical and Statistical Psychology, 24, 154–168.

  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education & Macmillan.

  • Miele, F. (1979). Cultural bias in the WISC. Intelligence, 3, 149–164.

  • Munford, P. R., & Munoz, A. (1980). A comparison of the WISC and WISC-R on Hispanic children. Journal of Clinical Psychology, 36, 452–458.

  • National Council on Measurement in Education. (1995). Code of professional responsibilities in educational measurement. Washington, DC: NCME.

  • Nelson, K. M., Arthur, P., Lautiger, J., & Smith, D. K. (1994, March). Does the use of color on the WISC-III affect student performance? Paper presented at the annual meeting of the National Association of School Psychologists, Seattle, WA.

  • O’Neill, K. A., & McPeek, W. M. (1993). Item and test characteristics that are associated with differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 255–276). Hillsdale, NJ: Erlbaum.

  • Plake, B. S. (1980). A comparison of a statistical and subjective procedure to ascertain item validity: One step in the test validation process. Educational and Psychological Measurement, 40, 397–404.

  • Reynolds, C. R. (1982). The problem of bias in psychological assessment. In C. R. Reynolds & T. B. Gutkin (Eds.), The handbook of school psychology (pp. 178–208). New York: Wiley.

  • Rigdon, E. E. (1996). CFI versus RMSEA: A comparison of two fit indexes for structural equation modeling. Structural Equation Modeling, 3, 369–379.

  • Roid, G. H., & Miller, L. J. (1997). Leiter International Performance Scale-Revised: Examiner’s manual. Wood Dale, IL: Stoelting.

  • Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Measurement in Education, 6, 105–116.

  • Ross-Reynolds, J., & Reschly, D. J. (1983). An investigation of item bias on the WISC-R with four sociocultural groups. Journal of Consulting and Clinical Psychology, 51, 144–146.

  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 4(2), Whole No. 17.

  • Sandoval, J. (1979). The WISC-R and internal evidence of test bias with minority groups. Journal of Consulting and Clinical Psychology, 47, 919–927.

  • Sandoval, J., & Miille, M. P. W. (1980). Accuracy of judgments of WISC-R item difficulty for minority groups. Journal of Consulting and Clinical Psychology, 48, 249–253.

  • Sandoval, J., Zimmerman, I. L., & Woo-Sam, J. M. (1977). Cultural differences on the WISC-R verbal items. Journal of School Psychology, 21, 49–55.

  • Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. Proceedings of the Business and Economic Statistics Section of the American Statistical Association, 303–313.

  • Scheuneman, J. (1975, April). A new method of assessing bias in test items. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC.

  • Scheuneman, J. D., & Gerritz, K. (1990). Using differential item functioning procedures to explore sources of item difficulty and group performance characteristics. Journal of Educational Measurement, 27, 109–131.

  • Scheuneman, J. D., & Oakland, T. (1998). In J. Sandoval, C. L. Frisby, K. F. Geisinger, J. D. Scheuneman, & J. R. Grenier (Eds.), Test interpretation and diversity: Achieving equity in assessment (pp. 77–103). Washington, DC: American Psychological Association.

  • Shelley-Sireci, & Sireci, S. G. (1998, August). Controlling for uncontrolled variables in cross-cultural research. Paper presented at the annual meeting of the American Psychological Association, San Francisco.

  • Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society, Series B, 13, 238–241.

  • Sireci, S. G., Bastad, B., & Allalouf, A. (1998, August). Evaluating construct equivalence across adapted tests. Paper presented at the annual meeting of the American Psychological Association, San Francisco.

  • Smith, T. C., Edmonds, J. E., & Smith, B. (1989). The role of sex differences in the referral process as measured by the Peabody Picture Vocabulary Test-Revised and the Wechsler Intelligence Scale for Children-Revised. Psychology in the Schools, 26, 354–358.

  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.

  • Thissen, D. (1991). MULTILOG (Version 6.30) [Computer software]. Chicago: Scientific Software.

  • Thissen, D., & Steinberg, L. (1997). A response model for multiple-choice items. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 51–65). New York: Springer.

  • Thissen, D., Steinberg, L., & Gerrard, M. (1986). Beyond group mean differences: The concept of item bias. Psychological Bulletin, 99, 118–128.

  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 149–169). Hillsdale, NJ: Erlbaum.

  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response theory. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–114). Hillsdale, NJ: Erlbaum.

  • Turner, R. G., & Willerman, L. (1977). Sex differences in WAIS item performance. Journal of Clinical Psychology, 33, 795–798.

  • Vance, B., Hankins, N., & Brown, W. (1988). Ethnic and sex differences on the Test of Nonverbal Intelligence, Quick Test of Intelligence, and Wechsler Intelligence Scale for Children-Revised. Journal of Clinical Psychology, 44, 261–265.

  • Veale, J. R. (1977). A note on the use of chi-square with “correct/incorrect” data to detect culturally biased items (Statistical Research in the Behavioral Sciences, Technical Report No. 4). (Available from J. R. Veale, PO Box 4036, Berkeley, CA 94704.)

  • Welch, C., & Hoover, H. D. (1993). Procedures for extending item bias techniques to polytomously scored items. Applied Measurement in Education, 6, 1–19.

  • Wessel, J., & Potter, A. (1994, March). Analysis of WISC-III data from an urban population of referred children. Paper presented at the annual meeting of the National Association of School Psychologists, Seattle, WA.

  • Wild, C. L., & McPeek, W. M. (1986, August). Performance of the Mantel-Haenszel statistic in identifying differentially functioning items. Paper presented at the annual meeting of the American Psychological Association, Washington, DC.

  • Wilkinson, S. C. (1993). WISC-R profiles of children with superior intellectual ability. Gifted Child Quarterly, 37, 84–92.

  • Zieky, M. (1993). Practical questions in the use of DIF statistics in item development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–364). Hillsdale, NJ: Erlbaum.

  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.

  • Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessing differential item functioning in performance tasks. Journal of Educational Measurement, 30, 233–251.

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Maller, S. J. (2003). Best Practices in Detecting Bias in Nonverbal Tests. In: McCallum, R. S. (Ed.), Handbook of Nonverbal Assessment. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0153-4_2

  • DOI: https://doi.org/10.1007/978-1-4615-0153-4_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-4945-7

  • Online ISBN: 978-1-4615-0153-4

  • eBook Packages: Springer Book Archive
