Abstract
Group comparisons of performance on intelligence tests have been advanced as evidence of real similarities or differences in intellectual ability by Jensen (1980) and, more recently, by Herrnstein and Murray (1994). This purported evidence includes the mean intelligence score differences that have been reported for various ethnic groups (e.g., Jensen, 1969, 1980; Loehlin, Lindzey, & Spuhler, 1975; Lynn, 1977; Munford & Munoz, 1980) or by gender (e.g., Feingold, 1993; Nelson, Arthur, Lautiger, & Smith, 1994; Smith, Edmonds, & Smith, 1989; Vance, Hankins, & Brown, 1988; Wessel & Potter, 1994; Wilkinson, 1993).
References
Alwin, D. F., & Jackson, D. J. (1981). Applications of simultaneous factor analysis to issues of factorial invariance. In D. Jackson & E. Borgatta (Eds.), Factor analysis and measurement in sociological research: A multi-dimensional perspective (pp. 249–279). Beverly Hills, CA: Sage.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Anderson, R. J., & Sisco, F. H. (1977). Standardization of the WISC-R performance scale for deaf children. (Office of Demographic Studies Publication Series T, No. 1). Washington, DC: Gallaudet College.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3–24). Hillsdale, NJ: Erlbaum.
Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10, 95–105.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.
Bentler, P. M. (1992). On the fit of models to covariances and methodology in the Bulletin. Psychological Bulletin, 112, 400–404.
Berk, R. A. (Ed.). (1982). Handbook of methods for detecting test bias. Baltimore: Johns Hopkins University Press.
Bock, R. D. (1997). The nominal categories model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 33–49). New York: Springer.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bracken, B. A., & McCallum, R. S. (1998). Universal Nonverbal Intelligence Test: Examiner’s manual. Itasca, IL: Riverside Publishing.
Braden, J. P. (1999). Straight talk about assessment and diversity: What do we know? School Psychology Quarterly, 14, 343–355.
Breslow, N. E., & Day, N. E. (1980). Statistical methods in cancer research Volume 1: The analysis of case-control studies. Lyon: International Agency for Research on Cancer.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equations models (pp. 136–162). Newbury Park, CA: Sage.
Bryk, A. (1980). Review of Bias in mental testing. Journal of Educational Measurement, 17, 369–374.
Camilli, G. (1979). A critique of the chi-square method for assessing item bias. Unpublished paper, Laboratory of Educational Research, University of Colorado.
Camilli, G., & Shepard, L. A. (1987). The inadequacy of ANOVA for detecting test bias. Journal of Educational Statistics, 12, 87–89.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.
Cattell, R. B. (1978). The scientific use of factor analysis in behavioral and life sciences. New York: Plenum.
Clauser, B. E., Nungester, R. J., & Swaminathan, H. (1996). Improving the matching for DIF analysis by conditioning on both test score and an educational background variable. Journal of Educational Measurement, 33, 453–464.
Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115–124.
Cohen, A. S., Kim, S., & Wollack, J. A. (1996). An investigation of the likelihood ratio test for detection of differential item functioning. Applied Psychological Measurement, 20, 15–26.
Cotter, D. E., & Berk, R. A. (1981, April). Item bias in the WISC-R using black, white, and hispanic learning disabled children. Paper presented at the Annual Meeting of the American Educational Research Association, Los Angeles (ERIC Document Reproduction Service ED 206 631).
Diana v. the California State Board of Education. Case No. C-70–37 RFP. (N.D. Cal. 1970).
Dorans, N. J. (1989). Two new approaches to assessing differential item functioning: Standardization and the Mantel-Haenszel method. Applied Psychological Measurement, 2, 217–233.
Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach (Research Rep. No. 83–9). Princeton, NJ: Educational Testing Service.
Engelhard, G., Hansche, L., & Rutledge, K. E. (1990). Accuracy of bias review judges in identifying differential item functioning on teacher certification tests. Applied Psychological Measurement, 3, 347–360.
Feingold, A. (1993). Cognitive gender differences: A developmental perspective. Sex Roles, 29, 91–112.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). New York: American Council on Education & Macmillan.
Frisby, C. L. (1998). Poverty and socioeconomic status. In J. L. Sandoval, C. L. Frisby, K. F. Geisinger, J. D. Scheuneman, & J. R. Grenier (Eds.), Test interpretation and diversity: Achieving equity in assessment (pp. 241–270). Washington, DC: American Psychological Association.
Green, B. F., Crone, C. R., & Folk, V. G. (1989). A method for studying differential distractor functioning. Journal of Educational Measurement, 26, 147–160.
Hammill, D. D., Pearson, N. A., & Wiederholt, J. L. (1997). Comprehensive test of nonverbal intelligence. Austin, TX: PRO-ED.
Herrnstein, R. J., & Murray, C. (1994). The bell curve. New York: Free Press.
Hills, J. R. (1989). Screening for potentially biased items in testing programs. Educational Measurement: Issues and Practice, 8(4), 5–11.
Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.
Hu, L., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.
Ilai, D., & Willerman, L. (1989). Sex differences in WAIS-R item performance. Intelligence, 13, 225–234.
Jastak, J. E., & Jastak, S. R. (1964). Short forms of the WAIS and WISC vocabulary subtests. Journal of Clinical Psychology, 20, 167–199.
Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review, 39, 1–123.
Jensen, A. R. (1974). How biased are culture-loaded tests? Genetic Psychology Monographs, 90, 185–224.
Jensen, A. R. (1976). Test bias and construct validity. Phi Delta Kappan, 58, 340–346.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Joint Committee on Testing Practices. (1988). Code of fair testing practices in education. Washington, DC: National Council on Measurement in Education.
Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 57, 409–426.
Jöreskog, K. G., & Sörbom, D. (1989). LISREL7: A guide to the program and applications (2nd ed.). Chicago: SPSS.
Koh, T., Abbatiello, A., & McLoughlin, C. S. (1984). Cultural bias in WISC subtest items: A response to Judge Grady’s suggestions in relation to the PASE case. School Psychology Review, 13, 89–94.
Kromrey, J. D., & Parshall, C. G. (1991, November). Screening items for bias: An empirical comparison of the performance of three indices in small samples of examinees. Paper presented at the annual meeting of the Florida Educational Research Association. Clearwater, FL.
Larry P. et al. v. Wilson Riles, Superintendent of Public Instruction for the State of California, et al., Case No. C-71–2270 (N.D. Cal., 1979).
Lawrence, I. M., & Curley, W. E. (1989, March). Differential item functioning of SAT-Verbal reading subscore items for males and females: follow-up study. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
Lawrence, I. M., Curley, W. E., & McHale, F. J. (1988, April). Differential item functioning of SAT-Verbal reading subscore items for male and female examinees. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.
Linn, R. L., Levine, M. V., Hastings, C. N., & Waldrop, J. L. (1980). An investigation of item bias in a test of reading comprehension (Technical Rep. No. 163). Urbana: Center for the Study of Reading, University of Illinois at Urbana-Champaign.
Loehlin, J. C., Lindzey, G., & Spuhler, J. N. (1975). Race differences in intelligence. San Francisco: W. H. Freeman.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Lynn, R. (1977). The intelligence of the Japanese. Bulletin of the British Psychological Society, 30, 69–72.
Maller, S. J. (1996). WISC-III Verbal item invariance across samples of deaf and hearing children of similar measured ability. Journal of Psychoeducational Assessment, 14, 152–165.
Maller, S. J. (2000). Item invariance in four subtests of the Universal Nonverbal Intelligence Test across groups of deaf and hearing children. Journal of Psychoeducational Assessment, 18, 240–254.
Maller, S. J. (2001). Differential item functioning in the WISC-III: Item parameters for boys and girls in the national standardization sample. Educational and Psychological Measurement, 61, 793–817.
Maller, S. J., & Ferron, J. (1997). WISC-III factor invariance across deaf and standardization samples. Educational and Psychological Measurement, 7, 987–994.
Maller, S. J., Konold, T. R., & Glutting, J. J. (1998). WISC-III Factor invariance across samples of children displaying appropriate and inappropriate test-taking behavior. Educational and Psychological Measurement, 58, 467–475.
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from the retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.
McGaw, B., & Jöreskog, K. G. (1971). Factorial invariance of ability measures in groups differing in intelligence and socio-economic status. British Journal of Mathematical and Statistical Psychology, 24, 154–168.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education & Macmillan.
Miele, F. (1979). Cultural bias in the WISC. Intelligence, 3, 149–164.
Munford, P. R., & Munoz, A. (1980). A comparison of the WISC and WISC-R on Hispanic children. Journal of Clinical Psychology, 36, 452–458.
National Council on Measurement in Education (1995). Code of professional responsibilities in educational measurement. Washington, DC: NCME.
Nelson, K. M., Arthur, P., Lautiger, J., & Smith, D. K. (1994, March). Does the use of color on the WISC-III affect student performance? Paper presented at the annual meeting of the National Association of School Psychologists, Seattle, WA.
O’Neill, K. A., & McPeek, W. M. (1993). Item and test characteristics that are associated with differential item functioning. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 255–276). Hillsdale, NJ: Erlbaum.
Plake, B. S. (1980). A comparison of a statistical and subjective procedure to ascertain item validity: One step in the test validation process. Educational and Psychological Measurement, 30, 397–404.
Reynolds, C. R. (1982). The problem of bias in psychological assessment. In C. R. Reynolds & T. B. Gutkin (Eds.), The handbook of school psychology (pp. 178–108). New York: John Wiley.
Rigdon, E. E. (1996). CFI versus RMSEA: A comparison of two fit indexes for structural equation modeling. Structural Equation Modeling, 3, 369–379.
Roid, G. H., & Miller, L. J. (1997). Leiter International performance scale-revised: Examiner’s manual. In G. H. Roid & L. J. Miller, Leiter International performance scale-revised. Wood Dale, IL: Stoelting.
Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and the Mantel-Haenszel procedures for detecting differential item functioning. Applied Measurement in Education, 17, 105–116.
Ross-Reynolds, J., & Reschly, D. J. (1983). An investigation of item bias on the WISC-R with four sociocultural groups. Journal of Consulting and Clinical Psychology, 51, 144–146.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 4(2), Whole No. 17.
Sandoval, J. (1979). The WISC-R and internal evidence of test bias with minority groups. Journal of Consulting and Clinical Psychology, 47, 919–927.
Sandoval, J., & Millee, M. P. W. (1980). Accuracy of judgements of WISC-R item difficulty for minority groups. Journal of Consulting and Clinical Psychology, 48, 249–253.
Sandoval, J., Zimmerman, I. L., & Woo-Sam, J. M. (1977). Cultural differences on the WISC-R verbal items. Journal of School Psychology, 21, 49–55.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. Proceedings of the Business and Economic Statistics Section of the American Statistical Association, 303–313.
Scheuneman, J. (1975, April). A new method of assessing bias in test items. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC.
Scheuneman, J. D., & Gerritz, K. (1990). Using differential item functioning procedures to explore sources of item difficulty and group performance characteristics. Journal of Educational Measurement, 27, 109–131.
Scheuneman, J. D., & Oakland, T. (1998). In J. Sandoval, C. L. Frisby, K. F. Geisinger, J. D. Scheuneman, & J. R. Grenier (Eds.), Test interpretation and diversity: Achieving equity in assessment (pp. 77–103). Washington, DC: American Psychological Association.
Shelley-Sireci, & Sireci, S. G. (1998, August). Controlling for uncontrolled variables in cross-cultural research. Paper presented at the annual meeting of the American Psychological Association, San Francisco.
Sireci, S. G., Bastad, B., & Allalouf, A. (1998, August). Evaluating construct equivalence across adapted tests. Paper presented at the annual meeting of the American Psychological Association, San Francisco.
Simpson, E. H. (1951). Interpretation of interaction contingency tables. Journal of the Royal Statistical Society, Series B, 13, 238–241.
Smith, T. C., Edmonds, J. E., & Smith, B. (1989). The role of sex differences in the referral process as measured by the Peabody Picture Vocabulary Test-Revised and the Wechsler Intelligence Scale for Children-Revised. Psychology in the Schools, 26, 354–358.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.
Thissen, D. (1991). MULTILOG (Version 6.30) [Computer software]. Chicago: Scientific Software.
Thissen, D., & Steinberg, L. (1997). A response model for multiple-choice items. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 51–65). New York: Springer.
Thissen, D., Steinberg, L., & Gerrard, M. (1986). Beyond group mean differences: The concept of item bias. Psychological Bulletin, 99, 118–128.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 149–169). Hillsdale, NJ: Erlbaum.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response theory. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–114). Hillsdale, NJ: Erlbaum.
Turner, R. G., & Willerman, L. (1977). Sex differences in WAIS item performance. Journal of Clinical Psychology, 33, 795–798.
Vance, B., Hankins, N., & Brown, W. (1988). Ethnic and sex differences on the Test of Nonverbal Intelligence, Quick Test of Intelligence, and Wechsler Intelligence Scale for Children-Revised. Journal of Clinical Psychology, 44, 261–265.
Veale, J. R. (1977). A note on the use of chi-square with “correct/incorrect” data to detect culturally biased items (Statistical Research in the Behavioral Sciences, Technical Report No. 4). (Available from J. R. Veale, PO Box 4036, Berkeley, CA 94704.)
Welch, C., & Hoover, H. D. (1993). Procedures for extending item bias techniques to polytomously scored items. Applied Psychological Measurement, 6, 1–19.
Wessel, J., & Potter, A. (1994, March). Analysis of WISC-III data from an urban population of referred children. Paper presented at the annual meeting of the National Association of School Psychologists, Seattle, WA.
Wild, C. L., & McPeek, W. M. (1986, August). Performance of the Mantel-Haenszel statistic in identifying differentially functioning items. Paper presented at the annual meeting of the American Psychological Association, Washington, DC.
Wilkinson, S. C. (1993). WISC-R profiles of children with superior intellectual ability. Gifted Child Quarterly, 2, 84–92.
Zieky, M. (1993). Practical questions in the use of DIF statistics in item development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–364). Hillsdale, NJ: Erlbaum.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessing differential item functioning in performance tasks. Journal of Educational Measurement, 30, 233–251.
Copyright information
© 2003 Springer Science+Business Media New York
About this chapter
Cite this chapter
Maller, S.J. (2003). Best Practices in Detecting Bias in Nonverbal Tests. In: McCallum, R.S. (eds) Handbook of Nonverbal Assessment. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0153-4_2
DOI: https://doi.org/10.1007/978-1-4615-0153-4_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-4945-7
Online ISBN: 978-1-4615-0153-4
eBook Packages: Springer Book Archive