Best Practices in Detecting Bias in Nonverbal Tests

Chapter in Handbook of Nonverbal Assessment

Abstract

Group comparisons of performance on intelligence tests have been advanced as evidence of real similarities or differences in intellectual ability by Jensen (1980) and, more recently, by Herrnstein and Murray (1994). This purported evidence includes the mean intelligence score differences that have been reported for various ethnic groups (e.g., Jensen, 1969, 1980; Loehlin, Lindzey, & Spuhler, 1975; Lynn, 1977; Munford & Munoz, 1980) or genders (e.g., Feingold, 1993; Nelson, Arthur, Lautiger, & Smith, 1994; Smith, Edmonds, & Smith, 1989; Vance, Hankins, & Brown, 1988; Wessel & Potter, 1994; Wilkinson, 1993).

References

  • Alwin, D. F., & Jackson, D. J. (1981). Applications of simultaneous factor analysis to issues of factorial invariance. In D. Jackson & E. Borgatta (Eds.), Factor analysis and measurement in sociological research: A multi-dimensional perspective (pp. 249–279). Beverly Hills, CA: Sage.

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.

  • Anderson, R. J., & Sisco, F. H. (1977). Standardization of the WISC-R performance scale for deaf children (Office of Demographic Studies Publication Series T, No. 1). Washington, DC: Gallaudet College.

  • Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3–24). Hillsdale, NJ: Erlbaum.

  • Angoff, W. H., & Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10, 95–105.

  • Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238–246.

  • Bentler, P. M. (1992). On the fit of models to covariances and methodology in the Bulletin. Psychological Bulletin, 112, 400–404.

  • Berk, R. A. (Ed.). (1982). Handbook of methods for detecting test bias. Baltimore: Johns Hopkins University Press.

  • Bock, R. D. (1997). The nominal categories model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 33–49). New York: Springer.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459.

  • Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

  • Bracken, B. A., & McCallum, R. S. (1998). Universal Nonverbal Intelligence Test: Examiner’s manual. Itasca, IL: Riverside Publishing.

  • Braden, J. P. (1999). Straight talk about assessment and diversity: What do we know? School Psychology Quarterly, 14, 343–355.

  • Breslow, N. E., & Day, N. E. (1980). Statistical methods in cancer research, Volume 1: The analysis of case-control studies. Lyon: International Agency for Research on Cancer.

  • Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage.

  • Bryk, A. (1980). Review of Bias in mental testing. Journal of Educational Measurement, 17, 369–374.

  • Camilli, G. (1979). A critique of the chi-square method for assessing item bias. Unpublished paper, Laboratory of Educational Research, University of Colorado.

  • Camilli, G., & Shepard, L. A. (1987). The inadequacy of ANOVA for detecting test bias. Journal of Educational Statistics, 12, 87–89.

  • Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.

  • Cattell, R. B. (1978). The scientific use of factor analysis in behavioral and life sciences. New York: Plenum.

  • Clauser, B. E., Nungester, R. J., & Swaminathan, H. (1996). Improving the matching for DIF analysis by conditioning on both test score and an educational background variable. Journal of Educational Measurement, 33, 453–464.

  • Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115–124.

  • Cohen, A. S., Kim, S., & Wollack, J. A. (1996). An investigation of the likelihood ratio test for detection of differential item functioning. Applied Psychological Measurement, 20, 15–26.

  • Cotter, D. E., & Berk, R. A. (1981, April). Item bias in the WISC-R using Black, White, and Hispanic learning disabled children. Paper presented at the annual meeting of the American Educational Research Association, Los Angeles (ERIC Document Reproduction Service ED 206 631).

  • Diana v. the California State Board of Education. Case No. C-70–37 RFP (N.D. Cal. 1970).

  • Dorans, N. J. (1989). Two new approaches to assessing differential item functioning: Standardization and the Mantel-Haenszel method. Applied Measurement in Education, 2, 217–233.

  • Dorans, N. J., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach (Research Rep. No. 83–9). Princeton, NJ: Educational Testing Service.

  • Engelhard, G., Hansche, L., & Rutledge, K. E. (1990). Accuracy of bias review judges in identifying differential item functioning on teacher certification tests. Applied Measurement in Education, 3, 347–360.

  • Feingold, A. (1993). Cognitive gender differences: A developmental perspective. Sex Roles, 29, 91–112.

  • Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146). New York: American Council on Education & Macmillan.

  • Frisby, C. L. (1998). Poverty and socioeconomic status. In J. L. Sandoval, C. L. Frisby, K. F. Geisinger, J. D. Scheuneman, & J. R. Grenier (Eds.), Test interpretation and diversity: Achieving equity in assessment (pp. 241–270). Washington, DC: American Psychological Association.

  • Green, B. F., Crone, C. R., & Folk, V. G. (1989). A method for studying differential distractor functioning. Journal of Educational Measurement, 26, 147–160.

  • Hammill, D. D., Pearson, N. A., & Wiederholt, J. L. (1997). Comprehensive Test of Nonverbal Intelligence. Austin, TX: PRO-ED.

  • Herrnstein, R. J., & Murray, C. (1994). The bell curve. New York: Free Press.

  • Hills, J. R. (1989). Screening for potentially biased items in testing programs. Educational Measurement: Issues and Practice, 8(4), 5–11.

  • Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.

  • Hu, L., Bentler, P. M., & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin, 112, 351–362.

  • Ilai, D., & Willerman, L. (1989). Sex differences in WAIS-R item performance. Intelligence, 13, 225–234.

  • Jastak, J. E., & Jastak, S. R. (1964). Short forms of the WAIS and WISC vocabulary subtests. Journal of Clinical Psychology, 20, 167–199.

  • Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review, 39, 1–123.

  • Jensen, A. R. (1974). How biased are culture-loaded tests? Genetic Psychology Monographs, 90, 185–224.

  • Jensen, A. R. (1976). Test bias and construct validity. Phi Delta Kappan, 58, 340–346.

  • Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.

  • Joint Committee on Testing Practices. (1988). Code of fair testing practices in education. Washington, DC: National Council on Measurement in Education.

  • Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36, 409–426.

  • Jöreskog, K. G., & Sörbom, D. (1989). LISREL 7: A guide to the program and applications (2nd ed.). Chicago: SPSS.

  • Koh, T., Abbatiello, A., & McLoughlin, C. S. (1984). Cultural bias in WISC subtest items: A response to Judge Grady’s suggestions in relation to the PASE case. School Psychology Review, 13, 89–94.

  • Kromrey, J. D., & Parshall, C. G. (1991, November). Screening items for bias: An empirical comparison of the performance of three indices in small samples of examinees. Paper presented at the annual meeting of the Florida Educational Research Association, Clearwater, FL.

  • Larry P. et al. v. Wilson Riles, Superintendent of Public Instruction for the State of California, et al., Case No. C-71–2270 (N.D. Cal. 1979).

  • Lawrence, I. M., & Curley, W. E. (1989, March). Differential item functioning of SAT-Verbal reading subscore items for males and females: Follow-up study. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

  • Lawrence, I. M., Curley, W. E., & McHale, F. J. (1988, April). Differential item functioning of SAT-Verbal reading subscore items for male and female examinees. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.

  • Linn, R. L., Levine, M. V., Hastings, C. N., & Waldrop, J. L. (1980). An investigation of item bias in a test of reading comprehension (Technical Rep. No. 163). Urbana: Center for the Study of Reading, University of Illinois at Urbana-Champaign.

  • Loehlin, J. C., Lindzey, G., & Spuhler, J. N. (1975). Race differences in intelligence. San Francisco: W. H. Freeman.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

  • Lynn, R. (1977). The intelligence of the Japanese. Bulletin of the British Psychological Society, 30, 69–72.

  • Maller, S. J. (1996). WISC-III Verbal item invariance across samples of deaf and hearing children of similar measured ability. Journal of Psychoeducational Assessment, 14, 152–165.

  • Maller, S. J. (2000). Item invariance in four subtests of the Universal Nonverbal Intelligence Test across groups of deaf and hearing children. Journal of Psychoeducational Assessment, 18, 240–254.

  • Maller, S. J. (2001). Differential item functioning in the WISC-III: Item parameters for boys and girls in the national standardization sample. Educational and Psychological Measurement, 61, 793–817.

  • Maller, S. J., & Ferron, J. (1997). WISC-III factor invariance across deaf and standardization samples. Educational and Psychological Measurement, 57, 987–994.

  • Maller, S. J., Konold, T. R., & Glutting, J. J. (1998). WISC-III factor invariance across samples of children displaying appropriate and inappropriate test-taking behavior. Educational and Psychological Measurement, 58, 467–475.

  • Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.

  • McGaw, B., & Jöreskog, K. G. (1971). Factorial invariance of ability measures in groups differing in intelligence and socio-economic status. British Journal of Mathematical and Statistical Psychology, 24, 154–168.

  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education & Macmillan.

  • Miele, F. (1979). Cultural bias in the WISC. Intelligence, 3, 149–164.

  • Munford, P. R., & Munoz, A. (1980). A comparison of the WISC and WISC-R on Hispanic children. Journal of Clinical Psychology, 36, 452–458.

  • National Council on Measurement in Education. (1995). Code of professional responsibilities in educational measurement. Washington, DC: NCME.

  • Nelson, K. M., Arthur, P., Lautiger, J., & Smith, D. K. (1994, March). Does the use of color on the WISC-III affect student performance? Paper presented at the annual meeting of the National Association of School Psychologists, Seattle, WA.

  • O’Neill, K. A., & McPeek, W. M. (1993). Item and test characteristics that are associated with differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 255–276). Hillsdale, NJ: Erlbaum.

  • Plake, B. S. (1980). A comparison of a statistical and subjective procedure to ascertain item validity: One step in the test validation process. Educational and Psychological Measurement, 40, 397–404.

  • Reynolds, C. R. (1982). The problem of bias in psychological assessment. In C. R. Reynolds & T. B. Gutkin (Eds.), The handbook of school psychology (pp. 178–208). New York: Wiley.

  • Rigdon, E. E. (1996). CFI versus RMSEA: A comparison of two fit indexes for structural equation modeling. Structural Equation Modeling, 3, 369–379.

  • Roid, G. H., & Miller, L. J. (1997). Leiter International Performance Scale-Revised: Examiner’s manual. Wood Dale, IL: Stoelting.

  • Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Measurement in Education, 6, 105–116.

  • Ross-Reynolds, J., & Reschly, D. J. (1983). An investigation of item bias on the WISC-R with four sociocultural groups. Journal of Consulting and Clinical Psychology, 51, 144–146.

  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 4(2), Whole No. 17.

  • Sandoval, J. (1979). The WISC-R and internal evidence of test bias with minority groups. Journal of Consulting and Clinical Psychology, 47, 919–927.

  • Sandoval, J., & Miille, M. P. W. (1980). Accuracy of judgments of WISC-R item difficulty for minority groups. Journal of Consulting and Clinical Psychology, 48, 249–253.

  • Sandoval, J., Zimmerman, I. L., & Woo-Sam, J. M. (1977). Cultural differences on the WISC-R verbal items. Journal of School Psychology, 21, 49–55.

  • Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. Proceedings of the Business and Economic Statistics Section of the American Statistical Association, 303–313.

  • Scheuneman, J. (1975, April). A new method of assessing bias in test items. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC.

  • Scheuneman, J. D., & Gerritz, K. (1990). Using differential item functioning procedures to explore sources of item difficulty and group performance characteristics. Journal of Educational Measurement, 27, 109–131.

  • Scheuneman, J. D., & Oakland, T. (1998). In J. Sandoval, C. L. Frisby, K. F. Geisinger, J. D. Scheuneman, & J. R. Grenier (Eds.), Test interpretation and diversity: Achieving equity in assessment (pp. 77–103). Washington, DC: American Psychological Association.

  • Shelley-Sireci, & Sireci, S. G. (1998, August). Controlling for uncontrolled variables in cross-cultural research. Paper presented at the annual meeting of the American Psychological Association, San Francisco.

  • Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society, Series B, 13, 238–241.

  • Sireci, S. G., Bastad, B., & Allalouf, A. (1998, August). Evaluating construct equivalence across adapted tests. Paper presented at the annual meeting of the American Psychological Association, San Francisco.

  • Smith, T. C., Edmonds, J. E., & Smith, B. (1989). The role of sex differences in the referral process as measured by the Peabody Picture Vocabulary Test-Revised and the Wechsler Intelligence Scale for Children-Revised. Psychology in the Schools, 26, 354–358.

  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.

  • Thissen, D. (1991). MULTILOG (Version 6.30) [Computer software]. Chicago: Scientific Software.

  • Thissen, D., & Steinberg, L. (1997). A response model for multiple-choice items. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 51–65). New York: Springer.

  • Thissen, D., Steinberg, L., & Gerrard, M. (1986). Beyond group mean differences: The concept of item bias. Psychological Bulletin, 99, 118–128.

  • Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 149–169). Hillsdale, NJ: Erlbaum.

  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response theory. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–114). Hillsdale, NJ: Erlbaum.

  • Turner, R. G., & Willerman, L. (1977). Sex differences in WAIS item performance. Journal of Clinical Psychology, 33, 795–798.

  • Vance, B., Hankins, N., & Brown, W. (1988). Ethnic and sex differences on the Test of Nonverbal Intelligence, Quick Test of Intelligence, and Wechsler Intelligence Scale for Children-Revised. Journal of Clinical Psychology, 44, 261–265.

  • Veale, J. R. (1977). A note on the use of chi-square with “correct/incorrect” data to detect culturally biased items (Statistical Research in the Behavioral Sciences, Technical Report No. 4). (Available from J. R. Veale, PO Box 4036, Berkeley, CA 94704.)

  • Welch, C., & Hoover, H. D. (1993). Procedures for extending item bias techniques to polytomously scored items. Applied Measurement in Education, 6, 1–19.

  • Wessel, J., & Potter, A. (1994, March). Analysis of WISC-III data from an urban population of referred children. Paper presented at the annual meeting of the National Association of School Psychologists, Seattle, WA.

  • Wild, C. L., & McPeek, W. M. (1986, August). Performance of the Mantel-Haenszel statistic in identifying differentially functioning items. Paper presented at the annual meeting of the American Psychological Association, Washington, DC.

  • Wilkinson, S. C. (1993). WISC-R profiles of children with superior intellectual ability. Gifted Child Quarterly, 37, 84–92.

  • Zieky, M. (1993). Practical questions in the use of DIF statistics in item development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337–364). Hillsdale, NJ: Erlbaum.

  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.

  • Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessing differential item functioning in performance tasks. Journal of Educational Measurement, 30, 233–251.

Copyright information

© 2003 Springer Science+Business Media New York

About this chapter

Maller, S. J. (2003). Best Practices in Detecting Bias in Nonverbal Tests. In: McCallum, R. S. (Ed.), Handbook of Nonverbal Assessment. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0153-4_2

  • DOI: https://doi.org/10.1007/978-1-4615-0153-4_2

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-4945-7

  • Online ISBN: 978-1-4615-0153-4

  • eBook Packages: Springer Book Archive
