Issues to Consider When Evaluating “Tests”



Proper measurement of various client characteristics is an essential component of the financial planning process. In this chapter, we present an overview of how to evaluate the quality of a “test” that one may be using or considering, including the meaning of the important underlying concepts and the techniques that form the basis for assessing quality. We give the reader the tools for making this evaluation in terms of commonly accepted criteria, as reported in the Standards for Educational and Psychological Testing, a document produced jointly by the American Educational Research Association (AERA), the National Council on Measurement in Education (NCME), and the American Psychological Association (APA) (Joint Committee, 1985). Conceptually similar guidelines for test development and usage are published by the International Test Commission (ITC), a multinational association of test developers, users, and the agencies charged with oversight of proper test use (see
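Among the concepts the chapter covers, internal-consistency reliability is commonly summarized by coefficient alpha (Cronbach, 1951), one of the keyword topics below. As an illustrative sketch only (the function name and data layout are assumptions, not the chapter's own material), alpha can be computed directly from a respondents-by-items table of raw scores:

```python
def cronbach_alpha(data):
    """Cronbach's (1951) coefficient alpha.

    data: list of respondents, each a list of k item scores.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(data[0])          # number of items
    n = len(data)             # number of respondents

    def var(xs):              # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in data]) for j in range(k)]
    total_var = var([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

For example, when every item ranks respondents identically, alpha reaches its maximum of 1.0, while weakly related items pull it down toward (and possibly below) zero.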


Keywords: Positive Predictive Value · Negative Predictive Value · Item Response Theory · Reliability Coefficient · American Psychological Association
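Two of the keyword concepts, positive and negative predictive value, together with the related index for rating diagnostic tests (Youden, 1950), follow directly from the four cells of a 2×2 classification table. A minimal sketch under that assumption (the function name is hypothetical, not from the chapter):

```python
def diagnostic_summary(tp, fp, tn, fn):
    """Summarize a 2x2 diagnostic table.

    tp, fp, tn, fn: counts of true/false positives and negatives.
    """
    ppv = tp / (tp + fp)          # P(condition present | test positive)
    npv = tn / (tn + fn)          # P(condition absent | test negative)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    youden_j = sensitivity + specificity - 1  # Youden's (1950) J index
    return {"ppv": ppv, "npv": npv, "youden_j": youden_j}
```

Note that, unlike sensitivity and specificity, PPV and NPV depend on the base rate of the condition in the group tested, which is why they appear as separate keyword concepts.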


  1. American Psychological Association, American Educational Research Association, and the National Council on Measurement in Education (Joint Committee) (1985). Standards for educational and psychological testing. Washington, DC: APA.
  2. Batjelsmit, J. (1977). Reliability and validity of psychometric measures. In R. Andrulis (Ed.), Adult assessment (pp. 29–44). New York: Thomas.
  3. Caspi, A., Roberts, B. W., & Shiner, R. L. (2005). Personality development: Stability and change. Annual Review of Psychology, 56, 453–484.
  4. Cattell, R. B. (1986). The 16PF personality structure and Dr. Eysenck. Journal of Social Behavior and Personality, 1, 153–160.
  5. Cronbach, L. (1951). Coefficient alpha and the internal consistency of tests. Psychometrika, 16(3), 297–334.
  6. DeVellis, R. F. (2003). Scale development: Theory and applications (2nd ed.). Thousand Oaks, CA: Sage.
  7. Greiner, M., Pfeiffer, D., & Smith, R. D. (2000). Principles and practical application of the receiver operating characteristic analysis for diagnostic tests. Preventive Veterinary Medicine, 45, 23–41.
  8. Hinkin, T. R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21, 967–988.
  9. Hogan, T. P., & Agnello, J. (2004). An empirical study of reporting practices concerning measurement validity. Educational and Psychological Measurement, 64, 802–812.
  10. Nunnally, J. (1967). Psychometric theory. New York: McGraw Hill.
  11. Reschly, D. (1987). Learning characteristics of mildly handicapped students: Implications for classification, placement, and programming. In M. C. Wang, M. C. Reynolds, & H. J. Walberg (Eds.), The handbook of special education: Research and practice (pp. 35–58). Oxford: Pergamon Press.
  12. Roszkowski, M. J., & Cordell, D. M. (2009). A longitudinal perspective on financial risk tolerance: Rank-order and mean level stability. International Journal of Behavioural Accounting and Finance, 1, 111–134.
  13. Rudner, L. M. (1994). Questions to ask when evaluating tests. Practical Assessment, Research & Evaluation, 4(2). Retrieved February 18, 2010 from
  14. Saad, S., Carter, G. W., Rothenberg, M., & Israelson, E. (1999). Testing and assessment: An employer’s guide to good practices. Washington, DC: U.S. Department of Labor Employment and Training Administration.
  15. Spreat, S., & Connelly, L. (1996). A reliability analysis of the Motivation Assessment Scale. American Journal on Mental Retardation, 100(5), 528–532.
  16. Stanley, J. (1971). Reliability. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
  17. Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 667–680.
  18. Thorndike, R., & Hagen, E. (1961). Measurement and evaluation in psychology and education. New York: John Wiley and Sons.
  19. Weber, E. U., Blais, A.-R., & Betz, N. E. (2002). A domain-specific risk-attitude scale: Measuring risk perceptions and risk behaviors. Journal of Behavioral Decision Making, 15, 263–290.
  20. Weber, E. U., & Milliman, R. A. (1997). Perceived risk attitudes: Relating perception to risky choice. Management Science, 43, 123–144.
  21. Weber, E. U., Shafir, S., & Blais, A.-R. (2004). Predicting risk-sensitivity in humans and lower animals: Risk as variance or coefficient of variation. Psychological Review, 111, 430–445.
  22. Woehr, D. J., & Huffcutt, A. L. (1994). Rater training for performance appraisal: A quantitative review. Journal of Occupational & Organizational Psychology, 4, 189–216.
  23. Yao, R., Gutter, M. S., & Hanna, S. D. (2005). The financial risk tolerance of Blacks, Hispanics and Whites. Financial Counseling and Planning, 16, 51–62.
  24. Youden, W. J. (1950). Index for rating diagnostic tests. Cancer, 3, 32–35.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Institutional Research, La Salle University, Philadelphia, USA
  2. Behavioral Health at Woods Services, Inc., Langhorne, USA
