Abbott, L. C. (1983). A study of humanism in family practice. Journal of Family Practice,
Anderson, L. A., & Dedrick, R. F. (1990). Development of the trust in physician scale: A measure to assess interpersonal trust in patient-physician relationships. Psychological Reports,
Arnold, E. L., Blank, L. L., Race, K. E., & Cipparrone, N. (1998). Can professionalism be measured? The development of a scale for use in the medical environment. Academic Medicine,
Baker, R. (1990). Development of a questionnaire to assess patients’ satisfaction with consultations in general practice. British Journal of General Practice,
Balzer, W. K., & Sulsky, L. M. (1992). Halo and performance appraisal research: A critical examination. Journal of Applied Psychology,
Brennan, R. L. (2001a). Generalizability theory. New York: Springer.
Brennan, R. L. (2001b). An essay on the history and future of reliability from the perspective of replications. Journal of Educational Measurement,
Butterfield, P. S., Mazzaferri, E. L., & Sachs, L. A. (1987). Nurses as evaluators of humanistic behavior in internal medicine residents. Journal of Medical Education,
Campbell, D. T., & Fiske, D. W. (1959). Convergent and divergent validation by the multitrait-multimethod matrix. Psychological Bulletin,
Clauser, B. E., Margolis, M. J., & Swanson, D. B. (2008). Issues of validity and reliability for assessments in medical education. In E. Holmboe & R. Hawkins (Eds.), Practical guide to the evaluation of clinical competence (pp. 10–23). Amsterdam: Elsevier.
Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items (ITEMS Module). Educational Measurement: Issues and Practice,
Cronbach, L. J. (1980). Validity on parole: How can we go straight? New directions for testing and measurement: Measuring achievement over a decade. Proceedings of the 1979 ETS Invitational Conference (pp. 99–108). San Francisco: Jossey-Bass.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin,
Cruess, R. L., & Cruess, S. R. (2006). Teaching professionalism: General principles. Medical Teacher,
Cruess, R., McIlroy, J. H., Cruess, S., Ginsburg, S., & Steinert, Y. (2006). The professionalism mini-evaluation exercise: A preliminary investigation. Academic Medicine (RIME Supplement),
Dannefer, E. F., Henson, L. C., Bierer, S. B., Grady-Weliky, T. A., Meldrum, S., Nofziger, A. C., et al. (2005). Peer assessment of professional competence. Medical Education,
Eagly, A. H., Ashmore, R. D., Makhijani, M. G., & Longo, L. C. (1991). What is beautiful is good, but…: A meta-analytic review of research on the physical attractiveness stereotype. Psychological Bulletin,
Engelhard, G. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement,
Flanagan, J. C. (1948). The aviation psychology program in the Army Air Forces. Washington: US Government Printing Office.
Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement,
Ginsburg, S., Regehr, G., & Mylopoulos, M. (2007). Reasoning when it counts: Students’ rationales for action on a professionalism exam. Academic Medicine (RIME Supplement),
Ginsburg, S., Regehr, G., & Mylopoulos, M. (2009). From behaviours to attributions: Further concerns regarding the evaluation of professionalism. Medical Education,
Gulliksen, H. (1950). Theory of mental tests
. New York: Wiley.CrossRef
Hambleton, R. K., Swaminathan, H., & Jane Rogers, H. (1991). MMSS fundamentals of item response theory. Newbury Park: Sage.
Jacobs, R., & Kozlowski, S. W. J. (1985). A closer look at halo error in performance ratings. The Academy of Management Journal,
Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport: American Council on Education/Praeger.
Lipner, R. S., Blank, L. L., Leas, B. F., & Fortina, G. S. (2002). The value of patient and peer ratings in recertification. Academic Medicine,
Margolis, M. J., Clauser, B. E., Cuddy, M. M., Ciccone, A., Mee, J., Harik, P., et al. (2006). Use of the Mini-CEX to rate examinee performance on a multiple-station clinical skills examination: A validity study. Academic Medicine (RIME Supplement),
Mazor, K., Canavan, C., Farrell, M., Margolis, M. J., & Clauser, B. E. (2008). Collecting validity evidence for an assessment of professionalism: Findings from think-aloud interviews. Academic Medicine,
Mazor, K. M., Margolis, M. J., Holtman, M., & Clauser, B. E. (2007). Evaluation of missing data in an assessment of professional behaviors. Academic Medicine (RIME Supplement),
Mazor, K. M., Ockeme, J. K., & Rogers, H. J. (2005). The relationship between checklist scores on a communications OSCE and analogue patients’ perceptions of communications. Advances in Health Science Education,
McCall, G. J. (1984). Systematic field observation. Annual Review of Sociology,
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education, MacMillan Publishing Co.
Murphy, K. R., Jako, R. A., & Anhalt, R. L. (1993). Nature and consequences of halo error: A critical analysis. Journal of Applied Psychology,
Papadakis, M. A., Arnold, G. K., Blank, L. L., Holmboe, E. S., & Lipner, R. S. (2008). Performance during internal medicine residency training and subsequent disciplinary action by state licensing boards. Annals of Internal Medicine,
Papadakis, M., & Loeser, H. (2006). Using critical incident reports and longitudinal observations to assess professionalism. In D. T. Stern (Ed.), Measuring medical professionalism (pp. 159–173). New York: Oxford University Press.
Ram, P., van der Vleuten, C., Rethans, J. J., Grol, R., & Aretz, K. (1999). Assessment of practicing family physicians: Comparison of observation in a multiple-station examination using standardized patients with observation of consultations in daily practice. Academic Medicine,
Singer, P. A., Cohen, R., Robb, A., & Rothhan, A. I. (1993). The ethics of objective structured clinical examination. Journal of General Internal Medicine,
Spearman, C. (1910). Correlation calculated with faulty data. British Journal of Psychology,
Stern, D. T. (1996). Values on call: A method for assessing the teaching of professionalism. Academic Medicine,
Stern, D. T. (2006). Measuring medical professionalism. New York: Oxford University Press.
Stern, D. T., Frohna, A. Z., & Gruppen, L. D. (2005). The prediction of professional behavior. Medical Education,
Thorndike, E. L. (1920). A constant error in psychological ratings. Journal of Applied Psychology,
Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (2005). Assessing professional competence: From methods to programmes. Medical Education,
Veloski, J. J., Fields, S. K., Boex, J. R., Blank, L. L. (2005). Measuring professionalism: A review of studies with instruments reported in the literature between 1982 and 2002. Academic Medicine, 80
Violato, C., Lockyer, J., & Fidler, H. (2003). Multisource feedback: a method of assessing surgical practice. British Medical Journal,