Objective
To summarize the psychometric properties of the PHQ2 and PHQ9 as screening instruments for depression.
Interventions
We identified 17 validation studies conducted in primary care, medical outpatient clinics, and specialist medical services (cardiology, gynecology, stroke, dermatology, head injury, and otolaryngology). We searched electronic databases from 1994 to February 2007 (MEDLINE, PsycLIT, EMBASE, CINAHL, Cochrane registers) and the reference lists of included studies. Translations included US English, Dutch, Italian, Spanish, German, and Arabic. Summary sensitivity, specificity, likelihood ratios, and diagnostic odds ratios (OR) against a gold standard diagnosis of DSM-IV Major Depressive Disorder (MDD) were calculated for each study. We used random effects bivariate meta-analysis at recommended cut points to produce summary receiver–operator characteristic (sROC) curves. We explored heterogeneity with metaregression.
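The per-study accuracy measures named above follow from a 2×2 table of screening result against the gold standard diagnosis. The sketch below is illustrative only: the `TwoByTwo` class, the `accuracy_measures` function, and the counts are hypothetical and are not taken from any included study. It also does not handle zero cells, which in practice usually call for a continuity correction.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one validation study (hypothetical structure)."""
    tp: int  # screen positive, MDD present
    fp: int  # screen positive, MDD absent
    fn: int  # screen negative, MDD present
    tn: int  # screen negative, MDD absent


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return standard diagnostic accuracy measures for one 2x2 table.

    Assumes no zero cells; a continuity correction would be needed otherwise.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = (t.tp * t.tn) / (t.fp * t.fn)        # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts chosen so that sensitivity = 0.80 and specificity = 0.92.
example = TwoByTwo(tp=80, fp=72, fn=20, tn=828)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

The diagnostic odds ratio equals the positive likelihood ratio divided by the negative likelihood ratio, which is why it can serve as a single summary of test performance.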
Measurements and Main Results
Fourteen studies (5,026 participants) validated the PHQ9 against MDD: sensitivity = 0.80 (95% CI 0.71–0.87); specificity = 0.92 (95% CI 0.88–0.95); positive likelihood ratio = 10.12 (95% CI 6.52–15.67); negative likelihood ratio = 0.22 (95% CI 0.15–0.32). There was substantial heterogeneity (diagnostic odds ratio heterogeneity I² = 82%), which was not explained by study setting (primary care versus general hospital); method of scoring (cutoff ≥ 10 versus “diagnostic algorithm”); or study quality (blinded versus unblinded). The PHQ2 was validated in only 3 studies and showed wide variability in sensitivity.
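One way to read the pooled likelihood ratios is to convert a pre-test probability of MDD into a post-test probability through odds. The sketch below assumes a pre-test probability of 10%, which is an illustrative value and not a figure from this review; the function name `post_test_probability` is likewise hypothetical.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pooled PHQ9 likelihood ratios reported above.
LR_POSITIVE = 10.12
LR_NEGATIVE = 0.22

# Assumed pre-test probability of MDD (illustrative, not from the review).
ASSUMED_PRE_TEST = 0.10

after_positive = post_test_probability(ASSUMED_PRE_TEST, LR_POSITIVE)
after_negative = post_test_probability(ASSUMED_PRE_TEST, LR_NEGATIVE)

print(f"Post-test probability after a positive PHQ9: {after_positive:.3f}")
print(f"Post-test probability after a negative PHQ9: {after_negative:.3f}")
```

Under that assumed 10% pre-test probability, a positive screen raises the probability of MDD to roughly 53% and a negative screen lowers it to roughly 2%. Different pre-test probabilities give different results, and the substantial heterogeneity reported above means the pooled ratios may not transfer to every setting.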
Conclusions
The PHQ9 is acceptable, and as good as longer clinician-administered instruments in a range of settings, countries, and populations. More research is needed to validate the PHQ2 to see if its diagnostic properties approach those of the PHQ9.
Acknowledgments
We are grateful to Dr Peter Bower for comments on an earlier draft of the manuscript. We also thank authors for providing unpublished data, and answering queries about study design. There is no external or internal funding for this project.
Conflict of interest
None disclosed.
Additional information
SG had the original idea for this meta-analysis, produced the protocol, extracted data, undertook all analyses, and produced the initial and final drafts. DR, CH and SB extracted data and commented on all drafts of the paper.
Cite this article
Gilbody, S., Richards, D., Brealey, S. et al. Screening for Depression in Medical Settings with the Patient Health Questionnaire (PHQ): A Diagnostic Meta-Analysis. J GEN INTERN MED 22, 1596–1602 (2007). https://doi.org/10.1007/s11606-007-0333-y
DOI: https://doi.org/10.1007/s11606-007-0333-y
Key words
- depression
- screening
- questionnaire
- psychometrics