Journal of General Internal Medicine, Volume 22, Issue 11, pp 1596–1602

Screening for Depression in Medical Settings with the Patient Health Questionnaire (PHQ): A Diagnostic Meta-Analysis

  • Simon Gilbody
  • David Richards
  • Stephen Brealey
  • Catherine Hewitt
Clinical Review


To summarize the psychometric properties of the PHQ2 and PHQ9 as screening instruments for depression.


We identified 17 validation studies conducted in primary care, medical outpatients, and specialist medical services (cardiology, gynecology, stroke, dermatology, head injury, and otolaryngology). We searched electronic databases from 1994 to February 2007 (MEDLINE, PsycLIT, EMBASE, CINAHL, Cochrane registers) and the reference lists of included studies. Translations included US English, Dutch, Italian, Spanish, German, and Arabic. Sensitivity, specificity, likelihood ratios, and diagnostic odds ratios (OR) against a gold standard diagnosis of DSM-IV Major Depressive Disorder (MDD) were calculated for each study. We used random effects bivariate meta-analysis at recommended cut points to produce summary receiver–operator characteristic (sROC) curves. We explored heterogeneity with metaregression.
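The per-study accuracy measures named above all derive from a 2×2 table of screening result against the gold standard diagnosis. The following is a minimal sketch of that arithmetic, not the authors' analysis code. The class name `TwoByTwo`, the function `accuracy_measures`, and the example counts are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one validation study (hypothetical container)."""
    tp: int  # screen positive, MDD present
    fp: int  # screen positive, MDD absent
    fn: int  # screen negative, MDD present
    tn: int  # screen negative, MDD absent


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, likelihood ratios and diagnostic OR.

    Assumes no cell is zero; studies with empty cells would need a
    continuity correction, which this sketch does not apply.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    diagnostic_or = lr_positive / lr_negative  # equals (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "diagnostic_or": diagnostic_or,
    }


# Hypothetical counts for a single study; these are not data from the review.
example = TwoByTwo(tp=80, fp=72, fn=20, tn=828)
measures = accuracy_measures(example)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

The bivariate random effects pooling and metaregression described in the text are separate modelling steps and are not reproduced here.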

Measurements and Main Results

Fourteen studies (5,026 participants) validated the PHQ9 against MDD: sensitivity = 0.80 (95% CI 0.71–0.87); specificity = 0.92 (95% CI 0.88–0.95); positive likelihood ratio = 10.12 (95% CI 6.52–15.67); negative likelihood ratio = 0.22 (95% CI 0.15–0.32). There was substantial heterogeneity (Diagnostic Odds Ratio heterogeneity I² = 82%), which was not explained by study setting (primary care versus general hospital); method of scoring (cutoff ≥ 10 versus “diagnostic algorithm”); or study quality (blinded versus unblinded). The PHQ2 was validated in only 3 studies and showed wide variability in sensitivity.
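To show how the pooled likelihood ratios translate into clinical terms, the sketch below converts a pre-test probability of MDD into a post-test probability using the odds form of Bayes' theorem. The likelihood ratios are the point estimates reported above; the 10% prevalence and the function name `post_test_probability` are assumptions made for illustration and do not come from the review.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability.

    Uses post-test odds = pre-test odds * likelihood ratio.
    """
    pre_test_odds = pre_test_prob / (1 - pre_test_prob)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Pooled PHQ9 point estimates reported in the abstract.
LR_POSITIVE = 10.12
LR_NEGATIVE = 0.22

# Hypothetical prevalence of MDD in the screened population (an assumption).
ASSUMED_PREVALENCE = 0.10

p_after_positive = post_test_probability(ASSUMED_PREVALENCE, LR_POSITIVE)
p_after_negative = post_test_probability(ASSUMED_PREVALENCE, LR_NEGATIVE)

print(f"Probability of MDD after a positive screen: {p_after_positive:.3f}")
print(f"Probability of MDD after a negative screen: {p_after_negative:.3f}")
```

Under the assumed 10% prevalence, a positive screen raises the probability of MDD to roughly one in two, and a negative screen lowers it to a few percent. Different prevalence values would give different results, and the confidence intervals around the likelihood ratios are ignored in this sketch.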


The PHQ9 is acceptable, and as good as longer clinician-administered instruments in a range of settings, countries, and populations. More research is needed to validate the PHQ2 to see if its diagnostic properties approach those of the PHQ9.

Key words

depression; screening; questionnaire; psychometrics



We are grateful to Dr Peter Bower for comments on an earlier draft of the manuscript. We also thank the authors who provided unpublished data and answered queries about study design. There was no external or internal funding for this project.

Conflict of interest

None disclosed.



Copyright information

© Society of General Internal Medicine 2007

Authors and Affiliations

  • Simon Gilbody (1)
  • David Richards (1)
  • Stephen Brealey (1)
  • Catherine Hewitt (1)
  1. Department of Health Sciences, University of York, York, UK
