Journal of Anesthesia, Volume 25, Issue 6, pp 839–844

Comparison of sequential organ failure assessment (SOFA) scoring between nurses and residents

  • Nur Baykara
  • Kaan Gökduman
  • Tülay Hoşten
  • Mine Solak
  • Kamil Toker
Original Article



We aimed to evaluate differences in the interobserver reliability and accuracy of sequential organ failure assessment (SOFA) scoring between nurses and residents.


Eight nurses and eight residents independently scored 24 randomly selected patients. Intraclass correlation coefficients (ICCs) for the reliability of total SOFA scoring were calculated. The residents’ and nurses’ SOFA scores were compared with a gold standard to assess accuracy.
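The abstract does not say which ICC model was used. As a minimal sketch, assuming the common two-way random-effects, absolute-agreement, single-measure form (Shrout and Fleiss ICC(2,1)), the reliability of total SOFA scores from a patients × raters matrix could be computed as follows. The function name `icc_2_1` and the input layout are illustrative assumptions, not the authors' analysis code.

```python
from statistics import fmean


def icc_2_1(scores: list[list[float]]) -> float:
    """Single-measure, two-way random, absolute-agreement ICC (assumed ICC(2,1) model).

    scores[i][j] is the total SOFA score that rater j assigned to patient i
    (hypothetical layout chosen for illustration).
    """
    n = len(scores)       # patients (rows)
    k = len(scores[0])    # raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete patients x raters matrix, at least 2 x 2")

    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(scores[i][j] for i in range(n)) for j in range(k)]

    # Sums of squares for patients (rows), raters (columns) and the total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC undefined: the scores have no variance")
    return (ms_rows - ms_error) / denom
```

For example, `icc_2_1([[1, 2], [3, 4], [5, 6]])` returns 8/9 ≈ 0.889: the second rater scores every patient one point higher, so absolute agreement falls below 1 even though both raters rank the patients identically. In the study design, each group's matrix would be 24 patients × 8 raters.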


The overall ICC of the total SOFA score was 0.87 (nurses 0.89, residents 0.86) for a single measurement. Residents tended to assign higher total SOFA scores than nurses did, but the difference was not statistically significant (7.01 ± 4.43 vs. 6.72 ± 4.27, P > 0.05). The mean bias between the nurses' and the gold standard total SOFA scores was −0.16 ± 1.86, with 95% limits of agreement of −3.8 to +3.49. The mean bias between the residents' and the gold standard total SOFA scores was −0.39 ± 1.81, with 95% limits of agreement of −3.95 to +3.16. The percentage of accurate total SOFA scores was 47.4% for nurses and 51% for residents (P > 0.05). Although the difference was not statistically significant, the major error rate (a deviation of ≥2 points from the gold standard score) was higher for nurses than for residents (29.16 vs. 23.43%, P > 0.05). Accuracy in scoring individual organ systems was similar for the two groups; however, nurses had a higher major error rate in the cardiovascular system score.
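The limits of agreement reported above follow the Bland–Altman form, bias ± 1.96 × SD of the paired differences, and the major error rate counts scores deviating from the gold standard by ≥2 points. A minimal sketch of both calculations, assuming paired lists of total SOFA scores (one rater score and one gold standard score per patient); the function names, the gold-minus-rater sign convention, and the 1.96 normal quantile are illustrative assumptions rather than the authors' analysis code.

```python
from statistics import fmean, stdev

Z_95 = 1.96  # normal quantile assumed for 95% limits of agreement


def limits_of_agreement(bias: float, sd: float) -> tuple[float, float]:
    """95% Bland-Altman limits of agreement: bias +/- 1.96 * SD of the differences."""
    return bias - Z_95 * sd, bias + Z_95 * sd


def bland_altman(rater: list[float], gold: list[float]) -> tuple[float, float, float, float]:
    """Mean bias, SD of differences and 95% limits for paired total SOFA scores.

    Differences are taken as gold - rater; this sign convention is an assumption.
    """
    if len(rater) != len(gold) or len(rater) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [g - r for r, g in zip(rater, gold)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    lower, upper = limits_of_agreement(bias, sd)
    return bias, sd, lower, upper


def major_error_rate(rater: list[float], gold: list[float], threshold: int = 2) -> float:
    """Percentage of scores deviating from the gold standard by >= threshold points."""
    if len(rater) != len(gold) or not rater:
        raise ValueError("need equally long, non-empty score lists")
    errors = sum(1 for r, g in zip(rater, gold) if abs(r - g) >= threshold)
    return 100 * errors / len(rater)


# Reported nurse summary values (bias -0.16, SD 1.86)
print([round(x, 2) for x in limits_of_agreement(-0.16, 1.86)])  # [-3.81, 3.49]
```

With the published summary values, the nurses' limits come out at about −3.81 to +3.49 and the residents' (bias −0.39, SD 1.81) at about −3.94 to +3.16, matching the reported intervals to within rounding of the published bias and SD.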


Interobserver reliability was good and mean SOFA scores were not significantly different between nurses and residents. The accuracy of SOFA scoring was moderate for both groups; however, although the difference was not statistically significant, the major error rate was higher for nurses than for residents.


Keywords: SOFA; Nurse; Resident; Interobserver reliability



The authors thank Mrs. Laura Danner for editing the English translation of the manuscript.



Copyright information

© Japanese Society of Anesthesiologists 2011

Authors and Affiliations

  • Nur Baykara (1, 2)
  • Kaan Gökduman (1)
  • Tülay Hoşten (1)
  • Mine Solak (1)
  • Kamil Toker (1)
  1. Department of Anesthesiology and Reanimation, Faculty of Medicine, University of Kocaeli, Kocaeli, Turkey
  2. Istanbul, Turkey
