Clinical assessment of neurological status is a vital element in decision making, outcome prediction, and information sharing among medical professionals. Traditionally, the Glasgow Coma Scale (GCS) has been widely adopted to document and formally assess neurological status. This scale has been praised for its simplicity and ease of use among healthcare workers. However, a shortcoming of the GCS is its inaccuracy in certain patient populations, including those with severe neurological impairment. This population may include intubated patients, who are difficult to assess with the GCS because they cannot communicate verbally. Similarly, alterations of brainstem function and respiratory pattern are important clinical factors reflecting severity of impairment, which the GCS neither addresses nor attempts to quantify.

In 2005, Wijdicks et al. [1] devised a new coma score, the Full Outline of UnResponsiveness (FOUR) score, which addressed the pitfalls of the GCS. The benefit that the FOUR score has over preexisting systems is the inclusion of specific categories for eye movement, motor exam, brainstem reflexes, and respiratory pattern. Thus, the FOUR score provides a structured scoring system for aspects of brainstem function that can be assessed in all patients, including those unable to verbally communicate.
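The structure described above can be sketched as a small data type. This is a minimal illustration, not the authors' instrument: the class and field names are hypothetical, and the 0 to 4 range for each component (giving a total of 0 to 16) is taken from the original FOUR score description rather than from this section, which refers to Table 1 without reproducing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourScore:
    """Illustrative container for the four FOUR score components.

    Assumption: each component is graded 0-4, as in the original
    FOUR score description; the names used here are hypothetical.
    """
    eye: int
    motor: int
    brainstem: int
    respiration: int

    def __post_init__(self):
        # Reject values outside the assumed 0-4 range for every component.
        for name in ("eye", "motor", "brainstem", "respiration"):
            value = getattr(self, name)
            if not isinstance(value, int) or not 0 <= value <= 4:
                raise ValueError(
                    f"{name} must be an integer from 0 to 4, got {value!r}"
                )

    @property
    def total(self) -> int:
        # The total is the plain sum of the four component grades (0-16).
        return self.eye + self.motor + self.brainstem + self.respiration


example = FourScore(eye=3, motor=2, brainstem=4, respiration=1)
print(example.total)  # 10
```

Because no component depends on speech, every field can be filled in for an intubated patient, which is the property the paragraph above emphasizes.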

Since its inception, the FOUR score has been studied in a variety of settings and patient populations. Our goal was to perform a scoping systematic review of the existing literature on the application of the FOUR score within critically ill patients and its use in outcome prediction.


A systematic review using the methodology outlined in the Cochrane Handbook for Systematic Reviews of Interventions [2] was conducted. Data are reported following the PRISMA guidelines [3]. The PRISMA checklist is found in the supplementary material as “Appendix A.” The search strategy was decided upon by the primary author (AA) and supervisor (FAZ).

Search Question, Population, Inclusion, and Exclusion Criteria

We aimed to answer the question: What literature is available for FOUR score and outcome prediction in critically ill patients? The primary outcome of interest was patient global outcome, as assessed by any of: mortality, modified Rankin Scale (mRS), Glasgow Outcome Scale (GOS), or any other functional or neuropsychiatric outcome.

Studies documenting interobserver variability were also included in order to provide context to the reliability of the FOUR score system.

Inclusion criteria were: humans, adults, and children; prospective randomized controlled trial, prospective cohort, cohort/control, case series, prospective and retrospective studies. Non-English studies and those involving animals were excluded. Ultimately, studies on pediatric populations were excluded as these results will be reported in a separate publication. The FOUR score was used as described in the original validation study by Wijdicks et al. [1] (Table 1).

Table 1 Neurological grading scales

Search Strategy

Six databases were searched from inception to September 2017: MEDLINE, BIOSIS, Scopus, Cochrane Libraries, Globalhealth, and Embase. Published meeting proceedings were included in the search. Following study selection, reference sections of each paper were examined to ensure relevant papers not captured by the initial search were included in the review. “Appendix B” of the supplementary materials highlights the search strategy implemented for each database.

Study Selection

A two-step review was performed. Two reviewers independently screened each title and abstract from the initial search for inclusion. Full texts for citations passing this initial screen were obtained. Inclusion and exclusion criteria were then applied to each article to obtain the final articles for review. In cases of disagreement between the two reviewers, the study in question was reviewed in detail and discussed openly until consensus was reached.

Data Collection

Data were extracted from the final list of articles and stored electronically. The primary author organized data from adult populations into the following categories based on patient pathology and setting: patients in the emergency department, medicine and general critical illness patients, traumatic brain injury, intraventricular/intracerebral hemorrhage, subarachnoid hemorrhage, ischemic stroke, and general neurology/neurosurgery patients. Data extracted included study country, design, objectives, outcomes, and conclusions made by the study authors. Data on interobserver reliability, if assessed, were also included, as was any information on the prognostic ability of the FOUR score.

Quality of Evidence Assessment

Each study was evaluated for quality of evidence using the RTI Item Bank on Risk of Bias and Precision of Observational Studies [4]. This validated item bank is applicable to a variety of observational study designs and evaluates the risk of bias and internal validity of studies using a comprehensive list of itemized questions. “Appendix C” of the supplementary materials provides the tabulated results of the bias assessment for each study included in this scoping review.

Statistical Analysis

A meta-analysis was not performed due to the heterogeneity of data and study design within the studies; thus, a scoping review was performed.


The initial search yielded 1709 citations. Of 55 articles selected for final review, 49 were based on adult populations and are included in the results of this paper. Sixteen of these articles studied general medical and critical illness populations, 6 articles studied patients in an emergency department setting, 10 articles studied patients with traumatic brain injury, 3 articles studied patients with intracerebral or intraventricular hemorrhage, 1 article studied patients with subarachnoid hemorrhage, 2 articles studied patients with ischemic stroke, and 11 articles studied general neurology and neurosurgery patients. Forty-one of these studies were conducted prospectively; the remainder were retrospective. No randomized controlled trials were identified. A total of 9092 adult patients were studied. Figure 1 displays the PRISMA [3] flow diagram of the search results and filtering processes.

Fig. 1

Flow diagram of study selection
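The study counts reported above can be checked with simple arithmetic. The sketch below only restates the numbers given in the text; the variable names are illustrative.

```python
# Number of included adult studies per population, as reported in the text.
studies_by_population = {
    "general medical and critical illness": 16,
    "emergency department": 6,
    "traumatic brain injury": 10,
    "intracerebral or intraventricular hemorrhage": 3,
    "subarachnoid hemorrhage": 1,
    "ischemic stroke": 2,
    "general neurology and neurosurgery": 11,
}

total_adult = sum(studies_by_population.values())

selected_for_final_review = 55
prospective = 41

# Articles selected for final review but not based on adult populations.
non_adult = selected_for_final_review - total_adult
# Adult studies that were not prospective.
retrospective = total_adult - prospective

print(total_adult)    # 49
print(non_adult)      # 6
print(retrospective)  # 8
```

The category counts sum to the 49 adult studies stated in the text, leaving 8 retrospective studies and 6 articles outside the adult analysis.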

Interobserver Reliability

Fourteen studies [1, 5,6,7,8,9,10,11,12,13,14,15,16,17] demonstrated good to excellent interobserver reliability of the FOUR score among raters. In general, a kappa value of 0.4 or less is considered poor, values of 0.4–0.6 fair to moderate, values of 0.6–0.8 good, and values above 0.8 excellent. The lowest weighted kappa found in the literature for the FOUR score was 0.68 [10], with the majority being at least 0.80. Three of the 14 studies were done on patients in the emergency department [5,6,7], 6 on general medical and critical illness patients [8,9,10,11,12,13], and 5 on general neurology and neurosurgical patients [1, 14,15,16,17]. No study of interobserver reliability failed to demonstrate at least good reliability. “Appendix D” of the supplementary materials provides the tabulated results from those studies assessing interobserver reliability.
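The kappa thresholds quoted above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and because the text gives overlapping endpoints (0.4, 0.6, 0.8), the code assumes each boundary value belongs to the lower band.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the agreement bands quoted in the text.

    Assumption: boundary values (0.4, 0.6, 0.8) fall in the lower band,
    since the text does not say which band they belong to.
    """
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1.0")
    if kappa <= 0.4:
        return "poor"
    if kappa <= 0.6:
        return "fair to moderate"
    if kappa <= 0.8:
        return "good"
    return "excellent"


# The lowest weighted kappa reported for the FOUR score was 0.68.
print(interpret_kappa(0.68))  # good
```

Under these bands, the lowest reported value of 0.68 still falls in the "good" range, which is consistent with the statement that no study showed less than good reliability.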

Prognostic Value When Used Alone

Nine studies [18,19,20,21,22,23,24,25] demonstrated prognostic value of the FOUR score in predicting mortality and functional outcomes. Four were based on general medicine and critical illness populations [18,19,20,21], whereas 5 were based on neurology and neurosurgery populations (including 2 on intracerebral hemorrhage patients [23, 24] and 2 on traumatic brain injury patients [25]).

In a neurological intensive care unit, Akavipat et al. [22] demonstrated the predictive value of the FOUR score in predicting poor outcome at discharge (AUC ROC = 0.88, 95% CI 0.82–0.92) and in-hospital mortality (AUC ROC = 0.92, 95% CI 0.87–0.97). In medicine patients, Rohaut et al. [20] demonstrated the predictive value of the FOUR score in predicting 28-day mortality (c-index of 0.76, 95% CI 0.67–0.84). Other outcomes studied include admission to an intensive care unit [23], overt hepatic encephalopathy [18] and discharge to home or a rehabilitation facility [19]. “Appendix E” of the supplementary materials displays the tabulated results from these studies.

One study [26] examined the use of various weaning parameters (including the FOUR score) in predicting extubation failure in general neurology and neurosurgical patients. The authors found no significant difference in FOUR score between patients who failed extubation and those who were successfully extubated.

Prognostic Value When Compared to the GCS

Thirty-two studies [1, 6, 7, 9, 11, 12, 16, 25, 27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51] demonstrated equivalency or superiority of the FOUR score compared to GCS in the prediction of mortality and functional outcomes. Four of these studied patients in the emergency department [6, 7, 27, 30], 8 studied general medical and critical illness patients [9, 11, 12, 29, 31, 45, 48, 49], 11 studied traumatic brain injury patients [25, 34,35,36, 40,41,42,43,44,45, 47], and 10 studied other neurology/neurosurgery patients (6 studied general neurology and neurosurgical patients [16, 28, 33, 37, 49, 50], 2 studied ischemic stroke patients [38, 39], 1 studied intraventricular hemorrhage patients [32] and 1 studied aneurysmal subarachnoid hemorrhage patients [51]).

Table 2 displays the studies focusing on emergency department populations. Multiple authors demonstrated equal or superior prognostic value of the FOUR score in predicting mortality; for example, Eken et al. [30] showed AUC ROC = 0.788 for FOUR (95% CI 0.722–0.844) and AUC ROC = 0.735 for GCS (95% CI 0.655–0.797) in predicting in-hospital mortality (p = 0.0001). Similarly, Stead et al. [6] demonstrated OR = 0.67 for FOUR (95% CI 0.53–0.84) versus OR = 0.68 for GCS (95% CI 0.56–0.83) in predicting in-hospital mortality (p < 0.001).
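For readers less familiar with the AUC ROC values quoted above, the following sketch shows one standard way to compute the statistic: as the proportion of (died, survived) patient pairs in which the patient who died had the lower score, with ties counted as half. The function name is hypothetical, and the scores and outcomes are made-up toy values for illustration only, not data from any study in this review.

```python
def auc_low_score_predicts_event(scores, events):
    """AUC ROC for a scale on which LOWER scores indicate worse status.

    Computed as the fraction of (event, non-event) pairs in which the
    patient with the event has the lower score; ties count as 0.5.
    """
    with_event = [s for s, e in zip(scores, events) if e]
    without_event = [s for s, e in zip(scores, events) if not e]
    if not with_event or not without_event:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for p in with_event:
        for n in without_event:
            if p < n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(with_event) * len(without_event))


# Hypothetical toy data: total scores and in-hospital death (1 = died).
toy_scores = [2, 5, 9, 12, 14, 16]
toy_died = [1, 1, 0, 1, 0, 0]

auc = auc_low_score_predicts_event(toy_scores, toy_died)
print(round(auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the FOUR and GCS values above are being compared.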

Table 2 Studies on patients in the emergency department examining the prognostic value of FOUR Score versus GCS

Table 3 displays the studies on general medical/critically ill patients. Outcomes studied in this population include mortality in the intensive care unit (ICU) [49], in hospital [6, 7, 27, 31], and at 28 days [45]; successful extubation [45]; the ability to become a potential organ donor [29]; and other functional outcomes based on the GOS, modified Rankin Scale, and Glasgow–Pittsburgh cerebral performance categories [9, 12, 45, 48]. All demonstrated equivalency or superiority of the FOUR score; for example, Wijdicks et al. [49] demonstrated AUC ROC = 0.742 (95% CI 0.694–0.790) for FOUR and AUC ROC = 0.715 for GCS (95% CI 0.663–0.768) in predicting in-ICU mortality (p = 0.001).

Table 3 Studies on medicine and general critical illness patients examining the prognostic value of FOUR score versus GCS

Table 4 outlines the studies on traumatic brain injury patients, while Table 5 highlights the other studies on neurology/neurosurgery patient populations. Values for AUC ROC were similar across studies in predicting in-hospital mortality; generally AUC ROC ≥ 0.80 [1, 34, 39, 40, 43, 44, 47]. Mortality was studied at various other time points [35, 46], along with functional outcome based on the GOS, the Karnofsky Performance Score, the Acute Physiology and Chronic Health Evaluation II score, and the modified Rankin Scale [34, 36, 40, 41, 43, 44]. Again, all illustrate the equivalent or superior ability of the FOUR score to predict mortality and functional outcomes when compared to GCS.

Table 4 Studies on traumatic brain injury patients examining the prognostic value of the FOUR score versus GCS
Table 5 Studies on other neurology and neurosurgical patients examining the prognostic value of the FOUR score versus GCS

One study [52] conducted in post-resuscitation encephalopathy patients studied the motor components of both the FOUR score and GCS to predict poor prognosis, and found a lower sensitivity of the FOUR score in outcome prediction (68.7% sensitivity for FOUR, 95% CI 41.4–88.9 vs. 87.5% sensitivity for GCS, 95% CI 61.6–92.6).

Quality of Evidence

Quality of evidence was assessed using the RTI Item Bank on Risk of Bias and Precision of Observational Studies [4]. Based on its itemized list of questions, there was an overall low risk of bias in the studies included in this review.


We aimed to perform a scoping review of the FOUR score and its use in outcome prediction. The existing literature around the FOUR score generally demonstrates that it possesses prognostic value alone and in comparison with the GCS, as exemplified through 9 and 32 mainly prospective studies, respectively.

In predicting extubation failure, however, Ko et al. [26] failed to show predictive value for the FOUR score as well as all other weaning parameters they chose to study, including rapid shallow breathing index and spontaneous breathing trial. In neurology and neurosurgical patients, the ability to forcefully cough and actively clear secretions is of importance in successful extubation, and perhaps not specifically assessed by the FOUR score. However, the authors also had missing data regarding etiology of respiratory failure and inaccurate fluid balance, which may have contributed to their negative results. In contrast, Said et al. [45] published a pilot study among a general ICU population, and did show superiority of the FOUR score compared to GCS in predicting successful extubation at 14 days post-intubation.

In comatose patients post cardiopulmonary arrest, Topcuoglu et al. [52] examined the motor components of the GCS and FOUR score in outcome prediction and showed a lower sensitivity of the FOUR motor component compared to the GCS motor component. When either score was combined with specific magnetic resonance imaging (MRI) findings, sensitivity improved to 100%. It is important to note that in this study the authors focused primarily on MRI findings, and specific details regarding how and when the FOUR score was measured, as well as the presence of potentially confounding factors (for example, sedating or paralyzing medications), are unclear.

Taking into consideration the shortcomings of these two studies, overall, the remainder of the present literature around the FOUR score displays its usefulness as a neurological assessment tool. While its accuracy in predicting extubation failure is not clearly established—again, perhaps as a result of the limitations described previously—multiple studies have shown it to hold prognostic significance in predicting mortality and functional outcomes in diverse patient populations, from general medicine patients to those with neurosurgical pathology.

Accurate neurological assessment is made imperfect by the presence of mental alteration caused by sedating medications, endotracheal intubation, and language barriers including patient dysphasia or dysarthria. These conditions make application of the GCS verbal score especially difficult, as it relies heavily on accurate comprehension, orientation, and the ability of the patient to verbally respond to the rater. Raters will make adjustments to the GCS to account for such confounding factors, especially endotracheal intubation, but these adjustments are not standardized across institutions. The FOUR score bypasses this by not including a verbal score based on orientation or the ability of a patient to respond; a degree of comprehension is required only to obey simple motor commands. In patients with receptive aphasia, application of both the FOUR score and GCS may be difficult because the patient cannot process motor commands; however, the FOUR score subcategories of eye, motor, brainstem, and respiratory function can be quantified in a greater number of patients than the GCS, including those who are intubated or who have expressive language impairment only.

In comparison to the GCS, the FOUR score may also be helpful in further subcategorizing patients with severe neurological impairment based on their brainstem function and respiratory pattern, which the GCS is unable to do. This has the potential to better stratify patients with severe neurological injury and provide clinicians with further information regarding overall prognosis. These advantages, combined with its good inter-rater reliability, give the FOUR score the potential to replace conventional scoring systems and allow for precise and consistent neurological assessments among health care providers.


Despite the promising results surrounding the application of the FOUR score across these varied patient populations, there are some limitations which deserve highlighting.

First, a main limitation of this review is the heterogeneity of the studies included. Given the assortment of study designs, objectives, and patient populations, a meaningful meta-analysis of results was impossible to conduct. Thus, we are left with a purely descriptive analysis of the available literature. With that said, the body of literature, which includes 9092 adult patients, provides evidence in support of the use of the FOUR score for clinical grading in a variety of situations.

Second, the FOUR score itself is limited by the fact that it requires a more detailed neurological examination and the experience to confidently conduct such an examination. The saving grace of simpler systems, such as GCS, is that they can be readily employed by various medical and paramedical professionals with fairly consistent reliability. This flexibility of GCS in the face of varied training and backgrounds of the assessor is a major benefit of the system. The FOUR score requires somewhat greater background knowledge of the nervous system, which may limit its application in other settings, such as the pre-hospital environment.

Third, the majority of the literature identified within this review focuses on general medical/ICU or traumatic brain injury patients. Thus, the conclusions that can be drawn regarding the association of the FOUR score with outcome, or its performance compared to GCS, in other populations such as subarachnoid hemorrhage or intracerebral hemorrhage are quite limited at this time. Further work is required in these sub-populations of patients to determine the role of the FOUR score. Studies directly comparing the utility of the FOUR score versus GCS in intubated versus non-intubated patients, and in those with brainstem lesions versus those without, are also lacking within these populations. While the advantage of using the FOUR score over GCS in intubated patients is logical, and as previously described relates primarily to its ability to bypass a verbal assessment, this has not been specifically demonstrated in the literature and deserves focused attention in future studies.

Lastly, as with many outcome prediction studies, the presence of observer bias is possible. Only 12 of the 49 studies were transparent about the presence of blinding in their study protocol. Many authors failed to mention whether they blinded their raters and outcome observers, making the presence of observer bias unclear. The number of patients in whom life-sustaining therapies were withdrawn during their clinical course is also poorly described in the literature, potentially introducing an element of selection bias as well.


The existing literature favors the FOUR score as a useful outcome predictor in many patients with depressed level of consciousness. It has been studied in a wide variety of critically ill patients, both with and without neurologic pathology, for predicting mortality and functional outcomes. It displays good inter-rater reliability among physicians and nurses.