Much research has focussed on the prognostic value of serum lactate estimation in critically ill patients [1]; however, in the context of work in low resource settings we have noted that facilities for lactate and blood gas analysis are frequently not available, prompting a search for alternative risk stratification tools. The anion gap (AG) is an easily calculated marker of metabolic acidosis based on analytes typically available from routine chemistry analysis. It may have potential as a risk stratification tool to identify sick patients at risk of deterioration, who would benefit from further management whilst pathophysiological processes are still reversible. The AG reflects the concentration of unmeasured anions as calculated by the formula Na+ - (Cl + HCO3). Inclusion of potassium in the formula is recommended where its concentration is abnormally high or low [2]. In healthy subjects, the unmeasured anions or “gap” is mostly made up of albumin; however, hypoalbuminaemia, commonly observed in critically ill patients, can lower the AG and mask an acidosis. Feldman and colleagues therefore recommended that the AG should be corrected for albumin [3].

In metabolic acidosis, addition of fixed acids leads to a rise of the AG: while the proton within the acid combines with bicarbonate, the conjugate base contributes to the unmeasured anions. Metabolic acidosis is common in critically ill patients and is a strong predictor of prognosis [4]. Maciel and Park observed that unmeasured anions accounted for the majority of metabolic acidosis in both intensive care unit (ICU) survivors and non-survivors, whereas lactate accounted for only a quarter of acidosis [5]. AG may thus have potential as a risk stratification tool, especially if corrected for albumin.

The validity of the AG as a predictor of mortality has been studied and has been compared to other indices of acid–base balance, especially Stewart’s strong ion gap [6]. However, the strong ion gap is more cumbersome and expensive to measure than the AG and is thus less suitable as a risk stratification tool in low resource settings. In studies with contrasting findings, AG was noted to be a very strong predictor of mortality [7] or of limited value with neither the AG nor the strong ion gap effective as predictors of in-hospital mortality [8]. Furthermore, it has been noted that studies conducted in countries where gelatin-based intravenous fluids are routinely used, such as the UK and Australia, failed to show an association between the strong ion gap and mortality whereas studies conducted in settings where such fluids are not routinely used, especially the USA were able to demonstrate an association [9, 10]. Gelatins are an exogenous source of unmeasured anions [11] and an increase in AG after gelatin infusion has been demonstrated in animal experimental studies [12].

Recently, a large study of 18,985 patients found that ∆AG, defined as the difference in AG between pre-hospital admission and critical care admission, was a robust predictor of all-cause mortality, where the pre-hospital AG was determined between seven and 365 days before admission [13]. However, this approach requires adequate documentation and a laboratory database, which are unlikely to be available in resource-limited settings.

In the present systematic review and meta-analysis we therefore aimed to determine the validity of a single AG measurement as a risk stratification tool predicting 31-day mortality in-hospital mortality, in-ICU mortality and comparable outcome measures in adult patients admitted to critical care settings. We also aimed to compare the prognostic validity of the observed and albumin-corrected AG. Although the AG as a risk stratification tool would be mainly applicable to low income countries, this systematic review and meta-analysis does not limit itself to studies conducted in such countries as the main focus lies on the scientific validity of the AG as a risk stratification tool.


Protocol registration

This systematic review and meta-analysis adheres to the “preferred reporting items for systematic reviews and meta-analyses” (PRISMA) standards [14]. A protocol was registered with PROSPERO, registration number CRD42015015249.

Search strategy, study selection and data extraction

We searched the electronic databases of Medline, Embase, Scopus, Medion, The Cochrane Library, Web of Science and regional bibliographic databases including African Index Medicus, Latin America and the Caribbean (LILACS), IndMed, Index Medicus for South East Asia Region (IMSEAR) and Western Pacific Region Index Medicus (WPRIM). In addition, journals specialising in the fields of critical care, anaesthetics, emergency medicine and intensive care medicine were searched electronically. Searches were performed for studies that were conducted on humans and published in English, German or French using the search terms “anion gap”, “unmeasured anions” and “unidentified acids”. The initial search was performed in January 2015 and the search was subsequently updated in May 2016. All search results were initially screened by abstract and title and those considered relevant subsequently underwent full-text screening. To identify further relevant studies, reference lists were reviewed, citation searches were performed and citation alerts were set up for all articles considered relevant after full-text screening.

Studies, conducted in any acute care clinical setting, were eligible for inclusion if they were published within the last 15 years, reported measurement of the observed and/or corrected serum AG in adult patients and mortality defined as “in-hospital mortality”, “in ICU mortality” or, if a time-frame was stated, death within up to 31-days of hospital admission. The latter definition was chosen where both outcomes were reported within a single study. Case studies, case–control studies and studies whose main focus was a hyperglycaemic emergency, poisoning or renal failure were excluded.

Data extraction was performed by a single reviewer, SG. A second reviewer, WS, independently extracted data from 10 % of the studies selected using a random number generator. Corresponding authors were contacted where necessary to discuss missing or unclear data.

Assessment of methodological quality and risk of bias

Two reviewers (SG and WS) independently graded the methodological quality and risk of bias of included studies using a modified version of the Quality In Prognostic Studies (QUIPS) tool [15]. This tool assesses the risk of bias of prognostic studies in six domains: study participation, study attrition, prognostic factor measurement, outcome measurement, confounding, and statistical analysis and reporting. Each study was rated as being at high, moderate or low risk of bias in each domain. Disagreements were resolved by discussion between the two reviewers.

Statistical analysis

Area under the ROC curve (AUCs), odds ratios (ORs) and mean differences were pooled in random or fixed effects generic inverse variance models for the observed and corrected AG. The I2 test was used to quantify heterogeneity. A fixed-effects model was used where the I2 was below 30 %; otherwise, a random-effects model was used. Meta-analysis of ORs and mean differences was undertaken in Review Manager version 5.3 (The Cochrane Collaboration, 2014, Copenhagen), while AUCs were pooled in StatsDirect version 2.8.0 (England: StatsDirect Ltd. 2013); an AUC of ≥0.8 was considered to denote good discriminatory power. Pooled estimates were not presented in forest plots due to high heterogeneity in the meta-analyses of AUC and mean difference; in the results section the pooled estimates are reported together with their respective 95 % confidence intervals (CIs). Subgroup analysis was undertaken to assess whether heterogeneity between studies could be explained by the following study characteristics: patient age, study setting, quantity of intravenous fluids received, determination of the AG before the initiation of hospital-based treatment, the routine use of gelatin-based intravenous fluids in the study country, choice of outcome measure, publication date and overall mortality.

Statistical significance testing for subgroup differences employed the unpaired t-test in GraphPad® QuickCalc Web Calculator (La Jolla California USA) [16]. Probabilities were two-tailed and a probability of less than 0.05 was considered statistically significant. No adjustments were made for multiple comparisons. Sensitivity analysis was undertaken to assess the effect of including retrospective studies and studies at high risk of attrition bias in the meta-analysis. Funnel plots were visually inspected for evidence of publication bias.


Study selection and characteristics of included studies

The study selection process is summarised in Fig. 1. In total, the search yielded 2688 non-duplicate publications; 2630 articles were excluded after title and abstract screening thus 58 articles were retrieved in full-text. Twenty-nine articles were excluded during full-text screening, leaving 29 studies that were subjected to data extraction. Ten studies were excluded during data extraction and thus 19 studies were included in the systematic review, of which 18 were included in one or more quantitative syntheses.

Fig. 1
figure 1

Flow chart summarising the search and study selection process. DKA = diabetic ketoacidosis; SOFA = sequential organ failure assessment score

Table 1 illustrates the characteristics of included studies. A majority of studies were conducted in high income countries [7, 8, 1626] while three studies were conducted in middle income countries [2729] and one in a low income country [30]. Studies were conducted in the following settings: ICU (10 studies), trauma centre (5 studies), coronary care unit/intensive cardiac care unit (3 studies) and Accident and Emergency department (1 study). Five studies accounted for the effect of intravenous fluids on AG levels: in one study no patient received more than 400 ml of any intravenous fluid before the AG was measured [7], in two studies patients receiving more than 250 ml or 500 ml of intravenous fluids respectively were excluded from the analysis [16, 30]. Two studies stated that the AG was determined before hospital based management, including intravenous fluids, was initiated [21, 29]. The remaining studies did not report on the quantity of intravenous fluids received by their study cohorts. No study reported both in-hospital and a time-frame specific mortality. One study [29] failed to define the outcome measure, reporting it as “mortality”. Sensitivity analysis was carried out to determine whether inclusion of this study affected the pooled effect measure. The risks of bias ratings are displayed in Table 2. Risk of attrition bias and confounding were the most poorly rated domains. There were no disagreements between review authors on data extraction.

Table 1 Characteristics of included studies
Table 2 Risk of bias rating

Prognostic ability of the AG to predict mortality

Owing to the high heterogeneity identified in the meta-analyses of AUC and mean difference, the pooled effect measures reported in this section should not be interpreted. Overall, three studies reported good discriminatory ability of the AG to predict mortality, of which two studies were included in meta-analysis [7, 16] and one study allowed the calculation of an OR for a specific AG threshold [22]. The former two studies were responsible for a large proportion of the statistical heterogeneity; both studies were conducted in young patients in the same trauma centre and only patients receiving less than a specified volume of intravenous fluids were included in the analysis. The latter study was conducted in patients with ST-elevation myocardial infarction undergoing percutaneous coronary intervention. The remaining 16 studies reported poor to moderate ability of the AG to predict mortality.

Nine studies reported AUCs for the observed AG (Fig. 2). Meta-analysis yielded a summary AUC of 0.72 (95 % CI 0.59 to 0.86). Heterogeneity was very high (I2 = 99 %) but reduced to I2 = 68 % when the two studies by Kaplan and Kellum were excluded from the analysis [7, 16]. Six studies reported AUCs for the corrected AG (Additional file 1: Figure S1). The summary AUC was estimated as 0.67 (95 % CI 0.62 to 0.71) and heterogeneity was high (I2 = 67 %).

Fig. 2
figure 2

Forest plot of area under the ROC curves (AUCs) for observed AG predicting mortality. Forest plot of a random effects meta-analysis of AUCs for the observed AG predicting mortality; I2 = 99 %. In view of the high heterogeneity a pooled effect estimate is not shown

Six studies reported ORs derived by logistic regression modelling for the observed AG (Fig. 3); five reported univariate logistic regression ORs whilst one study reported an OR adjusted for age. The summary OR was 1.08 (95 % CI 1.06 to 1.11); results were homogenous (I2 = 0 %) but it should be noted that the two studies by Kaplan and Kellum were not included in this analysis as neither study reported the OR. Four studies reported ORs derived by univariate logistic regression for the corrected AG (Additional file 2: Figure S2). The summary OR was 1.10 (95 % CI 1.07 to 1.13) and heterogeneity was very low (I2 = 5 %). Data reported in the study of Lazzeri and colleagues allowed the calculation of an OR for a specified AG positivity threshold. This yielded an OR of 2.8 (95 % CI 1.5 to 5.5) for an AG positivity threshold of 11 mEq/L [22].

Fig. 3
figure 3

Forest plot of odds ratios (ORs) for observed AG predicting mortality. Forest plot of a fixed effects meta-analysis of ORs derived by univariate logistic regression for the observed AG predicting mortality; I2 = 0 %. In view of the high heterogeneity in meta-analyses of other effect measures a pooled effect estimate is not shown

Mean difference was the most frequently reported effect measure with ten studies reporting it for the observed AG (Fig. 4). The summary mean difference was 3.55 mEq/L (95 % CI 1.08 to 6.02). Heterogeneity was high (I2 = 97 %); however, excluding the study by Kaplan and Kellum [7] completely eliminated heterogeneity (I2 = 0 %). The mean difference for corrected AG was reported by three studies (Additional file 3: Figure S3) and the summary mean difference was estimated as 3.25 mEq/L (95 % CI 1.53 to 4.96) with homogeneous results (I2 = 0 %).

Fig. 4
figure 4

Forest plot of mean differences for observed AG predicting mortality. Forest plot of mean differences in observed AG between survivors and non-survivors; I2 = 96 %. In view of the high heterogeneity a pooled effect estimate is not shown

Sensitivity analysis showed that including retrospective studies and studies at risk of attrition bias did not affect the summary AUC. Including prospective studies only (six studies) yielded a summary AUC of 0.73 (95 % CI 0.69 to 0.78). Similarly, excluding studies rated at high risk of attrition bias (four studies) yielded a summary AUC of 0.75 (95 % CI 0.54 to 0.96). Excluding the study by Dondorp and colleagues [31] with an undefined outcome measure (“mortality”) yielded an AUC of 0.72 (95 % CI 0.56 to 0.88).

AUC of observed AG was chosen for subgroup analysis; results are shown in Table 3. The quantity of intravenous fluids given to a patient had the strongest influence on the summary AUC. Studies excluding patients who received more than a specified volume of intravenous fluids [7, 16] reported a significantly higher summary AUC than studies not excluding patients for this reason (P = 0.0008), but heterogeneity remained high in both subgroups. For the subsequent analysis, studies restricting intravenous fluids [7, 16] and studies measuring the AG before initiation of hospital-based management [21, 29] were combined in a subgroup and compared to studies that did not account in any way for the effect of intravenous fluids on the AG. The former subgroup yielded a significantly higher summary AUC than the latter subgroup (P <0.0001). Heterogeneity remained high in the former subgroup (I2 = 98 %), though results in the latter subgroup were homogenous (I2 = 0 %). The summary AUC of studies conducted in countries where gelatin-based resuscitation fluids are routinely used [8, 19, 28] is not significantly different to that of studies conducted in countries where gelatins are not routinely used [13, 24] (P = 0.33). Studies in which intravenous fluids were restricted or in which the AG was measured before initiation of hospital-based treatment were not included in the latter comparison. The observed AG appears to be a slightly better predictor of mortality among younger patients (P = 0.011) and those admitted to trauma centre settings (P = 0.0235). Subgroup analysis showed no significant difference between studies reporting in-hospital mortality and those reporting a time-framed mortality (P = 0.65); similarly, no significant difference was found between studies in which overall mortality was below 30 % and above 30 % (P = 0.89) or between studies published before and after the year 2005 (P = 0.43); and heterogeneity remained high in all subgroups. Funnel plots did not show evidence of publication bias (Additional file 4: Figure S4 and Additional file 5: Figure S5).

Table 3 Results of subgroup analysis


This systematic review and meta-analysis does not support the use of a single AG measurement for risk stratification in critically ill patients. Quantitative synthesis was limited by significant statistical heterogeneity, which, following a series of subgroup analyses, could be partially explained by the quantity of intravenous fluids received by study patients. Studies differed substantially with regards to setting, presumed use of gelatin-based intravenous fluids as well as the age and mortality rate of their patient cohorts; however, in our analysis none of these factors fully accounted for the high degree of heterogeneity. Disease severity was not consistently characterised across studies and could therefore not be analysed in subgroup analysis. Overall, the high degree of unexplained heterogeneity, poor quality of primary studies and poor to moderate discriminatory power of the AG reported by the majority of studies suggest that there is insufficient evidence to recommend the use of the AG in clinical practice. Owing to the small number of studies that calculated a corrected AG, we were unable to determine whether correction of the AG for albumin improves its predictive ability.

In subgroup analysis, a highly statistically significant difference was seen between studies accounting for intravenous fluids by means of fluid restriction or by measuring the AG before the initiation of hospital-based management and studies that did not account for quantity of intravenous fluids by any means. This indicates that intravenous fluids may have blurred the association between AG and mortality. Administration of normal saline lowers the AG because addition of NaCl to the plasma increases the baseline chloride concentration proportionately more than the baseline sodium concentration, owing to the differences in volume of distribution between the two ions [4]. This effect may not be seen with more balanced fluids such as Hartman’s solution but, given that normal saline is commonly used in clinical practice, a risk stratification tool that is considerably affected by saline infusion is impractical. However, the validity of this subgroup analysis is limited by its observational nature and relatively small number of studies contained in each subgroup. Therefore, other confounders may have accounted for these results. Furthermore, the study by Shane and colleagues [30] also employed intravenous fluid restriction but found no predictive effect of the AG; however, only mean difference in AG between survivors and non-survivors but not AUC was reported. Further research would be required to determine more conclusively the extent to which the AG is affected by intravenous fluids.

Other than intravenous fluid restriction, our subgroup analysis did not identify factors accounting for the high statistical heterogeneity in the meta-analysis of AUC. A small effect of study setting and mean/median age on the pooled AUC was observed; however, the associated probabilities were >0.01 where no adjustments were made for multiple comparisons and in both analyses one subgroup contained the two studies by Kaplan and Kellum [7, 16]. Therefore, we consider the observed differences most likely to have arisen due to chance or confounding. Notably, no significant effect of outcome measure or publication date was observed on the pooled AUC. This supports the appropriateness of including studies reporting a time-framed mortality and in-hospital mortality and studies published at different times over the past 15 years. As parameters denoting disease severity were not consistently reported across studies, we divided studies according to overall mortality in subgroup analysis. No difference in pooled AUC was seen between studies reporting mortality rates above and below 30 %; however, overall mortality is a suboptimal indicator of severity of illness. Therefore, the contribution of disease severity to the observed statistical heterogeneity remains unclear.

Another important factor limiting the ability of the AG to predict clinical outcomes is its considerable baseline variability amongst healthy people. To address this, Kraut and Nagami suggested comparing the AG during an acute admission to the “personal AG” measured when the individual was in good health [2]. Dynamic AG indices, describing not only the magnitude of acid–base disturbance but also trends over time, are better predictors of mortality, as shown by the large study by Lipnick and colleagues including 18,995 patients [13]. For a subset of patients in this study (n = 664), the predictive ability of a single AG measurement was also reported and was shown to be poor (AUC 0.61). However, implementation of the systems required to support a “personal AG” would be challenging in low resource settings.

A rise in the AG in critically ill patients was long thought to be predominantly due to lactic acidosis, yet several studies reported poor sensitivity of the AG in detecting hyperlactataemia defined by a lactate threshold of 2.5 mmol/l [3134]. The AG was an excellent predictor of severe hyperlactataemia defined as lactate above 4 mmol/l [32] or 5 mmol/l [8]; however, Nichol and colleagues found that a higher lactate concentration even within a normal reference range of 2 mmol/l independently predicts mortality [35]. The AG may thus miss patients at risk of mortality, as a considerable degree of hyperlactataemia is required to push the AG outside its normal reference range if the baseline AG is low [2]. This is in keeping with the extreme difference in lactate levels between survivors and non-survivors observed in the study reporting the highest predictive value of AG [7]. Other studies reported smaller, albeit mostly statistically significant, differences in lactate levels between survivors and non-survivors.


The methodological quality of primary studies was generally poor. Most studies were rated at moderate or high risk of attrition bias and sampling bias as a result of failure to quantify missing outcome or prognostic data, especially in retrospective studies, where case notes with missing information are less likely to have been available. This may have affected our results where information was missing in a non-random manner. Similarly, risk of confounding was moderate to high in the majority of studies. Only one study explored the influence of age on AG levels, and stratification was not employed by any study; the risk of confounding affecting the review outcome is therefore high. Several studies did not report relevant effect measures, such as OR, AUC or mean difference or failed to provide confidence intervals, leading to exclusion from this review. Furthermore, the overall severity of illness in the study cohort was sometimes not quantified by means of an accepted disease severity score, such as the Acute Physiology and Chronic Health Evaluation II (APACHE II), Sequential Organ Failure Assessment (SOFA) score or Injury Severity Score (ISS) in trauma patients. No studies reported short-term mortality outcomes, which may have been more appropriate as naturally the prognosis in critical care patients is heavily influenced by clinical interventions undertaken during the inpatient stay. Shapiro and colleagues found that a single lactate level drawn on admission has good discriminatory power to predict 3-day mortality (AUC = 0.8) but poor discriminatory power to predict 28-day mortality (AUC = 0.67) [36]. Lastly, few studies stated the types of intravenous fluids used and no study included the quantity of intravenous fluid used as a variable in multivariate analysis.


The high degree of unexplained statistical heterogeneity, considerable diversity between patient cohorts and poor quality of primary studies, in particular the high risk of attrition bias and confounding, impact on the overall strength of evidence of this systematic review and meta-analysis. The majority of studies reported here do not support the use of the AG as a predictor of 31-day mortality, in-ICU mortality or in-hospital mortality. Therefore, based on the available evidence, the use of a single AG measurement for risk stratification in critically ill patients cannot be recommended. Further high quality research would be required to conclusively determine the validity of the AG as a predictor of mortality. However, the probable influence of intravenous fluids on AG levels and the substantial baseline variability between AG levels among healthy individuals may render the use of the AG problematic in clinical practice. In light of the growing body of evidence supporting the use of lactate concentration for monitoring of critically ill patients, it may be more worthwhile to focus efforts on increasing the capacity for lactate measurement in low resource settings.