Introduction

It cannot be considered to be good medical practice to use a weight estimation system that is known to be inaccurate [1]. When children’s weight cannot be measured during emergency care, an accurate, rapid estimation of weight is needed, as the safety and effectiveness of emergent interventions may ultimately depend on the accuracy of the weight estimation [2, 3]. Since most drug doses in children are based on weight, an accurate estimation of weight is important to ensure that a correct amount of medication is administered to achieve the desired effect, as well as to prevent the potential complications and side-effects of overdosing [4, 5]. This is relevant because most paediatric medication errors occur in the Emergency Department and most cases of resultant patient harm are related to incorrect dosing [6,7,8].

The problem is that most contemporary methods used to estimate children’s weight have been shown to lack sufficient accuracy and consistency of performance in different populations [9]. Most existing weight estimation systems are “one-dimensional”, because a single variable, usually age or length, is used in the weight estimation methodology. These systems fail because a single variable cannot adequately account for the biological variability of weight-for-age and weight-for-length [10, 11]. There is a wide variability of body habitus that is not accounted for in these weight-estimation systems, aggravated by the increasing levels of obesity affecting children [12, 13]. Newer, more promising, methods are the “two-dimensional” or dual length- and habitus-based systems, which include two variables in the estimation methodology: length (or a surrogate such as humerus or ulna length) and habitus (or a surrogate such as mid-arm circumference or waist circumference) [5, 14,15,16,17]. These have been shown to be much more accurate than the older, one-dimensional systems, in many studies [5, 15, 18,19,20,21,22].

Healthcare providers may also need more than one approach to emergency weight estimation: while parental estimates of weight can be very accurate, parents may not be present at the time that emergency care is required (especially in the prehospital environment) [9]. In these situations, an evidence-based alternative system may be required.

There has been a large amount of material published on weight estimation in children. It would be useful to combine the data from these studies to establish the accuracy of different methodologies both within and between different populations. Since many of the same weight estimation systems are used in populations with very different prevalences of underweight and obese children, it needs to be ascertained whether this impacts on the accuracy outcomes of these systems.

In order to create an evidence-based approach to emergency paediatric weight estimation, it is crucial to discover which methods predict weight most accurately and which are most appropriate for emergency use. This will enable clinicians to decide which systems they should incorporate into their clinical practice and will provide some guidance to those who administer, teach and train paediatric advanced life support on which systems are important.

The overall aim of this study was to determine which paediatric weight estimation systems most accurately estimate total body weight in children. The first objective was to determine whether there was evidence in the literature for an acceptable benchmark level of accuracy for a weight estimation system. The second objective was to extract and pool data on the performance of paediatric weight estimation systems to integrate the findings, provide a more comprehensive analysis on their functioning and identify those systems that operated best in diverse populations. The third objective was to directly compare the accuracy of paediatric weight estimation systems, for which paired data was available, using pooled data and meta-analysis techniques.

Only one previous meta-analysis has addressed this topic, but it was limited to studies in low- and middle-income countries [23].

Methods

This systematic review and meta-analysis followed the PRISMA guidelines.

Search strategy

Online databases (MEDLINE, SCOPUS, Science Direct and Google) were interrogated for eligible studies, published between January 1983 and May 2017, using the following search terms: “paediatric weight estimation”, “weight estimation children” and “Broselow tape”. Citation lists of reviewed papers were examined for additional relevant articles. Studies in any language were included if English translations were obtainable. To minimise publication bias, all studies with adequate reporting were included, whether full-text articles, dissertations, abstracts, conference presentations or other unpublished data that had undergone some form of peer-review.

Study selection and eligibility criteria

All studies that evaluated weight-estimation methodologies were assessed for inclusion into the study by two separate investigators (MW and LG). Articles that contained discussions on desired targets of accuracy of weight estimation systems, or analysis of the performance of weight-estimation systems were included in the qualitative arm of the review. Studies that presented original data with either accuracy data (percentage of estimations within 10% of actual weight (PW10)) or bias and precision data (mean percentage error plus an appropriate indicator of variance), or both, were included in the meta-analysis. Studies that did not include original data, those that did not include usable data and those at high risk of bias (see below) were excluded from the meta-analysis (see Fig. 1).
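The outcome measures named above can be illustrated with a minimal sketch. It assumes that "within 10%" means an absolute percentage error of at most 10%, and that the indicator of variance is summarised as conventional 95% limits of agreement (mean ± 1.96 SD); neither convention is specified in this section. The function names and the weights are hypothetical and are not study data.

```python
from statistics import mean, stdev

def percentage_errors(estimated, actual):
    """Percentage error of each estimate relative to the measured weight."""
    return [100.0 * (e - a) / a for e, a in zip(estimated, actual)]

def pw(errors, threshold):
    """Percentage of estimates with an absolute percentage error within threshold."""
    within = sum(1 for err in errors if abs(err) <= threshold)
    return 100.0 * within / len(errors)

def bias_and_limits(errors):
    """Mean percentage error (bias) and assumed 95% limits of agreement."""
    mpe = mean(errors)
    sd = stdev(errors)
    return mpe, (mpe - 1.96 * sd, mpe + 1.96 * sd)

# Hypothetical weights in kg, for illustration only.
actual = [10.0, 12.0, 15.0, 20.0, 25.0]
estimated = [10.5, 11.0, 15.0, 23.0, 24.0]

errors = percentage_errors(estimated, actual)
pw10 = pw(errors, 10.0)
pw20 = pw(errors, 20.0)
mpe, (lower, upper) = bias_and_limits(errors)

print(f"PW10 = {pw10:.1f}%, PW20 = {pw20:.1f}%")
print(f"MPE = {mpe:.2f}%, limits of agreement = ({lower:.1f}%, {upper:.1f}%)")
```

In this toy sample, four of the five estimates fall within 10% of the measured weight and all five fall within 20%.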

Fig. 1 The PRISMA flow-chart of the study design

Data abstraction and analysis

Data was extracted from the included studies independently by two researchers (MW, LG), cross-checked and confirmed. Standard statistics for meta-analysis of method-comparison studies were used [24], with an emphasis on evaluating accuracy (percentage of estimations within 10% of actual weight), bias (mean percentage error) as well as precision (limits of agreement of percentage error). Two methods of representing the pooled parametric and non-parametric data were employed: a fixed effects model weighted by inverse variance and a random effects model. In general, the random effects model was preferred because of the large variance within and between samples as well as the effects of several very large database studies that may have introduced bias.
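The difference between the two pooling models can be sketched as follows. The fixed effects estimate uses inverse-variance weights as described above. The random effects estimate shown here uses the DerSimonian-Laird estimator of between-study variance, which is an assumption, since this section does not state which estimator was used. The three study values are hypothetical, with the third standing in for a very large database study with a very small variance.

```python
def pool_fixed(effects, variances):
    """Inverse-variance weighted fixed effects estimate and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, 1.0 / sum(weights)

def pool_random_dl(effects, variances):
    """Random effects estimate using the DerSimonian-Laird tau^2 (assumed)."""
    weights = [1.0 / v for v in variances]
    fixed, _ = pool_fixed(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed effects estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(star, effects)) / sum(star)
    return pooled, 1.0 / sum(star), tau2

# Hypothetical mean percentage errors and their variances (not study data).
effects = [-5.0, 2.0, 8.0]
variances = [1.0, 4.0, 0.01]

fixed_pooled, fixed_var = pool_fixed(effects, variances)
random_pooled, random_var, tau2 = pool_random_dl(effects, variances)

print(f"fixed effects:  {fixed_pooled:.2f}%")
print(f"random effects: {random_pooled:.2f}% (tau^2 = {tau2:.1f})")
```

With these inputs the fixed effects estimate is pulled almost entirely towards the large study, whereas the random effects estimate weights the three studies far more evenly. This is consistent with the stated reason for preferring the random effects model.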

Many of the evaluated studies presented incomplete data. Where it was possible, without risking bias, missing data was imputed using standard methodologies [25].

Direct comparisons between weight estimation systems, using pooled paired data, were performed with non-parametric techniques based on PW10 accuracy data, where such data was available.

Subgroup analysis

There was considerable heterogeneity in the use and composition of subgroups within the included studies. Wherever possible, subgroup analyses that had been performed in each study were included in the overall meta-analysis. The included subgroups focused on different age groups as previous studies have shown a difference in weight estimation accuracy between infants (<1 year), toddlers and pre-school children (1 to 6 years) and older children (>6 years of age) [26].

Risk of bias within and across studies

Reporting bias was minimised by including all available methodologically sound studies (published or not). Methodological causes of potential bias were common (e.g. the Broselow tape was not actually used in many studies, but weight-estimates were generated from length data), but these were individually assessed and rated according to the level of risk of systematic bias. Studies with a high risk of bias were excluded from the meta-analysis (e.g. studies which excluded children above or below certain weight-for-length centiles).

Sensitivity analysis

There were three large database studies among those evaluated, with more than 100,000 children, one of which had more than 400,000 data points [27,28,29]. The effects of these “virtual” weight estimation studies, from very large databases, were carefully considered to establish any significant contribution to bias or distorted outcomes.

Software

Statistical analysis was performed using Stata (StataCorp. 2015. Stata Statistical Software: Release 14. College Station, TX: StataCorp LP), GraphPad Prism (GraphPad Prism version 8.00 for Mac, GraphPad Software, La Jolla, California, USA, www.graphpad.com) and Review Manager (Review Manager (RevMan) [Computer program]. Version 5.3. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014).

Results

Excluded studies

The most common reason for exclusion of potentially relevant studies was incomplete data presentation (see Fig. 1). The large database studies did not have a significant impact on overall outcomes based on the sensitivity analysis and were therefore not excluded from the analysis.

Characteristics of included studies

Two-thirds of included studies evaluated multiple weight-estimation systems and contained paired data or made direct comparisons, while one-third evaluated only a single system. Prospective studies accounted for the majority of articles (70/98 (71.4%)) but a minority of total patients (58,618/1,054,673 (5.6%)).

Table 1 provides a descriptive summary of the studies included in both the qualitative review as well as the meta-analysis, including the major findings and limitations of each study and the risk of bias assessment for each included study.

Table 1 Studies included in the qualitative review and quantitative meta-analysis

Benchmark accuracy for a weight estimation system

After studying the 150 identified articles, only three articles were found to propose a statistically meaningful target for a weight estimation system: one article recommended that 95% of weight estimates must fall within 20% of actual weight and two articles suggested that 70% of estimates must be within 10% of actual weight and 95% of weight estimates must fall within 20% of actual weight [11, 30, 31]. There was, however, no evidence found upon which to base any specific measurement analysis metric for a weight estimation system. There was also no credible evidence found of a tolerable weight estimation error, in terms of safety for drug dose calculation, for an individual child.

In 90/150 articles (60.0%), there was no mention at all of an appropriate target for weight estimation accuracy. In 41/150 articles (27.3%) an error of < 10% was suggested as appropriate; in 11/150 articles (7.3%) an error of < 20% was advocated; in 2/150 articles (1.3%) an error of < 30%; and in 6/150 articles (4.0%) another value or a statistically inappropriate measure was proposed. None of the studies included any evidence to support these target figures. The values were selected based on clinical significance, pragmatic limits based on generalised therapeutic ratios, or based on guidelines on determining drug bioequivalence [32, 33].

Meta-analysis data on bias (trueness), precision and accuracy of paediatric weight estimation systems

Table 2 contains a description of each of the weight estimation systems reviewed, as well as any restrictions on their use. The raw data and outcomes for each of the weight-estimation methodologies included in the meta-analysis are shown in Additional file 1: Table S1. From the individual study data, it could be seen that there was very poor within-study precision for most weight estimation systems (shown by the wide limits of agreement), with the exception of the two-dimensional methods, which generally had precision limits of agreement of less than ± 20%.

Table 2 Summary and description of weight estimation methodologies described in the literature

Figure 2 shows the pooled data of the bias and precision for the weight-estimation systems evaluated. The fixed effects outcomes and data for the weight estimation methods not presented in Fig. 2 can be found in Table 3. The important findings can be summarised as follows:

  • There was a wide variation in the weight estimation bias between low- and middle-income countries (overestimation) and high-income countries (underestimation). This was most noticeable with the age-based systems, less so with the length-based systems and least with the two-dimensional systems, which had virtually zero bias.

  • There were very wide limits of agreement for all methods other than the PAWPER tape and the Mercy method.

Fig. 2 Forest plot showing the bias and precision data of the major weight estimation systems evaluated

Table 3 Weight estimation meta-analysis summary data, showing both fixed effects (FE) and random effects (RE) data

Figure 3 shows the overall accuracy data for each weight estimation system (PW10 data). Age-based systems were least accurate, length-based systems were slightly more accurate and parental estimates and the two-dimensional systems were the most accurate. Despite the difference in bias between high-income countries and low- and middle-income countries for the one-dimensional systems, the overall accuracy was similarly poor. If a PW10 of 70% were used as a benchmark of acceptable accuracy, only the PAWPER tape and the Mercy method would have achieved acceptable accuracy, with parental estimates close behind. When examining the PW20 data in Table 3, only the PAWPER tape (96.6%) and the Mercy method (95.3%) met the acceptability criteria suggested by Stewart of a PW20 > 95% [30]. The PW20s for the Broselow tape, parental estimates and a value calculated for pooled age-based formulas were 81.2, 87.1 and 65.0%, respectively.
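The PW20 comparison in the preceding paragraph can be reproduced with a short sketch. The PW20 values are those quoted in the text and the threshold is the PW20 > 95% criterion [30]; the dictionary, constant and function names are hypothetical.

```python
# Pooled PW20 values (percent) as quoted in the text.
reported_pw20 = {
    "PAWPER tape": 96.6,
    "Mercy method": 95.3,
    "Parental estimates": 87.1,
    "Broselow tape": 81.2,
    "Pooled age-based formulas": 65.0,
}

PW20_BENCHMARK = 95.0  # acceptability criterion: PW20 > 95% [30]

def meets_pw20_benchmark(pw20, benchmark=PW20_BENCHMARK):
    """True if the pooled PW20 exceeds the benchmark."""
    return pw20 > benchmark

passing = [name for name, value in reported_pw20.items()
           if meets_pw20_benchmark(value)]

for name, value in reported_pw20.items():
    # Percentage points by which a system falls short of the benchmark.
    shortfall = max(0.0, PW20_BENCHMARK - value)
    status = "meets" if meets_pw20_benchmark(value) else "fails"
    print(f"{name}: PW20 = {value:.1f}% ({status}; shortfall {shortfall:.1f} points)")
```

Only the PW20 criterion is checked here, because the pooled PW10 values are reported in Fig. 3 and Table 3 rather than in the text.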

Fig. 3 A bar chart showing the accuracy data of the major weight estimation systems evaluated

The results of the subgroup analyses are shown in Table 4.

Table 4 Subgroup data for each weight estimation system

Figure 4 shows the results of direct statistical comparisons between weight estimation systems from studies where paired data could be pooled, using non-parametric measures of accuracy (PW10 data). The full analyses are available in Additional file 2: Figure S1. There was little difference between the accuracy of the different age formulas. Length-based methods were always more accurate than age-based methods, and two-dimensional methods were more accurate than one-dimensional methods. On direct comparison, but with data from only two studies, the PAWPER tape was significantly more accurate than the Mercy method. Parental estimates were significantly more accurate than the Broselow tape, but there was no data for direct comparison with any two-dimensional system.

Fig. 4 Direct meta-analysis comparisons between weight estimation systems

Discussion

Summary

The quality of the evidence from the contributing studies was generally good, and the number of studies that could be included allowed for a comprehensive analysis of the data. The underlying risks of bias, while present, were considered not sufficient to alter the overall findings. Additional information on parental estimations of weight in different populations and circumstances is also required, as well as a comparison with the two-dimensional weight estimation systems.

The implications of the results for clinical practice and future research are profound: age-based formulas, along with healthcare provider guesses, were the least accurate of all weight estimation systems. They should not be used or taught. Similarly, one-dimensional length-based systems, while widely used and advocated by advanced life support organisations, were simply not accurate enough. The future challenges will be to develop two-dimensional systems, which produced the most accurate weight estimations, to be safe, quick and easy-to-use during emergency care.

Many articles on weight estimation have been, and continue to be, published without any clear indication of whether the results achieved, and the weight estimation systems tested, were actually good or bad. This meta-analysis has provided some useful findings which could guide researchers and decision-makers on which systems to use in clinical practice and which to explore in further research. It has also provided some perspective on the performance of weight estimation systems in high-income and low- and middle-income populations, which is important as most weight estimation systems have been developed in high-income countries and have the potential to be dangerous if used inappropriately.

A benchmark for weight estimation systems

What degree of under- or overestimation of weight is dangerous to a child when calculating drug doses is not known [34, 35]. Many of the drugs used in paediatric emergencies have not been adequately studied to determine optimal dosing ranges. Moreover, the consequences of overestimating or underestimating weight (and therefore dose) will differ between different drugs, different patients and different clinical scenarios [36,37,38,39]. The final dose will be strongly influenced by the clinical situation and the discretion of the treating doctor, but an accurate and reliable weight estimation would still be required to provide the starting point to allow for dose modifications. Some authors regard the need for a highly accurate weight estimation as debatable. Others argue that any factors potentially impacting on patient safety must be addressed and minimised, especially in the light of compounded errors in drug dose calculations [40].

In the qualitative arm of the systematic review, we found no objective evidence to support any particular target or system by which to assess the adequacy of weight estimation methodologies. However, the failure to define how accurately a weight estimation method must perform is methodologically unsound. This is important, as the use of a system known to be inaccurate, or inferior to another system, is not good medical practice [1]. There are clearly factors other than accuracy to consider when selecting the most appropriate weight estimation system to adopt, including the complexity and cognitive load generated by the system, its vulnerability to human factor errors and its ability to interface with a drug dosing guide [41]. These factors need further research.

Despite the lack of objective evidence, some reference standard is still required. A large number of articles implied or stated explicitly that an individual estimation of weight within 10% of actual weight is desirable, but only three articles provided a benchmark by which to judge a weight estimation system. The suggested criteria were that, to be considered accurate, 70% of weight estimates must be within 10% of actual weight and 95% of weight estimates must fall within 20% of actual weight [11, 30, 31]. Since the newest two-dimensional systems have shown the capability to repeatedly achieve this standard, it could, therefore, be considered a reasonable benchmark to propose to assess the adequacy of weight estimation systems in the future.

Meta-analysis data: the accuracy of weight estimation systems

Age-based weight estimation

The age-based formulas were the least accurate of all the weight estimation methods. There are multiple reasons for the inaccuracy of age-based formulas: a large biological variability in weight-for-age; a non-linear relationship between weight and age; and differences between populations with different ethnic groups and different levels of nutrition [10]. We found that age-based formulas have never been shown to perform better than length-based systems. Despite this, many authors still regard the EPLS formula as the “gold standard” for weight estimation and age-formulas are still taught on advanced life support courses [42, 43]. Some authors also still support the use of age-based formulas because of their ostensible simplicity, because they require no equipment to function and because they allow advance preparation if emergency services personnel communicate a child’s age during transport to hospital [35]. However, their use presupposes that a child’s correct age is known, that the formula is remembered correctly and that the arithmetic is performed accurately. Memory is capricious in emergencies, however, and increased stress causes errors even in calculating simple formulas [44]. The benefits of the formulas are unlikely to compensate for their very poor accuracy [11].

Many studies have shown age-based formulas to underestimate weight in high-income populations [45,46,47], but studies in low- and middle-income countries have shown a significant, potentially dangerous overestimation of weight by the same formulas [48,49,50]. This meta-analysis confirmed these findings: no age-based formula performed well in any population, and the overestimation of weight in low- and middle-income populations was significant and potentially unsafe. Even the use of habitus-modified age-formulas has failed to match the accuracy seen with length-based habitus-modified systems, as this modification still does not account for variations in length-for-age [11, 51].

The futility of age-based weight estimation has been aptly summed up: “Accurate paediatric weight estimation by age: mission impossible” [27]. The unavoidable conclusion is that age-based formulas should no longer be used and clinicians who manage children should ensure that a better weight-estimation system is available for use during emergency care [11, 47, 52].

Length-based weight estimation

Every length-based system performed better than every age-based system in this study. This supports the argument that length-based weight estimation is more biologically valid than age-based estimation [10]. No length-based system achieved the acceptable outcome benchmark, however.

The two length-based formulas were originally designed to predict ideal body weight in children, but they have been used, albeit incorrectly, to estimate total body weight. The addition of a habitus-modification to these formulas has been shown to increase their performance significantly, to the same level of accuracy as the other two-dimensional systems [11]. The use of these formulas in this way shows potential, especially if used with a mobile phone app, and requires further investigation.

Although there are at least seven length-only weight-estimation tapes, only the Broselow tape has been extensively studied, while the Blantyre tape, the Sandell tape and the Handtevy tape have been evaluated only in single, small studies [53,54,55]. The Broselow tape, like other one-dimensional length-based systems, is vulnerable to error based on individual variations of weight-for-length (differences in body habitus) [56,57,58,59]. Some authors have questioned whether the tape is still valid given the increase in prevalence of overweight and obese children, and have cautioned that its use may result in the “under-resuscitation of children” [33]. Although the manufacturer recommends modifying weight estimation up a colour zone in overweight children, to reduce this underestimation of weight, this has never been formally studied and still needs to be verified [60, 61]. However, while studies in high-income countries have demonstrated an overall underestimation of weight, studies in low- and middle-income countries have mostly shown an overestimation of weight, potentially to a dangerous degree in some populations (if drug doses were to be computed from those weights) [56, 57, 62]. Since length-based weight estimation is advocated by major international advanced life support organisations, and since these systems are insufficiently accurate, this recommendation needs to be reconsidered and researched further [43, 63].

Two-dimensional (dual length- and habitus-based) weight estimation

The two-dimensional systems were far superior in accuracy to the one-dimensional age- and length-based systems. The accuracies of the Mercy method and the PAWPER tape in the meta-analysis were excellent, each with a PW10 of above 70% in both over- and undernourished populations. This finding was confirmed in individual studies, with no study reporting a one-dimensional system to be more accurate than a two-dimensional method. The direct meta-analysis comparisons showed that the PAWPER and Mercy methods were significantly more accurate than the other systems, with the PAWPER tape outperforming the Mercy method in the two studies in which they were both evaluated.

All weight estimation systems have limitations, however. The Mercy method, like all other weight estimation systems, was vulnerable to human factor errors in undertrained users [64]. It has also shown considerable variation in accuracy between individual assessors [19]. The functioning of the Mercy system in emergencies still needs to be evaluated. This is of concern as one of the poorest performances of the Mercy method was in a study which measured children in the supine position, as it might be used in an emergency [4, 20]. The PAWPER system was shown to be very accurate in two South African studies, one Australian study and one study based on NHANES data from the USA [5, 20,21,22, 31]. However, it was somewhat less accurate in two American studies with very obese populations, mostly because of difficulties in assessing body habitus [13, 65]. Although the tape’s length-based measurements are objective and simple to perform, assessment of body habitus is more subjective and dependent on training and experience [66]. Further research is needed to explore more standardised and objective ways of assessing habitus.

The Devised Weight Estimating Method (DWEM), the Yamamoto obesity icon system, the Wozniak system and habitus-modified Traub-Johnson and Traub-Kichen formulas have all been shown to be significantly more accurate than length-based methods, but have not yet been sufficiently studied [11, 12, 14, 67].

Estimates of weight by parents

The utility of parental estimates of their child’s weight is dependent on the parent being willing to offer a weight estimate and being accessible to healthcare personnel at the time of the child’s need for emergency care [26]. The accuracy of prediction is determined by whether the accompanying parent is the regular caregiver of the child and whether or not the child has had a recent measurement of weight by the parent or in the parent’s presence [9]. A previous systematic review has suggested that parental estimates are the most accurate method for obtaining a weight, when it cannot be measured [9]. In this meta-analysis, parental estimates were statistically superior to the Broselow tape on direct comparison, but there were no paired data from which direct comparisons could be made with the two-dimensional systems. Only one previous study has compared the Mercy method with parental estimates, in which parental estimates were found to be more accurate [68]. This will require further research to clarify, especially the accuracy of parental estimates in populations of different socio-economic status and the frequency of availability of parental estimates. Since parents might not always be available, especially in the prehospital environment, it would be prudent to always have an alternative method of estimation available.

Differences in weight estimation accuracy between different populations

This study showed a clear disparity in how the one-dimensional weight estimation systems performed in different populations. These differences were primarily a result of differences in bias, however, while the underlying lack of precision within each population was similar. Thus, the variability between populations was similar to the within-population variability shown in even the most homogeneous populations. The significance of this is that, although recalibration of a system for a specific population might reduce the bias, the underlying variability and imprecision would not allow an acceptable degree of overall accuracy to be achieved. This was well shown in the study by Asskaryar et al. which failed to recalibrate the Broselow tape in an Indian population by manipulating the bias only [57]. The two-dimensional systems, with their enhanced methodology which accounts for habitus, have proven to be closer to a universally applicable system by achieving a more uniform accuracy, both within and between populations.
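The argument that recalibration corrects bias but not imprecision can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of any included study: the percentage errors are drawn from a hypothetical normal distribution with a bias of +15% and a standard deviation of 20%, and recalibration is simplified to subtracting the mean percentage error from every estimate.

```python
import random
from statistics import mean, stdev

def pw10(errors):
    """Percentage of estimates with an absolute percentage error of at most 10%."""
    return 100.0 * sum(1 for e in errors if abs(e) <= 10.0) / len(errors)

rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Hypothetical one-dimensional system: +15% bias, 20% SD of percentage error.
errors = [rng.gauss(15.0, 20.0) for _ in range(10_000)]

# "Recalibration" modelled as a constant shift that removes the bias only.
bias = mean(errors)
recalibrated = [e - bias for e in errors]

sd_before = stdev(errors)
sd_after = stdev(recalibrated)
pw10_before = pw10(errors)
pw10_after = pw10(recalibrated)

print(f"bias before: {bias:.1f}%, after: {mean(recalibrated):.1f}%")
print(f"SD before:   {sd_before:.1f}%, after: {sd_after:.1f}%")
print(f"PW10 before: {pw10_before:.1f}%, after: {pw10_after:.1f}%")
```

Removing the bias leaves the spread of the errors unchanged, so PW10 improves but stays well below a 70% benchmark for as long as the standard deviation remains this wide.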

Limitations

The limitations of this study are similar to those expected of any meta-analysis of this nature [24]. The lack of data comparing parental estimates and the newer two-dimensional systems limited the comparisons between these systems. The under-reporting on subgroups of weight status also limited the ability to analyse the performance of weight estimation systems in children with habitus that deviated from the average. Such an analysis would have provided insight into how the systems might function in populations with a high prevalence of underweight or obese children (or both).

Conclusions

No evidence exists of an acceptable benchmark for weight estimation systems. An accuracy of at least PW10 > 70% and PW20 > 95% could be considered as a reference standard, since the length-based, habitus-modified systems have proven that this target is achievable across a wide range of populations.

The only weight-estimation systems that were found to be of acceptable accuracy were the two-dimensional length- and habitus-based systems. The PAWPER tape and the Mercy Method achieved an accuracy that surpassed all other methods. Wide discrepancies in the accuracy of the Broselow tape in different age groups and different populations raise questions about its use. It may dangerously overestimate weight in children from low- and middle-income countries or poor communities. Without exception, the age-based formulas evaluated proved to be highly inaccurate, with a possibility for patient harm, especially in low- and middle-income countries. There is sufficient evidence to conclude that the use of age-based formulas should be discouraged.

Recommendations

Dual length- and habitus-based (two-dimensional) systems should be used for weight estimation in children because of superior accuracy to other systems (high quality evidence).

The Broselow tape or parental estimates of weight should be used for weight estimation in preference to age-based formulas and healthcare provider guesses (medium quality evidence).

Age-based formulas and healthcare provider guesses should not be used for weight estimation in children because of potential patient harm (high quality evidence).

Parental estimates should be used to estimate weight in preference to length-based and age-based systems (high quality evidence). There was insufficient evidence to provide a recommendation between the two-dimensional systems and parental estimates of weight.