The median thickness measurement deviations at the individual eight sites (inter-measurer reliability study) were all below 0.2 mm when experienced measurers performed the measurements (with sufficiently high probe frequency of about 12–18 MHz). This is comparable to the (physically given) accuracy of ultrasound distance measurement, which is mainly determined by the wavelength-dependent image resolution, provided that the correct speed of sound for the pulse-echo thickness calculation in a given tissue is used. Therefore, sums of subcutaneous adipose tissue (SAT) thicknesses can be determined with high accuracy and reliability: the 95% limit of agreement for the experienced measurers was below DI = 1.5 mm (and below DE = 2.2 mm). This enables monitoring changes of SAT mass in athletes (which forms the dominant part of total body fat) with an accuracy of about 0.2 kg.
In female elite athletes, the median SAT thickness sum DI was three times as high as in their male counterparts (51 mm vs 17 mm). Previously, only preliminary data comparing men and women had been presented [25]. B-mode ultrasound is the only imaging technique that also enables quantification of the fibrous structures embedded in the SAT (fasciae). In this research, the embedded fasciae were quantified for the first time in a large group (N = 76) of elite athletes of various sports. The amount of these connective tissues was significantly lower in the 39 female elite athletes (median: 11%) than in the 37 male elite athletes (18%); this further increases the ratio of subcutaneous fat in elite female athletes with respect to that in male athletes. This has not been studied before, and comparisons of SAT amounts in male and female elite athletes have also been missing; only preliminary data from a small group of non-elite athletes [3] and exemplary comparisons of four elite athletes [6] were available.
For persons with the same sitting height (i.e., similar leg length), the BMI and the MI, which is a sitting height corrected BMI, are identical (definition of the MI [15,16,17]).
In large groups whose mean sitting height is representative, the means of BMI and MI can therefore be expected to be similar. Median BMI and MI were 22.6 kg m−2 and 22.2 kg m−2, respectively. The small difference in our group may have arisen because some of the measurements were made in a Hispanic country, where sitting height medians are higher than in Caucasian White persons [26]; this results in MI values lower than BMI values. However, the difference between BMI and MI was large in several individual cases (up to 1.7 kg m−2); a BMI difference of 1.7 kg m−2 corresponds to a body mass change of more than 5 kg. Such differences are of core relevance both for assessing the athlete’s health status and for designing competition rules based on ‘relative body mass’ (such rules are currently used in ski jumping, for example [15, 17], where the BMI is used).
Body Fat Measurements in Sport
The status of body composition assessment in sport has been reviewed by the Working Group on Body Composition, Health and Performance (under the auspices of the IOC Medical and Scientific Commission) [2], and best practice protocols for physique assessment in sport were recently presented, including the standardised US method, which is capable of measuring SAT at an accuracy level not reached by any other method [27]. All other methods analysed there are usually not sufficiently accurate for monitoring body composition on the fine scale needed in top-level athletes. This is particularly the case if athletes are excessively small, large, or lean [2], because most athlete groups are highly specialised and their sport-specific physique imperatives are not in line with general morphological norms [27,28,29,30,31]. Therefore, many of the assumptions upon which measurement techniques are based are not valid in athletes. Densitometry, for example, has resulted in scores of minus 12% fat [28], and with DXA, the seven leanest in a group of male athletes showed negative fat on the torso [29]. Obviously, the morphology assumed in the measurement algorithms causes impossible results in lean athletes. Limitations of measurement techniques are discussed in the ESM and in the literature [2, 3, 6, 30,31,32].
Ultrasound Brightness-Mode Imaging and Distance Measurement Accuracy
Diagnostic (brightness-mode) ultrasound has been used for fat measurement since 1965 [33, 34], and many publications followed. At sufficiently high probe frequency (12–18 MHz), the thickness measurement accuracy is approximately 0.1–0.2 mm [3, 6, 35], provided that the appropriate speed of sound in the given tissue is used (1450 m s−1 in fat [18,19,20,21,22]). The high accuracy enables measuring the embedded fibrous structures, which amount to substantial percentages of the SAT (Fig. 2c, d, Table 1, and ESM: Tables A1 and A2). A typical US image of SAT at the ‘front thigh’ site is shown in Fig. 1b. A thick layer of gel between the probe and the skin (black band above the epidermis in Fig. 1b, c) avoids compression. This is an important feature of this US measurement technique [3,4,5,6,7], as adipose tissue is highly compressible, and the degree of compressibility varies from site to site and between individuals [3]. Factors influencing accuracy are analysed in the ESM and in various publications [3, 6, 22]. However, the technical accuracy limits of US are not the crucial point: the limitations are set by biological factors, including the detection of furrowed borders and visco-elastic deformations of adipose tissue. Therefore, measurement reliability is the overall limiting factor (Table 1).
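The pulse-echo relation behind these numbers can be sketched in a few lines. This is a minimal illustration only, not the evaluation software used in the study: the function names are hypothetical, and the 1540 m s−1 value is the conventional soft-tissue average that many US systems assume by default (an assumption here, not a figure from this article). The sketch shows the thickness obtained from a round-trip echo time, the wavelength at a given probe frequency (which sets the scale of the image resolution), and how a displayed distance would be rescaled to the speed of sound in fat.

```python
# Minimal sketch of pulse-echo distance measurement (illustrative names only).

SPEED_FAT_M_PER_S = 1450.0          # speed of sound in fat, as given in the text
SPEED_SOFT_TISSUE_M_PER_S = 1540.0  # ASSUMPTION: common default of US systems


def thickness_mm(round_trip_time_us, speed_m_per_s=SPEED_FAT_M_PER_S):
    """Layer thickness from the echo time; the pulse travels there and back."""
    one_way_time_s = round_trip_time_us * 1e-6 / 2.0
    return speed_m_per_s * one_way_time_s * 1000.0


def wavelength_mm(frequency_mhz, speed_m_per_s=SPEED_FAT_M_PER_S):
    """Wavelength = speed of sound / frequency, converted to millimetres."""
    return speed_m_per_s / (frequency_mhz * 1e6) * 1000.0


def rescale_thickness_mm(displayed_mm,
                         assumed_speed_m_per_s=SPEED_SOFT_TISSUE_M_PER_S,
                         true_speed_m_per_s=SPEED_FAT_M_PER_S):
    """Correct a distance that was computed with a different speed of sound."""
    return displayed_mm * true_speed_m_per_s / assumed_speed_m_per_s


if __name__ == "__main__":
    for f_mhz in (12.0, 18.0):
        print(f"{f_mhz:.0f} MHz: wavelength in fat = {wavelength_mm(f_mhz):.3f} mm")
    print(f"10 mm displayed at 1540 m/s = {rescale_thickness_mm(10.0):.2f} mm in fat")
```

At 12–18 MHz the wavelength in fat is on the order of 0.1 mm, which is consistent with the accuracy range quoted above; using an inappropriate speed of sound would shift every thickness by a constant percentage.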
Table 1 Data of athlete groups
Reliability of the Standardised US Method: An Overview
US images can never be captured by different investigators or at different times by the same investigator at exactly the same US probe position and orientation, which affects reliability. Therefore, a standardised technique has been introduced recently [6, 7]. More information about the choice of standard US sites can be found in the ESM, and in previous publications [3, 4, 6, 7]. Reliability obtained by experienced measurers has been tested in groups ranging from lean [4, 6] to overweight and obese [7]; reliability has also been tested in children [10, 13]. However, the extent to which measurer experience plays a role has never before been analysed systematically.
Tables 2 and 3 compare the core results obtained previously by experienced measurers [4, 6, 7] to the findings of the current multicentre study (MCS), in which both experienced and novice measurers were involved in the inter- and intra-measurer reliability studies. Experienced measurers of centres C1–C2 had their US system permanently available, whereas the novices (C3–C5) had to borrow a US system for their measurement series and had no prior experience with US imaging. Their training was limited to a 2-day course, followed by supervised US measurements in about five individuals. These are the main factors causing the lower accuracy and reliability obtained by the novices.
Table 2 Inter-measurer reliability
Table 3 Intra-measurer reliability
Measurement deviations of experienced measurers in the current study (95% LOA was ± 1.2 mm for DI that ranged from 6 to 70 mm) did not differ noticeably from previous results (± 1.0 mm, at DI ranging from 10 to 51 mm [6]). However, the deviations of the novice measurers were substantially larger, indicating clearly that measurers need sufficient experience to obtain the highest possible level of accuracy and reliability. The reasons for the larger errors were: poor US image quality, a US probe that was not exactly at the marked position, incorrect interpretation of embedded structures as muscle fasciae (e.g., Camper’s fascia [3, 4]), an ROI that was not set symmetrically, or a gel layer that was not thick enough, resulting in fat compression. Another source of error may be that some participants did not stop breathing at mid-tidal expiration when US images were captured.
The inter- and intra-measurer deviations were larger when thicker SAT layers were measured; however, the relative deviations (ΔDI/DI) were found to be smaller with increasing SAT thicknesses [7]. In most cases, the deviations with respect to DE (fibrous structures excluded) are slightly larger, because for measuring DE, several tissue borders within the SAT need to be detected additionally. In the inter-measurer reliability tests, the deviations for novice measurers were about three times larger than for the experienced measurers, but in the intra-measurer reliability tests, this difference was only twofold, indicating that novices repeated some of their measurement mistakes.
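As an illustration of how limits of agreement of this kind can be computed, the following sketch takes each measurer's deviation from the mean of all measurers for the same participant and reports the mean deviation ± 1.96 standard deviations. This is a common Bland-Altman-style convention and is an assumption here; the exact statistical procedure of the study is described in its methods and ESM and may differ. The function names and the example values are hypothetical.

```python
# Sketch: 95% limits of agreement from measurer deviations (assumed convention).
from statistics import mean, stdev


def deviations_from_mean(values_per_participant):
    """For each participant, subtract the mean of all measurers' values."""
    deviations = []
    for values in values_per_participant:
        participant_mean = mean(values)
        deviations.extend(v - participant_mean for v in values)
    return deviations


def limits_of_agreement_95(deviations):
    """Mean deviation +/- 1.96 sample standard deviations."""
    centre = mean(deviations)
    half_width = 1.96 * stdev(deviations)
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # HYPOTHETICAL sums D (mm): three measurers, two participants.
    example = [[10.0, 10.4, 9.6], [20.0, 20.6, 19.4]]
    lower, upper = limits_of_agreement_95(deviations_from_mean(example))
    print(f"95% LOA: {lower:.2f} mm to {upper:.2f} mm")
```

Because the deviations are taken from each participant's own mean, they are centred on zero and the limits are symmetric. Relative limits can be obtained in the same way after dividing each deviation by the participant's mean.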
Reliability at Individual Measurement Sites
The reliability of the sum D of the eight SAT thicknesses d is composed of the reliabilities of the thickness measurements at the individual sites. Figure 6a–d shows the absolute values ABS(δ) of the measurer differences from their means at the eight sites (ESM: Tables A3 and A4). Median values, interquartile ranges (IQR), and third quartile values (Q3) were substantially smaller in the group C1–C2 (experienced examiners) than in C3–C5 (novices) at all sites. At sites with usually higher SAT thickness d, differences ABS(δ) also tended to be higher, but all medians were below 0.2 mm in the experienced group and below 0.5 mm in the novices’ group. Not only the differences ABS(δ), in mm, but also the relative differences \({\text{ABS}}(\delta_{\text{rel}}) = 100 \cdot {\text{ABS}}(\delta)/d\), in % of the SAT thickness d at the given site, are of relevance. For example, ABS(δ) is low for EO, but the corresponding ABS(δrel) has the highest value of all sites (ESM: Tables A5 and A6). This is one of the reasons why this site has meanwhile been replaced by lateral thigh (LT) [6]. Another reason is that the site EO causes measurement problems in obese individuals [7].
A further reason for replacing the site EO by LT is that the latter is a pronounced fat depot site in women and thus of high relevance when studying sex differences. The measurement deviations at the site LT (median of absolute deviations was 0.24 mm, median SAT thickness was 14 mm; corresponding to 1.7%) found in an intra-measurer reliability study published in 2017 [7] were comparable to the measurement deviations which these authors found at UA and LA (0.21 mm and 0.26 mm, 12 mm and 19.5 mm; 1.8% and 1.3%, respectively). The participants studied in the cited publication [7] ranged from extremely lean to obesity class III. Based on these findings, the measurement differences at LT in our study group can be assumed to be in a similar range as found at the abdomen sites.
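The relative differences quoted above follow directly from the definition ABS(δrel) = 100 · ABS(δ)/d. The short sketch below recomputes them from the median values cited from [7]; the function name and the dictionary layout are illustrative choices, not part of the original analysis.

```python
# Sketch: relative absolute difference in % of the SAT thickness at a site.

def relative_abs_difference_percent(abs_delta_mm, thickness_mm):
    """ABS(delta_rel) = 100 * ABS(delta) / d, with d the SAT thickness in mm."""
    if thickness_mm <= 0:
        raise ValueError("SAT thickness must be positive")
    return 100.0 * abs_delta_mm / thickness_mm


# Median absolute deviation (mm) and median SAT thickness (mm) per site,
# taken from the values quoted in the text for reference [7].
SITES = {"LT": (0.24, 14.0), "UA": (0.21, 12.0), "LA": (0.26, 19.5)}

if __name__ == "__main__":
    for site, (abs_delta, thickness) in SITES.items():
        rel = relative_abs_difference_percent(abs_delta, thickness)
        print(f"{site}: {rel:.1f}% of d")
```

The same absolute deviation therefore weighs more heavily at a thin site than at a thick one, which is why both the absolute and the relative values are reported.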
SAT Thickness Measurement Errors Transform Linearly into Fat Mass Errors
The small error of US thickness measurements of a fat layer transforms linearly into the error of subcutaneous fat mass, because the fat volume is proportional to the (calibrated) mean subcutaneous fat thickness over the whole-body surface. An SAT thickness measurement error of 1.4 mm (95% LOA; see Tables 2 and 3, and Figs. 3, 4, 5) transforms into an SAT mass error of about 0.2 kg (see ESM); this is almost an order of magnitude below the daily body weight fluctuations. SAT makes up by far the largest part of total body fat (typically 80–90% of anatomically detectable fat mass [36]). The SAT thickness sums in females can be expected to be higher when the site EO is replaced by LT [7, 25, 37].
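The linear relation can be made concrete with a rough order-of-magnitude sketch. It is not the derivation given in the ESM: the body surface area (1.8 m²), the fat density (900 kg m−3), the calibration factor of 1, and the function name are all illustrative assumptions chosen only to show how a thickness-sum error scales into a mass error.

```python
# Order-of-magnitude sketch: thickness-sum error -> SAT mass error.
# All default parameter values are ASSUMPTIONS for illustration only.

def sat_mass_error_kg(sum_error_mm,
                      n_sites=8,
                      body_surface_m2=1.8,          # assumed body surface area
                      fat_density_kg_per_m3=900.0,  # assumed density of fat
                      calibration=1.0):             # assumed calibration factor
    """Mass error = density * surface * calibrated mean thickness error."""
    mean_thickness_error_m = sum_error_mm / n_sites / 1000.0
    volume_error_m3 = calibration * body_surface_m2 * mean_thickness_error_m
    return fat_density_kg_per_m3 * volume_error_m3


if __name__ == "__main__":
    for error_mm in (0.7, 1.4, 2.8):
        print(f"sum error {error_mm} mm -> about {sat_mass_error_kg(error_mm):.2f} kg")
```

With these placeholder inputs, a 1.4 mm error in the sum of eight sites gives a mass error of a few tenths of a kilogram, the same order as the value stated in the text, and doubling the thickness error doubles the mass error.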
No other measurement technique for cross-sectional or longitudinal studies of body fat is capable of measuring on such a fine scale as US [2, 27], and no other can quantify the amount of connective tissues embedded in the SAT (‘fasciae’), which form a substantial part of SAT (4.0 to 29.3% in the group of elite athletes studied here).
Relative Body Mass: BMI and MI
Several indices that are power functions of body mass (m) and stature (h) were originally meant for measuring body fatness [38,39,40]. One such index that is widely used is the body mass index (BMI or Quetelet’s index): \({\text{BMI}} = m/h^{2}\). Figure 2a shows that the BMI is useless for assessing body fat in athletes: as expected [2], there was no correlation between BMI and SAT thickness sums. Similar results were found in several other groups, too [7, 11, 37]. Conversely, among anorexia nervosa patients with extremely low BMI (below 17.5 kg m−2), some individuals may have subcutaneous fat thickness values comparable to those of healthy women [9, 12]. When using the BMI as a measure of ‘relative body mass’, there is a further important limitation that the World Health Organisation (WHO) Expert Committee on Physical Status has pointed out:
“Problems arise, however, in adults whose shape differs from the norm… Care should therefore be taken in groups and individuals with unusual leg length to avoid classifying them inappropriately as thin or overweight” [14]. Based on this justified critique, a measure for relative body mass, the mass index MI, has been developed [15, 17]: \({\text{MI}}_{1} = 0.53\;{{m}}/({{hs}})\). This measure considers not only stature h, but also the individual’s sitting height s (and thus, implicitly, the leg length l). For the derivation of the MI1 formula, see ESM. In this study, mean BMI was 22.6 kg m−2 and mean MI was 22.2 kg m−2; however, the difference MI1 − BMI was large in several individual cases, ranging from − 1.7 to 1.3 kg m−2. Particularly in weight-sensitive sports, such differences in individuals are of core relevance for assessing the athlete’s health status and for raising the alarm when the individual’s body weight becomes critical [1].
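The two indices can be compared directly from their definitions, BMI = m/h² and MI1 = 0.53 m/(hs). The sketch below uses a hypothetical athlete (70 kg, 1.80 m) whose values are invented for illustration; the function names are likewise illustrative. It shows that MI1 equals BMI when the sitting height is 53% of stature, and that MI1 exceeds BMI when the sitting height is relatively low (long legs).

```python
# Sketch: BMI versus the sitting-height-corrected mass index MI1.

def bmi(mass_kg, stature_m):
    """Body mass index, m / h^2, in kg m^-2."""
    return mass_kg / stature_m ** 2


def mass_index_1(mass_kg, stature_m, sitting_height_m):
    """Mass index MI1 = 0.53 * m / (h * s), in kg m^-2."""
    return 0.53 * mass_kg / (stature_m * sitting_height_m)


if __name__ == "__main__":
    mass, stature = 70.0, 1.80  # HYPOTHETICAL athlete
    for sitting_height in (0.53 * stature, 0.92, 0.99):
        difference = mass_index_1(mass, stature, sitting_height) - bmi(mass, stature)
        print(f"s = {sitting_height:.3f} m: MI1 - BMI = {difference:+.2f} kg m^-2")
```

Since BMI scales with m at fixed stature, a difference of 1.7 kg m−2 at a stature of about 1.75 m corresponds to roughly 1.7 × 1.75² ≈ 5.2 kg of body mass.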
Characteristics of the Athlete Groups and Their SAT
Figure 2a shows that there was no correlation (R2 = 0.13) between BMI (which ranged from 17.9 to 29.0 kg m−2) and SAT thickness sums DI (ranging from 6 to 160 mm). This also holds true for the MI1 (R2 = 0.09). Neither BMI nor MI1 gives useful information about athletes’ body fat. Although relative body mass was 1.0 kg m−2 lower in females in terms of BMI (and 1.6 kg m−2 in terms of MI), their median DI was 3.0 times higher (51.1/17.2 = 3.0). In addition, their median percentage of embedded fibrous structures was 1.7 times lower than in males; therefore, females’ median DE was 3.2 times the value found in males (Fig. 2c, d; Table 1). In the sub-group of athletes in weight-sensitive sports, women (median DI = 33.1 mm) had about 3.5 times as much SAT as men (median DI = 9.5 mm), and for athletes in the non-weight-sensitive group, females’ median DI (66.7 mm) was 2.9 times higher than that in males (DI = 23.1 mm). Using LT instead of EO would further increase the ratio because LT is a prominent fat depot site in women [25]. Four (of 39) DI values of women were below 25 mm, and 15 (of 37) values of men were below 12 mm (“extremely low” according to [12]).
The means of all female participants were significantly higher for DI, DE, and DF, and significantly lower for DF,% when compared to means of all male participants (p ≤ 0.001). The percentage of embedded fibrous structures tended to decrease with increasing DI in both male and female participants (R2 = 0.35 and 0.41, respectively). The median percentage of fibrous structures for all athletes was 13.3% (4.0–29.3%), for male athletes 18.3% (8.9–29.3%), and for female athletes 10.5% (4.0–22.5%).
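The sex ratios reported in this section can be cross-checked from the medians given in the text. The sketch below assumes that the fat-only thickness can be approximated as DE ≈ DI · (1 − DF,%/100) applied to the group medians; this is a simplification, because the median of DE is not in general equal to the product of the two separate medians, so the resulting DE ratio only approximates the reported value. The variable and function names are illustrative.

```python
# Sketch: female/male ratios from the medians quoted in the text.

MEDIAN_DI_MM = {"female": 51.1, "male": 17.2}      # SAT thickness sums DI
MEDIAN_FIBROUS_PCT = {"female": 10.5, "male": 18.3}  # embedded fibrous structures


def fat_only_thickness_mm(di_mm, fibrous_percent):
    """ASSUMED approximation: DE = DI * (1 - fibrous share)."""
    return di_mm * (1.0 - fibrous_percent / 100.0)


ratio_di = MEDIAN_DI_MM["female"] / MEDIAN_DI_MM["male"]
ratio_fibrous = MEDIAN_FIBROUS_PCT["male"] / MEDIAN_FIBROUS_PCT["female"]
ratio_de = (
    fat_only_thickness_mm(MEDIAN_DI_MM["female"], MEDIAN_FIBROUS_PCT["female"])
    / fat_only_thickness_mm(MEDIAN_DI_MM["male"], MEDIAN_FIBROUS_PCT["male"])
)

if __name__ == "__main__":
    print(f"DI ratio (female/male):        {ratio_di:.2f}")
    print(f"fibrous share ratio (m/f):     {ratio_fibrous:.2f}")
    print(f"approximate DE ratio (f/m):    {ratio_de:.2f}")
```

The lower fibrous share in women pushes the DE ratio above the DI ratio, which is the effect described in the text.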
The difference in SAT between highly trained male and female athletes is large in most cases. This also holds true for total body fat (TBF), because SAT mass represents the major part of TBF (typically 80–90%) [36].
Limitations
1. Visceral adipose tissue, which is typically about 10–20% of total body fat [36] (but may also be beyond this percentage range in some individuals), is not included in the US SAT measurement. This has to be considered when using SAT as a surrogate for total body fat.
2. Currently, only preliminary normative data are available for comparisons [12].