Background

The advent of ultrasonography and its swift advances have, in recent years, significantly improved prenatal diagnosis and care globally [1, 2]. In the early stages of pregnancy, ultrasound is essential in predicting the risk of adverse pregnancy outcomes such as aneuploidy, stillbirth and pre-eclampsia, and in visualizing possible abnormal cord insertion [3, 4]. It is also used for fetal anatomic surveys during the second-trimester scan to detect fetal malformations, for monitoring fetal growth in utero and for pregnancy dating [5,6,7]. Given the essential role of ultrasonography in clinical decision making, it is imperative that the sonographic parameters obtained are accurate and precise [8]. However, a small percentage of error in measurements, or incompleteness of the information obtained, is at times unavoidable [9, 10]. In the first trimester, the measurement error of crown–rump length (CRL) and mean sac diameter (MSD) has been reported as limits of agreement of ±18.78% in the United Kingdom (UK) [11]. If significant, this error has implications for the accuracy of the gestational age estimates obtained. If it is not taken into account in the MSD or CRL cut-offs used for the diagnosis of miscarriage, some normal pregnancies may be erroneously deemed non-viable [11]. Consequently, this could lead to inadvertent termination of viable pregnancies and immense physical and emotional harm to the patient [11,12,13].

The measurement error or incompleteness of information obtained during an ultrasound examination is related to various factors, including but not limited to the skill of the sonographer and their level of training; patient-related technical factors such as body habitus; the quality of the machine; fetal position; and the duration of the examination [14]. As in other low-resource settings, Uganda’s healthcare system faces a severe shortage of imaging experts [15,16,17]. This results in a high workload, which affects the performance and efficiency of health workers. In addition, the majority of low-income countries lack adequate resources to acquire high-end ultrasound machines with very good spatial resolution [16, 18]. With machines of low spatial resolution, images appear blurred or enlarged, so calipers may be placed beyond, or short of, the true dimensions, leading to measurement errors [19]. Errors arising from variation between machines have been found to be substantial [19]. The Ministry of Health Standards on Diagnostic Imaging and Therapeutic Radiology in Uganda recommend a CRL cut-off of 5 mm to diagnose a miscarriage, yet this cut-off has since been revised following recommendations from recent studies. Use of the outdated CRL cut-off of 5 mm increases the risk of misdiagnosing normal pregnancies as miscarriages. These practice guidelines also do not provide clear guidance for the measurement of MSD [20], which may lead to significant variations in MSD measurements.

The reliability of CRL and MSD measurements in the first trimester using modern ultrasound equipment has not been explored as adequately in less developed countries as it has in developed nations [11, 19, 21]. This study sought to determine the level of intra- and inter-observer variability in measuring MSD and CRL in women between 6 and 10 weeks’ gestation at Mulago National Referral Hospital.

Methods

This was a cross-sectional study conducted among pregnant women at the Department of Obstetrics and Gynecology, Mulago National Referral Hospital, Uganda, from January to March 2016. We consecutively enrolled women with a single viable intrauterine embryo at 6 to 10 weeks of gestation who were not bleeding. The first observer examined each woman who had consented, to assess whether she was eligible for inclusion in the study. The second observer then examined each eligible participant. The two observers examined each woman in the same session. Both observers performed all examinations using a Philips Envisor (PHILIPS, USA, 2009) with a 7.5 MHz transvaginal probe for B-mode imaging.

For each participant, the observers took CRL measurements twice and MSD measurements once; between the two CRL measurements, they examined the ovaries and uterus. These measurements were obtained as described in the WHO Manual of diagnostic ultrasound, Volume 2 [5] (Fig. 1). To achieve blinding, the measurements of the first observer were removed from the machine before the second observer was allowed to enter the examination room. The same two sonographers examined all the women; both had good training in obstetric sonography and at least five years of experience in fetal ultrasound. A female nurse or professional was always present in the examination room for all transvaginal ultrasound scans done by the male sonographer, to make the women feel comfortable and safe.

Fig. 1

a Measurement of mean sac diameter at 8 weeks’ gestational age using transvaginal ultrasound scan. Gestational sac diameter was obtained by placing the calipers inner-to-inner on the sac wall, excluding the surrounding echogenic rim of tissue. MSD was calculated by first adding the longitudinal, anteroposterior and transverse dimensions of the chorionic cavity. Thereafter, the sum of the three measurements was divided by three. b Measurement of crown–rump length with transvaginal ultrasound at 8 weeks’ gestational age. CRL was measured as the maximal straight-line length of the embryo, obtained along its longitudinal axis, with the embryo neither too flexed nor too extended

Statistical issues

Sample size

The sample size calculation was based on the formula below, taking 95% limits of agreement (LOA) of ±18.78% as the cut-off for clinical significance [11, 22, 23]. In the formula, n is the desired sample size and s is the standard deviation of the differences in CRL or MSD measurements [24].

$$ 1.96\sqrt{\frac{3s^{2}}{n}} = \text{desired confidence interval of the limits of agreement} \quad [24] $$
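
As an illustration, the formula can be rearranged to n = 3(1.96 s / w)², where w is the desired half-width of the confidence interval around a limit of agreement. The following is a minimal sketch of that rearrangement; the function names and the input values (s = 1.0, w = 0.5) are hypothetical and are not the values used in this study.

```python
import math


def ci_half_width(s: float, n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval around a limit of agreement:
    z * sqrt(3 * s^2 / n), where s is the SD of the paired differences."""
    return z * math.sqrt(3 * s ** 2 / n)


def required_sample_size(s: float, desired_half_width: float, z: float = 1.96) -> int:
    """Smallest whole n for which z * sqrt(3 * s^2 / n) <= desired_half_width."""
    n = 3 * (z * s / desired_half_width) ** 2
    return math.ceil(n)


# Hypothetical inputs, chosen only to show the arithmetic.
n_needed = required_sample_size(s=1.0, desired_half_width=0.5)
print(n_needed)                       # 47
print(ci_half_width(1.0, n_needed))   # just under 0.5
```

Rounding up rather than to the nearest integer keeps the achieved interval no wider than the target.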

Statistical analysis

Data were double-entered and validated in EpiData version 3.1 to identify inconsistent entries before being exported to SPSS version 19.0 for analysis. Scatterplots of the paired sets of measurements, drawn with the line of equality, were visually assessed for potential systematic errors in the intra- and inter-observer measurements. A paired t-test at a significance level of 0.05 was used to check whether the paired sets of measurements differed significantly, in order to rule out systematic errors in the measurements.
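
For readers who want to see the arithmetic behind the paired t-test, the sketch below computes the test statistic from paired measurements using only the Python standard library. It is an illustration, not the SPSS procedure used in this study; the function name and the sample values are hypothetical. Converting the statistic to a p-value requires the t distribution, which is left to a statistics package.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired measurements.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the pairwise differences.
    A t close to zero suggests no systematic difference between the sets.
    """
    if len(first) != len(second):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical CRL readings (mm): differences cancel out, so t is zero.
t, df = paired_t_statistic([10.0, 12.0, 15.0, 20.0], [10.5, 11.5, 15.5, 19.5])
print(t, df)
```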

To assess the strength of the absolute agreement within and between observers, the intraclass correlation coefficient (ICC) was computed based on a two-way random effects model [24,25,26]. Normality, constant mean and constant variance assumptions for the LOA were fulfilled. Therefore, the differences between paired sets of measurements were plotted against their means in Bland–Altman plots to assess the level of clinical agreement within and between the observers. The lack of agreement between measurements or observers becomes relevant only when the LOAs are wider than what is clinically acceptable [27, 28]. The technical error of measurement (TEM) within and between observers was calculated as the square root of the sum of the squared differences between the paired sets of measurements divided by twice the total number of participants measured.
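
The three agreement statistics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the SPSS analysis performed in this study: the ICC shown is the single-measure, absolute-agreement form of the two-way random effects model (the text does not say whether single or average measures were reported), relative TEM is assumed to be TEM expressed as a percentage of the grand mean, and the function names and data are hypothetical.

```python
import math
import statistics


def limits_of_agreement(x, y, z=1.96):
    """Bland-Altman bias and 95% limits: mean(d) -/+ z * sd(d)."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


def technical_error(x, y):
    """TEM = sqrt(sum(d^2) / (2 * n)) for n paired measurements."""
    diffs = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


def relative_tem(x, y):
    """TEM as a percentage of the grand mean (assumed definition)."""
    grand_mean = statistics.mean(list(x) + list(y))
    return 100 * technical_error(x, y) / grand_mean


def icc_absolute_agreement(x, y):
    """Single-measure ICC, two-way random effects, absolute agreement,
    for two sets of measurements (k = 2) on the same n subjects."""
    n, k = len(x), 2
    values = list(x) + list(y)
    grand = statistics.mean(values)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [statistics.mean(x), statistics.mean(y)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical CRL readings (mm) from two observers.
obs1 = [10.0, 20.0, 30.0]
obs2 = [11.0, 19.0, 31.0]
print(limits_of_agreement(obs1, obs2))
print(technical_error(obs1, obs2), relative_tem(obs1, obs2))
print(icc_absolute_agreement(obs1, obs2))
```

The same functions apply to intra-observer agreement by passing one observer's first and second measurements as the two sets.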

Results

We screened 71 pregnant women suspected to be in the first trimester and enrolled 56 in this study. Of the 15 women excluded, one had a ruptured ectopic pregnancy; three had empty gestational sacs; six were more than 10 weeks pregnant; three were not pregnant; and two declined to be examined after consenting. The mean (SD) maternal age was 25.8 (4.33) years and the mean (SD) gestational age was 7.5 (1.14) weeks (Table 1).

Table 1 Demographic characteristics of women between 6 and 10 weeks of gestation in Mulago Hospital, Kampala, 2016

Intra-observer ICCs were 0.993 and 0.995 for CRL measurements, while inter-observer ICCs were 0.988 for CRL and 0.955 for MSD measurements (Table 2). Intra-observer 95% LOAs for CRL were ±2.04 mm (Fig. 2) and ±1.66 mm (Fig. 3). Inter-observer 95% LOAs were ±2.35 mm for CRL (Fig. 4) and ±4.87 mm for MSD (Fig. 5). Intra-observer relative TEMs for CRL were 4.62% and 3.70%, while inter-observer relative TEMs were 5.88% for CRL and 5.93% for MSD measurements (Table 3).

Table 2 The intraclass correlation coefficients of CRL and MSD measurements of women between 6 and 10 weeks of gestation in Mulago Hospital, Kampala, 2016
Fig. 2

Bland–Altman plot with 95% limits of agreement showing intra-observer agreement of crown–rump length measurements of observer 1. Y axis: difference in CRL (mm), from −5 to 5 in steps of 1. X axis: mean of first and second CRL measurements of observer 1 (mm), from 0 to 40 in steps of 10. Solid line: reference line where the mean difference between repeated measures is equal to zero. Dashed lines: upper and lower 95% limits of agreement

Fig. 3

Bland–Altman plot with 95% limits of agreement showing intra-observer agreement of crown–rump length measurements of observer 2. Y axis: difference in CRL (mm), from −5 to 5 in steps of 1. X axis: mean of first and second CRL measurements of observer 2 (mm), from 0 to 40 in steps of 10. Solid line: reference line where the mean difference between repeated measures is equal to zero. Dashed lines: upper and lower 95% limits of agreement

Fig. 4

Bland–Altman plot with 95% limits of agreement showing inter-observer agreement of crown–rump length measurements of observer 1 and observer 2. Y axis: difference in CRL (mm), from −5 to 5 in steps of 1. X axis: mean of CRL measurements of observers 1 and 2 (mm), from 0 to 40 in steps of 10. Solid line: reference line where the mean difference between the observers' measurements is equal to zero. Dashed lines: upper and lower 95% limits of agreement

Fig. 5

Bland–Altman plot with 95% limits of agreement showing inter-observer agreement of mean gestational sac diameter measurements of observer 1 and observer 2. Y axis: difference in MSD (mm), from −5 to 5 in steps of 1. X axis: mean of MSD measurements of observers 1 and 2 (mm), from 0 to 60 in steps of 10. Solid line: reference line where the mean difference between the observers' measurements is equal to zero. Dashed lines: upper and lower 95% limits of agreement

Table 3 The technical error of measurements of CRL and MSD of women between 6 and 10 weeks of gestation in Mulago Hospital, Kampala, 2016

Discussion

This study found strong observer agreement, with intra- and inter-observer ICCs ≥0.955, which is similar to findings from other studies [29, 30]. Inter-observer 95% limits of agreement for MSD and CRL measurements were also consistent with findings from other studies [11]. However, intra-observer 95% limits of agreement for CRL measurements were about 2% higher than those reported by Pexters and colleagues [11], who found intra-observer limits of agreement for CRL of ±8.91% and ±11.37% [11]. The minor differences observed could be attributed to differences in settings, such as observers, patient overload, and the finite consistency and read-out precision of the instrument used to measure the structures [9]. The study by Pexters et al. used an ultrasound machine with a 6–12-MHz transvaginal transducer for B-mode imaging, while our machine was equipped with a 7.5-MHz probe [11]. Intra-observer inconsistencies highlight a lack of clear or uniform criteria for the measurement and interpretation of embryonic landmarks [31]. Detailed instructions for locating landmarks are necessary to minimize intra- and inter-observer differences in technique [31]. The majority of our study participants were between 6 and 7 weeks of gestation. At this stage, the reproducibility of CRL measurements is better than it is later in the first trimester, because embryonic mobility increases from about 8 weeks’ gestation onwards [7]. This could also explain the optimal reliability observed in this study. The relative TEMs observed were within the clinically acceptable variability in the precision of anthropometric measurements, of 5.0% for intra-observer and 7.5% for inter-observer variability [10].

A strength of this study is that it used an ultrasound machine with high spatial resolution, the best available in our setting at the time the study was conducted. This allowed clear delineation of the anatomical landmarks of the embryo and the gestational sac, thereby minimizing measurement errors. By using the same machine throughout, we also eliminated errors due to differences between machines. The short time interval between intra-observer measurements was our major limitation.

The intra- and inter-observer differences in crown–rump length and mean sac diameter relate to the utility of these measurements in the first trimester for accurately estimating gestational age and/or diagnosing early pregnancy loss [5]. If the error is substantial, it may have serious clinical consequences. Our study has shown that the intra- and inter-observer errors of CRL and MSD measurements among pregnant women in our setting were within acceptable limits. Therefore, with regard to the estimation of gestational age, this error is unlikely to result in large differences in days when dating a pregnancy. However, with regard to the diagnosis of early miscarriage, even a difference of 1 mm can have an impact on the clinical decision [11]. Since our findings are within the acceptable limits reported by Pexters et al. and other studies, an MSD cut-off of 25 mm and a CRL cut-off of 7 mm for the diagnosis of early miscarriage should be suitable for use in our setting. These cut-offs take measurement error into account and have been adopted in revised guidelines [22, 23]. A large multicenter prospective study has demonstrated that these cut-offs are appropriate: a mean gestational sac diameter ≥25 mm with an empty sac had a specificity of 100% (364/364; 95% confidence interval 99.0% to 100%), and an embryo with crown–rump length ≥7 mm without visible embryonic heart activity had a specificity of 100% (110/110; 95% confidence interval 96.7% to 100%) [32].
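
The lower confidence limits quoted for the two specificities can be checked with a short calculation. The sketch below assumes a two-sided exact (Clopper–Pearson) binomial interval; the method used in [32] is not stated here, so this is an assumption, although it does reproduce the quoted figures. When all n of n cases are classified correctly, the lower limit is the value p that solves pⁿ = α/2. The function name is hypothetical.

```python
def exact_lower_limit_all_correct(n: int, alpha: float = 0.05) -> float:
    """Lower limit of the two-sided Clopper-Pearson interval when all n of n
    observations are successes: the p satisfying p**n == alpha / 2."""
    return (alpha / 2) ** (1 / n)


# Counts taken from the text: 364/364 (MSD >= 25 mm) and 110/110 (CRL >= 7 mm).
for n in (364, 110):
    print(n, round(100 * exact_lower_limit_all_correct(n), 1))
# 364 99.0
# 110 96.7
```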

Conclusions

Intra- and inter-observer errors of CRL and MSD measurements among pregnant women at Mulago Hospital were within acceptable limits. This provides assurance that the error in the estimates of gestational age obtained is within the acceptable margin of ±3 days in the first trimester. The CRL and MSD cut-offs of ≥7 mm and ≥25 mm are therefore reliable for the diagnosis of miscarriage on TVS in our setting. However, these results should be generalized to the rest of the country with caution. Such diagnostic accuracy levels are achievable at Mulago Hospital because it is a national referral hospital with sophisticated equipment and highly trained personnel. We recommend further studies in lower-level health facilities to establish their diagnostic accuracy levels. Sonographers can achieve acceptable and comparable diagnostic accuracy levels for MSD and CRL measurements with proper training, regular audits and adherence to practice guidelines.