Background

Infertility was defined as the inability to conceive after a year of consistent, unprotected sexual activity because of either the individual or the partner’s decreased fertility [1]. Infertility impacts one-fourth of couples in developing countries [2, 3]. After the age of 30, women’s fecundity declines due to a proportionate decrease in the quantity of eggs that are accessible [4]. Nonetheless, it is challenging to predict the rate at which a given woman’s reproductive function deteriorates [5]. Interindividual variations exist among women who suffer from infertility [6]. Ensuring that the patient has enough eggs to develop high-quality embryos for implantation into the uterus without running the danger of overstimulating the ovaries is one factor that contributes to the success of in vitro fertilzation (IVF) [7].

The goal of ovarian stimulation medication therapy is to promote the growth of ovarian follicles [1, 8]. The administration of exogenous gonadotropin, which enables the collection of several oocytes in a single cycle, forms the cornerstone of ovarian stimulation for assisted reproduction. So-called controlled ovarian stimulation (COS) is employed when these medications are given only to stimulate follicles and oocyte collection is finished with assisted reproductive technology (ART) [9]. Patients of the same age, following the same protocol, or using the same medication all have quite different responses to controlled ovarian stimulation [6, 7, 10]. While studies [11,12,13,14,15,16,17,18,19,20,21,22,23,24] have shown that only a handful of women who are receiving controlled ovarian stimulation (2–50%) have a poor ovarian response, there is no consensus on what constitutes a good or poor ovarian response. However, a single stimulation cycle is necessary to predict poor ovarian response [25].

Many factors such as patient age, premenstrual basal concentrations of hormone markers (e.g., FSH, LH, estradiol, inhibin B, and more recently AMH), and ultrasound measurements (pretreatment antral follicle count) are predictive of ovarian response [6, 7, 9, 10]. However, given the high expense and difficulty of therapy for treatment-seeking couples, identifying ideal markers of ovarian response is very useful for helping reproductive specialists choose the best dosage of gonadotropins for ovarian stimulation and for predicting the outcomes of ART [9, 10, 26, 27].

One of the biggest health issues facing Ethiopian women today is infertility [3]. Furthermore, the fact that Ethiopia has only one government-run fertility institution complicates the issue’s resolution. To the best of our knowledge, there are no data about the ovarian response or its predictors.

It is crucial to identify patients based on their ovarian response to modify treatment protocols, customize care for patients, prevent treatment failures and complications, and improve pregnancy outcomes [6, 8].

Thus, the present study aimed to determine the degree of ovarian response to controlled stimulation and identify its predictors at St.Paul’s Hospital Millennium Medical College in the Center for Fertility and Reproductive Medicine by examining clinical characteristics, treatment protocols, gonadotropins, hormonal factors, and transvaginal ultrasound data.

Methods

Study setting, design, participants

A retrospective follow-up study was undertaken from April 1, 2021, to March 31, 2022, at St.Paul’s Hospital Millennium Medical College Reproductive Medicine and Fertility Center in Addis Ababa, Ethiopia. It is the first public health facility to provide advanced options for infertility treatment such as in vitro fertilization (IVF). Established in January 2019, the Center for Fertility provides annual infertility treatment to thousands of couples [28]. The center offers several work-ups for infertility, such as hysterosalpingography, semen analysis, hormone analysis including anti-Mullerian hormone concentration, and transvaginal antral follicular count. Despite the center’s establishment having occurred three years prior, the study period was purposefully chosen because of the increased accessibility of investigative modalities, medications including gonadotropins, and infertility experts. The period of data collection was April 1, 2022–April 30, 2022.

The women who visited the center for infertility treatment composed the source population. The study population consisted of all the women who met the inclusion criteria during the study period and participated in the first cycle of controlled ovarian stimulation. To be included in the final analysis, all relevant variables were needed, such as the determination of the oocyte status at retrieval and the antral follicular count. Women with incomplete documentation and a single ovary were excluded. Those with high and normal responses were regarded as having adequate responses.

On the third day of the menstrual cycle, the basal concentrations of FSH, estradiol (E2), and AMH at any point during the cycle were determined. From day three to day five of the same cycle, all patients underwent ovarian ultrasonography handled by fertility specialists utilizing two-dimensional transvaginal ultrasound with a transvaginal probe operating at 5–10 MHz. For the AFC and AMH levels, the standard normal were ≥ 5 counts and ≥ 1.2 ng/ml, respectively [29, 30].

The three primary stimulation protocols used were antagonistic, minimal-stimulation, and long protocols. Patients under 35 years old with good ovarian reserve (the AFC greater than 5) are typically candidates for the long protocol. After confirming the existence of a cyst or dominant follicle, the patient was prescribed 3.6 mg of goserelin (Zoladex) subcutaneously on day 21 of her period. She returned for stimulation on day 2 of her period or 14 days after receiving the Zoladex injection, whichever came first. Human menopausal gonadotropin (HMG) is started at a dose determined by age and body mass index (BMI) and can be used either alone (Menopur) or in combination with recombinant FSH (Gonal-F) if there are no contraindications to stimulation (no ovarian cysts larger than 10 mm). Transvaginal ultrasonography was used to observe changes in the endometrium and follicle size, and the dosage was adjusted according to the findings. Patients older than 35 years, with low ovarian reserve (AFC < 5), and who could not afford a lengthy treatment course were advised to receive minimal stimulation. Starting on day 2 of the cycle, 5 mg of letrozole was taken orally for 5 days, and 150 IU or 225 IU of hMG SC was added on day 4. Cetrotide-mediated downregulation commenced when the size of the leading follicle reached 14 mm. When minimal stimulation failed or the desired response was not as expected, antagonist protocols were frequently used. The gonadotropin dose is not fixed and is instead determined by age and BMI. This is the only distinction between the minimally stimulated regimen and the antagonist regimen. The minimally stimulated regimen started with letrozole on day 2 in contrast to the antagonist regimen.

Sample size determination and sampling technique

The sample size was calculated to be 384 using the single population proportion formula, with a 95% confidence interval, 5% margin of error, and 50% proportion of good ovarian response because the degree of ovarian response in the study region was unknown. By accounting for a 10% rate of inadequate documentation, the ultimate sample size became 422. To choose the sampling units, convenience sampling was used, which involved using the medical record number found in both the chart and computerized medical records.

Operational definitions

The ovarian response (as good or poor) was defined as the degree to which the ovarian follicles developed in response to gonadotropins. On the day of the oocyte maturation trigger, > 3 follicles and/or > 3 oocytes retrieved were considered to indicate a good response. A poor ovarian response was the reverse of what happens in a good ovarian response [9, 12, 31]. The identified ovulatory factors included hyperprolactinemia, polycystic ovary syndrome (PCOS), and thyroid disorders. Intrauterine adhesions, bicornuate uterus, polyps, and submucous myoma were considered uterine/endometrial factors.

Study variables

The degree of ovarian response was the dependent variable. The independent variables included age, the duration of infertility, the cause of infertility, the stimulation protocol employed, the induction length, the type of medication, the antral follicular count (AFC), hormonal test results for follicle-stimulating hormone (FSH), thyroid-stimulating hormone (TSH), estradiol, prolactin, and anti-Mullerian hormone (AMH).

Data collection tools, procedures, and data quality assurance

A structured checklist was used to collect the patient data. It was developed using registration log books and various literature [6, 7, 10,11,12] to address the sociodemographic characteristics, hormonal test results, sonographic findings, treatment protocols, and types of medications used. One day of training was given to data collectors and supervisors. The principal investigator provided the training. Two certified nurses with at least two years of experience at the center worked as data collectors, while two general practitioners served as supervisors. The quality of the data was evaluated each day after data collection.

Data processing and statistical analysis

The data were coded, verified for accuracy, and entered into Epidata version 4.2. Then, SPSS version 26.0 was subsequently used to export and analyze the data. One-way ANOVA, post hoc tests, and independent samples t-tests were used for evaluating relationships and differences between significant categorical factors and between those variables and the continuous variable. The data were summarized, tabulated, analyzed, and expressed with descriptive statistics such as frequency, percentage, mean, median, and interquartile range. Frequencies and percentages are presented for categorical variables, the mean is presented for continuous variables with normally distributed data, and the median and the interquartile range are presented for data with a nonnormal distribution. To identify predictive factors of ovarian response, a binary logistic regression model was applied. The final model of multivariable analysis included all variables with a p-value of less than 0.25 in bivariate analyses. Stepwise variable selection was employed. Model fitness was checked by the Hosmer-Lemeshow test and goodness of fit by the omnibus test of coefficients. The adjusted odds ratio was used to determine the strength of the measure of association. A p-value < 0.05 was considered to indicate statistical significance in the final model. ROC curve analysis was used to determine the ideal cutoff points for the selected predictors and to assess the predictive accuracy, sensitivity, and specificity of the predictors of ovarian response.

Results

Sociodemographic and clinical characteristics

Four hundred and twelve patient charts and electronic medical records were reviewed and included in the final analysis. The mean age of the participants was 32.3 ± 5.1 years (mean ± SD), and the age range was 20 to 47 years. More than half (61.4%) of the participants were under 35 years old. Among the patients, 93.2% lived in urban areas and nearly all (98.3%) were married. In addition, the majority (70.6%) of the study participants was parous, and 44.6% had at least a secondary level of education. Regarding the causes of infertility, the tubal factor was responsible for half (50.2%) of the cases. Polycystic ovarian syndrome (PCOS) was present in four patients (1%) based on the Rotterdam Criteria. Most of the patients (54.6%) were not able to conceive for more than five years (Table 1). Three patients with ovarian hyperstimulation syndrome were diagnosed and treated by infertility specialists because the study included patients with PCOS and other hyperresponders.

Table 1 Sociodemographic characteristics of participants at St.Paul’s Hospital (April 1, 2021–March 31, 2022)

Results of hormonal testing and transvaginal ovarian sonography

FSH was determined in 63.3% of patients with a median value of 6.6 (IQR: 4.37–9.06), AMH in 23.3% of women with a median value of 0.7 (IQR: 0.28–1.58), AFC in all patients with a median value of 6.9 (IQR: 4.36–12.36). On the other hand, the median values of TSH (in miu/l), prolactin (in ng/ml), and estradiol (in pmol/l) were 1.7 (1.13–2.7), 17.15 (11.96–23.74), and 56 (36.65–133.7), respectively. Of the women whose FSH levels were checked, 2.7% had abnormal values; 67.7% had AMH levels less than 1.2; 39.1% had Antral follicle counts (AFCs) less than 5 (Table 2).

Table 2 Hormone test results and the antral follicle counts

Stimulation protocol and the ovarian response to controlled ovarian stimulation

For the stimulation protocol that was employed, 62.7% of the participants used Ministim. Conversely, 8.7% and 28.6% of the patients, respectively, followed the antagonist and long protocols. The stimulation lasted an average of ten days (minimum four days, maximum fifteen days). A good ovarian response (meaning that more than three oocytes were retrieved) of 67% (95% CI (62.2–71.5)) was achieved. A mean of 7.6 ± 7.5 (0–60) oocytes were retrieved. Using one-way ANOVA and post hoc tests, this study also attempted to examine the mean difference in the total number of oocytes retrieved for each patient between treatment protocols. The mean number of oocytes retrieved across all treatment protocols varied significantly (F.Statistics = 101.8 with p-value < 0.001). For each of the three treatment protocols, a difference was noted (post hoc Bonferroni p-value < 0.001). To determine whether there was a mean difference in the total number of oocytes retrieved between the two regularly utilized medication types (pure FSH and combined form (HMG)), the independent samples t-test was also employed. A mean difference = 5.4, 95% CI (3.19–7.9), and p-value < 0.001 indicated a statistically significant difference.

Predictors for ovarian response after controlled stimulation

The following twelve factors were subjected to univariate analysis: patient age, duration of infertility, cause of infertility, type of protocol used, induction length in days, type of medication, AFC, FSH, TSH, level of estradiol, level of prolactin, and AMH. Except for the cause of infertility, eleven factors had a p-value of < 0.25. The variance inflation factor was used to verify the multicollinearity test (mean VIF = 1.1). The model with the lowest Akaike information criterion and the lowest Bayesian information criterion (AIC = 346.2, BIC = 339.5) included seven variables. Anti-Mullerian hormone levels, antral follicular count, and duration of stimulation were found to be predictors of ovarian response to controlled stimulation with a p-value < 0.05. The Hosmer-Lemeshow test, which was used to evaluate the goodness of fit test, was not significant (p-value = 0.508), while the omnibus test, which assessed model fitness, had a p-value < 0.001.

Keeping the other variables constant, compared to patients with a normal range of Anti-Mullerian hormone levels, those with an AMH level < 1.2 had an 81% reduced ovarian response to controlled ovarian stimulation (p-value = 0.0003). Similarly, patients with an AFC of less than five antral follicle counts had 84% lower odds of ovarian response than patients with higher AFC did (p-value = 0.0004), after adjusting for other confounders. Finally, when all the other variables were held constant, more than ten days of induction increased the chance of a good ovarian response by 77% (p-value = 0.04) (Table 3).

Table 3 Predictors for ovarian response to controlled stimulation (n = 412)

Receiver operating characteristics (ROC) curve analysis

ROC analysis was performed to determine the predictive accuracy of the predictors. The ROC curve for the AFC is as follows: AUC = 0.844, 95% CI (0.805, 0.883), and p-value < 0.001. According to the recently introduced best approach for determining the cutoff point, the index of the union selection method, the ideal cutoff level for discriminating between good and poor ovarian response was a follicle count of 5.5; and 72.8% was the specificity and 77.2% was its sensitivity. At this ideal cut-off point, the positive and negative predictive values were 85.2% and 61.1%, respectively (Fig. 1).

Fig. 1
figure 1

ROC curve for the antral follicular count

The area under the curve (AUC) for anti-Mullerian hormone (AUC = 0.719, 95% CI (0.615, 0.823), p-value < 0.001) is displayed by the next ROC curve. With a sensitivity and a specificity of 65.2% and 66%, respectively, the discriminative optimum cutoff point for AMH was 0.71ng/ml. At this ideal cutoff point, the positive and negative predictive values were 63.8% and 67.3%, respectively (Fig. 2).

Fig. 2
figure 2

ROC curve for anti-Mullerian hormone concentration

The third involved analyzing the ROC curve for the induction length. The area under the curve (AUC) was 0.668, 95% CI (0.611, 0.724), and the p-value was < 0.001). Although there was a significant association between induction length and ovarian response, the predictive accuracy was not very strong (AUC < 0.7) (Fig. 3).

Fig. 3
figure 3

ROC curve for induction length

When we evaluated how combined testing improved the predicted accuracy of the two abovementioned good predictors, anti-Mullerian hormone and antral follicular count, the combined test did not significantly improve the prediction of AFC, according to receiver operating characteristics curve analysis (AUC = 0.801, 95% CI (0.715, 0.887), p-value = 0.07) (Fig. 4).

Fig. 4
figure 4

ROC curve for combination testing for both the AFC and AMH concentration

Discussion

This study was conducted to evaluate the degree of ovarian response and its predictors among women who had first-cycle controlled ovarian stimulation at St.Paul’s Hospital in Addis Ababa, Ethiopia. This is the first study in the nation to evaluate ovarian response through the use of clinical factors, transvaginal ultrasound, gonadotropins, and stimulation protocols. The overall percentage of good ovarian response was 67% according to the number of oocytes retrieved. The ovarian response was found to be predicted by the AFC, AMH concentrations, and induction length.

Two-thirds of our patients achieved a good ovarian response, which was consistent with most previous studies [11, 13, 15,16,17,18,19,20,21,22,23,24]. However, compared to studies performed in Egypt (94%) [32] and the United Kingdom (88.9%) [12], this number was much lower. This percentage was, still greater than the 50% registered with the American Society for Reproductive Medicine [14]. The lack of an agreed-upon definition for ovarian response may help to explain this. Instead of quantifying the number of oocytes retrieved, ovarian response has been defined based on factors such as advanced age, cycle cancellation, leading follicles before trigger, follicular growth after trigger, previous poor ovarian response, and abnormal ovarian reserve test results including peak estradiol level [6, 7, 9, 10, 12, 25, 33]. Furthermore, women over the age of 40 and those with endocrine problems such as PCOS or hypo- or hyperthyroidism were included in our study.

Concerning the prognostic value of the AFC and AMH concentration for ovarian response, our findings were in line with earlier research [9, 26, 31, 34,35,36,37,38,39,40,41] that employed various treatment protocols. Compared to conventional indicators such as age, basal FSH, and estradiol, AFC and AMH were superior predictors of ovarian response. These findings will assist clinicians in deciding whether to start gonadotropins based on the AFC and AMH levels as opposed to patient age and baseline FSH levels [12, 40, 42, 43]. A further implication could be heightened absorption of AMH and the AFC in contrast to that of FSH, which might be less indicative of an ovarian response [35, 41]. This, however, contradicts the findings of a systematic review [5], which indicated that all tests, including the AFC and AMH levels, have little clinical significance in terms of predicting poor response. As no test could accurately predict all patients’ poor ovarian response, it was advised that the best course of action would be to start the first cycle of ovulation induction without performing any precycle testing [25].

Unlike prior studies [12, 39, 44,45,46,47,48], which indicated similar predictive values of AFC and AMH, our findings showed that the AFC was a better predictor of retrieval success than was the anti-Mullerian hormone concentration. However, other studies [41, 49, 50] showed that the AMH concentration had a greater predictive value than the AFC. These differences could be explained by variations in transvaginal ultrasound sensitivity, inter- and intraobserver variability in ultrasound examination, variations in the repeatability and cycle stability of AMH, and the use of similar or different stimulation protocols for comparison. The AFC performed better, according to certain studies [10, 51, 52], which was in keeping with our findings. Because AMH levels are expensive and difficult to measure and access, our finding that the AFC has a greater predictive value also has significant repercussions. These findings suggested that both the AFC and AMH concentrations are suitable for assessing ovarian response. Clinicians can readily determine the antral follicle count in infertility patients on a routine basis. Women older than 35 and women with PCOS can benefit from the selective use of AMH when utilizing AFC as a surrogate marker [53]. The absence of standardization in AMH assays and interindividual variability are other drawbacks of using AMH [9].

The optimal thresholds for the AFC and AMH concentration for predicting good or poor ovarian response were 5.5 and 0.71ng/ml, respectively, in our study, in contrast with previous studies (AFC-10, AMH-0.99) [12], and (AFC-8.5, AMH-1.22 [10]. The population variations in the study areas may explain the difference. A further explanation can stem from employing distinct techniques to ascertain ideal threshold values. In contrast to the current study, which used the index of union approach to determine the ideal cutoff point, the majority of earlier studies used the highest sum of sensitivity and specificity. Similarly, there was a correlation between the two important predictors and patient age, indicating that our research did not exclude older age groups. However, these findings were similar to the findings of a Chinese cohort study [54] in which the optimal AMH concentration was 0.8ng/ml, and that of the UK report [55], which was 0.7ng/ml concentration. The application of a single protocol in those two studies was the distinction. In many studies [56,57,58], the optimal AFC cutoff point was shown to be in the range of 3 to 10.

According to the current evidence, induction length was found to be a predictors of ovarian response. However, we were unable to locate a previous study demonstrating the exact length of stimulation as a predictor of ovarian response. These findings regarding the average length of stimulation were inconsistent with those of studies conducted in Brazil and the UK [12, 59]. Nonetheless, given their tight connection with the nature of the treatment protocol employed, these outcomes made sense. This may also call for an evaluation of the length of the menstrual cycle as a factor affecting the choice of treatment protocol and the ovarian response. Consequently, it is important to take into account treatment protocols and menstrual cycle length when interpreting induction length since this consideration will help individualize care for each patient.

The retrospective nature of this study was one of its main limitations. Advanced age, endometriosis status, inheritance status, endocrine problems, BMI, menstrual cycle duration, hormone therapy status during the past three months, and pelvic surgery were among the factors affecting ovarian response and were not considered in this study. There was no attempt to determine the predictive performance of the AFC and AMH concentration according to the treatment protocols. Due to the use of 2-D ultrasound rather than 3-D ultrasound performed by our setup, comparisons were challenging. The quality of the hormone tests could not be evaluated. Hyperresponders and normal responders were not distinguished in the study. The study did not account for the subsequent induction cycles. The use of ROC curve analysis to display the relative predictive values and ideal cutoff levels for the predictors was one of the study’s strengths. By using electronic medical records, we were able to lower recall bias and increase response rates. Since this was the first exploration in our context, it will provide important preliminary data for future studies. The conclusions of this study should be interpreted carefully in light of these limitations.

Conclusion

Only two-thirds of our patients achieved a good ovarian response. The antral follicular count, anti-Mullerian hormone concentration, and induction length were found to be predictors of ovarian response. The AFC was the strongest predictor, and combination testing did not improve its accuracy. Thus, in settings with limited resources, the AMH is selectively used, whereas the AFC should be performed on all of our patients. Future studies are required to verify the findings about the optimal cutoff point, evaluate ovarian responses in each treatment protocol, and explore additional potential factors influencing ovarian response.