Introduction

Infertility, defined as the inability to achieve clinical pregnancy after 1 year of regular unprotected sexual intercourse, is estimated to affect between 8 and 16% of reproductive-aged couples worldwide [1]. Unsurprisingly, assisted reproduction techniques (ART) are increasingly applied in current clinical practice, owing both to the substantial incidence of couple infertility and to the advanced age at which couples start seeking a pregnancy [2]. Globally, it is estimated that more than eight million babies have been conceived through ART [3]. However, infertility treatment is a long-term and expensive therapy with high dropout rates [4], and nearly half of all couples who start ART are likely to remain childless even after multiple treatment cycles, with foreseeable sequelae in terms of psychological, social, and economic health [5,6,7]. In this context, mathematical models are generated to predict strong outcomes, such as pregnancy and live birth rates. Based on these parameters, several predictive models for ART have been developed over the last three decades in order to estimate individualized chances of treatment success [8]. These models are needed to select either the ART approach applicable to the couple or the best treatment for the female partner [9]. However, these algorithms are rarely applied in current clinical practice. Currently, scientific societies suggest tailoring controlled ovarian stimulation (COS) schemes to the female characteristics [10,11,12].

In clinical practice, the physician needs to estimate a priori the female response to the controlled ovarian stimulation (COS) phase without clear evidence-based recommendations [13], which leads to extreme variability in the proposed therapeutic regimens [8]. Indeed, the most cost-effective ART management in terms of pregnancy and live birth rates is still far from being achieved [14]. Alongside the absence of a gold standard of care in ART, the clinical application of predictive models is still limited, given their modest predictive ability and the lack of confidence among clinicians about their effectiveness [15].

Despite the wide literature describing how to perform predictive research, the majority of models published so far suffer from methodological shortcomings [16, 17]. One of the most accredited predictive models is the “McLernon post-treatment model,” which predicts the cumulative probability of a live birth after the first fresh embryo transfer during one or more in vitro fertilization (IVF) or intracytoplasmic sperm injection (ICSI) cycles [18]. Within this post-treatment model, the following parameters were identified as the best predictors of live birth rate: (i) woman’s age, (ii) number of oocytes retrieved, and (iii) cryopreservation of embryos [18]. However, the currently available predictive models are not sufficiently reliable to guarantee uniform counseling for infertile couples [8]. Moreover, these parameters are generally available only when the ART path has already started and has passed a point of no return. Indeed, knowing that the chances of ART success are low or equal to zero when the number of embryos is low does not allow the clinician to change the approach or suspend treatment before failure, with the known psychological and economic consequences for the couple.

Thus, the need to obtain reliable parameters able to improve the concordance of treatment decisions in reproductive medicine remains urgent [8]. To promote the clinical impact of predictive models, it could be useful to identify predictors for those ART phases which could be revised or interrupted, i.e., the COS, pick-up, and embryo transfer phases. In particular, a predictive model able to estimate the chances of success at the time point after ovarian stimulation and before pick-up could guide the decision either to continue the ongoing ART path or to suspend it and re-schedule a new COS.

With this in mind, the aim of this study is the development of a decisional algorithm able to predict strong ART outcomes, i.e., pregnancy and live birth rates, in order to help the clinician decide whether and when to perform oocyte pick-up and thus continue the ongoing ART path.

Materials and methods

A single center, retrospective analysis of real-world data was carried out, considering all couples attending the Fertility Centre of the Department of Obstetrics and Gynaecology of Reggio Emilia (Italy).

All consecutive ART cycles performed from 1998 to December 2020 were retrospectively extracted, and couples fulfilling the following inclusion criteria were included in the final dataset: (i) both partners older than 18 years, (ii) ART cycles performed using fresh sperm and oocytes, (iii) COS proceeding until ovulation, and (iv) availability of all ART variables reported below. Thus, neither cycles stopped for any reason nor cycles performed with frozen sperm and/or oocytes were included. Cycles with donor eggs or donor sperm were excluded. Both ICSI and IVF cycles were considered.

Assisted reproductive technology (ART) procedures

The downregulation of the hypothalamic–pituitary–gonadal axis was obtained through the administration of gonadotropin-releasing hormone agonists (GnRHa) (Enantone®, Takeda Pharmaceutical, or Decapeptyl®, Ipsen). Then, ovarian stimulation was performed applying individualized protocols: (i) recombinant follicle-stimulating hormone (FSH) alone (Gonal-F®, Merck Serono), (ii) recombinant FSH plus luteinizing hormone (LH) (Pergoveris®, Merck Serono), (iii) highly purified human menopausal gonadotropin (hMG) (Meropur®, Ferring), or (iv) biosimilar FSH (Ovaleap®, Theramex). Ovarian stimulation was monitored by serum estradiol assays and serial ultrasound (US) evaluations. When more than three follicles with a diameter greater than 17 mm were observed at US, human chorionic gonadotropin (hCG) (Gonasi®, IBSA Institut Biochimique) was injected to complete oocyte maturation and to promote ovulation. Oocyte retrieval was performed 34–36 h after hCG administration by US-guided transvaginal aspiration. All patients received supplemental progesterone for 12 days until the β-hCG assay.

For conventional IVF, oocytes were individually cultured in microdrops of fresh medium under mineral oil with 100,000 activated spermatozoa. For ICSI, after the removal of the cumulus and corona cells, nuclear maturation of the oocytes was assessed using an inverted microscope to ensure that sperm were injected into metaphase-II oocytes only.

Oocyte fertilization was assessed 18–20 h (day 1) after insemination/injection and confirmed by the presence of two pronuclei and the alignment of nucleolar precursor bodies. In all cases, embryonic development was assessed on days 2 and 3 (i.e., 41–43 and 65–67 h after insemination/injection, respectively). The best-quality embryos were transferred on day 2 or 3 after IVF/ICSI procedures until July 2020, when blastocyst transfer was started. In Italy, embryo production and transfer are regulated by specific national laws which have changed over the years. In particular, until 2004, a maximum of five embryos were transferred for each cycle, since embryo freezing was not allowed. Afterwards, the maximum number of transferred embryos allowed progressively decreased until July 2020, when it was set to one embryo for women younger than 38 years and two for older women. Moreover, across the years of observation, the day on which the embryo could be transferred changed according to national rules. Since embryo transfer follows COS, the day on which it was performed was not considered as a predictive variable in the statistical analysis, limiting its potential bias.

For the evaluation of pregnancies, international ESHRE definitions were considered [19]. In particular, biochemical pregnancies were assessed 12 days after embryo transfer by a positive quantitative serum β-hCG assay higher than 10 IU/L. In case of positive biochemical pregnancy test, micronized progesterone support (Prometrium®, Rottapharm Madaus or Crinone®, Merck Serono) was continued until 35 days after embryo transfer.

Outcomes

Baseline couple characteristics were collected, considering age, body mass index (BMI), and smoking habit of both partners. Moreover, the cause of couple infertility and the fertility history of the couple (previous pregnancies, miscarriages, pre-term, and term births) were collected as categorical data.

The ART procedure was evaluated collecting several variables, considering male parameters (e.g., semen volume, sperm concentration, sperm motility, and morphology percentages), COS approach (e.g., gonadotropin drug used, days of stimulation, starting gonadotropin dosage, total gonadotropins units used), and variables of COS response (e.g., ovarian follicles > 17 mm detected at US before pick-up—OF17, total and mature oocytes retrieved, injected/inseminated oocytes, fertilized oocytes, fertilization rate and number of total, transferred, and frozen embryos). The fertilization rate was calculated a posteriori as the ratio between the number of fertilized oocytes and the number of either injected (ICSI method) or inseminated (IVF cycles) oocytes.
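The fertilization rate defined above can be written as a short function. This is only an illustrative sketch: the function name, the argument names, and the example counts are assumptions introduced here and are not part of the study database.

```python
def fertilization_rate(fertilized_oocytes, treated_oocytes):
    """Ratio of fertilized oocytes to injected (ICSI) or inseminated (IVF) oocytes."""
    if treated_oocytes <= 0:
        raise ValueError("at least one injected/inseminated oocyte is required")
    if not 0 <= fertilized_oocytes <= treated_oocytes:
        raise ValueError("fertilized oocytes must lie between 0 and the number treated")
    return fertilized_oocytes / treated_oocytes


# Hypothetical cycle: 8 oocytes injected, 6 fertilized.
rate = fertilization_rate(6, 8)
print(f"fertilization rate = {rate:.2f}")
```

The same denominator rule applies to both techniques: injected oocytes for ICSI and inseminated oocytes for IVF.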

Finally, the strong ART outcomes were considered, i.e., biochemical and clinical pregnancy and live birth rates. Biochemical pregnancy was defined as the detection of increased serum hCG levels, while clinical pregnancy was diagnosed upon US visualization of at least one embryo with heartbeat [19].

Statistical analysis

First, the entire dataset was evaluated performing descriptive statistics, in order to obtain a snapshot of the characteristics of the cohort included, evaluating both ART variables and outcomes.

Second, the distribution of continuous parameters was evaluated by the Kolmogorov–Smirnov test. Then, continuous data were compared between couples who did and did not obtain a pregnancy (biochemical and clinical separately) or a live birth, using either univariate ANOVA or the Mann–Whitney U-test, according to data distribution.

Logistic regression analyses were performed, repeating the analysis for each strong ART outcome. In detail, these analyses were performed using strong ART outcomes as dependent variables, setting both the cohort baseline characteristics and the ART variables reported above as covariates and cofactors. Among cofactors, the ART approach used (i.e., ICSI or IVF) and the couple infertility etiology were included. Only those ART variables which predicted pregnancy/live birth rates were extracted and evaluated in correlation analyses with other variables, using Spearman’s rho. Moreover, these variables were used as dependent variables in multiple linear regression analyses, setting other ART variables and baseline cohort characteristics as independent variables. These analyses were needed to decide whether a predictive model could be developed considering only variables obtained before pick-up.

The predictive models’ development was systematically performed according to the seven steps proposed by Steyerberg et al. [20], considering separately biochemical pregnancy, clinical pregnancy, and live birth rates as final outcome. Table 1 shows the systematic approach to the main question.

Table 1 Development of predictive models applying the seven-steps systematic approach

Finally, three decision trees were created, one for each strong ART outcome considered in the study question: biochemical pregnancy, clinical pregnancy, and live birth. Thus, the dependent variables were the strong ART outcomes, whereas the independent variables/factors were the ART variables that precede pick-up. The exhaustive chi-square automatic interaction detector (CHAID) decision tree was applied. This statistical tool derives from the first algorithm developed in the 1980s and accepts both categorical and continuous variables [21]. In particular, as in CHAID, only nominal or ordinal categorical predictors are allowed in the merging step, so continuous predictors are first converted into ordinal predictors. The primary advantage of CHAID decision tree analysis is the large number of variables potentially usable in the segmentation process. In these analyses, nodes were created considering those variables acting before ovulation induction, such as female and male ages, BMI, smoking habit, infertility causes, gonadotropin drug used, starting gonadotropin dose, days of stimulation, total gonadotropin units used, and OF17. The analysis was performed using a randomly selected 50% of the cases to train the tree and the remaining 50% to validate the result. The percentages reported within each node of the resulting decision tree do not express the occurrence of the endpoint evaluated but the accuracy of the classification performed by each node. Since this approach could be suboptimal for internal validation, we confirmed these results by applying a further cross-validation resampling [22]. Moreover, in applying the CHAID algorithm, the stopping rule for the growth of the tree had a key role. Thus, we set a minimum sample size of 50 cases for the terminal nodes (final segments). In this way, we support the assumption of normality for an ANOVA procedure comparing the means of a continuous variable of interest across segments. In addition, if the variable of interest is categorical, we reach a reasonable sample size to apply a multinomial logistic regression.
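The core computation behind a CHAID split can be sketched as follows. This is a simplified illustration and not the SPSS exhaustive CHAID implementation used in the study: it only computes the Pearson chi-square statistic between a categorized predictor and a binary outcome and checks the minimum terminal node size of 50 cases. Category merging and p-value adjustment are omitted, and the function names and toy counts are assumptions made up for this example.

```python
from collections import Counter

MIN_TERMINAL_NODE_SIZE = 50  # stopping rule stated in the text


def pearson_chi_square(rows):
    """rows: list of (predictor_category, outcome) pairs.

    Returns the Pearson chi-square statistic and its degrees of freedom.
    """
    n = len(rows)
    category_totals = Counter(category for category, _ in rows)
    outcome_totals = Counter(outcome for _, outcome in rows)
    observed = Counter(rows)
    statistic = 0.0
    for category, category_n in category_totals.items():
        for outcome, outcome_n in outcome_totals.items():
            expected = category_n * outcome_n / n
            statistic += (observed[(category, outcome)] - expected) ** 2 / expected
    dof = (len(category_totals) - 1) * (len(outcome_totals) - 1)
    return statistic, dof


def split_allowed(rows, min_size=MIN_TERMINAL_NODE_SIZE):
    """True if every child node created by the split keeps at least min_size cases."""
    sizes = Counter(category for category, _ in rows)
    return all(size >= min_size for size in sizes.values())


# Toy data (invented): OF17 binned into two ordinal categories.
toy_rows = (
    [("low OF17", "pregnant")] * 30 + [("low OF17", "not pregnant")] * 70
    + [("high OF17", "pregnant")] * 60 + [("high OF17", "not pregnant")] * 40
)
statistic, dof = pearson_chi_square(toy_rows)
print(f"chi-square = {statistic:.2f}, df = {dof}, split allowed = {split_allowed(toy_rows)}")
```

In a full CHAID run, this statistic would be evaluated for every candidate predictor after merging categories that do not differ significantly, and the predictor with the smallest adjusted p-value would define the next split.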

Finally, since the dataset included couples treated one time and couples in which ART was applied more than two times, the analyses were repeated considering the couples treated only one time as a single group.

Statistical analysis was performed using the “Statistical Package for the Social Sciences” software for Windows (version 26.0; SPSS Inc., Chicago, IL, USA). For all comparisons, p < 0.05 was considered statistically significant.

Results

The final database included 12,275 ART cycles, consisting of 7826 ICSI (63.8%) and 4449 IVF (36.2%) procedures. Overall, 87.5% of the entire cohort (10,375 couples) were treated for primary couple infertility. Table 2 summarizes the main baseline characteristics of the couples enrolled. Table 3 shows the ART variables and outcomes (as distinguished above) collected for each ART cycle included in the analyses.

Table 2 Cohort baseline characteristics
Table 3 Assisted reproductive technique (ART) variables and outcomes. Data are expressed as mean ± standard deviation

Comparing couples who obtained a biochemical pregnancy to those who did not, the pregnant couples showed lower female and male ages (p < 0.001 and p = 0.002, respectively), higher sperm morphology (p = 0.044), OF17 (p < 0.001), total retrieved oocytes (p < 0.001), injected/inseminated oocytes (p < 0.001), fertilized oocytes (p < 0.001), total embryos (p < 0.001), transferred embryos (p < 0.001), and fertilization rate (p < 0.001) (Table 4). Similarly, couples who achieved a clinical pregnancy showed lower female and male ages (p = 0.002 and p < 0.001, respectively), higher sperm morphology (p = 0.006), number of ovarian follicles higher than 17 mm (p < 0.001), total retrieved oocytes (p = 0.025), injected/inseminated oocytes (p = 0.004), oocytes fertilized (p < 0.001), total embryos (p < 0.001), fertilized embryos (p < 0.001), and fertilization rate (p < 0.001) (Table 5). On the contrary, couples who achieved a live birth showed fewer statistically significant differences compared to couples who did not, as expected. In particular, couples who achieved a live birth showed higher OF17 number (p < 0.001), total retrieved oocytes (p < 0.001), and total embryos (p < 0.001) (Table 6).

Table 4 Comparison between couples who achieved a biochemical pregnancy compared to couples who did not. Data are expressed as mean ± standard deviation. Bold values express statistically significant differences
Table 5 Comparison between couples who achieved a clinical pregnancy compared to couples who did not. Data are expressed as mean ± standard deviation. Bold values express statistically significant differences
Table 6 Comparison between couples who achieved a live birth compared to couples who did not. Data are expressed as mean ± standard deviation. Bold values express statistically significant differences

Logistic regression analysis detected three ART variables able to influence the biochemical pregnancy rate, namely the OF17 number (B = 0.336; Wald coefficient = 887.1; p < 0.001; OR, 1.40; CI95%, 1.37–1.43), the number of total embryos (B = 0.132; Wald coefficient = 58.0; p < 0.001; OR, 1.14; CI95%, 1.10–1.18), and the fertilization rate (B = 0.982; Wald coefficient = 91.9; p < 0.001; OR, 2.67; CI95%, 2.18–3.26). Similar results were obtained considering the clinical pregnancy rate as the dependent variable. However, the latter was significantly influenced by the OF17 number (B = 0.494; Wald coefficient = 281.0; p < 0.001; OR, 1.64; CI95%, 1.55–1.73) and the total number of embryos formed (B = 0.140; Wald coefficient = 14.2; p < 0.001; OR, 1.15; CI95%, 1.07–1.24), but not by the fertilization rate. Finally, live birth rate was not significantly related to any ART variable in logistic regression analysis.
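The link between the reported coefficients, odds ratios, and confidence intervals can be checked with a few lines of arithmetic. The sketch below assumes the usual conventions of logistic regression output: the odds ratio is exp(B), the Wald statistic equals (B / SE)^2, and the 95% interval is exp(B ± 1.96 × SE). The function name is an assumption for illustration, and because B and the Wald statistic are reported rounded, the reconstruction is only approximate.

```python
import math


def odds_ratio_with_ci(b, wald, z=1.96):
    """Recover the odds ratio and its 95% CI from a coefficient and its Wald statistic."""
    standard_error = abs(b) / math.sqrt(wald)  # because Wald = (B / SE)^2
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * standard_error)
    upper = math.exp(b + z * standard_error)
    return odds_ratio, lower, upper


# Coefficients reported in the text for the biochemical pregnancy model.
biochemical_model = {
    "OF17": (0.336, 887.1),
    "total embryos": (0.132, 58.0),
    "fertilization rate": (0.982, 91.9),
}

for name, (b, wald) in biochemical_model.items():
    odds_ratio, lower, upper = odds_ratio_with_ci(b, wald)
    print(f"{name}: OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under these assumptions, the recomputed values agree with the odds ratios and intervals reported for the biochemical pregnancy model to two decimal places.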

Thus, three ART variables were related to ART outcomes, namely the ovarian follicle number at US, total embryos, and fertilization rate. These parameters were set as dependent variables in correlation analyses with all parameters that chronologically precede them. The OF17 number was inversely related to female age (p < 0.001), BMI (p < 0.001), and total FSH dose used for COS (p < 0.001) (Table 7). On the contrary, neither the FSH starting dose nor the length of COS significantly influenced the OF17 number (Table 7). Similarly, the number of embryos obtained was inversely related to the female age (p < 0.001) and directly related to the OF17 number (p < 0.001), semen volume (p < 0.001), sperm concentration (p < 0.001), sperm motility (p < 0.001), sperm morphology (p < 0.001), total retrieved oocytes (p < 0.001), mature oocytes (p < 0.001), injected/inseminated oocytes (p < 0.001), and fertilized oocytes (p < 0.001) (Table 7). Finally, the fertilization rate was significantly and directly related to OF17 (p < 0.001), sperm concentration (p < 0.001), sperm motility (p < 0.001), sperm morphology (p < 0.001), total retrieved oocytes (p < 0.001), injected/inseminated oocytes (p < 0.001), fertilized oocytes (p < 0.001), total embryos (p < 0.001), and transferred embryos (p < 0.001) (Table 7).

Table 7 Linear correlation analysis between parameters which predicted assisted reproduction techniques (ART) outcomes and all parameters preceding the variable itself. Bold values express statistically significant differences

The three ART variables that significantly influenced pregnancy and live birth rates were then used as dependent variables in multivariate linear regression analyses, setting other factors and baseline characteristics as independent variables. These multivariate analyses generated no statistical model able to relate the three ART variables to the other parameters. These results suggest that the three parameters are not statistically influenced by any of the other ART parameters considered.

Decision trees analysis

The first decision tree was created using biochemical pregnancy as the dependent variable. The statistical power of the analysis was 80.8% for the training (relative risk = 0.192, standard error 0.005) and 81.4% for the validation analysis (relative risk = 0.186, standard error 0.005). Seven predictive nodes were identified (Fig. 1). The first five nodes classified biochemical pregnancies according to thresholds for OF17 (Fig. 1). The decision tree suggests at least two thresholds of OF17. When two or fewer follicles were recognized at US before pick-up, a biochemical pregnancy was virtually unachievable, while when more than 7 follicles were observed, the probability of achieving a biochemical pregnancy was highest. Moreover, although no clear distinction between pregnancy and no pregnancy could be achieved between three and seven follicles at US, when the follicle number at US was three, sperm motility entered the model (nodes 6 and 7), with a predictive threshold of 34.0% progressive sperm motility (Fig. 1).
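The percentages and risk values reported for this tree are arithmetically linked, as the following sketch shows. It rests on two assumptions that the text does not state explicitly: that the reported risk is the proportion of misclassified cases, so that accuracy equals one minus the risk, and that each half of the 50/50 split contains about half of the 12,275 cycles, so that a binomial standard error can be computed. The function names are illustrative.

```python
import math

TOTAL_CYCLES = 12275  # total ART cycles reported in the Results


def accuracy_from_risk(risk):
    """Overall classification accuracy, assuming risk = proportion misclassified."""
    return 1.0 - risk


def binomial_standard_error(risk, n_cases):
    """Standard error of a proportion estimated on n_cases observations."""
    return math.sqrt(risk * (1.0 - risk) / n_cases)


# ASSUMPTION: the exact sizes of the two halves are not reported in the text.
n_half = TOTAL_CYCLES // 2

for label, risk in [("training", 0.192), ("validation", 0.186)]:
    accuracy = accuracy_from_risk(risk)
    standard_error = binomial_standard_error(risk, n_half)
    print(f"{label}: accuracy = {accuracy:.1%}, standard error = {standard_error:.3f}")
```

Under these assumptions, the biochemical pregnancy values quoted above (80.8% and 81.4%, each with a standard error of 0.005) are reproduced.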

Fig. 1 Decision tree performed using biochemical pregnancy as dependent variable. Only the validation step of the analysis is represented. Percentages reported for each node express the predictive accuracy of the node. Df, degrees of freedom; US, ultrasound

The second decision tree was performed on clinical pregnancy, reaching a statistical power of 85.4% for the training (relative risk = 0.146, standard error 0.005) and 86.8% for the validation analysis (relative risk = 0.138, standard error 0.004). Ten predictive nodes were identified (Fig. 2), in which the OF17 number occupied the first six nodes (Fig. 2). As for biochemical pregnancy, this analysis confirmed two thresholds for the OF17 number, i.e., 2 or fewer and more than 7 follicles. In this analysis, no clear distinction between pregnancy and no pregnancy could be detected when between 5 and 7 follicles were observed at US. On the contrary, when the number of follicles was 3 or 4/5, female age entered the model with two different thresholds (37.2 years, nodes 7 and 8, and 40.1 years, nodes 9 and 10, respectively) (Fig. 2).

Fig. 2 Decision tree performed using clinical pregnancy as dependent variable. Only the validation step of the analysis is represented. Percentages reported for each node express the predictive accuracy of the node. Df, degrees of freedom; US, ultrasound

Finally, the third decision tree analysis was performed for live birth rate. In this last model, OF17 represented the first five nodes, with a statistical power of 86.6% for the training (relative risk = 0.134, standard error 0.004) and 87.2% for the validation analysis (relative risk = 0.128, standard error 0.004) (Fig. 3). This result suggests that OF17 is the first parameter able to classify pregnant and non-pregnant women (first node). However, only the threshold of 2 follicles was confirmed as a predictor of lack of ART success. On the contrary, the other nodes did not clearly classify live birth rate. In this analysis, if the OF17 number was 3 or > 7, female age entered the model (second node) with thresholds of 37.2 (nodes 6 and 7) and 35.6 years (nodes 8 and 9), respectively (Fig. 3).

Fig. 3 Decision tree performed using live birth as dependent variable. Only the validation step of the analysis is represented. Percentages reported for each node express the predictive accuracy of the node. Df, degrees of freedom; US, ultrasound

Decision tree analyses for each strong ART outcome were repeated considering only those couples in which ART was applied only once (i.e., 7952 couples). The predictive accuracy of all three trees was maintained in this subgroup (data not shown).

Male contribution

Considering the role of sperm motility in the biochemical pregnancy decision tree, the cases were divided according to the suggested threshold (i.e., 34%). Fisher’s exact test was performed to compare biochemical and clinical pregnancy between the two groups, considering IVF and ICSI separately. Indeed, we previously demonstrated that sperm motility could have a predictive role in IVF rather than ICSI cycles (Villani et al. 2021, submitted). In confirmation, only in IVF cycles were both biochemical (75.3 versus 24.7%, p < 0.001) and clinical pregnancy (74.8 versus 25.2%, p < 0.001) rates significantly higher in couples in which the man showed sperm motility higher than 34% compared to the others. On the contrary, no differences in biochemical (44.1 versus 55.9%, p = 0.189) and clinical (44.8 versus 55.2%, p = 0.303) pregnancy rates were detected between the two sperm motility groups in case of ICSI.

Discussion

Here, we applied a systematic seven-step approach to generate a model predicting ART success [20], detecting three sensitive milestones of the decision-making process in which the clinician is routinely involved. In particular, OF17 number, female age, and sperm motility could be used to evaluate, before pick-up, whether the ART path should be continued. Several mathematical and statistical models have been proposed so far to predict ART success. However, an overall limited predictive accuracy and clinical utility emerged, due to several shortcomings and to a probably incorrect view of the problem [23]. Indeed, these previous works aimed to identify predictive markers of overall ART success, without asking how these factors could influence the decision-making process. Here, on the other hand, we changed the point of view, first selecting the sensitive points of the ART path at which a predictor could advise the doctor either to suspend the treatment or to change the approach. Two such milestones, at which the clinician could decide to stop the process, are the pick-up and the transfer time. Here, we demonstrate that the decision to continue the ART path to pick-up could be guided by three specific factors, identified by decision tree analyses.

Logistic regression analyses confirmed the relationship between strong ART outcomes and those variables detected before pick-up. Interestingly, these connections appeared only when pregnancy rates were considered, suggesting that the classical statistical approach is not able to overcome the higher number of biases influencing live birth rates. In the biochemical pregnancy decision tree, alongside OF17, sperm motility entered the model, introducing the threshold of 34%. This result suggests that a male parameter represents a crucial point in the prediction of pregnancy, with a cut-off close to that proposed by the WHO manual, i.e., 32% [24, 25]. Although the relevance of the male contribution to human reproduction seems obvious, most studies aimed at predicting ART success relegated the male factor to a secondary role, evaluating only the female partner (Villani MT et al., 2021, submitted). Together with the male partner, two female-related parameters emerged as the strongest and most clinically relevant key points derived from our decision trees, i.e., OF17 and female age. In particular, when two or fewer follicles were identified by US after COS, the chance of pregnancy was virtually zero. Similarly, in case of three OF17, the chance of conceiving remained below 6% for all three developed trees. On the other hand, the pregnancy probabilities rose consistently with increasing OF17 number, although without identifying a clear threshold beyond which the virtual certainty of pregnancy is reached. When more than seven follicles were identified, the pregnancy probabilities were the highest possible. Moreover, our trees showed that, for intermediate OF17 results, the female age could guide the clinician’s decision. In other words, our results suggest that when OF17 is lower than three, the ART path should be stopped and the COS re-scheduled. When OF17 is higher than 7, the ART path should be continued. When OF17 is between 3 and 7, other parameters should be considered. In particular, in this setting, when the female age is high or when the sperm motility is low, the chance of ART success significantly decreases, and a COS re-schedule should be considered.
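The decision logic summarized in this paragraph can be written down as a small rule set. The sketch below is purely illustrative and is not a validated clinical tool: the function name, the returned messages, and the single default age cut-off are assumptions introduced here. The OF17 and sperm motility thresholds are those reported in the Results, whereas the age threshold differs between trees (37.2, 40.1, and 35.6 years), so 37.2 is used only as a placeholder.

```python
OF17_LOW = 2               # two or fewer follicles > 17 mm: pregnancy virtually unachievable
OF17_HIGH = 7              # more than seven follicles: highest pregnancy probability
MOTILITY_THRESHOLD = 34.0  # % progressive motility (biochemical pregnancy tree)
AGE_THRESHOLD = 37.2       # years; ASSUMPTION: one of several tree-specific cut-offs


def triage_before_pickup(of17, female_age, sperm_motility, age_threshold=AGE_THRESHOLD):
    """Return a hedged suggestion for the time point after COS and before pick-up."""
    if of17 <= OF17_LOW:
        return "consider stopping and re-scheduling COS"
    if of17 > OF17_HIGH:
        return "continue to pick-up"
    # Intermediate OF17 (3 to 7): other parameters are taken into account.
    if female_age > age_threshold or sperm_motility < MOTILITY_THRESHOLD:
        return "intermediate OF17 with unfavourable age or motility: consider re-scheduling COS"
    return "intermediate OF17: no clear indication from the trees"


# Hypothetical couples, for illustration only.
print(triage_before_pickup(of17=2, female_age=33.0, sperm_motility=45.0))
print(triage_before_pickup(of17=5, female_age=41.0, sperm_motility=45.0))
print(triage_before_pickup(of17=9, female_age=36.0, sperm_motility=30.0))
```

Keeping the thresholds as named constants makes it explicit which values come from the trees and which are placeholders that would need to be replaced by the cut-off of the specific outcome being predicted.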

Recently, a meta-analysis evaluated the methodological quality and the performance of all existing ART predictive models, in order to recommend the most accurate approach for predicting the chances of parenthood after ART procedures and to help couples manage their expectations [23]. The first interesting result of this meta-analysis is the high heterogeneity of the statistical approaches to the topic. Indeed, considering 35 predictive models from 33 manuscripts, logistic regression analyses were applied in 91% of studies and time-to-event modelling in 9% [23, 26, 27]. A combined statistical approach was not applied in any of these analyses. The second relevant issue raised by this meta-analysis was the sample size considered in the 35 models included. Indeed, only four works (11.4%) had sample sizes large enough to support the development and the validation of their models [18, 28,29,30], while the remaining cohorts were too small to obtain reliable predictors. Here, we considered more than twelve thousand fresh cycles, applying a complex statistical model in which logistic regression analysis was combined with a decision tree classification approach. Moreover, the vast majority of previous predictive models share the limitation of not being developed in a systematic way, skipping the recommended methodological development steps and consequently limiting their reliability [20, 31, 32]. In our work, we applied a systematic seven-step approach to predictive model development [20] on a large single-center series of fresh ART cycles. In this systematic approach, the first step comprises problem definition and data inspection. In the literature, the problem has often been approached by looking for parameters able to predict the final success, without considering how these can actually be applied in clinical practice. Thus, we changed the point of view on the problem, focusing the analysis on the search for those parameters able to tell the clinician whether to continue the ART path after the COS phase, i.e., whether or not to proceed with the pick-up. Our model could therefore help to objectively predict a priori the potential in terms of pregnancy/live birth after ovarian stimulation. In particular, our model showed that three variables (OF17, female age, and sperm motility) could help the clinician to decide when and whether to continue the ART path after COS and, consequently, when to re-schedule a new ART cycle. This change of view could be extremely relevant in ART management, avoiding loss of time and money by stopping the ART cycle at an early stage when the chances of success are negligible. As a consequence, a new ART cycle could be planned, modifying the COS phase in order to improve the OF17 number and to proceed to pick-up with a higher probability of success. Indeed, the COS phase is crucial for ART success but is managed heterogeneously across assisted fertilization centers, given the absence of evidence-based protocols [13, 33]. Apart from tangible clinical and economic advantages, possible repercussions on the psychological health of the infertile couple have to be taken into account. Indeed, it is well established in the scientific literature and clearly evident in clinical practice that ART procedures are accompanied by a significant emotional burden experienced by both partners [34]. Since psychological consequences could be even more burdensome in case of ART treatment failure and the consequent need to schedule new cycles [34], an early suspension of the ART cycle followed by a new treatment schedule could improve the couples’ psychological health.

Our study fits well into this research line [35], applying a validated statistical method and enriching it with the analysis of decision trees. A decision tree is a tree-like model commonly used as a supportive tool in operations research and decision analysis, for instance in economics and marketing settings. This approach, aimed at identifying the strategy most likely to reach a goal, is simple to understand and interpret and can be combined with other decision techniques, such as logistic regression analyses. For these reasons, the decision tree model has been applied in several branches of medicine, such as gastroenterology [36, 37], breast oncological surgery [38], cardiology [39], orthopedic surgery [40], and even to diagnose SARS-CoV-2 infection [41]. Specifically in the ART setting, the decision tree model has been previously applied mainly in cost-effectiveness analyses, for instance to identify the most cost-effective ovarian stimulation drug for intra-uterine insemination (IUI) [42]; to evaluate the clinical utility of preimplantation genetic assessment for aneuploidy after IVF in the USA [43] and in Germany [44]; to highlight anti-Müllerian hormone (AMH) serum levels as informative for stimulation dose management for optimizing blastocyst development [45]; and to identify the most cost-effective policy in terms of ART success in case of female age below 38 years, comparing expectant management, IUI with ovarian stimulation, and IVF [46]. Moreover, a decision tree was applied to develop a model combining AMH, antral follicle counts, FSH basal levels, and female age to obtain the true ovarian reserve [47], and to compare the GnRH-agonist long protocol with the GnRH-antagonist protocol in IVF, highlighting that the GnRH-antagonist introduces an economic advantage in case of fresh embryos, while the GnRH-agonist long protocol is preferable considering the cumulative pregnancy rate using both fresh and frozen embryos [48]. Here, we applied this statistical approach for the first time to a new clinical question that should be increasingly relevant in ART practice.

The main strengths of our study are represented by the large sample size and the systematic approach to the predictive model development. However, several limitations should be considered. First, the retrospective collection of real-world data is biased by a high rate of missing data, possibly impacting the reliability of the results. However, we included only those cycles in which all ART variables were available, at the cost of reducing the sample size. In addition, during the long interval of data collection (i.e., 1998–2020), ART technologies evolved, as did the regulatory rules for ART access and the characteristics of couples resorting to ART procedures. This data heterogeneity over the years could reduce the reliability of our results. Moreover, we decided to develop our models excluding frozen embryos to avoid possible confounding factors; therefore, the results obtained are reliable only for fresh ART cycles. However, the long time frame of observation, together with the use of only fresh cycles, could be the reason for the low overall pregnancy rates detected in the cohort. Finally, our results come from a fairly young population, limiting their actual application to an older cohort. The accuracy of our model is 100% when fewer than three OF17 were detected and reaches 43% when more than three OF17 were identified. However, this is true in our case series, and further studies should confirm this accuracy.

In conclusion, we developed three decision trees to help the clinician decide whether or not to perform oocyte pick-up and continue the ongoing ART path. In these mathematical models, three predictors of ART success at a very early stage emerged, namely OF17 number, sperm motility, and female age. Although female age is a non-modifiable factor, an increase in OF17 and sperm motility should be pursued by clinicians to improve the chances of ART success.