Introduction

The current evidence-based practice guideline for the management of polycystic ovary syndrome (PCOS) recommends that clinical laboratories use validated liquid chromatography–tandem mass spectrometry (LC–MS/MS) assays for diagnosing biochemical hyperandrogenism in patients with PCOS [1]. Whilst automated immunochemiluminescence assays (ICLA) or unextracted direct radioimmunoassays (RIA) are reliable enough to measure sex steroids that circulate in μM concentrations, such as dehydroepiandrosterone-sulfate (DHEAS) [2], their diagnostic accuracy is poor at the very low nM circulating concentrations that characterize testosterone (T) or androstenedione (A4) in women [3,4,5]. The inaccuracy and lack of sensitivity of high-throughput ICLA for measuring sex steroids were reported almost two decades ago [6]. Even so, LC–MS/MS has not yet replaced ICLA in most routine clinical laboratories [7], despite the more favorable performance-to-cost ratio of the former.

In the clinical setting, a misdiagnosis of hyperandrogenemia derived from using ICLA may lead to inappropriate management in a significant proportion of women. Hyperandrogenemia identifies a subset of patients in whom metabolic risk is higher than that of their normoandrogenic counterparts [8]. On the other hand, a falsely elevated androgen concentration might expose affected women to harmful consequences such as the negative emotional impact of a lifetime diagnosis, prolonged follow-ups, or even unnecessary drug exposure [9].

In routine clinical practice, ultrasound (US) assessment of polycystic ovarian morphology (PCOM) represents another hurdle for PCOS phenotyping. Some US operators may lack the extensive training needed for the accurate assessment of ovarian volume and follicle number per ovary and, as a result, the diagnosis of PCOM may become somewhat subjective. Moreover, an adequate assessment of PCOM is time-consuming, the vaginal approach is not always feasible or acceptable to the patient, and the imaging quality may be insufficient for follicle counting [10]. To overcome these limitations, the latest evidence-based guidelines recommend, with moderate quality of evidence, that circulating anti-müllerian hormone (AMH) concentrations can also be used as a valid surrogate marker of PCOM in adults [1, 11].

We hypothesized that i) a more precise diagnosis of hyperandrogenemia would rule out the falsely elevated androgens that resulted from inaccurate immunoassays, avoiding overdiagnoses of PCOS; and ii) the use of serum AMH concentrations for defining PCOM would increase the proportion of women fulfilling this diagnostic criterion. Hence, we aimed to: i) compare, in a large series of women, the performance of state-of-the-art LC–MS/MS assays for hyperandrogenemia and of AMH assays for PCOM with that of the routine immunoassays and US examinations conducted earlier; and ii) study how the implementation of these techniques would impact the distribution of PCOS phenotypes.

Material and methods

Subjects

We used serum samples and clinical data from 359 consecutive premenopausal women referred to our Reproductive Endocrinology clinic from 2006 to 2022 because of symptoms of functional androgen excess or hyperandrogenemia. Inclusion required a diagnosis of PCOS, defined by the presence of at least two out of three criteria (clinical and/or biochemical androgen excess, ovulatory dysfunction (OD), and PCOM), and signed informed consent allowing the woman's data to be included in our database for research studies (see Ethical approval). We systematically excluded women with other etiologies of hyperandrogenism or ovulatory dysfunction [1, 12].

We defined clinical hyperandrogenism as hirsutism by a modified Ferriman-Gallwey score ≥ 8. In the clinical setting, immunoassays were used to estimate biochemical hyperandrogenism using local in-house cut-offs for each assay derived from a sample of non-hyperandrogenic premenopausal female volunteers presenting with regular menses (Table 1) [13, 14].

Table 1 Circulating androgen measurements by LC–MS/MS and serum anti-müllerian hormone assayed by the Elecsys ICLA immunoassay in a local non-hyperandrogenic control population

World Health Organization (WHO) group II OD (https://www.ncbi.nlm.nih.gov/books/NBK327781/) was defined by the presence of more than six cycles longer than 36 days in the previous year, absence of menstruation for three consecutive months, or by luteal phase progesterone concentrations below 17 nM (4 ng/mL) in women with regular menses.

Ovarian US examination was conducted in those women who presented with only hyperandrogenism or only OD. In contrast, US was not systematically performed in women meeting both criteria because they already fulfilled a PCOS diagnosis, yet many of these women also received a US examination as part of their Gynecology consultation [1]. PCOM was evaluated by gynecologists according to routine clinical practice based on the European Society of Human Reproduction and Embryology/American Society for Reproductive Medicine 2003 (Rotterdam) criteria [15]. These criteria define PCOM by the presence of ≥ 12 follicles measuring 2–9 mm in diameter in each ovary, and/or increased ovarian volume (> 10 mL), estimated in both longitudinal and antero-posterior cross-sections of the ovaries. According to the current evidence-based definition of PCOM [1], an ovarian volume ≥ 10 mL or a follicle number per section ≥ 10 in at least one ovary can be considered the threshold for PCOM in adults when older technology is used or image quality is insufficient to allow for an accurate assessment of follicle counts throughout the entire ovary; hence, those examinations would also fulfill the 2023 criteria for clinical practice. None of these women had received treatment with oral contraceptives, antiandrogens, or insulin sensitizers for at least 6 months before sampling.

Metabolic phenotyping

Two trained investigators (M.L.-R. & H.F.E.-M.) were responsible for clinical, anthropometric, and physical evaluations, including the above-mentioned hirsutism score, body mass index (BMI), and waist circumference using the National Health Examination Survey method [16]. Office blood pressure was determined as the mean of two manual sphygmomanometer readings in the sitting position. The composite insulin sensitivity index was calculated from the circulating glucose and insulin concentrations during a standard oral glucose tolerance test (oGTT) [17]. Abnormal glucose tolerance (prediabetes and type 2 diabetes mellitus) was defined from circulating glucose concentrations at 0 and 120 min during the oGTT [18].

Sampling

Basal blood samples for sex steroid profiles and serum AMH were obtained after 12-h overnight fasting, between days 3 and 9 of a spontaneous or progestin withdrawal-induced menstrual bleeding, or at random after excluding pregnancy in amenorrheic patients. Then, a 75-g oGTT was performed, and samples were obtained at 0, 30, 60, 90, and 120 min. Samples were immediately centrifuged, and serum and plasma for storing were aliquoted, coded, and frozen at − 80 °C until thawed for analysis.

Assays

The technical specifications of the sex steroid and serum AMH assays are detailed in Supplementary Information. For LC–MS/MS measurements and the AMH assay, biochemical hyperandrogenism and the upper limit of normality (ULN) were defined by values > 95th percentile (P95) of a sample of non-hyperandrogenic premenopausal female volunteers presenting with regular menses, composed of hospital staff and overweight or obese women seeking advice solely for weight loss at our Department (Table 1). These control women were similar in terms of BMI and age to the study population of hyperandrogenic women. None of these controls had a history of infertility, oophorectomy, or hysterectomy, nor had they received treatment with hormonal contraceptives, antiandrogens, or insulin sensitizers for at least 6 months before sampling. Of note, this population of control women derived from a sample of non-hyperandrogenic individuals similar to that previously used in establishing local in-house normality values for routine androgen assays [13, 14]. In the subset of women in whom sonographic examination was available, we also determined the optimal AMH threshold value for identifying PCOM by means of Youden's J statistic.
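The two cut-off strategies described above can be illustrated with a minimal sketch. It is not the software used in this study: the function names and the toy AMH values are hypothetical, the percentile uses NumPy's default linear interpolation (the interpolation rule actually applied is not stated), and a value is called positive when it is at or above the candidate cut-off.

```python
import numpy as np

def upper_limit_of_normal(control_values, percentile=95.0):
    """Percentile-based ULN of a control sample (NumPy default interpolation)."""
    return float(np.percentile(np.asarray(control_values, dtype=float), percentile))

def youden_optimal_threshold(values, has_condition):
    """Try every observed value as a cut-off (positive if value >= cut-off) and
    return (threshold, sensitivity, specificity, J) for the largest
    J = sensitivity + specificity - 1. Ties keep the lowest cut-off."""
    values = np.asarray(values, dtype=float)
    has_condition = np.asarray(has_condition, dtype=bool)
    best = None
    for cut in np.unique(values):
        predicted = values >= cut
        sensitivity = np.mean(predicted[has_condition])
        specificity = np.mean(~predicted[~has_condition])
        j = sensitivity + specificity - 1.0
        if best is None or j > best[3]:
            best = (float(cut), float(sensitivity), float(specificity), float(j))
    return best

# Hypothetical toy data: AMH values and ultrasound PCOM status (1 = PCOM).
toy_amh = [5.0, 8.0, 12.0, 20.0, 25.0, 30.0, 40.0, 60.0]
toy_pcom = [0, 0, 1, 0, 1, 1, 1, 1]
threshold, sensitivity, specificity, j = youden_optimal_threshold(toy_amh, toy_pcom)
print(threshold, sensitivity, specificity, round(j, 2))
```

A percentile-based ULN depends only on the control sample, whereas the Youden threshold depends on the ultrasound classification used as the reference, so the two cut-offs need not coincide.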

All but the LC–MS/MS and serum AMH assays were run at the time of subjects’ recruitment as described above. Both the external and local laboratories were blinded to women's features and patient or control status.

Statistical analysis

Data are shown as mean ± standard deviation or 95% confidence intervals (95%CI), and counts (percentage). For continuous variables, normality of the distribution was assessed by the Kolmogorov–Smirnov test, and logarithmic transformation was applied to ensure normality if needed. First, we compared T and A4 levels as measured by routine immunoassays and LC–MS/MS, and their relationships with anthropometric, clinical, and metabolic variables, in the whole group of hyperandrogenic women. Second, we compared subgroups of hyperandrogenic women with each other and with non-hyperandrogenic control women.

Continuous variables were compared by one-way ANOVA, Welch's ANOVA, or univariate general linear models, depending on the homogeneity of variances and the need for adjustment by covariates. Mean differences among three or more groups were analyzed by post hoc Tukey, Games-Howell, or Bonferroni methods. Categorical variables were analyzed by Fisher's exact test, χ2 tests, or binary logistic regression models. Because LC–MS/MS is considered the gold-standard method for sex steroid measurement, Spearman's correlation and linear regression, rather than Passing-Bablok or Deming regressions, were used to compare routine immunoassays with LC–MS/MS results. Consistency and absolute agreement among assays were addressed by their intra-class correlation coefficient using a two-way random-effects model. Quantitative agreement was graphically assessed by Bland–Altman plots and the ratio method. Diagnostic agreement between routine phenotyping and those phenotypes derived from applying LC–MS/MS and AMH measurements was assessed by kappa (κ) and weighted κ coefficients [MedCalc Software Ltd. Inter-rater agreement. https://www.medcalc.org/calc/kappa.php (Version 22.014; accessed May 9, 2024)]. We performed other statistical analyses using IBM® SPSS® Statistics 23 (IBM España S.A., Madrid, Spain).
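For readers unfamiliar with the agreement statistic, the following is a minimal sketch of Cohen's κ with optional linear weights computed from a square cross-classification table. It is an illustration only and not the MedCalc implementation; the function name and the example table are hypothetical, and linear weighting assumes that the phenotype categories are listed in a meaningful order.

```python
import numpy as np

def cohen_kappa(table, weights="linear"):
    """Cohen's kappa from a square table (rows: method 1, columns: method 2).

    weights="linear" penalizes a disagreement in proportion to the distance
    between categories; weights="none" gives the unweighted kappa.
    """
    counts = np.asarray(table, dtype=float)
    k = counts.shape[0]
    observed = counts / counts.sum()
    # Expected proportions under independence of the two classifications.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    rows, cols = np.indices((k, k))
    if weights == "linear":
        disagreement = np.abs(rows - cols) / (k - 1)
    elif weights == "none":
        disagreement = (rows != cols).astype(float)
    else:
        raise ValueError("weights must be 'linear' or 'none'")
    return 1.0 - (disagreement * observed).sum() / (disagreement * expected).sum()

# Hypothetical 2 x 2 table: 35 of 50 subjects classified identically.
example = [[20, 5],
           [10, 15]]
print(round(cohen_kappa(example, weights="none"), 3))
```

In this toy table the observed agreement is 0.70 and the chance-expected agreement is 0.50, so κ = (0.70 − 0.50) / (1 − 0.50) = 0.40; with only two categories the linear-weighted and unweighted coefficients coincide.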

A P value < 0.05 was considered statistically significant.

Results

Different assays when diagnosing hyperandrogenism

The clinical variables, anthropometrics and sex steroid profiles of our study population are shown in Table 2. Routine total T was assayed by RIA and ICLA in 306 (85.2%) and 53 (14.8%) women, respectively (Supplementary Table S1). Total T and A4 concentrations assayed by routine methods showed a moderate positive correlation with those derived from LC–MS/MS, but the concordance between the two techniques was poor to moderate for total T and virtually nonexistent for A4 (Supplementary Information, Figure S1). Bland–Altman plots showed a tendency towards greater differences with increasing mean T and A4 values, especially for T and when samples had been assayed by RIA. This overestimation of total T by the routine immunoassays was confirmed using the ratio method.
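The quantities behind a Bland–Altman plot and the ratio method can be sketched as follows. This is a simplified illustration under assumed conventions (bias as the mean of immunoassay minus LC–MS/MS, limits of agreement as bias ± 1.96 SD, ratio as immunoassay divided by LC–MS/MS); the function name and the toy values are hypothetical and are not study data.

```python
import numpy as np

def bland_altman_summary(immunoassay, lcmsms):
    """Summary statistics for paired measurements of the same samples."""
    a = np.asarray(immunoassay, dtype=float)
    b = np.asarray(lcmsms, dtype=float)
    differences = a - b
    bias = differences.mean()
    sd = differences.std(ddof=1)  # sample standard deviation
    return {
        "pair_means": (a + b) / 2.0,       # x-axis of the plot
        "differences": differences,        # y-axis of the plot
        "bias": float(bias),
        "lower_loa": float(bias - 1.96 * sd),
        "upper_loa": float(bias + 1.96 * sd),
        "mean_ratio": float((a / b).mean()),  # ratio method: > 1 suggests overestimation
    }

# Hypothetical paired values in which the immunoassay reads twice as high.
summary = bland_altman_summary([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(summary["bias"], summary["mean_ratio"])
```

In this toy example the differences grow with the pair means, the same qualitative pattern as a proportional overestimation, which the ratio summarizes as a constant factor.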

Table 2 Clinical and anthropometric variables, sex steroids and metabolic profiles of women with PCOS

Impact on PCOS phenotyping of LC–MS/MS and serum anti-müllerian hormone

Table 3 shows the counts and percentages of hyperandrogenemia as a function of the assay used for measurement of serum androgen concentrations. The agreement between immunoassays and LC–MS/MS in identifying those patients with or without hyperandrogenemia was poor, regardless of the circulating androgen being analyzed; this occurred mostly because approximately 20% of women showing normal LC–MS/MS values had shown hyperandrogenemia when using routine immunoassays.

Table 3 Presence of hyperandrogenemia as a function of sex steroid assay in women with PCOS

One hundred and seventy-eight out of 209 women who received sonographic evaluation presented with PCOM (85.2%) (Table 2). Using the P95 of AMH in our control group of non-hyperandrogenic women as the cut-off value [55.9 pM (7.8 ng/mL)], 109 (30.5%) out of 357 patients showed increased circulating AMH concentrations (two samples were missing). In those women with US assessment, the diagnostic agreement between US-PCOM and AMH was only 27.3% [κ: 0.060 (0.005; 0.115)]. In a receiver-operating characteristic (ROC) curve analysis (Supplementary Information, Figure S2), AMH showed a poor performance in diagnosing PCOM. According to Youden's index, the optimal AMH threshold value would be 16.3 pM (2.3 ng/mL), which showed 0.92 sensitivity for diagnosing ultrasound PCOM, but only 0.36 specificity. To control for the effect of inter-observer variability and diverse US equipment, we also restricted these analyses to those subjects evaluated during the last 5 years of the inclusion period (from 2017 to 2022), in which the same gynecologists conducted those examinations. In these subjects (n = 116), the diagnostic agreement between US-PCOM and AMH was also poor: 43.1% [κ: 0.115 (0.039; 0.191)].
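As a reading aid for the paired units quoted above, the sketch below converts AMH between ng/mL and pmol/L. The conversion factor of 7.14 pmol/L per ng/mL is an assumption commonly used for AMH and is not stated in this article; the function names are hypothetical. The reported pairs are reproduced only approximately, as expected from rounding.

```python
# Assumed factor (not given in the text): 1 ng/mL of AMH is about 7.14 pmol/L.
AMH_PMOL_L_PER_NG_ML = 7.14

def amh_ng_ml_to_pmol_l(ng_ml):
    """Convert an AMH concentration from ng/mL to pmol/L (pM)."""
    return ng_ml * AMH_PMOL_L_PER_NG_ML

def amh_pmol_l_to_ng_ml(pmol_l):
    """Convert an AMH concentration from pmol/L (pM) to ng/mL."""
    return pmol_l / AMH_PMOL_L_PER_NG_ML

# Cut-offs quoted in the text as (ng/mL, pM) pairs.
for ng_ml, reported_pm in [(7.8, 55.9), (2.3, 16.3)]:
    converted = amh_ng_ml_to_pmol_l(ng_ml)
    print(f"{ng_ml} ng/mL -> {converted:.1f} pM (reported: {reported_pm} pM)")
```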

Figure 1 shows PCOS phenotypes after initial routine assessment using androgen immunoassays and US, and their re-classification after applying LC–MS/MS and adding AMH plus US to define hyperandrogenemia and PCOM, respectively. Fifty-two (18.2%) out of 285 individuals with hyperandrogenemia according to routine immunoassays (264 patients with classic PCOS and 21 women with ovulatory PCOS) no longer presented with androgen excess by LC–MS/MS. These 52 women were now diagnosed with non-hyperandrogenic PCOS in 26 cases (50.0%), group II WHO OD in 25 cases (48.1%), and isolated PCOM in the remaining case, even though only six out of those 25 women with group II WHO OD had initially received an ovarian US examination. Conversely, eight (15.4%) of the 52 women initially diagnosed with non-hyperandrogenic PCOS by routine methods were re-classified into the classic PCOS phenotype after using LC–MS/MS (Fig. 1). Overall, 60 out of 359 (16.7%) patients changed their PCOS phenotype after applying LC–MS/MS. The diagnostic agreement between the PCOS phenotypes resulting from the application of routine immunoassays and those resulting from the use of LC–MS/MS was moderate [weighted κ (linear weights): 0.500 (0.404; 0.596)].
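The re-classification tally in this paragraph can be retraced with a few lines of arithmetic. The counts below are those reported in the text; the variable and function names are hypothetical and serve only this illustration.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

total_patients = 359
lost_hyperandrogenemia = 52    # positive by immunoassay, normal by LC-MS/MS
gained_hyperandrogenemia = 8   # non-hyperandrogenic PCOS moved to classic PCOS
initially_non_hyperandrogenic = 52

# New diagnoses of the 52 women who lost hyperandrogenemia.
destinations = {
    "non-hyperandrogenic PCOS": 26,
    "group II WHO OD": 25,
    "isolated PCOM": 1,
}
assert sum(destinations.values()) == lost_hyperandrogenemia

changed = lost_hyperandrogenemia + gained_hyperandrogenemia
print(changed, percent(changed, total_patients))                        # 60 16.7
print(percent(gained_hyperandrogenemia, initially_non_hyperandrogenic))  # 15.4
for phenotype, n in destinations.items():
    print(phenotype, n, percent(n, lost_hyperandrogenemia))
```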

Fig. 1

PCOS phenotypes identified as a function of each individual diagnostic criterion by routine methods of androgen assay and PCOM examination (upper panel) compared to gold-standard LC–MS/MS for diagnosing hyperandrogenemia and serum AMH support (lower panel). The numbers in the white boxes indicate how many subjects are included in each PCOS phenotype. The arrows between upper and lower boxes denote women who changed their PCOS phenotype after the study. The small numbers above the final phenotypes indicate how many individuals updated their phenotype

When we added an elevated serum AMH concentration as a surrogate marker for PCOM to the ovarian US conducted initially, seven out of those 25 patients with group II WHO OD now fulfilled diagnostic criteria of non-hyperandrogenic PCOS. However, an elevated AMH concentration as a surrogate marker of PCOM would have identified only three out of 23 women diagnosed with ovulatory PCOS when using ovarian US, and only 25 out of 77 women with non-hyperandrogenic PCOS according to US showed increased AMH values. In other words, 20 (87.0%) patients initially diagnosed with ovulatory PCOS, and 52 (67.5%) of those diagnosed with non-hyperandrogenic PCOS, had PCOM by US but a serum AMH ≤ 55.9 pM (7.8 ng/mL). Overall diagnostic agreement between routine assessment using immunoassays and US and that derived from LC–MS/MS and the addition of AMH to US was moderate [weighted κ (linear weights): 0.512 (0.416; 0.608)]. Leaving aside its poor specificity, the cut-off derived from Youden's statistic would have classified 20 out of those 25 women presenting with only OD as having normoandrogenic PCOS. Nonetheless, overall diagnostic agreement between routine assessments and LC–MS/MS and the addition of AMH to US was not improved [weighted κ (linear weights): 0.536 (0.441; 0.632)]. Similarly, when we tested the AMH cutoff based on a previous study using the Elecsys AMH Plus immunoassay [3.2 ng/mL (23 pM)] [19], 18 out of that group of 25 women were classified as having normoandrogenic PCOS.

Effect of changes in the phenotyping of patients with PCOS on their metabolic profiles

The only subgroup of patients in whom metabolic indices were altered when compared with controls was that comprising women showing hyperandrogenemia by LC–MS/MS, regardless of immunoassay results; these patients presented with reduced insulin sensitivity index and HDL-cholesterol, and increased diastolic BP values and triglyceride concentrations, and were more likely to have abnormal glucose metabolism, namely prediabetes or diabetes mellitus (Table 4). Moreover, their insulin sensitivity index was lower, and their triglyceride and serum AMH concentrations were higher, than those of the subgroup of women showing hyperandrogenemia only by immunoassays.

Table 4 Anthropometrics, clinical and metabolic features when grouping patients as a function of the method used to diagnose hyperandrogenemia

Table 5 shows anthropometrics, clinical and metabolic variables as a function of PCOM in 209 women in whom both ovarian US and AMH concentrations were available. Only 4 women had PCOM as defined by AMH in the presence of a normal ovarian US. Women with US-PCOM showed higher BMI values and indexes of abdominal adiposity compared with those in whom PCOM was established by AMH. Moreover, women with PCOM according to both US and AMH showed higher circulating total T and A4 concentrations than those only matching US criteria for PCOM.

Table 5 Anthropometrics, clinical and metabolic features as a function of PCOM status

Those presenting with elevated AMH had reduced BMI and systolic BP, and increased insulin sensitivity index, circulating T and A4 concentrations, and HDL-cholesterol compared with their counterparts showing normal AMH. When the analysis was restricted to patients with ovulatory or non-hyperandrogenic phenotypes, those presenting with elevated AMH concentrations were leaner, had a higher insulin sensitivity index and circulating HDL-cholesterol, SHBG, total T and A4, but showed lower DHEAS concentrations than women with normal AMH levels (Supplementary Information, Tables S2 & S3).

The re-classification of PCOS phenotypes resulting from the application of LC–MS/MS and the addition of serum AMH concentrations to ovarian US confirmed the association of hyperandrogenic phenotypes with metabolic dysfunction. Despite all subgroups of women showing a similar BMI, only patients with the classic PCOS phenotype presented with higher mean diastolic BP and circulating triglycerides, and lower mean HDL-cholesterol concentrations, compared with the control group (Figs. 2 & 3). Mean LDL-cholesterol was also higher in patients with classic PCOS than in those with ovulatory and non-hyperandrogenic phenotypes. The non-hyperandrogenic phenotype also showed a better lipid profile in terms of HDL-cholesterol and triglycerides compared with classic PCOS.

Fig. 2

Anthropometric, clinical variables, circulating androgens, and anti-müllerian hormone levels according to revised PCOS phenotypes. The box indicates the 25th and 75th percentiles, the solid and short dashed lines within the boxes mark the median and mean, respectively. Whiskers below and above the box indicate the 10th and 90th percentiles. The shaded areas, solid and short dashed lines behind the boxes represent the interquartile range, median and mean, respectively, of our control population. * (y-axis) significant differences for the comparisons among all subgroups of patients including the control group of women. * (within the boxes) significant differences between that PCOS phenotype and control women. * (above the boxes) significant differences between those PCOS phenotypes

Fig. 3

Lipid profile and glucose metabolism parameters according to revised PCOS phenotypes. The box indicates the 25th and 75th percentiles, the solid and short dashed lines within the boxes mark the median and mean, respectively. Whiskers below and above the box indicate the 10th and 90th percentiles. The shaded areas, solid and short dashed lines behind the boxes represent the interquartile range, median and mean, respectively, of our control population. * (y-axis) significant differences for the comparisons among all subgroups of patients including the control group of women. * (within the boxes) significant differences between that PCOS phenotype and control women. * (above the boxes) significant differences between those PCOS phenotypes

Regarding carbohydrate metabolism, patients with classic PCOS had a reduced insulin sensitivity index compared with control women and those with non-hyperandrogenic PCOS (Fig. 3). Furthermore, only women with the classic PCOS phenotype were more likely to have prediabetes or diabetes mellitus than control women [OR: 1.86 (1.02; 3.40); P = 0.043; Supplementary Information, Figure S3]. Of note, those women not fulfilling PCOS criteria after applying LC–MS/MS and serum AMH assessments (almost all of them women with group II WHO OD) showed an insulin sensitivity index that was similar to that of women with classic PCOS and lower than that of control women (Fig. 3). Five of them (26.3%) showed abnormal glucose tolerance, including type 2 diabetes in three women, which was accompanied by obesity in four cases.

Discussion

Automated ICLA is still the most widely used method in routine laboratories worldwide [7, 20, 21], despite the repeated recommendations to use LC–MS/MS for the measurement of total T in women made by most international scientific societies during the past 15 years [1, 22, 23]. Although standardization efforts may have enhanced analytical accuracy and precision for T immunoassays, these improvements still do not permit an accurate measurement at the very low concentrations that characterize non-hyperandrogenic women or patients with PCOS [20]. In agreement, the application of a validated LC–MS/MS T and A4 assay to serum samples from our clinically well-characterized population of hyperandrogenic women and controls revealed that over 20% of women diagnosed with hyperandrogenemia by immunoassay had normal androgen concentrations. As a result, the diagnosis of 17% of the patients changed from hyperandrogenic to non-hyperandrogenic phenotypes of PCOS and, more importantly, in as many as 5% of the women previously diagnosed with PCOS such a diagnosis could no longer be confirmed. Considering the very high prevalence of PCOS reported worldwide, these apparently small figures become of utmost importance in terms of economic burden, besides carrying an unnecessary psycho-emotional stigma for women incorrectly diagnosed with this condition.

Our data partially agree with previous observations comparing routine phenotyping by ICLA against LC–MS/MS in a population of 204 Italian women with PCOS [3]. In this Italian series, 15.7% of women diagnosed with ICLA-hyperandrogenemia had normal circulating androgen levels by LC–MS/MS. Conversely, LC–MS/MS unmasked hyperandrogenemia in another 13.7% classified as normoandrogenic by immunoassays [3]. As a result, 12.2% of patients were reassigned to a different PCOS phenotype, with most of them moving from the normoandrogenic to the classic phenotype.

Both reports confirm the poor performance of immunoassays when compared with the gold-standard LC–MS/MS technique, although some differences between the studies merit further explanation. The use of LC–MS/MS for the diagnosis of hyperandrogenemia and of serum AMH for supporting the definition of PCOM in our series changed the PCOS phenotype of 16.8% of women but, unlike the findings of the Italian study, most of our patients were reassigned from the classic phenotype to the normoandrogenic one. Several factors may explain this apparent discrepancy: i) we used two different immunoassays throughout the recruitment period, which had previously been compared in terms of performance for phenotyping women with PCOS [14]. Our unextracted direct RIA found hyperandrogenemia in 17.2% more women than the ICLA did. However, this difference had a small impact on PCOS phenotyping when clinical hyperandrogenism was also considered, reducing the 17.2% figure to merely 3.2% of changes in the phenotype [14]; ii) the ULN for testosterone based on the P95 of our control group of women was slightly higher than that reported in the Italian population, possibly because we matched our control population for BMI with the patients, whereas the Italian controls were normal-weight women; and iii) most participants (90%) in the Italian study presented with PCOM according to US, whereas we performed an ovarian US in only 58% of our patients, even though 85% of them showed PCOM.

At this point, was serum AMH measurement helpful in supporting PCOS phenotyping in our population? There is no doubt that serum AMH measurement would simplify PCOS phenotyping in routine practice, avoiding examinations that may not only be considered unpleasant by women, but may also lead to misdiagnoses derived from untrained observers or obsolete point-of-care US equipment. Furthermore, serum AMH may play a role in clinical counseling regarding assisted reproductive therapy outcomes among women with PCOS undergoing fertility treatment [24]. Nevertheless, whether an elevated serum AMH concentration and US-PCOM define the same PCOS phenotype in the clinical setting is unclear in view of our current findings. When we applied an in-house specific cut-off to define normal or increased serum AMH levels as recommended [1], increased serum AMH showed a good specificity for identifying PCOM but a very poor sensitivity and negative predictive value. Therefore, increased serum AMH concentrations would be useful to ascertain PCOM, but a normal AMH result would not rule it out.

Earlier reports [25, 26], but not all [27], suggested a good concordance between serum AMH and US for the diagnosis of PCOM. Those studies applied standardized US protocols and state-of-the-art equipment, whereas our observations were made in the context of real-life clinical practice by observers who, not infrequently, had to use US probes with a maximum frequency below 8 MHz. In addition, although our local AMH cutoff is similar to that obtained by others from population-based studies [28], previous studies set serum AMH thresholds for defining PCOM that were lower than those derived from our control population. Such threshold values were obtained from normal cycling women with US-PCOM [29], or from ROC analyses in which the diagnosis of PCOM relied on Rotterdam criteria [26]. Whether otherwise healthy women with US-PCOM should be excluded when establishing a normal reference range for serum AMH levels is debatable; in any case, because US was not performed in any of our controls, we could not exclude such women. Therefore, the inclusion of a certain number of non-hyperandrogenic control women presenting with a sonographic appearance of PCOM, but not meeting other PCOS criteria, may have increased our AMH normality cutoff. Either way, lowering AMH cut-offs to the level suggested by ROC analysis of patients in whom US was available would have resulted in an unacceptably low specificity. In another vein, our reference population for establishing AMH normative values included both non-obese and obese non-hyperandrogenic women with regular menses. Obesity may be negatively associated with ovarian reserve, decreasing circulating AMH. In conceptual agreement, our obese control women presented with AMH levels mildly lower than those of their non-obese counterparts (data not shown). However, the impact of adiposity on AMH levels in otherwise healthy women with regular menses remains largely uncertain [30].

Nevertheless, the possibility exists that PCOM defined by US and by serum AMH levels identifies two subgroups of women with subtle phenotypic differences [31]. Taken together, our results suggest that US and serum AMH are not completely interchangeable for diagnosing PCOM and, accordingly, a two-step process for PCOM phenotyping would be advisable in women with a potential ovulatory or non-hyperandrogenic PCOS, provided that US examination is not necessary to rule out another suspected condition. The initial step would consist of measuring serum AMH concentrations, given the very good predictive value of an increased result for PCOM, reserving US examination for women in whom AMH concentrations are not increased.
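The proposed two-step process can be written down as a small decision rule. The sketch below is only an illustration of the logic described above and not a validated clinical tool: the function name and return labels are hypothetical, and the default threshold is the local P95 cut-off of 55.9 pM reported in this study, which would have to be replaced by an assay- and population-specific value elsewhere.

```python
from typing import Optional

LOCAL_AMH_ULN_PM = 55.9  # local P95 cut-off reported in this study (pM)

def pcom_two_step(amh_pm: float,
                  us_pcom: Optional[bool] = None,
                  amh_uln_pm: float = LOCAL_AMH_ULN_PM) -> str:
    """Step 1: an increased AMH is taken as PCOM without further imaging.
    Step 2: otherwise the ultrasound result decides, if it is available."""
    if amh_pm > amh_uln_pm:
        return "PCOM (increased AMH)"
    if us_pcom is None:
        return "ultrasound needed"
    return "PCOM (ultrasound)" if us_pcom else "no PCOM"

print(pcom_two_step(60.0))                # increased AMH, no ultrasound required
print(pcom_two_step(30.0))                # normal AMH, ultrasound still pending
print(pcom_two_step(30.0, us_pcom=True))  # normal AMH, PCOM on ultrasound
```

The rule is deliberately asymmetric: a normal AMH never yields "no PCOM" on its own, reflecting the low negative predictive value discussed above.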

Our work had several weaknesses that may limit a broad generalization, such as the use of two different immunoassays throughout the study period in our routine laboratory, possible technical issues with US examinations, and the absence of US examination in both our control population and some patients, as already discussed. Referral bias is another factor that may have influenced our results. Earlier studies, including our own, suggested that patients referred to a Reproductive Endocrinology clinic may be more hyperandrogenic and more obese than those in the general population [13, 32]. Such a referral bias would explain not only the high frequency of obesity and abnormal glucose tolerance in our women who did not fulfil PCOS criteria after LC–MS/MS and serum AMH assessments, but also their presence in our population of control women, which was partially composed of individuals seeking advice for weight excess. Finally, not all women with group II WHO OD in our series received a US examination despite presenting with normal AMH concentrations. Thus, we cannot rule out that PCOM was present in them, yet obesity itself may be associated with ovulatory dysfunction even in eumenorrheic women [33], regardless of circulating androgen levels [34].

In short, immunoassays are not accurate enough to permit a reliable diagnosis and phenotyping of PCOS, possibly the most common endocrine and metabolic disorder of women [35], with consequences that extend into the menopausal age [36]. Hence, even if the integration of MS-based methods for sex-steroid measurements into routine clinical practice might be considered unrealistic by some, who argue complexity and cost, the vast numbers of women with PCOS worldwide deserve a reliable method for diagnosis that provides attending physicians with the correct tools for clinical decision-making. Our data cast some doubt upon the interchangeability of serum AMH and US for the diagnosis of PCOM in routine clinical practice, mostly because of the low negative predictive value of the former. Nonetheless, finding increased serum AMH concentrations in women without features of classic PCOS would avoid about 30% of US examinations as a first step.