Introduction

Thanks to medical advances, the living conditions of women with premature ovarian insufficiency (POI) have gained more attention in recent years [1]. POI is a clinical syndrome defined by loss of ovarian activity before the age of 40, associated with menstrual disturbance, raised gonadotropins and low estradiol [2]. Although evidence on diagnostic accuracy in POI is lacking, the European Society of Human Reproduction and Embryology (ESHRE) has developed guidelines on the management of women with premature ovarian insufficiency [2], which recommend the following diagnostic criteria for POI: (i) oligo/amenorrhea for at least 4 months, and (ii) an elevated follicle-stimulating hormone (FSH) level > 25 IU/L on two occasions > 4 weeks apart. The nomenclature has changed over the years, and POI has been referred to as premature ovarian failure, premature menopause, and premature ovarian dysfunction [3]. Earlier studies often used the term premature ovarian failure (POF), whereas more recent articles have used POI. It should also be noted that serum FSH levels in studies of POI often exceed the diagnostic threshold, and several studies report levels above 40 IU/L [2,3,4]. An earlier study estimated the prevalence of POI in women under 30 years old at 0.1%, while the incidence of menopause in women before the age of 40 is approximately 1% [5]. In recent years, studies have investigated the prevalence of POI in different countries. For example, one article reported a higher prevalence (1.9%; 95% CI 1.7–2.1) of POI in women before the age of 40 in Sweden [6] and another article reported 0.91% (95% CI 0.81–1.02%) in Estonia [7]. There has been long-standing confusion between POI and related terms such as poor ovarian responders (POR), premature menopause and diminished ovarian reserve (DOR) [2, 3, 8, 9].
It is important to distinguish these conditions from POI because women with POI face more challenges than diminished fertility and have different management needs [2, 10]. Only 5–10% of women with POI may be able to spontaneously conceive and deliver a child [11]. In addition, women with POI suffer from amenorrhea-related symptoms [12], psychological problems [13, 14], and increased risks to cardiovascular health [15, 16] and bone health [17]. POI also affects genitourinary and sexual function [18] and neurological function [19] in both the short and long term and can lead to premature death [20]. The best option to relieve symptoms and protect POI patients against serious morbidity related to prolonged estrogen deficiency is hormone replacement therapy (HRT). However, HRT only mimics normal physiological endocrinology, and there is no evidence that it improves ovarian function [2]. Consequently, patients with POI remain at risk of poor health despite available treatment options. Quality of life (QoL) is a broad multidimensional concept that usually includes subjective evaluations of both positive and negative aspects of life [21]. Health-related quality of life (HrQoL), by contrast, focuses on the effects of a disease and its treatment on an individual’s health [22,23,24,25], encompassing physical, psychological, and social functioning [23, 26], and presents an avenue for evaluating the consequences of experiencing premature ovarian insufficiency. This review aimed to investigate studies of women with POI that included measures of HrQoL, in order to evaluate effect sizes and to identify the measurement instruments used. A meta-analysis was conducted of the studies that met quality standards and that compared HrQoL outcomes between patients with POI and a control group of women with normal ovarian function.

Materials and methods

This study followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) [27] reporting guideline (Online Resource ESM_1). A submission was made to the ethics committee of the Clinical Basic Medicine Institute, China Academy of Chinese Medical Sciences. The ethics committee judged that ethical approval was not required for this research (ref 2019/1).

Search strategy and data selection

An electronic search of six databases was undertaken from database inception to June 2018. PubMed/MEDLINE and Web of Science provided broad coverage of the biomedical literature, including reproductive biology and clinical medicine. EMBASE was included because it has greater coverage of European and non-English language publications and of topics such as alternative medicine. China National Knowledge Infrastructure (CNKI), WanFang database and Chongqing VIP information (CQVIP) were included to ensure that no Asian publications were missed. Searches were conducted without restrictions with respect to publication year, language, type or setting of study, or accessibility of full-text articles. A combination of keywords and database-specific terms was used: (premature ovarian insufficiency OR premature ovarian failure OR diminished ovarian reserve OR poor ovarian response OR premature menopause OR hyper-gonadotropic hypogonadism OR elevated gonadotrophins OR triad of amenorrhea OR estrogen deficiency) AND (well-being OR health outcome OR quality-of-life OR health-related quality of life) AND (questionnaire OR instrument OR patient reported outcome). Strategies differed between databases depending on their information structures. The details of the different search strategies are provided in the online resource materials (Online Resource ESM_2). The process of article selection is outlined in Fig. 1 with a description of the predefined criteria for selection. One author (XT Li) was mainly responsible for screening the titles and abstracts. Articles identified were independently read and discussed with two more authors (HS Yang, PY Li) to ensure an unbiased selection. Some studies of post-menopause have used instruments such as the MSQOL [28, 29]; however, this is not a measure of subjective quality of life and was therefore not included in this review. No additional articles were identified through the manual search.
Studies describing the construction and validity of the HrQoL questionnaires used in the studies were also evaluated. If information on construction and validity was sparse, contact was attempted with the author responsible for the development of the questionnaire.
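
The keyword combination described in the search strategy can also be assembled programmatically, which helps keep the three concept groups consistent when the string is adapted for each database. The following is a minimal sketch only: the names `or_group` and `build_query` are hypothetical helpers introduced for illustration, and the database-specific syntax actually used is the one documented in Online Resource ESM_2, not this generic string.

```python
from typing import List

# Concept groups copied from the keyword combination given in the text.
CONDITION_TERMS = [
    "premature ovarian insufficiency", "premature ovarian failure",
    "diminished ovarian reserve", "poor ovarian response",
    "premature menopause", "hyper-gonadotropic hypogonadism",
    "elevated gonadotrophins", "triad of amenorrhea", "estrogen deficiency",
]
OUTCOME_TERMS = [
    "well-being", "health outcome", "quality-of-life",
    "health-related quality of life",
]
INSTRUMENT_TERMS = ["questionnaire", "instrument", "patient reported outcome"]


def or_group(terms: List[str]) -> str:
    """Join synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(groups: List[List[str]]) -> str:
    """Combine concept groups with AND (hypothetical helper, generic syntax)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query([CONDITION_TERMS, OUTCOME_TERMS, INSTRUMENT_TERMS])
print(query)
```

Field tags, truncation symbols and phrase quoting differ between databases, so a string built this way would still need adapting before use.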

Fig. 1 The article selection process and criteria for selection for the literature review and meta-analysis

Criteria to select articles

The inclusion criterion for empirical studies of adults with POI was that HrQoL was a primary or secondary outcome. Studies with participants from hospitals and long-term care facilities or with specific conditions (e.g. Turner syndrome or anorexia), or for which only abstracts were found, were included in the literature review in order to extract data on the questionnaires used but were excluded from the meta-analysis. No restrictions were placed on the geographic, socioeconomic or ethnic backgrounds of the participants. There was no restriction in terms of treatment; both randomized and non-randomized trials were included. Exclusion criteria for the systematic review were duplicate publications, reviews, and studies that did not include outcomes from a HrQoL questionnaire. Exclusion criteria for the meta-analysis were a lack of relevant data for investigation and the absence of a control group with normal ovarian function.
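
The two-stage selection logic described above (eligibility for the review, then stricter eligibility for the meta-analysis) can be written as a pair of predicates. This is an illustrative sketch: the `Study` record and its field names are assumptions made for the example and do not correspond to any data extraction form used by the authors.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative only."""
    hrqol_outcome: bool                   # HrQoL questionnaire outcome reported
    duplicate_or_review: bool = False     # duplicate publication or review article
    abstract_only: bool = False           # only the abstract could be obtained
    special_population: bool = False      # e.g. long-term care, Turner syndrome
    normal_ovarian_control: bool = False  # control group with normal ovarian function
    sufficient_data: bool = False         # data needed for the analysis are reported


def in_review(study: Study) -> bool:
    """Eligible for the systematic review."""
    return study.hrqol_outcome and not study.duplicate_or_review


def in_meta_analysis(study: Study) -> bool:
    """Eligible for the meta-analysis (a subset of the review)."""
    return (
        in_review(study)
        and not study.abstract_only
        and not study.special_population
        and study.normal_ovarian_control
        and study.sufficient_data
    )


abstract_study = Study(hrqol_outcome=True, abstract_only=True)
controlled_study = Study(hrqol_outcome=True, normal_ovarian_control=True,
                         sufficient_data=True)
print(in_review(abstract_study), in_meta_analysis(abstract_study))
print(in_review(controlled_study), in_meta_analysis(controlled_study))
```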

Critical appraisal: assessment of bias in the studies

The quality of eligible articles was assessed at the study level using the Newcastle–Ottawa Scale (NOS) for nonrandomized cohort studies [30]. Each article was awarded ‘stars’ (points): up to four for selection bias, two for comparability and three for bias in the outcome assessment, giving a maximum total score of nine points. The NOS total score was used to grade study quality: > 6, high; 4–6, medium; < 4, low [31]. The scoring system and evaluation are provided in Online Resource ESM_3. Two authors (XT Li, PY Li) independently evaluated the findings of each study to ensure an unbiased assessment.
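
The star allocation and the quality bands described above amount to a small scoring rule. The sketch below assumes the domain maxima (4, 2, 3) and the cut-offs (> 6 high, 4–6 medium, < 4 low) exactly as stated in the text; the function names are hypothetical.

```python
# Maximum stars per NOS domain, as stated in the text.
NOS_MAX = {"selection": 4, "comparability": 2, "outcome": 3}


def nos_total(selection: int, comparability: int, outcome: int) -> int:
    """Sum the domain stars after checking each against its maximum."""
    scores = {"selection": selection, "comparability": comparability,
              "outcome": outcome}
    for domain, stars in scores.items():
        if not 0 <= stars <= NOS_MAX[domain]:
            raise ValueError(f"{domain} must be between 0 and {NOS_MAX[domain]}")
    return sum(scores.values())


def nos_quality(total: int) -> str:
    """Map a total score (0-9) to the bands used in this review."""
    if not 0 <= total <= 9:
        raise ValueError("NOS total must be between 0 and 9")
    if total > 6:
        return "high"
    if total >= 4:
        return "medium"
    return "low"


print(nos_quality(nos_total(selection=3, comparability=1, outcome=2)))
```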

Meta-analysis

A meta-analysis investigated the outcome of HrQoL in patients with POI compared with a reference population with normal ovarian function. Review Manager (Version 5.3. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014) was used. The effect size was estimated, with its 95% confidence interval (95% CI), as the standardized mean difference (SMD) [32]. The SMD is used as a summary statistic in meta-analysis when the studies all assess the same outcome but measure it in a variety of ways [33]. Cohen [34] suggested that d = 0.2 be considered a ‘small’ effect size, 0.5 a ‘medium’ effect size and 0.8 a ‘large’ effect size. Heterogeneity among the combined studies was quantified with the I² statistic: 0% to 40%, might not be important; 30% to 60%, may represent moderate heterogeneity; 50% to 90%, may represent substantial heterogeneity; 75% to 100%, considerable heterogeneity [35]. If there was no heterogeneity among studies, a fixed effects model was applied for meta-analysis; if there was statistical heterogeneity, the sources of heterogeneity were further analyzed, and a random effects model was adopted for meta-analysis. Effect sizes were divided into subgroups according to the questionnaire used and the specific domain evaluated. This systematic review and meta-analysis were performed and reported according to the PRISMA guidelines. The PRISMA checklist is included as Online Resource_3.
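
To make the quantities named above concrete, the following sketch computes a standardized mean difference for one study and then pools several studies, reporting I² alongside the pooled estimate. It is an illustration under stated assumptions, not a reproduction of the Review Manager computation: it uses the textbook Hedges' g small-sample correction, inverse-variance weights and the DerSimonian–Laird estimate of between-study variance, and the function names and input numbers are hypothetical rather than data from the included studies.

```python
import math
from typing import List, Tuple


def hedges_g(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Return (SMD, variance) for group 1 versus group 2."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd              # Cohen's d
    j = 1 - 3 / (4 * df - 1)               # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def pool(effects: List[float], variances: List[float],
         random_effects: bool = False) -> Tuple[float, float, float, float]:
    """Inverse-variance pooling; returns (estimate, ci_low, ci_high, I2 in %)."""
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))  # Cochran's Q
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    if random_effects and df > 0:
        # DerSimonian-Laird between-study variance (an assumption of this sketch)
        c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c)
        weights = [1 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return estimate, estimate - 1.96 * se, estimate + 1.96 * se, i2


# Hypothetical (mean, SD, n) for a POI group and a control group in each study.
studies = [
    (62.0, 15.0, 60, 70.0, 14.0, 60),
    (58.0, 18.0, 45, 71.0, 16.0, 50),
    (65.0, 12.0, 80, 69.0, 13.0, 75),
]
results = [hedges_g(*s) for s in studies]
effects = [g for g, _ in results]
variances = [v for _, v in results]
print(pool(effects, variances))                       # fixed effects
print(pool(effects, variances, random_effects=True))  # random effects
```

The text does not state the I² value at which the random effects model was chosen over the fixed effects model, so the sketch leaves that choice to the caller.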

Results

Thirty-four studies matched the inclusion criteria and were included for review. Fifteen articles were related to treatment evaluation while 19 articles examined elements of HrQoL (Tables 1, 2). In five of these studies only the abstracts were available for examination [36,37,38,39,40]. These articles were all published between 2006 and 2018. Eighteen articles were cross-sectional studies [36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53], two of which included case–controls [43, 51]. One article reported only case–control data [54]. Nine articles described HrQoL among patients using the nomenclature POI [36, 39, 40, 42, 47, 49, 51,52,53] and ten articles described HrQoL among patients using the previous nomenclature POF [37, 38, 41, 43,44,45,46, 48, 50, 54]. Thirteen articles had control groups [39,40,41,42,43,44,45,46, 48, 49, 51, 53, 54]; nine of these had a control group of women with normal ovarian function [41,42,43,44,45,46, 51, 53, 54], and six of these had sufficient information to be included in the meta-analysis [41,42,43,44,45, 54]. None of the studies used proxy-reports from family members as part of the evaluation. Reported studies had varying sample sizes; the largest sample size was 340 women [46]. The studies were geographically diverse, including China [41, 44,45,46], UK [37, 38, 50], America [36, 39, 40, 42, 49, 51,52,53], Brazil [43, 54], Australia [48] and multi-national studies [47] (Fig. 1 and Tables 1, 2).

Table 1 Presentation of details of studies included in the systematic review and included in the meta-analysis
Table 2 Studies included in the systematic review not included in the meta-analysis due to insufficient data or non-normal ovarian function control group

Domains of HrQoL examined

The definition of HrQoL used in the studies is derived from the domains of the questionnaires used to measure HrQoL. Among the 19 articles examining HrQoL, seven studies included a measure of overall HrQoL, either measured by a generic questionnaire (SF-36, WHOQoL-BREF) [37, 43, 44, 50, 54] or measured in relation to fertility or sexual function [42, 45, 50, 54]. Nine studies focused on psychiatric aspects including depression and meaning in life [36, 38,39,40, 49,50,51,52,53]. Four articles used POI-related symptom questionnaires [38, 47, 48]. Only one of these [50] used a condition-specific instrument designed for POI (Young Menopause Assessment (YMA) [50]). One study evaluated an aspect of social function: perceived social support [53]. Reduced HrQoL among patients with POI was mentioned in all 19 articles. A summary of the studies is found in Tables 1, 2.

Overall HrQoL

Three articles described factors correlated with lower HrQoL in POI populations. One article reported that orgasm and sexual satisfaction were correlated with all QOL domains [54]. A second article analysed character traits of POI patients [45] and showed that older patients, those with primary infertility and those who had not had children had lower HrQoL scores than patients who were younger, had secondary infertility or had previously given birth. In one article [44], different Traditional Chinese Medicine (TCM) syndromes were considered as summaries of symptoms of the pathogenesis of disease development [55]. These syndromes included insufficiencies of liver and kidney or asthenia of both the spleen and kidney. It was noted that patients with deficiency of liver and kidney had the lowest overall QOL scores (Table 3).

Table 3 Studies included in the systematic review not included in the meta-analysis due to insufficient data and no control group

Physical function and symptoms

Physical health of the women with POI was consistently reported to be significantly lower than that of controls. A number of physical function symptoms were explored, including experience of physical pain [43], sexual function [42, 54] (arousal, lubrication, orgasm and satisfaction), and sexual behaviour/experiences [42, 50, 54]. In addition, menopause symptoms such as vasomotor symptoms, mood swings and mental fog, hair loss, dry eyes, cold intolerance, joint clicking, tingling in limbs and low blood pressure were found at a high rate in patients with POI [47].

Psychological function and psychosocial aspects

Women with spontaneous POI were reported to score adversely on all measures of psychological functioning [43, 51], with more negative feelings such as “blue mood” [56], despair, anxiety, and depression, and a negative impact on their self-image and confidence [50]. This population also had a high rate of mental health medication use and counselling [51] and a risk for depression [49]. Some articles analysed the factors related to these negative feelings. Adverse affective symptoms were associated with a lower perceived level of control [39]. One article reported illness uncertainty and lack of purpose in life as significant independent factors associated with anxiety [51]. Scores on the Spiritual Well-Being scale were also associated with POI and were found to decrease with increasing age [52].

Social function

Marital relationship and social support were reported to be significantly lower in POI patients [45]. Social relationships were found to have a negative influence on aspects of sexual function such as arousal, orgasm, satisfaction and pain [53, 54]. However, other articles reported no significant differences with respect to social relationships or support [43, 46].

Questionnaires

In total, twenty-three different questionnaires were used in the nineteen articles identified for review (Table 4). The most frequently used questionnaires were the two generic HrQoL instruments, the World Health Organization Quality of Life (WHOQoL-BREF) [62,63,64] and the 36-Item Short Form Survey from the RAND Medical Outcomes Study (SF-36) [65,66,67], which were used in five studies. Between 1 and 4 questionnaires were used in each study; 50% of the studies used only one questionnaire. The studies that used four concentrated on the psychological aspects of the condition, were mainly from the same research group at NIH in the US, and were reported in abstract form. Other studies combined generic questionnaires with condition-specific ones, e.g. sexual or menopause-specific questionnaires. Only one study [50] used a POI-specific questionnaire (Young Menopause Assessment (YMA) [Unpublished]). This was used in combination with a sexual function questionnaire (Sexual Personal Experiences Questionnaire (SPEQ) [73]), a psychological questionnaire (Rosenberg’s Self Esteem Questionnaire [74,75,76,77]) and a generic questionnaire (SF-36 [65,66,67]). All the HrQoL instruments used are described in Table 4; a more detailed summary of the six questionnaires used in the studies included in the meta-analysis can be found in Online Resource ESM_5.

Table 4 Questionnaires used in the studies included in the systematic review

Synthesis of results and risk of bias (results of meta-analysis)

Six studies were included in the meta-analysis [41,42,43,44,45, 54] (Fig. 2) with 645 POI participants and 492 normal-ovarian controls. Where data on average age was available the POI group had a pooled mean age of 33.3 ± 5.47; and the control group a pooled mean age of 32.87 ± 5.61.

Fig. 2 a Patients with POI compared with normal ovarian reference populations: overall health related quality-of-life (HrQoL). b Patients with POI compared with normal ovarian reference populations: physical functioning. c Patients with POI compared with normal ovarian reference populations: mental health. d Patients with POI compared with normal ovarian reference populations: social functioning

At the overall HrQoL level (Fig. 2a), four studies [42, 44, 45, 54] recorded a lower level of HrQoL in the POI group than in a normal ovarian control group (pooled SMD = − 0.73, 95% CI − 0.94, − 0.51; I² = 54%). The pooled heterogeneity can be considered moderate. To address the heterogeneity, a subgroup analysis (2 studies included) was performed to examine the measures of sexual functioning separately (Fig. 2a3) (SMD = − 0.78, 95% CI − 1.00, − 0.55; I² = 0%); the effect size was medium to large and there was no indication of heterogeneity. The largest effect size (large) was found for ‘overall HrQoL’ as measured by the SF-36 (− 0.93, 95% CI − 1.22, − 0.64).

The physical functioning aspects of HrQoL (Fig. 2b) were measured by four studies using nine different indicators. The results again showed a moderate pooled effect size and moderate heterogeneity (pooled SMD = − 0.54, 95% CI − 0.69, − 0.39; I² = 55%) as compared to a normal ovarian control group. The sexual function measures (2 studies included) accounted for the heterogeneity: taken alone, they showed substantial heterogeneity but a medium effect size (SMD = − 0.52, 95% CI − 0.70, − 0.34; I² = 64%). The largest effect size (moderate) was found for ‘Lubrication’ as measured by the FSFI (− 0.74, 95% CI − 1.06, − 0.42).

In the mental health area (Fig. 2c1, 2), the studies agreed that mental health was lower in the POI group than in the controls; however, the pooled effect size was small [1. SMD = − 0.43, 95% CI − 0.54, − 0.32; I² = 0% (higher score = better, Fig. 2c1); 2. SMD = 0.72, 95% CI 0.50, 0.95; I² = 0% (lower score = better, Fig. 2c2)]. The largest effect size (moderate) was found for ‘Optimism’ as measured by the TABP/TABC (− 0.64, 95% CI − 0.95, − 0.32).

The social functioning domain (Fig. 2d) was addressed by five of the six studies; the pooled effect size was small with no heterogeneity (pooled SMD = − 0.27, 95% CI − 0.38, − 0.15; I² = 0%). The largest effect size (moderate) was found for ‘Drive and relationship’ in the DISF (− 0.48, 95% CI − 0.78, − 0.17).
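
As a reading aid for the pooled results reported above, the sketch below back-calculates an approximate standard error from each reported 95% CI and labels the magnitude of each pooled SMD against Cohen's benchmarks (0.2, 0.5, 0.8). The SMD and CI values are those reported in this section; the helper names, the treatment of the benchmarks as lower bounds of each band, and the label "negligible" for values below 0.2 are assumptions of the example.

```python
Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate standard error implied by a symmetric 95% CI."""
    return (upper - lower) / (2 * Z_95)


def cohen_label(smd: float) -> str:
    """Label |SMD| using Cohen's benchmarks as lower bounds of each band."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed for this sketch


# Pooled SMD, CI lower bound, CI upper bound, as reported in the text.
POOLED = {
    "overall HrQoL": (-0.73, -0.94, -0.51),
    "physical functioning": (-0.54, -0.69, -0.39),
    "mental health (higher score = better)": (-0.43, -0.54, -0.32),
    "social functioning": (-0.27, -0.38, -0.15),
}

for domain, (smd, lower, upper) in POOLED.items():
    se = se_from_ci(lower, upper)
    print(f"{domain}: SMD {smd:+.2f}, SE ~ {se:.3f}, "
          f"z ~ {smd / se:.2f}, {cohen_label(smd)}")
```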

Ji [44, 45] calculated a total QoL score for the SF-36. No information is given on how this was calculated. For a discussion of this issue, see Lins and Martins Carvalho (2016), https://doi.org/10.1177/2050312116671725.

Discussion

Nineteen studies reported the empirical measurement of HrQoL among patients with POI. Reports of the impact of POI on different aspects of HrQoL differed between studies. However, impaired physical, psychological and general health was reported across all areas of HrQoL. There were no articles prior to 2006 and studies used a variety of HrQoL instruments both generic and condition specific although only one measure was specially designed for POI [50]. Although subjective experiences of patients with POI have received more attention from the medical profession in the past decade, relevant and valid evaluation instruments have not been developed, and long-term follow-up studies of HrQoL have not been carried out.

The six controlled studies included in the meta-analysis demonstrated that overall HrQoL in patients with POI/POF is lower than in individuals with normal ovarian functioning, with low to medium pooled effect sizes [41,42,43,44,45, 54]. The moderate heterogeneity in the general measure of HrQoL appears to be due to the different concepts being measured under the term HrQoL. It may also arise from the different socioeconomic groups included in the various studies. Information on socioeconomic status was sparsely reported and it was not possible for us to assess the influence of this moderator.

The finding that studies concerning HrQoL in relation to POI were not found prior to 2006 may be related to the fact that the definition of POI had not been standardized. Recent guidelines from the European Society of Human Reproduction and Embryology, published in 2015 [2], coincide with the beginning of investigations into HrQoL in POI. However, some variation in diagnostic criteria is evident. Some studies used broader age intervals, and the levels of follicle-stimulating hormone (FSH), which is a very important indicator for the diagnosis of POI [2], were reported vaguely. This may lead to heterogeneity of the results.

The factors measured in the six studies in the meta-analysis varied and included fertility, sexual function, anxiety, depression and menopausal symptoms. Although all the measurements were cross-sectional, the concepts measured could all be considered to have long-term effects and would vary according to, for example, age at diagnosis, marital status or education. In one study [45], an association was investigated between personal character traits and the impact of POI; this highlighted the patient’s response to the stress of a POI diagnosis and of living with the condition.

Geographical diversity is apparent from our review. Studies were found in five countries and included one multi-national study [47]. Studies taking a cross-cultural perspective were not conducted. This highlights the possibility of cultural bias in the results [103]. The sparsity of such studies may be due to the lack of a single agreed and validated condition-specific instrument translated into multiple languages. In addition, despite substantial clinical studies on the use of traditional medicine with this condition, there is a lack of controlled studies that can be used as evidence of treatment effects.

The large number of instruments used (23) in 19 studies, with a very low repetition rate, indicates that there is no common view concerning instruments. In some studies, generic instruments were used to address a comprehensive array of domains of QoL; however, this focus may have limited the sensitivity to detect subtle aspects of POI. It is interesting to speculate on what we did not find, which was the patient perspective. The instrument designed for POI by Singer [50] for their study was based on ‘clinical experience’ and covered the areas of ‘About your POF/young menopause’, ‘Treatment’, and ‘Information and Support’. For many patients, there are concerns about the implications of the treatment and of possible long-term side effects which might be more meaningful to the patient [104, 105], and yet these aspects were not investigated. Some studies chose questionnaires that are specific for similar conditions such as menopause or infertility; however, even though the symptoms may be similar, the patients’ experiences and requirements may not be the same [47, 48, 54]. It must also be considered that these questionnaires may not be sensitive to all patients with POI. Although the majority of the questionnaires used to measure HrQoL in these studies had good psychometric properties, none of them had evidence to confirm the sensitivity and specificity of the instruments in relation to POI. There were ten studies [36, 39, 40, 46, 47, 50,51,52,53,54] that used a combination of questionnaires to capture more comprehensive information. However, mood, symptom, and fertility questions specific for women with POI were lacking [47, 50].

Strengths and limitations

Some limitations of the study need to be taken into consideration. It is possible that some studies have been missed because they used different terms for POI or were published in languages not covered by the databases we examined. Some studies were only published as abstracts and, although we tried to contact these researchers, we were unable to obtain more information. Our study has the strength of including both European and Asian databases. The databases that were searched are those that have the highest likelihood of finding studies of HrQoL and POI.

Conclusion and future recommendations

This literature review and meta-analysis gives new information on HrQoL in patients with POI. In this review, the magnitude of the subjective effects is found to vary, with effect sizes between low and medium. The largest effect sizes were found in the area of sexual function and general HrQoL. Cross-cultural approaches and international collaboration were found in only one study. Additional studies are recommended that make stratified comparisons of patients, use larger sample sizes to identify real changes in outcomes, and include long-term follow-up, in order to provide sufficient information for evidence-based clinical practice decisions. Future research should focus on developing condition-specific and sensitive assessments of the effect of POI based on the patient perspective. This can be achieved through focus groups with the aim of achieving a broader understanding of the outcome domains that are relevant to this population.