Background

Polycystic ovary syndrome (PCOS) is a heterogeneous endocrine disorder affecting approximately 6–20% of women of reproductive age [1,2,3]. PCOS has multiple diagnostic features and is accompanied by several comorbidities, including hyperandrogenism, hirsutism, acne, amenorrhea or oligomenorrhea, insulin resistance, obesity, hormonal imbalances affecting reproductive ability (including luteinizing hormone, LH), hepatic lipid deposition, hypertension, an increased risk of miscarriage, and polycystic ovarian morphology on ultrasound [3,4,5,6,7].

In women, androgens are produced by the ovaries and adrenal glands, which account for 40–50% of the body's testosterone, and they play a vital role in regulating fertility. Studies have shown that tumors and PCOS are the main causes of elevated androgen levels, regardless of age [8].

Hyperandrogenism is the hallmark of PCOS. Therefore, polymorphisms in genes involved in androgen action, including the androgen receptor (AR) gene, are considered to contribute to the development of PCOS [9]. The N-terminal domain of AR, encoded by exon 1, harbors a CAG repeat of variable length that codes for a polyglutamine (poly-Gln) tract [10,11,12,13,14].

The number of CAG repeats normally ranges from 8 to 35 and is stably transmitted from parents to offspring [15]. In vitro studies have shown that the shorter the poly-Gln tract in AR, the greater its ability to activate reporter genes containing androgen-responsive elements [16].

The AR protein comprises four functional domains: the N-terminal domain (NTD), encoded by exon 1, which is responsible for AR transactivation activity and contains polyglutamine stretches encoded by 11–36 CAG repeats and polyglycine stretches encoded by GGC triplets; the DNA-binding domain (DBD); the flexible hinge region; and the ligand-binding domain (LBD) [17]. Previous studies found an inverse correlation between CAG repeat number and AR activity: shorter CAG repeats increase AR activity and are associated with hirsutism and ovarian hyperandrogenism, while longer CAG repeats lower AR activity, causing hypoandrogenicity and male infertility [18].

This study, the first of its kind in Iraqi women, was designed to identify potentially pathogenic SNPs within AR exon 1 that may contribute to PCOS and to assess the role of CAG repeat number in AR activity.

Methods

Study approval

This research was reviewed and approved by the IRB, Scientific Committee, and Ethics Committee of the Biotechnology Research Center, Al-Nahrain University, under reference No. 175.

Participants

Inclusion criteria

All participants were 14–40 years old and attended the fertility clinic of Kamal Al-Samarie Hospital in Baghdad between May 2020 and August 2021. A total of 150 PCOS patients and 100 age-matched controls were recruited for this study after giving written and verbal consent. PCOS was diagnosed according to the Rotterdam criteria [5], which require at least two of the following: signs of hyperandrogenism, oligo- or anovulation, and polycystic ovaries. Control women had a normal menstrual cycle (26–35 days), normal fertility hormone levels, and normal ovarian morphology. They attended the fertility clinic for reasons unrelated to PCOS or because of male-factor infertility.
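The Rotterdam rule used to assign cases (at least two of three features) and the cycle-length requirement for controls can be written as a small classifier. This is a minimal sketch for illustration only: the `Participant` record and the function names are hypothetical, and the exclusion of other causes of androgen excess and anovulation, which the full Rotterdam criteria also require, is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical record of the three Rotterdam features for one woman."""
    hyperandrogenism: bool    # clinical or biochemical signs
    oligo_anovulation: bool   # oligo- or anovulation
    polycystic_ovaries: bool  # polycystic ovarian morphology on ultrasound


def meets_rotterdam(p: Participant) -> bool:
    """True when at least two of the three Rotterdam features are present.

    Sketch only: exclusion of other disorders that mimic PCOS is not modelled.
    """
    features = (p.hyperandrogenism, p.oligo_anovulation, p.polycystic_ovaries)
    return sum(features) >= 2  # booleans count as 0/1


def is_normal_cycle(cycle_length_days: int) -> bool:
    """Control requirement used in this study: a cycle of 26-35 days."""
    return 26 <= cycle_length_days <= 35


if __name__ == "__main__":
    print(meets_rotterdam(Participant(True, False, True)))
    print(is_normal_cycle(28))
```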

Exclusion criteria

Women with ovarian tumors or ovarian cancer, those who had undergone ovarian cyst excision, and those who had received fertility hormone injections for an IVF superovulation program were excluded from this study. The research steps are illustrated in Fig. 1.

Fig. 1

Flowchart showing research steps, participants, and main goal of this work

Measurement of clinical, hormonal, and anthropometric parameters

The anthropometric variables of all participants, including age, height, weight, and menstrual cycle characteristics, were recorded. Levels of fertility hormones, including follicle-stimulating hormone (FSH), luteinizing hormone (LH), oestradiol (E2), progesterone (P), total testosterone (TT), prolactin, and anti-Müllerian hormone (AMH), were assayed in the hospital's clinical laboratory by enzyme-linked immunosorbent assay (ELISA) and chemiluminescence immunoassay (CLIA). Hormone levels were measured during the follicular phase, on days 3 and 8 of the menstrual cycle.
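BMI, reported in Table 1, is derived from the recorded weight and height as weight (kg) divided by the square of height (m). The sketch below assumes metric units and adds the WHO adult cut-offs purely as an illustrative assumption; the study does not state which categories it used, and for adolescent participants age- and sex-specific BMI percentiles would normally be used instead.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_adult_category(value: float) -> str:
    """WHO adult BMI classes (assumed cut-offs; not defined in this study)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    value = bmi(70, 1.65)  # hypothetical participant: 70 kg, 1.65 m
    print(round(value, 1), who_adult_category(value))
```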

Molecular assay for exon 1 analysis

DNA extraction

Blood lymphocytes were isolated and purified using Lymphoprep (Nycomed Pharma, Oslo, Norway) and stored in saline at −20 °C. DNA was extracted from these cells using the Genaid Total DNA extraction kit (Macrogen, Korea). DNA purity and concentration were measured using a Techne NanoDrop system (England), and samples were kept at −20 °C until further processing.
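The NanoDrop reports concentration directly, but the underlying arithmetic is simple: for double-stranded DNA, an absorbance of 1.0 at 260 nm (normalised to a 1 cm path) corresponds to roughly 50 ng/µL, and an A260/A280 ratio near 1.8 indicates DNA largely free of protein. The sketch below assumes these standard values and an acceptance window of 1.7–2.0 (an assumption, not a threshold stated in this study); it also computes the volume of extract needed for the 100 ng PCR template. The function names are hypothetical.

```python
# Standard conversion for double-stranded DNA: A260 of 1.0 (1 cm path) ~ 50 ng/uL.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Approximate dsDNA concentration (ng/uL) from the 260 nm absorbance."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ok(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Check the A260/A280 ratio against an assumed acceptance window (~1.8 = pure DNA)."""
    ratio = a260 / a280
    return low <= ratio <= high


def template_volume_ul(conc_ng_per_ul: float, target_ng: float = 100.0) -> float:
    """Volume of extract (uL) that delivers the target template mass for one PCR."""
    return target_ng / conc_ng_per_ul


if __name__ == "__main__":
    conc = dsdna_concentration(0.8)  # hypothetical reading
    print(conc, purity_ok(0.8, 0.44), template_volume_ul(conc))
```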

PCR amplification and characterization of CAG repeats

CAG repeats in AR exon 1 were characterized by the number of repeat units, following a previously published technique [19]. Four primer pairs were designed from the NCBI sequence with accession number NG_009014.2 to amplify the entire exon 1 using a genome walking technique, with the following sequences (5′–3′): AR-1-Forward GTGCTGGACACGACAACAAC and AR-1-Reverse CTCATTCGGACACACTGGCT (product size 307 bp); AR-2-Forward TAGGGCTGGGAAGGGTCTAC and AR-2-Reverse GCTGTTGCTGAAGGAGTTGC (573 bp); AR-3-Forward GCAACTCCTTCAGCAACAGC and AR-3-Reverse GCCTTCTAGCCCTTTGGTGT (394 bp); and AR-4-Forward GGTGAGCAGAGTGCCCTATC and AR-4-Reverse GTTGTTGTCGTGTCCAGCAC (556 bp). Genomic DNA (100 ng) was amplified by PCR using the following protocol: initial denaturation at 94 °C for 5 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 56 °C for 30 s, and extension at 72 °C for 30 s; and a final extension at 72 °C for 10 min. Successful amplification was confirmed by electrophoresis at 10 V/cm for 90 min, and the resulting amplicons were sequenced by the Sanger method (Macrogen, Korea).
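As a quick check of the primer set listed above, the sketch below computes GC content, a rough Wallace-rule melting temperature (2 °C per A/T plus 4 °C per G/C), and reverse complements. The Wallace rule is only a rule of thumb for short oligonucleotides, and nearest-neighbour methods give better estimates; these values are illustrative and were not reported in the study.

```python
PRIMERS = {
    "AR-1-Forward": "GTGCTGGACACGACAACAAC",
    "AR-1-Reverse": "CTCATTCGGACACACTGGCT",
    "AR-2-Forward": "TAGGGCTGGGAAGGGTCTAC",
    "AR-2-Reverse": "GCTGTTGCTGAAGGAGTTGC",
    "AR-3-Forward": "GCAACTCCTTCAGCAACAGC",
    "AR-3-Reverse": "GCCTTCTAGCCCTTTGGTGT",
    "AR-4-Forward": "GGTGAGCAGAGTGCCCTATC",
    "AR-4-Reverse": "GTTGTTGTCGTGTCCAGCAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: GC={gc_fraction(seq):.0%}, Wallace Tm={wallace_tm(seq)} C")
```

Run on the primers above, the reverse-complement helper also shows that AR-4-Reverse is the exact reverse complement of AR-1-Forward, and AR-3-Forward that of AR-2-Reverse, so each of these pairs targets the same site on opposite strands.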

Software and data analysis

Sequence data were analyzed using the BLAST tools (BLAST against RefSeq, BLASTX, and BLASTP) available at https://www.ncbi.nlm.nih.gov; BLAT, genome alignment, and TBLASTN at http://asia.ensembl.org; Phyre2 at http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index; and the standalone software DnaSP 6. All data were statistically analyzed using SPSS software (version 12; SPSS Inc., Chicago, IL, USA).

Results

Measurement of hormones

Data were recorded from 150 PCOS cases and 100 controls aged 14–40 years. Table 1 compares the endocrine and anthropometric parameters of the two groups. PCOS patients had a higher body mass index (BMI) and higher prolactin (PRL), luteinizing hormone (LH), progesterone (P), oestradiol (E2), total testosterone (TT), and anti-Müllerian hormone (AMH) levels than controls, whereas follicle-stimulating hormone (FSH) levels were significantly lower (p < 0.05). Age did not differ significantly between the groups.

Table 1 Hormonal and anthropometric data of PCOS patients and controls

Clinical, biochemical, and anthropometric characteristics

Table 2 lists the biochemical and clinical characteristics of the 150 women diagnosed with PCOS according to the criteria of the Rotterdam ESHRE/ASRM-Sponsored PCOS Consensus Workshop Group [5]. Multiple phenotypic and biochemical parameters were recorded and compared with those of the control group.

Table 2 Clinical and biochemical presentation of women with PCOS and non-PCOS control groups

Molecular analysis of AR exon 1

Determination of amplicon sequence characteristics

Amplicon sequences obtained from patients and controls were aligned using the BLAST tool available at http://asia.ensembl.org/ to determine their exact location, length, and position within exon 1. The results showed that the sequences covered X:67545209–67546114, distributed as follows: X:67545209–67545742, X:67545503–67545739, X:67545763–67546114, X:67545758–67546115, and X:67545760–67546114. Further analysis showed that the first two regions are prone to DNA changes, since multiple transitions and transversions were found within them, with significant effects on the amino acid sequence and protein configuration. Electrophoresis of the PCR amplicons is shown in Fig. 2.
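The fragment coordinates reported above can be merged to obtain the total region covered. This minimal sketch assumes the coordinates are 1-based and inclusive and all lie on chromosome X; it merges overlapping or directly adjacent intervals. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]

# Fragment coordinates on chromosome X as listed in the text (assumed 1-based, inclusive).
FRAGMENTS: List[Interval] = [
    (67545209, 67545742),
    (67545503, 67545739),
    (67545763, 67546114),
    (67545758, 67546115),
    (67545760, 67546114),
]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent inclusive intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of bases covered by at least one interval."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


if __name__ == "__main__":
    for start, end in merge_intervals(FRAGMENTS):
        print(f"X:{start}-{end}")
    print("covered bp:", covered_bp(FRAGMENTS))
```

Applied to the five intervals listed above, the union consists of two blocks, X:67545209–67545742 and X:67545758–67546115 (892 bp in total), separated by a 15 bp stretch that none of the listed fragments covers.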

Fig. 2

DNA electrophoresis of PCR amplicons from patients with PCOS and controls. A shows amplification with primer pair AR-1 in lanes 1, 2, 4, and 5 and AR-3 in lanes 6 and 7; lanes 9 and 10 show amplicons from the control group, and empty lanes are negative controls. B shows amplification with primer pair AR-2 in lanes 2, 3, and 4 and AR-4 in lanes 5, 6, and 7; lanes 9 and 11 show amplicons from the control group, and empty lanes are negative controls. M is a 100 bp DNA marker

Effect of DNA change on amino acid sequence

In the present study, PCR amplicons from all patients were sequenced to identify changes affecting the primary structure of the protein encoded by exon 1. We found seven SNPs affecting the amino acid sequence in regions X:67545209–67545742 and X:67545503–67545739. These changes are listed in Table 3.

Table 3 Location, genetic code change, and effect on amino acid sequence in patients with PCOS

None of these SNPs was detected in the controls or in the consensus sequence. In silico analysis indicated that the SNPs at positions X:67545218 and X:67545264 might be pathogenic, whereas the remaining SNPs were likely benign. This prediction was based on the effect of each SNP on the encoded amino acid and, in further analysis with Phyre2, on protein secondary structure, folding, and disorder.
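Whether a single-nucleotide change alters the encoded amino acid, and whether it is a transition or a transversion, can be read from the standard genetic code. The sketch below is illustrative only: it does not reproduce the specific variants in Table 3, positions are 0-based within a codon, and the helper names are hypothetical. It also shows why a CAG to CAA change is silent, since both codons encode Gln.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Transition (purine<->purine or pyrimidine<->pyrimidine) or transversion."""
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def codon_effect(ref_codon: str, pos: int, alt_base: str) -> str:
    """Effect of replacing the base at 0-based position `pos` of `ref_codon`."""
    alt_codon = ref_codon[:pos] + alt_base + ref_codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return f"missense {ref_aa}->{alt_aa}"


if __name__ == "__main__":
    print(codon_effect("CAG", 2, "A"), substitution_type("G", "A"))  # CAG -> CAA
    print(codon_effect("CAG", 1, "G"), substitution_type("A", "G"))  # CAG -> CGG
```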

Figure 3 shows chromatograms of SNPs detected in PCOS patients during DNA sequencing.

Fig. 3

Chromatograms of PCR amplicon sequencing showing DNA change in patients with PCOS compared to control. The upper chromatogram represents normal sequence obtained from control, while the lower chromatogram shows SNPs detected in PCOS patients

Effect of amino acid substitution on protein configuration

Translation of mRNA transcripts is a critical step in producing a fully functional protein. A change in the genetic code carried by the mRNA may not alter the final protein, owing to codon redundancy or to substitution by an amino acid with similar properties. In our patients, however, some amino acid substitutions, particularly those involving residues with different properties, altered the secondary structure of the protein, affecting both α-helices and β-strands. Secondary structure prediction with Phyre2, using the DNA sequences obtained from PCOS patients, showed marked changes compared with controls, as shown in Fig. 4.

Fig. 4

Predicted secondary structure changes in PCOS patients compared with controls. A shows the effect of amino acid substitution in patients with PCOS compared with the control (B) for sequence X:67545209–67545742; C shows the alteration in protein secondary structure in PCOS patients compared with the control (D) for sequence X:67545503–67545739. Data were generated using the Phyre2 protein analysis software

Comparison of the predicted secondary structures of patients and controls showed changes in α-helix length and conversion from α-helix to β-strand at twice the ratio (A compared with B), whereas the most obvious change in C compared with D was conversion from α-helix to β-strand at multiple locations. The parameters altered by amino acid substitution (AAS) are listed in Table 4.

Table 4 Parameters related to secondary protein structure generated by Phyre2 software

The prediction distinguishes three states: α-helix, β-strand, and coil. Each part of the protein is colored according to its fold type: α-helices in green, β-strands as blue arrows, and coils as faint lines. Prediction confidence is inversely related to disorder and is color-coded from high (red) to low (blue). The secondary structure predictions had an accuracy of 78–80%, and the average disorder prediction was 30%.
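The state percentages compared in Table 4 amount to counting residues in each of the three predicted states. This sketch assumes the prediction has been exported as a string with one letter per residue (H for helix, E for strand, C for coil), a hypothetical format rather than Phyre2's own output, and that patient and control predictions are aligned residue by residue.

```python
from collections import Counter
from typing import Dict, List

STATES = {"H": "alpha-helix", "E": "beta-strand", "C": "coil"}


def composition(ss: str) -> Dict[str, float]:
    """Percentage of residues predicted in each of the three states."""
    if not ss:
        raise ValueError("empty prediction")
    unknown = set(ss) - set(STATES)
    if unknown:
        raise ValueError(f"unexpected state letters: {sorted(unknown)}")
    counts = Counter(ss)
    return {name: 100.0 * counts[code] / len(ss) for code, name in STATES.items()}


def helix_to_strand_positions(control: str, patient: str) -> List[int]:
    """1-based positions predicted as helix in the control but strand in the patient."""
    if len(control) != len(patient):
        raise ValueError("predictions must be aligned and of equal length")
    return [
        i + 1
        for i, (c, p) in enumerate(zip(control, patient))
        if c == "H" and p == "E"
    ]


if __name__ == "__main__":
    control = "CCHHHHHHCC"  # hypothetical predictions, not study data
    patient = "CCHHEEEHCC"
    print(composition(control))
    print(composition(patient))
    print(helix_to_strand_positions(control, patient))
```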

Analysis of CAG repeat length

The number of CAG repeats was determined in patients with PCOS and in controls. We propose that, with a higher number of CAG repeats, AR becomes more sensitive to testosterone, even with slight changes in the number of these stretches, which might play a significant role in the syndrome. The detected stretches are shown in Fig. 5.
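Counting CAG repeats in a sequenced amplicon means reading it codon by codon in the correct frame. Because a glutamine tract can be interrupted by the synonymous codon CAA, the sketch below reports both the longest uninterrupted CAG run and the total numbers of CAG and CAA codons. It assumes the reading frame is known; the function names and the toy sequence are hypothetical.

```python
from typing import Dict, List


def codons(seq: str, frame: int = 0) -> List[str]:
    """Split a sequence into complete codons starting at the given frame (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def longest_codon_run(seq: str, codon: str = "CAG", frame: int = 0) -> int:
    """Length of the longest uninterrupted run of `codon` in the given frame."""
    best = current = 0
    for c in codons(seq, frame):
        current = current + 1 if c == codon else 0
        best = max(best, current)
    return best


def gln_codon_counts(seq: str, frame: int = 0) -> Dict[str, int]:
    """Total CAG and CAA codons (both encode Gln) in the given frame."""
    cs = codons(seq, frame)
    return {"CAG": cs.count("CAG"), "CAA": cs.count("CAA")}


if __name__ == "__main__":
    toy = "ATG" + "CAG" * 5 + "CAA" + "CAG" * 2 + "TAA"  # hypothetical sequence
    print(longest_codon_run(toy), gln_codon_counts(toy))
```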

Fig. 5

CAG stretches detected in controls and PCOS patients. The figure shows the difference in the number of CAG repeats between the two groups

Because the difference in CAG repeat number between patients and controls was not significant, we examined whether the reason might be a difference in codon usage between PCOS patients and controls. The results are listed in Table 5.

Table 5 Codon usage in PCOS patients and control

Table 5 shows that two synonymous codons encode Gln: CAG, the codon of interest, and CAA. In patients with PCOS, the two codons were used almost equally (observed values), whereas the expected values should be constant. However, the relative synonymous codon usage (RSCU) values show reduced CAG usage in patients compared with controls, indicating that CAG was used less often to encode glutamine, with a shift toward the CAA codon compensating for this decrease.
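RSCU compares the observed count of each codon with the count expected if all synonymous codons for that amino acid were used equally: RSCU = X_ij / ((1/n_i) Σ_j X_ij), where X_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons (two for Gln). Values above 1 indicate preferential use, and values below 1 indicate avoidance. This is a minimal sketch of that standard definition; the counts in the example are hypothetical, not the values in Table 5.

```python
from typing import Dict


def rscu(counts: Dict[str, int]) -> Dict[str, float]:
    """RSCU of each synonymous codon for a single amino acid.

    `counts` must list every synonymous codon of that amino acid (CAG and CAA
    for Gln). RSCU = observed count / mean count across the synonymous codons.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons observed")
    expected = total / len(counts)
    return {codon: n / expected for codon, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts, not the values in Table 5.
    print(rscu({"CAG": 30, "CAA": 10}))
    print(rscu({"CAG": 12, "CAA": 12}))
```

For example, counts of 30 CAG and 10 CAA give RSCU values of 1.5 and 0.5, while equal counts give 1.0 for both codons.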

Discussion

Previous studies have reported that androgens, acting through the androgen receptor, play critical roles in regulating the expression of downstream genes that control the stages of follicular development [20, 21], and that nucleotide alterations [22, 23] or altered expression of AR may promote the progression of PCOS [24, 25].

PCOS presents with phenotypic features such as hirsutism, overweight, and obesity, and with biochemical changes such as hormonal imbalance. In our investigation, patients with PCOS showed the typical phenotypic and biochemical features of the syndrome (Tables 1 and 2), including significantly elevated prolactin and LH levels, oligomenorrhea or amenorrhea, infertility, and high testosterone levels, the last reflecting hyperandrogenism, which is considered the main feature of this syndrome. Obese women without other reproductive disorders are generally fertile, but an elevated BMI may be associated with a higher risk of infertility. Obesity is associated with ovulatory, menstrual, and fertility disorders and exacerbates the hormonal and clinical features of PCOS [26,27,28]. Thus, hormonal imbalances causing certain physical symptoms can be associated with the pathogenesis of PCOS [29, 30]. Furthermore, in many PCOS patients, estrogen dominance results from the conversion of excess testosterone to estrogen; such elevation may cause increased fat accumulation and water retention, leading to obesity [31].

Polymorphisms in AR CAG repeat length have been a central focus of research into the etiology of PCOS. Many studies have examined the relationship between genetic changes, including CAG repeat polymorphisms in the AR gene, and PCOS, with inconsistent results [9, 18]. In this study, we took a different approach, analyzing the effect of polymorphisms on the development of PCOS in terms of the relative sensitivity of the AR gene to the usage of the synonymous codons that encode Gln. Previous reports [19, 32] associated shorter CAG repeats with higher AR sensitivity, resulting in hyperandrogenism. However, this assumption has not been fully confirmed, since it was derived from X chromosome inactivation studies in which no significant difference was found between PCOS patients and controls [16]; likewise, we found no significant difference in CAG repeat number between the two groups in this study. Moreover, the number of repeats in Iraqi women, both patients and controls, was higher than previously reported [33, 34], which might be an ethnic characteristic of these women. In addition, multiple SNPs were identified along exon 1, which controls AR activity and harbors most of the CAG repeats [16, 35]. Of these seven SNPs, two changed the physicochemical properties of the encoded amino acids, affecting the isoelectric point (pI) of the protein, while the others reduced the number of CAG stretches. Their effects also extended to the secondary structure, which might affect the final form of the protein. Most of the effect was observed in sequence X:67545503–67545739, whereas sequence X:67545209–67545742 was mostly affected by changes in CAG repeats.

A key observation of this study is that CAG repeat numbers in both PCOS patients and controls were higher than previously reported [16, 33]. Although the difference between the two groups was not significant, an increased number of these stretches appears to render AR more sensitive to any slight change. Further support came from the codon usage analysis, which showed a significant difference in CAG usage between patients and controls. This reduced CAG usage appears to be compensated by the synonymous codon CAA. At the molecular level, such a shift in codon usage can produce a variant mRNA, which may alter nonsense-mediated mRNA decay (NMD) [36, 37].

Given this possible explanation for the higher AR sensitivity in women with PCOS, androgen-blocking drugs may be a promising treatment to reduce the impact of this condition and may improve these women's fertility.

Conclusion

AR exon 1 appears to be highly vulnerable to genetic alteration, characterized by effective SNPs at specific locations in the exon and by changes in CAG repeat number. Because the exon harbors a high content of CAG repeats, it becomes more sensitive to any change in their number, and a decrease in CAG usage appears to be compensated by the synonymous codon CAA. The mechanism underlying this switch in Gln codon usage from CAG to CAA in AR exon 1 may be related to upstream control elements, which we recommend for further study.