Background

Recent progress in the detection of molecular genetic defects has led to a major development in the diagnosis and treatment of diseases. Decoding the human genome has provided important clues about the genetic diversity of diseases and paved the way for the development of more specialized prevention, diagnostic and therapeutic strategies. By using high-throughput technologies, next-generation sequencing (NGS) generated a significant amount of genomic data, which has been widely used over the past decade [1].

NGS can generally be used to sequence genes regardless of their size and complexity and can cover all parts of the genome. This broad coverage has improved the sensitivity of mutation detection beyond that of conventional approaches. The causative variants of many single-gene disorders have now been identified by NGS-based methods. At the clinical level, however, identifying the effect of genetic variants on cell function and pathogenesis is extremely important. Thus, various software and web-based bioinformatics tools have been designed for variant evaluation [2].

Hereditary cardiomyopathies include a group of diseases that involve the heart muscle [3]. Their most common complications comprise the thickening of the heart muscle or dilation of the ventricles, which lead to hypertrophic (HCM) and dilated (DCM) cardiomyopathies, respectively [4]. Importantly, the patients may be asymptomatic or have mild non-specific symptoms. For this reason, heart failure can progress to sudden cardiac arrest in a seemingly healthy individual. Since cardiomyopathies run in families, rapid and accurate molecular diagnosis can be of great value to prevent the disease progression in individuals with a positive family history [5].

One of the genes associated with cardiomyopathies is the myosin heavy chain gene (MYH7), mutations of which are reported in 14–25% of all cardiomyopathy cases [6]. The MYH7 gene is located at chromosomal position 14q11-12 and consists of 40 exons. The myosin heavy chain (MyHC) protein is almost exclusively expressed in heart muscle and, together with the myosin light chains, contributes to the formation of thick filaments in a hexamer format. The protein has 1934 amino acids and consists of two spherical heads followed by an extended α-helical myosin rod tail, which are joined at the neck region [7].

Because early studies of cardiomyopathies commonly reported mutations in the head region, the importance of the rod tail region is often underestimated.

Because conventional analysis of the MYH7 gene is time-consuming and costly, regional studies have been limited to analyzing exons in the MyHC head domain. Consequently, they have had little success in mutation detection.

The Iranian Genome Database (Iranome) provides genomic information on 800 individuals regardless of their disease or health status [8, 9]. The distribution of reported variants could help to predict the occurrence of mutations in related pathological conditions, since variants reported in Iranome can be assumed to occur in the corresponding patients as well. Although this is not a straightforward link, it can be a key to predicting pathogenic mutations. We therefore aimed to perform further bioinformatics studies of MYH7 variants based on the Iranome database. The objective of our study was to identify variants that could be disease-causing. Once these variants are identified, clinical validation studies can focus on the exons that are more likely to carry mutations in the Iranian population.

Results

An analysis of the Iranome database revealed a total of 235 variants in the MYH7 gene, of which 161 (68.5%) were predicted to be intronic (Fig. 1). Among coding variants, synonymous alterations were the most frequent (17.4%, N = 41). Missense substitutions accounted for 18 (7.7%) of all reported changes. As indicated in Fig. 1, the remainder included 3′ UTR (1.7%, N = 4), frameshift (0.4%, N = 1), splice region (1.7%, N = 4), and nonsense (0.9%, N = 2) variants.
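The percentages above can be recomputed from the counts given in the text. The following is a minimal sketch that assumes only those counts; the function and variable names are illustrative. Note that the categories itemized in the text sum to 231 of the 235 variants, so the remaining four are not broken down here.

```python
# Variant counts by category, as listed in the Results text.
# Assumption: percentages are taken over all 235 reported MYH7 variants.
TOTAL_VARIANTS = 235

counts = {
    "intronic": 161,
    "synonymous": 41,
    "missense": 18,
    "3' UTR": 4,
    "splice region": 4,
    "nonsense": 2,
    "frameshift": 1,
}


def percent_of_total(n, total=TOTAL_VARIANTS):
    """Share of all reported variants, rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = {category: percent_of_total(n) for category, n in counts.items()}

for category, share in shares.items():
    print(f"{category}: N = {counts[category]} ({share}%)")
```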

Fig. 1

The contribution of different MYH7 gene changes is indicated in the chart

When variants were analyzed based on the exon-intron distribution, it was found that intron 22 had the highest rate of changes. The synonymous alterations were located almost uniformly in all exons and two nonsense changes were reported in exons 3 and 33. Interestingly, the missense variants were mostly observed in exons 20–40 that encode MyHC α-helical rod tail (Fig. 2).

Fig. 2

Distribution of variants according to their exonic-intronic positions

MYH7 missense variants

To identify the variants that could be considered pathogenic in the Iranian population, missense substitutions were studied in more detail. Variants that were located in exons and led to amino acid changes in the MyHC protein were filtered. The filtering analysis found 18 missense alterations, including p.Pro211Leu, p.Arg787His, p.Val964Leu, p.Arg1277Gln, and p.Ala1603Thr, which were already known to be associated with inherited cardiomyopathy. Some substitutions, such as p.Ala26Val and p.Arg1662His, had previously been identified as causative mutations in cardiomyopathies, although subsequent studies did not confirm their pathogenicity. Because of their high prevalence in human genome databases and the results of clinical and bioinformatics studies, two variants, p.Asn1257Ser and p.Ser1491Cys, were previously considered polymorphisms [10]. Variants p.Ala1191Thr, p.Ser1366Leu, p.Ser1596Leu, p.Asn1824Asp, and p.Asn1824Ser were found with relatively rare allele frequencies in the dbSNP or gnomAD databases. However, they have not been reported in association with any disease and are generally classified as variants of uncertain significance (Table 1).

Table 1 Characterization of MYH7 missense variants along with exon location, allele frequencies in Iranian ethnic groups and the international published references

The majority of the variants were detected in the heterozygous state in only one individual out of 800 genomes, indicating that they were very rare (allele frequency of 0.000625). Three variants were each found in the heterozygous state in two different individuals. The variant p.Arg1277Gln was found in four heterozygous individuals, giving an allele frequency of 0.0025. The most common variant was p.Ser1491Cys, with 22 reported heterozygous individuals and an allele frequency of 0.01375, which implies that it is a population polymorphism.
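The allele frequencies quoted above follow from counting alternate alleles over the 2 × 800 chromosomes sampled. A minimal sketch of that arithmetic is given below, assuming a diploid autosomal locus; the function name and the optional homozygous-carrier parameter are illustrative and not part of the original analysis.

```python
# Assumption: 800 diploid individuals, so 1600 alleles at an autosomal locus.
N_INDIVIDUALS = 800


def allele_frequency(het_carriers, hom_carriers=0, n_individuals=N_INDIVIDUALS):
    """Alternate allele frequency from carrier counts.

    Each heterozygous carrier contributes one alternate allele and each
    homozygous carrier contributes two.
    """
    alt_alleles = het_carriers + 2 * hom_carriers
    return alt_alleles / (2 * n_individuals)


# Heterozygous carrier counts mentioned in the text.
for label, carriers in [
    ("single carrier", 1),
    ("two carriers", 2),
    ("p.Arg1277Gln", 4),
    ("p.Ser1491Cys", 22),
]:
    print(f"{label}: {allele_frequency(carriers):.6f}")
```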

The pathogenicity assessments obtained from the databases and from in silico analysis are presented in Table 2. As shown in the table, the results obtained from different sources were not always consistent, and conflicting outcomes were observed. The variants with the strongest evidence of being disease-causing were p.Val964Leu, p.Arg1277Gln, and p.Ala1603Thr.

Table 2 The interpreted results of the risk assessment of various reported changes

Interpretation of unannotated MYH7 missense variants

As indicated in Table 1, four reported missense variants, p.Asn1623Ser, p.Arg1588His, p.Phe1498Tyr, and p.Arg1129Ser, were not annotated in the dbSNP or gnomAD databases. All four variants were located in the MyHC α-helical rod tail (Fig. 3). Except for p.Asn1623Ser, none of these variants has been reported on the ClinVar website.

Fig. 3

The location of the four unreported variants is shown on the MyHC rod tail

p.Arg1129Ser (c.3387G>C), located on MYH7 exon 27, was identified as damaging by FATHMM. Another substitution at nucleotide 3387 (c.3387G>A) has been reported on ClinVar. This synonymous change, which does not alter the amino acid (p.Arg1129=), has been reported in cardiomyopathy and is considered likely benign [11].

The p.Phe1498Tyr variant is located on exon 32 and was predicted to be damaging by the majority of the algorithms, with the exception of MutationAssessor and FATHMM, which interpreted it as tolerated (Table 3).

Table 3 The predicted results of the pathogenicity assessment of four unannotated variants

With a score of 34, p.Arg1588His (c.4763G>A) has the highest combined annotation-dependent depletion (CADD) score, indicating that the variant is among the top 0.1% of deleterious variants in the human genome. This variant was also evaluated as disease-causing in almost all in silico analyses. Two other changes at this position have been reported on ClinVar: the missense variant p.Arg1588Pro (c.4763G>C) and the synonymous alteration p.Arg1588= (c.4764C>T), which are related to myopathy, distal, 1 [12] and hypertrophic cardiomyopathy [13], respectively.
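The link between a CADD score and a "top percentage" comes from the PHRED-like scaling of CADD scores, in which a scaled score of 10 corresponds to the top 10% of ranked substitutions, 20 to the top 1%, and 30 to the top 0.1%. The sketch below only illustrates that conversion; the function name is hypothetical and the code is not part of the original analysis.

```python
def cadd_phred_to_top_fraction(phred_score):
    """Fraction of ranked substitutions scoring at or above a scaled CADD score.

    Assumes the PHRED-like definition: score = -10 * log10(rank / total).
    """
    return 10 ** (-phred_score / 10)


# The study's threshold (30) and the score reported for p.Arg1588His (34).
for score in (30, 34):
    fraction = cadd_phred_to_top_fraction(score)
    print(f"CADD {score}: top {100 * fraction:.3f}% of ranked substitutions")
```

Under this scaling, a score of 34 falls within the top 0.1% cutoff used in the study, at roughly the top 0.04%.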

The p.Asn1623Ser variant was predicted to be pathogenic by most of the software and is reported on ClinVar as associated with cardiomyopathy phenotypes, with uncertain significance [14]. This variant affects a highly conserved asparagine residue encoded by exon 34 of the MYH7 gene.

Discussion

Using various predictive algorithms, we evaluated the MYH7 gene variants reported on the Iranome website. Following the filtering steps, 18 missense MYH7 variants were found that could be related to the pathogenesis of cardiomyopathies. Located on exon 3, p.Ala26Val was previously reported in HCM and DCM probands from families of Asian origin [15]. Further studies revealed that the alanine 26 substitution is likely benign, as it occurs at a poorly conserved amino acid. Furthermore, it has an allele frequency of 0.55 in the East Asian population, which, according to the ClinGen Inherited Cardiomyopathy Expert Panel, is above the threshold and should be considered benign [16].

Another variant, p.Pro211Leu, has been identified in several studies related to cardiomyopathies [17, 18]. It has been reported in several patients as a compound heterozygous alteration along with other MYH7 missense mutations [19]. Mutations adjacent to p.Pro211Leu have also been reported to be involved in disease pathogenesis. Its low prevalence is a further reason to consider it a disease-causing mutation.

In a previous study, p.Arg787His was described as a mutation that could cause phenotypes of varying severity [20]. This mutation has been reported in several studies from India, while in the Iranome database it was identified in a heterozygous Persian Gulf Islander. Given the geographic proximity, a founder effect may be involved, although in studies from India this mutation was identified as de novo [21].

The variant that should be considered seriously in Iranian cardiomyopathy patients is p.Val964Leu, located on exon 23. In Iranome, two individuals, one of Turkmen and one of Persian ethnicity, carried this substitution. The p.Val964Leu variant has been linked to cardiomyopathies, either HCM or DCM, in numerous studies [22,23,24]. However, it is listed in ClinVar with conflicting interpretations of pathogenicity because of its relatively high frequency in the European population (0.08%). Valine 964 is located in the neck region of MyHC and is a highly conserved amino acid, and thus the change to leucine was predicted to be pathogenic [25].

Another variant of uncertain significance is p.Arg1277Gln, which is a semi-conservative amino acid change. This substitution is located on exon 34 and has been reported from different parts of the world [26, 27].

The p.Ala1603Thr variant is another alteration that should be considered in Iranian studies. In silico testing, including protein predictors and evolutionary conservation, showed that p.Ala1603Thr can be pathogenic. This variant was first reported in a cohort of HCM patients using the high resolution melting (HRM) method [28]. In a recent study, p.Ala1603Thr was also reported in an HCM patient and was deemed pathogenic in the population study [29].

The next variant of uncertain significance is p.Arg1662His, which has been found in both HCM and DCM [30, 31]. It should be noted that histidine is the wild-type amino acid at this position in several other species.

Conclusion

In our study, four amino acid substitutions in the MYH7 protein were taken into consideration. These variants occurred in the rod tail region of the protein and were predicted to be disease-causing by most prediction software. Among them, p.Asn1623Ser was reported in ClinVar and suggested to be deleterious based on a computational algorithm developed to evaluate the pathogenicity of MYH7 gene variants. The other three variants were not present in the dbSNP or gnomAD databases and, according to the literature, have not been reported in individuals with MYH7-related cardiomyopathy. Their rarity could be evidence of their pathogenicity in the Iranian population. However, this finding should be confirmed by conducting molecular studies on potential patients. In summary, studies of Iranian patients should prioritize the evaluation of MYH7 exons 20–40.

Given the high cost of molecular diagnosis and its vital importance for many patients, the availability of national databases should be considered as a valuable opportunity. The availability of this information will also prevent blind studies and have a promising impact on the perspective of genetics research.

Methods

Data extraction

Based on a literature review and an extensive search of available databases, MYH7 was selected because it makes the greatest contribution to hereditary cardiomyopathies. All national reports describing MYH7 mutations and their association with cardiac disease were screened. In the next step, all reported MYH7 variants were extracted from the Iranome website and loaded into SPSS version 20.0 and Excel 2010. The Iranome database includes the results of NGS analysis of 800 genomes obtained from Iranian individuals over 35 years old. The samples were collected from 8 different Iranian ethnic groups, 100 individuals from each. The Iranome website provides a search tool based on gene name, genomic region, transcript, and multi-allele variants, and it is continually updated with new genomic data. The majority of the reported variants are shared with other populations, while 30% (422,000) of these genetic changes are unique to the Iranian population.

Filtering strategy

To determine the MYH7 gene variants associated with cardiomyopathies, the data in Iranome were filtered in several steps. All variants that occurred in exons and led to an amino acid change were selected and studied. To identify the pathogenic effects of the variants, they were divided into two groups: previously reported variants and unannotated variants. An "unannotated variant" was defined as an alteration that had not previously been interpreted in the dbSNP or gnomAD databases. Published articles and documents related to the reported mutations were also analyzed, and variants associated with a cardiomyopathy phenotype were identified.
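The filtering steps described above can be sketched as follows. This is only an illustration of the logic, not the procedure actually run in SPSS and Excel: the `Variant` record, its field names, and the example entries are hypothetical, and "unannotated" is assumed to mean absent from both dbSNP and gnomAD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one Iranome variant (field names are assumptions)."""
    hgvs_p: str        # protein-level change, e.g. "p.Arg1129Ser"
    consequence: str   # e.g. "missense", "synonymous", "intronic"
    in_dbsnp: bool
    in_gnomad: bool


def select_missense(variants):
    """Keep exonic variants that change an amino acid (missense only here)."""
    return [v for v in variants if v.consequence == "missense"]


def split_by_annotation(variants):
    """Split variants into previously reported and unannotated groups.

    Assumption: a variant is unannotated when it is absent from both
    dbSNP and gnomAD.
    """
    reported, unannotated = [], []
    for v in variants:
        if v.in_dbsnp or v.in_gnomad:
            reported.append(v)
        else:
            unannotated.append(v)
    return reported, unannotated


# Illustrative input only; the database flags are placeholders, not extracted data.
example = [
    Variant("p.Val964Leu", "missense", in_dbsnp=True, in_gnomad=True),
    Variant("p.Arg1129Ser", "missense", in_dbsnp=False, in_gnomad=False),
    Variant("(hypothetical intronic variant)", "intronic", in_dbsnp=True, in_gnomad=True),
]

missense = select_missense(example)
reported, unannotated = split_by_annotation(missense)
print("reported:", [v.hgvs_p for v in reported])
print("unannotated:", [v.hgvs_p for v in unannotated])
```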

Bioinformatics

Bioinformatics analysis was done on putative MYH7 nucleotide substitutions selected from filtering steps using the following databases and online resources.

Iranome: http://www.iranome.com/

dbSNP: http://www.ncbi.nlm.nih.gov/snp

Genome Aggregation Database: http://gnomad.broadinstitute.org/

The data were interpreted using various online algorithms. Each tool is score-based and returns a numerical value for every variant analyzed. The results are reported in the tables after final interpretation. The thresholds for a deleterious call were PolyPhen2 > 0.5, MutationTaster > 0.5, SIFT > 0.95, Mutation Assessor > 0.65, FATHMM > 0.453, and CADD ≥ 30 (in the top 0.1% of deleterious variants in the human genome).
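As a rough illustration of how these cutoffs translate into per-tool calls, the sketch below applies the thresholds exactly as stated above. It assumes that each score is already on the scale used in the text (for example, that the SIFT value is expressed so that higher means more deleterious); the function names and the example scores are hypothetical.

```python
# Thresholds as stated in the text: strict ">" for all tools except CADD,
# which uses ">=".
STRICT_THRESHOLDS = {
    "PolyPhen2": 0.5,
    "MutationTaster": 0.5,
    "SIFT": 0.95,
    "MutationAssessor": 0.65,
    "FATHMM": 0.453,
}
CADD_MIN = 30.0


def deleterious_calls(scores):
    """Map each tool's score to True (deleterious) or False (tolerated)."""
    calls = {}
    for tool, value in scores.items():
        if tool == "CADD":
            calls[tool] = value >= CADD_MIN
        elif tool in STRICT_THRESHOLDS:
            calls[tool] = value > STRICT_THRESHOLDS[tool]
        else:
            raise KeyError(f"no threshold defined for {tool!r}")
    return calls


def count_deleterious(scores):
    """Number of tools that call the variant deleterious."""
    return sum(deleterious_calls(scores).values())


# Hypothetical scores for demonstration; not values from Tables 2 or 3.
hypothetical_scores = {"PolyPhen2": 0.9, "SIFT": 0.5, "CADD": 30.0, "FATHMM": 0.453}
print(deleterious_calls(hypothetical_scores))
print(count_deleterious(hypothetical_scores), "of", len(hypothetical_scores), "tools")
```

Counting calls per variant in this way makes disagreements between tools, such as those noted in Table 2, easy to see.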

PolyPhen2: genetics.bwh.harvard.edu/pph2/

SIFT: https://sift.bii.a-star.edu.sg/

MutationTaster: http://www.mutationtaster.org/ (RRID:SCR_010777)

Mutation Assessor: mutationassessor.org

ClinVar: http://www.ncbi.nlm.nih.gov/clinvar/

FATHMM: fathmm.biocompute.org.uk/

CADD: http://cadd.gs.washington.edu/home