Introduction

Gastric cancer begins with the uncontrolled proliferation of cells in the stomach. Despite a decline in incidence, it remains one of the most common types of cancer worldwide. According to GLOBOCAN 2020, gastric cancer ranks fifth in incidence, accounting for 5.6% of new cancer cases, and is the fourth leading cause of cancer deaths globally. It is more common in males, in whom it ranks fourth in incidence, and ranks lower among the common cancers in women. Out of the 1,089,103 new cases of gastric cancer, 75.3% (819,944 cases) occurred in the Asian population. Additionally, the Asian population accounted for 74.8% of the deaths. The age-standardized incidence rate (ASR) for gastric cancer was estimated to be 11.1 per 100,000 worldwide. Eastern Asia, the region with the highest incidence, had an ASR of 22.4. Males and females in this area had ASRs of 32.5 and 13.2, respectively [1].

Ardabil province is located in the northwest of Iran. With an ASR of 49.1 for males and 25.4 for females, compared with 39.7 for males and 17.6 for females in the Republic of Korea, its population is among those most affected by gastric cancer [2].

Among the subtypes of gastric cancer, the cardia type has been shown to be significantly more frequent in the Ardabil region than in some European countries, the USA, Japan, and Korea. The high incidence of gastric cancer, especially the cardia type, in the Ardabil region provides an exceptional opportunity to investigate its etiology [3].

Through genome-wide association studies (GWAS), more than 430 cancer-associated loci have been identified [4]. Multiple GWAS have been conducted for gastric cancer, leading to the determination of many cancer-associated loci. The SNPs associated with cancer susceptibility are located in coding or non-coding regions of the genome. Synonymous and non-synonymous mutations are two subtypes of coding SNPs [5].

Despite not affecting the amino acid sequence, synonymous variants may change the expression level and function of the gene product by altering post-transcriptional modification, mRNA stability, the interaction of splicing regulatory proteins with some exons, and the kinetics of translation. A slower rate of protein synthesis and/or modification, or an altered pause site, may in turn result in a different conformation of the protein. Non-coding variants are located in intervening and intergenic sequences, accounting for more than 90% of interindividual variations. Depending on the variant's location, it may be involved in the regulation of gene expression by harboring response elements such as promoters and enhancers, as well as in post-transcriptional and translational processing [6].

Additionally, changes in epigenetic modification and chromatin structure caused by non-coding variants can alter the target gene expression level. Variants in the 3' UTR of genes can modulate the interactions between mRNA and related microRNAs, as well as polyadenylation, and can impact translation efficiency and/or mRNA stability [7].

MicroRNAs are small (18–20 nucleotide) non-coding RNAs that regulate mRNA translation by directly binding to their 3'-UTR. This regulation is managed through cleavage of the mRNA transcript and/or repression of translation, depending on the complementarity between the miRNA and its targeted mRNA. Therefore, SNPs in the sequence and/or 3'-UTR of the mRNA transcript can be valuable markers for predicting cancer susceptibility. SNPs in miRNAs and miRNA binding sites play critical roles as cancer risk biomarkers. For example, the miR-453 binding site in the ESR1 gene, the miR-638 binding site in the BRCA1 gene, and the miR-628-5p binding site in the TGFBR1 gene are important in breast cancer. Variants such as rs3783553 in hepatocellular carcinoma create new binding sites for miR-122 and miR-378 in the IL1A gene. There is also a relationship between SNPs within miRNA binding sites of the CD886, INSR, RPA2, and GTF2H1 genes and colorectal cancer. Furthermore, certain SNPs in the 3'-UTR of the MYCL1 and NBS1 genes are prognostic biomarkers in lung cancer, as well as breast, ovarian, and bladder cancers, while also serving as predictive biomarkers in bladder cancer (for response to radiotherapy), prostate cancer (for response to ADT), and response to Methotrexate and Cisplatin. SNPs in the seed sequence of miRNAs can also affect their processing or binding. Reported variants such as rs2910164 in miR-146a, rs3746444 in miR-499, rs12975333 in miR-125a, rs34059726 in miR-124, and rs11614913 in miR-196-a2 indicate an association between miRNA SNPs and cancer risk [8, 9].

In this study, we compared the frequency of 263 variants in coding and non-coding sequences between populations based on previous case–control, meta-analysis, and GWAS reports. The aim of this research was to identify variants that could serve as predictive markers of susceptibility to gastric cancer in the Ardabil population.

Materials and methods

Study design

One hundred and fifty healthy individuals aged 18–35 years (mean age 27.32 ± 6.32) formed the general population group. In addition, 150 cases of primary gastric cancer aged 36–72 years (mean age 63.2 ± 11.3) and 150 healthy age- and sex-matched controls aged 37–70 years (mean age 60.1 ± 12.4) were included in this study. Among the general population group, 86 individuals (57.3%) were male; the proportion of males was 60% among the cases and 56.7% among the controls. All of the volunteers were born and currently reside in Ardabil province in the northwest of Iran. The inclusion criteria for the general population group were birth and residence in Ardabil, while the exclusion criteria were having cancer, having first-degree relatives with any type of cancer, and having gastrointestinal disorders or diseases associated with DNA repair system deficiencies. Additionally, having a relative affected by cancer was an exclusion criterion for the control group. The ethical and regulatory issues related to the collection of human specimens for research purposes were approved by the Ethics Committee of Islamic Azad University-Rasht Branch (Approval ID: IR.IAU.RASHT.REC.1398.057).

All participants provided signed informed consent, in accordance with the ethical rules of the Iranian Ministry of Health, to investigate their peripheral blood samples.

Whole exome sequencing

Genomic DNA was isolated from the patient's specimen, EDTA anticoagulated peripheral blood, using a silica-membrane-based DNA purification method recommended by the manufacturer (QIAamp DNA Blood Mini Kit, QIAGEN, Germany). The integrity of the DNA was verified using the Qubit 4 Fluorometer (Thermo Fisher Scientific, USA). A total of 1.0 μg of genomic DNA per sample was used for the DNA sample preparation.

Sequencing libraries were created using the Agilent SureSelect Human All ExonV7 kit (Agilent Technologies, CA, USA) following the manufacturer's instructions. Index codes were added to attribute sequences to each sample. Fragmentation was performed using a hydrodynamic shearing system (Covaris, Massachusetts, USA) to generate fragments of 180–280 bp. Remaining overhangs were converted into blunt ends using exonuclease/polymerase activities, and enzymes were then removed. After adenylation of the 3' ends of the DNA fragments, adapter oligonucleotides were ligated. DNA fragments with adapter molecules on both ends were selectively enriched in a PCR reaction. Captured libraries were further enriched in a PCR reaction to add index tags for hybridization. The products were purified using the AMPure XP system (Beckman Coulter, Beverly, USA) and quantified using the Agilent high sensitivity DNA assay on the Agilent Bioanalyzer 2100 system. The qualified libraries were then sequenced using the NovaSeq 6000 Illumina sequencers.

Data quality control, analysis, and interpretation were performed on a G9 generation HP server using a Unix-based operating system. Variant analysis and classification utilized relevant databases and references [10,11,12,13,14,15,16,17,18].

Sanger-based PCR-sequencing

Although WES was performed at 100X coverage, 28 randomly selected variants were validated using Sanger sequencing. These variants were chosen to represent different locations within genes, including coding sequences, promoters, introns, and non-coding sequences such as microRNAs. The PCR products were sequenced directly on an automated genetic analyzer (Applied Biosystems 3130xl; USA), followed by BLAST alignment on the NCBI website (http://www.ncbi.nlm.nih.gov/blast) and comparison with normal sequences. Twenty-six primer pairs were used to amplify the selected variants (see Additional file 1: Table S1). The Primer3 software (https://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) and previous studies [19,20,21,22,23,24,25,26,27,28,29] were utilized for primer design and selection. Ultimately, seventeen variants were included in the case–control study.

Statistical analysis

Descriptive statistics of the variants were presented as frequencies. The frequency of the variants was calculated using standard methods. Differences in frequencies between groups were compared using the Pearson correlation test. Additionally, scatter plots were drawn for each variant. Genotype and allele frequencies between the cases and controls were analyzed using the Chi-square test. For all analyses, a P value of less than 0.05 was considered statistically significant. The IBM SPSS Statistics version 25 was used for the statistical analyses. Three online Chi-square software tools (https://gene-calc.pl/hardy-weinberg-page, https://www.had2know.org/academics/hardy-weinberg-equilibrium-calculator-2-alleles.html, and https://wpcalc.com/en/equilibrium-hardy-weinberg/) were used to test for deviation from the Hardy–Weinberg equilibrium (HWE) in the studied groups.
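The Hardy–Weinberg test named above can be illustrated with a minimal sketch. It assumes a biallelic variant and a chi-square goodness-of-fit test with one degree of freedom; the function names and the example genotype counts are hypothetical and are not taken from the study data or from the online tools cited.

```python
import math


def chi2_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def hwe_chi_square(n_ref_hom, n_het, n_alt_hom):
    """Goodness-of-fit test of genotype counts against Hardy-Weinberg proportions."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternative allele frequency
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ref_hom, n_het, n_alt_hom)
    # Skip classes with zero expectation (monomorphic variant).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2_p_value_1df(chi2)


# Hypothetical counts: the first set is in exact HWE, the second has no heterozygotes.
chi2_eq, p_eq = hwe_chi_square(36, 48, 16)
chi2_dev, p_dev = hwe_chi_square(50, 0, 50)
print(f"in equilibrium: chi2={chi2_eq:.3f}, P={p_eq:.3f}")
print(f"deviating:      chi2={chi2_dev:.3f}, P={p_dev:.3g}")
```

Under this convention, a P value below 0.05 indicates a deviation from HWE, whereas a P value of 0.05 or above means that the genotype counts are consistent with HWE.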

Results

Significant differences of allele distribution among different populations

For the 260 variants selected to be correlated with gastric carcinogenesis through a literature review, allele frequencies were obtained from the whole exome sequencing (WES) results of the general population group (Additional file 2: Table S2).

Comparison of these results with frequencies from genomic databases such as Iranome, Alfa, 1000G, ExAC, and gnomAD, for Iran, Europe, and the world, revealed considerable differences. These variant frequencies were statistically analyzed using the Pearson correlation test and scatter plots, based on their reported age-standardized rates (ASR). Table 1 and Additional file 4: Fig. S1 display the significant differences (P < 0.05) and their corresponding scatter plots. The variants that showed significant differences were rs10061133, rs1050631, rs12220909, rs12983273, rs1695, rs2274223, rs2292832, rs2294008, rs2505901, rs2976391, rs33927012, rs3744037, rs3745469, rs4789936, rs4986790, rs4986791, rs6194, rs63750447, and rs6505162.

Table 1 Statistical association between the variant frequencies in Ardabil population comparing Iran, Europe, and World data, based on their reported Age-Standardized Rates (ASR for Ardabil: 39.1, Iran: 16.6, Europe: 13.7, and World: 11.1)
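As an illustration of the correlation analysis summarized in Table 1, the sketch below computes the Pearson coefficient between allele frequency and ASR across the four populations. It uses the ASR values from the Table 1 caption and the rs2292832 C allele frequencies quoted in the Discussion; the function names are hypothetical, and the P value formula is a closed form that holds only for four data points (a t distribution with two degrees of freedom). This is an assumed reconstruction of the calculation, not the SPSS procedure itself.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def two_sided_p_n4(r):
    """Two-sided P value for r when n = 4, i.e. a t statistic with 2 df.

    For 2 df the t CDF is 0.5 + t / (2 * sqrt(t^2 + 2)), which gives the
    expression below. It is not valid for other sample sizes.
    """
    t = r * math.sqrt(2.0) / math.sqrt(1.0 - r * r)
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


# Order: Ardabil, Iran, Europe, World.
asr = [39.1, 16.6, 13.7, 11.1]            # from the Table 1 caption
c_allele_freq = [0.51, 0.64, 0.70, 0.70]  # rs2292832 C allele, from the Discussion

r = pearson_r(c_allele_freq, asr)
p = two_sided_p_n4(r)
print(f"r = {r:.3f}, P = {p:.3f}")
```

With these inputs the coefficient is strongly negative and the P value is close to the P = 0.017 reported for rs2292832, which suggests that the reported P values refer to this kind of four-point correlation.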

To validate the accuracy of the detected frequencies, 26 randomly selected variants were confirmed using Sanger-based PCR-Sequencing (Additional file 5: Fig. S2).

Significant differences of allele distribution among case and control groups

Based on their significant differences from other populations and their frequency in our population (Table 1), seventeen variants were selected for evaluation of frequency differences between the case and control groups. The genotype frequencies of the variants rs12220909, rs2292832, rs2505901, rs33927012, rs4789936, rs6194, and rs63750447 deviated from HWE (P < 0.05), while those of the others were consistent with HWE. Among the 17 selected variants, rs1050631, rs12983273, rs1695, rs2274223, rs2292832, rs2505901, rs33927012, rs3745469, and rs6505162 showed significant differences between the cases and controls (Table 2).

Table 2 Consistency with Hardy Weinberg’s law of the selected variants for the general population

The CT (36.7%), TT (4.7%), and CT+TT (41.3%) genotypes of the rs1050631 C > T polymorphism (consistent with HWE: χ2 = 2.65; P value = 0.104) were less frequently observed in the gastric cancer cases than in the controls (46.7%, 14%, and 52.7%, respectively). Comparisons of the major allele (C) frequencies and genotypes (CC) indicated that they were significantly associated with gastric cancer using a dominant model (CC vs. CT+TT: P < 0.0001).

Comparisons of the minor allele (T) frequencies and genotypes of the variant rs12983273 (consistent with HWE: χ2 = 2.03; P value = 0.154) indicated a significant association using recessive models: allele T (7.7% in cases and 13.7% in controls) vs. C, P = 0.02; genotype CT+TT (14.7% and 24.7% for the cases and controls, respectively), P = 0.031.

The AG+GG (62.7%) genotype of the rs1695 A>G polymorphism (consistent with HWE: χ2 = 0.498; P value = 0.48) was more frequently observed in the gastric cancer cases than in the controls (50.7%). Additionally, the allele G was more frequent among the cases (37.3% vs. 29.7% for the controls). These findings indicated a significant association with gastric cancer using a recessive model: allele (G vs. A), P = 0.047; genotype (AG+GG vs. AA), P = 0.036.

The genotype GG of the variant rs2274223 A>G (consistent with HWE: χ2 = 0.49; P value = 0.48) was more frequent among the cases (24% vs. 16% for the controls). This difference indicated a significant association with gastric cancer using a recessive model: genotype GG vs. AA, P = 0.035.

The allele C (35.3%), and the TC (38.7%) and CC (18%) genotypes of the rs2292832 T>C polymorphism (deviated from HWE: χ2 = 13.42; P value = 0.0002) were less frequently observed in the cases than in the healthy controls (64.7%, 43.3%, and 26.7%, respectively). Comparisons of the major allele (T) frequencies and genotypes (TT) indicated that they were significantly associated with gastric cancer using both dominant and recessive models (for the recessive models: allele (C vs. T), P = 0.001, and genotype (TC+CC vs. TT), P = 0.01; for the dominant models: (CC vs. TT), P = 0.01, and (CT vs. TT), P = 0.046).

The rs2505901 T>C polymorphism deviated from HWE (χ2 = 51.59; P < 0.001). Nevertheless, comparisons of the minor allele (C) frequencies (28.3% and 36.3% for cases and controls, respectively) and genotypes (TC+CC; 41.3% and 52.7% for cases and controls, respectively) indicated that they were significantly associated with a reduced risk of gastric cancer using a recessive model (T vs. C: P = 0.0366; TC+CC vs. TT: P = 0.0497).
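The allele-level comparison for rs2505901 can be sketched as a 2 × 2 chi-square test. The allele counts below are not reported in the text; they are reconstructed from the stated percentages under the assumption of 150 individuals (300 alleles) per group, and the test is computed without continuity correction, which is also an assumption. The function name is hypothetical.

```python
import math


def allele_chi_square(case_minor, case_total, control_minor, control_total):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 allele table."""
    a, b = case_minor, case_total - case_minor
    c, d = control_minor, control_total - control_minor
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Assumption: 150 cases and 150 controls, hence 300 alleles per group.
alleles_per_group = 2 * 150
case_c = round(0.283 * alleles_per_group)     # C alleles among cases (28.3%)
control_c = round(0.363 * alleles_per_group)  # C alleles among controls (36.3%)

chi2, p_value = allele_chi_square(case_c, alleles_per_group, control_c, alleles_per_group)
print(f"chi2 = {chi2:.3f}, P = {p_value:.4f}")
```

Under these assumptions the result falls close to the reported P = 0.0366; small differences are expected because the counts are rounded from percentages.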

Although the variant rs33927012 T>C deviated from HWE (χ2 = 16.94; P < 0.001) and showed no significant difference in genotype distribution between the cases and controls, the higher frequency of the allele C among the cases (6.7% vs. 3% for the controls) was statistically significant (P = 0.0412).

The GA+AA (46.7%) genotype of the rs3745469 G>A polymorphism (consistent with HWE: χ2 = 5.74; P value = 0.057) was more frequently observed in the gastric cancer cases than in the controls (35.3%). Additionally, the allele A was more frequent among the cases (28% vs. 21% for the controls). These findings indicated a significant association with gastric cancer using a recessive model: allele (A vs. G), P = 0.047; genotype (GA+AA vs. GG), P = 0.046.

When the rs6505162 AA genotype was used as a reference, individuals carrying the AC and AC+CC genotypes had a decreased risk of gastric cancer (crude OR 0.46, 95% CI 0.24–0.89 for AC vs. AA, P = 0.02; crude OR 0.5, 95% CI 0.27–0.91 for AC+CC vs. AA, P = 0.02). This variant was consistent with HWE (χ2 = 2.7; P value = 0.1). These results are shown in detail in Tables 2 and 3 and Additional file 3: Table S3.
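A crude odds ratio with its 95% confidence interval can be computed from a 2 × 2 genotype table as in the sketch below. It assumes the Woolf (log-based) interval, which the text does not specify, and uses hypothetical counts because the genotype counts for rs6505162 are not given here; the function and argument names are illustrative only.

```python
import math


def odds_ratio_ci(carrier_cases, reference_cases, carrier_controls, reference_controls, z=1.96):
    """Crude odds ratio of carriers vs. the reference genotype, with a Woolf 95% CI."""
    odds_ratio = (carrier_cases * reference_controls) / (reference_cases * carrier_controls)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / carrier_cases + 1 / reference_cases + 1 / carrier_controls + 1 / reference_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not study data): carriers are rarer among cases than controls.
odds_ratio, lower, upper = odds_ratio_ci(10, 20, 20, 10)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An odds ratio below 1 whose confidence interval excludes 1, as in this example and in the values reported for rs6505162, indicates lower odds of disease among carriers than in the reference genotype.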

Table 3 Significant differences between cases and controls for some variants

Discussion

Somatic mutations in oncogenes and tumor suppressor genes, in collaboration with environmental exposures such as tobacco and carcinogenic chemicals, can trigger cancer. While hereditary cancers account for only 5–10% of all cases and are associated with germline alterations in oncogenes and tumor suppressor genes, non-hereditary cancer can be predisposed by single-nucleotide polymorphisms (SNPs). These genetic variants, known as “drivers,” primarily cause uncontrolled cellular proliferation and occur in “cancer driver genes” [30]. A recent study identified about 570 cancer driver genes that mediate molecular regulatory networks and changes in the tumor microenvironment. Mutations in these genes, as well as abnormal expression levels, can lead to uncontrolled tumor cell proliferation, invasion, and drug resistance [31].

Some case–control studies have shown a correlation between SNPs and susceptibility to gastric cancer in the Ardabil population. Among these, the relationship between IL-1β-511 and MTHFR C677T polymorphisms and gastric cancer is notable [32, 33]. However, some SNPs previously reported as gastric cancer-related variants were not associated with this population, including the polymorphism P53 Arg72Pro [34].

Previous studies have suggested an important role of miR-149 in carcinogenesis. This microRNA acts as a tumor suppressor gene in some cancers and an oncogene in others. Upregulation of miR-149 has been reported in AML, prostate cancer, glioblastoma, and melanoma, while it is downregulated in gastric, hepatocellular, renal cell, lung, colorectal, breast, and thyroid cancers, as well as neuroblastoma. Upregulation of miR-149 in tumor cells and gastric cancer cell lines inhibits cell proliferation and induces G0/G1 cell cycle arrest [35].

In our general population, the frequency of the C allele of rs2292832 (a T to C nucleotide change in miR-149) was found to be the lowest (0.51 vs. 0.64 for Iran, 0.7 for Europe, and 0.7 for the world), indicating a significant difference (P = 0.017). Consistent with this finding, a lower frequency of the C allele (P = 0.001) and of the genotypes TC (P = 0.046), CC (P = 0.01), and TC+CC (P = 0.01) was observed in the cases compared to the controls. Based on the reported relationship between a reduced risk of gastric cancer and the rs2292832 C allele in two case–control studies on Chinese and Korean populations, as well as a meta-analysis [36,37,38], the lower frequency of the C allele in Ardabil could be associated with a higher risk of gastric cancer. However, the deviation from Hardy–Weinberg equilibrium and the lack of association in other studies [39,40,41] necessitate further research, including cohort investigations, in our population.

The variant rs12983273 C>T is located in the MIR371B and MIR373 genes. It has been reported that hsa-miR-373 is downregulated in the gastric cancer recurrent group [42]. PRDM4, which has been identified as a risk biomarker for gastric cancer patients, is regulated by hsa-miR-373 [43]. In addition to the upregulation of miR373 in gastric adenocarcinoma tissue and gastric carcinoma cell lines, the inhibition of migration and invasion in some gastric cancer cell lines has been observed [44].

No associations have been reported between rs12983273 and the risk of female neoplasms, breast cancer, ESCC, oral premalignant lesions (OPL), or esophageal cancer [45]. Compared to Iranian, European, and world populations, the T allele has a lower frequency in Ardabil (0.107 vs. 0.136). This difference is significant (P = 0.017). Consistent with this finding, the frequencies of allele T and genotype CT+TT in cases affected with gastric cancer were lower than in controls (P values of 0.02 and 0.031, respectively). Cohort studies are needed to validate the impact of rs12983273 in conferring gastric cancer predisposition in our population.

The variant rs1695 is located on exon 5 of the GSTP1 gene. The substitution of an A nucleotide with a G leads to a missense mutation (Ile>Val). This polymorphism could reduce the detoxification activity of the enzyme. Furthermore, different genotypes of rs1695 have important interactions with environmental factors, including Helicobacter pylori infection, smoking, and alcohol consumption in cases with gastric cancer [46]. While some reports show no association with gastric cancer [47, 48], Val/Val (vs. Ile/Ile) and the G allele have been found to be related to gastric cancer. Additionally, the GA+GG genotype was associated with a larger tumor size [49]. These results do not align with our finding in the general population, which has a lower frequency of the G allele. However, our case–control study revealed that the frequencies of the G allele and genotype AG+GG were higher among the cases compared to the controls (P values of 0.047 and 0.036, respectively). A meta-analysis found no association between rs1695 and gastric cancer in the Caucasian population (the Ardabil population is part of Caucasians), in contrast to Asians [50]. This report supports our finding in the general population. The contradiction between our population-based and case–control studies could be resolved by conducting a cohort study.

The Phospholipase C epsilon1 gene (PLCE1) is involved in cell growth and differentiation, as well as gene expression. The phospholipase C encoded by PLCE1 induces GTPases and hydrolyzes 1,4,5-phosphatidylinositol to inositol 1,4,5-triphosphate and 4,5-diacylglycerol [51]. Variant rs2274223 is a missense variant (A>G; His>Arg) in the coding region of the PLCE1 gene.

The frequency of the G allele is about 30% higher in relative terms in the Ardabil general population than in the world population (0.42 vs. 0.32), a statistically significant difference (P = 0.036). This difference, along with the consistent relationship between rs2274223 and gastric cancer reported previously, could explain the important role of this variant in conferring susceptibility to gastric cancer in Ardabil [52,53,54,55,56,57]. Additionally, the GG genotype was found to be more frequent in the cases compared to the controls (P = 0.035).

The variant rs6194 is a synonymous variant (p.His =) in the NR3C1 gene. Although synonymous variants are generally considered to have no pathogenic effect, some reports suggest that they may play a role in splicing, RNA structure, and translation rate in carcinogenesis processes [6]. Investigating the role of rs6194 in conferring cancer susceptibility is therefore important, as it has been found to create premature stop codons and yield a shorter mRNA via exon skipping [69]. However, although the T allele is more frequent in the general population of Ardabil than worldwide (Ardabil: 0.003; World: 0.0005; P = 0.028), its very low absolute frequency in Ardabil means that it lacks diagnostic and prognostic validity. Our case–control study results confirm this.

The variant rs63750447 is a missense variant (A>T; Val>Asp) in the mismatch repair gene hMLH1. In silico analysis, including FATHMM, FATHMM-MKL, MutationTaster, MutationAssessor, SIFT, Polyphen2 HVAR, PROVEAN, BayesDel_addAF, DANN, DEOGEN2, EIGEN, LIST-S2, and MetaLR, predicts this variant to be deleterious. We found a significant difference in the frequency of the T allele in our general population (P = 0.009). A meta-analysis and a study in a Chinese population have reported associations between rs63750447 and increased risk of colorectal, endometrial, and gastric cancer [70, 71]. However, another study in a Chinese population did not find any association [72]. Therefore, this variant could be considered a candidate for increasing gastric cancer frequency in our population. The inconsistency between our case–control study results and the general population findings may be due to its deviation from Hardy–Weinberg equilibrium.

The variant rs1047768 is a silent variant (p.His =) in the ERCC5 gene, which replaces a T nucleotide with a C. ERCC5 is an essential element of the Nucleotide Excision Repair (NER) pathway and encodes an endonuclease that catalyzes 3' incisions [73, 74]. Previous studies have suggested that rs1047768 plays an important role in carcinogenesis, abnormal cell proliferation and differentiation, and increased cancer susceptibility. This variant has also been reported to impact sensitivity to chemotherapy [75]. While some reports have found no association between rs1047768 and gastric cancer [76,77,78,79], our results are consistent with a study in a Chinese population, which found a decreased risk for gastric cancer with the TC genotype compared to TT [80]. Therefore, the lower frequency of the C allele in the Ardabil population (0.46 vs. 0.57 for the World) could confer susceptibility to gastric cancer.

The variant rs1050631 is a C to T nucleotide change in exon 6 of the SLC39A4 gene, resulting in a synonymous mutation. Although it is expected to have no impact on protein function, this synonymous mutation may be involved in various pathogenic mechanisms, such as disease susceptibility, treatment outcomes, protein levels, structure and function, RNA processing, post-transcriptional regulation, translation initiation, early and late translation elongation, and co-translational folding [6, 69]. Our current findings reveal a lower allelic frequency of the T nucleotide in the general population of the Ardabil province (0.27 vs. World: 0.35; P = 0.027). Additionally, the frequencies of CT, TT, and CT+TT genotypes were lower among affected cases compared to healthy controls (P values of < 0.0001, 0.019, and < 0.0001, respectively). These results are consistent with a study in a Chinese population, in which the CT+TT genotype was associated with a higher risk of recurrence and death in patients with gastroadenocarcinoma [81]. However, another study did not find any association [82]. Therefore, rs1050631 may be associated with the high rate of gastric cancer in the Ardabil population.

DNA methylation, which is regulated by DNA methyltransferases (DNMTs), plays a crucial role in transcription regulation. Abnormal expression of DNMT1, the first discovered DNMT, has been linked to various types of cancer. In gastric cancer, overexpression of DNMT1 is associated with abnormal differentiation, higher tumor stage, and increased mortality risk [83, 84]. Knockdown of DNMT1 leads to cell cycle arrest, increased apoptosis, decreased invasion, and improved response to chemotherapy. Patients with low DNMT1 expression levels have shown better histopathological and clinical responses. DNMT1 expression is higher in gastric cancer compared to normal, para-cancerous, and dysplasia tissues [85].

The variant rs16999593 results in the substitution of His with Arg (A>G) in exon 4 of the DNMT1 gene. Although no association with gastric cancer has been found in some populations, such as Iranian and Chinese [86, 87], strong associations with gastric cancer have been reported in three different meta-analyses, with a 45% increased risk in G allele carriers [76, 88,89,90]. The higher frequency of the G allele in the Ardabil population compared to the global population (0.013 vs. 0.002) is consistent with these reports.

The variant rs1800469 is located in the promoter region of the TGFB gene. The TGFβ1 polypeptide can have a dual role in gastric carcinogenesis, either inhibiting immunosurveillance and promoting EMT and metastasis, or suppressing tumor growth by inhibiting cell cycle progression [91]. Several studies conducted on Chinese, American, and Polish populations did not find any significant association between this variant and gastric cancer [100]. Additionally, there was no relationship between rs1800469 and pancreatic cancer in the Iranian population [92].

The decreased risk of gastric cancer in carriers of the C allele (genotypes CC and CT compared with TT) in a Chinese population, as well as the significant association found in certain subtypes of gastric cancer including intestinal type, poorly differentiated, and stage TNM, suggests a relationship between rs1800469 and a decrease in gastric cancer risk [93]. These findings indicate that the lower frequency of the C allele in the Ardabil population (0.52 vs. 0.67 worldwide) could be related to a higher frequency of gastric cancer in this geographical location.

The GG genotype of the miR-449 variant rs10061133 showed a predictive effect on esophageal squamous cell carcinoma (ESCC) in a Chinese population [94]. However, no association was found for lung cancer [95]. In our study, allele G was more frequent in our general population than worldwide (0.13 vs. 0.091; P = 0.038). Given the reported correlations with gastric cancer, including downregulation of miR-449, inhibition of cell proliferation, induction of senescence and apoptosis, and the upregulation of CDK6, as well as its high frequency in the Korean population (0.25) with a gastric cancer age-standardized rate (ASR) of 39.6, investigating the role of rs10061133 in gastric carcinogenesis appears worthwhile.

SPHK1 expression is elevated in gastric cancer tissues and cells. Additionally, SPHK1 knockdown tends to suppress cell proliferation, migration, and invasion of gastric cancer cell lines, as well as block the cell cycle and induce apoptosis [97]. In the Chinese population, higher expression levels of SPHK1 are associated with shorter overall survival time in gastric cancer patients [98]. The variant rs3744037 is a stop-gained variant that leads to premature termination of translation. Although such a variant would be expected to reduce SPHK1 expression, five in silico tools (BayesDel_addAF, DANN, EIGEN, FATHMM-MKL, and MutationTaster) predict it to be benign, and it is not located in a highly conserved position, so it may not have a mitigating effect on gastric cancer risk. The higher frequency of this variant in the Ardabil general population (0.2 vs. 0.167 worldwide; P = 0.024) makes it an attractive subject for investigation. However, our case–control study did not confirm an association with gastric cancer.

A significant association was found for the apoptosis-related gene BCL2L12 with early-stage (I/II) gastric tumors and the intestinal histotype, as well as with disease-free and overall survival of patients [99]. The variant rs3745469 is located in the non-coding region of the BCL2L12 gene and was more frequent in the Ardabil population (0.24 vs. 0.09 worldwide; P = 0.016). Our case–control study supported a role for this variant in gastric carcinogenesis: the frequencies of allele A and of the combined genotype GA+AA were higher in affected cases than in controls (P = 0.047 and P = 0.046, respectively).
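The allele-level and GA+AA (dominant model) comparisons described above can be sketched as two 2x2 tests built from genotype counts. This is an illustrative sketch only: it assumes a Pearson chi-square test without continuity correction, which may differ from the test used in our analysis, the function names are hypothetical, and the genotype counts are invented rather than study data.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def allelic_and_dominant(cases, controls):
    """cases / controls are (n_GG, n_GA, n_AA) genotype counts.

    Assumes every cell of both 2x2 tables is non-zero.
    """
    gg1, ga1, aa1 = cases
    gg0, ga0, aa0 = controls
    tables = {
        # Allelic model: A alleles vs. G alleles
        "allelic": (ga1 + 2 * aa1, 2 * gg1 + ga1, ga0 + 2 * aa0, 2 * gg0 + ga0),
        # Dominant model: GA+AA carriers vs. GG
        "dominant": (ga1 + aa1, gg1, ga0 + aa0, gg0),
    }
    results = {}
    for model, (a, b, c, d) in tables.items():
        chi2, p_value = chi2_2x2(a, b, c, d)
        results[model] = {
            "odds_ratio": (a * d) / (b * c),
            "chi2": chi2,
            "p_value": p_value,
        }
    return results


# Hypothetical genotype counts (GG, GA, AA), NOT study data.
results = allelic_and_dominant(cases=(50, 40, 10), controls=(65, 30, 5))
for model, stats in results.items():
    print(model, {key: round(value, 3) for key, value in stats.items()})
```

An odds ratio above 1 in either table indicates that allele A, or carriage of at least one A allele, is more common in cases than in controls.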

In a Chinese population, allele C of rs12220909 has been associated with a decreased risk of esophageal squamous cell carcinoma [94]. Similarly, the frequency of allele C was significantly lower in Chinese patients with non-small cell lung cancer (NSCLC) than in healthy Chinese controls (13.3% in cases vs. 16.9% in controls; P = 0.001) [95]. Because allele C appears to be protective, its low frequency in the Ardabil general population (0.003 compared to Iran: 0.01, 1000G: 0.0381, and gnomAD: 0.01) could be considered a risk factor for gastric cancer.

Bioinformatics analyses using the RegulomeDB, HaploReg, and GTEx databases indicated that rs4789936 affects transcription factor binding, motif changes, DNase footprints, and DNase peaks of the TIMP2 gene. Additionally, carriers of the TT genotype were shown to have a higher expression level than CC carriers, and the risk allele of this variant is associated with increased expression of the gene [100]. The higher frequency of the T allele in the Ardabil population (0.54 vs. 0.5 worldwide; P = 0.012) could be related to the observed incidence rate of gastric cancer. However, the case–control study yielded contradictory results. This contradiction may be related to the genotype distribution not deviating from Hardy–Weinberg equilibrium. Therefore, designing a cohort study could be useful.
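For reference, a Hardy–Weinberg check on genotype counts can be sketched as a chi-square goodness-of-fit test with one degree of freedom. This is a minimal sketch under stated assumptions: the function name is hypothetical, both alleles are assumed to be present in the sample, and the counts shown are invented for illustration rather than taken from our data.

```python
import math


def hwe_chi_square(n_cc, n_ct, n_tt):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_cc + n_ct + n_tt
    p = (2 * n_cc + n_ct) / (2 * n)  # frequency of allele C
    q = 1 - p                        # frequency of allele T
    observed = (n_cc, n_ct, n_tt)
    # Expected genotype counts under HWE: p^2, 2pq, q^2
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Hypothetical genotype counts (CC, CT, TT), NOT study data.
freq_c, chi2, p_value = hwe_chi_square(40, 20, 40)
print(f"freq(C)={freq_c:.2f} chi2={chi2:.1f} P={p_value:.2e}")
```

A small P value indicates deviation from equilibrium (here, a deficit of heterozygotes in the invented counts), whereas a large P value means the genotype distribution is compatible with Hardy–Weinberg proportions.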

TLR4 is highly expressed in monocytes, lymphocytes, and splenocytes. Its product plays a critical role in the inflammation-related immune response to H. pylori infection during gastric carcinogenesis. A meta-analysis detected a significantly increased gastric cancer risk in Caucasian populations for rs4986790 and rs4986791 [100]. Despite the frequency difference we observed in the general population (0.03 in Ardabil vs. 0.06 in Iran and worldwide), the case–control study did not reveal any significant difference.

Three unrelated meta-analyses revealed a significant association between the miR-423 rs6505162 A>C variant and overall cancer risk, including gastric cancer [101,102,103]. The higher frequency of allele C in our population (0.58 vs. 0.45 worldwide; P = 0.010), together with the higher frequencies of the genotypes AC (P = 0.02) and AC+CC (P = 0.025), suggests that this variant may be a risk factor for gastric cancer in this region.

PSCA (Prostate Stem Cell Antigen), a member of the LY-6 family of surface proteins, is abundantly expressed in the normal esophagus and stomach but is undetectable in esophageal or gastric tumors. PSCA is expressed in differentiating gastric epithelial cells, and in vitro studies have shown that it inhibits cell proliferation and is silenced in gastric cancer [104].

Various SNPs in the PSCA gene have been associated with gastric cancer in Chinese, Korean, Japanese, Tibetan, and Caucasian populations [105,106,107,108,109,110]. The variant rs2976391 C>A is an intronic variant of the PSCA gene. In a meta-analysis with 81,961 cases affected by gastric cancer and 442,932 healthy controls, a strong association was found for this variant [111]. The frequency of allele A in our population was found to be greater than in other populations (0.57 vs. 0.41 in Iran, 0.4 in gnomAD, and 0.405 in 1000G). This suggests that this variant may play a role in conferring gastric cancer susceptibility in Ardabil province.

The PLCE1 (Phospholipase C epsilon 1) gene is involved in the regulation of cell growth, differentiation, and oncogenesis. The variant rs3765524 is a C>T substitution at position 4406 of the PLCE1 gene that replaces threonine with isoleucine. This variant has been reported to be correlated with an increased risk of gastric cancer in some populations, including the Chinese and Kashmir Valley populations [52, 53]. Although no association was found in the north of Iran [62], the higher frequency of allele T in Ardabil (0.41 vs. 0.35 in Iran and 0.31 in gnomAD) may be related to the high incidence of gastric cancer.

The frequency of allele C in the DNA repair gene ERCC2 rs3810366 is higher in the Ardabil population compared to other populations (Ardabil: 0.54; Iran: 0.52; 1000G: 0.42). This finding is consistent with studies conducted on southern Chinese and Taiwanese populations [112].

The observed differences in frequencies, whether significant or not, motivate cohort studies in the Ardabil population. Such studies could identify prognostic factors that help detect individuals predisposed to gastric cancer in this population.