Background

Dental caries is a complex disease influenced by genetic and environmental factors, including diet, oral hygiene, oral bacteria such as Streptococcus mutans, tooth morphology and placement, the composition and flow rate of saliva, fluoride exposure, and access to oral health care [1,2,3,4]. Genetic determinants of caries differ, in part, based on tooth surface and tooth type (primary versus permanent) [5, 6]. Etiological mechanisms can additionally involve gene-by-sex and gene-by-environment interactions [7, 8].

According to the National Health and Nutrition Examination Survey (NHANES), caries affects the majority of children (23% by age 5 years, 56% by age 8, and 67% by age 19) and adults (91%), and is the most common chronic disease in the United States [9,10,11]. Lack of treatment leads to serious co-morbidities that greatly impair quality of life [9].

Although caries has declined in the United States since the mid-twentieth century, the caries rate in young children has increased in recent years, and disparities persist between racial/ethnic, demographic, and socioeconomic groups [10,11,12]. Caries prevalence in primary teeth is 42% higher in non-Hispanic black children compared with non-Hispanic Caucasian children. Non-Hispanic black children have double the rate of untreated tooth decay in primary teeth compared to non-Hispanic Caucasian children [11], and among adults, non-Hispanic blacks have nearly double the rate of untreated decayed teeth (42%) of non-Hispanic Caucasians (22%) [10].

Some disparity is explained by sociocultural differences between racial groups. African Americans are less likely to have access to and utilize oral health care [13, 14]. Other factors include differences in caretaker fatalism and oral health education [15], socioeconomic status, and transmission of cariogenic bacteria [16]. Genetic differences in caries predisposition are known: the 2% of African American children with localized juvenile periodontitis – a disease more common in African Americans – have fewer carious teeth than others, likely due to a variant in the gene encoding a protective component of saliva [17]. Other differences include those in immunity genes and propensity toward cariogenic oral flora [18]. While inter-racial genetic differences influence dental features [19], there is a dearth of studies on the role of genetics in differences in dentition across racial and ethnic groups.

Although dental caries is estimated to be 30–50% heritable [1, 5, 6, 20], few specific caries-related genes have been discovered, with the majority of these identified in Caucasians [21]. Yet, it is known that some complex diseases exhibit differences in their predominant genetic architecture across races [22,23,24]. Genetic markers for disease vary in frequency between races, and the effect sizes of the genetic variants can display large heterogeneity [25]. Indeed, up to 25% of GWAS tagSNPs show effect heterogeneity by ancestry [26]. Thus it is possible that there are different genetic risk factors for caries operating between races, or that the effects of risk variants are dissimilar. In spite of this, adequate information is lacking regarding the disease process in vulnerable groups such as racial/ethnic minorities; in particular, few studies have focused on the oral health of African Americans [12]. Genome-wide association studies (GWAS) of dental caries in African American samples have not been performed, and although African Americans are a large US minority group, little work has been done to understand their dental genetics. In this study, we describe a pilot caries GWAS in African American children and adults to generate hypotheses about the genetics of dental caries in African Americans. We consider primary and permanent dentition separately because previous work has estimated that only 18% of covariation in primary vs permanent tooth caries is due to common genetic factors [6]. Furthermore, we compare the GWAS scans in African Americans to analogous analyses in Caucasian children and adults to determine whether there is heterogeneity present between the two racial groups.

Methods

Study sample

One hundred nine African American adults (aged > 18 years) and 96 African American children (3–12 years) were recruited through the Center for Oral Health Research in Appalachia (COHRA, cohort COHRA1), a joint study of the University of Pittsburgh and West Virginia University [27]. Briefly, all participants provided consent or assent with written parental informed consent, in accordance with the Institutional Review Board policies of the University of Pittsburgh and West Virginia University. Two clinical examination sites were located in Pennsylvania and four in West Virginia. Admixed African ancestry was verified using Principal Component Analysis (PCA) with respect to HapMap controls from Europe, Asia, Africa, and Central/South America. Participants were genotyped for approximately 550,000 single nucleotide polymorphisms (SNPs) using the Illumina Human610-Quad Beadchip (Illumina, Inc., San Diego, CA). Genetic data were rigorously cleaned and quality-checked as previously described [28], and imputed to the 1000 Genomes Project (June 2011) phase 1 reference panel using SHAPEIT (for pre-phasing) [29] and IMPUTE2 [30]. SNPs were filtered for INFO score > 0.5, and MAF > 5% (separately for each age group). SNPs were not filtered for HWE due to the admixed nature of the African American population. Quality filters included participant call rates > 90% and SNP call rates > 99%. Approximately 4.9 million SNPs passed quality control and were included in the GWASs. Identical analyses were performed in COHRA-recruited cohorts of 918 Caucasian adults and 983 children (results for these cohorts have been previously published) [28, 31]. The same filters were used in Caucasians (separately for each age group) along with a filter for HWE (p-value > 10⁻⁴). STROBE guidelines were followed for this observational study.
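The variant-level filters described above can be illustrated with a minimal Python sketch. The record layout, function name, and example values are hypothetical and are not taken from the study's pipeline; only the thresholds (INFO > 0.5, MAF > 5%, and the optional HWE p-value > 10⁻⁴ used for the Caucasian cohorts) come from the text.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    snp_id: str    # made-up identifier, for illustration only
    info: float    # imputation INFO score
    maf: float     # minor allele frequency within the cohort
    hwe_p: float   # Hardy-Weinberg equilibrium test p-value


def passes_qc(v: Variant,
              min_info: float = 0.5,
              min_maf: float = 0.05,
              min_hwe_p: Optional[float] = None) -> bool:
    """Return True if the variant passes the thresholds quoted in the text.

    min_hwe_p=None skips the HWE filter (as for the admixed cohort).
    """
    if v.info <= min_info or v.maf <= min_maf:
        return False
    if min_hwe_p is not None and v.hwe_p <= min_hwe_p:
        return False
    return True


# Made-up example records.
variants = [
    Variant("snp_a", info=0.95, maf=0.20, hwe_p=1e-6),
    Variant("snp_b", info=0.40, maf=0.30, hwe_p=0.5),
    Variant("snp_c", info=0.80, maf=0.02, hwe_p=0.5),
]

# No HWE filter (African American cohorts) versus HWE filter (Caucasian cohorts).
aa_kept = [v.snp_id for v in variants if passes_qc(v)]
ca_kept = [v.snp_id for v in variants if passes_qc(v, min_hwe_p=1e-4)]
```

Because the filters are applied separately within each cohort and age group, the retained SNP sets can differ between cohorts, as they do for `aa_kept` and `ca_kept` here.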

Quantitative caries phenotypes

Ascertainment of caries status was conducted with a dental explorer by either a licensed dentist or a dental hygienist. Assessments were performed on dried teeth in exam rooms equipped with a dental chair and dental examination light. Examiners were mutually calibrated at the start of the study and several times over the course of data collection via a review of data collection techniques followed by reliability testing [27]. Inter- and intra-rater reliability of caries assessments was high [27]. From these assessments, the following caries phenotypes were generated: the DMFS index (Decayed, Missing, and Filled Tooth Surfaces) and DMFT index (Decayed, Missing, and Filled Teeth) in adults, and the dfs index (decayed and filled deciduous tooth surfaces) and dft index (decayed and filled deciduous teeth) in children. These caries indices represent the count of affected tooth surfaces or teeth, in accordance with the World Health Organization DMFS/dfs or DMFT/dft scales [32] and established dental caries research protocols [33, 34]. For 31 of the 96 children in the African American pediatric cohort with mixed dentition, and 378 of 983 children in the Caucasian pediatric cohort with mixed dentition, both DMFS/DMFT and dfs/dft indices were scored at the time of the assessment. For the purposes of this study only dfs/dft measures were tested for association in the pediatric cohorts. White spots were included in the DMFS/DMFT and dfs/dft counts because their inclusion has been shown to increase caries heritability estimates and thus improve power to detect association in gene mapping [6].
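To make the distinction between surface-level and tooth-level indices concrete, the following Python sketch counts affected surfaces and affected teeth from per-surface codes. The encoding ('S' sound, 'D' decayed, 'M' missing, 'F' filled), the five-surface layout, and the function name are illustrative assumptions; the actual scoring followed the WHO scales and research protocols cited above, which are not reproduced here.

```python
def caries_indices(teeth, affected=("D", "M", "F")):
    """Return (surface_count, tooth_count) for one participant.

    teeth: mapping of tooth label -> list of per-surface codes.
    affected: codes that count toward the index; use ("D", "F") for
    the primary-dentition dfs/dft indices, which exclude missing teeth.
    """
    affected = set(affected)
    surface_count = sum(
        1 for codes in teeth.values() for code in codes if code in affected
    )
    tooth_count = sum(
        1 for codes in teeth.values() if any(code in affected for code in codes)
    )
    return surface_count, tooth_count


# Made-up mouth with four teeth, five surfaces each.
example = {
    "t1": ["D", "S", "S", "S", "S"],   # one decayed surface
    "t2": ["F", "F", "S", "S", "S"],   # two filled surfaces
    "t3": ["M", "M", "M", "M", "M"],   # missing tooth, all surfaces coded missing
    "t4": ["S", "S", "S", "S", "S"],   # sound tooth
}

dmfs, dmft = caries_indices(example)                     # permanent-style indices
dfs, dft = caries_indices(example, affected=("D", "F"))  # primary-style indices
```

In this sketch a white-spot lesion would simply be coded as 'D', mirroring the decision to include white spots in the counts.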

Statistical model

The GWASs were performed separately in adults (for DMFT and DMFS) and children (for dft and dfs) using linear regression while adjusting for age, sex, and two principal components of ancestry in PLINK v1.9 [35]. Statistical significance was determined using adaptive permutation with a maximum of 1,000,000 permutations per SNP as implemented in PLINK. P-value thresholds incorporated the burden of multiple testing: genome-wide significance was defined as p-value less than 5 × 10⁻⁸ and suggestive significance as p-value less than 5 × 10⁻⁶. Results were visualized in Manhattan plots using R (v3.2.0) [36].
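The general idea of permutation testing with early stopping, together with the two significance thresholds, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses an unadjusted covariance statistic rather than covariate-adjusted linear regression, and its stopping rule (stop once a fixed number of permuted statistics match or exceed the observed one) is a stand-in for PLINK's more elaborate adaptive procedure, which is not reproduced here. All function names and defaults are hypothetical.

```python
import random


def abs_cov(x, y):
    """Absolute (unnormalized) covariance between genotype and phenotype."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return abs(sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)))


def adaptive_perm_p(genotypes, phenotype, max_perms=10000,
                    stop_after_hits=10, seed=0):
    """Empirical p-value for one SNP; returns (p_value, permutations_used).

    Permutation stops early once `stop_after_hits` permuted statistics
    are at least as large as the observed one, since such a SNP is
    clearly not significant and further permutations add little.
    """
    rng = random.Random(seed)
    observed = abs_cov(genotypes, phenotype)
    shuffled = list(phenotype)
    hits = 0
    done = 0
    for done in range(1, max_perms + 1):
        rng.shuffle(shuffled)
        if abs_cov(genotypes, shuffled) >= observed:
            hits += 1
            if hits >= stop_after_hits:
                break
    return (hits + 1) / (done + 1), done


def classify(p_value):
    """Apply the thresholds stated in the text."""
    if p_value < 5e-8:
        return "genome-wide"
    if p_value < 5e-6:
        return "suggestive"
    return "not significant"


# A phenotype with no variation carries no signal, so permutation stops early.
p_null, perms_used = adaptive_perm_p([0, 1, 2, 0, 1, 2, 0, 1, 2, 0], [3] * 10)
```

The smallest empirical p-value such a procedure can report is bounded by the number of permutations performed, which is why a large per-SNP maximum is needed to resolve small p-values.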

Results annotation and comparison with Caucasian caries GWASs

Genes within 500 kb of the top associated SNP in each locus were queried for corroborating biological connections to dental caries in public databases, including OMIM, PubMed, and ClinVar. In addition, GREAT [37] was used to assess the functions of cis-regulatory regions of the associated loci using default parameters.
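The 500 kb annotation window can be illustrated with a short Python sketch. It assumes that "within 500 kb" means any overlap between a gene's span and the window centered on the lead SNP, which is one reasonable reading of the text; the gene names and coordinates below are made up.

```python
def genes_near(lead_pos, genes, window=500_000):
    """Return names of genes overlapping lead_pos +/- window.

    genes: list of (name, start, end) tuples on the same chromosome
    as the lead SNP, with start <= end in base-pair coordinates.
    """
    lo, hi = lead_pos - window, lead_pos + window
    return [name for name, start, end in genes if start <= hi and end >= lo]


# Made-up coordinates for illustration only.
genes = [
    ("GENE_A", 900_000, 950_000),      # inside the window
    ("GENE_B", 1_600_000, 1_650_000),  # starts beyond the window
    ("GENE_C", 1_400_000, 1_520_000),  # partially overlaps the window edge
]

nearby = genes_near(1_000_000, genes)
```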

Heterogeneity in effect sizes between the GWAS results of African Americans and Caucasians was assessed via Cochran’s Q statistic. The effect sizes for the lead SNPs at suggestive (p-value ≤ 5 × 10⁻⁶) loci observed in African Americans were compared with the effect sizes of the same SNPs in Caucasians, if present. Not all suggestively associated lead SNPs in African Americans were tested for heterogeneity because MAF and quality control filters yielded different sets of SNPs retained for African Americans and Caucasians. Specifically, the numbers of loci tested for heterogeneity were 17 of 25 for DMFT, 11 of 12 for DMFS, 20 of 26 for dft, and 12 of 18 for dfs. The genome-wide significance threshold for heterogeneity tests was p-value ≤ 5 × 10⁻⁸.
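For reference, Cochran's Q for a single SNP can be computed from per-cohort effect estimates and standard errors, as in the following Python sketch. It uses the standard inverse-variance-weighted form of the statistic; the function names are hypothetical, the example values are made up, and the software actually used for the heterogeneity tests is not stated in the text. With two cohorts, Q is referred to a chi-square distribution with one degree of freedom.

```python
import math


def cochran_q(betas, standard_errors):
    """Cochran's Q: weighted sum of squared deviations from the pooled effect."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))


def q_pvalue_two_groups(q):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > q) = erfc(sqrt(q / 2)).
    """
    return math.erfc(math.sqrt(q / 2.0))


# Made-up effect sizes for one SNP in two cohorts.
q_stat = cochran_q(betas=[1.0, 0.0], standard_errors=[0.5, 0.5])
p_het = q_pvalue_two_groups(q_stat)
```

A small heterogeneity p-value indicates that the two cohort-specific effect estimates differ by more than their standard errors would explain.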

Results

Four GWASs of indices of dental caries were performed: DMFS and DMFT in 109 African American adults, and dfs and dft in 96 African American children. Cohort demographics are shown in Table 1. The GWAS in African Americans did not yield associations at genome-wide significance (p-value ≤ 5 × 10⁻⁸) for any phenotype (Fig. 1), while several loci with potential roles in caries etiology were associated at suggestive significance (p-value ≤ 5 × 10⁻⁶).

Table 1 Demographics of African-American and Caucasian cohorts included in the study
Fig. 1 Manhattan plots for the permuted results of (a) permanent DMFT, (b) permanent DMFS, (c) primary dft, and (d) primary dfs GWASs. P-values are −log10-transformed. The red line signifies genome-wide significance (p-value ≤ 5 × 10⁻⁸), and the blue line signifies suggestive significance (p-value < 5 × 10⁻⁶)

GWASs of caries in the permanent dentition in African Americans

The GWAS of DMFT yielded 94 suggestive (p-value ≤ 5 × 10⁻⁶) SNPs across 25 distinct loci. The GWAS of DMFS yielded 23 suggestive SNPs across 11 distinct loci. These loci and corroborating evidence for nearby genes are listed in Table 2 (DMFT) and Table 3 (DMFS). Many of the top loci for the two phenotypes overlapped (rs6947348, rs12171500, chr3:194035416, rs12488352, rs1003652). GREAT regulatory analysis results are available in the Appendix.

Table 2 Suggestive loci observed for DMFT
Table 3 Suggestive loci observed for DMFS

GWASs of caries in the primary dentition in African Americans

The dft GWAS yielded 46 suggestive SNPs across 17 distinct loci. The dfs GWAS yielded 32 suggestive SNPs across 17 distinct loci. Two loci overlapped between dfs and dft (rs2012033 and rs74574927/rs78777602). One notable suggestive locus, indicated by rs2515501 (p-value 4.54 × 10⁻⁶), harbors the antimicrobial peptide gene DEFB1. Gene annotations for the suggestive loci (p-value ≤ 5 × 10⁻⁶) are listed in Table 4 (dft) and Table 5 (dfs). GREAT regulatory analysis results are available in the Appendix.

Table 4 Suggestive loci observed for dft
Table 5 Suggestive loci observed for dfs

Comparison with Caucasian caries GWAS

Results of the tests for heterogeneity between African Americans and Caucasians are listed in Table 6. Significant (p-value ≤ 5 × 10⁻⁸) heterogeneity in effects between racial groups was observed for 50% of the loci in children, and 12–18% of loci in adults.

Table 6 Loci showing significant heterogeneity between African American and Caucasian caries GWASs

Discussion

Dental caries is a complex disease that disproportionately affects certain groups, including African Americans.

This is one of few studies of the genetics of dental caries to specifically investigate African Americans. The purpose of this pilot study was to perform preliminary GWAS scans in African American children and adults and to contrast the evidence for genetic association between African Americans and Caucasians.

Though no significant associations were observed (which was expected given the small sample sizes), several suggestive loci showed strong evidence of genetic heterogeneity between African Americans and Caucasians. These findings suggest that the genetic architecture of dental caries differs across racial groups. Thus, gene-mapping efforts in African American and other minority racial groups are warranted, and may lead to the discovery of caries risk loci that would go undetected by studying Caucasians alone.

Several suggestive loci harboring genes with putative connections to caries were observed. Given the exploratory nature of this study, we describe suggestive hits to potentially help inform new hypotheses about caries genetics. We caution that these suggestive loci should be interpreted with much skepticism.

GWASs of permanent dentition in African Americans

Several themes emerged from annotation of suggestively associated genes, including saliva-, salivary gland-, and salivary proteome-related genes. A gene encoding a salivary protein involved in inflammatory processes (KLK1; rs4801855; p-value 3.24 × 10⁻⁶) [85, 86], a transcription factor differentially expressed in the minor salivary glands between the sexes (LSG1; chr3:194035416; p-value 1.6 × 10⁻⁷) [51], and a gene encoding a salivary protein (CTSB; rs2838538; p-value 4.34 × 10⁻⁶) were identified.

Several genes related to the immune response and periodontal disease were identified. HES1 (chr3:194035416) encodes a transcription factor with roles in antimicrobial response within epithelial cells [49]. NOD1 (rs66691214; p-value 7.24 × 10⁻⁷) encodes a dental pulp protein with roles in sensing caries-related [78] and periodontal pathogens [79, 80], and the subsequent immune response [78, 81]. Protein products of several genes are involved in innate immunity [64, 88] (SIGLEC9, CD33; rs4801855; p-value 3.24 × 10⁻⁶ and SLC5A12; rs7107282; p-value 3.21 × 10⁻⁶). PTGER3 (rs74086974; p-value 3.18 × 10⁻⁶) is a candidate gene for the outcome of periodontal disease therapy [38], and MIR186 (rs74086974) is differentially expressed between gingiva in health versus periodontitis [41]. The rs28503910 locus (p-value 4.84 × 10⁻⁶) contains MIR1305, which is upregulated in response to smoking and may impair regeneration of periodontal tissues in that state [52]. TRPM2 (rs2838538; p-value 4.34 × 10⁻⁶) encodes an ion channel that is upregulated in dental pulpitis [137] and is involved in saliva production [138].

Tooth and enamel development-related genes were present across several loci, including a gene associated at nominal significance, TUFT1 (rs11805632; p-value 5.15 × 10⁻⁶), which had previously been found to be associated with dental caries in Caucasian children and adults, and which displays interaction with fluoride exposure [8]. Additional genes included HS3ST4 (rs72787939; p-value 2.20 × 10⁻⁷), which encodes a co-receptor essential for submandibular gland and tooth progenitor function [82]. Genes with roles in dental stem cells (MIR148A; rs6947348; p-value 1.38 × 10⁻⁶) [59], and a locus with genes involved in tooth development (IQGAP2; rs12171500; p-value 1.96 × 10⁻⁶) [53], enamel formation (F2R) [56], deciduous tooth pulp (CRHBP) [55], and ameloblastoma (S100Z, SNORA47, IQGAP2) [53, 54], were found. Also, the previously mentioned HES1 (chr3:194035416) has a role in tooth development [48] and taste cell differentiation [50]. The rs2317828 locus (p-value 1.55 × 10⁻⁶) contains genes that play a crucial role in odontogenesis (PLCG2) [56] and ameloblast development (CDH13) [70]. LGR4 (rs7107282; p-value 3.21 × 10⁻⁶) is required for the sequential development of molars [66]. FOXF2 (rs2814820; p-value 3.90 × 10⁻⁶) and TAF1B (rs1003652; p-value 4.54 × 10⁻⁶) are near a cleft lip [139] and cleft lip and palate risk loci [88], respectively. FOXF2 also encodes a protein located near tooth germ cells during tooth development [140]. The rs1003652 (p-value 4.54 × 10⁻⁶) locus includes several genes that are differentially expressed between various dental, bone, or gingival tissues (GRHL1, PDIA6) [44, 46], and one involved in odontoblast development (KLF11) [45].

Finally, several genes are involved in monogenic disorders with dental phenotypes, including SNX10 (malignant osteopetrosis of infancy, which can have features of delayed tooth eruption, missing or malformed teeth; rs6947348; p-value 1.7 × 10⁻⁷) [61], a locus containing POLD1 (mandibular hypoplasia, deafness, progeroid features; rs4801855; p-value 3.24 × 10⁻⁶) [83], ACPT (hypoplastic amelogenesis imperfecta) [84], KLK4 (hypomaturation amelogenesis imperfecta) [87], a locus containing AIRE (autoimmune polyendocrinopathy candidiasis-ectodermal dystrophy, which can feature dental abnormalities; rs2838538; p-value 4.34 × 10⁻⁶) [72], and TSPEAR (ectodermal dysplasia causing hypodontia) [74].

The locus chr16:28719857 (p-value 4.36 × 10⁻⁶) contains genes associated with body fat percentage (APOBR) [67] and BMI (SH2B1) [68], and rs12154393 (p-value 3.06 × 10⁻⁶) contains THSD7A, a candidate gene for obesity [58].

GWASs of primary dentition in African Americans

The locus near rs2012033 was associated in both primary caries GWASs (dft p-value 8.21 × 10⁻⁷; dfs p-value 1.40 × 10⁻⁶) and harbored a candidate gene for hypodontia (CHST8) [129] and a gene associated with obesity and preference for carbohydrate (KCTD15) [130]. Other loci with connections to obesity and related disorders include chr13:96271864 (p-value 3.62 × 10⁻⁶), which harbors the obesity-associated gene HS6ST3 [123]; rs422342 (p-value 2.39 × 10⁻⁶), which includes MAP2K5, also associated with BMI [125]; and rs6483205 (p-value 1.24 × 10⁻⁶), which contains MTNR1B, polymorphisms in which are associated with fasting glucose [134] and type 2 diabetes [135].

The locus rs2515501 (p-value 4.54 × 10⁻⁶) harbored several members of the alpha and beta defensin family of antimicrobial peptides [141], which are involved in chronic periodontal inflammation [116] and oral carcinogenesis [117]. Of note, this locus contains DEFB1, polymorphisms in which are associated with a more than 5-fold increase in DMFT and DMFS scores [114], and general DMFT index [115]. An additional gene at this locus, ANGPT2, is also associated with oral cancer, and upregulated in response to P. gingivalis, a periodontal pathogen [113].

Three separate associated loci harbored genes associated with complex periodontal traits, proxies for different subgroups of periodontal disease, a condition closely associated with dental caries [142]. rs1235058 (p-value 3.14 × 10⁻⁶) harbored HPVC1, a candidate gene for a trait involving a mixed infection bacterial community [107]. rs7630386 (p-value 9.51 × 10⁻⁷) harbored RBMS3, a candidate gene for a trait involving a high periodontal pathogen load [107]. Thirdly, rs17606253 (p-value 1.85 × 10⁻⁶) harbored TRAF3IP2, a protein implicated in mucosal immunity and IL-17 signaling, and associated with a trait involving high levels of A. actinomycetemcomitans and a profile of aggressive periodontal disease [107].

Two loci were found to be related to asthma, a disease associated with a doubled risk of caries [143]. rs12125935 (p-value 2.78 × 10⁻⁶) harbors PYHIN1, which encodes a protein involved in inflammasome activation in response to pathogens [94], and represents an asthma susceptibility locus specific to African American ancestry [95]. rs11741099 (p-value 2.93 × 10⁻⁶) is intronic to ADAMTS2; the ADAMTS protein family is proposed to play a role in asthma [105]. Additionally, homozygous mutations in ADAMTS2 cause Ehlers-Danlos syndrome (VIIC), features of which can include multiple tooth agenesis and dentin defects [104].

rs7174369 (p-value 1.72 × 10⁻⁶) harbored IGF1R, which is involved in dental fibroblast apoptosis [127]. Interestingly, in addition to its receptor, the regulator of hard dental tissue encoded by IGF1 was also associated at a separate locus (rs79812076; p-value 2.17 × 10⁻⁶).

Comparison between association results across dentition type and across races

Aside from TUFT1 and DEFB1, the loci reported here have not been associated with dental caries in previous studies, which have largely comprised Caucasian individuals. This is in line with previous research showing differences in frequencies of risk alleles for complex disease across races, but may also be because the study was underpowered to detect associated loci in African Americans. In addition, no overlap was found in associated loci between this study and a multi-ethnic pilot GWAS of early childhood caries [144]. There was no overlap in loci associated with primary and permanent caries indices, but this might be expected given that the genetic determinants of caries are thought to largely differ between the dentitions [6]. However, we cannot rule out similarities in genetic determinants across dentitions because this pilot study was not designed to have sufficient power for this purpose.

Loci displaying significant heterogeneity between African Americans and Caucasians (Table 6) in permanent dentition were largely ones in gene deserts with unknown function. One locus (rs12171500; DMFT Q statistic [Q] p-value 6.46 × 10⁻¹⁰; DMFS Q p-value 3.37 × 10⁻¹²) contained genes involved in enamel and tooth development.

Among loci displaying significant heterogeneity in primary dentition, there were several that harbored genes related to periodontitis. Such loci represented genes related to periodontal inflammation (rs2515501; Q p-value 4.39 × 10⁻¹⁰), gingival healing (rs9915753; dft Q p-value 1.81 × 10⁻⁷, dfs Q p-value 1.47 × 10⁻¹⁰), and aggressive periodontal disease and high levels of oral A. actinomycetemcomitans (rs17606253; Q p-value 1.41 × 10⁻⁹). Notably, African American pre-teens are approximately 16 times as likely as Caucasian pre-teens to have localized aggressive periodontitis, and detection of A. actinomycetemcomitans is associated with early surrogates for periodontal inflammation in African American preadolescents [145].

Several broad categories of genes associated with caries in African Americans emerged, including those involved in tooth/enamel development, those causing single-gene disorders with craniofacial or dental malformations, those involved in immune response or periodontitis, those related to salivary glands and proteins, and those associated with obesity. These results support the known multifactorial nature of dental caries [21]. Further studies will be necessary to confirm the loci nominated in this pilot study. Nevertheless, these GWASs provide valuable insight into the differences in the genetic architecture of caries across populations, and suggest new candidate genes worth following up in hypothesis-driven studies.

Study limitations

This study has limitations, including the genotyping platform, which was not optimized for genomic coverage of the African American population [146, 147]. Thus, studies in larger African American cohorts and with denser chips are needed to identify risk loci that may not have been well represented in this study. The ascertainment of caries was limited by the lack of X-ray examination to confirm white spots and approximal tooth surface caries, which would have underestimated the true extent of caries counts. Imprecision in the caries assessment would lower the power to detect association, but would not result in false positive associations. Therefore, the associations observed in this study would likely not be influenced by this limitation, but other true associations may have gone undetected. The pediatric cohort analyses were somewhat limited in that the primary caries indices (dfs/dft) were tested for genetic association in a sample that included some children with mixed dentition. Limiting the scope of the pediatric analyses to solely primary dentition caries indices allows for simplified interpretation of the association results because genetic determinants of primary and permanent tooth caries have been found to differ [6]. However, assessing dfs/dft scores in the mixed dentition provides an incomplete picture of the caries experience in the primary dentition, given the exfoliation of some teeth. This is another important source of measurement error, which would bias our analysis toward the null hypothesis of no association.

Conclusions

In summary, these results suggest that there may be genetic differences in caries susceptibility, and potentially differing genetic etiologies or differentially distributed genetic risk factors, across racial groups. Indeed, addressing the oral health disparity gap is a national priority according to both the US Surgeon General’s Oral Health in America report [12] and the Healthy People 2020 public health goal framework [148]. This oral health disparity has parallels in the research sphere: relatively little work, to date, has been done on the genetics of caries in African Americans. Furthermore, African Americans represent a segment of the population traditionally underrepresented in biomedical research (UBR), and the importance of including such groups in research is recognized as foundational to the future of precision medicine by the National Institutes of Health initiative, All of Us [149]. Larger gene-mapping studies are thus needed in this population to help alleviate its disproportionate burden of the disease.