High throughput DNA sequencing to detect differences in the subgingival plaque microbiome in elderly subjects with and without dementia
- Cite this article as:
- Cockburn, A.F., Dehlin, J.M., Ngan, T. et al. Investig Genet (2012) 3: 19. doi:10.1186/2041-2223-3-19
To investigate the potential association between oral health and cognitive function, a pilot study was conducted to evaluate high throughput DNA sequencing of the V3 region of the 16S ribosomal RNA gene for determining the relative abundance of bacterial taxa in subgingival plaque from older adults with or without dementia.
Subgingival plaque samples were obtained from ten individuals at least 70 years old who participated in a study to assess oral health and cognitive function. DNA was isolated from the samples and a gene segment from the V3 portion of the 16S bacterial ribosomal RNA gene was amplified and sequenced using an Illumina HiSeq1000 DNA sequencer. Bacterial populations found in the subgingival plaque were identified and assessed with respect to the cognitive status and oral health of the participants who provided the samples.
More than two million high quality DNA sequences were obtained from each sample. Individuals differed greatly in the mix of phylotypes, but samples from different sites and different subgingival depths in the same subject were usually similar. No consistent differences were observed in this small sample between subjects separated by levels of oral health, sex, or age; however, a consistently higher level of Fusobacteriaceae and a generally lower level of Prevotellaceae were seen in subjects without dementia, although the difference did not reach statistical significance, possibly because of the small sample size.
The results from this pilot study provide suggestive evidence that alterations in the subgingival microbiome are associated with changes in cognitive function, and provide support for an expanded analysis of the role of the oral microbiome in dementia.
Keywords: Cognitive impairment; Oral disease; Oral microbiome; Subgingival plaque
Abbreviations
analysis of variance (ANOVA)
cognitive impairment without dementia (CIND)
Mild Cognitive Impairment (MCI)
Third National Health and Nutrition Examination Survey (NHANES-III)
operational taxonomic unit (OTU)
pocket probing depth
Quantitative Insights into Microbial Ecology (QIIME)
In addition to the hypothesized link between oral health and chronic systemic diseases, such as cardiovascular disease, stroke, and diabetes, there now appears to be an association between oral health and neurodegenerative diseases, ranging from mild to moderate loss of cognitive function to Alzheimer’s Disease (AD). Poorer cognitive performance and tooth loss have been linked epidemiologically in both retrospective and prospective studies [2, 3, 4, 5, 6, 7], and tooth loss has been associated with an increased risk of both dementia and cognitive decline. Indeed, increasing tooth loss over time is associated with increased likelihood of low cognitive scores. Beyond epidemiological associations, independent lines of experimental evidence support the hypothesis that bacteria associated with diseases of the oral cavity contribute to neurodegeneration. Oral bacteria and bacteria closely related to those found in the oral cavity have been found at a higher frequency post mortem in the brains of patients with AD than in those of patients who did not have AD [9, 10]. In addition, the Third National Health and Nutrition Examination Survey (NHANES-III) provided evidence that gingival bleeding and loss of periodontal attachment were associated with lower cognitive function. Furthermore, subjects with high levels of antibody against the periodontal pathogen Porphyromonas gingivalis had significantly more impaired verbal memory and subtraction test performance, and this finding remained robust when adjusting for potential sociodemographic and vascular confounders. Levels of immunoglobulin G (IgG) antibodies and serum tumor necrosis factor levels have been found to discriminate between normal subjects and AD patients, and recently Sparks Stein et al. found that elevated antibody levels to periodontal disease bacteria were observed in subjects years before cognitive impairment, suggesting that periodontal disease could potentially contribute to the risk of AD onset or progression.
If oral health is linked to neurodegeneration, it is plausible that bacteria found in the oral cavity play a causal role in establishing this link. The collection of microorganisms in the oral cavity, the ‘oral microbiome’, has been studied using a variety of molecular methods that can identify both cultivatable and non-cultivatable bacteria within various ecological niches in the oral cavity [14, 15]. Despite the evidence linking oral health and cognitive function, there is a paucity of empirical data that assess the oral microbiome in patients with cognitive degeneration . Microbiome analysis could be used to determine whether the bacterial compositions of the oral microbiome or ‘bacterial signatures’ could serve as a predictive biomarker for increased risk of cognitive impairment. It is also possible that preventive treatment could target the makeup of the oral microbiome and the efficacy of such treatment could be monitored with this approach.
Recent advances in the availability and reduced costs of high throughput DNA sequencing and bioinformatics tools provide a broadly available and increasingly cost effective method to identify bacterial populations found in polymicrobial biofilms associated with human tissue, including the oral cavity [17, 18, 19]. The present study was undertaken to develop a sampling and analysis pipeline using next generation DNA sequencing technology that could be used to characterize microbial populations in subgingival plaque samples. Using an Illumina HiSeq1000 DNA sequencer and a sample preparation and analysis pipeline that enabled multiple samples to be sequenced within the same sequencing lane, we were able to generate and analyze economically more than one million bacterial DNA sequences from each of 15 subgingival plaque samples. The participants were enrolled in a study that assessed oral health and cognitive function among adults at least 70 years old from West Virginia, some of whom were from medically underserved communities. These sequences were analyzed using bioinformatics tools available in the publicly accessible software package Quantitative Insights into Microbial Ecology (QIIME), and sequence comparisons were made among participants who were clinically assessed as normal or exhibited alterations in cognitive function. The results provide a road map for future efforts to use high throughput DNA sequencing to characterize the oral microbiome in the context of systemic disease, and provide preliminary evidence that differences exist in the bacterial composition of subgingival plaque in patients with alterations in cognitive function. In contrast to other approaches to microbiome analysis, high throughput sequencing holds out the promise of also being useful for metagenomic analysis of the oral microbiome to identify potential virulence factors that contribute to systemic disease.
Oral health screening, cognitive analysis, and sample collection
All samples were collected under a protocol reviewed and approved by the West Virginia University Institutional Review Board. The criteria for study participants were age 70 years or older, resident of West Virginia, community-living, and at least four natural teeth. Oral evaluations were performed by calibrated researchers using guidelines from the NHANES 1999 to 2000 . A psychometrician administered to the participants a battery of neuropsychological measures that assessed verbal and visual memory, language, executive function, orientation, praxis, and reading ability. Depression was assessed using the Geriatric Depression Scale . A proxy informant, usually a spouse or adult child, provided information about the participant’s cognitive function, functional limitations, medical history, and medications. All collected data were reviewed by two study psychologists and diagnoses were assigned within three cognitive categories: normal cognitive function, cognitive impairment without dementia (CIND), and dementia. The Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, criteria were used for the diagnosis of dementia . CIND was defined as mild cognitive or functional impairment reported by the participant or informant that did not meet the criteria for dementia, or performance on neuropsychological measures that was both below expectations based on reading ability and educational and occupational history, and at least 1.5 standard deviations below published norms on any test within a cognitive domain (for example, memory, orientation, language, executive function, praxis). Diagnoses were anchored by these criteria, but the final diagnoses were based on clinical judgment. Similar assessment and diagnostic procedures have been used and validated in multiple large epidemiological studies on cognitive impairment in later life [25, 26].
A total of 15 samples of bacterial DNA from 10 individuals were sequenced for this report, four of which (N1, N2, C2, C3) were obtained and partially analyzed during an earlier phase of the study . Eleven additional subgingival plaque samples were obtained using sterile periodontal curettes from pocket probing depths of 1 to 3 mm, 3 to 5 mm, or >5 mm. These samples were collected into tubes containing Invitek SalivaGene DNA stabilization buffer (STRATEC Molecular GmbH, Berlin, Germany). In most cases, multiple plaque samples from the same pocket probing depth in the same participant were pooled into one tube for DNA extraction.
DNA from the 11 new samples was purified using an Invitek PSP SalivaGene DNA Kit. As part of the purification procedure, 100 μg of lysozyme (Sigma-Aldrich, St. Louis, MO, USA) was added to each tube, the mixture incubated at 37°C for 10 minutes, and then processed according to the manufacturer's recommendations.
PCR and fragment purification
PCR primers, conditions for amplification of sequences in the V3 region of the 16S ribosomal RNA gene, and a multiplexed DNA sequencing strategy were as described in Bartram et al. unless otherwise indicated. The V3 region varies in length by about 30 base pairs among different species of bacteria in the Greengenes database, and the sequences obtained and analyzed in this study showed a similar size variability. The amplicon ranges from 296 to 327 base pairs, of which 160 base pairs are primer sequence. High pressure liquid chromatography-purified PCR primers were obtained from Integrated DNA Technologies (Coralville, IA, USA). Purified DNA was amplified using an AccuPrime PCR Kit (Invitrogen Life Technologies, Grand Island, NY, USA) on an MJ Research PTC-200 Thermal Cycler using the following conditions: initial denaturation at 95°C for 6 minutes; 30 cycles of 95°C for 2 minutes, 50°C for 2 minutes, and 72°C for 2 minutes; and a final extension at 72°C for 4 minutes. Each reaction contained 0.5 μl Taq polymerase, 5 μl 10x buffer 1 (600 mM Tris-SO4 (pH 8.9), 180 mM (NH4)2SO4, 20 mM MgSO4, 2 mM dGTP, 2 mM dATP, 2 mM dTTP, 2 mM dCTP, thermostable AccuPrime™ protein, 10% glycerol), 20 μM forward primer, 20 μM reverse primer, and up to 60 ng DNA in a total volume of 50 μl. PCR reactions were performed in triplicate and reaction products were pooled prior to purification. Because there was a low concentration of DNA in some of the samples, it was necessary to perform 30 cycles of amplification to obtain sufficient material to view on a gel. Pooled PCR products were purified by electrophoresis through 2% agarose in Tris/borate/EDTA gels, and the bands corresponding to approximately 300 base pairs were excised and purified using a QIAquick Gel Extraction Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's directions.
Indexed libraries were pooled so that 12 libraries were sequenced in each lane of the flow cell. Eight pmols of the pooled libraries were clustered onto an Illumina v2 sequencing flow cell using an Illumina cBOT. Libraries were then sequenced in a 2 x 125 bp paired-end strategy on an Illumina HiSeq1000, so that the forward and reverse reads could be assembled into a single contig. Reads were converted from Illumina bcl format to fastq format and separated into bins based on exact match to the index using CASAVA 1.8.2 (Illumina, San Diego CA, USA). An average of 6.7 million reads/sample passing the filter was sequenced in each library.
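The binning of reads by exact index match can be illustrated with a minimal sketch. This is not CASAVA itself; the function name, the index sequences, and the representation of each read as an (index, sequence) pair are assumptions made only for illustration.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Bin reads by exact match of their index sequence.

    reads: iterable of (index_sequence, read_sequence) pairs (hypothetical layout).
    index_to_sample: dict mapping each expected index to a sample name.
    Reads whose index does not match any expected index exactly are dropped.
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = index_to_sample.get(index_seq)
        if sample is not None:
            bins[sample].append(read_seq)
    return dict(bins)


# Hypothetical indexes and reads, for illustration only.
index_to_sample = {"ATCACG": "N1", "CGATGT": "D1"}
reads = [
    ("ATCACG", "AAAA"),
    ("CGATGT", "CCCC"),
    ("ATCACC", "GGGG"),  # one mismatch in the index, so this read is discarded
    ("ATCACG", "TTTT"),
]
binned = demultiplex(reads, index_to_sample)
```

Requiring an exact match trades some yield for a lower risk of assigning a read to the wrong library.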
DNA sequence processing
Sequence files were initially processed by removing sequences corresponding to linkers and primers by automated batch processing using scripts written in-house. In an effort to reduce artifacts generated by sequencing errors, a strict quality filtering protocol was employed that reduced the number of analyzed sequences to approximately 35% of the total number of sequences generated. Nevertheless, an average of more than two million high quality reads was obtained from each sample. Quality filtering of DNA sequences was performed using the following steps:
1) Sequences were first filtered by the Illumina software to eliminate the poorest reads (reads with a Q score ≥ 30 were retained) and imperfect primer matches.
2) The forward and reverse sequences were matched to construct a sequence that spanned the entire region between the primers with a program written in-house. The original Illumina sequences were all 125 bases in length, which is where the run ended. The pairing strategy overlaid the two 3’ ends starting with an overlap of 58 bases. The overlap window was extended one base at a time to 89 bases until a perfect match was obtained in the overlap region. Any pair of sequences that did not match at 100% identity in any of the size windows was discarded. This step eliminated 56% of the sequences, which overwhelmingly had the lowest quality scores. In general, the sequence quality was better in the middle than at the end, so this preferentially eliminated sequences with sequencing artifacts.
3) Paired sequences with a Phred quality score of less than five were discarded. This removed a few remaining low quality sequences, especially any that had low quality in the regions between the primers and the overlap.
4) The sequences were clustered by matching against the Greengenes database, which is a curated collection of known bacterial 16S sequences. Sequences that did not match any of the known bacterial sequences with 97% identity were discarded. This removed chimeras and most major PCR artifacts and represented approximately 5% of the total remaining sequences. Matching at 97% identity meant that any single base PCR artifacts would be combined with the corresponding authentic sequence (since the region is about 100 bases long, up to three single base changes will be ignored).
5) The resulting table of operational taxonomic units (OTUs) was filtered to remove any sequences that appeared fewer than 150 times.
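The end-pairing step (step 2) can be sketched as follows. This is an illustrative reconstruction, not the in-house program: the function names are hypothetical, and it is assumed that the reverse read is reverse-complemented before its start is compared with the 3' end of the forward read.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(forward, reverse, min_overlap=58, max_overlap=89):
    """Merge a read pair by looking for a perfect overlap of the two 3' ends.

    The overlap window starts at min_overlap bases and is extended one base
    at a time up to max_overlap. The first window with 100% identity is used.
    Returns the merged contig, or None if no window matches perfectly.
    """
    rev_rc = reverse_complement(reverse)
    for overlap in range(min_overlap, max_overlap + 1):
        if overlap > len(forward) or overlap > len(rev_rc):
            break
        if forward[-overlap:] == rev_rc[:overlap]:
            return forward + rev_rc[overlap:]
    return None
```

With two 125-base reads, an overlap of 58 to 89 bases corresponds to a merged contig of 250 - 89 = 161 to 250 - 58 = 192 bases. A pair with even a single mismatch in every candidate window returns `None`, which mirrors the strict 100% identity rule described above.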
Finally, scripts written in-house in Biopython were used to convert the filtered Illumina data to the FASTA format for analysis by QIIME for taxonomic assignment and measurements of microbial diversity; scripts that perform this conversion are now part of QIIME. To process Illumina-generated files in QIIME, the file headers were changed to begin with ‘>sample_number’, where ‘sample’ identifies the sample and ‘number’ is the number of the sequence in the file. All of the sequences were then combined into one file for analysis with QIIME. DNA sequences generated and analyzed in this study can be found at the National Center for Biotechnology Information Sequence Read Archive, project number SRA057340.
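The header conversion can be illustrated with a short sketch in plain Python. It is not the authors' Biopython script; the function name is hypothetical, four-line FASTQ records are assumed, and numbering sequences from 1 is an assumption, since the text does not say where the count starts.

```python
def fastq_to_fasta(fastq_lines, sample):
    """Convert four-line FASTQ records to FASTA lines with '>sample_number' headers."""
    lines = [line.strip() for line in fastq_lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    fasta = []
    for number, start in enumerate(range(0, len(lines), 4), start=1):
        header, sequence, separator, _quality = lines[start:start + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (start + 1))
        fasta.append(">%s_%d" % (sample, number))  # numbering from 1 is an assumption
        fasta.append(sequence)
    return fasta


# Two hypothetical reads from sample N1.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCC", "+", "IIII"]
converted = fastq_to_fasta(example, "N1")
```

Running the function once per sample and concatenating the outputs gives a single combined file of the kind described above.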
All QIIME analyses were performed on a virtual server hosted by Amazon Web Services using an existing QIIME image. The server had the following specification: QIIME 1.4.0 EBS East XLARGE (ami-438d5b2a). The following QIIME scripts were used during analysis and default parameters were used unless otherwise noted: 1) ‘pick_reference_otus_through_otu_table.py’ matched sequences at 97% sequence identity with OTUs associated with specific bacterial phylotypes in the Greengenes database (4Feb2011); 2) ‘summarize_taxa_through_plots.py’ generated bar graphs of the relative abundance of different taxa in each sample; 3) ‘alpha_rarefaction.py’ generated alpha rarefaction plots; 4) ‘pick_rep_set.py’, ‘align_seqs.py’, ‘filter_alignment.py’, and ‘make_phylogeny.py’ were chained to generate a phylogenetic tree of the OTUs; 5) ‘beta_diversity_through_plots.py’ (using the phylogenetic tree and the weighted UniFrac option) generated a beta diversity table and principal coordinate plots for the inter-subject diversity; and 6) ‘otu_category_significance.py’ generated analysis of variance (ANOVA) scores for all OTUs versus various categories. This script calculated raw, Bonferroni corrected, and false discovery rate corrected probabilities.
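To make the difference between the two corrections concrete, the sketch below applies the Bonferroni correction and the standard Benjamini-Hochberg false discovery rate procedure to a list of raw p-values. It is a generic illustration and is not taken from QIIME; the exact formula used by ‘otu_category_significance.py’ may differ, and the example p-values are invented.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for four hypothetical OTUs.
raw = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(raw)
fdr = benjamini_hochberg(raw)
```

With hundreds of OTUs tested, the Bonferroni correction is the more conservative of the two, which matters for a study with few subjects.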
Data analyses involved logistic regression as implemented in JMP/Pro Software (version 9.0.2) and random forests as implemented in R Software .
Demographics and health status of study participants
Table: Demographics and health status of study participants (rows include number of teeth, number of coronal caries, and number of root caries).
Generation and filtering of DNA sequences
Table: Number of DNA sequences obtained during processing (rows include initial number of sequences, successful end pairing, PHRED score >5, and OTUs identified and analyzed).
Population diversity in samples
The two purposes of this study were to develop a sample preparation and analysis pipeline to assess the oral microbiome using high throughput DNA sequencing, and to expand an ongoing study on the relationship between oral health and cognitive function in older West Virginians.
The major advantage of the Illumina platform is its capacity to generate millions of reads from each sample. Because of the relatively short read lengths, care must be used in choosing an appropriate region of the 16S rRNA gene for analysis using the Illumina platform. The V3 region was selected because the primers used are the same as those used for older methods of bacterial community analysis, and this region had been used previously in Illumina-based analysis of microbial communities from environmental samples. The region amplified in this study is longer (170 to 190 bases) than the V6 region (105 to 120 bases) or the V5 region (approximately 82 bases) sequenced in other studies. Using the PCR primers described in Bartram et al., it was possible to run up to 12 samples per sequencing lane in this study, thereby substantially reducing the cost of the analysis. However, a challenge to using this system for microbiome analysis is the relatively short read lengths that are typically generated in a run (approximately 125 bp) and the lower quality of many of these reads. These disadvantages are largely overcome by using a paired-end sequencing approach, and successful microbiome analyses of various environmental niches, including the oral cavity [18, 31], have been documented. Furthermore, recent additions to the QIIME program have streamlined analysis of Illumina-generated data. We used the Greengenes database to identify the taxa corresponding to our sequences. About 5% of our sequences were not found in Greengenes; we believe that most of these are artifacts, but it is possible that a small number of rare OTUs could have been excluded, which limits the utility of this approach for identifying very rare phylotypes with a high level of confidence. Nevertheless, we successfully obtained millions of sequences from each sample, yielding a detailed picture of the structure of the microbiome in subgingival plaque.
Although the main goal of this pilot study was to work out methods for obtaining high quality data and performing subsequent analysis using validated, universally available software and databases, two interesting observations were made during the phylogenetic analysis of the data. First, a very high level of Fusobacteria was found, particularly in the samples from normal and CIND participants. Fusobacteria are well-studied anaerobes that have been found with great frequency in the oral cavity using culture-independent analyses [32, 33, 34, 35], and members of the genus Fusobacterium were previously found to be among the most commonly identified species in the oral cavities of elderly patients [34, 35], particularly in association with root caries . A second novel observation was that the levels of Fusobacteriaceae were lower, and that levels of Prevotellaceae were higher in samples from subjects with dementia compared to subjects without dementia. We had hundreds of taxa in our results, so by chance some of them would likely appear to be correlated with dementia. However, Prevotellaceae and Fusobacteriaceae are the two most abundant families of bacteria, and antibody levels to individual species in those families have been shown to increase to higher levels in people who develop dementia than in those who do not .
There are four possible explanations for the correlations between dementia and components of the microbiome: 1) the correlations are spurious due to the small sample size; 2) dementia affects the microbiome; 3) the microbiome affects dementia; and 4) a third variable affects both.
First, we acknowledge that the sample size is small and that many more subjects need to be evaluated to obtain a robust result. Whether a larger sample size will confirm these preliminary observations is an open question.
Second, it might seem self-evident that individuals with dementia have poor oral hygiene resulting from changes in diet or oral hygiene behavior, and therefore worse oral health than individuals without dementia. As expected, the participants with dementia in this study had, on average, slightly more gingivitis, fewer teeth, more caries, and much higher plaque indices. However, while this is true on average, it was not always the case on an individual basis. Participant Normal 2 had poor oral health while participants Dementia 1 and Dementia 5 had relatively good oral health, albeit with fewer teeth. Participant Dementia 2 had the highest number of teeth of all those in the study. If dementia causes poor oral health, which in turn causes the changes in the microbiome, then the correlations between the directly related parameters (cognition and oral health, or oral health and the microbiome) should be higher than the correlation between the indirectly related parameters (cognition and the microbiome). Since we found the opposite, the data do not support the hypothesis that the observed differences are merely secondary effects of poor oral hygiene in subjects with dementia.
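The expectation behind this argument can be illustrated with a small simulation of a purely hypothetical causal chain (cognition affects oral health, which affects the microbiome). The data are simulated, not taken from this study, and the effect size, sample size, and variable names are arbitrary assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_chain(n=5000, effect=0.7, seed=0):
    """Simulate cognition -> oral health -> microbiome with linear effects.

    Returns the three pairwise correlations:
    (cognition/oral health, oral health/microbiome, cognition/microbiome).
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(1 - effect ** 2)  # keeps each variable at unit variance
    cognition = [rng.gauss(0, 1) for _ in range(n)]
    oral_health = [effect * c + rng.gauss(0, noise_sd) for c in cognition]
    microbiome = [effect * o + rng.gauss(0, noise_sd) for o in oral_health]
    return (
        pearson(cognition, oral_health),
        pearson(oral_health, microbiome),
        pearson(cognition, microbiome),
    )


r_direct_1, r_direct_2, r_indirect = simulate_chain()
```

In such a chain the indirect correlation is approximately the product of the two direct correlations, so it is weaker than either of them. Observing a stronger indirect correlation, as reported above, is therefore difficult to reconcile with a simple chain of this kind.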
We found more Prevotella on average in the samples from participants with dementia than in the samples from participants without dementia. However, the difference was not large and the statistical significance of that finding was dependent on the statistical test used to analyze the data. The number of Prevotellaceae phylotypes was high in both groups of samples, supporting many previous studies that showed diversity in Prevotellaceae phylotypes/species in the oral cavity . In addition, there was a slight but statistically significant increase in the number of distinct OTUs in the dementia samples compared to the non-dementia samples, raising the question of whether there are phylotypes in the Prevotellaceae that contribute to dementia. At the species and strain levels there are examples of specific genes that could potentially contribute to virulence within the Prevotellaceae family including genes that encode fimbrial adhesins, phospholipases, host-resistance factors, adenine-specific DNA-methyltransferase and 8-amino-7-oxononanoate synthase [36, 37]. Species-specific insertion sequences have also been identified , but whether these or other genes are disproportionately expressed in dementia patients and play a role in disease awaits metagenomic analyses. There were no other predominant phylotypes found in higher levels in participants with dementia compared to non-dementia, arguing against the idea that the presence of certain bacteria promotes dementia. However, the fact that higher levels of Fusobacteriaceae were found in all samples from participants without dementia suggests an alternate explanation, that perhaps certain oral bacteria provide protection against dementia, possibly by filling environmental niches that could be populated by more inflammatory microorganisms, by actively suppressing local or systemic inflammatory responses, or by producing biomolecules that are neuroprotective.
The final possibility is that both dementia and the microbiome are affected by a third variable. There is a strong genetic link to some forms of dementia, including the presence of the APOE-e4 variant of the Apolipoprotein E gene . It is possible that the presence or absence of specific taxa could be due to genetic factors in the subject such as host immune responses, expression of adhesion molecules on host tissues that affect bacterial adherence, or other undefined factors. The relationship between human genotype and the oral microbiome needs to be studied carefully.
Sparks Stein et al.  found elevated levels of antibodies to Prevotella intermedia and Fusobacterium nucleatum in the blood of subjects who later developed AD. These investigators also found that subjects with Mild Cognitive Impairment (MCI), unlike AD subjects, had no differences in P. intermedia and F. nucleatum compared to normal subjects, but had reduced levels of antibodies to several other oral bacteria. Similarly, we found that our normal and CIND subjects did not separate based on their microbiome beta diversity and, in particular, that their Prevotellaceae and Fusobacteriaceae were similar. We hypothesize that our results can be reconciled with those of Sparks Stein et al. by predicting that subjects who will develop dementia have a leakier sub-gingival compartment resulting in increased interaction between the microbiome and the immune system, leading to higher antibody levels to the most prevalent bacteria: Fusobacterium and Prevotella. Compared to Prevotella, Fusobacteria are much less genetically diverse at the 16S gene, so they might be more sensitive to elevated serum antibody levels because of less diversity of surface proteins that could serve as targets for antibodies. Thus, later in life one might predict that higher levels of antibody might reduce levels of Fusobacteria yet fail to be as effective against genetically diverse Prevotella. Alternatively, it is possible that the difference in findings for Fusobacteriaceae might be because Sparks Stein et al. were using antibodies that would differentiate strains on the basis of surface proteins while we used 16S ribosomal sequences.
In summary, our results demonstrate, via high throughput DNA sequencing, that substantial inter-person variability exists in the oral microbiome of subgingival plaque. There appears to be a consistent difference in the levels of Fusobacteriaceae, and perhaps Prevotellaceae, in samples from patients who do or do not have dementia, which should be studied in more detail.
We have shown that high throughput DNA sequencing is an effective and inexpensive method for analyzing the microbiome of oral subgingival plaque from individual subjects. It is sensitive enough to provide a measure of the bacteria from a single sampling site. Substantial inter-person variability exists in the sub-gingival plaque microbiome, while there is generally little variation at depths ranging from 1 to 5 mm in an individual subject's mouth. There appears to be a consistent difference in the levels of Fusobacteriaceae, and perhaps Prevotella, in samples from patients who do or do not have dementia, which should be studied in more detail.
This project was funded in part by the National Institute of Dental and Craniofacial Research (1R21DE016970, PI: Bei Wu) and awards from the National Center for Research Resources/National Institute of General Medical Sciences (2P20RR016477/8P20GM103434) to the WV-IDeA Network of Biomedical Research Excellence (WV-INBRE). Assistance with statistical analysis was provided by Gerald R. Hobbs and Mark V. Culp of the West Virginia University Department of Statistics. Joan Olson kindly provided DNA samples from the earlier study. The authors thank the study participants and their families.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.