Identification of Household Bacterial Community and Analysis of Species Shared with Human Microbiome

Microbial populations in indoor environments, where we live and eat, are important for public health. Various bacterial species reside in the kitchen, and refrigerators, the major means of food storage within kitchens, can be a direct source of foodborne illness. Monitoring the microbiota of the refrigerator is therefore important for food safety. Using high-throughput sequencing, we investigated and compared the bacterial communities in ten houses at two sites: the vegetable compartment of the refrigerator and the toilet seat, which is recognized as highly colonized by microorganisms. Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes were predominant in both refrigerator and toilet samples. However, Proteobacteria was more abundant in the refrigerator, and Firmicutes was more abundant in the toilet. These household bacterial communities were compared with those of human skin and gut to identify potential sources of household bacteria. Bacterial communities from refrigerators and toilets shared more species with human skin than with the gut. Opportunistic pathogens, including Propionibacterium acnes, Bacteroides vulgatus, and Staphylococcus epidermidis, were identified as species shared with human skin and gut microbiota. This approach can provide a general background of the household microbiota and a potential method of source-tracking for public health purposes. Electronic supplementary material: The online version of this article (doi:10.1007/s00284-013-0401-y) contains supplementary material, which is available to authorized users.


Trimming primer sequences
Primer sequences were trimmed using profiles of the V1−V3 regions of the 16S rRNA gene with hmmsearch in HMMER (Eddy, 2011). Trimming is essential for removing the Roche-454 adaptor sequence and the amplification primers, which could otherwise generate inaccurate results. Reads lacking the target primer sequences were excluded from subsequent steps because such reads may have been generated by sequencing errors.

Assembly of reads into representative sequences
Pyrosequencing can miscall the length of homopolymer runs, which may bias the results of microbial community analysis. To correct this problem, we generated representative sequences from clusters by the following process. (1) Each read was converted to an artificial sequence in which homopolymeric regions were condensed to a single base. (2) Identical sequences and subsequences of longer sequences were clustered. (3) A consensus sequence was generated for each cluster from the original sequences of the cluster by multiple alignment. (4) The consensus sequences were sorted by length and clustered again, allowing fewer than two base mismatches, based on the reported error rate of 454 sequencing technology of 0.5% (Huse, et al., 2007). (5) The longest consensus sequence in each cluster was selected as the representative sequence.
Representative sequences were used to assign taxonomic positions.
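
Steps (1) and (2) of this process can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, "subsequence" is assumed here to mean a contiguous substring of a longer condensed sequence, and the consensus building and mismatch-tolerant reclustering of steps (3) to (5) are not shown.

```python
from itertools import groupby


def condense_homopolymers(read: str) -> str:
    """Collapse every run of identical bases to a single base (step 1)."""
    return "".join(base for base, _run in groupby(read))


def cluster_condensed(reads: list[str]) -> dict[str, list[str]]:
    """Group original reads by condensed sequence (step 2).

    A read joins an existing cluster when its condensed form is identical to,
    or a contiguous substring of, that cluster's condensed seed. Treating
    "subsequence" as "substring" is an assumption of this sketch.
    """
    condensed = [(condense_homopolymers(read), read) for read in reads]
    # Longest condensed sequences first, so that they become the cluster seeds.
    condensed.sort(key=lambda pair: len(pair[0]), reverse=True)

    clusters: dict[str, list[str]] = {}
    for key, original in condensed:
        for seed in clusters:
            if key in seed:
                clusters[seed].append(original)
                break
        else:
            # No existing seed contains this sequence: start a new cluster.
            clusters[key] = [original]
    return clusters
```

The original (uncondensed) reads are kept inside each cluster because step (3) builds the consensus from them rather than from the condensed forms.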

Taxonomic assignments of individual reads
Each read was identified using hierarchical taxonomic information in the EzTaxon-e database (Kim et al., 2012) and robust pairwise global sequence alignment. The five sequences most similar to each pyrosequencing read were identified by a BLASTN search against the EzTaxon-e database, and the pairwise similarities between the query and these five sequences were calculated by global pairwise alignment (Myers & Miller, 1988). Taxonomic classifications were carried out using the criteria of ≥97% similarity for species, ≥94% for genus, ≥90% for family, ≥85% for order, ≥80% for class, and ≥75% for phylum. If the sequence similarity was below the cutoff for a given rank, the sequence was assigned to the "unclassified" group at that rank.
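
The rank cutoffs above can be expressed as a short Python sketch. The function name and the `lineage` dictionary are hypothetical. The similarity value is assumed to be the global pairwise similarity (in percent) between a read and its best database hit, and the BLASTN search and alignment steps themselves are not shown.

```python
# Similarity cutoffs (percent) for each rank, as stated in the text.
RANK_CUTOFFS = [
    ("species", 97.0),
    ("genus", 94.0),
    ("family", 90.0),
    ("order", 85.0),
    ("class", 80.0),
    ("phylum", 75.0),
]


def assign_taxonomy(similarity: float, lineage: dict[str, str]) -> dict[str, str]:
    """Return the taxon name at each rank, or "unclassified" below the cutoff.

    `lineage` maps rank names to the taxon names of the best database hit.
    """
    return {
        rank: lineage[rank] if similarity >= cutoff else "unclassified"
        for rank, cutoff in RANK_CUTOFFS
    }
```

For example, a read with 95.2% similarity to its best hit would be labeled at the genus rank and above but left "unclassified" at the species rank.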

Filtering chimera sequences
Artificial products (chimera sequences) are generated during PCR amplification and can affect the analysis of microbial communities. To remove them, sequences that did not match the EzTaxon-e database at 97% similarity were subjected to a chimera check. The EzTaxon-e database was used as the first screening tool for selecting candidate chimera sequences because it consists of manually curated, high-quality, non-chimeric sequences (Kim, et al., 2012). Chimeras were detected using the UCHIME program (Edgar, et al., 2011).

Calculation of diversity indices
The cutoff value for determining operational taxonomic units (OTUs) is generally 97% sequence similarity. These OTUs can be considered as species in pyrosequencing data analysis. Therefore, we calculated the diversity indices of samples using the following three different methods.
(3) TDC-TBC method: Both the CD-HIT and TBC methods are de novo clustering methods, which ignore the actual taxonomic identification of each read. Taxonomy-dependent clustering (TDC) can overcome this problem by using information on taxonomic identification. Each read is identified against the EzTaxon-e database, and reads that are unclassified at the species level (<97% similarity) are clustered into OTUs using the TBC method. The TDC-TBC method is therefore a combination of database-dependent and de novo clustering, and the units it produces comprise both real species and artificial OTUs. This hybrid approach can maximize the information on real species diversity in samples in which the 16S rRNA similarity values between species are often higher than 97% (e.g., species belonging to the family Enterobacteriaceae).
The results of the three different calculations are presented in Supplementary Table 1.
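
The logic of the hybrid TDC-TBC method can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used here: `hybrid_otu_counts` and the `denovo_OTU_` labels are hypothetical names, and `cluster_de_novo` is a caller-supplied stand-in for the TBC step, whose algorithm is not reproduced.

```python
from collections import Counter
from typing import Callable, Optional

SPECIES_CUTOFF = 97.0  # percent similarity, as stated in the text


def hybrid_otu_counts(
    reads: list[tuple[str, Optional[str], float]],
    cluster_de_novo: Callable[[list[str]], list[list[str]]],
) -> Counter:
    """Count reads per unit: database species first, de novo OTUs for the rest.

    Each read is (read_id, best_hit_species_or_None, similarity_percent).
    `cluster_de_novo` is a hypothetical stand-in for TBC: it takes the IDs of
    reads unclassified at the species level and returns clusters of read IDs.
    """
    counts: Counter = Counter()
    unclassified: list[str] = []
    for read_id, best_hit, similarity in reads:
        if best_hit is not None and similarity >= SPECIES_CUTOFF:
            counts[best_hit] += 1  # taxonomy-dependent assignment
        else:
            unclassified.append(read_id)  # left for de novo clustering
    for index, cluster in enumerate(cluster_de_novo(unclassified), start=1):
        counts[f"denovo_OTU_{index}"] = len(cluster)
    return counts
```

The number of keys in the returned counter is the number of species plus artificial OTUs observed in the sample, and the per-unit read counts can serve as input to whichever diversity index is calculated.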