Background

Facial skin is home to millions of microorganisms, predominantly bacteria, which play vital roles in skin health. These functions include maintaining the physical barrier, protecting against pathogens, modulating the immune response, and metabolizing bioactive materials [1,2,3,4]. Facial skin has unique characteristics compared with other skin regions [5]: for example, its higher density of sebaceous glands, particularly in the T-zone (forehead, nose, and chin), contributes to enhanced sebum production, which prevents dryness and aids skin protection [6, 7]. Moreover, constant exposure to environmental factors such as UV radiation, pollutants, and microorganisms shapes a distinct facial microbiome that differs from those of other skin areas [8].

The facial microbiome plays an indispensable role in skin health, with profound implications for both academia and industry. Aging is a recognized factor influencing the composition and function of the facial microbiome [9, 10]. Studies show that elderly individuals exhibit significantly higher microbiome diversity [9, 11], potentially linked to the reduced immune capabilities associated with increased age [12]. Significant differences in Actinobacteria and Proteobacteria populations have been observed between age groups, with specific enrichment of Cutibacterium and Staphylococcus among Korean women in their 20s–30s, and of Streptococcus and Pseudomonas in individuals in their 60s [13]. Other studies [12, 14] have reported similar results. Notably, facial microbiome composition predicts chronological age more accurately than the gut microbiome does, indicating a stronger association between aging and changes in facial microbiome structure [15, 16].

Conversely, the facial microbiome impacts the host skin phenome by regulating skin immunity and inflammation [17, 18], inhibiting pathogen invasion [19, 20], and maintaining pH and stratum corneum hydration [21, 22] through various mechanisms [23], opening avenues for facial microbiome interventions to enhance host skin physiological properties [24].

However, targeting aging-related facial microbes can be challenging because they also correlate with skin physiological properties, which themselves change with age. Aging-related physiological changes, such as decreases in collagen, sebaceous glands, and sweat [25], affect skin surface physiology [26, 27], which in turn influences the facial microbiome [28]. Notably, certain age-related physio-optical features shape the same indicator microbes that age does. In particular, studies show that the abundance of Cutibacterium correlates positively, and that of Streptococcus negatively, with sebaceous gland area [29] and many other factors [10, 25, 30]. These results leave open the question of how age itself influences the skin microbiota.

To address the role of aging in shaping the facial microbiome while controlling for skin physio-optical properties, we conducted a comprehensive study involving 479 volunteers across three regions in China, spanning ages 18 to 64 with a well-distributed representation across age intervals. Employing deep metagenome sequencing of the facial microbiome and state-of-the-art equipment to assess skin characteristics, we established the Facial Microbiome Genome Compendium (FMGC). This compendium represented our own and, surprisingly, public facial microbiome datasets better than existing skin microbial catalogs such as the Skin Microbial Genome Collection (SMGC) [31] and the Unified Human Skin Genome (UHSG) catalog [32]. We then further characterized our data, focusing on four key aspects: identifying distinct cutotypes (groupings of individuals based on the composition and abundance of microbial species found on their skin) correlated with age and skin properties; revealing aging as the primary driver of the facial microbiome; developing a microbiome-based Facial Age Index (FAI) to assess the impact of lifestyle habits on skin aging; and identifying aging-related microbial species and pathways. Notably, our research shows that the acceleration of aging induced by unhealthy lifestyles is cutotype-dependent. This nuanced understanding of the relationship between aging and the facial skin microbiome points to potential personalized interventions for skin microbiome management. In summary, our study provides a comprehensive dataset, sheds light on the role of aging in skin microecology, and lays the foundation for tailored intervention approaches targeting lifestyle habits, skin characteristics, and microbial factors.

Results

A facial microbiome genome compendium that better represented our and public facial metagenomic datasets through multi-center sampling and deep-sequencing

To enhance our understanding of the human facial microbiome and its association with aging and the physio-optical conditions of the skin, we conducted a study involving 479 healthy Chinese individuals of both genders and various ages (ranging from 18 to 64) from Beijing, Wuhan, and Guangzhou in China. These regions represent the Northern, Central, and Southern parts of China, respectively (Table S1). Including these regions broadens the representativeness of the constructed metagenome-assembled genomes (MAGs). We collected 498 skin samples from both cheeks of the volunteers and subjected them to deep metagenomic sequencing (Fig. 1A). After eliminating vector sequences and low-quality bases, we obtained an average of 102 million read pairs per sample. We estimated that 71.35% of these reads (interquartile range (IQR): 63.20–85.24%) were attributable to human DNA, consistent with findings from previous studies [5]. Subsequent removal of host contamination yielded an average of 29 million read pairs per sample. To account for potential effects of physiological characteristics and personal lifestyle habits, we assessed the physio-optical features of participants’ facial skin, including moisture, sebum, gloss, skin elasticity, and sensitivity (Tivi), using state-of-the-art equipment (“Methods” section), and collected relevant lifestyle information from the participants (Table S1).

Fig. 1

Construction and analysis of the Facial Microbiome Genome Compendium (FMGC). A Overall workflow of this study, including the multi-center collection of samples and meta information; measurement of skin physio-optical properties using industrial-grade instruments such as the Corneometer, pH meter, Glossymeter, Sebumeter, and Cutometer from Courage and Khazaka, Visia-CR from Canfield, and Tivi700 from WheelsBridge; generation of a representative Facial Microbiome Genome Compendium (FMGC) consisting of 2737 species-level genomes; and integrative analyses of the (meta)genomic features with the host metadata and skin features. BD Mapping rates comparison: Comparison of mapping rates of facial metagenome sequencing reads to MAGs from FMGC, SMGC, and UHSG. The facial metagenome data included B 498 samples from this study; C 506 samples from the SMGC study and 120 randomly selected samples from the UHSG study; and 822 samples including 3 sample sites from the iHSMGC study [5]. D Novelty assessment: Evaluation of FMGC MAGs’ novelty based on their overlap with reference microbial genomes in the SMGC and those used by GTDB-tk and MetaPhlAn4. E Phylogenetic analysis of the non-redundant 3359 MAGs combined from the FMGC and SMGC genomes. The color strip is orange for FMGC MAGs, grey for SMGC MAGs, blue for UHSG MAGs, and light pink for novel MAGs in FMGC. The outer ring is colored by phyla that were annotated using GTDB-tk [33]. F Phylogenetic diversity (branch length of selected MAGs) expansion of the FMGC over the SMGC genomes at the phylum level. The 10 phyla with the highest number of MAGs were selected. G Expansion of the FMGC MAGs over the SMGC genomes at the genus level. The 10 genera with the largest number of MAGs were selected. Statistical significance indicated by: ns p > 0.05, p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001, Wilcoxon rank sum test

A comprehensive reference catalog of skin microbial genomes is essential for characterizing our samples. Recent studies have generated several such catalogs, including the SMGC [31] and the UHSG [32]. SMGC was assembled from a total of 2479 skin metagenome samples, including 506 facial samples and 1973 from other body sites, while UHSG used a total of 450 facial and 2069 other skin metagenomes. We therefore evaluated how well they represent our dataset by aligning our clean reads to these catalogs and calculating the mapping rates. Surprisingly, only 66.9% and 59.4% of our reads could be mapped to the two catalogs, respectively. Additionally, MetaPhlAn4 [34], a widely used taxonomic profiling tool, reported that ~ 45% of the abundance was unaccounted for by existing reference genomes (Fig. S1). We therefore decided to assemble our own MAGs as the basis for annotating and characterizing our data.

To construct our own set of MAGs, we employed a bioinformatics pipeline similar to that of SMGC [31] (“Methods” section). Briefly, we filtered out MAGs that were ≤ 200 kbp or deemed low-quality, high-contamination, or both based on CheckM [35], and de-replicated the remaining MAGs at an average nucleotide identity (ANI) of 95% (“Methods” section). We finally generated a comprehensive set of 2737 non-redundant species-level MAGs, which we refer to as the Facial Microbiome Genome Compendium (FMGC) (Table S2).
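The filter-then-dereplicate logic described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: only the 200 kbp size limit and the 95% ANI threshold come from the text, whereas the completeness and contamination cut-offs, the quality score, the greedy clustering strategy, and all names (`Mag`, `passes_filter`, `dereplicate`) are assumptions made for the example. Pairwise ANI values are assumed to be precomputed by an external tool.

```python
from dataclasses import dataclass

# Thresholds stated in the text.
MIN_SIZE_BP = 200_000   # MAGs <= 200 kbp are discarded
ANI_THRESHOLD = 95.0    # species-level dereplication cut-off (%)

# Hypothetical quality cut-offs: the text only says "low-quality" and
# "high-contamination", so these values are placeholders.
MIN_COMPLETENESS = 50.0
MAX_CONTAMINATION = 10.0


@dataclass(frozen=True)
class Mag:
    name: str
    size_bp: int
    completeness: float   # percent, e.g. as estimated by CheckM
    contamination: float  # percent, e.g. as estimated by CheckM


def passes_filter(mag):
    """Keep MAGs that are large enough and meet the assumed quality cut-offs."""
    return (mag.size_bp > MIN_SIZE_BP
            and mag.completeness >= MIN_COMPLETENESS
            and mag.contamination <= MAX_CONTAMINATION)


def quality_score(mag):
    """Assumed ranking heuristic: completeness penalised by contamination."""
    return mag.completeness - 5 * mag.contamination


def dereplicate(mags, ani):
    """Greedy dereplication.

    MAGs are visited from best to worst score; a MAG becomes a new
    representative only if its ANI to every existing representative is
    below the threshold. `ani` maps frozenset({name_a, name_b}) to a
    percent identity; missing pairs are treated as unrelated.
    """
    representatives = []
    for mag in sorted(mags, key=quality_score, reverse=True):
        if all(ani.get(frozenset((mag.name, rep.name)), 0.0) < ANI_THRESHOLD
               for rep in representatives):
            representatives.append(mag)
    return representatives
```

Calling `dereplicate([m for m in mags if passes_filter(m)], ani)` returns one representative per group of MAGs sharing at least 95% ANI with a better-scoring representative.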

As expected, the FMGC genomes better represented our facial microbiome samples, recruiting a median of 80.0% of sequencing reads, significantly higher than the public catalogues (p < 0.001, Wilcoxon rank sum test, Fig. 1B). Strikingly, FMGC also better represented available public facial microbiome datasets. For example, on the SMGC facial samples (n = 506), FMGC recruited 95.8% of the metagenomic sequencing reads, significantly higher than UHSG did (91.3%, p < 0.001, Wilcoxon rank sum test; Fig. 1C). Similarly, FMGC recruited 90.37% of the sequencing reads of the UHSG facial samples (n = 120), higher than SMGC did (~ 86.61%, p < 0.001, Wilcoxon rank sum test; Fig. 1C). For a fair comparison, we also performed read recruitment with the three aforementioned catalogs on an independent public dataset used by the integrated Human Skin Microbial Gene Catalog (iHSMGC) [5]. FMGC recruited 68.5% of the sequencing reads of the iHSMGC samples (n = 822), higher than SMGC (56.3%) and UHSG (50.98%, p < 0.001, Wilcoxon rank sum test; Fig. 1C).
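To make the catalog comparison concrete, the sketch below implements a two-sided Wilcoxon rank sum test on two lists of per-sample mapping rates, using the normal approximation with a tie correction. It is an illustration only: the function name `rank_sum_test` is hypothetical, and a real analysis would more likely call an established routine such as `scipy.stats.mannwhitneyu`, which also offers an exact method for small samples.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (z, p), where z > 0 means values in `x` tend to be larger.
    """
    # Pool both groups, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (rank_sum_x - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value
    return z, p
```

For instance, `rank_sum_test(fmgc_rates, smgc_rates)` on two hypothetical lists of mapping rates returns a positive `z` when the first catalog recruits more reads. Note that the function is undefined when every pooled value is identical, because the variance is then zero.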

Because the iHSMGC dataset encompassed facial samples from multiple regions, whereas FMGC was built only from cheek samples, we also examined whether the representativeness of the FMGC varied across different facial locations. The results revealed no significant differences in the representativeness of FMGC among the cheek, forehead, and nose regions. We further conducted principal coordinate analysis (PCoA) on the iHSMGC samples, employing FMGC for abundance calculations to evaluate the overall microbial structure across these three facial regions. The results showed no significant differences among these areas (Fig. S2, p = 0.7), underscoring the consistent representativeness of FMGC across distinct facial regions.

To quantify the novelty of the FMGC genomes, we compared their genomic sequences with those in public reference skin microbial catalogs. At an ANI threshold of 95%, 752 and 807 of the FMGC MAGs could be found in SMGC and UHSG, respectively, totaling 1029 MAGs (Fig. S3A). In addition, 1644 and 1014 could also be found in the reference microbial genomes used by GTDB-tk [33] and MetaPhlAn4 [34], respectively, at the same ANI threshold. Specifically, using GTDB-tk [33], we assigned all FMGC MAGs to either bacteria (n = 2734) or archaea (n = 3) at the kingdom level, 94–100% of them to known genera (94.26%) and higher taxonomic levels, and ~ 60% to known species (Table S1; Fig. S3B). Together, a total of 809 MAGs were considered novel, representing 30% of the FMGC genomes (Fig. 1D and S3B). Importantly, these novel MAGs exhibited even higher quality in genome size, N50 length, and completeness, while maintaining the same contamination level (~ 2.58%), compared with those overlapping with the aforementioned datasets (Fig. S4; Wilcoxon rank sum test). These findings suggest that the novel MAGs likely represent genuinely novel genomes rather than low-quality fragments resulting from mis-assembly.

We also quantified the contribution of FMGC to the increased genome diversity at different taxonomic levels. To achieve this, we merged all species-level MAGs from FMGC, SMGC, and UHSG, and obtained a total of 4167 MAGs. Remarkably, FMGC MAGs exhibited an expansive representation across diverse branches in the phylogenetic tree (Fig. 1E), in contrast to certain branches in public datasets that lacked representation (Fig. S5). In total, the FMGC contributed to a 47% increase in phylogenetic diversity throughout the entire tree (Fig. S5). Further examination at the phylum level revealed that most phyla displayed similar levels of increase (Fig. 1F). Similar trends were observed at the genus level (Fig. 1G). Notably, the Sphingomonas genus of the Proteobacteria phylum stands out, with 47 out of 48 MAGs exclusively detected in FMGC and only one MAG found in existing datasets (Fig. 1G).

Together, FMGC demonstrated a higher representativeness compared to existing public catalogues, contributing to a substantial increase in phylogenetic diversity across various taxonomic levels.

FMGC reveals two consistent cutotypes in facial samples across regions

Cutotypes offer a comprehensive view of the inter-individual variation in facial microbial composition [5, 36]. A cutotype is a grouping of facial microbiome samples, based on their microbial composition, into distinct skin microbiome profiles or community types. Community types in the gut bacteriome and mycobiome are associated with disease risks [37] and aging [38]. Similarly, we expect that cutotypes can help researchers better understand the role of the skin microbiome in maintaining health and preventing skin-related issues. To identify and characterize cutotypes in our samples, we calculated genus-level relative abundances by aligning the sequenced reads to the FMGC MAGs using BWA (“Methods” section). Genus-level abundances were chosen because community types are commonly determined at this level [39, 40], and because GTDB-tk, the tool we used for taxonomic classification, could assign most of our MAGs (94.26%) to known genera (Fig. S6), ensuring the reliability of our results. Applying the Partitioning Around Medoids (PAM) clustering algorithm (“Methods” section), we obtained two cutotypes: the Cutibacterium-cutotype (C-cutotype) and the Streptococcus-cutotype (S-cutotype; Fig. 2A). Importantly, we consistently observed these cutotypes whether analyzing samples from individual regions or combining them (Fig. 2A). Only a small fraction of samples (~ 6.2%) exhibited an additional cutotype, namely the Enterobacter-cutotype.
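As an illustration of how PAM groups samples into community types, the sketch below clusters genus-level relative-abundance profiles with a small pure-Python PAM (a greedy BUILD phase followed by SWAP until no swap lowers the total cost). The choice of Jensen-Shannon distance is an assumption borrowed from common practice in community-type analyses, not something stated in this section, and the function names are hypothetical.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the base-2 JS divergence)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def total_cost(dist, medoids):
    """Sum over samples of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """Partitioning Around Medoids on a precomputed distance matrix.

    Returns (medoid_indices, labels), where labels[i] is the position in
    medoid_indices of the medoid closest to sample i.
    """
    n = len(dist)

    # BUILD: greedily add the sample that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(dist, medoids + [i]))
        medoids.append(best)

    # SWAP: replace a medoid with a non-medoid whenever that lowers the cost.
    improved = True
    while improved:
        improved = False
        current = total_cost(dist, medoids)
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True

    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels
```

Given a list of profiles, the distance matrix is `[[js_distance(a, b) for b in profiles] for a in profiles]`, and `pam(dist, 2)` mirrors the two-cluster solution reported here; the dominant genus of each medoid would then name the cutotype.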

Fig. 2

Two consistent cutotypes built with FMGC and the correlation between microbial community structure and aging and skin physio-optical properties. Cutotypes constructed using genus-level abundances derived from the A FMGC MAGs and B MetaPhlAn4. The PCoA plots display the cutotypes built with all samples together. The Sankey diagram illustrates the analysis of samples separated by region and the combined analysis of all samples, showcasing the variation in cutotypes. C The proportion of different cutotypes observed when age, moisture, and sebum are evenly grouped according to their distributions. D The relative abundance of the drivers of the cutotypes when age, moisture, and sebum are evenly grouped according to their distributions. E The tile map showing the Pearson correlation between collected meta info and skin physio-optical properties. The width of each curve is the variance in metagenome structure explained by the factor, calculated with the Envfit test, and its color indicates the p-value. F Mediation analysis examining the relationship between age and microbiome structure, considering significant influencing factors such as sebum, moisture, and sensitivity. The arrow from the independent variable (age) to the mediating variable (sebum, moisture, and Tivi) illustrates the influence of the independent variable on the mediating variable. Likewise, the arrow from the mediating variable to the dependent variable (microbiome structure) demonstrates the impact of the mediating variable on the dependent variable. Finally, the arrow from the independent variable to the dependent variable indicates the overall effect of the independent variable on the dependent variable, encompassing both the total effect and the average direct effect. The average direct effect (ADE) is the effect of the independent variable on the dependent variable while controlling for the mediating variable(s). The total effect, on the other hand, is the overall effect (direct + indirect) of the independent variable on the dependent variable. p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001, Wilcoxon rank sum test

For comparison, we also constructed cutotypes using the genus-level abundances annotated by MetaPhlAn4 [34], one of the most widely used taxonomic profiling tools. However, it is important to note that MetaPhlAn4 assigned a median of only 54% of the abundance to known genera, resulting in fewer annotated genera and lower α-diversity compared with FMGC in our samples (Fig. S6). When analyzing all samples combined, MetaPhlAn4 generated two cutotypes: the Prevotella-cutotype and the Cutibacterium-cutotype. However, significant inconsistencies emerged when analyzing the samples separately by region: the cutotypes of ~ 51% of samples were discordant. Specifically, ~ 42% of Prevotella-cutotype samples shifted to the Streptococcus-cutotype (~ 29% to Corynebacterium-cutotype; ~ 18.22% to Malikia-cutotype; ~ 9.81% to Porphyromonas-cutotype; ~ 0.93% to Cutibacterium-cutotype), while ~ 4.2% of Cutibacterium-cutotype samples shifted to the Corynebacterium-cutotype (~ 2.9% to Porphyromonas-cutotype).

These findings highlight the importance of using FMGC as a better catalogue of reference genomes for further characterizing the facial microbiome, emphasizing the need for more complete and accurate reference genomes to obtain stable and reliable cutotype classifications for further investigations.

Age as a determinant of facial microbiome composition

With the availability of skin physio-optical features and participants’ metadata, we investigated the influence of these factors on facial microbiome. Remarkably, age, moisture, and sebum were identified as key factors that correlated with both cutotypes and driver taxa, exhibiting distinct trends (Fig. 2C). Specifically, we observed a notable decline in the proportion of the C-cutotype with increasing age and higher levels of skin moisture, while it exhibited a significant increase with elevated sebum levels (Fig. 2C). Conversely, the S-cutotype displayed an opposite pattern, indicating a contrasting response to these skin properties (Fig. 2C, S7). These findings were further supported by the relative abundance of the two driver genera, Streptococcus and Cutibacterium, which exhibited similar patterns to their respective cutotypes (Fig. 2D, S8). These results are consistent with previous findings [25, 30].

We further quantified the influence of the aforementioned factors on the facial microbiome using an Envfit test (“Methods” section). Consistent with the cutotype analysis, all of these factors correlated significantly with the metagenome structure (measured with axes 1 and 2 of the principal coordinate analysis (PCoA) of facial microbial profiles), with the exception of R2 (an elasticity measurement) and pH (Fig. 2E). Among them, age, moisture, sebum, and sensitivity emerged as the top four factors (Fig. 2E). Notably, significant inter-correlations were observed among these factors, with age correlating with most of the other factors (Fig. 2E). Specifically, age correlated negatively with skin elasticity measures (R2, R5, and R7), sebum, and gloss, and positively with sensitivity (Tivi). To better understand the individual contributions of these factors, we conducted a mediation analysis [41,42,43], which revealed that age had the most substantial and direct impact on the community structure of the facial microbiome (measured with axes 1, 2, and 3 of the PCoA, Fig. 2F). For example, age significantly influenced both sebum levels and the structure of the microbiome, with the latter influenced both directly and indirectly by age; when accounting for the mediator, age continued to exhibit a significant average direct effect, which was stronger than the effect of sebum level on the facial microbiome (Fig. 2F).
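The decomposition of a total effect into direct and indirect parts can be illustrated with a simple linear sketch. This is not the mediation procedure cited above [41,42,43], which also provides significance estimates; it is a product-of-coefficients illustration using ordinary least squares, in which the outcome is assumed to be a single PCoA axis score and all function names are hypothetical.

```python
from statistics import mean


def _cross(u, v):
    """Sum of cross-products of the centred vectors u and v."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simple_slope(x, y):
    """OLS slope of y ~ x."""
    return _cross(x, y) / _cross(x, x)


def two_predictor_slopes(x, m, y):
    """OLS slopes (b_x, b_m) of y ~ x + m, from the centred normal equations."""
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    det = sxx * smm - sxm ** 2
    b_x = (smm * sxy - sxm * smy) / det
    b_m = (sxx * smy - sxm * sxy) / det
    return b_x, b_m


def mediation_effects(age, mediator, outcome):
    """Linear mediation decomposition: total = direct + indirect."""
    total = simple_slope(age, outcome)        # effect of age ignoring the mediator
    a = simple_slope(age, mediator)           # age -> mediator
    direct, b = two_predictor_slopes(age, mediator, outcome)  # age | mediator, mediator | age
    return {"total": total, "direct": direct, "indirect": a * b}
```

With age as the independent variable, sebum as the mediator, and a PCoA axis score as the outcome, a `direct` value close to `total` would indicate that most of the association between age and microbiome structure does not pass through sebum.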

Together, age stood out as the factor exhibiting the strongest correlation with the abundances of specific bacterial species in our exploration of the correlation between individual bacterial species and the physio-optical properties of the skin, as well as other meta-information (partial Pearson correlation, p < 0.05; Fig. S9). Considering that age is an independent characteristic not influenced by any other factor, while other physiological properties are correlated with age (e.g., the older the age, the less sebum), we used it as a primary variable in our mediation analysis. Following the same logic, we argue that the observation of a stronger age-microbe correlation may suggest a causal relationship from age to microbial composition.

FMGC reveals hundreds of novel age-related MAGs

We next focused on the age-related effects in the facial microbiome. We found a significant increase in the α-diversity of the facial microbiome, as indicated by the Shannon index, with advancing age (Fig. 3A). This observation was supported by an overall Pearson correlation coefficient of 0.47 and a p value < 0.001 (Fig. S10), consistent with previous research [11, 16]. The overall trend was partially driven by the higher α-diversity of the S-cutotype compared with the C-cutotype (Fig. S11) and the increased proportion of the S-cutotype in the older age groups (Fig. 2C). Furthermore, similar patterns, such as the increased α-diversity with advancing age, were consistently observed within individual cutotypes (Fig. S12). These trends may reflect the declining functionality of the facial immune system, resulting from continuous exposure to environmental and/or chemical factors [44]. Furthermore, they suggest a potential connection between aging, increased microbiome diversity, and susceptibility to certain diseases.
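For readers unfamiliar with the metric, the following sketch computes the Shannon index of each sample and its Pearson correlation with age. The natural logarithm is assumed (the base is not stated in this section), and the ages and abundance profiles are made-up toy values, not data from the study.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: four samples whose profiles become more even with age.
ages = [22, 35, 48, 61]
profiles = [
    [0.90, 0.05, 0.05, 0.00],
    [0.70, 0.20, 0.05, 0.05],
    [0.50, 0.30, 0.10, 0.10],
    [0.30, 0.30, 0.20, 0.20],
]
diversity = [shannon_index(p) for p in profiles]
r = pearson_r(ages, diversity)
```

A profile dominated by a single taxon has a Shannon index near zero, while a perfectly even profile of n taxa reaches the maximum of ln(n); this is why a shift from a Cutibacterium-dominated community toward a more mixed one raises the index.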

Fig. 3

A microbiome-based Facial Age Index (FAI) reveals effects of suboptimal lifestyle habits on skin aging. A Barplot showing the increase in Shannon diversity with advancing age. B Scatter plot displaying MAGs correlated with the calculated Facial Age Index (FAI). Labeled dots represent MAGs with mean abundance higher than 1% or relatively higher absolute Spearman’s R (≥ 0.5). Framed dots indicate MAGs that also exhibit strong correlations with FAI. C Facial Age Index (FAI) built with marker microbes and its significantly positive correlation with chronological age, confirming the practicability of using facial skin microbes to assess skin age. Note that Wuhan samples were not included in the construction of the FAI owing to their limited age range and the inclusion of acne samples. D Boxplot showing the FAI-residuals (with the effects of chronological age removed) in participants with/without certain lifestyle habits and different cutotypes. E PCoA plot demonstrating the difference in overall microbiome structure among participants with/without certain lifestyle habits and different cutotypes. p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001, Wilcoxon rank sum test

To identify individual microbial genomes (MAGs) significantly correlating with age, we scrutinized the dataset, excluding 269 samples with potential confounding factors such as acne and make-up habits [45, 46]. The refined dataset of 229 samples revealed a total of 685 age-associated MAGs (p < 0.05, Pearson’s correlation; Fig. 3B, Table S3). Among these, 652 MAGs (338 novel MAGs) displayed positive correlations with age (pMAGs), while only 31 (21 novel MAGs) exhibited negative correlations (nMAGs). Interestingly, a closer inspection revealed that the Prevotella genus, known as a keystone genus in the human gut but often associated with human infection at other body sites [47], contained the highest number of pMAGs (47 out of 652; Fig. S13A), followed by Streptococcus and Neisseria (44 and 30, respectively), two known members of the oral microbiota. These two genera include several pMAGs from species capable of opportunistic pathogenicity, such as Streptococcus oralis [48] (5 MAGs, including 3 novel ones), Streptococcus mitis [49] (21 MAGs, 14 novel ones), and Neisseria subflava [50] (5 MAGs, 1 novel). Surprisingly, 8 pMAGs (6 novel ones) belong to the notorious pathogen Streptococcus pseudopneumoniae [49]. On the other hand, Corynebacterium contains MAGs from different species that correlate positively (27 MAGs) or negatively (12 MAGs) with age (Fig. S13A). For example, BJ042_bin.19 is a novel pMAG of Corynebacterium macginleyi, reported as a pathogen in a case of unilateral conjunctivitis [51], that positively correlates with age with a Spearman’s R of 0.42, while HC31_bin.3 is a Corynebacterium singulare MAG that negatively correlates with age with a Spearman’s R of − 0.33 (Table S3), suggesting that heterogeneity exists within the Corynebacterium genus. In addition, Cutibacterium contains the second-highest number of negatively correlated MAGs (7 nMAGs), consistent with our observations above.

Focusing on MAGs with stronger correlations (absolute Spearman’s R ≥ 0.5) or with high mean relative abundance (≥ 1%) across samples (Fig. S13B), we identified a subset of eight MAGs, comprising six pMAGs and two nMAGs. Among these, the two nMAGs, 35_bin.4 and YCR_1_bin.2, both from Cutibacterium acnes, stood out, with mean relative abundances ranging from ~ 8 to 10% and high Spearman’s R, consistent with the known negative associations between C. acnes and aging [16]. Surprisingly, all six pMAGs were identified as novel MAGs, including four with no species-level assignment from four different genera (Leuconostoc, Enterobacter, Anaerococcus, and Streptococcus). Specifically, the two low-abundance yet strongly correlated pMAGs both belong to the Anaerococcus genus, including one from Anaerococcus nagyae, a species previously reported to be associated with fatal sepsis in patients undergoing transarterial chemoembolization treatment [52].

Experimental validation is imperative to elucidate the roles of these age-related MAGs, particularly the novel ones. Subsequent experiments should shed light on their functional significance.

A microbiome-based Facial Age Index reveals cutotype-dependent effects of lifestyles on skin aging

To quantitatively assess the age-related effects on the facial microbiome, we further identified age-related taxa at higher taxonomic levels (Table S3) and used them together with the age-related MAGs to develop a Facial Age Index (FAI). The FAI takes into account both variations in taxa abundance and the correlations between bacterial composition and age (“Methods” section). A higher FAI score indicates a greater degree of skin aging. To validate the effectiveness of our approach, we constructed the FAI using data from a specific region (e.g., Guangzhou) and then tested it on samples from different regions (e.g., Beijing). We assessed the correlation between chronological age and FAIs in the test dataset. Regardless of the region used to build the FAI, the test results remained consistent across other regions (Fig. S14). Applying this method to all our samples, we discovered a positive correlation between the FAI and the chronological age of the participants, both when combined (p < 2.2e-16, Pearson’s r = 0.55, Fig. 3C) and within individual cutotypes (C-cutotype: p < 2.2e-16, Pearson’s r = 0.54; S-cutotype: p = 2.3e-11, Pearson’s r = 0.47, Fig. S15). Specifically, 557 out of the 652 pMAGs were found to correlate with FAI, while all 31 of the nMAGs also exhibited correlations with FAI (Fig. 3B, S13C).

Interestingly, when examining the impact of different lifestyles on the facial microbiome, which are often related to skin conditions [46, 53, 54], we observed significantly higher FAI-residuals (controlled for age) in individuals with certain lifestyles, including “Make up” (excessive or heavy makeup application, defined as three or more cosmetic applications per week; see “Methods” section) and “Stay up late” (going to bed after 11 pm on three or more occasions per week, indicating habitual sleep restriction; see “Methods” section). After accounting for the participants’ chronological age, we found significantly elevated FAI-residuals in the “Make up” group compared with the group without make-up in both the C- and S-cutotypes (Fig. 3D). We observed similar trends in the “Stay up late” group, although the difference was statistically significant only in the C-cutotype, not the S-cutotype. Notably, since the C-cutotype was enriched with young participants (Fig. 2C), our analysis suggests that the “Stay up late” lifestyle has a greater impact on the younger population than on the older population. Furthermore, we observed relatively lower sebum-level-residuals (p < 0.1, Wilcoxon rank sum test), significantly lower R5-residuals (skin elasticity, the higher the better; p < 0.05, Wilcoxon rank sum test), and higher Tivi-residuals (sensitivity, the lower the better; p < 0.05, Wilcoxon rank sum test) in S-cutotype samples with unhealthy lifestyles (Fig. S16). Additionally, other host conditions such as constipation and “Mood Swings” (referring to frequent, significant emotional fluctuations) were also found to significantly affect skin aging (Fig. S17). It is noteworthy that “Mood Swings” specifically affected participants with the C-cutotype (Fig. S16).
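The idea of an age-controlled residual can be sketched as follows: regress the index on chronological age and keep what the regression does not explain. A simple linear fit is assumed here because this section does not state how the effect of age was removed; the function name and the toy values are hypothetical.

```python
from statistics import mean, median


def age_residuals(age, fai):
    """Residuals of an ordinary least squares fit of fai ~ age."""
    mean_age, mean_fai = mean(age), mean(fai)
    slope = (sum((a - mean_age) * (f - mean_fai) for a, f in zip(age, fai))
             / sum((a - mean_age) ** 2 for a in age))
    intercept = mean_fai - slope * mean_age
    return [f - (intercept + slope * a) for a, f in zip(age, fai)]


# Hypothetical toy data: the index rises with age, plus a fixed offset
# for participants with a makeup habit.
age = [20, 30, 40, 50, 20, 30, 40, 50]
makeup = [False, False, False, False, True, True, True, True]
fai = [0.5 * a + (1.0 if m else 0.0) for a, m in zip(age, makeup)]

residuals = age_residuals(age, fai)
with_habit = [r for r, m in zip(residuals, makeup) if m]
without_habit = [r for r, m in zip(residuals, makeup) if not m]
difference = median(with_habit) - median(without_habit)
```

A positive `difference` means that, at a given age, participants with the habit have a higher index than the age trend predicts; the two groups of residuals can then be compared with a rank sum test, as in the analysis described above.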

We further confirmed the effects of lifestyles on the facial microbiome using PCoA, which takes into account differences in all microorganisms. We again observed significant differences in overall between-group diversity between participants with and without the “Make up” and “Stay up late” lifestyles. As shown in Fig. 3E, we used axes 2 and 3 of the PCoA to analyze the influence of lifestyles on facial microorganisms, because the first axis was mainly driven by the cutotype (Fig. S18). Our results showed that “Stay up late” significantly affected the microbial structure in both cutotypes, while “Make up” only affected the C-cutotype.

Specifically, we observed a higher relative abundance of the pMAGs in samples with makeup habits, while the nMAGs were enriched in samples without makeup habits (Fig. S13D). We also assessed the total abundance of the pMAGs and the nMAGs, and the results remained consistent. This suggests a potential association between makeup habits and MAG-level changes in facial microbes.

Together, our findings suggest that age-associated microbial markers can serve as indicators to characterize skin age, and distinct cutotypes exhibit varied responses to external influences.

Cutotype-dependent aging effect in microbial metabolite pathways due to unhealthy lifestyles

We next investigated the microbial functional changes related to facial skin aging. We annotated the microbial functions using HUMAnN3 [55]. These functions were then grouped into seven categories based on their MetaCyc annotations [56], namely Inflammation, Nicotinamide adenine dinucleotide (NAD) consumption, Oxidation/Glycosylation, Anti-aging, Anti-Oxidation, Biosynthesis, and Others. We then conducted a PERMANOVA to determine the role of aging in the abundance structure of the microbial pathways, and found that age was the only significant factor (Fig. S19A). To investigate age-related pathways, we classified the participants into four equal-sized groups based on their age: Group A (18–25 years), B (26–37 years), C (37–51 years), and D (51–64 years). Using LEfSe analysis [57], we compared the youngest group (Group A) with the oldest group (Group D) and identified 61 and 237 pathways that were significantly more abundant in Group A and Group D, respectively (Table S4).

Notably, pathways in the first three categories are known to be associated with aging [58,59,60,61,62,63,64,65,66]. All pathways in these three categories showed significant enrichment in Group D (Fig. 4). We collectively refer to these pathways as AG-pathways (Table S4). For instance, the OANTIGEN-PWY pathway, categorized under Inflammation, is responsible for producing O-antigen, a repeating unit of lipopolysaccharides (LPS) known to trigger inflammatory responses [67]. We observed a significant enrichment of this pathway in the oldest group (Fig. S20), with the highest linear discriminant analysis (LDA) score of 3.30. Another pathway, PWY-5136, falls under NAD consumption and employs NAD in the process of fatty acid β-oxidation. This pathway generates reactive oxygen species (ROS), leading to mitochondrial damage and oxidative stress [68,69,70]. The increasing prevalence of the NAD consumption pathway underscores its potential significance and may explain the inclusion of NAD in skincare products designed for the advanced age group. In addition to the individual pathways, the cumulative abundance of pathways within each of the broad functional categories showed an upward trend with advancing age (Fig. 4A).

Fig. 4

Functional analysis reveals molecular pathways underlying skin aging and their associations with suboptimal lifestyle habits. A Boxplot showing the accumulated abundance of aging-related (Inflammation [58, 59], NAD consumption [60,61,62, 71], and Oxidation/Glycosylation [63,64,65,66]; red shadows) and anti-aging-related (Anti-Aging, Biosynthesis, and Anti-Oxidation; green shadows) pathways across different age groups. B Tile map plot displaying the effect size, as determined by the Wilcoxon rank sum test (filtered with adjusted one tailed p value < 0.1), between the youngest and oldest age groups, with or without make up habits and with or without stay-up-late habits. p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001, Wilcoxon rank sum test

Conversely, pathways associated with Anti-Aging, Biosynthesis, and Anti-Oxidation were enriched in Group A and are thus referred to as anti-AG-pathways. For example, the ARGSYN-PWY (l-arginine biosynthesis pathway) of the Anti-Aging group generates l-arginine. Arginine is often used in anti-aging skincare products because of its potential ability to support collagen production [72, 73]. Meanwhile, the PWY-6126 (superpathway of adenosine nucleotides de novo biosynthesis II) of the Anti-Oxidation group is a metabolic pathway that participates in the de novo biosynthesis of adenosine nucleotides, which are involved in oxidative stress response, glutathione biosynthesis, and DNA synthesis under anaerobic conditions. Overall, the total abundance of the pathways within each group also displayed a decreasing trend with advancing age (Fig. 4A). It is worth mentioning that this grouping is based on sample size and serves merely to present the results more clearly. Using an alternative grouping method according to skin wrinkles [74] generated essentially the same results (Fig. S21). In fact, when age was treated as a numerical value, we observed significant correlations between age and the pathway abundances that were consistent with the grouped ages (Fig. S19B). In particular, chronological age negatively correlated with Anti-Aging (R = − 0.31 and p = 1.7e-11), Biosynthesis (R = − 0.3 and p = 4.9e-11) and Anti-Oxidation (R = − 0.29 and p = 1.6e-10), and positively correlated with Inflammation (R = 0.2 and p = 2.4e-05), NAD consumption (R = 0.2 and p = 2.4e-05), and Oxidation/Glycosylation (R = 0.21 and p = 5.3e-06). These results indicate that our observations are robust against the age grouping methods.

We further investigated whether the impact of the above-mentioned lifestyles on the facial microbiome is reflected in microbial functions. To explore this, we controlled for chronological age differences between groups and compared pathway abundances among groups with different lifestyles. Consistent with the FAI analysis, we observed significantly higher abundances of AG-pathways and lower abundances of anti-AG-pathways in the “Make up” group (Wilcoxon rank sum test; Fig. 4B). These trends were consistently observed in both the C- and S-cutotype groups. Interestingly, while the FAI analysis indicated that “Stay Up Late” primarily affected the C-cutotype group, our findings revealed that metabolite pathways in participants belonging to both cutotypes were influenced by this lifestyle. However, the influence of “Stay Up Late” on microbial functions was statistically significant only for the AG-pathways and not for the anti-AG-pathways (Fig. 4B).

Should these alterations in metabolic pathways be experimentally validated, they could elucidate the potential molecular mechanisms underlying skin aging and provide potential intervention targets to counteract the aging process. Our analysis further emphasizes the impact of specific lifestyles on aging and highlights the importance of personalized intervention strategies tailored to individual lifestyles. For example, exploring the supplementation of NAD and anti-aging metabolites may be considered as strategies to mitigate skin aging in older age groups or in individuals with habitual sleep restriction.

Discussion

Aging and alterations in skin physio-optical conditions could exert similar changes on the facial microbiome, obscuring the aging-microbiome associations. Here, we elucidated the pivotal role of aging in the facial microbiome while effectively controlling for skin physio-optical properties through a multi-center, deep-sequencing survey of the facial microbiome. We sampled 479 volunteers with a broad age distribution (spanning ages 18 to 64) and well-distributed representation across age intervals (Fig. S22), and employed state-of-the-art industrial-grade equipment to measure the skin physio-optical properties.

Strikingly, existing skin reference microbial catalogs, including SMGC and UHSG, poorly represented our sequencing data (Fig. 1B). To address this, we constructed our own MAG catalog, the Facial Microbial Genome Compendium (FMGC), consisting of 2737 species-level MAGs. FMGC better represented our samples and effectively expanded the diversity of known skin microbial genomes by up to 47% (Fig. 1F, S5). FMGC also better represented the public skin microbiome datasets than these two public MAG catalogs (Fig. 1B, C).

Although SMGC (n = 2479) and UHSG (n = 2519) used more samples and included skin microbiome data from other body sites, we obtained significantly more MAGs. This may be attributed to two reasons. First, because of the high proportion of host DNA in the facial microbiome samples (median: 76.43%; interquartile range: 63–85%; Fig. S23), we employed deep sequencing and obtained an average of ~ 4.2 G usable bases after the removal of human DNA, vector sequences, and low-quality bases. This value is significantly higher than that of the samples used in SMGC (~ 0.14 G). Second, in order to determine the contribution of aging to the composition of skin microbes, we sampled a relatively even distribution of age groups ranging from 18 to 64, in contrast to UHSG, which consisted mostly of young adults (Fig. S22). Considering the pivotal role of aging in the skin microbiome, a limited age range might result in the loss of certain MAGs. This emphasizes the critical importance of sequencing depth and sample age distribution for a catalog such as FMGC, given the high human DNA content and age-dependent composition of facial microbiome sequencing data.

We identified two consistent cutotypes in our samples, both within and across different regions. In our study, the high rate of unclassified taxa (~ 45%) obtained from public taxonomic profiling tools further contributed to the observed inconsistencies among samples from various regions (Fig. 1A). Cutotypes have been previously reported in the facial skin microbiome by Li and colleagues [5], but those results also suffered from high fractions of unknown taxa according to MetaPhlAn4 (~ 45%, Fig. S24). Consequently, the authors reported two cutotypes, namely the Cutibacterium- and Moraxella-cutotypes [5]. With FMGC, we recapitulated the two cutotypes but also identified an additional Enterobacter-cutotype that accounted for 33% (n = 268) of the total samples (Fig. S25). Interestingly, in these samples, Enterobacter is predominantly undetectable when analyzed using MetaPhlAn. However, when employing FMGC, the relative abundance of Enterobacter averages close to 40%. These findings underscore the critical role of a highly representative genome catalog in facilitating more reliable and robust subsequent analyses. These findings are also in line with previous results showing that microbiome community types can vary significantly across populations and/or are influenced by sample sizes [40].

We then highlighted the central role of aging in shaping the facial microbiome, while properly controlling for the skin physio-optical properties. Our findings demonstrated significant impacts of all examined factors on skin microbiome composition, including the cutotypes, moisture, sebum, gloss, and pH, with the exception of skin elasticity, while also revealing significant inter-correlations among these factors. Specifically, age exhibited strong correlations with all the factors, including decreased skin elasticity, gloss, and sebum, as well as increased moisture level, pH, and sensitivity with advancing age (Fig. 2E). These observations align with prior research [10, 25, 30, 45, 75]. Notably, our discovery that Cutibacterium abundance positively correlates with sebum content and negatively correlates with moisture level echoes previous reports [10, 25, 30]. These conclusions are further confirmed through our multi-center sampling, affirming their robustness. Diverging from existing literature, our study employs multivariate and mediation analyses to disentangle the contributions of these factors, establishing the pivotal role of aging in shaping the skin microbiome, either directly or indirectly through the physio-optical properties. Consequently, our study offers a more comprehensive perspective on the factors that contribute to the composition of the skin microbiome than previous research.

Having established the key role of aging, we further revealed its impacts on the facial microbiome from several aspects. Firstly, we have successfully identified key microbes that are specifically associated with the aging process. Building upon this, we have developed and rigorously validated a microbiome-based facial aging index (FAI). By utilizing the FAI, we have been able to effectively quantify and uncover the effects of lifestyles on the skin aging index. Furthermore, our investigation has revealed aging-related metabolite pathways and shed light on the differential impacts of lifestyles on different cutotypes. These findings imply that behavior can influence skin microbiome, and modifying behavior can enhance skin microbiome health, e.g., by reversing the facial aging index. These results offer potential avenues for targeted interventions that can specifically target aging-related microbes, pathways, or metabolites. However, further longitudinal studies, tracking individuals over time, would be more effective in elucidating the impact of age on the skin microbiome and establishing a causal relationship. If future experiments validate our observed correlations and demonstrate their impact on skin aging, they could serve as a foundation for potential skincare interventions, and pave the way for the development of tailored skincare products and treatments that cater precisely to the unique and specific requirements of individuals.

In conclusion, our findings highlight the central role of aging on facial microbiome and strongly support personalized skin microbiome interventions by targeting lifestyle, skin properties, and aging-related microbial factors including cutotypes, key microbes, metabolic pathways, and metabolites, laying a foundation for future research and applications.

Conclusions

Our study shed light on the significant role of aging in shaping the facial microbiome, carefully considering skin physio-optical properties through a comprehensive multi-center deep-sequencing survey. These findings emphasize the potential value of personalized skin microbiome interventions aimed at addressing aging-related microbial factors, offering valuable insights for future research and applications.

Methods

Microbial sampling, skin physiology assessment, and lifestyle survey

A total of 479 volunteers were recruited from three cities, namely Beijing, Wuhan, and Guangzhou, representing northern, central, and southern China, respectively. A total of 498 samples were collected from these volunteers. This study was approved by the Ethics Committee of Tongji Medical College of Huazhong University of Science and Technology (No. S083).

For the volunteers in Beijing (n = 201) and Guangzhou (n = 170), the following requirements must be met:

  1. Age 18 to 65
  2. In good health and free from serious illnesses such as cancer, cardiovascular, and cerebrovascular diseases
  3. No serious skin conditions
  4. No antibiotics in the last 2 months
  5. No acid skin care products in the last month

A two-day procedure was carried out to assess the physio-optical properties and sample the skin microbiome, as follows. On the first day, participants were required to gently wash their faces with water only in the morning, avoiding excessive rubbing, and then arrived at the laboratory and rested for 20 min before any tests. During this time, they were asked to fill out a questionnaire about personal data and lifestyles (Supplementary Table 1). Tests were then performed to measure the physical, chemical, and optical properties of the facial skin, using the instruments detailed in Table 1. To prepare for the second day, the volunteers were asked to wash their faces with water only and apply a minimal amount of skin care products in the evening. They were advised to avoid acid and antibacterial skin care products, as well as to maintain a consistent diet and emotional stability for the duration of the experiment. On the second day, the volunteers were asked not to wash their faces or use any skin care products or cosmetics, and to arrive at the laboratory in the morning. They rested for 20 min before the sampling. To avoid contamination, the participants were instructed to very gently splash distilled water on their faces in our facilities.

Table 1 Test indicators and the corresponding instruments used in this study

We then used sterilized microbe-free swabs (Cat #: LC34, HYNAUT, China) to sample on both cheeks of each volunteer, wearing latex gloves that were disinfected with alcohol. We used two cotton swabs soaked with sterilized saline water (0.9%) on each cheek, wiping each cotton swab on the face for about 40 s. We used a total of four cotton swabs on both cheeks. After sampling, we broke off the cotton swab head and put it into a tube containing 2 mL preservation solution (Cat #: CY-98000A, Huachenyang, Shenzhen, China).

For volunteers in Wuhan (n = 127), only the facial microbiome samples were collected, following the same procedure described above. These volunteers had either healthy skin (n = 71) or acne (n = 56), as diagnosed by dermatologists. Other requirements remained the same as for volunteers in Beijing and Guangzhou.

In particular, several swabs were exposed to air for the same duration as the sampling process as a negative control. However, further validation indicated their low DNA content, rendering them unsuitable for sequencing.

To prevent DNA degradation, a 2 mL room-temperature stabilizing reagent kit (Cat. No. CY-98000A, Huachenyang, Shenzhen, China) was used to mix with the swab collected from each participant. The collected samples were transferred to freezers at – 80 °C within 12 h of collection and stored until DNA extraction.

DNA extraction, library preparation, and metagenomic sequencing

DNA extraction was carried out by adding 1000 μL of CTAB lysis buffer and 20 μL of lysozyme to the sample, followed by a 2–3-h incubation at 65 °C. After centrifugation, 950 μL of the supernatant was collected, and an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) was added, mixed thoroughly, and centrifuged at 12,000 rpm for 10 min. The supernatant was transferred, and an equal volume of chloroform:isoamyl alcohol (24:1) was added, mixed well, and centrifuged at 12,000 rpm for 10 min. The supernatant was transferred to a 1.5 mL centrifuge tube, and 3/4 volume of isopropanol was added for precipitation at – 20 °C. After centrifugation at 12,000 rpm for 10 min, the liquid was discarded, and the DNA pellet was washed twice with 1 mL of 75% ethanol. The DNA sample was dissolved in 51 μL of ddH2O and, if necessary, incubated at 55–60 °C for 10 min for better dissolution. RNase A (1 μL) was added to digest RNA, and the mixture was incubated at 37 °C for 15 min.

Subsequently, genomic DNA was randomly fragmented using a Covaris LE220 (Covaris, Inc.) according to the manufacturer’s instructions. The fragmented DNA was size-selected with magnetic beads to an average size of 200–500 bp. The selected fragments underwent end repair, 3′ adenylation, adapter ligation, and PCR amplification, and the products were purified with magnetic beads. The double-stranded PCR products were heat-denatured and circularized using the splint oligo sequence. The single-strand circular DNA (ssCirDNA) was formatted as the final library and passed quality control. Sequencing was performed using the BGI Genomics DNBSEQ-T7 (PE150) sequencer.

Metagenome data processing and construction of the facial microbiome genome compendium

All the raw metagenomic data were processed using Fastp v0.23.4 [76] to eliminate adapter contamination, low-quality bases, N bases, and duplicated reads. The resulting trimmed reads were then aligned to the human genome reference sequence (hg38) using Bowtie2 (version 2.4.5) with default parameters and filtered to obtain clean data. In pursuit of optimal results, we explored the efficacy of two recently introduced human reference genomes, the T2T-CHM13v2.0 genome [77] and the Chinese human pan-genome (CPC) [78], using a set of 10 randomly selected samples. To our surprise, both the T2T and CPC genomes yielded significantly lower read removal rates than the hg38 genome (p = 0.00098 and 0.004, respectively, paired Mann–Whitney test; Fig. S26). Consequently, we used the hg38 genome for the removal of human reads in our dataset.

Subsequently, the clean data were assembled primarily using metaSPAdes [79] (version 3.15.5) due to its superior assembly lengths and quality. MetaSPAdes first tested the k-mer values, and selected the optimal value in 21–141 as the optimal strategy for subsequent assembly. However, MetaSPAdes failed to assemble a small fraction of samples (n = 56; Supplementary Table 1) due to the high memory demand and/or large number of sequencing reads. The failed samples were then assembled by MEGAHIT [80] (version 1.2.9) with default parameters. The resulting contigs were further refined with metaMIC [81] (https://github.com/ZhaoXM-Lab/metaMlC).

Next, MetaWRAP [82] (version 1.3.2) was employed to bin the assembled contigs of each sample. MetaWRAP integrated the binning results from three binning tools, namely CONCOCT [83], MaxBin [84], and metaBAT [84], refined the bins using its “bin_refinement” module, and generated a consolidated, high-quality bin set. This process yielded a genome set (bins) for each sample.

CheckM [35] was run on the final bin set (default parameters) to generate the final report file for quality assessment.

The bins from all samples were then combined and de-replicated using dRep [85] (version 3.4.1) with parameters “-comp 50 -con 10 -pa 0.9 -sa 0.95 -nc 0.30 -cm larger,” which generated a set of non-redundant metagenome-assembled genomes (MAGs) clustered at average nucleotide identity (ANI) > 95% (i.e., species-level MAGs) with > 50% completeness and < 10% contamination, resulting in 2737 MAGs. Additionally, we generated a set clustered at ANI > 99% (i.e., strain-level MAGs), resulting in 5029 MAGs.

Furthermore, Barrnap (version 0.9, https://github.com/tseemann/barrnap) and tRNAscan-SE [86] (version 2.0.11) were used to identify rRNA and tRNA genes in these MAGs, respectively. The quality of these MAGs was evaluated based on the criteria defined by the MIMAG standard [87], which was also used by the SMGC [31], as detailed below:

  1. High quality: MAGs with > 90% completeness, < 5% contamination, and having 23S rRNA, 16S rRNA, and 5S rRNA genes and at least 18 tRNA genes;
  2. Near complete: MAGs with > 90% completeness and < 5% contamination;
  3. Medium quality: MAGs with ≥ 50% completeness and < 10% contamination.

Of note, MAGs with < 50% completeness or ≥ 10% contamination were considered to be low quality and removed from further consideration.
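The quality tiers above can be expressed as a small decision function. The following is a minimal Python sketch for illustration only; the function and argument names are hypothetical and are not taken from the study's pipeline. It assumes completeness and contamination are percentages (as reported by CheckM) and that rRNA presence and tRNA counts have already been derived from Barrnap and tRNAscan-SE output.

```python
def classify_mag(completeness, contamination,
                 has_23s=False, has_16s=False, has_5s=False, n_trna=0):
    """Assign a quality tier using the thresholds listed in the text.

    completeness, contamination: percentages in the range 0-100.
    has_23s / has_16s / has_5s: whether each rRNA gene was detected.
    n_trna: number of tRNA genes detected (hypothetical input).
    """
    # Low quality: removed from further consideration.
    if completeness < 50 or contamination >= 10:
        return "low"
    if completeness > 90 and contamination < 5:
        # High quality additionally requires all three rRNAs and >= 18 tRNAs.
        if has_23s and has_16s and has_5s and n_trna >= 18:
            return "high"
        return "near_complete"
    # Everything else has >= 50% completeness and < 10% contamination.
    return "medium"
```

Ordering the checks from the exclusion rule downward keeps each tier mutually exclusive, so a MAG that meets the near-complete thresholds is never also counted as medium quality.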

In the end, we obtained a total of 2737 species-level MAGs and refer to them as the Facial Microbiome Genome Compendium (FMGC). Of these, 62 were high quality, 732 (26.74%) were near complete, and 1943 (70.99%) were medium quality.

Detection of eukaryotic genomes and viral contigs in FMGC

The detection of eukaryotic genomes involved an initial filtration step using EukRep v0.6.7 [88] to selectively retain bins containing substantial amounts of eukaryotic DNA. Bins with more than 1 Mbp of eukaryotic bases were considered as potential eukaryotic bins. Next, Busco v5.5.0 [89] was used to estimate bin completeness and contamination. We obtained a total of 35 putative eukaryotic bins with very low completeness, ranging from 1 to 50%, suggesting no useful eukaryotic genomes could be identified in our dataset. Notably, only 2 bins reached 50% completeness (50.6% and 54.5%).

Additionally, VirSorter2 [90] (--min-score 0.7) was used to identify viral contigs. Contigs with a VirSorter2 score over 0.7 were considered viral contigs. As short contigs may only represent fragments of viral genomes, only contigs longer than 5 kb were kept for further analysis. The retained dataset was dereplicated using CD-HIT [91] (v4.8.1, parameters: -c 0.95 -n 8) with a global identity threshold of 95%. We obtained a total of 21,777 non-redundant viral contigs. Completeness and contamination were estimated with CheckV [92]. In total, 2417 (9.86%) and 165 (0.76%) viral contigs were considered complete or high-quality (> 90% completeness).

The viral contigs are available at https://figshare.com/projects/Integrated_analysis_of_facial_microbiome_illustrates_impacts_of_aging_and_skin_physio-optical_properties/173451.

Of note, our analysis focused mainly on the facial bacteria and did not incorporate the viral contigs or the eukaryotic bins in further analyses.

Taxonomic annotation and estimation of the novelty of FMGC in comparison to public datasets

The taxonomic annotation of FMGC MAGs was conducted using the classify_wf module of GTDB-tk [33] with default parameters. MAGs lacking taxonomic annotation at the species level were identified as potentially novel MAGs. To determine the proportion of novel MAGs in FMGC, FastANI [93] (v1.1, default parameters) was used to compare all MAGs against various public datasets, including the reference microbial genomes in the SMGC and those used by MetaPhlAn4. A MAG was considered novel if it exhibited < 95% ANI to the sequences in the public datasets and was not annotated by GTDB-tk at the species level.
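The two-part novelty rule can be sketched as follows. This Python fragment is illustrative only: the function name and inputs are hypothetical, it assumes a GTDB-style lineage string in which an empty "s__" rank means no species-level assignment, and it assumes that a MAG with no reported ANI hit against the public genomes counts as below the cutoff.

```python
def is_novel_mag(best_ani_to_public, gtdb_lineage, ani_cutoff=95.0):
    """Return True when a MAG satisfies both novelty criteria in the text.

    best_ani_to_public: highest ANI (%) to any public genome, or None when
        no hit was reported (assumption: treated as below the cutoff).
    gtdb_lineage: semicolon-separated lineage such as
        "d__Bacteria;...;g__Cutibacterium;s__" (assumed format).
    """
    species = ""
    for rank in gtdb_lineage.split(";"):
        rank = rank.strip()
        if rank.startswith("s__"):
            species = rank[3:].strip()
    lacks_species = species == ""
    below_cutoff = best_ani_to_public is None or best_ani_to_public < ani_cutoff
    # Both conditions must hold: low ANI alone is not sufficient.
    return lacks_species and below_cutoff
```

Requiring both conditions is the conservative choice described in the text: a MAG with a species-level annotation is never called novel, even if its closest public genome falls below 95% ANI.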

Phylogenetic analysis of MAGs

The de_novo_wf module of GTDB-tk [33] was used to construct a phylogenetic tree using the nucleotide sequences of selected MAGs/genomes as input. The resulting tree was visualized with iTOL [94] and Evolview v3 [95].

Estimation of the relative abundance of the FMGC MAGs at all taxonomic levels

Two methods were used to calculate the relative abundance of FMGC MAGs in each sample. The first method was MetaPhlAn4 [96] (version 4.0.3), which uses clade-specific marker genes to provide taxonomic assignments to the sequencing data and estimate the composition of microbial communities with relative abundances from species level to higher taxonomic clades such as genus, order, family, phylum, and kingdom. These profiles were filtered to retain species with a relative abundance of ≥ 1e-4.

The second method was adopted from the SMGC paper [31]. The clean sequencing reads from each sample were first aligned to the FMGC MAGs using BWA-MEM (version 0.7.17) [97]. A MAG was then considered “present” in a sample if over 50% of its length was covered by the aligned reads from the sample with > 4X overall sequencing depth. For each MAG, a reads per kilobase per million mapped reads (RPKM) value was calculated, and the abundance values of MAGs that were not “present” in the sample were set to zero. Relative abundance at the MAG (species) level was calculated by dividing the RPKM of a specific MAG by the total RPKM of all MAGs present in the sample.

The relative abundance of higher taxa levels was calculated as the summation of the abundances of all its members, using taxonomic annotations from GTDB-tk.
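The presence filter, RPKM normalization, and genus-level summation described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' code: the function names and the input dictionaries are hypothetical, and the RPKM denominator is assumed to be the total number of mapped reads in the sample.

```python
def rpkm(mapped_reads, mag_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads for one MAG."""
    return mapped_reads * 1e9 / (mag_length_bp * total_mapped_reads)


def relative_abundance(mags, min_breadth=0.5, min_depth=4.0):
    """Relative abundance per MAG after applying the presence filter.

    mags: dict mapping MAG name -> {"rpkm": float, "breadth": float,
          "depth": float}, where breadth is the covered fraction (0-1)
          and depth is the overall sequencing depth (hypothetical layout).
    """
    kept = {}
    for name, m in mags.items():
        present = m["breadth"] > min_breadth and m["depth"] > min_depth
        # MAGs that are not "present" contribute zero abundance.
        kept[name] = m["rpkm"] if present else 0.0
    total = sum(kept.values())
    if total == 0:
        return {name: 0.0 for name in kept}
    return {name: value / total for name, value in kept.items()}


def sum_by_taxon(rel_abund, taxon_of):
    """Sum MAG-level abundances into a higher rank (e.g., genus)."""
    out = {}
    for name, frac in rel_abund.items():
        taxon = taxon_of[name]
        out[taxon] = out.get(taxon, 0.0) + frac
    return out
```

Because absent MAGs are zeroed before normalization, the remaining abundances always sum to one within a sample, and the higher-rank values inherit that property.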

Identification of cutotypes of facial microbiome in our and public samples

To define the cutotypes based on the skin microbiome, similar to the construction of enterotypes in previous studies [39, 40], pairwise distances between samples were calculated with the Jensen-Shannon distance (JSD) [98] using genus-level relative abundances estimated by FMGC MAGs and MetaPhlAn4. Samples were then clustered using partitioning around medoids (PAM) clustering. The PAM algorithm is an implementation of k-medoids clustering. The core idea of the k-medoids algorithm is similar to that of k-means, but the main difference lies in the way the cluster centers are updated. The optimal number of clusters was estimated using the Calinski-Harabasz (CH) index of the clusterSim package. The CH index measures the compactness within each cluster by calculating the sum of squared distances between each point and the centroid of its respective cluster. It also evaluates the separation between the cluster centroids and the centroid of the entire dataset by calculating the sum of squared distances.
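To make the distance and clustering steps concrete, the sketch below computes the Jensen-Shannon distance between abundance profiles and groups samples with a simplified alternating k-medoids procedure. It is an illustration under stated assumptions: the function names are hypothetical, base-2 logarithms are assumed, the initial medoids are simply the first k samples, and the update rule is a simplification rather than the full PAM swap search used by standard implementations. The CH index is not shown.

```python
import math


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logs).

    p, q: non-negative abundance vectors of equal length; each is
    normalized to sum to 1 before the comparison.
    """
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero numerator contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def k_medoids(dist, k, max_iter=100):
    """Simplified k-medoids on a precomputed distance matrix.

    Starts from the first k samples as medoids (deterministic and purely
    illustrative), then alternates assignment and medoid update.
    """
    n = len(dist)
    medoids = list(range(k))
    labels = [0] * n
    for _ in range(max_iter):
        # Assign every sample to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # The medoid minimizes the summed distance to its members.
            updated.append(min(members,
                               key=lambda i: sum(dist[i][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, labels
```

Unlike k-means, each cluster center here is always an actual sample, which is why the procedure works directly on a JSD matrix without requiring coordinates.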

Between-class analysis (BCA) was used for clustering the samples for visualization and identifying the driver taxa using “bca” function of ade4 package [99]. BCA is a special case of principal component analysis (PCA) that utilizes instrumental variables. In this case, the variables are qualitative factors, such as skin types. BCA analysis allows us to identify the main components based on the centroids of each group, highlighting the differences between the groups, and then associating each sample with its corresponding category. It is a supervised projection of the data, where the distances between predefined classes (in this case, cutotypes) are maximized.

Correlation analysis of age, skin properties, and facial microbiome profiles

Correlations between the participant-related data, including age and skin physio-optical characteristics, and the facial microbiome-related data, including relative abundances of the taxa and alpha- and beta-diversities, were assessed using either the “psych” package or the Envfit test implemented in the “vegan” package. R v.4.0.5 (https://www.r-project.org/) was used throughout the study unless otherwise stated.

Mediation analysis of age and skin properties to microbial structure

Mediation analysis, as implemented in the mediation R package (v4.5.0, https://www.rdocumentation.org/packages/mediation/versions/4.5.0), was used to determine whether age affects the microbial structure through the skin physio-optical properties such as moisture, sebum production, and sensitivity. The results were visualized in Fig. 2F. Specifically, as shown in Fig. 2F, the arrow from the independent variable (age) to the mediating variable (sebum, moisture, and Tivi) illustrates the influence of the independent variable on the mediating variable. Likewise, the arrow from the mediating variable to the dependent variable (microbiome structure) demonstrates the impact of the mediating variable on the dependent variable. Finally, the arrow from the independent variable to the dependent variable indicates the overall effect of the independent variable on the dependent variable, encompassing both the total effect and the average direct effect. The average direct effect (ADE) refers to the direct effect of the independent variable on the dependent variable, that is, its effect while controlling for the mediating variable(s). The total effect, on the other hand, refers to the overall effect (direct + indirect) of the independent variable on the dependent variable.
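For orientation, the relationship between these quantities can be written out for the simplest case. The equations below are an illustrative sketch that assumes linear models without an age-by-mediator interaction and a scalar summary \(Y\) of microbiome structure; the symbols \(X\) (age), \(M\) (mediator), \(a\), \(b\), and \(c'\) are introduced here for illustration and do not appear in the original analysis, which relies on the estimation procedure of the mediation package.

$$M={\alpha }_{0}+aX+{\varepsilon }_{M}$$
$$Y={\beta }_{0}+{c}^{\prime}X+bM+{\varepsilon }_{Y}$$
$$\text{Total effect}=\underbrace{{c}^{\prime}}_{\text{ADE}}+\underbrace{a\times b}_{\text{indirect (mediated) effect}}$$

Under these assumptions, the direct arrow in Fig. 2F corresponds to \({c}^{\prime}\), while the two arrows passing through the mediator correspond to \(a\) and \(b\), whose product gives the indirect effect.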

Construction of age index

To eliminate the influence of confounding factors, we excluded individuals with acne (n = 55) and those who use makeup (n = 191), resulting in a total of 229 usable samples (Table S1) for the construction of the Age Index. A Pearson correlation analysis was performed on age and species abundances for the selected samples to obtain the correlation and p value of each species with age. The Facial Age Index (FAI) was constructed using species that significantly correlated with age (i.e., p value < 0.05 in Pearson correlation analysis), with the equations below.

$$m{R}_{ij}=\frac{{R}_{ij}}{\text{median}({R}_{i})}$$
$${I}_{p}= {\sum }_{i\in P}m{R}_{ij}\times {cor}_{i}\times {\log}_{10}({p}_{i})$$
$${I}_{n}= {\sum }_{i\in N}m{R}_{ij}\times {cor}_{i}\times {\log}_{10}({p}_{i})$$
$$\text{Facial Age Index}=\frac{{I}_{p}}{{I}_{n}}$$

\({R}_{ij}\): The relative abundance of species i in sample j;

\(median({R}_{i})\): The median relative abundance of species i in all samples;

\({cor}_{i}\): The absolute correlation coefficient of Spearman’s analysis between species i and age;

\({p}_{i}\): The correlation p value of Spearman’s analysis between species i and age.

\(P\): Species positively correlated with age;

\(N\): Species negatively correlated with age.

The FAI represents the quantification of age through skin microbes.
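The index can be computed per sample directly from the definitions above. The following Python sketch is illustrative only: the function name and data layout are hypothetical, species with a zero median abundance are skipped (an assumption, since the text does not say how they are handled), and an error is raised when no negatively correlated species contributes.

```python
import math
from statistics import median


def facial_age_index(abundance, sample, markers):
    """Compute the FAI of one sample from the equations in the text.

    abundance: dict species -> dict sample -> relative abundance R_ij.
    markers: dict species -> (signed correlation with age, p value),
        restricted to species significantly correlated with age.
    """
    i_pos = 0.0  # I_p: species positively correlated with age
    i_neg = 0.0  # I_n: species negatively correlated with age
    for species, (corr, p_value) in markers.items():
        med = median(abundance[species].values())
        if med == 0:
            continue  # assumption: skip species with zero median abundance
        m_r = abundance[species][sample] / med
        # cor_i is the absolute correlation coefficient; log10(p) is negative.
        term = m_r * abs(corr) * math.log10(p_value)
        if corr > 0:
            i_pos += term
        else:
            i_neg += term
    if i_neg == 0:
        raise ValueError("no contribution from negatively correlated species")
    return i_pos / i_neg
```

Since both sums carry the same negative sign from the logarithm of the p value, their ratio is positive, and it increases when age-enriched species are relatively more abundant than age-depleted ones.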

Impact of lifestyle on Facial Age Index while controlling for chronological age

Lifestyle information was extracted from a self-reported questionnaire, including habitual sleep restriction (whether they went to bed after 11 pm [100] more than 3 times a week, referred to as stay up late), heavy cosmetic application (whether they excessively or heavily wear cosmetics, more than 3 times a week, referred to as make up), emotional dysregulation (whether they frequently experience significant emotional fluctuations, more than 2 times a day), and constipation (fewer than two bowel movements per week).

To analyze the effects of these lifestyles on FAI, participants were divided into two groups according to each lifestyle. To control for the chronological age of the participants, the residuals of the FAI as a function of age were calculated using the “resid” function in R with the formula “resid( lm( FAI ~ age)).”
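For readers who do not use R, the age adjustment amounts to fitting a simple least-squares line of FAI on age and keeping what the line does not explain. The Python sketch below mirrors that calculation; the function name is hypothetical, and it assumes a single predictor with at least two distinct age values.

```python
def ols_residuals(x, y):
    """Residuals of a simple least-squares fit y ~ x (with intercept).

    x: predictor values (e.g., chronological age).
    y: response values (e.g., FAI), same length as x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the value predicted from age alone.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

The residuals can then be compared between lifestyle groups: a positive residual indicates an FAI higher than expected for that participant's age.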

Estimation of the relative abundance of FMGC pathways

HUMAnN [55] (version 3.6.1) was used to annotate the microbial functions and estimate their relative abundances for each sample. HUMAnN utilizes a comprehensive database to assign functional annotations to the microbial genes present in the samples, allowing for the identification of specific biological pathways and functions associated with the microbial communities.

Identification of microbial pathways associated with aging

To identify pathways associated with age and lifestyle, participants were divided into four equal-sized groups according to their age, namely Group A (18–25 years), B (26–37 years), C (37–51 years), and D (51–64 years). Aging-associated pathways were identified as those that showed significantly different abundance between the youngest (Group A) and oldest (Group D) groups using linear discriminant analysis effect size (LEfSe) [57]. A linear discriminant analysis (LDA) score cutoff of 3 (adjusted p value < 0.05) was used as the threshold of statistical difference. Statistical analysis using the Wilcoxon rank sum test (adjusted p value < 0.1, to identify pathways with slight differences among groups) was then carried out to compare the relative abundance across different lifestyles, and the effect sizes were visualized in Fig. 4B.
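One way to form equal-sized age groups is to rank participants by age and cut the ranking into quarters. The Python sketch below illustrates this under stated assumptions; the function name is hypothetical and the exact procedure used in the study is not specified. With rank-based splitting, participants of the same age can fall on either side of a cut, which is one possible reading of the shared boundary ages (37 and 51) in the group definitions.

```python
def equal_sized_groups(ages, labels=("A", "B", "C", "D")):
    """Assign each participant to an age-ordered group of near-equal size.

    ages: list of ages, one per participant.
    labels: group names from youngest to oldest.
    Returns a list of labels aligned with the input order.
    """
    n = len(ages)
    k = len(labels)
    # Participant indices sorted from youngest to oldest (stable for ties).
    order = sorted(range(n), key=lambda i: ages[i])
    groups = [None] * n
    for rank, idx in enumerate(order):
        groups[idx] = labels[rank * k // n]
    return groups
```

For example, eight participants are split into four groups of two, with the two youngest labeled "A" and the two oldest labeled "D".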

Statistical analysis

All processed data, unless otherwise stated, were loaded into R (v4.0.5, https://www.r-project.org/) for analysis and visualization. Adonis analysis was carried out using the adonis2 function from the R package vegan [101], with default parameters.