Introduction

In light of its complex biology and its involvement in numerous diseases, the mammalian skin provides an attractive paradigm for research on a variety of topics, including stem cell biology, cell adhesion and inflammation [1, 2]. The skin fulfills numerous key functions: it forms a physical, chemical and biological protective interface with the external environment, supports body thermoregulation by harboring hairs and secreting sweat and lipids, and acts as a sensory organ. Structurally, it consists of two layers separated by a basement membrane [2, 3]. The upper epidermis is a multilayered stratified epithelium, constantly renewed through a balance between self-renewal of basal cells and their differentiation, which includes detachment from the basement membrane and terminal differentiation in a process termed cornification. The lower dermis consists essentially of fibroblasts embedded in an extracellular matrix and harbors structures such as hair follicles with associated sebaceous glands (SG), sweat glands, nerves, and sensory receptors, as well as blood and lymphatic vessels. The dermis also includes a variety of immune cells, such as dendritic cells, macrophages, T-cells, mast cells, and eosinophils. The lowest part of the dermis is continuous with a fat layer called dermal white adipose tissue. The skin additionally includes different types of stem cells, muscle cells (responsible for goosebumps) and pigment-containing melanocytes; altogether, the skin is composed of more than 50 different cell types [4].

These features make the skin a particularly attractive object of study for methods such as single-cell RNA sequencing (scRNA-seq), which can resolve the heterogeneity of RNA transcripts and assess different cell types and their functions in highly organized tissues [5]. scRNA-seq has been extensively applied to different aspects of human skin pathophysiology, such as the characterization of epidermal stem cells and the study of wound healing, fibrotic diseases or skin cancer [6,7,8]. A number of studies have also assessed single-cell transcriptomes of mouse skin under a variety of conditions, providing novel insights into epidermal homeostasis, hair follicle cycling, or wound healing [4, 9,10,11].

SG are exocrine glands associated with hair follicles, whose secretion (sebum) lubricates and protects the skin and hair, in addition to having other (rather uncharacterized) antimicrobial and antioxidative protective functions [12]. SG cells (sebocytes) can be divided into distinct populations according to their maturation stage: the peripheral zone is composed of flat, mitotically active cells in contact with the basal lamina. Under the influence of hormones and local factors, peripheral cells start synthesizing and accumulating lipids in the form of intracellular droplets and are displaced towards the middle of the gland, forming the maturation zone. As lipid accumulation proceeds, cells undergo an extraordinary increase in volume, become distorted in shape and display pyknotic nuclei, while their organelles become generally unrecognizable. This process culminates in membrane disruption and release of the cellular contents at the center of the gland, in a process called holocrine secretion [13,14,15]. The cell debris reaches the skin surface via the sebaceous duct and the hair follicle canal; during this journey, the oily product of sebocyte disruption may pick up additional substances and its lipids may be modified, resulting in the formation of sebum. Sebum composition differs significantly from that of adipocyte-produced fat: it consists of large amounts of triglycerides, diglycerides and free fatty acids (57%), wax esters (26%), squalene (12%), and cholesterol (2%) [16]. Wax esters and squalene are typical of sebum and are normally not found elsewhere in the body. Interestingly, sebum fatty acids show unusual saturation and branching patterns [17, 18]. As the proportions of the different lipid classes differ significantly among species [19], comparisons between human and laboratory animal SG and sebum are challenging.

In recent years, the key role of SG in modulating skin immunology and inflammatory responses [20, 21] and its neuroendocrine regulation [22] have become increasingly appreciated. Despite this growing interest, the gland and its product remain considerably less characterized than other skin compartments such as the hair follicle, and are commonly overlooked in studies of skin biology. For instance, epidermal stem cell differentiation was shown to be modulated by specific lipid species [23], but this study was restricted to epidermal lipids and did not include sebaceous lipids, which is regrettable considering that the latter make up the majority of lipids at the skin surface. This circumstance was the rationale for the recent study by Inoue and colleagues, who assessed transcripts in skin surface lipids and found predominantly mRNAs derived from SG [24]. In another study, laser-capture microdissection followed by bulk RNA-seq (LCM-RNAseq) was used to define SG gene transcriptional profiles [25].

Except for a recent study focusing on PPARG-positive mouse sebocytes [26], SG transcripts have not been the focus of scRNA-seq studies so far. In some cases, sebocytes were absent from the analyzed samples because of the chosen skin location [27, 28]; other scRNA-seq studies probably did capture sebocyte transcripts, but these were either not annotated or not followed up in detail because the study had a very different focus [10, 11, 29]. Here, we re-assessed two well-annotated scRNA-seq datasets from human [30] and mouse [9] skin to characterize and compare SG transcripts in these two species. We selected these studies because they provide a comprehensive annotation of epidermal samples that allows the identification of individual cells comprising the SG.

Methods

In this study, we analysed existing scRNA-seq data of human [30] and mouse [9] skin from a sebaceous gland perspective. Although the original works did not focus on SG, their authors identified SG cells in their analyses. According to the authors, all experiments were carried out in accordance with local ethical and legal regulations. All sample handling and sequencing experiments were part of the original work; we therefore refer to these publications for details of the experimental setup.

We used the statistical software R [31] with its packages biomaRt [32] and oposSOM [33] to re-analyze both datasets and characterize SG transcripts in comparison to the other labelled epidermal populations. The oposSOM framework [33] was used to explore gene expression in each dataset individually, biomaRt [32] was used to identify homologous genes between mouse and human, and the R package ggplot2 [34] was used to visualize the results. oposSOM uses various metrics, such as logFC or min–max, for sample ranking and gene set analysis and has previously been applied successfully to a variety of gene expression analyses, such as cancer-specific modifications [35, 36], and in particular to characterize scRNA-seq samples [37, 38].
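To make the two ranking metrics mentioned above concrete, the following Python sketch shows one common way to compute a group-wise logFC and a min–max scaling for a single gene. It is not oposSOM code: the helper names are hypothetical, and we assume the input values are already log-transformed, so that the fold change reduces to a difference of group means.

```python
from statistics import mean

def log_fold_change(group_values, other_values):
    """logFC of one gene between a cell group and all other cells.
    Assumes log-transformed input (hypothetical helper, not oposSOM code),
    so the fold change is a difference of group means."""
    return mean(group_values) - mean(other_values)

def min_max_scale(values):
    """Rescale one gene's expression profile to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # a constant gene carries no ranking information
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical log10 expression of one gene in SG cells vs. other cells
sg = [2.0, 2.5, 3.0]
others = [0.5, 1.0, 0.0]
print(log_fold_change(sg, others))  # 2.0
print(min_max_scale(sg + others))
```

With log10 input, a logFC of 2.0 corresponds to a 100-fold difference on the original read scale.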

The mouse scRNA-seq data [9] consisted of epidermal cells from dorsal skin of C57BL/6 wild-type mice in the second telogen, at around 8 weeks of age. This dataset comprises 13 epidermal regions with approximately 1400 cells and 26000 genes. The human epidermal scRNA-seq samples [30] were derived from normal surgical tissue discards from circumcisions, reduction abdominoplasties and mammoplasties, and scalp excisions. The human dataset comprises 7 epidermal regions with approximately 93000 cells and 19000 genes.

All scRNA-seq samples are provided as total reads per cell. The base-10 logarithms of the total reads were analyzed with the oposSOM R package [33]. In a first step, the data are normalized and genes with constant expression are removed. The remaining genes are then clustered into metagene profiles, each metagene serving as the representative of a cluster of similarly expressed genes. For both datasets, we used a grid of 30 × 30 metagenes and kept most parameters at the default settings of the oposSOM framework. Deviating from the default options, we specified sample groups according to the subpopulations denoted in the original work [9, 30], increased the parameter training.extension and fixed the parameter dim.1stLvlSom (see Supplementary Table 1). Based on these metagenes, oposSOM provides various categories of interesting metagene areas, ranked by internal criteria such as logFC differences or the GSZ score.
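The preprocessing and metagene steps described above can be illustrated with a minimal self-organizing map (SOM) in plain Python. This is a toy sketch, not the oposSOM implementation: the pseudocount of 1 before taking log10, the mean-centring of each gene, the learning-rate and neighbourhood schedules, and the 3 × 3 grid (instead of 30 × 30) are all our own assumptions chosen for brevity.

```python
import math
import random

def preprocess(counts):
    """counts: dict gene -> total reads per cell.
    Returns log10-transformed, mean-centred profiles without constant genes.
    The pseudocount of 1 (to avoid log10(0)) is an assumption."""
    profiles = {}
    for gene, reads in counts.items():
        logged = [math.log10(r + 1) for r in reads]
        if max(logged) == min(logged):  # constant genes carry no information
            continue
        mu = sum(logged) / len(logged)
        profiles[gene] = [v - mu for v in logged]
    return profiles

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_node(nodes, profile):
    """Grid node (metagene) whose weight vector is closest to the profile."""
    return min(nodes, key=lambda n: sq_dist(nodes[n], profile))

def train_som(profiles, rows=3, cols=3, epochs=50, seed=0):
    """Toy SOM: each grid node's weight vector becomes a metagene profile."""
    rng = random.Random(seed)
    data = list(profiles.values())
    nodes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1 - frac) + 0.01                     # decaying learning rate
        radius = max(rows, cols) / 2 * (1 - frac) + 0.5  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):
            bmu = best_node(nodes, x)
            for (r, c), w in nodes.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
    return nodes

def assign_genes(profiles, nodes):
    """Map every gene to its best-matching metagene."""
    return {g: best_node(nodes, p) for g, p in profiles.items()}

counts = {  # hypothetical reads for four genes in four cells
    "G1": [0, 9, 99, 999],
    "G2": [0, 9, 99, 999],
    "G3": [999, 99, 9, 0],
    "FLAT": [5, 5, 5, 5],
}
profiles = preprocess(counts)
nodes = train_som(profiles)
print(assign_genes(profiles, nodes))
```

Genes with identical profiles (G1 and G2) necessarily share a metagene, and the constant gene is dropped before training; colouring such a grid by per-group metagene values is, roughly, what an expression portrait displays.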

Finally, in each dataset, we analysed only the SG cells in an unsupervised manner, i.e., without predefining sample groups. With this setting, oposSOM sorts the samples by internal similarity criteria and subsequently assigns them to a sufficient number of distinct groups. To avoid poor separation resulting from bias through local similarities or overly strict segregation, we increased the parameter training.extension (see Supplementary Table 1).

We used the R package ggplot2 [34] to visualize expression patterns as violin plots. We applied scale = "width", which scales every violin to the same maximum width, to compensate for the different numbers of cells per subset (the two datasets were split into four subsets: SG cells and other cells for the mouse and human data).
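The effect of scale = "width" can be mimicked outside ggplot2: each violin's density curve is divided by its own peak, so a small group of SG cells is drawn as wide as a large group of other cells. The Python sketch below does this on a hand-rolled Gaussian kernel density estimate; the fixed bandwidth of 0.3, the evaluation grid and the group values are hypothetical assumptions (ggplot2 uses its own bandwidth rule by default).

```python
import math

def gaussian_kde(values, grid, bandwidth=0.3):
    """Gaussian kernel density estimate of `values` evaluated on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
            for g in grid]

def scale_to_width(density):
    """Divide by the peak so that the widest point of every violin equals 1."""
    peak = max(density)
    return [d / peak for d in density]

grid = [i / 10 for i in range(0, 41)]  # log10 expression from 0 to 4
groups = {                             # hypothetical log10 values
    "SG cells": [2.1, 2.4, 2.6],       # few cells
    "other cells": [0.2, 0.5, 0.4, 1.1, 0.8, 0.3, 0.6, 0.9],
}
widths = {name: scale_to_width(gaussian_kde(vals, grid)) for name, vals in groups.items()}
for name, w in widths.items():
    print(name, round(max(w), 3))  # each violin peaks at width 1.0
```

Without this rescaling (ggplot2's default scale = "area"), violins of equal area would hide differences in distribution shape between groups of very different sizes less evenly.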

To demonstrate the localization of selected proteins by immunohistochemistry, we accessed the Human Protein Atlas (https://www.proteinatlas.org/). Details of this open-access resource are provided elsewhere [39].

Results

Sebaceous gland cells are heavily underrepresented in skin scRNA-seq data

The assessed mouse scRNA-seq data [9] encompassed interfollicular and follicular epidermal cells from dorsal skin samples. Unsupervised clustering resulted in thirteen main cell populations, including different compartments of the epidermis and hair follicle, immune cells, and SG cells. SG cells were characterized by Scd1/Mgst1 expression and made up only 1.3% (19/1422) of all analysed cells (Fig. 1a). The human epidermal scRNA-seq samples [30] originated from adult scalp skin, adult truncal skin, and neonatal foreskin. Scalp samples were further categorized into seven subpopulations inspired by the clustering of the mouse scRNA-seq dataset [9]; the follicular cluster of the scalp samples turned out to include SG cells [30]. As in the mouse study, human SG cells were characterized by APOE/MGST1 expression and made up only 8.5% (260/3044) of all analysed cells (Fig. 1a). Both mouse and human SG cells, as originally annotated, form a discrete population when visualized with t-distributed stochastic neighbour embedding (Supplementary Fig. 1). We therefore took advantage of the available scRNA-seq data and the specified SG subpopulations, and analysed the whole dataset with the open-source machine learning R package oposSOM [33].
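The reported proportions follow directly from the cell counts; a short Python check, rounding to one decimal place as in the text:

```python
def percentage(part, total):
    """Share of `part` in `total`, in percent, rounded to one decimal place."""
    return round(100 * part / total, 1)

print(percentage(19, 1422))   # mouse SG cells -> 1.3
print(percentage(260, 3044))  # human SG cells -> 8.5
```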

Fig. 1

Identification of mouse and human sebaceous gland cells with oposSOM. a Scheme of the main skin epithelial compartments analysed by scRNA-seq, the defining marker and the percentage of identified sebaceous gland cells relative to all cells analysed. The skin outline was drawn by using a picture from Servier Medical Art (licensed under a Creative Commons Attribution 3.0 Unported License; https://creativecommons.org/licenses/by/3.0/). b Expression portraits of the sebaceous gland populations in mouse and c human. d A combined legend of cell populations from mouse [9] and human [30]. Tissues that are split into several subgroups, such as uHF-1, uHF-2, uHF-3, are aggregated into a common group. Sebaceous gland cells are represented in red. APOE: Apolipoprotein E (human samples only); IB: Inner bulge (human and mouse samples); IFE: Interfollicular epidermis (human and mouse samples); KRT: Keratin 15/glutathione peroxidase 4 (human samples only); LH: Langerhans cells (mouse samples only); OB: Outer bulge (human and mouse samples); TC: T-cells (mouse samples only); UHF: Upper hair follicle (human and mouse samples). Correlation trees demonstrate mouse (e) and human (f) sebaceous gland cell clustering and segregation from the remaining population

oposSOM-based scRNA-seq data analysis distinguishes SG cells

oposSOM characterizes each subpopulation, denoted as a group, by an individual expression portrait in which strongly expressed metagenes appear as red areas. The expression portraits of the SG population in mouse (Fig. 1b, Supplementary Fig. 2) and human (Fig. 1c, Supplementary Fig. 3) scRNA-seq samples are clearly distinct from those of all remaining cell populations. The clustering and segregation of SG cells is also illustrated by the corresponding correlation trees (Fig. 1d,e,f). Furthermore, for the entire dataset, oposSOM identifies regions of metagenes, denoted as spots, that are strongly expressed in a subset of samples. In contrast to the predefined groups, these spots may comprise samples from various populations.

No overlap among the 15 most highly expressed genes in mouse and human sebaceous glands

To identify genes that are highly and specifically expressed in both mouse and human SG, we selected the 15 transcripts with the highest oposSOM rank in each of the two spots (one per species) consisting primarily of SG cells (Fig. 2a). As expected, the mouse list included a number of genes previously known to be expressed specifically or at high levels in SG (Scd3, Mgst1, Cidea, Awat2), and the human list included the human SG-specific marker KRT7 as well as FABP7, a protein involved in fatty acid uptake, transport, and metabolism. Several other SG-typical transcripts were also clearly enriched in the human SG cells compared to the remaining cells, including ELOVL5 (logFC 1.89), PLIN2 (logFC 1.08), MGST1 (logFC 2.54), and CIDEA (logFC 1.21). However, to our surprise, not a single gene appeared in both lists. In fact, the first overlap is represented by CLMP, which is ranked at position 12 in the human samples and at position 111 in the mouse samples.
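The "first overlap" can be read as the smallest rank k at which the top-k lists of both species share a homologous gene. Under that interpretation (ours, not necessarily the procedure used here), the following Python sketch finds it; the gene lists and the mouse-to-human mapping are hypothetical toy data, not the actual rankings.

```python
def first_overlap(mouse_ranked, human_ranked, mouse_to_human):
    """Return (human gene, human rank, mouse rank) of the shared gene that
    appears earliest in both lists, i.e. with the smallest max(rank), or None.
    Ranks are 1-based; mouse_to_human maps mouse symbols to human homologues."""
    human_rank = {gene: i for i, gene in enumerate(human_ranked, start=1)}
    best = None
    for m_rank, m_gene in enumerate(mouse_ranked, start=1):
        h_gene = mouse_to_human.get(m_gene)
        if h_gene in human_rank:
            h_rank = human_rank[h_gene]
            candidate = (max(m_rank, h_rank), h_gene, h_rank, m_rank)
            if best is None or candidate < best:
                best = candidate
    return None if best is None else best[1:]

mouse_ranked = ["Scd3", "Mgst1", "Clmp"]           # hypothetical ranking
human_ranked = ["KRT7", "FABP7", "CLMP", "MGST1"]  # hypothetical ranking
homologs = {"Mgst1": "MGST1", "Clmp": "CLMP"}      # hypothetical mapping
print(first_overlap(mouse_ranked, human_ranked, homologs))  # ('CLMP', 3, 3)
```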

Fig. 2

Identification of sebaceous gland spots and the highest expressed genes. a The top 15 enriched genes in mouse [9] and human [30] sebaceous gland exclusive spots. Samples assigned to the sebaceous gland spot show a different expression pattern than other samples. b In both mouse and human datasets, oposSOM identified one spot that consists almost exclusively of sebaceous gland cells; the color code (red: maximal) indicates the number of homologous genes in the corresponding spot of the other species

Twenty-five commonly expressed genes characterize mouse and human sebaceous glands

In pursuit of our goal to identify genes shared by mouse and human SG, we next compared both spots for similar gene ontology sets (via the oposSOM ranking) and for homologous genes via the homology mapping provided by the R package biomaRt [32]. This procedure identified 25 mouse and human homologous genes (Fig. 2b), which are listed with their respective FC values and main attributes in Table 1 and Extended Table 1 (Additional file). The identified transcripts include genes whose expression in SG is consistent with the high lipid metabolism of the gland, such as ELOVL7 and LPCAT3. However, most of the identified genes have not been associated with this gland so far (Table 1).
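Intersecting the two SG spots across species amounts to translating the mouse genes into their human homologues and keeping those also present in the human spot. The Python sketch below assumes a precomputed mouse-to-human homology table (for example, exported from a biomaRt query) represented as a dictionary of sets, so that one-to-many homologues are retained; the table and spot contents are hypothetical.

```python
def shared_homologs(mouse_spot, human_spot, mouse_to_human):
    """Return sorted (mouse gene, human gene) pairs for which the mouse gene
    lies in the mouse spot and its homologue lies in the human spot.
    mouse_to_human: dict mouse symbol -> set of human homologue symbols."""
    human_set = set(human_spot)
    pairs = set()
    for m_gene in mouse_spot:
        for h_gene in mouse_to_human.get(m_gene, set()):
            if h_gene in human_set:
                pairs.add((m_gene, h_gene))
    return sorted(pairs)

mouse_spot = ["Elovl7", "Lpcat3", "Scd3"]             # hypothetical spot members
human_spot = ["ELOVL7", "LPCAT3", "KRT7"]             # hypothetical spot members
table = {"Elovl7": {"ELOVL7"}, "Lpcat3": {"LPCAT3"}}  # hypothetical homology table
print(shared_homologs(mouse_spot, human_spot, table))
# [('Elovl7', 'ELOVL7'), ('Lpcat3', 'LPCAT3')]
```

Genes without a homologue in the table (here Scd3) simply drop out, which is one reason why species-specific SG markers cannot appear in the shared core.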

Table 1 Transcripts jointly upregulated in human and mouse sebaceous gland cells. An extended table in machine readable format is provided in the supplement (Extended Table 1)

To confirm expression of the corresponding proteins, we accessed the Human Protein Atlas [39]. Although SG are not annotated in this database, several immunohistochemically stained skin samples include SG, thus allowing an evaluation of protein expression for the identified transcripts. In total, 10 of the 25 candidate proteins could be evaluated and, except for SERPINB1, expression in SG cells could be confirmed for all of them (Fig. 3). Adipocyte plasma membrane–associated protein (APMAP) and methylcrotonoyl-CoA carboxylase subunit alpha, mitochondrial (MCCC1) are particularly strongly expressed throughout the SG, while elongation of very long chain fatty acids protein 5 (ELOVL5) shows intense staining restricted to the peripheral layers of the SG. Weaker but homogeneous expression was observed for motile sperm domain-containing protein 1 (MOSPD1), mitochondrial-processing peptidase subunit beta (PMPCB), ring finger protein 5 (RNF5) and staphylococcal nuclease and tudor domain containing 1 (SND1), while glycogen phosphorylase B (PYGB) expression was clearly restricted to the outermost peripheral layer of the gland (Fig. 3).

Fig. 3

Expression of the protein encoded by the mouse/human core genes in human sebaceous glands by immunohistochemistry. Skin samples from the Human Protein Atlas (proteinatlas.org) immunohistochemically stained for the corresponding protein were screened for the presence of sebaceous glands

Peroxisomal genes are strongly enriched in mouse and human sebaceous gland cells

Among the ten top-ranked Gene Ontology (GO) sets provided by oposSOM, we identified four GO sets (Table 2) associated with SG cells in both datasets. It was again not surprising to find fatty acid metabolism and mitochondrial processes among the enriched GO sets. Peroxisomal processes, in contrast, have so far not been considered a hallmark of SG activity. Consistent with this general observation of peroxisomal activity in SG cells, peroxisomal genes were enriched in the human (22/124, p = 2e-18) and mouse (8/114, p = 0.005) spots consisting almost exclusively of SG cells. The TOP30 representatives of the GO set peroxisome identified in all SG samples are listed separately for mouse and human genes in Table 3 (see Extended Table 3 in the Additional file for fold-change values and gene IDs). TOP30 genes that are also allocated to the SG spots are underlined.
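The text does not state which test produced these enrichment p-values. A common choice for this kind of over-representation question is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the Python sketch below implements its upper tail with exact integer arithmetic. The background size and the number of annotated peroxisomal genes in the example are hypothetical placeholders, so the printed value is not expected to reproduce the reported p-values.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: n genes drawn (the spot) from a
    background of N genes, K of which carry the annotation (e.g. a GO term)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical example: 22 annotated genes among 124 spot genes, drawn from
# an assumed background of 19000 genes of which an assumed 200 are annotated.
print(hypergeom_upper_tail(k=22, N=19000, K=200, n=124))
```

Using exact binomial coefficients avoids underflow in intermediate products; only the final ratio is converted to a float.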

Table 2 Gene ontology sets associated with mouse and human SG samples
Table 3 Peroxisomal transcripts (GO:0005777) upregulated in human and mouse sebaceous gland cells. Underlined transcripts are also allocated to the SG spots. An extended table in machine readable format is provided in the supplement (Extended Table 3)

To confirm protein expression of the identified genes in human SG in situ, we again accessed the Human Protein Atlas [39]. Owing to their presence in selected immunohistochemically stained skin samples, SG staining could be evaluated for 11 of the 30 peroxisomal genes assessed, and all of the assessable proteins except peroxisomal targeting signal 2 receptor, also known as peroxin 7 (PEX7), were expressed, with a range of staining intensities (Fig. 4). Although staining for PEX7 was negative, we were able to confirm expression of other peroxins (PEX2, PEX3, PEX6, and PEX12), thus confirming the presence of numerous peroxisomal proteins in human SG (Fig. 4).

Fig. 4

Expression of proteins encoded by the genes of the GO term “Peroxisome” in human sebaceous glands by immunohistochemistry. Skin samples from the Human Protein Atlas (proteinatlas.org) immunohistochemically stained for the corresponding protein were screened for the presence of sebaceous glands

No additional insights by sebaceous gland-specific sample exploration

In a final analysis, the two SG populations alone were explored in an unsupervised manner, in which oposSOM classifies the cells according to internal criteria. In this process, the human SG cells were allocated to six groups and seven spots, while the mouse SG cells were characterized by four groups and nine spots. Judging from the overlap between these groups and spots, the unsupervised groups do not appear to provide more detailed functional insight into the SG transcriptome (Supplementary Fig. 4). Considering the small number (n = 19) of SG cells in the mouse dataset, this conclusion may need to be revised once larger numbers of SG cells become available.

Discussion

In this study, we harnessed skin scRNA-seq data to shed light on the transcriptional landscape of mouse and human SG cells. The employed mouse scRNA-seq dataset [9], while arguably rather small for more advanced bioinformatic analyses (it includes only 19 SG cells), has been shown to be highly predictive for the identification of sebocyte-specific proteins [40]. Furthermore, the genes with the highest expression in our analysis encode numerous proteins of recognized relevance in SG physiology, such as Scd3, Mgst1, Cidea, and Awat2. Using the open-source machine learning R package oposSOM [33], we were able to derive SG-specific cell characteristics from both the mouse [9] and human [30] datasets. oposSOM takes advantage of these well-annotated datasets and characterizes each group, denoted as a tissue subpopulation by the original authors, through an individual expression portrait that displays areas of strongly expressed metagenes. Conversely, such areas of strongly expressed metagenes are also determined for the entire dataset and are used to identify cell subsets with similar expression patterns, which are combined into individual spots. In this way, each spot comprises cells that preferentially express a certain gene set. A spot may either reflect an originally defined tissue-allocated cell population, or it may consist of cells from various tissues. In this work, we focused on those spots that comprise almost exclusively cells allocated to the SG. Notably, these spots do not include all SG cells. Moreover, both datasets contain spots comprising cells from various tissues, including SG cells.

The 15 most highly expressed genes in the mouse spot consisting mainly of SG cells encode a range of proteins intimately associated with SG, such as Scd3 [12], Mgst1 [41], Cidea [42], and Awat2 [43], whereas the corresponding human list included only KRT7 [44] as a prominent human SG-specific marker. Nevertheless, numerous other SG-typical transcripts were strongly enriched in the human SG cells (ELOVL5, PLIN2, MGST1, and CIDEA, among others). Whether the lower density of SG-typical transcripts in the human TOP15 list reflects an unfavorable choice of markers in the human samples, resulting in the inclusion of non-sebaceous cells in the subset, remains to be determined. Quite astonishingly, not a single gene was shared between the two TOP15 lists; nevertheless, homology mapping of the SG spots revealed a previously unrecognized core of 25 genes relevant for both mouse and human SG. These include numerous mitochondrial and peroxisomal genes involved in fatty acid, amino acid, and glucose processing, thus highlighting the high metabolic rate of this gland. Although SG are not annotated in the Human Protein Atlas database [39], we were able to assess protein expression for part of the identified genes by screening immunohistochemically stained skin samples for the presence of SG. Altogether, 10 of the 25 candidate proteins could be evaluated and, except for SERPINB1, expression in SG cells could be confirmed for all of them, at different intensities and in part with a conspicuous expression pattern.

As an example, expression of PYGB, which catalyzes the rate-limiting step of glycogenolysis [45], was clearly restricted to the outermost peripheral layer of the glands. The presence of glycogen in SG was described decades ago [46], and this glucose polymer is known to be an important substrate for sebum synthesis [47]. Our finding highlights a potential role for glycogen as a substrate for sebaceous energy production and biosynthesis, possibly similar to that recently reported for hair follicles [48]. The identification of SLC1A3, a sodium-dependent, high-affinity amino acid transporter, provides another clue for deciphering energy metabolism in SG, as previous studies revealed SLC1A3 to be involved in stem/progenitor cell activation in different skin niches, including the SG [49]. Thus, our systematic analysis of scRNA-seq data complements targeted studies, such as the recent identification of embigin as a modulator of SG cell adhesion and metabolism [40] based on the data of Joost et al. [9], and provides novel starting points for experimentally addressing different aspects of SG homeostasis.

In contrast to the small overlap of transcripts within the SG-associated spots, the oposSOM ranking identified an overlap of four GO sets already within the top 10 GO sets associated with the entire SG population in both datasets. While fatty acid metabolism and mitochondrial processes were expected among the enriched GO sets, it was rather surprising to also detect peroxisomal processes, as these have so far not been considered a hallmark of SG activity. Yet, peroxisomal genes were clearly enriched among the highly expressed SG genes in both mouse and human, and expression of numerous peroxisome-related proteins in human SG was confirmed by assessing immunohistochemically stained skin samples from the Human Protein Atlas [39]. Peroxisomes are organelles present in almost all eukaryotic cells and have manifold metabolic functions, including α- and β-oxidation of fatty acids, synthesis of bile acids and plasmalogens, and the response to oxidative stress by handling reactive oxygen species [50]. Peroxisomes have been described in SG cells, where they may form extensive tubular aggregates [51]. The authors of that study speculated that the wide variety of peroxisomal complex morphologies may partly explain the striking differences in sebum composition across species and across individuals of the same species [18, 19]. Our data highlight the role of peroxisomes in SG metabolism and should stimulate functional analyses of the identified proteins in sebaceous biology.

Conclusions

We anticipate that future studies focusing on samples restricted to SG instead of whole skin will increase the resolution and the analytic power of the obtained data, thus providing novel insights into SG homeostasis, its secretory activity, and its interaction with other organs and tissues. In addition, our findings may also indicate novel potential targets for modulating SG activity in skin diseases involving altered SG function, such as acne vulgaris, atopic dermatitis, and psoriasis.