Background

Skin is a multi-layered tissue that is composed of continuously renewing epidermis – with keratinocytes (KCs) as a predominant cell type – and underlying dermis populated mostly by fibroblasts. The life span of epidermal keratinocytes is controlled by two alternative pathways: differentiation as their normal function or activation as an altered function in wound healing or skin diseases [1]. Epidermal KCs residing in the basal layer of the epidermis differentiate through multiple layers and finally shed as cornified dead cells from the skin surface [2, 3].

The relatively noninvasive sampling together with the methods that allow culturing of pure KCs have greatly facilitated research on skin and KCs. In cell culture, KCs are uncoupled from their tissue environment that naturally provides a network of homeostatic control signals; they are induced to either retain an active proliferative state or to differentiate. However, the prolonged KC culturing leads to the induction of cellular senescence [4] and therefore not only primary KCs but also immortalized KC lines, such as HaCaT (a spontaneously immortalized cell line) [5], have been widely studied to understand various normal and altered functions of the skin. HaCaT cells represent a highly popular model system since despite some UV-inducible mutations in TP53 alleles [6, 7] they are non-tumorigenic and have retained their capacity to differentiate [5, 7, 8]. The comparability of each of the models to intact skin has often been questioned. In the current study, we address the question of how representative models cultured KCs and HaCaTs are for studying human epidermis.

Genome-wide expression profiling is an useful approach to screen key genes with respect to different cellular statuses and to further model the regulatory networks [9]. Microarray technology provides a traditional profiling method to measure thousands of known genes simultaneously, but it has recently been replaced by RNA-seq technology that has proven to give more detailed insights into transcriptome. Both technologies have been previously applied to study the gene expression in skin [1012]. However, an important fact has been largely omitted: when KCs undergo their complex lifecycle, they change not only cell size and cell cycle kinetics, but also the actively transcribed RNA content, with the largest RNA content in fresh, actively growing cultured KCs [13]. In microarray, RNA-seq and even qRT-PCR, the same amount of total RNA is loaded for each sample, although yields of the polyA+ RNAs purified from total RNAs may differ. Moreover, normalization for the differential expression test expects equivalent expression levels for several co-expressed genes [14]. Therefore, the genome-wide expression profiling in the previous studies might have underestimated the complexity of the KC transcriptome during their lifecycle.

In this study, we revisit the skin and KC transcriptome with respect to fluctuation of polyA+ RNA content by the keratinocyte statuses; differentiated, activated, senescent and immortalized. Four types of human keratinocyte samples represented these cell statuses: epidermal tissue (split-thickness skin grafts; SGs), cultured primary KCs in early and late passages, and HaCaT cell line. To reduce the sample size and sequencing costs and to control the fluctuation of mRNA concentration, we applied single-cell tagged reverse transcription (STRT) sequencing method for expression profiling using 10 ng of total RNA per sample which is ten times less than required for a conventional RNA-seq method [15]. For accurate expression profiling and statistical tests, we employed STRT RNA-seq with synthetic polyA+ spike-in RNA [16], and SAMstrt statistical package with spike-in based normalization [14]. We first evaluated the improvements of our approach on the genome-wide expression profiling and confirmed the accuracy of the improved methods by literature survey of the keratin and collagen genes. Then we extracted genes that correlated with the sample types, and genes contributing to the sample classification, especially transcription factors [17] and long noncoding RNAs [18], as candidate regulators for keratinocyte characters. These results provide new insights into the skin transcriptome and into the usefulness of primary KCs and HaCaTs as model systems.

Results and discussion

Sample preparation, STRT RNAseq, and quality control

The protocol for sample preparation is depicted in Fig. 1. We collected SG biopsies with minimal inclusion of dermis and full thickness skin biopsies for KC culture from 8 donors undergoing plastic surgery (Additional file 1: Table S1). The SG samples were used directly for total RNA extraction whereas the full-thickness biopsies were used to set up KC cultures from which total RNA was extracted at early (1st; EKC) and late (5th-6th; LKC) passages. After RNA quality control (Additional file 2: Table S2), 16 samples were used to prepare two STRT libraries using 3 technical replicates, each containing 10 ng total RNA (Additional file 3: Table S3). Each STRT library was sequenced on four lanes of Illumina HiSeq 2000 instrument. In average, there were 10.8 million STRT reads and 7.92 million mapped reads per replica. After alignment to the human genome and gene-based quantitation, we confirmed consistency between all technical replicates (Additional file 4: Table S4, Additional file 5: Figure S1 and Additional file 6: Figure S2). One sample (10k1b, EKC replica 2 of donor 10k) exhibited exceptionally high relative polyA+ transcript counts (Additional file 5: Figure S1), and displayed unexpected overexpression of some abundantly expressed genes (Additional file 6: Figure S2). Therefore, we excluded it from further consideration as an outlier potentially biasing further analyses.

Fig. 1
figure 1

Sample collection. a Split-thickness skin graft (SG) samples were harvested in the operating room using a dermatome. Lower panel shows a haematoxylin-eosin stained section of SG sample demonstrating minimal dermis involvement (light blue tissue). b Full thickness biopsy samples were collected as 3 mm punch biopsies. Red dotted line in the schematic drawing of skin (middle panel) demonstrates sampling depth for SG samples (upper line) and punch biopsy samples (lower line). Keratinocyte (KC) cultures were established from punch biopsy samples after enzymatic dissociation, and isolated primary KC were cultured to the passage 1 (EKC samples) and to the passage 5/6 (LKC samples). c The spontaneously immortalized model keratinocyte cell line, HaCaT, was used as a cell line model. Total RNA was isolated from each sample as outlined in Materials and Methods

Varying polyA+ RNA content in 10 ng total RNA

A common assumption is that cells to be compared contain equal amounts of RNA. However, KCs have been shown to change their RNA content over time in long-term culture [13], leading us to further investigate the differences in polyA+ RNA contents in our samples. We found that the estimated polyA+ RNA contents as quantified against the added spike-in RNA controls varied in different sample types (Additional file 5: Figure S1). The variation was larger than the variation of the repeatedly measured total RNA amounts that were loaded for sequencing (Fig. 2a). Such differences can lead to the misinterpretation of differential expression when traditional endogenous gene-based normalization is applied. This is demonstrated in Fig. 2b, which is a comparison of 10k donor samples between SG and EKCs. The endogenous gene-based normalization method did not estimate the spike-in levels equivalently although the amount of spike-in RNAs were equal in all samples. Because the normalized spike-in levels must be same in the comparison, we employed the recently developed normalization method, SAMstrt [14], which uses exclusively the spike-in RNAs for normalization (Fig. 2c). Validation assays by qRT-PCR confirmed the upregulation of two housekeeping genes, RPLP0 and RPL13A, as predicted by the spike-in-based normalization, both of which were predicted unchanged or downregulated by the gene-based normalization (Fig. 2d). Moreover, the spike-in-based normalization method provided more consistent expression patterns in multiple samples with the qRT-PCR measurements (Fig. 2e). In conclusion, the observed variation in the polyA+ RNA content in 10 ng total RNA led to a misinterpretation of the expression pattern by the gene-based normalization method, but it became more reliable by the spike-in based normalization.

Fig. 2
figure 2

Amounts of polyA+ RNA in cells and effect of normalization. a Relative human polyA+ RNA amounts in samples were estimated by relating the human-specific sequence counts to the spike-in sequence counts. The amounts polyA+ RNA detected were EKC>LKC=HaCaT>SG. The EKC samples contained approximately twice as much polyA+ RNA as the SG samples. The samples contained equal concentrations of cellular RNA. b Comparison of gene expression between SG and EKCs of donor 10k by applying the gene-based normalization as implemented in SAMseq. Gene expression levels are shown as light gray dots (left panel), levels of spike-in RNAs as asterisks, and three control genes with distinct symbols. The spike-in RNAs appear twofold downregulated, although all samples contained equal amounts of spike-in RNAs. With the gene-based normalization, the Q-Q plot (right panel) shows approximately similar numbers of up- and down-regulated genes, shown as deviations from the diagonal. c The same comparison of gene expression between SG and EKCs by applying spike-in normalization as implemented in SAMstrt (left). Spike-in RNA counts follow the diagonal, but many more polyA+ RNAs appear upregulated than downregulated as shown by the upward shift of the gray cloud compared to (b). Q-Q plot (right) shows many more upregulated than downregulated genes, consistent with the increase of relative polyA+ RNA amount as shown in (a). d To experimentally validate which normalization yields a more correct analysis, qRT-PCR assays were performed on selected genes. Delta-Ct values indicate no change in spike-in RNA, approximately twofold upregulated RPL13A and RPLP0, and 8-fold upregulated GAPDH, consistent with the spike-in-normalized results. e Gene expression profiles for three sample types (SG, EKC, LKC) are similar between SAMstrt-normalized and RT-PCR results, whereas the SAMseq-normalized profiles show mild U shapes for RPL13A and RPLP0, different from the unchanged levels by RT-PCR and SAMstrt

Characterization of SGs, cultured KCs, and HaCaTs by differentially expressed genes

When we assessed the transcriptome profiles in the different samples, we found 11,908 differentially expressed genes (Additional file 7: Table S5). Among them, 40 out of 58 cytokeratin genes were differentially expressed (Fig. 3a), and many of them well known markers for the KC differentiation status both in cell culture and in tissue [1, 19, 20]. Hierarchical clustering confirmed significant contrasts between three sample types: SG, cultured KC and HaCaT. SGs contained several cytokeratin transcripts corresponding to cells at differentiated or differentiating epidermal layers (KRT1, 2, and 10), whereas the KCs contained cytokeratins typical of cells that maintain their proliferative capacity (KRT5 and 14) or that are activated by wound healing, hyperproliferative skin diseases or in vitro culturing (KRT6, 16, and 17) [1, 20]. KRT8 and KRT18 that are the developmentally first keratins absent from normal skin and rather characterizing simple epithelia are in our data expressed by both cultured KCs and HaCaTs supporting their proliferative and undifferentiated nature [20].

Fig. 3
figure 3

Differential expression between the SGs, cultured KCs and HaCats. a The heat map panel to the left shows the expression of cytokeratins in different samples. Annotations for cytokeratins in the panel to the right for the gene expression on different layers and tissue types (orange) are based on Takahashi et al. [19], for the status in activation cycle (green) are based on Freedberg et al. [1], and for the protein expression in five cell types (blue) are based on Human Protein Atlas, HPA [21]. According to HPA, white denotes no data, gray means not detected (primary cell types) or with negative intensity (HaCaT), and red represents varying protein levels according to the intensity of the colour. SG, Split-thickness skin graft; EKC, early passage keratinocyte; LKC, late passage keratinocyte; HCTP, HaCaT. b The heat map shows the expression of collagens in different samples

There were no significant differences between the early (EKC) and late (LKC) passage KCs in the cytokeratin levels. In contrast, HaCaT was remarkably different from both SGs, EKC and LKC expressing higher levels of cytokeratins 4, 13, 15, 19 and 81 (KRT4, 13, 15, 19 and 81). Furthermore, several cytokeratins (KRT7, 8, 18, and 19) characterising HaCaT by the spike-in normalization (Fig. 3a) were consistent with published protein expression patterns, low expression in skin, but high in HaCaT [21]. This observation was only partially supported by the gene-based normalization (Additional file 8: Figure S3) – by that method KRT7 and 8 did not reach significant difference, suggesting that the gene-based normalization was less sensitive or less accurate than the spike-in normalization.

To further confirm the sample classification by STRT results, we compared the sample specific expression of different collagen genes (Fig. 3b), which have previously characterized expression profiles in different types of tissues [22, 23]. 25 out of 44 collagen genes were differentially expressed. SGs expressed mainly connective tissue specific collagens (types I, V and VI), consistent with the presense of thin connective tissue layer underlying the epidermis, whereas cultured cells were characterized by the basement membrane (types IV) and skin hemidesmosome (type XVII) specific collagens again supporting the proliferative nature typical for the KCs in basal layer of epidermis.

Characterization of SGs, EKCs, LKCs, and HaCaTs by coregulated genes

We approached the question how comparable cultured cells are as model systems for intact skin by elucidating the gene expression differences between the samples. We applied Principal Component Analysis (PCA; Fig. 4) to decompose the differences into several dimensions and to simplify the complexity of our dataset. PCA enabled multiclass comparisons between samples and improved the interpretation of expression profiles. The first principal componene (PC), that depicts the largest variation between samples, classified the SGs as a separate group compared to the other samples. The second and third PCs separated the early and late passages of KCs from HaCaTs, respectively. In the following paragraphs, we interpret the meaning of each PC axis with literature surveys to assess appropriateness of the sample classifications based on our expression profile, and to decode insights into the differentiation and the growth control.

Fig. 4
figure 4

Sample classification by the principal component analysis. PCA scatter plots using the spike-in normalized gene expression profiles. PC1 demonstrates the contrast between SGs and other sample types, PC2 depicts the contrast between EKCs and HaCaTs, and PC3 shows the contrast between HaCaTs and LKCs. Percentages beside of the axis labels are the contribution ratios. Identical symbols between SGs and KCs denote identical donors in three technical replicas each. SG, Split-thickness skin graft; EKC, early passage keratinocyte; LKC, late passage keratinocyte

PC1 demonstrates the contrast between SGs and other sample types

First, we extracted genes correlating with the PC1. In this component, positively correlated genes referred to those with higher expression in the SGs when compared to KCs or HaCaTs, and negatively correlated genes acting oppositely. Our data showed that 3,402 genes correlated positively with the PC1 and thus were expressed higher in SGs (Table 1a and Additional file 9: Table S6), although the total polyA+ content of SGs was lower than that of KCs and HaCaTs. In contrast, 4,663 genes correlated negatively (Table 1a and Additional file 9: Table S6). Nine out of ten most upregulated genes in the SGs were previously associated with epidermal differentiation or with small organelles in differentiating cells (Table 1a), and five of ten most upregulated genes in KCs and HaCaTs were annotated as localized in mitochondria (Table 1b), which are lost from keratinocytes during epidermal differentiation [24]. To conclude, the most correlated known genes contributing to PC1 were thus consistent with the biological phenotype of the contrasted cell types.

Table 1 Ten most correlated genes with PC1. Tables a and b are subsets of significantly correlated genes with relevant functions in skin lineage (for full data, see Additional file 9: Table S6). a Positively correlated genes, which were upregulated in SGs, and (b) negatively correlated genes, which were upregulated in KCs and HaCaTs

To further interpret the PC1 and to find the associations between genes and phenotypes contrasted on PC, we performed gene set enrichment analysis amongst all genes that correlated with that PC (PC-GSEA). Our results showed that 136 out of 7,801 gene sets correlated positively, being upregulated in the SGs (q-value FDR < 1 %; Additional file 10: Table S7). The most significant gene set (Table 2a) was target genes of p53 and p63, which are known mediators of keratinocyte differentiation [25]. Similarly, 1,340 gene sets showed negative correlation (q-value FDR < 1 %; Additional file 10: Table S7). The 2nd, 4th, 6th, 9th, and 10th of significant gene sets (Table 2b) were related to mitochondria. Therefore, the PC-GSEA extracted meaningful contrasts of the biological functions and the phenotypes to interpret the PC1.

Table 2 Ten most significant gene sets enriched in PC1 correlating genes. Tables a and b are subsets of gene sets that were significantly (FDR q-value < 1 %) enriched in PC1 correlating genes by PC-GSEA (for full data, see Additional file 10: Table S7). (a) Gene sets in positively correlated genes, which were upregulated in SGs, and (b) gene sets in negatively correlated genes, which were upregulated in KCs and HaCaTs. SIZE is number of genes belonging the gene set. NES is normalized enrichment score

Since the PC scores representing the functions and phenotypes were calculated by linear combination of the expression profile and the loading coefficients, genes with large loading coefficients on each PC would be the key regulators for the functional contrast. For example, in case of PC1, genes with high positive loading coefficients contribute to the characteristic phenotypes and functions of SGs, and conversely, genes with high negative loading coefficients contribute to common characteristic phenotypes and functions of HaCaTs and KCs. When we extracted genes contributing to PC1, 223 genes showed positive loading and 104 genes showed negative loading (three sigma; Additional file 11: Table S8). 12 of the 223 positively contributing genes that explain the functions in SGs, and one of the 104 negative ones, characterizing the cultured cells, were known transcription factors (Fig. 5a; the definition of a transcription factor was based on [26]). Six of the 12 transcription factors upregulated in SGs were known regulators for skin maturation and differentiation phenotypes (Table 3). As an example, POU2F3 (a.k.a. Skn-1a/Oct11) is a known transactivator of the suprabasal layer marker KRT10 [27], which correlated positively with PC1 (Score = 0.89, local-FDR = 6.00 × 10−4), and is also a known repressor for the basal layer marker KRT14 [28], which correlated negatively with PC1 (Score = −0.68, local-FDR ~ 0). To conclude, differentiation and the mitochondrial phenotypes are possible interpretations of the PC1 and can be explained by the fluctuation of transcription factors that were clearly associated to sample types.

Fig. 5
figure 5

Hierarchical clustering of expression profiles of high loading loci contributing to the sample classification for PC1, PC2 and PC3. Panel (a) shows transcription factors (TFs) in the high loading features, and panel (b) known noncoding genes. The expression of different TFs separates SGs, KCs and HaCaTs from each other whereas the expression of noncoding genes mainly separates HaCaTs from other samples. The contribution to the three first PCs is shown to the right red denoting positively contributing genes and green denoting negatively contributing genes. SG, Split-thickness skin graft; EKC, early passage keratinocyte; LKC, late passage keratinocyte; HCTP, HaCaT

Table 3 Transcription factors, positively contributing to PC1. Table shows a subset of significantly contributing loci with relevant functions in skin lineage (for full data, see Additional file 11: Table S8)

PC2 depicts the contrast between EKCs and HaCaTs

PC2 captured the second largest variation which mostly shows the contrast between HaCaTs and EKCs, with intermediate LKCs (Fig. 4). Three thousand nine hundred sixty nine genes correlated positively with PC2 and were thus expressed at higher level in EKCs (Additional file 9: Table S6), and six of the top 10 positively correlated genes were polyA+ RNA binding proteins (Table 4). In contrast, there were no negatively correlated genes that would appear upregulated in HaCaT, which may depend on the fact that HaCaTs and LKCs had less polyA+ RNA than EKCs (Fig. 2a). Accordingly, PC-GSEA towards PC2 revealed 23 positively correlating and only two negatively correlating gene sets (Table 5 and Additional file 10: Table S7). When we investigated associations in the correlating genes and the gene sets as an interpretation of PC2, we found that the PC2 explained the difference of G1/S-transition between the EKCs and HaCaTs both through negative and positive correlation. First, the genes that bear H3K27me3 in ES cells and have high-CpG-density promoter, showed negative correlation (i.e. upregulation in HaCaTs; MIKKELSEN_ES_HCP_WITH_H3K27ME3; q-value FDR ~ 0) [29]; the genes with H3K27me3 marks have actually been shown to be transcribed at G1/S- and S-phases in HaCaTs. Second, target genes of RB1, which is known to be negative regulator of the S-phase entry, showed positive correlation (i.e. upregulation in EKCs; EGUCHI_CELL_CYCLE_RB1_TARGETS, q-value FDR = 4.76 × 10−3) [30].

Table 4 Ten most positively correlated genes with PC2. Table is a subset of positively correlated genes with PC2, and the thick mark indicates if the gene codes polyA+ RNA binding protein (for full data, see Additional file 10: Table S6)
Table 5 Most significant gene sets enriched in PC2 correlating genes. (a) Table is a subset of gene sets (ten out of 23) significantly (FDR q-value < 1 %) enriched in PC2 positively correlating genes, which were upregulated in EKCs, by PC-GSEA (for full data, see Additional file 10: Table S7). (b) Table is gene sets significantly enriched in PC2 negatively correlating genes, which were upregulated in HaCaTs. SIZE is number of genes belonging the gene set. NES is normalized enrichment score

Then we proceeded to find genes that are the key regulators for the functional contrast on PC2. 137 genes showed high positive loading to PC2, and 111 genes showed negative loading (three sigma; Additional file 11: Table S8). Among them, five positive loading genes and nine negative ones were transcription factors (Fig. 5a). Interestingly, the positive loading genes APBB1 [31], ESPL1 [32], TCF19 [33] and ZNF300 [34] were known cell cycle regulators or contributors to cell proliferation. Moreover, one of the 137 positive loading genes and seven of the 111 negative ones were known lncRNAs (Fig. 5b). UCA1 was a negatively loading lncRNA, known as essential for bladder cancer cell proliferation via CREB-dependent pathway [35]. Interestingly, Cyclin D1 expression in KCs is also under the CREB-dependent pathway [36]. In conclusion, the difference of G1/S-transition between HaCaTs and EKCs is a possible interpretation for PC2, and again the fluctuation of transcription factors was highly associated to sample types.

PC3 shows the contrast between HaCaTs and LKCs

PC3 associated mostly with the variation between HaCaTs and LKCs (Fig. 4). We could not detect positively correlated genes (upregulated in LKC) with PC3, although these samples had equivalent polyA+ RNA contents (Fig. 2a). In contrast, 2,992 correlated negatively refering to upregulation in HaCaT (Additional file 9: Table S6), and five of the top 10 negatively correlated genes codes polyA+ RNA binding protein (Table 6). In PC-GSEA, only 30 gene sets correlated positively with PC3, whereas 241 correlated negatively (Additional file 10: Table S7). As the interpretation of PC3, we found that it was associated with senescense, in accordance of KC senescence that does not apply to the continuously proliferating HaCaT cells. One explaining component was the positive correlation (upregulation in LKCs) of potassium channel genes (Table 7a). Potassium channel activation inhibits proliferation by activating a senescence program in breast cancer [37], and the G0/G1-arrest is accompanied by this activation. Consistently, not only the genes for G0 and early G1 phases, but also the other cell cycle associated genes were downregulated in LKCs (Table 7b). Furthermore, miR-192 and miR-34 target genes were upregulated in HaCaTs (negative correlation with PC3; Table 7c). Those miRNAs are functionally associated with p53-dependent cellular maintenance and aging (miR-34 [38, 39], and miR-192 [40, 41]).

Table 6 Ten most negatively correlated genes with PC3. Table is a subset of negatively correlated genes with PC3, and the thick mark indicates if the gene codes polyA+ RNA binding protein (for full data, see Additional file 9: Table S6)
Table 7 Significant gene sets enriched in PC3 correlating genes. Tables a, b and c are subsets of gene sets significantly (FDR q-value < 1 %) enriched in PC3 correlating genes by PC-GSEA (for full data, see Additional file 10: Table S7). (a) Table is a subset of the ten (out of 30) gene sets most significantly enriched in the positively correlating genes, which were upregulated in LKCs. (b) Table is cell cycle associated gene sets significantly enriched in the negatively correlating genes, which were upregulated in HaCaTs. (c) Table is miRNA-target gene sets significantly enriched in PC3 negatively correlating genes. SIZE is number of genes belonging the gene set. NES is normalized enrichment score

We next attempted to find supporting genes for this interpretation. A total of 117 genes showed high positive loading to PC3, and 144 genes showed high negative loading (three sigma; Additional file 11: Table S8). Among them, six positive and 15 negative loading genes were transcription factors (Fig. 5a). One negatively loading (upregulated in HaCaTs) gene was RAD51, known to be involved in the homologous recombination and the repair of DNA. Expression of RAD51 is regulated by Lamin A (LMNA) [42], and expression of the LMNA was consistently negatively correlated (Local-FDR = 8.99 × 10−4). It’s well known that the mutations of LMNA lead to Hutchinson–Gilford progeria syndrome characterized by a premature aging [43]. Therefore, senescence and cellular aging responces would be a possible interpretation for PC3.

Conclusion

STRT RNA-seq method complemented with synthetic RNAs revealed a variation of polyA+ RNA content per total RNA in different cell types, SGs, EKCs, LKCs and HaCaTs, reflecting the activity of the cell type. Even though the STRT reads are concentrated towards the 5’end of the polyA+ transcripts and the method has limited resolution at 2nd and more downstream exons, the advantages of the method include the small amount of starting material needed for library preparation, early multiplexing of up to 92 samples reducing the cost and time of library preparation, and the inclusion of external spike-in RNAs as a standard procedure. The spike-in normalization has been shown to be a valuable tool when comparing samples with fluctuating polyA+ RNA contents [14, 44]. We showed that the use of spike-in-based normalization produced consistent results with qPCR validations, and provided us with deeper insights into KC biology. In contrary, the traditional gene-based normalization method led to inaccurate expression profiles. Moreover, our approach would be applicable not only for the studies on KCs but also for the other studies with fluctuation of polyA+ RNA content, for example those on single cells with different types or sizes [45, 46].

We applied PCA to elucidate dissimilarity between the samples, and also to decompose the differences. The three first PCs represented differentiation and the mitochondrial phenotypes between SGs and cultured cells, G1/S-transition between HaCaTs and EKCs, and senescence and cellular aging responces between HaCaTs and LKCs. All cultured cells differed from tissue samples and HaCaT cells differed remarkably from other cultured cells based on both PCA and the comparison of previously known KC markers, cytokeratins. Our results thus suggest that great caution should be payed when using cultured primary KCs and cell models like HaCaTs as models for skin, especially when focusing on the pathways revealed by PCA. The transcriptomes of cultured primary KCs and HaCaTs resemble that of acivated skin rather than normal skin as shown also by others [10, 11].

In this study, we present an approach to compare highly varying cell types by applying synthetic RNA based normalization. For further studies or applications of other biological events, the key is to find the hidden associations between genes and phenotypes which are contrasted on PCs. We also note that there are still many poorly annotated genes in the genome that might be revealed by our approach. Moreover, PCA without sample pre-classification might be applied in studying gene expression in complex disorders using a large enough cohort.

Methods

Sample collection, cell culture, RNA extraction, and STRT RNA-seq

All subjects involved in this study provided written informed consent under a protocol adherent to the Helsinki Guidelines and approved by the Institutional Review Board of the Helsinki University Central Hospital.

Split-thickness skin grafts (SGs) were harvested as outlined previously [47]. Briefly, SG samples were obtained using a compressed air-driven dermatome (Zimmer®, Warsaw, IN) with a fixed thickness setting of 2/1000 in. to obtain a representative sample of epidermis to its full thickness with minimal dermis involvement from the donor site skin. In order to initiate keratinocyte cultures, full thickness skin samples (3-mm diameter punch biopsies) were collected (Additional file 1: Table S1). The quality of SG samples was examined from HE-stained paraffin sections. The SGs were immediately immersed in RNAlater to ensure the least possible manipulation and gene expression changes.

Epidermal cells were isolated from the full thickness skin with dispase digestion followed by trypsinization to enable collection of all primary cell types and phenotypes of the epidermis in the initial harvest. Human primary keratinocytes were cultured in Keratinocyte Growth Medium 2 (PromoCell # C-20011) with calcium (0.06 mM) and PromoCell supplements (#C-39016). Cell culture dishes were coated with collagen I (Gibco Rat tail, A-10483-01). Cells were routinely passaged, and samples were collected from early (passage 1; EKC) and late passages (passage 5 or 6; LKC). The human immortalized keratinocyte HaCaT cell line was grown in low glucose DMEM (Lonza) supplemented with 5 % FBS, 2 mM L-glutamine, 1 mM sodium pyruvate solution, 0.1 mM non-essential amino acids, 100 U/ml penicillin and 100 μg/ml streptomycin at 37 °C and 5 % CO2. HaCaT cells were collected at passage 43 for RNA isolation.

Total RNA was extracted by miRNeasy kit (Qiagen) from both tissue samples and cells. RNA concentrations were measured by Nanodrop and Qubit and the quality was controlled by Bioanalyzer (RIN for all samples >8; Additional file 2: Table S2).

STRT RNA-seq

Qualified total RNA samples (10 ng of each replicate, three replicates for each sample) were used for RNA sequencing library preparation according to the STRT protocol [16], which was adjusted for 10 ng samples by decreasing the number of cycles to 10 during the first PCR amplification. The libraries (Additional file 3: Table S3) were sequenced using an Illumina HiSeq 2000 instrument. Preprocessing of STRT reads, alignments and per-gene quantitation were performed by an established analysis pipeline [16]. Differential expression analysis was performed by SAMstrt [14], and also by SAMseq [48] for an example of the gene-based normalization. Differentially expressed genes by the four sample types were extracted by multiclass response test; threshold of the significantly regulated genes is Local-FDR < 1 %. Gene type (e.g. protein coding, pseudo, noncoding), genes of glycolysis/gluconeogenesis and transcription factors were classified based on Entrez Gene, KEGG pathway [KEGG:hsa00010] and Ravasi et al. [26], respectively. For cytokeratin expression analysis, we selected the gene symbols that begin “KRT” followed by number. We performed PCA with the scaling but non-centering preprocess steps. Correlation of gene expression with PC was estimated by SAMstrt quantitative response test. Scores of samples on a PC were given as the quantitative values, and threshold of the significantly correlated gene is Local-FDR < 1 %. PC-GSEA, which tests correlation of gene set with PC, was by GSEA [49] preranked test towards c1, c2, c3, c5, c6 and c7 of MSigDB version 4. The ranked lists contain the gene correlation scores estimated by SAMstrt as descrived above; threshold of the significantly correlated gene set is q-value FDR < 1 %.

qRT-PCR

The qualified total RNA samples 10kD, 10k1 and 10k6, which were used for sequencing with STRT method, were subjected to cDNA synthesis. Equal amount of the SPIKE-in RNA mix (ArrayControl RNA, AM1780M, Ambion) was added to each cDNA synthesis reaction according to the STRT-library preparation protocol described above. cDNA synthesis was carried out using oligo dT-primers and SuperScript III First-Strand synthesis system for RT-PCR (18080–151, Invitrogen) according to manufacturer’s instructions. 5 ng of cDNA was applied to each qPCR assay and each sample was run in three technical replicates. qPCR was carried out using an ABI PRISM 7500 Fast Real-Time PCR System with Fast SYBR® Green Master mix (4385617, both from Applied Biosystems) according to manufacturer’s instructions. The primer sequences are shown in Additional file 12: Table S9.

Availability of supporting data

The processed STRT reads supporting the results of this article are available in the European Nucleotide Archive (http://www.ebi.ac.uk/ena/data/view/PRJEB8997).