Introduction

Morphoea is characterised by fibrosis of the skin and/or underlying connective tissues, with the potential for significant functional and psychological impact. It is suggested that environmental triggers [1,2,3], occurring in a genetically susceptible individual, underpin the inflammation and deregulated tissue injury response in morphoea [4]. However, precise genetic susceptibility factors, inciting and propagating molecular mechanisms, remain unclear.

Linear morphoea (LM) may follow Blaschko’s lines of epidermal development, and hence may represent epidermal somatic mosaicism for a mutation conferring increased risk of disease at specific sites [5,6,7,8,9]. Accordingly, keratinocyte-derived signals and epidermal-dermal communication pathways vital to normal skin development and wound repair, are also key to pathological skin fibrosis and highly active, proliferative keratinocytes are seen in systemic sclerosis (SSc) [4, 10, 11].

However, LM is a non-congenital and morphologically heterogeneous dermal pathology, potentially suggesting more complex underlying aetiopathogenic mechanisms. Correspondingly, non-linear morphoea subtypes show alternative, but often symmetrical and somewhat predictably patterned skin involvement. As such, dermal fibroblasts have site-specific gene expression, known as positional identity (PI). Many molecular pathways instrumental in developmental patterning, regional-specific mesenchymal differentiation and epidermal fate, such as FGFs, TGF-β and Wnt [12, 13], are also involved in pathogenic fibrosis and SSc [14, 15]. Similarly, morphogenic and epidermal–dermal signaling pathways, including Wnt, Hedgehog [14, 16] and Notch [14, 17, 18], are deregulated in fibrosis and SSc [17, 19,20,21,22,23].

Morphoea’s morphological heterogeneity, clinical symmetry, patterning and possibly Blaschkoid distribution, may therefore provide clinical clues to potential underlying epidermal and dermal genetic aetiopathogenic and disease-driving mechanisms [4].

The goals of this study were to identify the presence or absence of primary somatic epidermal genomic variation (as a common single nucleotide variant (SNV), or differing SNVs in a commonly affected gene, across all study samples) in LM, and to explore differential gene expression (DGE) in isolated epidermal and dermal site-matched tissue pairs, to identify potential inciting and pathogenic pathways in the epidermis and dermis. We aimed to correlate our data with the very limited current genetic data in morphoea, to propose a possible genetic and molecular narrative underlying morphoea aetiopathogenesis and hence identify potential future study and therapeutic targets.

Methodology

This study was approved by the National Research Ethics Service (London-Hampstead, MREC Reference 6398). Tissue specimens were obtained with written informed consent as part of an ongoing programme of research into the pathogenesis of scleroderma.

Specimen source

Patients with LM involving the limb(s) and/or trunk identified from our previously characterised morphoea cohort were eligible for specimen collection [24]. A total of 16 patients were enrolled (Table 1). Details regarding sample selection for each molecular (DNA/RNA) and tissue layer (epidermal/dermal) dataset are described in the Supplemental Methods section.

Table 1 Study cohort; experimental studies and clinical characteristic

Paired 4 mm whole skin punch-biopsies were taken from each participant; one or two from morphoea affected (lesional) skin, and one or two from site-matched contralateral unaffected skin. For tissues samples utilised for DNA/RNA isolation, epidermis was immediately chemically separated from the dermis utilising 3.8% ammonium thiocyanate (Sigma-Alrich USA) in Dulbecco's phosphate-buffered saline pH 7.4 at room temperature for 25 min. Residual epidermal tissue was gently curetted off the superficial dermal surface using a scalpel blade (no. 15) [25].

DNA isolation, whole genome sequencing and analysis; epidermis

DNA was isolated from paired epidermal tissue and four selected paired samples underwent WGS. All identified genes with SNVs underwent network analysis utilising STRING online database (v11). Identified SNVs were then classified; graded according to disease relevance and sub-classified according to MAF (using ExAC) and pathogenicity (according to PolyPhen-2, PROVEAN, SIFT and CADD scores) (Supplemental Methods and Fig. 1).

Fig. 1
figure 1

Classification strategy for disease relevant gene candidates (graded as very high, high or medium according to functional relevance to morphoea aetiopathogenesis; vertical grading) and for pathogenicity (according to allele frequency and pathogenicity criteria; horizontal classification ranking)

RNA isolation, sequencing and analysis; epidermis and dermis

Total RNA was isolated from paired epidermal and dermal tissue, and selected samples underwent RNA-seq. Epidermal and dermal differentially expressed genes (DEG) were further analysed via Gene Set Enrichment Analysis (GSEA), using MSigDB Hallmark gene sets [26, 27]. Enrichment was reported as significant if the false discovery rate (FDR) was less than 0.25 [28] and each GSEA set was ranked according to log2 fold change (log2FC).

For dermal RNA-seq data, further complimentary analysis via PANTHER (PANTHER Gene Ontology (GO)-Slim Biological Process) [29] was completed. An adjusted P-value was calculated using Bonferroni correction, with a statistical significance cut-off of < 0.05. STRING database was utilised to review protein–protein interactions between products of particular DEGs of interest. (Supplemental Methods).

RT-qPCR and IHC of selected epidermal and dermal gene candidates derived from epidermal RNA-seq

Details can be found in the relevant Supplemental Methods sections.

Results

Epidermal protein coding single nucleotide variants

861 SNVs were identified in morphoea-affected epidermis, but absent in paired unaffected epidermis. Of these, 119 were protein-coding exonic and 72 nonsynonymous. No single common SNV or commonly affected gene was identified across all four sequenced epidermal tissue pairs.

A number of nonsynonymous protein-coding SNVs had high CADD scores (> 20) and pathogenicity rated as damaging or possibly damaging by at least two of PolyPhen-2, PROVEAN and SIFT algorithms, including; ADAMTS16, ADAMTSL1 and CBX2 (Table 2). STRING network analyses of these variants yielded no noteworthy gene clusters.

Table 2 All protein coding, nonsynonymous genomic variants (alphabetised, CADD scores rounded to the nearest whole number)

Disease relevance of epidermal genomic variants

No protein coding nonsynonymous SNVs were graded as very high for disease relevance. Variants in the genes ADAMTS16 and ADAMTSL1 were graded as high for disease relevance and Level 1 for potential pathogenicity and rarity. All other protein-coding nonsynonymous variants were graded as medium disease relevance (Table 3).

Table 3 Potential gene candidates from epidermal whole genome sequencing as selected by network analyses and disease relevance; graded by potential relevance to morphea pathogenesis, and sub-categorised by Level, based on potential pathogenicity

Epidermal gene expression

Only three gene transcripts were significantly upregulated, including gene paralogs SPRR4 (FDR = 0.011, Log2FC 1.266) and SPRR1B (FDR = 0.026, log2FC 1.252), and four were significantly downregulated including LAMA4 (FDR = 0.026, log2FC −1.263) and PAX8 (FDR = 0.029, log2FC −0.785). Despite FDR > 0.05, IFI27 (log2FC 1.565) and WNT2 (log2FC 1.351) were noted with log2FC > 1.

Epidermal gene signatures; gene set enrichment analysis

Thirty-six Hallmark gene sets had significant enrichment; 16 with positive and 20 with negative enrichment. TNF-α signalling via NFkB (NES = 2.514, FDR =  < 0.001), TGF-β signalling (NES = 2.006, FDR = 0.001) and IL-6/JAKSTAT3 signalling (NES = 1.961, FDR = 0.001) were the most strongly positively enriched (Fig. 2 and Table 4).

Fig. 2
figure 2

Enrichment of disease relevant Hallmark gene sets on GSEA, comparing epidermal and dermal datasets. An asterix (*) denotes significantly enriched sets (FDR < 0.25). Dermal Wnt signaling and epidermal Notch signaling were not in the top 20 differentially expressed Hallmark sets within their respective dataset and hence are not displayed graphically

Table 4 Epidermal RNA sequencing: Hallmark gene sets with significant positive or negative enrichment on GSEA, listed by NES

Dermal gene expression

Ninety-three gene transcripts were significantly upregulated, 263 downregulation and 15,206 had nonsignificant differential expression (DE). A number of immunoglobulin-related genes were amongst the most strongly DEGs [(all FDR < 0.001, log2FC > 2.927). Other genes with significant positive DE included SFRP4 (log2FC 3.277), CXCL9 (log2FC 2.709), COMP (log2FC 1.664), WNT16 (log2FC 0.742), CCL2 (log2FC 0.701), WNT2B (log2FC 0.576), NOTCH4 (log2FC 0.500)]; while MMP7 (log2FC −2.861) and NR4A1 (log2FC −0.630) were negatively expressed.

Dermal gene signatures; gene set enrichment analysis and PANTHER statistical enrichment analysis

Seventeen GSEA Hallmark gene sets were significantly enriched; 9 with positive and 8 with negative enrichment (Fig. 2 and Table 5). Sixteen biological processes were statistically enriched on PANTHER statistical enrichment testing; 7 with positive and 9 with negative enrichment (Fig. 3).

Table 5 Dermal RNA sequencing: Hallmark gene sets with significant positive or negative enrichment on GSEA, listed by NES
Fig. 3
figure 3

PANTHER Gene Ontology biological processes with significant positive and negative enrichment according to PANTHER enrichment test (Bonferroni correction, adjusted P-values listed next to biological process name)

Two distinct gene expression clusters were evident from analyses; inflammatory [GSEA: IFNα response (NES = 1.465, FDR = 0.162) and IFNγ response (NES = 1.402, FDR = 0.145), and PANTHER: Humoral immune response (P < 0.001) and Positive regulation of lymphocyte reactivation (P = 0.001) see Table 6], and; profibrotic, morphogenic signatures [GSEA: Epithelial to mesenchymal transition (NES = 1.536, FDR = 0.125) and Angiogenesis (NES = 1.422, FDR = 0.147), as well as nonsignificant positive enrichment of Hedgehog signalling (NES = 1.217, FDR = 0.291), Notch signalling (NES = 0.981, FDR = 0.655) and Wnt signalling (NES = 0.453, FDR = 0.999), and PANTHER: Multicellular organism development (P = 0.007); 434 contributory genes including WNT (WNT16, WNT10B, WNT2B), hedgehog (HHAT, HHATL), disheveled (DVL1, DVL2, DVL3) and frizzled (SMO), HOX (HOXA1a HOXA3, HOXA4, HOXA5, HOXA6, HOXA7, HOXA13, HOXB3, HOXB4, HOXB5, HOXB6, HOXB7, HOXC4, HOXC6, HOXC13) and PAX (PAX3, PAX6, PAX8)] (Fig. 4).

Table 6 Dermal RNA sequencing: transcripts contributing to the three key selected positively enriched PANTHER GO-Slim Biological Processes (multicellular organism development, humoral immune response and positive regulation of lymphocyte activation) with significant upregulation
Fig. 4
figure 4

Interactions between leading edge genes within inflammatory gene sets IFN-signaling (α and γ), and developmental related gene sets of epithelial to mesenchymal transition, Angiogenesis and Hedgehog signaling, demonstrating clustering and inter-pathway interactions. Default STRING criteria used: nodes linked by evidence, with medium confidence level of 0.4

Many HOX, PAX, SOX and CBX genes were impacted across all three epidermal/dermal and genomic/transcriptomic datasets (Fig. 5).

Fig. 5
figure 5

STRING network diagram demonstrating multiple strong and overlapping interactions between PAX, HOX, SOX and CBX genes with protein or non-protein coding epidermal SNVs on WGS and/or differential epidermal or dermal expression on RNA-seq. Nodes linked by evidence with medium confidence level of 0.4 (default STRING criteria)

Thirty-two members of the ADAM, ADAMTS and ADAMTSL super-family were nonsignificantly DE in the dermis (13 downregulated and 19 upregulated) and 12 in the epidermis (6 upregulated and 6 downregulated). Overall, 50 ADAM/ADAMTS-family genes were affected across all three datasets, including the potentially highly pathogenic (according to criteria described in Fig. 1) nonsynonymous SNVs in ADAMTS16 and ADAMSTL1 (Fig. 6).

Fig. 6
figure 6

STRING network diagram of all ADAM, ADAMTS and ADAMTSL proteases with epidermal SNVs and/or epidermal and/or dermal differential RNA expression. Nodes linked by evidence, with medium confidence level of 0.4 (default STRING criteria). Further genes with strong links to the ADAM, ADAMTS and/or ADAMTSL proteins were also included (via STRING extended analysis); two of which were the ‘delta like canonical notch ligands’ (1 and 4); linking the ADAM, ADAMTS and ADAMTSL proteins, to notch signalin

Candidate genes and pathways based on epidermal genomic and epidermal and dermal transcriptomic profiles

Based on the WGS and RNA-seq results, a number of gene candidates were selected; some for further study. Selected epidermal candidate genes included ADAMTS16, ADAMTSL1 and the inflammatory and profibrotic TGF-β1 and JUNB. Selected dermal candidates included members of some developmental and morphogenic signaling pathways; SFRP4, SIX1, WNT2 and NOTCH4. Key characteristics of these genes and justification for their selection as candidates are detailed in Table 7.

Table 7 Descriptive and statistical characteristics of selected gene candidates in epidermal and dermal tissue

RT-qPCR and immunohistochemistry validation of selected epidermal and dermal gene candidates

Two key candidate genes were validated by RT-pPCR in this study; TGF-β1 and JUNB. These were from the strongly over-expressed and highly disease-relevant TGF-β signaling gene set. TGF-β1 is the recognised orchestrator of fibrosis and the role of its epidermal production and expression have not been specifically investigated in morphoea. JUNB is also a key player in TGF-β signaling and hence with its relatively high log2CPM, JUNB was selected as the second validation candidate, keeping both genes for qPCR from the TGF-β signaling gene set (NES = 2.006, FDR = 0.001).

Expression of TGF-β1 and JUNB was higher in morphoea affected epidermis compared to the contralateral site-matched unaffected epidermis in all samples, but this trend was not significant (TGF-β1; P = 0.476, JUNB; P = 0.105, Fig. 7).

Fig. 7
figure 7

RT-qPCR validation for key epidermal upregulated TGF-β signaling genes, mean expression levels as normalised copy number; A TGF-β1, B JUNB

WNT2 was selected for validation via IHC on formalin-fixed, wax-embedded paraffin whole skin sections. WNT2 was highlighted by dermal transcriptomic profiling, subsequent pathway analysis and is a member of the developmental morphogenic pathways which are of particular relevance to the anatomical patterning in morphoea and its pathogenesis. Of note, WNT2 was also highlighted by epidermal RNA-seq.

In the dermis, WNT2 was the only Wnt signaling gene with log2FC > 1.5 (log2FC = 1.79), its FDR approached significance (FDR = 0.061), it was a leading edge gene (highest ranked) within the positively enriched Notch signaling Hallmark gene set within dermal GSEA data and was also present within the significantly enriched Multicellular organism development gene set (PANTHER GO-Slim Biological Process; P = 0.007).

WNT2 staining demonstrated discernible staining differences between morphpea-affected and unaffected control skin in both epidermis (4 of 5) and dermis (3 of 5) (Fig. 8).

Fig. 8
figure 8

High power images of immunohistochemical staining with WNT2 antibody; unaffected control skin (above) and morphoea affected contralateral site-matched skin (below); study participant 15

Discussion

In this study, WGS did not identify a single common somatic mutation occurring in all four epidermal samples taken from LM-affected skin, or a commonly affected gene across all study samples. To our knowledge, this is the first study to investigate the presence of primary genomic variation in morphoea skin. This critical finding provides robust evidence against primary genomic epidermal segmental mosaicism-related aetiology in adult-onset LM. There are several clinical complexities of LM supporting more multifaceted aetiopathogenesis. LM may not be truly Blaschkoid [8], morphoea is a dermal pathology, has vast clinical heterogeneity with complex patterning and morphology [4, 30] and is not congenital.

Accordingly, we identified 861 epidermal SNVs, including 119 protein-coding variants, many with medium to high disease relevance and potential pathogenicity, providing possible support for complex polygenic epidermal mosaicism in LM [31, 32].

The ADAM/ADAMTS-family genes were widely affected across all three datasets, including potentially highly pathogenic nonsynonymous SNVs in ADAMTS16 and ADAMSTL1, possibly pointing to their pathogenic role in morphoea. These proteins/proteases are ECM-regulators implicated in embryological morphogenesis, skin development, wound healing, fibrosis [33,34,35,36], rare primary fibrotic genetic disorders [37, 38], SSc and keiloidal morphoea [39, 40]. Using site-matched tissue-pair methodology, Badshah et.al. recently demonstrated upregulated ADAMTS8 in LM fibroblasts and whole-skin, hypothesising ADAMTS8’s role in tissue atrophy [41]. Whilst links between the ADAMTS/ADAMTSL’s and their precise functions in morphoea are unclear, their possible role in LM is further supported by our findings.

Corroborating the potential key role of the epidermis in morphoea pathogenesis, we demonstrated a structurally active, proliferative and differentiating epidermis, with significant overexpression of SPPRs, PALLD, WNT2, other cell cycle/cell division (such as p53 and KRAS signalling) and apoptosis-related gene pathways, along with significant down-regulation of checkpoint and DNA repair-related genes (such as G2M DNA checkpoint and E2F targets) (Fig. 9).

Fig. 9
figure 9

Multicomponent morphea etiopathogenesis; summary of key epidermal and dermal genes involved in morphea, as highlighted by NGS of paired epidermal and dermal tissue samples in this study

We also demonstrated an inflammatory and profibrotic epidermal gene signature, which corresponds to the early inflammatory and profibrotic disease phases previously mapped by blood cytokine profiles [42,43,44,45,46]. A Th1 response (IL-2, TNF-α and IL-6) seen in the first year, is followed by a Th17 response (IL-1, IL-17, IL-22 and TGF-β) and Th2 cytokines (IL-4 and IL-13) [47]. Accordingly, the three Hallmark gene sets with the strongest significant positive enrichment in this study were TNF-α signalling via NFkB, TGF-β signalling and IL-6/JAKSTAT3 signalling; all suggesting early active inflammatory and fibrotic phase disease (Fig. 9). This was despite study samples being from LM of at least 3-years duration and not all demonstrating an inflammatory clinical phenotype; supporting an ongoing disease-driving role of the epidermis.

Importantly, in recently published work evaluating transcriptomic whole-skin profiles of pediatric-onset morphoea, healthy controls, active and inactive disease were compared, and JAK/STATs were highlighted as the most prevalent DE pathway [48]. By separating the epidermis and dermis, we have highlighted that this signature may originate from the epidermis, promoting ongoing dermal disease activity. These findings provide further support for future studies to better elucidate precise pathogenic JAK/STAT-related mechanisms in morphoea and the use of therapeutic JAK-inhibitors in sclerotic skin disease [49].

Finally, the epidermal molecular picture was also that of a ‘wounded epidermis’, similar to the epidermal phenotype demonstrated in SSc [10, 50, 51]. TGF-β is a key orchestrator of wound healing responses, also propagating pathological fibrosis [52]. Isolating a strongly enriched TGF-β signature in morphoea epidermis is unique, significant, and could provide impetus for further study of local TGF-β inhibition in appropriate clinical scenarios of superficial disease (e.g. with pirfenidone) [53]. However, precisely whether these signals are originating in the epidermis, or due to secondary unchecked positive feedback from the dermis, remains unclear.

Relevantly, epidermal IFI27 was upregulated (nonsignificant, but with the dataset’s highest log2FC). It is known to induce IFNγ-related epidermal apoptosis. We saw significant upregulation of the epidermal Apoptosis gene set, and epidermal and dermal IFNα and IFNγ responses. IFN-signalling has been widely implicated in SSc and morphoea [11, 48, 54]. IFNγ-related chemokines and their receptors may stimulate fibroblasts, including in morphoea [46, 48, 55]. CXCL9 was significantly upregulated in morphoea dermis in our study, and it has previously been suggested as a disease biomarker [46, 55].

Importantly, IFI27 negatively regulates NR4A1 [54], which was significantly downregulated in the dermal dataset. In turn, NR4A1 is an endogenous TGF-β inhibitor [56]. Fibrotic diseases appear to utilise this NR4A1-dependent mechanism to enable persistent TGF-β signaling and deregulated fibrosis and NR4A1 agonists inhibit laboratory-induced fibrosis of the skin, lung, liver, and kidney in mice [56, 57].

Clues to another potential inciting epidermal ‘damage’ signal in morphoea lie in the significant downregulation of LAMA4. Laminins are extracellular matrix (ECM) glycoproteins involved in differentiation, cell adhesion, signaling, migration, and form a key non-collagen component of the dermo-epidermal junction (DEJ) [54]. Related DEJ disruption could plausibly enhance epidermal-dermal communication and/or act as an initiating ‘damage’ signal, inciting proinflammatory and profibrotic dermal responses. Correspondingly, LAMA4-deficiency has been linked to cardiac [58,59,60] and renal fibrosis [61].

Individual dermal-genes demonstrated far greater DGE compared to the epidermis, suggesting dermal factors are more disease-specific in morphoea; in keeping with its predominantly dermal pathology. Two distinct DGE clusters were identified; inflammatory and profibrotic. The inflammatory signature, with significant upregulation of Humoral immunity, Lymphocyte activation and IFN-response-related genes, validates and adds to the limited morphoea gene expression data currently available [11, 48, 62]. This corroborated over-expression of IFN-signalling has an immediate foreseeable opportunity for potential therapeutic exploitation via anifrolimab, FDA-approved for systemic lupus erythematosus. Interestingly, KRAS-signalling has been identified as a potential biomarker for disease activity [48]. We demonstrated significant downregulation of inhibitory KRAS-signalling in the dermis and upregulated KRAS-signalling in the epidermis also. All our cases had disease activity as demonstrated by LoScAT-activity scores of greater than zero (progressive or stable disease activity) (Tables 1, 4 and 5).

In the profibrotic DGE cluster, upregulated genes involved in embryogenesis and oncogenesis was seen such as Wnt, Hedgehog, dishevelled, frizzled family, HOX and PAX. PAX and HOX genes were specifically highlighted by PANTHER pathway analysis of dermal RNA-seq data. These families of biologically and functionally related developmental genes were collectively impacted in all three data sets (epidermal WGS, epidermal RNA-seq and dermal RNA-seq). HOX genes are the key orchestrating genes involved in fibroblast PI [12, 13, 63,64,65]. Related location-specific gene signatures confer developmental patterning, position and help determine downstream differentiation of site-specific mesenchymal cells [13, 66]. The genetic origin of fibroblasts can also alter their crosstalk with overlying keratinocytes [67]. Several HOX genes have shown significant DE in affected SSc-skin compared to unaffected skin [68] and related SOX genes have also been implicated in fibrosis and SSc [23, 69]. Accordingly, one can deduce the feasible role HOX and related developmental and patterning genes could play in morphoea aetiopathogenesis and observed clinical patterning of non-linear subtypes. Indeed, their involvement in ‘dermal mosaicism’ has been suggested.

It is also suggested that via its regulation of dermal development, epidermal Wnt- signalling could account for the Blaschkoid distribution of dermal dermatoses, including Focal Dermal Hypoplasia [70]. Twelve Wnt-signalling genes contributed to the upregulation of the GO-Slim Biological Process of Multicellular organism development; WNT2B, WNT10B and WNT16 with significant DE. WNT2 was significantly upregulated in the epidermis, approached significance in the dermis (FDR = 0.061) and both these RNA-seq results were validated with IHC whole skin staining. Correspondingly, WNT2, WNT3A and β-catenin have previously demonstrated increased activity via IHC staining in both SSc and morphoea [71] and the role of Wnt-signalling in morphoea is established [20, 55, 71,72,73,74,75]. Dermal SFRP4 was also significantly upregulated and recent data demonstrated the upregulation of SFRP2 in morphoea dermal fibroblasts [55]. SFRPs are homologous to the Wnt-binding site on frizzled proteins and, therefore, modulate Wnt-signalling via direct interactions [54]. Interestingly, SFRP4 expression in the myocardium is associated with an apoptotic-related gene expression profile [54], feasibly associating its overexpression in morphoea to a disease-related damage signal.

Limitations of this study include its cross-sectional nature, small datasets and limited validation of transcriptomic data. It is also impossible to differentiate primary from secondary gene expression changes or to adjust for treatment effect.

In summary, despite the often assumed Blaschkoid distribution of LM, data from this study indicate the absence of a single epidermal developmental somatic mutation responsible for disease causation. Instead, this study’s molecular (genomic and transcriptomic) and tissue (epidermis and dermis) layered approach highlights possible polygenic epidermal mosaicism in initiating a complex multicomponent disease aetiopathogenesis. A wounded epidermal phenotype could, perhaps via Wnt-signalling, depletion of NR4A1 and other complex tissue layer crosstalk, contribute to the consequent inflammatory dermal fibrosis of morphoea, with its variable patterning possibly explained, at least in part, by the involvement of HOX, SOX, PAX and WNT developmental patterning genes (Fig. 9).