The intracellular domain of BP180/collagen XVII is intrinsically disordered and partially folds in an anionic membrane lipid-mimicking environment

The trimeric transmembrane collagen BP180, also known as collagen XVII, is an essential component of hemidesmosomes at the dermal–epidermal junction and connects the cytoplasmic keratin network to the extracellular basement membrane. Dysfunction of BP180 caused by mutations in patients with junctional epidermolysis bullosa or autoantibodies in those with bullous pemphigoid leads to severe skin blistering. The extracellular collagenous domain of BP180 participates in the protein’s triple-helical folding, but the structure and functional importance of the intracellular domain (ICD) of BP180 are largely unknown. In the present study, we purified and characterized human BP180 ICD. When expressed in Escherichia coli as glutathione-S-transferase or 6 × histidine tagged fusion protein, the BP180 ICD was found to exist as a monomer. Analysis of the secondary structure content by circular dichroism spectroscopy revealed that the domain is intrinsically disordered. This finding aligned with that of a bioinformatic analysis, which predicted a disordered structure. Interestingly, both anionic detergent micelles and lipid vesicles induced partial folding of the BP180 ICD, suggesting that in its natural environment, the domain’s folding and unfolding may be regulated by interaction with the cell membrane or accompanying proteins. We hypothesize that the intrinsically disordered structure of the ICD of BP180 contributes to the mechanism that allows the remodeling of hemidesmosome assembly.


Introduction
BP180, also known as collagen XVII and BPAG2, is a trimeric transmembrane collagen and a structural component of hemidesmosomes that anchors epidermal keratinocytes to the underlying basement membrane (Franzke et al. 2005; Walko et al. 2015). The importance of BP180 for epidermal adhesion is demonstrated by hereditary and acquired blistering skin diseases: junctional epidermolysis bullosa, in which COL17A1 mutations causing either absence or dysfunction of BP180 lead to severe, lifelong blistering of skin and mucosa, and bullous pemphigoid, in which autoantibodies against BP180 cause severe itch and widespread blistering in elderly people (Franzke et al. 2005).
BP180 belongs to the family of type II transmembrane proteins with collagenous structure (Franzke et al. 2005). This group consists of collagens XIII, XVII, XXIII, and XXV, which are mainly involved in cell-matrix interactions, and the more distantly related gliomedin, ectodysplasin A, macrophage scavenger receptors I and II, and macrophage receptor MARCO proteins, which have functions in the maintenance of the nodes of Ranvier in myelinated axons, the development of ectodermal organs, ligand binding and internalization, and microbial defence, respectively (Franzke et al. 2005). In this protein family, all four collagens, gliomedin, and ectodysplasin undergo shedding of their extracellular (ecto) domain.
BP180 has a large 120 kDa ectodomain with 16 noncollagenous (NC) domains interrupted by 15 collagenous sequences (Fig. 1a). The ectodomain, which is shed by ADAM (a disintegrin and metalloprotease) proteases both constitutively and following inducible stimuli, is involved in wound healing (Franzke et al. 2002, 2004; Jackow et al. 2016). In contrast to the other transmembrane collagens that harbor only a short cytoplasmic stub, the intracellular domain of BP180 (BP180 ICD) consists of 467 amino acids (aa) and has a molecular weight of approximately 50 kDa (Fig. 1a) (Franzke et al. 2005). While the function of BP180's ectodomain has been the focus of intensive research, the structure and functions of its intracellular domain are yet to be fully elucidated.
BP180 ICD is known to interact with other components of the hemidesmosome, such as integrin β4, the plakin family proteins BP230 and plectin, and acidic 14-3-3σ, and thereby connects hemidesmosomes with keratin intermediate filaments (Koster et al. 2003; Fontao et al. 2004; Li et al. 2007; Walko et al. 2015). It has also been implicated in the cytoplasmic signaling cascade that follows the shedding of the ectodomain and allows squamous carcinoma cell survival and proliferation (Galiger et al. 2018). Furthermore, phorbol ester-induced protein kinase C (PKC)-mediated phosphorylation of BP180 ICD leads to the dissociation of BP180 from hemidesmosomes (Kitajima et al. 1992, 1995), and anti-BP180 IgGs induce the PKC-dependent macropinocytosis-mediated internalization of BP180 (Iwata et al. 2016). To date, the structure of human BP180 ICD has been described only at the level of its primary architecture. However, electron microscopy of Triton X-100 solubilized purified bovine BP180 suggests a coiled-coil trimeric protein with a flexible extracellular tail and a globular intracellular head domain (Hirako et al. 1996). The amino acid sequence of BP180 ICD is known to harbor a few special features, including four short repeats, a glycine tract, and a cysteine cluster in its carboxyterminal end, none of which has a known function (Giudice et al. 1992).
Fig. 1 Sequence modeling and immunoblotting suggest that the BP180 ICD has both disordered and folded regions. a The primary structure of BP180. The intracellular domain (ICD) is marked in green, the transmembrane domain in black, collagenous and noncollagenous regions in the extracellular domain (ECD) with gray and white, respectively. b Alignment of secondary structure/disorder status of the human BP180 ICD. Sequences predicted to have extended, β-strand secondary structure are marked with 'E', helices with 'H' and disordered regions with 'd'. Boxes indicate subdomains that may be folded. aa 456-467, not present in the truncated construct, are underlined. c Plot of hydropathy vs. net charge by PONDR modeling. Red and blue dots represent known disordered and folded proteins, respectively. The position of BP180 ICD is marked with a green diamond. d GST-BP180 fusion proteins FP1 (aa 2-168), FP2 (147-305), FP3 (261-401) and FP4 (377-455) were expressed in Escherichia coli and glutathione sepharose pull downs (Tuusa et al. 2019) were analyzed by immunoblotting with anti-GST antibody. In the case of FP1 and FP2, the polypeptide of highest molecular weight represents the full-length fusion protein. The experiment was replicated three times
Here we have expressed the human BP180 ICD in Escherichia coli as a soluble polypeptide, purified it, and characterized its physicochemical properties. Bacterially expressed BP180 ICD is monomeric and intrinsically disordered in solution, but undergoes partial folding in the presence of anionic detergent micelles or lipid vesicles.

Static light scattering
Molecular mass and oligomerization state were analyzed with a MiniDawn multi-angle light scattering (MALS) device (Wyatt Technology Corporation, Santa Barbara, USA) connected to a Shimadzu high-performance liquid chromatography (HPLC) unit (Shimadzu Corporation, Kyoto, Japan) with a Superdex 200 Increase SEC column (GE Healthcare Life Sciences). The RID-10A refractive index detector (Shimadzu Corporation) connected to the HPLC system was used as a concentration source for the calculations. ASTRA software (Wyatt Technology Corporation) was used to calculate molecular weight.

Circular dichroism spectroscopy
Samples were dialyzed against a 10 mM potassium phosphate pH 7.5, 150 mM NaF buffer. Then a Chirascan™ circular dichroism (CD) spectrometer (Applied Photophysics Ltd, Leatherhead, UK) with 1 mm pathlength quartz cuvettes was used to measure CD spectra at 190-280 nm wavelengths under the following conditions: sample in buffer alone; immediately after the addition of sodium dodecyl sulfate (SDS), n-dodecylphosphocholine (DPC) (Merck), unilamellar lipid vesicles containing dimyristoylphosphatidylcholine (DMPC) (Sigma) and dimyristoylphosphatidylglycerol (DMPG) (Sigma) in 1:1 molar ratio, or calcium acetate; immediately after the addition of both lipid vesicles and calcium acetate. The unilamellar lipid vesicles were prepared as described before. Every spectrum is an average of three scans. The BeStSel (Micsonai et al. 2015) web server was used for spectrum deconvolution and to estimate the secondary structure content of the protein.
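The step from a measured CD signal to a concentration-normalized value can be sketched as follows. This is a minimal illustration, not the processing performed by the instrument software or the deconvolution server: it averages replicate scans point by point and converts ellipticity in millidegrees to mean residue ellipticity using the standard relation [θ] = θ × MRW / (10 × l × c). The 0.1 cm pathlength corresponds to the 1 mm cuvette described above; the function names, the protein concentration, the molar mass, and the residue count are hypothetical placeholders.

```python
from statistics import fmean


def average_scans(scans):
    """Average replicate scans point by point (each scan is a list of mdeg values)."""
    return [fmean(points) for points in zip(*scans)]


def mean_residue_ellipticity(theta_mdeg, conc_mg_per_ml, pathlength_cm,
                             molar_mass_da, n_residues):
    """Convert ellipticity (mdeg) to mean residue ellipticity (deg cm^2 dmol^-1)."""
    # Mean residue weight: molar mass divided by the number of peptide bonds.
    mean_residue_weight = molar_mass_da / (n_residues - 1)
    return theta_mdeg * mean_residue_weight / (10.0 * pathlength_cm * conc_mg_per_ml)


# Hypothetical example values (NOT taken from the study): three scans, two wavelengths.
scans = [[-10.2, -4.1], [-9.8, -3.9], [-10.0, -4.0]]
averaged = average_scans(scans)
mre = [
    mean_residue_ellipticity(t, conc_mg_per_ml=0.2, pathlength_cm=0.1,
                             molar_mass_da=48000.0, n_residues=460)
    for t in averaged
]
print(mre)
```

Normalizing by concentration and pathlength in this way is what allows spectra recorded with and without additives to be compared on a common scale.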

SDS-PAGE, co-sedimentation analysis and immunoblotting
Proteins were separated in 4-20% polyacrylamide gradient TGX gels (Bio-Rad Laboratories, Inc., Hercules, USA) and then subjected to PageBlue (Thermo Fisher Scientific, Inc., Waltham, USA) staining or immunoblotting. Proteins were identified by the Proteomics and Protein Analysis core service at the Biocenter of the University of Oulu, Finland, using in-gel digestion and MALDI-TOF mass spectrometry. For co-sedimentation analysis of lipid vesicles, 21 μM BP180 ICD was incubated with DMPC:DMPG vesicles using a protein:lipid molar ratio of 1:20 for 1 h at 22 °C followed by 1 h centrifugation at 20,000×g. Samples from supernatants and dissolved pellets were analyzed by immunoblotting using the Endo-2 anti-BP180 ICD antibody (Franzke et al. 2002). GST fusion proteins were detected by an anti-GST monoclonal antibody (clone 8-326; Thermo Fisher Scientific, Inc.).

Bioinformatic analysis suggests that the BP180 ICD has a disordered structure
Sequence analysis did not provide strong homology-based hints regarding the structure of the BP180 ICD. PSI-BLAST searches showed only a few non-collagen XVII protein sequences that aligned locally with the BP180 ICD, and those had low (< 30%) sequence identity. These included the heavily O-glycosylated extracellular domain of CD45 tyrosine phosphatase, which likely has a disordered structure (data not shown). Phyre2 modeling, which was done both for full-length BP180 and for its ICD, indicated some folded regions dispersed among disordered sequences (Fig. 1b and data not shown). The modeling of the ICD as a part of the full-length protein provided a few modest hits for bacterial glycosyltransferases. It was estimated that 70% of the ICD is composed of disordered regions. We then used the JPred4 and PSIPred 4.0 software packages to determine secondary structures and the DisoPred3 and PONDR packages to map disordered regions. Taking these findings together, it appears that the BP180 ICD might have a few secondary structure elements in its N-terminal half and a larger folded region between aa 335 and 425 (Fig. 1b). In the net charge vs. hydropathy plot, the BP180 ICD was mapped on the side of the folded proteins, but close to the border between the ordered and disordered regions (Fig. 1c). These predictions are in line with our observations that the GST-BP180 fusion proteins FP1 (containing aa 2-168) and FP2 (aa 147-305) were accompanied by degradation and/or truncated translation products typical of disordered proteins when expressed in Escherichia coli, whereas the fusion proteins FP3 (aa 261-401) and FP4 (aa 377-455) were expressed as full-length soluble proteins without degradation or premature termination (Fig. 1d). Taken together, while the bioinformatic modeling does not exclude the possibility of a stable folded structure, it suggests that the BP180 ICD is at least partly intrinsically disordered.
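The net charge vs. hydropathy classification referred to above can be illustrated with a short sketch. It assumes the widely used charge-hydropathy boundary of Uversky and colleagues, <R> = 2.785<H> - 1.151, with Kyte-Doolittle hydropathy rescaled to the range 0-1 and histidine treated as neutral. Whether the PONDR tool applies exactly these settings is an assumption here, the function names are illustrative, and the example sequences are hypothetical rather than BP180 sequences.

```python
# Kyte-Doolittle hydropathy scale (values range from -4.5 to 4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled so each residue lies in 0-1."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge per residue (K, R = +1; D, E = -1; H neutral)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def boundary_hydropathy(mean_charge):
    """Hydropathy on the assumed boundary line <R> = 2.785<H> - 1.151."""
    return (mean_charge + 1.151) / 2.785


def classify(seq):
    """Call a sequence 'disordered' if it lies on the low-hydropathy side."""
    below = mean_hydropathy(seq) < boundary_hydropathy(mean_net_charge(seq))
    return "disordered" if below else "folded"


# Hypothetical test sequences (NOT derived from BP180).
for name, seq in [("charged", "KKSGRPEKKSGSPKRG"), ("hydrophobic", "IVLAFIVLAMVLIA")]:
    print(name, round(mean_hydropathy(seq), 3), round(mean_net_charge(seq), 3), classify(seq))
```

A protein that lands close to the boundary line, as described for the BP180 ICD, is one for which this two-parameter classification is least decisive, which is consistent with combining it with the other predictors listed above.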

Bacterially expressed and purified BP180 ICD is prone to aggregation in an environment with low ionic strength
To study the structural properties of the BP180 ICD, we first expressed GST-BP180 ICD using constructs encoding aa residues 2-467 and 2-455 in the Escherichia coli strain BL21 (DE3), assuming that the GST tag would promote the folding of a fusion partner. Both GST fusion proteins were expressed as soluble proteins with accompanying degradation and/or truncated translation products (Fig. 2a). The GST fusion BP180 ICD aa 2-455 (hereafter GST-BP180 ICD), which lacks 12 C-terminal amino acids including four cysteine residues, had a slightly higher yield (Fig. 2a) and was selected for further analysis. The GST-BP180 ICD was purified by glutathione sepharose affinity chromatography followed by dialysis, thrombin digestion, and cation exchange chromatography to remove the cleaved GST tag. Despite comprehensive optimization of the expression and purification conditions, the BP180 ICD was found to be prone to aggregation in low-salt conditions during thrombin digestion. Furthermore, MALDI mass spectrometry analysis of tryptic fragments of the 75 kDa protein band visible in SDS-PAGE confirmed the identity of GST-BP180, but also revealed the co-purification of the bacterial chaperone protein DnaK, which appears to have the same molecular weight (data not shown), a finding that may indicate the presence of unburied hydrophobic residues in the BP180 ICD.
Since the relatively large GST (28 kDa) might also sterically hinder the folding of the BP180 ICD, we changed to a short 6xHis affinity tag. Both N- and C-terminally 6xHis-tagged fusion proteins (His-BP180 ICD, BP180 ICD-His) were expressed as soluble proteins. The His-BP180 ICD was purified in four steps: the removal of DNase-resistant nucleic acid remnants by polyethylenimine precipitation was followed by Ni affinity, cation exchange, and size exclusion chromatography. This process yielded near-homogeneous His-BP180 ICD (Fig. 2b). At low concentration (≤ 1 mg ml⁻¹) in a high-salt buffer (≥ 300 mM), the His-BP180 ICD was stable at +4 °C for several weeks. However, the His-BP180 ICD was prone to aggregation at higher protein concentrations, in dialysis against a low-salt buffer or after a freeze-thaw cycle. These observations suggest that the BP180 ICD can be expressed in a bacterial expression system as a soluble protein and purified to homogeneity, but may have a natively unfolded structure under these conditions.

Bacterially expressed BP180 ICD is a monomer
In mammalian cells, the full-length BP180 is a trimer in which the extracellular NC16A domain forms a coiled coil, triggering the collagenous domains to form a triple helical assembly (Kroeger et al. 2017). However, it is unknown whether the cytoplasmic ICD domains are able to bind to each other independently. This was examined by static light scattering.
Fig. 2 In Escherichia coli, the BP180 ICD is expressed as a soluble monomer. a SDS-PAGE analysis of the BP180 ICD (aa2-455 and aa2-467). The solid arrows indicate the 75 kDa GST-BP180 ICD with overlapping Escherichia coli DnaK contamination. The thrombin cleaved BP180 ICD, GST and truncated translation/degradation products are marked with an arrowhead, open arrow and an asterisk respectively. b The UV-curve of HiLoad 16/60 SEC and corresponding protein gel of the BP180 ICD with a N-terminal 6xHis-tag. c A sample was analyzed with Superdex 200 Increase SEC-coupled MALS-instrument. Light scattering analysis of the elution peak, mass distribution, and molecular weight of purified His-tagged BP180 ICD. The UV, refractive index, and light scattering profile are shown in the box. MALS was performed twice from both of the two protein purifications
Purified His-BP180 ICD was run through an SEC column and analyzed with a refractive index detector and a MALS detector unit. Two scattering peaks were detected (Fig. 2c). The first peak eluted in the void volume at a low protein concentration (below the UV 280 nm detection limit), and the main light scattering peak, with a high UV absorption, eluted at 24 min (Fig. 2c). The determined molecular weight, 50 kDa (Fig. 2c), corresponds with both the theoretical value of 48 kDa calculated from the amino acid sequence and the apparent size of 55 kDa in SDS-PAGE. Importantly, the light scattering data indicated that His-BP180 ICD, without the transmembrane and extracellular domains of full-length BP180, exists as a monomer in solution.
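The monomer assignment follows from simple arithmetic on the values above, sketched here. The function name and the rounding rule are illustrative assumptions; the masses are those reported in the text (50 kDa measured by MALS, 48 kDa calculated from the sequence).

```python
def oligomeric_state(measured_kda, monomer_kda):
    """Return (mass ratio, nearest whole number of subunits, at least 1)."""
    ratio = measured_kda / monomer_kda
    return ratio, max(1, round(ratio))


# Masses reported in the text: MALS-determined vs. sequence-derived.
ratio, subunits = oligomeric_state(measured_kda=50.0, monomer_kda=48.0)
print(f"ratio = {ratio:.2f}, subunits = {subunits}")
```

For comparison, a trimer of the same polypeptide would be expected near 144 kDa, roughly three times the measured value.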

BP180 ICD is intrinsically disordered but partially folds in the presence of anionic membrane-mimicking detergent micelles and lipid vesicles
We used circular dichroism (CD) spectroscopy to investigate whether the BP180 ICD has a folded structure and to measure its secondary structure elements. The CD spectrum of BP180 ICD resembles one typical of a random coil structure (Fig. 3a), suggesting that bacterially expressed BP180 ICD is a disordered protein.
In mammalian cells, the positively charged ICD (pI = 9.61) of BP180 is localized next to the cell membrane, which generally has a negative net charge on the cytoplasmic leaflet (McLaughlin 1989). Therefore, we tested whether a low concentration of negatively charged sodium dodecyl sulfate (SDS) affects the structure of the BP180 ICD. SDS micelles induced an increase in the CD signal between 190 and 205 nm, a clear shift of the minimum from 200 to 205 nm, and a change in the shape of the spectrum at 220 nm (Fig. 3a). The BeStSel analysis indicated an increase from 12.0 to 23.1% in the relative proportion of α-helices upon SDS addition (Fig. 3a). In contrast, the zwitterionic detergent n-dodecylphosphocholine (DPC) did not have any effect (Fig. 3a).
Next, we measured the CD spectrum for BP180 ICD in the presence of negatively charged unilamellar DMPC:DMPG lipid vesicles with a 1:100 protein:lipid molar ratio. A clear change in the CD spectrum was seen (Fig. 3b), with a more pronounced negative CD signal at the 222 nm wavelength as well as a decrease in the negative peak at 198 nm. This indicates a partially α-helical conformation with some regions remaining disordered. Interestingly, the inclusion of 1 mM calcium with the lipid vesicles partially reversed the effect of the lipids (Fig. 3b). Finally, we expressed and purified C-terminally tagged BP180 ICD-His to ensure that the lack of secondary structures was not due to the non-native N-terminus. The CD spectrum measured after two purification steps showed the BP180 ICD-His to have a disordered structure (Online Resource, Fig. S1). Taken together, these CD spectroscopy findings suggest that bacterially expressed BP180 ICD is intrinsically disordered in solution but can acquire partial folding upon interaction with negatively charged lipid vesicles.
Fig. 3 The BP180 ICD is an intrinsically disordered protein that undergoes partial folding in an anionic lipid environment. a CD spectra of BP180 ICD (solid curve, n = 3), BP180 ICD + zwitterionic detergent n-dodecylphosphocholine (dotted curve, n = 3) and BP180 ICD + anionic detergent sodium dodecyl sulfate (dashed curve, n = 3). b CD spectra of BP180 ICD (solid curve, n = 3), BP180 ICD + DMPC:DMPG lipid vesicles (dashed curve, n = 3) and BP180 ICD + DMPC:DMPG lipid vesicles with 1 mM calcium acetate (dotted curve, n = 2). CD spectra were averaged from three scans in each of three (two in the case of calcium) independent measurements. Representative series are shown. c Samples with or without BP180 ICD and lipid vesicles were incubated at room temperature and centrifuged at 20,000×g for 60 min. Pellets (P) and supernatants (S) were analyzed by immunoblotting against BP180 ICD. Sedimentation analysis was performed twice with identical results

BP180 ICD binds to lipid vesicles
Finally, we further analyzed the interaction of BP180 ICD with lipid vesicles. It is known that many peripheral membrane-binding proteins, such as myelin protein P2, can aggregate membrane vesicles in vitro. Therefore, we incubated His-BP180 ICD with and without DMPC:DMPG vesicles (1:100 protein:lipid molar ratio) and pelleted the sedimentable material by centrifugation. The majority of BP180 ICD co-sedimented with the DMPC:DMPG vesicles, while a small amount of protein stayed in the supernatant (Fig. 3c). Little aggregation and sedimentation of BP180 ICD was visible in the absence of lipid vesicles. These results confirm the binding of the BP180 ICD to negatively charged lipid vesicles.

Discussion
BP180 is unique among the transmembrane collagens, having a relatively large cytoplasmic domain. The structure of its intracellular N-terminus is poorly characterized and the current knowledge of its function is limited to its interactions with other hemidesmosomal proteins. In the present study, we expressed the human BP180 ICD in Escherichia coli, purified it to homogeneity and characterized it as an intrinsically disordered polypeptide.
The BP180 ICD was expressed as a soluble protein with or without its cysteine-rich C-terminus or its native N-terminus, and independently of tag size (GST or His). Previously, electron microscopy of full-length bovine BP180 has suggested that the ICD has a globular shape (Hirako et al. 1996), and protein-protein interaction motifs within the first 400 amino acids have been mapped using a yeast two-hybrid system (Fontao et al. 2004). Our findings suggest that bacterially expressed BP180 ICD lacks a stable global folding pattern, as suggested by bioinformatics and demonstrated by the presence of degradation and/or prematurely terminated translation products with the GST fusions of the N-terminal ICD fragments, the co-purification of the bacterial chaperone DnaK, the instability of the protein at high concentration, at low ionic strength or during freeze-thaw cycles, and, most importantly, by the CD spectrum typical of disordered proteins.
We cannot exclude the possibility that the folding of BP180 ICD requires chaperoning or post-translational modifications that are not present in the bacterial expression system. Also, although the GST-BP180 ICD (aa 2-467) with the cysteine-rich terminus had a somewhat lower yield than the truncated BP180 ICD (aa 2-455) and cysteines should be reduced in the physiological cytosolic environment in basal keratinocytes (Wolf et al. 2014), we cannot totally exclude a role for rare cytosolic disulfides in the folding of the BP180 ICD in mammalian cells. Another possible explanation is that the folding of BP180 ICD may depend upon the transmembrane and extracellular domains of BP180 and/or the trimerization mediated by the extracellular domain (Balding et al. 1997). However, our results raise the possibility that even in hemidesmosomes, the BP180 ICD may be natively unfolded, especially in the N-terminal regions, unless they interact with membrane lipids and/or acidic protein partners. Both the bioinformatic analysis and the observation that the membrane-proximal part of the BP180 ICD may have a higher tendency to fold than the N-terminus (Fig. 1b, d) support the hypothesis that the lipid membrane may affect the structure of the BP180 ICD. Interestingly, the isoelectric point (pI) of the region comprising the fibronectin domains III and IV of beta-4 integrin is 5.15 and these domains are critical for the interaction of beta-4 integrin with the BP180 ICD (Schaapveld et al. 1998). Similarly, the acidic region of BP230 (aa 1-555, pI = 5.41) has been shown to interact with the BP180 ICD. In contrast, the BP180 ICD is basic (pI = 9.61), particularly so in the region of aa 145-230 (pI = 9.99), which is critical for the interaction between the BP180 ICD and BP230 (Koster et al. 2003). Finally, the acidic regulatory protein 14-3-3σ binds to the BP180 ICD (Li et al. 2007).
Interestingly, we found that the interactions of BP180 ICD with anionic detergent micelles or glycerophospholipids induce partial folding, which hints that the folding and unfolding of the BP180 ICD are connected to its interaction with anionic membrane head groups and/or other negatively charged macromolecules. We observed that a divalent cation (calcium) partially blocked the folding induced by the negatively charged lipid vesicles. This finding may be related to the physiological phenomenon whereby an increase in cytosolic calcium and the associated activation of protein kinase C lead to the phosphorylation of the BP180 ICD and dissociation of BP180 from hemidesmosomes (Kitajima et al. 1992, 1995). Taken together, these findings suggest that coulombic interactions can regulate the structure and function of BP180 on the cytoplasmic side of the membrane. Importantly, it has been estimated that approximately 50% of transmembrane proteins have at least one disordered region at least 30 amino acids in length, and that these are concentrated on the cytoplasmic side (Burgi et al. 2016). Of note, macrophage scavenger receptor I, a transmembrane collagen family protein with type II topology, has a disordered cytoplasmic domain 50 amino acids long, which tends to bind to molecular chaperones (Nakamura et al. 2002; Franzke et al. 2005). Disordered regions may allow interaction-induced folding that provides a means by which to regulate cellular functions. This has previously been demonstrated in proteins such as cyclin-dependent kinase inhibitor p27kip1 (Lacy et al. 2004), α-synuclein (Davidson et al. 1998) and myelin basic protein. It is an interesting question whether the intrinsically disordered character of the BP180 ICD is involved in the regulation of hemidesmosome assembly and disassembly and/or in other protein-protein or protein-lipid interactions.
BP180 is the major antigen in bullous pemphigoid. The immunodominant epitope is the extracellular juxtamembranous NC16A domain, but autoantibodies against the carboxyterminal extracellular epitopes are also frequently found. The phenomenon of epitope spreading has been found to take place in bullous pemphigoid and includes autoantibodies directed against epitopes located in the intracellular domain (Di Zenzo et al. 2011). Whether the latter are pathogenic is currently unknown, but our recent study demonstrated that some autoantibodies against the BP180 ICD are common among patients with Alzheimer's disease and those with multiple sclerosis and not rare among healthy subjects. These autoantibodies do not bind native protein in the skin and are likely not pathogenic. Our current finding indicates that the BP180 ICD is intrinsically disordered, making it prone to degradation and altered processing, which may explain neoepitope presentation by BP180 and its relative instability compared with the other hemidesmosomal components (Liu et al. 2019).
In conclusion, our results suggest that the cytoplasmic domain of BP180 is intrinsically disordered and has potential for partial folding when exposed to negatively charged lipids. Further research is required to characterize the structure and dynamics of the BP180 ICD in its natural context.