Cell and Tissue Research

Vol. 339, p. 247



  • M.K. Gordon
    • Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University
  • Rita A. Hahn
    • Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University
At-a-Glance Article

DOI: 10.1007/s00441-009-0844-4

Cite this article as:
Gordon, M.K. & Hahn, R.A. Cell Tissue Res (2010) 339: 247. doi:10.1007/s00441-009-0844-4


The collagens represent a family of trimeric extracellular matrix molecules used by cells for structural integrity and other functions. The three α chains that form the triple helical part of the molecule are composed of repeating peptide triplets of glycine-X-Y. X and Y can be any amino acid but are often proline and hydroxyproline, respectively. Flanking the triple helical regions (i.e., Col domains) are non-glycine-X-Y regions, termed non-collagenous domains. These frequently contain recognizable peptide modules found in other matrix molecules. Proper tissue function depends on correctly assembled molecular aggregates being incorporated into the matrix. This review highlights some of the structural characteristics of collagen types I-XXVIII.


Keywords: Collagens; Extracellular matrix; Fibrils; FACITs; Basement membrane


Collagens are extracellular matrix molecules used by cells for structural integrity and a variety of other functions. Recent reviews offer insights into the collagen family members and important aspects of their structures, functions, or associated disease states (Kadler et al. 2007; Myllyharju and Kivirikko 2001, 2004; Brodsky and Persikov 2005; Gelse et al. 2003; Ortega and Werb 2002). Table 1 lists the α chains of the 28 collagen types together with their National Center for Biotechnology Information reference numbers, which provide sequence information and cite some of the published literature. These α chains assemble into trimeric molecules in which the three chains are wound together into a triple helix in at least one region. The triple helical domains form because glycine (gly), the only residue small enough to fit at the center of the helix where the three chains pack together, occupies every third position (i.e., the chains consist of repeating gly-X-Y triplets). In this triplet, X is often proline, and Y is frequently hydroxyproline. In each α chain, triple helical regions, termed Col domains, are flanked by non-collagenous (non-gly-X-Y) regions, termed NC domains. These NC domains often contain recognizable peptide modules found in other matrix molecules. The function of a collagen depends on the proper supramolecular assembly of molecules into an aggregate that becomes incorporated into the matrix. This review attempts to highlight some of the structural characteristics of the numbered collagens, types I-XXVIII. Space precludes us from including the growing number of triple-helix-containing molecules that are not among the numbered collagens, many of which (e.g., the collectin family) are involved in innate immunity. Schematic diagrams of the aggregated forms of the numbered collagens are provided where known.
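Because the defining feature of a Col domain is glycine at every third position, the triplet rule can be checked mechanically. The sketch below is a hypothetical helper, not a published tool: it reads a one-letter sequence in a fixed frame and reports triplets whose first residue is not glycine. It assumes the sequence is already aligned so that position 0 should be a glycine, and it uses "O" for hydroxyproline, a common convention in collagen peptide notation. Real interruptions can also be insertions or deletions that shift the register; this fixed-frame scan would report such a shift as a run of flagged triplets rather than as a single event.

```python
from typing import List


def find_interruptions(seq: str) -> List[int]:
    """Return 0-based indices of gly-X-Y triplets whose first residue is not glycine.

    Assumes `seq` is a one-letter sequence aligned so that position 0 should be
    a glycine. A trailing partial triplet is ignored.
    """
    flagged = []
    for triplet_index, pos in enumerate(range(0, len(seq) - 2, 3)):
        if seq[pos] != "G":
            flagged.append(triplet_index)
    return flagged


def is_perfect_gly_x_y(seq: str) -> bool:
    """True if `seq` is a whole number of triplets with glycine in every first position."""
    return len(seq) >= 3 and len(seq) % 3 == 0 and not find_interruptions(seq)


# Hypothetical peptides ('O' = hydroxyproline)
print(find_interruptions("GPOGPOGPO"))     # []
print(find_interruptions("GPOGPOAPOGPO"))  # [2]: alanine replaces glycine in the third triplet
```

In these terms, the major helix of most fibrillar collagens would pass `is_perfect_gly_x_y`, whereas every non-fibrillar collagen would yield at least one flagged triplet.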
Table 1

Collagen α chains, number of amino acids (aa), and National Center for Biotechnology Information (NCBI) reference numbers (SP signal peptide, vWC von Willebrand factor C domain, FNIII fibronectin type III domain)

Collagen α chain

Number of amino acids

NCBI reference number


1464 aa (includes 22 aa SP)



1366 aa (includes SP)



1487 aa (includes 25 aa SP)



Same as α1(II)A but lacks vWC domain



1466 aa (includes 23 aa SP)



1669 aa (includes 27 aa SP)



1712 aa (includes 25 aa SP)



1670 aa (includes 28 aa SP)



1690 aa (includes 38 aa SP)



1685 aa (includes 26 aa SP)



1691 aa (includes 21 aa SP)

NP_001838 (B isoform NP_378667)


1838 aa (includes SP)



1499 aa (includes SP)



1745 aa (includes 29 aa SP)



1028 aa (includes 19 aa SP)



1019 aa (includes 20 aa SP)


2C2a and 2C2a' isoforms

NP_478054 and NP_478055


3177 aa (includes 25 aa SP)

NP_004360 - multiple splicings

mu α4(VI)

Not in human; mouse, 2309 aa

Swiss-Prot A2AX52


2611 aa (includes SP)

NP_694996 (partial; shows 2526 aa)

Isoforms 2 and 3

NP_203699 and NP_203700


2263 aa (includes SP)



2944 aa (includes 16 aa SP)



744 aa (includes 28 aa SP)

NP_065084; NP_001841


703 aa (includes SP)



921 aa (includes 23 aa SP)


Short form 678 aa (includes 23 aa SP)



689 aa (includes SP)



684 aa (includes SP)



680 aa (includes 18 aa SP)



1806 aa (includes 36 aa SP)



1818 aa (includes 36 aa SP)



1767 aa (includes 36 aa SP)



1736 aa (includes 22 aa SP)


Isoforms 2 and 3

NP_542412 and NP_542410


Same as α1(II)A



3063 aa (includes 23 aa SP)

NP_004361 (has NC1 variants)

Short form 1899 aa, includes same SP as long form

NP_542376 (has NC1 splice variants)


717 aa (transmembranous)

NP_005194 (20+ splice variants)


1796 aa (includes SP)

NP_066933 (has NC1 variants)

Short form without N-terminal FNIII domain (also has NC1 variants)


1388 aa (includes 25 aa SP)



1604 aa (includes SP)



1497 aa (transmembranous)



1516 aa (includes 23 aa SP)


Short form 1336 aa (includes 33 aa SP)



1142 aa (includes 23 aa SP)


ch α1(XX)

Not in human; chicken, 1472 aa (without SP)



957 aa (includes 22 aa SP)



1626 aa (includes SP)



540 aa (transmembranous)



1714 aa (includes SP)



654 aa (transmembranous)


Isoform 2 is 642 aa



439 aa (includes SP)



1860 aa (includes 41 aa SP)



1125 aa (includes SP)
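Where Table 1 states the signal-peptide length, the length of the chain after signal-peptide cleavage follows by subtraction; for example, 1464 aa including a 22 aa SP leaves 1442 aa. The helper below is a hypothetical convenience written for the entry wording used in the table. It is not the length of the fully processed molecule: for the fibrillar collagens, the N- and C-propeptides are removed later as well.

```python
import re
from typing import Optional

# Matches entries such as "1464 (includes 22 aa SP)", "1487 aa (includes 25 aa SP)"
# or "3177 (with 25 aa SP)". Entries that give no SP length do not match.
_ENTRY = re.compile(r"(\d+)\s*(?:aa)?\s*\((?:includes|with)\s+(\d+)\s*aa SP\)")


def length_without_sp(entry: str) -> Optional[int]:
    """Chain length after signal-peptide removal, or None if the SP length is not stated."""
    match = _ENTRY.search(entry)
    if match is None:
        return None
    total, signal_peptide = (int(group) for group in match.groups())
    return total - signal_peptide


for entry in ["1464 (includes 22 aa SP)", "1366 aa (includes SP)", "Short form 1336 aa (includes 33 aa SP)"]:
    print(entry, "->", length_without_sp(entry))
```

Entries that say only "includes SP", or that describe transmembrane chains, return `None` rather than a guessed value.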


Fibrillar collagens

The ways that α chains form trimeric molecules, which in turn assemble into fibrils, are shown in Fig. 1. Some types, such as collagen II, form homotrimeric molecules. Others, such as types I and V, form heterotrimers. With the exception of collagens XXIV and XXVII, the major triple helical domain of the fibrillar collagens is slightly longer than 1000 amino acid residues and has a perfect gly-X-Y triplet structure. By contrast, all non-fibrillar collagens have at least one imperfection or interruption in their triple helices. The amino end of a fibrillar collagen, called the N-peptide or N-propeptide, usually contains at least one small triple helical domain, designated the minor helix. Once the major triple helix has formed, the amino and carboxyl ends undergo processing. Processed molecules are aligned in a quarter stagger arrangement in the growing fibril. As a result, fibrils show an alternating light and dark pattern in electron micrographs, hence the name banded fibrils. Fibrils composed of a single collagen type are unlikely to exist. Types V and XI collagen nucleate fibrils of types I and II collagen, respectively (Kadler et al. 2008; Wenstrup et al. 2006). The portions of the V and XI N-peptides that are retained after processing serve to regulate fibril diameter (Birk 2001; Blaschke et al. 2000; Linsenmayer et al. 1993; Birk et al. 1990). Fibril diameter control is crucial to proper tissue function. Within the fibrillar group, collagens XXIV and XXVII are unique because their major triple helices are shorter than those of the other members and contain one or two interruptions. The number depends on interpretation: the amino-most interruption (yellow triangle in Fig. 1) can be viewed either as the separation between the minor and major helices or as an additional interruption in the major triple helix of a collagen that has no minor helix (Koch et al. 2003; Pace et al. 2003; Boot-Handford et al. 2003).
Interestingly, the C-propeptides of both collagens XXIV and XXVII contain chain selection sequences that resemble those used in invertebrate fibrillar collagens (Lees et al. 1997). This and the analysis of the genes suggest that these two newly identified fibrillar collagens are of ancient origin (Boot-Handford et al. 2003).
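The quarter stagger arrangement can be illustrated with a little arithmetic. The sketch below uses textbook values that are not given in this article and are assumptions here: a molecule length of about 4.4 D, where D (about 67 nm) is the axial stagger between neighboring molecules, with five staggered molecules per repeat, as in the Hodge-Petruska model. Counting how many molecules overlie each position within one D-period reveals two zones of different packing density, an overlap zone and a gap zone, whose alternation underlies the banding seen in electron micrographs.

```python
from fractions import Fraction

# Assumed textbook geometry (not from this article); lengths are in units of D.
D = Fraction(1)                 # axial stagger between adjacent molecules (~67 nm)
MOLECULE_LEN = Fraction(22, 5)  # ~4.4 D per molecule (~300 nm)
ROWS = 5                        # five staggered molecules per repeat
REPEAT = ROWS * D               # along one row: one molecule plus a 0.6 D gap = 5 D


def coverage(x: Fraction) -> int:
    """Number of molecules overlying axial position x."""
    count = 0
    for row in range(ROWS):
        start = row * D             # each row is shifted by one D-period
        rel = (x - start) % REPEAT  # position within that row's molecule-plus-gap repeat
        if rel < MOLECULE_LEN:
            count += 1
    return count


def overlap_fraction(steps: int = 100) -> Fraction:
    """Fraction of one D-period in which all five rows are occupied (the overlap zone)."""
    full = sum(1 for k in range(steps) if coverage(k * D / steps) == ROWS)
    return Fraction(full, steps)


print(coverage(Fraction(1, 10)), coverage(Fraction(1, 2)))  # 5 4: overlap zone vs gap zone
print(overlap_fraction())                                   # 2/5, i.e. a 0.4 D overlap zone
```

Exact fractions avoid floating-point rounding at the zone boundaries; the 0.4 D overlap and 0.6 D gap follow directly from the assumed 4.4 D molecule length.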
Fig. 1

Assembly of α chains into trimeric collagen molecules and then of molecules into fibrils. Top The folding of three chains into a single molecule. Middle Fibrillar collagen α chains showing domain structures. Scale is approximate (C-pro C-propeptide domain, TH gly-X-Y repeating region). The yellow triangle is a short non-gly-X-Y domain in collagens XXIV and XXVII and has been interpreted in two ways: as a separation between the minor and major helices in collagen XXIV, and as an interruption in the major (and only) triple helix in collagen XXVII. Bottom Perfect quarter stagger overlaps result when no bulky groups protrude from the fibril surface. Bottom right Electron micrograph of collagen fibrils in the rabbit cornea. The thin diameters (~100 nm) are the result of fibrils being heterotypic structures composed of ~80% type I and 20% type V collagen

Collagens associated with banded fibrils


Several non-fibrillar types of collagen associate with the surface of collagen fibrils. One group has been named “FACITs” (fibril-associated collagens with interrupted triple helices). The prototype is collagen IX, a product of cartilage, where it is cross-linked to the surface of type II collagen fibrils (Eyre et al. 1987). Types XII and XIV collagen are present in tissues containing either type I or type II collagen fibrils (Walchli et al. 1994; Eyre 2002), and their association with banded fibrils has been demonstrated in tendon (Keene et al. 1991). Fibril association is shown schematically in Fig. 2 for types IX, XII, and XIV. These three collagens can be alternatively spliced to yield short and long variants (Svoboda et al. 1988; Gerecke et al. 1997; Imhof and Trueb 2001) that might confer different properties on the fibrils in tissues. Type XIV collagen also regulates fibril diameter in early stages of fibrillogenesis (Ansorge et al. 2009); the way this regulation may occur is shown in Fig. 2. Type XX collagen has been cloned from chicken (Koch et al. 2001). The human gene is disrupted in the region encoding the Col 1 domain, so the molecule is probably not used as a collagen in humans. Many lines of evidence suggest that, in addition to its fibril-associated form, type XII collagen forms a second kind of aggregate that involves interaction with basement membrane components (Wessel et al. 1997; Ljubimov et al. 1996; Anderson et al. 2000; Cheng et al. 2001; Bader et al. 2009). This underscores an important point: many collagens can participate in more than one kind of supramolecular aggregate.
Fig. 2

Collagens associated with banded fibrils. Top FACIT α chains, FACIT-like α chains, and the type VII α1 chain. Domains are indicated below. Scale is approximate. Bottom left Long and short forms of types IX, XII, and XIV collagen on the surface of a fibril. Bottom middle The presence of a few FACITs allows the collagen fibrils to come into contact (double-headed arrows) and to fuse, forming larger diameter structures. More FACITs on the fibril surface hinder fibril fusion. This has been shown recently for type XIV collagen. Bottom right A collagen VII antiparallel dimer functioning as a “rivet” linking the epithelial basement membrane with the banded fibrils of the dermis

The FACIT-like collagens, viz., types XVI (Pan et al. 1992), XIX (Yoshioka et al. 1992; Myers et al. 1994, 1997), XXI (Chou and Li 2002; Li et al. 2005), and XXII (Koch et al. 2004), are primarily localized to basement membrane zones or junctions between tissues. Type XVI collagen has 10 collagenous domains flanked by non-collagenous domains (Pan et al. 1992; Yamaguchi et al. 1992), and one of its supramolecular aggregate forms involves association with dermal fibrillin 1 near the epidermal basement membrane. Another form, found in cartilage, is associated with banded fibrils, but only with fibrils that lack collagen IX on their surface (Kassner et al. 2003). The aggregate form of collagen XIX is unknown, but noteworthy is the finding that Col19a1 null mice undergo a transdifferentiation of smooth muscle to skeletal muscle in the abdominal part of the esophagus (Sumiyoshi et al. 2004). Little is known about collagen type XXI other than its primary structure and the potential chain associations of recombinant α chains (Chou and Li 2002; Li et al. 2005). Type XXII collagen, whose domains are closely related in sequence to those of collagen XXI, is found in the basement membrane of the myotendinous junction. Another form of it is associated with cartilage microfibrils (Koch et al. 2004).

Much has been written about collagen VII, since it is crucial to the attachment complex that rivets the epidermis to the dermis (for reviews, see Bruckner-Tuderman 2009; Uitto 2008; Aumailley et al. 2006; Uitto and Pulkkinen 1996). The attachment complex consists of hemidesmosomes in the basal epithelial cells, anchoring filaments in the basement membrane, and anchoring fibrils that reach from the basement membrane down into the dermis. Type VII collagen is the major component of anchoring fibrils and has recently been shown to merge into the banded collagen fibrils of the dermis (Villone et al. 2008). A representation of the antiparallel dimer form of collagen VII is shown in Fig. 2, where one end of the aggregate is localized to the lamina densa (bound to anchoring filaments) of the epithelial basement membrane, and the other end interacts with a dermal collagen fibril.

Network-forming collagens

Collagen types IV, VI, VIII, and X form networks of various kinds. Type IV collagen, otherwise known as basement membrane collagen, builds a three-dimensional structure (Yurchenco and Ruben 1988; Barge et al. 1991) from laterally associating, chicken wire-like two-dimensional structures (Fig. 3). Six α chains can be used to make the type IV collagen trimers (for reviews on structure and disease states, see Hudson et al. 1993; Khoshnoodi et al. 2008). The most common chain composition, however, is [α1(IV)]2α2(IV). Only two representative α chains are shown in Fig. 3, one to represent the α1 and α2 chains, and the other to represent the α3, α4, α5, and α6 chains. The N-terminal domains of the α1 and α2 chains are composed of 143 and 167 amino acid residues, respectively, whereas the other four chains have small amino terminal domains ranging from 13 to 29 amino acids. The triple helical domains of the six chains range from 1271 to 1416 residues, and all have more than 20 interruptions in the gly-X-Y triplet structure. The carboxyl termini all have approximately 228 amino acid residues.
Fig. 3

Network-forming collagens. Top Representative type IV collagen chains. The chicken wire supramolecular aggregate of type IV collagen is shown below. Middle Type VI collagen chains, including the murine α4, which does not have a human counterpart. Single molecules forming dimers, and dimers interacting to become tetramers are presented right. The tetramers aggregate end to end. Bottom The α chains for short chain collagens, types VIII and X (yellow boxes domains unique to the collagen VIII α chains, blue boxes conserved non-collagenous domains, black boxes collagenous domains that are also conserved between collagen VIII and X). A drawing of a hexagonal lattice, a supramolecular aggregate that these collagens can assume, is shown right

Type VI collagen is another network-forming molecule, with a ubiquitous tissue distribution. It was once thought to form heterotrimers from the smaller α1 and α2 chains plus the larger α3 chain, but three additional chains, α4–α6, are now known to exist in mouse. All three are similar to the α3 chain. Orthologs of α5 and α6 have been found in human (Fitzgerald et al. 2008; Gara et al. 2008). In mouse, the α4 chain is used to make trimers far more frequently than the α5 and α6 chains, so why do humans lack the α4 chain? Spatial expression patterns indicate that the tissue distribution of mouse α4 is similar to that of human α5, suggesting that human α5 is its equivalent. The lack of the α4 chain in humans is attributable to a disrupted COL6A4 gene. This disruption is probably a relatively recent gene alteration, since rhesus monkeys (Old World monkey lineage) have an intact α4(VI) chain gene, whereas chimpanzees and humans do not (Fitzgerald et al. 2008).

In the type VI α chains, the short ~330-amino-acid collagenous domain is flanked on each side by von Willebrand factor A domains (see Fig. 3). To make the type VI collagen supramolecular aggregate, molecules form dimers, the dimers form tetramers, and the tetramers form end to end associations that yield the ultrastructural appearance of beads on a string (Engel et al. 1985). The structure is represented in Fig. 3. Type VI collagen has a crucial role in muscle function, since its mutations cause Bethlem myopathy and Ullrich congenital muscular dystrophy (for a review, see Lampe and Bushby 2005).

Types VIII and X are related short-chain collagens (for reviews, see Schmid et al. 1990; Shuttleworth 1997; Plenz et al. 2003). Type VIII is expressed by many cells, especially by endothelial cells, whereas type X is restricted to hypertrophic chondrocytes. The type VIII α1 and α2 chains are slightly longer than the α1(X) collagen chain because of an additional exon in the type VIII genes encoding an extra polypeptide sequence for the amino terminal domain. This is shown as a yellow box in Fig. 3. One of the supramolecular aggregates that type VIII collagen can make is a hexagonal lattice (Fig. 3). This structure is beautifully demonstrated by Descemet’s membrane of the cornea (Sawada et al. 1990). Recombinant type VIII collagen molecules allowed to aggregate also make hexagonal lattices (Stephan et al. 2004). Additional macromolecular forms must exist, since type VIII collagen made by vascular endothelial cells has never been visualized in a hexagonal lattice. With regard to type X collagen, no in vivo lattice that reacts with a collagen X antibody has been observed. However, type X collagen molecules form hexagonal lattices when allowed to assemble into aggregates in vitro (Kwan et al. 1991). What one does observe in vivo is that type X collagen has a fibril-associated form, and that it also forms fine filaments in the pericellular matrix of hypertrophic chondrocytes (Schmid and Linsenmayer 1990). The latter may be hexagonal lattices that have collapsed during preparation for ultrastructural analysis. At the molecular level, type X collagen, which has only one α chain, forms homotrimers. Although two α chains are known for type VIII collagen, the preponderance of evidence suggests that each chain forms homotrimers (Illidge et al. 1998; Greenhill et al. 2000; Stephan et al. 2004).

Transmembranous collagens

Types XIII, XVII, XXIII, and XXV collagens are transmembrane molecules inserted in the plasma membrane in a type II orientation (for a review, see Franzke et al. 2003). All members of the group are also shed from the cell surface, generating soluble forms (for reviews, see Franzke et al. 2003; Veit et al. 2007). Type XVII collagen is unlike the other three members of the group, being much larger and having many more collagenous domains plus a large intracellular domain (Fig. 4). As a component of the hemidesmosomes, collagen XVII, together with α6β4 integrin, extends from the cell membrane into the basement membrane to bind with laminin 332, the anchoring filament component of the attachment complex. As such, collagen XVII is indirectly assembled into a supramolecular complex with collagen VII, which binds to laminin 332 from the dermal side.
Fig. 4

Transmembrane, endostatin precursor, and other collagens. The α chains for each collagen are shown (black bars collagenous domains, blue bars NC domains, TM in green boxes transmembrane domains). In the transmembrane group, matching geometric patterns in collagens XIII, XXIII, and XXV indicate conserved regions (TSP thrombospondin N-terminal module, vWA von Willebrand factor A domains)

Collagen types XIII, XXIII, and XXV are similar to each other. Types XIII and XXV have a conserved linear pattern of domains. Collagen XXIII has conserved domains, but they are rearranged compared with the other two (Koch et al. 2006). This is indicated by the geometric patterns superimposed on the domain structures in Fig. 4. The initiation of chain selection, followed by zippering of the triple helix (i.e., proper folding of the molecule), is not so straightforward for a transmembrane collagen. Coiled coils have been postulated to play a role in molecular assembly (McAlinden et al. 2003). For collagen XIII, this has been established by examining NC3 domain mutations (Snellman et al. 2007). Our understanding of collagens XIII, XXIII, and XXV is still in its infancy. Collagen XXV is known to be a component of the amyloid plaques characteristic of Alzheimer’s disease (Hashimoto et al. 2002). The molecule appears to assemble amyloid beta protein fibrils into bundles that are resistant to proteases (Soderberg et al. 2005). With regard to collagens XIII and XXIII, the level of these molecules is elevated in some tumors (Vaisanen et al. 2005; Banyard et al. 2003; Koch et al. 2006). A most interesting finding is that a mutation in collagen XIII has been associated with an altered immune response in an animal, increasing its susceptibility to bacterial infection. The presence of the mutant collagen XIII has also been correlated with a predisposition toward developing B cell lymphomas (Tuomisto et al. 2008).

Endostatin precursor collagens

The carboxyl terminal domains of collagens XV and XVIII can be cleaved to generate antiangiogenic peptides. (This also occurs for type IV collagen.) These fragments are called endostatin (from collagen XVIII) and restin (from collagen XV), or alternatively endostatin-XVIII and endostatin-XV. The cleaved fragments have some distinct differences in properties (Sasaki et al. 2000). As full-length molecules, collagens XV and XVIII are basement membrane collagens with similar features. The two collagens are highly homologous in their triple helical domains and in their amino and carboxyl non-collagenous domains (Rehn et al. 1994). Moreover, both molecules are proteoglycan core proteins. Collagen XV has a chondroitin sulfate glycosaminoglycan side chain (Li et al. 2000), and collagen XVIII has a heparan sulfate side chain (Halfter et al. 1998). However, some important differences exist in the tissue distribution of these collagens. The most striking is that collagen XV is expressed in heart and skeletal muscle, whereas collagen XVIII is expressed by smooth muscle (Sasaki et al. 2000). Consistent with this pattern, collagen XV null mice have skeletal myopathies and cardiovascular defects (Eklund et al. 2001). Mice null for both genes demonstrate that the collagens do not functionally compensate for each other (Ylikarppa et al. 2003). Another interesting feature of collagen XVIII is that, like fibronectin, it has a plasma form: the long form of the molecule synthesized by the liver circulates in the blood (Musso et al. 2001).

Other collagens

Collagens XXVI and XXVIII do not easily fit into any category. Collagen XXVI is expressed in testis and ovaries, especially in neonates. The molecule is small for a collagen, having only 438 amino acids in total, including two short collagenous domains (69 aa and 33 aa). It does, however, undergo modification processes expected for a collagen, such as prolyl hydroxylation, trimer formation, and secretion (Sato et al. 2002).

The collagen XXVIII triple helix is flanked by von Willebrand factor A domains, and phylogenetic analyses indicate some sequence relationship with type VI chains, but the triple helical domain is longer than that of the collagen VI chains. The molecule is expressed predominantly by dorsal root ganglia and peripheral nerves. Newborn, but not adult, sciatic nerve expresses collagen XXVIII mRNA, yet the protein is detected in adult sciatic nerve, suggesting a long half-life for the molecule.

Concluding remarks

There are many collagens, generating many questions. Because the family is so diverse, we can look forward with pleasure to the exciting answers that will emerge.

Copyright information

© Springer-Verlag 2009