Comparative genomic analysis of collagen gene diversity

  • Original Article
  • 3 Biotech

Abstract

The collagen family, whose proteins comprise 30% of the total protein mass in mammals, is the major component of the extracellular matrix. To understand the complexity of the collagen gene family, detailed sequence, phylogenetic and synteny analyses of 44 collagen genes were performed. According to the sequence analysis, fibril-associated collagens with interrupted triple helices (FACITs) were identified as the most recently evolved, vertebrate-specific collagens, while fibril-forming collagens and collagens VI, VII, XXVI and XXVIII were the most ancient, originating at the time of choanoflagellates. Network-forming collagens were entirely conserved from arthropods to Homo sapiens, except for one gene loss event. Of note, bird-specific dispensability of the COL1A1, COL3A1, COL5A3 and COL11A2 genes was observed among the fibril-forming collagens. According to the phylogenetic analysis, gene duplications in the collagen family occurred at variable time points during invertebrate-to-vertebrate evolution. However, the majority of gene duplications in FACITs and network-forming collagens occurred at the fish time point, suggesting large-scale duplications at the root of the vertebrate lineage. Lastly, synteny analysis identified 12 conserved blocks containing 27 collagen genes in vertebrate species. Interestingly, dysregulation of seven of these conserved blocks, namely block1 (COL11A1), block3 (COL3A1, COL5A2), block5 (COL6A5, COL6A6), block7 (COL1A2), block9 (COL4A1, COL4A2), block11 (COL6A1, COL6A2, COL18A1) and block12 (COL4A5, COL4A6), has also been reported in different diseases, including cancer. The current study reveals many critical insights into the sequence, structural and functional diversity of the collagen gene family. In the future, this information may help establish the clinical and pathological relevance of these conserved collagen blocks in different diseases.

Acknowledgements

Special thanks to the teams behind the UCSC, NCBI and Ensembl genome browsers for making the data publicly available for these analyses.

Author information

Corresponding author

Correspondence to Farhan Haq.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Cite this article

Haq, F., Ahmed, N. & Qasim, M. Comparative genomic analysis of collagen gene diversity. 3 Biotech 9, 83 (2019). https://doi.org/10.1007/s13205-019-1616-9
