Journal of Molecular Evolution, Volume 86, Issue 5, pp 293–302

Domain-Specific Proteogenomic Analysis of Collagens to Evaluate De Novo Sequencing Results and Database Information

  • Anne J. Kleinnijenhuis
  • Frédérique L. van Holthoon
Original Article


Collagen is an important structural protein and the most abundant protein in mammals. Structural analysis of collagens is performed in several research fields. Fibrillar collagens consist almost entirely of continuous repeats of GXY, where G is glycine, X is often proline or alanine and Y is often hydroxyproline or alanine. In the present study, the collagen structure was investigated in detail at the nucleotide, codon group, amino acid and target peptide level using sequence analyses. One of the most important findings was that a selection of codon groups is predominantly involved in amino acid changes between closely related collagens, and that other change routes appear when collagens are less closely related. The findings of the sequence analyses were used to evaluate reported sequences of non-avian dinosaur species and database entries of duck and chicken collagen. The duck assessment was supported by an experimental data set, obtained by collagen extraction from duck skin followed by digestion and LC–MS analysis. Database entries of chicken and duck collagen 3α1 were found to contain unreliable features, such as missing parts, a discontinuous GXY pattern and too many interspecies differences. As an example, the erroneous nature of one of these unreliable features was confirmed experimentally using LC–MS. Finally, dinosaur and bird collagen 1α1 were compared. The presented results show that a domain-specific proteogenomic analysis provides very useful information for assessing de novo sequencing results and database information of collagens. Furthermore, it offers deeper insight into the functional restrictions and routes of evolutionary divergence.
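The GXY continuity criterion mentioned above can be illustrated with a minimal sketch. It is not the procedure used in the study: the function names, the toy peptide and the use of the letter O for hydroxyproline (a common shorthand in collagen literature) are assumptions made here for illustration only.

```python
def gxy_breaks(sequence: str, frame: int = 0) -> list[int]:
    """Return 0-based positions where a glycine is expected but absent."""
    seq = sequence.upper()
    # In a continuous GXY repeat, every third residue starting at `frame` is G.
    return [i for i in range(frame, len(seq), 3) if seq[i] != "G"]


def best_frame(sequence: str) -> tuple[int, list[int]]:
    """Pick the reading frame (0, 1 or 2) with the fewest GXY breaks."""
    candidates = [(frame, gxy_breaks(sequence, frame)) for frame in range(3)]
    return min(candidates, key=lambda item: len(item[1]))


# Hypothetical toy peptide, not taken from the paper; O = hydroxyproline (assumed shorthand).
toy = "GPOGAOGPAGEOGPK"
print(best_frame(toy))       # frame with the fewest breaks, plus the break positions

# Deleting a single residue shifts every downstream triplet out of frame,
# which is how a missing part in a database entry would show up.
damaged = toy[:7] + toy[8:]
print(gxy_breaks(damaged))   # positions where G was expected but not found
```

A check of this kind only flags candidate problems. A genuine interruption of the repeat and a database error look the same at this level, which is why experimental confirmation, as performed with LC–MS in the study, remains necessary.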


Proteogenomics, Domain-specific, Collagen, LC-MS, De novo sequencing, GXY domain



The research was performed in Triskelion study 20959 and was financed by Triskelion. Anne Schulp (Naturalis, Leiden, the Netherlands) provided helpful feedback on an earlier version of the manuscript.

Compliance with Ethical Standards

Conflict of interest

The authors declare no conflicts of interest.

Animal Rights

The animal material used during the study (skin from duck) was purchased at a local supermarket.

Supplementary material

239_2018_9844_MOESM1_ESM.docx (16 kb)
Supplementary material 1 (DOCX 15 KB)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Anne J. Kleinnijenhuis (1)
  • Frédérique L. van Holthoon (1)
  1. Triskelion, Zeist, The Netherlands
