Collagen is an important structural protein and the most abundant protein in mammals, and its structure is analysed in several research fields. Fibrillar collagens consist almost entirely of continuous Gly-X-Y (GXY) repeats, where G is glycine, X is often proline or alanine, and Y is often hydroxyproline or alanine. In the present study, collagen structure was investigated in detail at the nucleotide, codon group, amino acid and target peptide levels using sequence analyses. One of the most important findings was that a limited selection of codon groups is predominantly involved in amino acid changes between closely related collagens, whereas other routes of change emerge when collagens are less closely related. The findings of the sequence analyses were used to evaluate reported sequences of non-avian dinosaur species and database entries of duck and chicken collagen. The duck assessment was supported by an experimental data set obtained by extracting collagen from duck skin, followed by digestion and LC–MS analysis. Database entries of chicken and duck collagen 3α1 were found to contain unreliable features, such as missing segments, interruptions of the continuous GXY pattern and too many interspecies differences. The erroneous nature of one of these features was confirmed experimentally by LC–MS. Finally, dinosaur and bird collagen 1α1 sequences were compared. The results show that a domain-specific proteogenomic analysis provides very useful information for assessing de novo sequencing results and database information on collagens. Furthermore, it offers deeper insight into the functional restrictions and routes of evolutionary divergence of collagens.
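Two of the warning signs named in the abstract, a broken GXY repeat and too many interspecies differences, are simple enough to screen for mechanically, and LC–MS confirmation depends on which peptides a digest produces. The Python sketch below illustrates simplified versions of these three steps. It is not the authors' pipeline: the function names (`gxy_breaks`, `count_differences`, `tryptic_peptides`), the assumption that the input starts at the first Gly of the triple-helical domain, the requirement that compared sequences are already aligned without gaps, the use of trypsin with a simplified cleavage rule, and the toy sequences are all illustrative assumptions.

```python
"""Toy screening checks for collagen alpha-chain sequences (illustrative sketch).

Assumptions (not taken from the article): the input string starts at the first
Gly of the triple-helical domain, compared sequences are already aligned without
gaps, and digestion is modelled as trypsin with a simplified cleavage rule.
"""

from typing import List, Tuple


def gxy_breaks(seq: str, start: int = 0) -> List[int]:
    """Return 0-based positions where the expected Gly of a G-X-Y triplet is missing.

    In an uninterrupted repeat, every third residue from `start` should be 'G'.
    """
    return [i for i in range(start, len(seq), 3) if seq[i] != "G"]


def count_differences(a: str, b: str) -> Tuple[int, float]:
    """Count mismatched residues between two pre-aligned, equal-length sequences.

    Returns (number of differences, fraction of compared positions).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    fraction = diffs / len(a) if a else 0.0
    return diffs, fraction


def tryptic_peptides(seq: str, min_len: int = 6) -> List[str]:
    """Simplified in-silico trypsin digest: cleave after K or R unless the next residue is P.

    Hydroxyproline is written as 'P' in one-letter sequences, so it is treated as proline.
    Peptides shorter than `min_len` are dropped (an arbitrary illustrative cut-off).
    """
    peptides: List[str] = []
    current = ""
    for i, aa in enumerate(seq):
        current += aa
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(current)
            current = ""
    if current:  # C-terminal remainder without a cleavage site
        peptides.append(current)
    return [p for p in peptides if len(p) >= min_len]


if __name__ == "__main__":
    # Invented toy fragments, not real duck, chicken or dinosaur sequences.
    reference = "GPAGPPGEKGAPGRPGPAGAR"
    query = "GPAGPPAEKGAPGRPGPSGAR"

    print("GXY breaks in query:", gxy_breaks(query))
    n, frac = count_differences(reference, query)
    print(f"differences: {n} ({frac:.1%})")
    print("tryptic peptides:", tryptic_peptides(reference))
```

In this sketch a hit from `gxy_breaks` or an unusually high difference count only flags an entry for closer inspection; deciding whether a feature is actually erroneous still needs experimental evidence such as the LC–MS confirmation mentioned in the abstract.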
The research was performed in Triskelion study 20959 and was financed by Triskelion. Anne Schulp (Naturalis, Leiden, the Netherlands) provided helpful feedback on an earlier version of the manuscript.
Conflict of interest
The authors declare no conflicts of interest.
The animal material used in the study (duck skin) was purchased from a local supermarket.
About this article
Cite this article
Kleinnijenhuis, A.J., van Holthoon, F.L. Domain-Specific Proteogenomic Analysis of Collagens to Evaluate De Novo Sequencing Results and Database Information. J Mol Evol 86, 293–302 (2018). https://doi.org/10.1007/s00239-018-9844-x
- De novo sequencing
- GXY domain