Domain-Specific Proteogenomic Analysis of Collagens to Evaluate De Novo Sequencing Results and Database Information
Collagen is an important structural protein and the most abundant protein in mammals. Structural analysis of collagens is performed in several research fields. Fibrillar collagens consist almost entirely of continuous repeats of GXY, where G is glycine, X is often proline or alanine and Y is often hydroxyproline or alanine. In the present study, the collagen structure was investigated in detail at the nucleotide, codon group, amino acid and target peptide level using sequence analyses. One of the most important findings was that a selection of codon groups is predominantly involved in amino acid changes between closely related collagens and that other change routes appear when collagens are more distantly related. The findings of the sequence analyses were used to evaluate reported sequences of non-avian dinosaur species and database entries of duck and chicken collagen. The duck assessment was supported by an experimental data set, obtained by collagen extraction from duck skin and subsequent digestion and LC–MS analysis. It was found that database entries of chicken and duck collagen 3α1 contained unreliable features, such as missing parts, a discontinuous GXY pattern and an excessive number of interspecies differences. As an example, the erroneous nature of one of these unreliable features was confirmed experimentally using LC–MS. Finally, dinosaur and bird collagen 1α1 were compared. The presented results show that a domain-specific proteogenomic analysis provides very useful information for assessing de novo sequencing results and database information of collagens. Furthermore, it offers deeper insight into the functional restrictions and routes of evolutionary divergence.
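The GXY criterion described above can be illustrated with a minimal sketch. It is not the procedure used in the study; it only assumes that a sequence is given in one-letter amino acid code, that the reading frame starts at the first residue, and that hydroxyproline is written as P. The function names and the example fragments are hypothetical.

```python
from typing import List


def gxy_breaks(sequence: str) -> List[int]:
    """Return 0-based start positions of triplets that do not begin with glycine (G).

    Assumption: the GXY frame starts at the first residue of the sequence.
    A trailing incomplete triplet is ignored here.
    """
    seq = sequence.upper()
    full_length = len(seq) - len(seq) % 3
    return [i for i in range(0, full_length, 3) if seq[i] != "G"]


def is_continuous_gxy(sequence: str) -> bool:
    """True if the sequence is a whole number of triplets and each starts with G."""
    return len(sequence) > 0 and len(sequence) % 3 == 0 and not gxy_breaks(sequence)


# Made-up fragments for illustration only (not real database entries).
intact = "GPPGAPGPA"   # three triplets, all starting with G
broken = "GPPAPGGPA"   # second triplet starts with A

print(gxy_breaks(intact), is_continuous_gxy(intact))
print(gxy_breaks(broken), is_continuous_gxy(broken))
```

A check of this kind only flags positions where the repeat is interrupted. Whether an interruption reflects a database error or a genuine feature of the protein still has to be judged from the surrounding sequence and, where possible, experimental data.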
Keywords: Proteogenomics, Domain-specific, Collagen, LC-MS, De novo sequencing, GXY domain
The research was performed in Triskelion study 20959 and was financed by Triskelion. Anne Schulp (Naturalis, Leiden, the Netherlands) provided helpful feedback on an earlier version of the manuscript.
Compliance with Ethical Standards
Conflict of interest
The authors declare no conflicts of interest.
The animal material used during the study (skin from duck) was purchased at a local supermarket.