A Roadmap to Domain Based Proteomics

  • Carsten Kemena
  • Erich Bornberg-Bauer
Part of the Methods in Molecular Biology book series (MIMB, volume 1851)


Protein domains are reusable segments of proteins and play an important role in protein evolution. By combining elements from a relatively small set of domains into unique arrangements, a large number of distinct proteins can be generated. Since domains often have specific functions, changes in their arrangement usually affect the overall protein function. Furthermore, domains are well suited to computational representation, e.g., by Hidden Markov Models (HMMs), and these HMMs are available in various databases. Therefore, domains can be used efficiently for proteomic analyses. Here, we describe how domains are annotated using different domain databases and how to assess the annotation quality of proteomes. We next show how functional annotations of domains in large-scale data such as whole genomes or transcriptomes can be used to analyze molecular differences between species. Furthermore, we describe methods to analyze changes in the domain content of proteins, which helps significantly to characterize and reconstruct the modular evolution of proteins. Altogether, domain-based methods offer a computationally highly effective approach to analyzing large amounts of proteomic data in an evolutionary setting.
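
As a rough illustration of the idea that proteins can be treated as arrangements of domains, the following Python sketch represents each protein as an ordered tuple of domain labels and compares the domain content and arrangements of two small proteomes. It is a minimal sketch under stated assumptions, not the method described in the chapter: the protein names, the domain labels, and the helper functions `domain_content` and `compare_proteomes` are all hypothetical, and in practice the arrangements would come from annotating sequences against a domain database.

```python
from collections import Counter

# Toy data (invented for illustration): each protein is an ordered
# tuple of domain labels, i.e., its domain arrangement.
proteome_a = {
    "a1": ("kinase", "sh2"),
    "a2": ("kinase",),
    "a3": ("sh3", "sh2", "kinase"),
}
proteome_b = {
    "b1": ("kinase", "sh2"),
    "b2": ("sh3", "kinase"),
    "b3": ("ph", "kinase"),
}


def domain_content(proteome):
    """Return the set of distinct domains found anywhere in the proteome."""
    return {dom for arrangement in proteome.values() for dom in arrangement}


def compare_proteomes(first, second):
    """Report domains and arrangements that are shared or unique."""
    dom_first, dom_second = domain_content(first), domain_content(second)
    arr_first, arr_second = set(first.values()), set(second.values())
    return {
        "domains_only_in_first": dom_first - dom_second,
        "domains_only_in_second": dom_second - dom_first,
        "shared_arrangements": arr_first & arr_second,
        "arrangements_only_in_first": arr_first - arr_second,
        "arrangements_only_in_second": arr_second - arr_first,
    }


result = compare_proteomes(proteome_a, proteome_b)

# How often each domain occurs across all proteins of proteome_a.
domain_counts_a = Counter(
    dom for arrangement in proteome_a.values() for dom in arrangement
)

for key, value in result.items():
    print(key, sorted(value))
```

In this toy example the two proteomes share most of their domains but differ in how those domains are arranged, which is the kind of difference that domain-based comparisons are meant to expose.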

Key words

Protein domain, Molecular evolution



We would like to thank Mark Harrison and Ulrike Brandt for helpful suggestions.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
