Abstract
Genes duplicate, mutate, recombine, fuse or undergo fission to produce new genes, and genes can also arise de novo; through these processes, novel functions emerge during evolution. Researchers have tried to quantify these molecular diversification processes to understand how they increase molecular complexity over time, for instance in protein domain organization. In contrast to global sequence similarity, protein domain architectures capture key structural and functional characteristics, making them better proxies for describing functional equivalence. In both prokaryotes and eukaryotes, domain architectures have been shown to be retained over considerable evolutionary distances. Protein domain architectures are now used to classify and distinguish evolutionarily related proteins and to find homologs among evolutionarily distant species. Additionally, the structural information stored in domain structures has accelerated homology detection and sequence search methods. Tools for functional protein annotation have been developed to identify protein domain content, domain order, domain recurrence and domain position, since all of these contribute to the accuracy of protein function prediction. This review attempts to summarise facts and speculations regarding the use of protein domain architecture and modularity to identify possible therapeutic targets among cellular activities, based on an understanding of their linked biological processes.
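To make the distinction between domain content, recurrence and order concrete, the sketch below scores two architectures in three separate ways. It assumes each architecture is represented as an N- to C-terminal list of domain identifiers; the function names and the choice of a set-based Jaccard index, a multiset (weighted) Jaccard index and a normalised longest common subsequence are illustrative assumptions, not the method of this review or of any particular annotation tool. Domain position (residue coordinates) is not modelled.

```python
from collections import Counter
from typing import Sequence

# Hypothetical representation: an architecture is the N- to C-terminal list of
# domain identifiers found in a protein (e.g. names or Pfam-style accessions).


def content_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index over distinct domains; ignores order and copy number."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def recurrence_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Multiset (weighted) Jaccard index: each repeated copy of a domain counts."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())  # element-wise maximum of copy numbers
    if union == 0:
        return 1.0
    return sum((ca & cb).values()) / union  # element-wise minimum


def order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Longest common subsequence of domains, normalised by the longer architecture."""
    if not a and not b:
        return 1.0
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)] / max(len(a), len(b))


if __name__ == "__main__":
    # Two hypothetical proteins built from EGF and CUB domains.
    p1 = ["EGF", "EGF", "CUB", "EGF"]
    p2 = ["CUB", "EGF", "EGF"]
    print(content_similarity(p1, p2))     # 1.0: same set of domains
    print(recurrence_similarity(p1, p2))  # 0.75: EGF copy number differs
    print(order_similarity(p1, p2))       # 0.5: LCS of length 2 out of 4
```

The three scores deliberately disagree: the two example proteins share identical domain content, differ in EGF recurrence, and differ further in domain order, which is why annotation approaches that consider only content can conflate architectures that order-aware comparisons would separate.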
Abbreviations
- PDA: Protein domain architecture
- DCN: Protein domain co-occurrence networks
- SCOP: Structural classification of proteins
- CATH: Protein structural classification database; class (C), architecture (A), topology (T) and homologous superfamily (H)
- Pfam: The protein families database
- ProDom: Protein domain families database
- EGF: Epidermal growth factor
- CUB domain: Complement protein subcomponents C1r/C1s, urchin embryonic growth factor, and bone morphogenetic protein 1
- HSP: Heat shock protein
Acknowledgements
We thank Dr. Riaz Mahmood, Professor, Department of Biotechnology, Kuvempu University, Shankaraghatta, India, for assisting in language editing.
Funding
No funding was received for the present work.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Handling editor: Erich Bornberg-Bauer.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gollapalli, P., Rudrappa, S., Kumar, V. et al. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 91, 598–615 (2023). https://doi.org/10.1007/s00239-023-10129-w
DOI: https://doi.org/10.1007/s00239-023-10129-w