Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity

  • Bianca Hermine Habermann


Inferring remote orthologs is a persistent challenge in computational biology. Identifying orthologs is necessary for evolutionary analyses, comparative genomics and genome annotation, as well as for functional prediction and the sensible planning of experimental studies. If we miss orthologous relationships because of low sequence conservation, we lose a significant amount of information. Given their fast evolutionary rates, remote orthologs can only be identified at the protein level. A pair of proteins that has evolved by speciation and shares less than 30 % sequence identity can be defined as remote orthologs. Their high sequence divergence prevents their unambiguous recognition as orthologous proteins and precludes a reliable interpretation of their evolutionary relationship. Thus, many remote orthologs remain hidden to date. In this article, I review current methods for remote orthology inference, highlight existing problems, and discuss potential solutions for discovering remote orthologs.
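To make the identity criterion concrete, the following minimal Python sketch computes percent identity from an existing pairwise alignment and flags pairs that fall below the 30 % cut-off. It assumes that identity is measured over ungapped aligned columns only; other conventions divide by the full alignment length or by the length of the shorter sequence and give different values. The function names are hypothetical. Note that identity alone says nothing about whether a pair arose by speciation or by duplication, so orthology itself must be established separately, for example with reciprocal best hits or phylogenetic tree analysis.

```python
"""Minimal sketch: percent identity of a pairwise alignment and a 30 % cut-off.

Assumption: identity is computed over columns where neither sequence has a gap.
"""

REMOTE_IDENTITY_CUTOFF = 30.0  # percent; the threshold quoted in the abstract


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return the percentage of identical residues among ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def below_identity_cutoff(aligned_a: str, aligned_b: str,
                          cutoff: float = REMOTE_IDENTITY_CUTOFF) -> bool:
    """True if the aligned pair falls below the identity cut-off.

    This only measures divergence; it does not test whether the pair arose
    by speciation, which is what makes two proteins orthologs.
    """
    return percent_identity(aligned_a, aligned_b) < cutoff


if __name__ == "__main__":
    print(percent_identity("MKV-LA", "MRVQLS"))  # 60.0
    print(below_identity_cutoff("ACDEFGHIKL", "AWYRPNQSTV"))  # True (10 % identity)
```

Excluding gapped columns is only one choice; for strongly divergent pairs with many indels it can inflate identity relative to length-normalised measures, so the convention should be stated whenever a threshold like 30 % is applied.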


Hidden Markov Model, Orthologous Relationship, Orthology Assignment, Large Evolutionary Distance, Remote Homology Detection
These keywords were added by machine and not by the authors.



I would like to thank Frank Schnorrer and Friedhelm Pfeiffer for critical reading of the manuscript. This work was supported by the Max Planck Society and by the CNRS.


1. Abagyan RA, Batalov S (1997) Do aligned sequences share the same fold? J Mol Biol 273(1):355–368. doi: 10.1006/jmbi.1997.1287
2. Afrasiabi C, Samad B, Dineen D, Meacham C, Sjölander K (2013) The PhyloFacts FAT-CAT web server: ortholog identification and function prediction using fast approximate tree classification. Nucleic Acids Res 41(Web Server issue):W242–W248. doi: 10.1093/nar/gkt399
3. Alexeyenko A, Tamas I, Liu G, Sonnhammer ELL (2006) Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics 22(14):e9–e15. doi: 10.1093/bioinformatics/btl213
4. Altenhoff AM, Dessimoz C (2009) Phylogenetic and functional assessment of orthologs inference projects and methods. PLoS Comput Biol 5(1):e1000262. doi: 10.1371/journal.pcbi.1000262
5. Altenhoff AM, Studer RA, Robinson-Rechavi M, Dessimoz C (2012) Resolving the ortholog conjecture: orthologs tend to be weakly, but significantly, more similar in function than paralogs. PLoS Comput Biol 8(5):e1002514. doi: 10.1371/journal.pcbi.1002514
6. Altenhoff AM, Škunca N, Glover N, Train C-M, Sueki A, Piližota I et al (2015) The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements. Nucleic Acids Res 43(Database issue):D240–D249. doi: 10.1093/nar/gku1158
7. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402
8. Alva V, Remmert M, Biegert A, Lupas AN, Söding J (2010) A galaxy of folds. Protein Sci 19(1):124–130. doi: 10.1002/pro.297
9. Banumathy G, Somaiah N, Zhang R, Tang Y, Hoffmann J, Andrake M et al (2009) Human UBN1 is an ortholog of yeast Hpc2p and has an essential role in the HIRA/ASF1a chromatin-remodeling pathway in senescent cells. Mol Cell Biol 29(3):758–770. doi: 10.1128/MCB.01047-08
10. Barberis M, De Gioia L, Ruzzene M, Sarno S, Coccetti P, Fantucci P et al (2005) The yeast cyclin-dependent kinase inhibitor Sic1 and mammalian p27Kip1 are functional homologues with a structurally conserved inhibitory domain. Biochem J 387(Pt 3):639–647. doi: 10.1042/BJ20041299
11. Bedoya O, Tischer I (2014) Remote homology detection incorporating the context of physicochemical properties. Comput Biol Med 45:43–50. doi: 10.1016/j.compbiomed.2013.11.012
12. Bedoya O, Tischer I (2015) Reducing dimensionality in remote homology detection using predicted contact maps. Comput Biol Med 59:64–72. doi: 10.1016/j.compbiomed.2015.01.020
13. Bernardes JS, Dávila AMR, Costa VS, Zaverucha G (2007) Improving model construction of profile HMMs for remote homology detection through structural alignment. BMC Bioinform 8(1):435. doi: 10.1186/1471-2105-8-435
14. Bernardes JS, Carbone A, Zaverucha G (2011) A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models. BMC Bioinform 12(1):83. doi: 10.1186/1471-2105-12-83
15. Bhadra R, Sandhya S, Abhinandan KR, Chakrabarti S, Sowdhamini R, Srinivasan N (2006) Cascade PSI-BLAST web server: a remote homology search tool for relating protein domains. Nucleic Acids Res 34(Web Server issue):W143–W146. doi: 10.1093/nar/gkl157
16. Bhardwaj G, Ko KD, Hong Y, Zhang Z, Ho NL, Chintapalli SV et al (2012) PHYRN: a robust method for phylogenetic analysis of highly divergent sequences. PLoS ONE 7(4):e34261. doi: 10.1371/journal.pone.0034261
17. Biegert A, Mayer C, Remmert M, Söding J, Lupas AN (2006) The MPI bioinformatics toolkit for protein sequence analysis. Nucleic Acids Res 34(Web Server issue):W335–W339. doi: 10.1093/nar/gkl217
18. Blake JD, Cohen FE (2001) Pairwise sequence alignment below the twilight zone. J Mol Biol 307(2):721–735. doi: 10.1006/jmbi.2001.4495
19. Bork P, Sander C, Valencia A (1993) Convergent evolution of similar enzymatic function on different protein folds: the hexokinase, ribokinase, and galactokinase families of sugar kinases. Protein Sci 2(1):31–40. doi: 10.1002/pro.5560020104
20. Burmester T, Hankeln T (2014) Function and evolution of vertebrate globins. Acta Physiol 211(3):501–514. doi: 10.1111/apha.12312
21. Chang GS, Hong Y, Ko KD, Bhardwaj G, Holmes EC, Patterson RL, van Rossum DB (2008) Phylogenetic profiles reveal evolutionary relationships within the “twilight zone” of sequence similarity. Proc Natl Acad Sci USA 105(36):13474–13479. doi: 10.1073/pnas.0803860105
22. Comin M, Verzotto D (2011) The irredundant class method for remote homology detection of protein sequences. J Comput Biol 18(12):1819–1829. doi: 10.1089/cmb.2010.0171
23. Conant GC, Wolfe KH (2008) Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet 9(12):938–950. doi: 10.1038/nrg2482
24. Dalquen DA, Dessimoz C (2013) Bidirectional best hits miss many orthologs in duplication-rich clades such as plants and animals. Genome Biol Evol 5(10):1800–1806. doi: 10.1093/gbe/evt132
25. Darzentas N, Rigoutsos I, Ouzounis CA (2005) Sensitive detection of sequence similarity using combinatorial pattern discovery: a challenging study of two distantly related protein families. Proteins 61(4):926–937. doi: 10.1002/prot.20608
26. Datta RS, Meacham C, Samad B, Neyer C, Sjölander K (2009) Berkeley PHOG: PhyloFacts orthology group prediction web server. Nucleic Acids Res 37(Web Server issue):W84–W89. doi: 10.1093/nar/gkp373
27. Dietmann S, Fernandez-Fuentes N, Holm L (2002) Automated detection of remote homology. Curr Opin Struct Biol 12(3):362–367
28. Dong Y, Bogdanova A, Habermann B, Zachariae W, Ahringer J (2007) Identification of the C. elegans anaphase promoting complex subunit Cdc26 by phenotypic profiling and functional rescue in yeast. BMC Dev Biol 7(1):19. doi: 10.1186/1471-213X-7-19
29. Doolittle RF (1986) Of Urfs and Orfs: a primer on how to analyze derived amino acid sequences. In: University Science Books, Herndon, VA vol 29, pp 1–103. doi: 10.1002/jobm.3620290411
30. Dufayard J-F, Duret L, Penel S, Gouy M, Rechenmann F, Perrière G (2005) Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases. Bioinformatics 21(11):2596–2603. doi: 10.1093/bioinformatics/bti325
31. Eddy SR (2009) A new generation of homology search tools based on probabilistic inference. Genome Inform 23(1):205–211
32. Eyre TA, Wright MW, Lush MJ, Bruford EA (2007) HCOP: a searchable database of human orthology predictions. Brief Bioinform 8(1):2–5. doi: 10.1093/bib/bbl030
33. Fariselli P, Rossi I, Capriotti E, Casadio R (2007) The WWWH of remote homolog detection: the state of the art. Brief Bioinform 8(2):78–87. doi: 10.1093/bib/bbl032
34. Finn RD, Clements J, Arndt W, Miller BL, Wheeler TJ, Schreiber F et al (2015) HMMER web server: 2015 update. Nucleic Acids Res 43(W1):W30–W38. doi: 10.1093/nar/gkv397
35. Fitch WM (1970) Distinguishing homologous from analogous proteins. Syst Zool 19(2):99–113
36. Gabaldón T, Koonin EV (2013) Functional and evolutionary implications of gene orthology. Nat Rev Genet 14(5):360–366. doi: 10.1038/nrg3456
37. Galindo A, Hervás-Aguilar A, Rodríguez-Galán O, Vincent O, Arst HN, Tilburn J, Peñalva MA (2007) PalC, one of two Bro1 domain proteins in the fungal pH signalling pathway, localizes to cortical structures and binds Vps32. Traffic 8(10):1346–1364. doi: 10.1111/j.1600-0854.2007.00620.x
38. Ginalski K (2003) ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res 31(13):3804–3807. doi: 10.1093/nar/gkg504
39. Gray GS, Fitch WM (1983) Evolution of antibiotic resistance genes: the DNA sequence of a kanamycin resistance gene from Staphylococcus aureus. Mol Biol Evol 1(1):57–66
40. Grossberger R, Gieffers C, Zachariae W, Podtelejnikov AV, Schleiffer A, Nasmyth K et al (1999) Characterization of the DOC1/APC10 subunit of the yeast and the human anaphase-promoting complex. J Biol Chem 274(20):14500–14507
41. Gupta MK, Niyogi R, Misra M (2013) An alignment-free method to find similarity among protein sequences via the general form of Chou’s pseudo amino acid composition. SAR QSAR Environ Res 24(7):597–609. doi: 10.1080/1062936X.2013.773378
42. Heinicke S, Livstone MS, Lu C, Oughtred R, Kang F, Angiuoli SV et al (2007) The Princeton protein orthology database (P-POD): a comparative genomics analysis tool for biologists. PLoS ONE 2(8):e766. doi: 10.1371/journal.pone.0000766
43. Herrero J, Muffato M, Beal K, Fitzgerald S, Gordon L, Pignatelli M et al (2016) Ensembl comparative genomics resources. Database 2016:bav096. doi: 10.1093/database/bav096
44. Höhl M, Ragan MA (2007) Is multiple-sequence alignment required for accurate inference of phylogeny? Syst Biol 56(2):206–221. doi: 10.1080/10635150701294741
45. Höhl M, Rigoutsos I, Ragan MA (2006) Pattern-based phylogenetic distance estimation and tree reconstruction. Evol Bioinform Online 2:359–375
46. Huerta-Cepas J, Bueno A, Dopazo J, Gabaldón T (2007) PhylomeDB: a database for genome-wide collections of gene phylogenies. Nucleic Acids Res 36(Database issue):D491–D496. doi: 10.1093/nar/gkm899
47. Hutterer A, Berdnik D, Wirtz-Peitz F, Zigman M, Schleiffer A, Knoblich JA (2006) Mitotic activation of the kinase Aurora-A requires its binding partner Bora. Dev Cell 11(2):147–157. doi: 10.1016/j.devcel.2006.06.002
48. Ivliev AE, Sergeeva MG (2008) OrthoFocus: program for identification of orthologs in multiple genomes in family-focused studies. J Bioinform Comput Biol 6(4):811–824
49. Johnson LS, Eddy SR, Portugaly E (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinform 11(1):431. doi: 10.1186/1471-2105-11-431
50. Karwath A, King RD (2002) Homology induction: the use of machine learning to improve sequence similarity searches. BMC Bioinform 3(1):11. doi: 10.1186/1471-2105-3-11
51. Kim S, Kang J, Chung YJ, Li J, Ryu KH (2008) Clustering orthologous proteins across phylogenetically distant species. Proteins 71(3):1113–1122. doi: 10.1002/prot.21792
52. Kim B-H, Cheng H, Grishin NV (2009) HorA web server to infer homology between proteins using sequence and structural similarity. Nucleic Acids Res 37(Web Server issue):W532–W538. doi: 10.1093/nar/gkp328
53. Kim J, Ishiguro K-I, Nambu A, Akiyoshi B, Yokobayashi S, Kagami A et al (2015) Meikin is a conserved regulator of meiosis-I-specific kinetochore function. Nature 517(7535):466–471. doi: 10.1038/nature14097
54. Kitajima TS, Kawashima SA, Watanabe Y (2004) The conserved kinetochore protein shugoshin protects centromeric cohesion during meiosis. Nature 427(6974):510–517. doi: 10.1038/nature02312
55. Koonin EV (2005) Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 39(1):309–338. doi: 10.1146/annurev.genet.39.073003.114725
56. Kristensen DM, Wolf YI, Mushegian AR, Koonin EV (2011) Computational methods for Gene Orthology inference. Brief Bioinform 12(5):379–391. doi: 10.1093/bib/bbr030
57. Kriventseva EV, Rahman N, Espinosa O, Zdobnov EM (2008) OrthoDB: the hierarchical catalog of eukaryotic orthologs. Nucleic Acids Res 36(Database issue):D271–D275. doi: 10.1093/nar/gkm845
58. Kueng S, Hegemann B, Peters BH, Lipp JJ, Schleiffer A, Mechtler K, Peters J-M (2006) Wapl controls the dynamic association of cohesin with chromatin. Cell 127(5):955–967. doi: 10.1016/j.cell.2006.09.040
59. Kumar S (2011) Remote homologue identification of Drosophila GAGA factor in mouse. Bioinformation 7(1):29–32
60. Kumar A, Cowen L (2009) Augmented training of hidden Markov models to recognize remote homologs via simulated evolution. Bioinformatics 25(13):1602–1608. doi: 10.1093/bioinformatics/btp265
61. Kuziemko A, Honig B, Petrey D (2011) Using structure to explore the sequence alignment space of remote homologs. PLoS Comput Biol 7(10):e1002175. doi: 10.1371/journal.pcbi.1002175
62. Lawo S, Bashkurov M, Mullin M, Ferreria MG, Kittler R, Habermann B et al (2009) HAUS, the 8-subunit human Augmin complex, regulates centrosome and spindle integrity. Curr Biol 19(10):816–826. doi: 10.1016/j.cub.2009.04.033
63. Lee MM, Bundschuh R, Chan MK (2008) Distant homology detection using a LEngth and STructure-based sequence alignment tool (LESTAT). Proteins 71(3):1409–1419. doi: 10.1002/prot.21830
64. Li L, Stoeckert CJ, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13(9):2178–2189. doi: 10.1101/gr.1224503
65. Liu K, Raghavan S, Nelesen S, Linder CR, Warnow T (2009) Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science 324(5934):1561–1564. doi: 10.1126/science.1171243
66. Liu K, Warnow TJ, Holder MT, Nelesen SM, Yu J, Stamatakis AP, Linder CR (2012) SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Syst Biol 61(1):90–106. doi: 10.1093/sysbio/syr095
67. Liu B, Zhang D, Xu R, Xu J, Wang X, Chen Q et al (2014) Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics 30(4):472–479. doi: 10.1093/bioinformatics/btt709
68. Liu B, Chen J, Wang X (2015) Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis. Mol Genet Genomics 290(5):1919–1931. doi: 10.1007/s00438-015-1044-4
69. Makarova KS, Koonin EV, Kelman Z (2012) The CMG (CDC45/RecJ, MCM, GINS) complex is a conserved component of the DNA replication system in all archaea and eukaryotes. Biol Direct 7(1):7. doi: 10.1186/1745-6150-7-7
70. Maulik U, Sarkar A (2013) Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels. PLoS ONE 8(2):e46468. doi: 10.1371/journal.pone.0046468
71. Meier A, Söding J (2015) Context similarity scoring improves protein sequence alignments in the midnight zone. Bioinformatics 31(5):674–681. doi: 10.1093/bioinformatics/btu697
72. Mina JG, Okada Y, Wansadhipathi-Kannangara NK, Pratt S, Shams-Eldin H, Schwarz RT et al (2010) Functional analyses of differentially expressed isoforms of the Arabidopsis inositol phosphorylceramide synthase. Plant Mol Biol 73(4–5):399–407. doi: 10.1007/s11103-010-9626-3
73. Mirarab S, Nguyen N, Warnow T (2012) SEPP: SATé-enabled phylogenetic placement. In: Pacific Symposium on Biocomputing, pp 247–258. doi: 10.1142/9789814366496_0024
74. Muda HM, Saad P, Othman RM (2011) Remote protein homology detection and fold recognition using two-layer support vector machine classifiers. Comput Biol Med 41(8):687–699. doi: 10.1016/j.compbiomed.2011.06.004
75. Mudgal R, Sowdhamini R, Chandra N, Srinivasan N, Sandhya S (2014) Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability. J Mol Biol 426(4):962–979. doi: 10.1016/j.jmb.2013.11.026
76. Mudgal R, Sandhya S, Kumar G, Sowdhamini R, Chandra NR, Srinivasan N (2015) NrichD database: sequence databases enriched with computationally designed protein-like sequences aid in remote homology detection. Nucleic Acids Res 43(Database issue):D300–D305. doi: 10.1093/nar/gku888
77. Murzin AG, Bateman A (1997) Distant homology recognition using structural classification of proteins. Proteins Suppl 1:105–112
78. Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247(4):536–540. doi: 10.1006/jmbi.1995.0159
79. NCBI Resource Coordinators (2016) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44(D1):D7–D19. doi: 10.1093/nar/gkv1290
80. Nehrt NL, Clark WT, Radivojac P, Hahn MW (2011) Testing the ortholog conjecture with comparative functional genomic data from mammals. PLoS Comput Biol 7(6):e1002073. doi: 10.1371/journal.pcbi.1002073
81. Nelesen S, Liu K, Wang L-S, Linder CR, Warnow T (2012) DACTAL: divide-and-conquer trees (almost) without alignments. Bioinformatics 28(12):i274–i282. doi: 10.1093/bioinformatics/bts218
82. Nishiyama T, Ladurner R, Schmitz J, Kreidl E, Schleiffer A, Bhaskara V et al (2010) Sororin mediates sister chromatid cohesion by antagonizing Wapl. Cell 143(5):737–749. doi: 10.1016/j.cell.2010.10.031
83. Östlund G, Schmitt T, Forslund K, Köstler T, Messina DN, Roopra S et al (2010) InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res 38(Database issue):D196–D203. doi: 10.1093/nar/gkp931
84. Ozlü N, Srayko M, Kinoshita K, Habermann B, O’toole ET, Müller-Reichert T et al (2005) An essential function of the C. elegans ortholog of TPX2 is to localize activated aurora A kinase to mitotic spindles. Dev Cell 9(2):237–248. doi: 10.1016/j.devcel.2005.07.002
85. Pelletier L, Ozlü N, Hannak E, Cowan C, Habermann B, Ruer M et al (2004) The Caenorhabditis elegans centrosomal protein SPD-2 is required for both pericentriolar material recruitment and centriole duplication. Curr Biol 14(10):863–873. doi: 10.1016/j.cub.2004.04.012
86. Penel S, Arigon A-M, Dufayard J-F, Sertier A-S, Daubin V, Duret L et al (2009) Databases of homologous gene families for comparative genomics. BMC Bioinform 10(Suppl 6):S3. doi: 10.1186/1471-2105-10-S6-S3
87. Penkett CJ, Morris JA, Wood V, Bähler J (2006) YOGY: a web-based, integrated database to retrieve protein orthologs and associated gene ontology terms. Nucleic Acids Res 34(Web Server issue):W330–W334. doi: 10.1093/nar/gkl311
88. Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North AC (1960) Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-Å resolution, obtained by X-ray analysis. Nature 185(4711):416–422
89. Powell S, Szklarczyk D, Trachana K, Roth A, Kuhn M, Muller J et al (2011) eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Res 40(D1):D284–D289. doi: 10.1093/nar/gkr1060
90. Proost S, Van Bel M, Vaneechoutte D, Van de Peer Y, Inzé D, Mueller-Roeber B, Vandepoele K (2015) PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res 43(Database issue):D974–D981. doi: 10.1093/nar/gku986
91. Pryszcz LP, Huerta-Cepas J, Gabaldón T (2011) MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score. Nucleic Acids Res 39(5):e32. doi: 10.1093/nar/gkq953
92. Rabitsch KP, Gregan J, Schleiffer A, Javerzat J-P, Eisenhaber F, Nasmyth K (2004) Two fission yeast homologs of Drosophila Mei-S332 are required for chromosome segregation during meiosis I and II. Curr Biol 14(4):287–301. doi: 10.1016/j.cub.2004.01.051
93. Remmert M, Biegert A, Hauser A, Söding J (2012) HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9(2):173–175. doi: 10.1038/nmeth.1818
94. Rost B (1999) Twilight zone of protein sequence alignments. Protein Eng 12(2):85–94
95. Ruan J, Li H, Chen Z, Coghlan A, Coin LJM, Guo Y et al (2008) TreeFam: 2008 update. Nucleic Acids Res 36(Database issue):D735–D740. doi: 10.1093/nar/gkm1005
96. Sánchez-Díaz A, González I, Arellano M, Moreno S (1998) The Cdk inhibitors p25rum1 and p40SIC1 are functional homologues that play similar roles in the regulation of the cell cycle in fission and budding yeast. J Cell Sci 111(Pt 6):843–851
97. Sandhya S, Mudgal R, Jayadev C, Abhinandan KR, Sowdhamini R, Srinivasan N (2012) Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins. Mol BioSyst 8(8):2076–2084. doi: 10.1039/c2mb25113b
98. Schreiber F, Sonnhammer ELL (2013) Hieranoid: hierarchical orthology inference. J Mol Biol 425(11):2072–2081. doi: 10.1016/j.jmb.2013.02.018
99. Schwickart M, Havlis J, Habermann B, Bogdanova A, Camasses A, Oelschlaegel T et al (2004) Swm1/Apc13 is an evolutionarily conserved subunit of the anaphase-promoting complex stabilizing the association of Cdc16 and Cdc27. Mol Cell Biol 24(8):3562–3576. doi: 10.1128/MCB.24.8.3562-3576.2004
100. Sémon M, Wolfe KH (2007) Consequences of genome duplication. Curr Opin Genet Dev 17(6):505–512. doi: 10.1016/j.gde.2007.09.007
101. Shah AR, Oehmen CS, Webb-Robertson B-J (2008) SVM-HUSTLE: an iterative semi-supervised machine learning approach for pairwise protein remote homology detection. Bioinformatics 24(6):783–790. doi: 10.1093/bioinformatics/btn028
102. Shevchenko A, Roguev A, Schaft D, Buchanan L, Habermann B, Sakalar C et al (2008) Chromatin Central: towards the comparative proteome by accurate mapping of the yeast proteomic environment. Genome Biol 9(11):R167. doi: 10.1186/gb-2008-9-11-r167
103. Shi G, Zhang L, Jiang T (2010) MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement. BMC Bioinform 11(1):10. doi: 10.1186/1471-2105-11-10
104. Sinha S, Lynn AM (2014) HMM-ModE: implementation, benchmarking and validation with HMMER3. BMC Res Notes 7(1):483. doi: 10.1186/1756-0500-7-483
105. Söding J, Biegert A, Lupas AN (2005) The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 33(Web Server issue):W244–W248. doi: 10.1093/nar/gki408
106. Söding J, Remmert M, Biegert A, Lupas AN (2006) HHsenser: exhaustive transitive profile search using HMM-HMM comparison. Nucleic Acids Res 34(Web Server issue):W374–W378. doi: 10.1093/nar/gkl195
107. Sonnhammer ELL, Östlund G (2015) InParanoid 8: orthology analysis between 273 proteomes, mostly eukaryotic. Nucleic Acids Res 43(Database issue):D234–D239. doi: 10.1093/nar/gku1203
108. Stingele J, Habermann B, Jentsch S (2015) DNA-protein crosslink repair: proteases as DNA repair enzymes. Trends Biochem Sci 40(2):67–71. doi: 10.1016/j.tibs.2014.10.012
109. Studer RA, Robinson-Rechavi M (2009) How confident can we be that orthologs are similar, but paralogs differ? Trends Genet 25(5):210–216. doi: 10.1016/j.tig.2009.03.004
110. Szklarczyk R, Wanschers BF, Cuypers TD, Esseling JJ, Riemersma M, van den Brand MA et al (2012) Iterative orthology prediction uncovers new mitochondrial proteins and identifies C12orf62 as the human ortholog of COX14, a protein involved in the assembly of cytochrome c oxidase. Genome Biol 13(2):R12. doi: 10.1186/gb-2012-13-2-r12
111. Szklarczyk R, Wanschers BFJ, Nijtmans LG, Rodenburg RJ, Zschocke J, Dikow N et al (2013) A mutation in the FAM36A gene, the human ortholog of COX20, impairs cytochrome c oxidase assembly and is associated with ataxia and muscle hypotonia. Hum Mol Genet 22(4):656–667. doi: 10.1093/hmg/dds473
112. Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278(5338):631–637
113. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E (2009) EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res 19(2):327–335. doi: 10.1101/gr.073585.107
114. Vinga S, Almeida J (2003) Alignment-free sequence comparison-a review. Bioinformatics 19(4):513–523
115. Vogt G, Etzold T, Argos P (1995) An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J Mol Biol 249(4):816–831. doi: 10.1006/jmbi.1995.0340
116. Wagner I, Volkmer M, Sharan M, Villaveces JM, Oswald F, Surendranath V, Habermann BH (2014) morFeus: a web-based program to detect remotely conserved orthologs using symmetrical best hits and orthology network scoring. BMC Bioinform 15(1):263. doi: 10.1186/1471-2105-15-263
117. Wang Y, Levy DE (2006) C. elegans STAT: evolution of a regulatory switch. FASEB J 20(10):1641–1652. doi: 10.1096/fj.06-6051com
118. Watson HC, Kendrew JC (1961) The amino-acid sequence of sperm whale myoglobin. Comparison between the amino-acid sequences of sperm whale myoglobin and of human hemoglobin. Nature 190:670–672
119. Wieser D, Niranjan M (2009) Remote homology detection using a kernel method that combines sequence and secondary-structure similarity scores. In Silico Biol 9(3):89–103
120. Wolf YI, Koonin EV (2012) A tight link between orthologs and bidirectional best hits in bacterial and archaeal genomes. Genome Biol Evol 4(12):1286–1294. doi: 10.1093/gbe/evs100
121. Wu S, Zhang Y (2008) MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins 72(2):547–556. doi: 10.1002/prot.21945
122. Yamada K, Tomii K (2014) Revisiting amino acid substitution matrices for identifying distantly related proteins. Bioinformatics 30(3):317–325. doi: 10.1093/bioinformatics/btt694
123. Yang Y, Tantoso E, Li K-B (2008) Remote protein homology detection using recurrence quantification analysis and amino acid physicochemical properties. J Theor Biol 252(1):145–154. doi: 10.1016/j.jtbi.2008.01.028
124. Yona G, Levitt M (2002) Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. J Mol Biol 315(5):1257–1275. doi: 10.1006/jmbi.2001.5293
125. Yu C, Desai V, Cheng L, Reifman J (2012) QuartetS-DB: a large-scale orthology database for prokaryotes and eukaryotes inferred by evolutionary evidence. BMC Bioinform 13(1):143. doi: 10.1186/1471-2105-13-143
126. Zhang Z, Schäffer AA, Miller W, Madden TL, Lipman DJ, Koonin EV, Altschul SF (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res 26(17):3986–3990

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

1. Max Planck Institute of Biochemistry, Martinsried, Germany
2. Aix Marseille Université, CNRS, Marseille, France
