Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity

Abstract

Inferring remote orthologs is a persistent challenge in computational biology. Identifying orthologs is necessary for evolutionary analyses, comparative genomics, genome annotation, functional prediction, and the sensible planning of experimental studies. If orthologous relationships are missed because of low sequence conservation, a significant amount of information is lost. Given their fast evolutionary rates, remote orthologs can only be identified at the protein level. A pair of proteins that has evolved by speciation and shares less than 30% sequence identity can be defined as remote orthologs. Their high sequence divergence prevents their unambiguous recognition as orthologous proteins and does not allow a reliable interpretation of their evolutionary relationship. Thus, many remote orthologs remain hidden to date. In this article, I review current methods for remote orthology inference, highlight existing problems, and discuss potential solutions for discovering remote orthologs.
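
The 30% identity criterion in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter: the function names, the gap-handling convention (identity counted only over columns where both sequences have a residue), and the toy sequences are assumptions made for illustration. Falling below the threshold is only one half of the definition; the sketch does not test whether two proteins arose by speciation.

```python
# Hypothetical helper names; the 30% cutoff is the one stated in the abstract.
REMOTE_IDENTITY_THRESHOLD = 30.0


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned protein sequences.

    Assumption: gaps are written as '-' and columns containing a gap in
    either sequence are skipped. Other denominators (e.g. alignment length
    or the shorter sequence length) are also in common use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0  # nothing to compare
    return 100.0 * identical / compared


def below_remote_threshold(identity_percent: float,
                           threshold: float = REMOTE_IDENTITY_THRESHOLD) -> bool:
    """True if the pair falls below the identity cutoff for remote orthologs."""
    return identity_percent < threshold


if __name__ == "__main__":
    # Toy alignment, not real protein data.
    identity = percent_identity("ACDE-G", "ACDFHG")
    print(identity, below_remote_threshold(identity))
```

In practice the alignment itself is unreliable at such low identity, which is why a simple cutoff like this can flag candidate pairs but cannot establish orthology.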

Keywords

  • Hidden Markov Model
  • Orthologous Relationship
  • Orthology Assignment
  • Large Evolutionary Distance
  • Remote Homology Detection

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Chapter details

  • DOI: 10.1007/978-3-319-41324-2_22
  • Chapter length: 27 pages
  • eBook ISBN: 978-3-319-41324-2


Acknowledgements

I would like to thank Frank Schnorrer and Friedhelm Pfeiffer for critical reading of the manuscript. This work was supported by the Max Planck Society and by the CNRS.

Author information

Corresponding author

Correspondence to Bianca Hermine Habermann.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Habermann, B.H. (2016). Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity. In: Pontarotti, P. (eds) Evolutionary Biology. Springer, Cham. https://doi.org/10.1007/978-3-319-41324-2_22
