New Genome Similarity Measures Based on Conserved Gene Adjacencies

  • Luis Antonio B. Kowada
  • Daniel Doerr
  • Simone Dantas
  • Jens Stoye
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9649)

Abstract

Many important questions in molecular biology, evolution and biomedicine can be addressed by comparative genomics approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example to elucidate the phylogenetic relationships between species.

The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. The genes are then grouped into gene families in a preprocessing step, and comparative genomics methods that accept this kind of input are called gene family-based. The most powerful, but also most complex, models avoid this preprocessing of the input data and instead integrate the family assignment into the comparative analysis. Such methods are called gene family-free.
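To make the simplest model concrete: when both genomes contain the same genes, each exactly once, a natural adjacency-based similarity is the number of gene adjacencies the two genomes share. The sketch below assumes a single linear chromosome per genome and ignores gene orientation; both are simplifications made here for illustration, and `conserved_adjacencies` is a hypothetical helper, not code from the paper.

```python
def adjacencies(genome):
    """Unordered pairs of consecutive genes on one linear, unsigned chromosome."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}


def conserved_adjacencies(a, b):
    """Number of adjacencies shared by genomes a and b (single-copy model)."""
    # Simplest model: identical gene content, every gene present exactly once.
    if sorted(a) != sorted(b) or len(set(a)) != len(a):
        raise ValueError("simplest model: same genes, each exactly once")
    return len(adjacencies(a) & adjacencies(b))


A = [1, 2, 3, 4, 5]
B = [1, 2, 4, 3, 5]
print(conserved_adjacencies(A, B))  # 2: {1,2} and {3,4} are shared
```

Because adjacencies are stored as unordered pairs, reading a genome backwards does not change its adjacency set.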

In this paper, we study an intermediate approach between family-based and family-free genomic similarity measures. The model, called gene connections, is more flexible than the family-based model, while the resulting data structure is less complex than in the family-free approach. This intermediate status allows us to achieve results comparable to those of family-free methods at running times similar to those of the family-based approach.
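The abstract does not define the gene connection model in detail. As a loose illustration only, assume each genome is a list of uniquely named genes and that a hypothetical set of `connections` links genes of one genome to similar genes of the other, possibly many-to-many. One could then count pairs of adjacencies (one from each genome) whose genes are connected either in parallel or crosswise. This unconstrained count is an assumption of this sketch and is not one of the paper's three measures; in particular, it lets one gene take part in several counted pairs, as the usage example shows.

```python
from collections import defaultdict


def adjacencies(genome):
    """Unordered pairs of consecutive genes on one linear chromosome."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}


def connected_adjacency_pairs(a, b, connections):
    """Hypothetical, unconstrained count of adjacency pairs linked by connections.

    A pair (adjacency {x, y} of a, adjacency {u, v} of b) is counted once if
    x~u and y~v, or x~v and y~u. No one-to-one restriction is imposed.
    """
    for genome in (a, b):
        if len(set(genome)) != len(genome):
            raise ValueError("gene identifiers must be unique within a genome")
    partners = defaultdict(set)  # gene of a -> connected genes of b
    for x, u in connections:
        partners[x].add(u)
    adj_b = adjacencies(b)
    count = 0
    for x, y in zip(a, a[1:]):
        matched = set()  # distinct adjacencies of b connected to {x, y}
        for u in partners[x]:
            for v in partners[y]:
                candidate = frozenset((u, v))
                if u != v and candidate in adj_b:
                    matched.add(candidate)
        count += len(matched)
    return count


A = ["a1", "a2"]
B = ["b1", "b2", "b3"]
links = {("a1", "b2"), ("a2", "b1"), ("a2", "b3")}
print(connected_adjacency_pairs(A, B, links))  # 2
```

Here the single adjacency {a1, a2} is counted twice, once against {b1, b2} and once against {b2, b3}, because a2 is connected to both b1 and b3. With identity connections between two single-copy genomes, the count reduces to the number of conserved adjacencies.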

Within the gene connection model, we define three variants of genomic similarity measures that have different expressive power. We give polynomial-time algorithms for two of them, while we show NP-hardness of the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, demonstrating the applicability and performance of our newly defined similarity measures.
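The abstract does not say how the measures are generalized. One established way to make adjacency counts tolerant of small local rearrangements is the notion of generalized gene adjacencies, in which two genes count as adjacent when at most θ other genes lie between them (θ = 0 recovers ordinary adjacencies). A minimal sketch in the single-copy, unsigned, single-chromosome setting, assuming that notion; it is not necessarily the paper's exact definition, and the function names are hypothetical:

```python
def theta_adjacencies(genome, theta):
    """Unordered gene pairs separated by at most `theta` intervening genes."""
    pairs = set()
    for i, gene in enumerate(genome):
        # Partners lie at positions i+1 .. i+theta+1 (bounded by genome length).
        for j in range(i + 1, min(i + theta + 2, len(genome))):
            pairs.add(frozenset((gene, genome[j])))
    return pairs


def conserved_theta_adjacencies(a, b, theta):
    """Number of theta-adjacencies present in both single-copy genomes."""
    return len(theta_adjacencies(a, theta) & theta_adjacencies(b, theta))


A = [1, 2, 3, 4, 5]
B = [1, 3, 2, 4, 5]  # genes 2 and 3 swapped locally
print(conserved_theta_adjacencies(A, B, 0))  # 2
print(conserved_theta_adjacencies(A, B, 1))  # 6
```

Swapping two neighbouring genes destroys two of the four ordinary adjacencies of A, but with θ = 1 six of its seven generalized adjacencies survive.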

Notes

Acknowledgements

The research of LABK and SD is partially supported by FAPERJ and CNPq. This work was performed while JS was on sabbatical as Special Visiting Researcher at UFF in Niterói, Brazil, funded by Ciência sem Fronteiras/CAPES.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Luis Antonio B. Kowada (1)
  • Daniel Doerr (2)
  • Simone Dantas (1)
  • Jens Stoye (1, 3)
  1. Universidade Federal Fluminense, Niterói, Brazil
  2. École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  3. Faculty of Technology, Center for Biotechnology, Bielefeld University, Bielefeld, Germany