The Statistical Significance of Max-Gap Clusters

  • Conference paper
Comparative Genomics (RCG 2004)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3388)

Abstract

Identifying gene clusters, genomic regions that share local similarities in gene organization, is a prerequisite for many different types of genomic analyses, including operon prediction, reconstruction of chromosomal rearrangements, and detection of whole-genome duplications. A number of formal definitions of gene clusters have been proposed, as well as methods for finding such clusters and/or statistical tests for determining their significance. Unfortunately, there is very little overlap between previously published rigorous analytical statistical tests and the definitions used in practice. In this paper, we consider the max-gap cluster: a contiguous region containing a maximal set of homologs, where the number of non-homologous genes between pairs of adjacent homologs is never greater than a predefined, fixed parameter, g. Although this is one of the models most widely used in practice, currently the statistical significance of max-gap clusters can only be evaluated using Monte Carlo simulations because no analytical statistical tests have been developed for it. We give exact expressions for the probability of observing such a cluster by chance, assuming a simple reference-region scenario and random gene order, as well as more efficient methods for approximating this probability. We use these methods to identify which regions of the parameter space yield clusters that are statistically significant. Finally, we discuss some of the challenges in extending this model to whole-genome comparison.
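The max-gap definition and the Monte Carlo baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method or their analytical test. It assumes a simplified setting: a genome is a line of `n` gene positions, `m` of them are marked as genes of interest placed uniformly at random, and a cluster counts as a hit if it contains at least `h` marked genes. The function names and the parameters `n`, `m`, `h`, `trials`, and `seed` are hypothetical; only `g` comes from the abstract.

```python
import random


def max_gap_clusters(positions, g):
    """Split marked-gene positions into maximal runs in which adjacent
    marked genes are separated by at most g unmarked genes."""
    clusters = []
    current = []
    for p in sorted(positions):
        # p - current[-1] - 1 counts the genes strictly between two marked genes.
        if current and p - current[-1] - 1 > g:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    return clusters


def monte_carlo_p_value(n, m, h, g, trials=10000, seed=0):
    """Estimate, by simulation, the chance that m marked genes placed
    uniformly at random among n positions contain a max-gap cluster
    with at least h marked genes (an assumed, simplified null model)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        positions = rng.sample(range(n), m)
        if any(len(c) >= h for c in max_gap_clusters(positions, g)):
            hits += 1
    return hits / trials
```

For example, `max_gap_clusters([0, 1, 3, 10, 11], g=1)` returns `[[0, 1, 3], [10, 11]]`: the single gene between positions 1 and 3 is within the allowed gap, while the six genes between positions 3 and 10 are not. A simulation of this kind needs many trials to resolve small probabilities, which is the cost that exact or approximate analytical expressions are meant to avoid.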

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hoberman, R., Sankoff, D., Durand, D. (2005). The Statistical Significance of Max-Gap Clusters. In: Lagergren, J. (eds) Comparative Genomics. RCG 2004. Lecture Notes in Computer Science, vol 3388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32290-0_5

  • DOI: https://doi.org/10.1007/978-3-540-32290-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24455-4

  • Online ISBN: 978-3-540-32290-0

  • eBook Packages: Computer Science (R0)
