Abstract
Identifying gene clusters, genomic regions that share local similarities in gene organization, is a prerequisite for many different types of genomic analyses, including operon prediction, reconstruction of chromosomal rearrangements, and detection of whole-genome duplications. A number of formal definitions of gene clusters have been proposed, as well as methods for finding such clusters and/or statistical tests for determining their significance. Unfortunately, there is very little overlap between previously published rigorous analytical statistical tests and the definitions used in practice. In this paper, we consider the max-gap cluster: a contiguous region containing a maximal set of homologs, where the number of non-homologous genes between pairs of adjacent homologs is never greater than a predefined, fixed parameter, g. Although this is one of the models most widely used in practice, currently the statistical significance of max-gap clusters can only be evaluated using Monte Carlo simulations because no analytical statistical tests have been developed for it. We give exact expressions for the probability of observing such a cluster by chance, assuming a simple reference-region scenario and random gene order, as well as more efficient methods for approximating this probability. We use these methods to identify which regions of the parameter space yield clusters that are statistically significant. Finally, we discuss some of the challenges in extending this model to whole-genome comparison.
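To make the max-gap definition concrete, the following is a minimal sketch of the Monte Carlo baseline that the abstract says is currently required. It is not the exact expressions developed in the paper. It assumes, for illustration only, that the reference-region scenario can be modelled as m genes of interest placed uniformly at random among n genome positions, and it estimates the chance that at least h of them fall into a single max-gap cluster with gap parameter g. The function names and the parameters n, m, h, trials, and seed are hypothetical.

```python
import random


def max_gap_cluster_sizes(positions, g):
    """Return the sizes of the maximal max-gap clusters.

    `positions` are the genome indices of the genes of interest. Two
    neighbouring genes of interest belong to the same cluster when at most
    `g` other genes lie between them.
    """
    ordered = sorted(positions)
    if not ordered:
        return []
    sizes = []
    run = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev - 1 <= g:   # number of intervening genes
            run += 1
        else:
            sizes.append(run)     # gap too large: close the current cluster
            run = 1
    sizes.append(run)
    return sizes


def estimate_cluster_probability(n, m, h, g, trials=10000, seed=0):
    """Monte Carlo estimate of P(some max-gap cluster holds >= h genes).

    Assumption: random gene order is modelled by drawing m distinct
    positions uniformly from a linear genome of n positions.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        positions = rng.sample(range(n), m)
        if max(max_gap_cluster_sizes(positions, g), default=0) >= h:
            hits += 1
    return hits / trials
```

For example, `estimate_cluster_probability(1000, 10, 5, 3)` would approximate how often five or more of ten genes of interest cluster by chance in a 1000-gene genome when up to three intervening genes are tolerated. The estimate depends on the number of trials, which is the cost an analytical test avoids.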
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hoberman, R., Sankoff, D., Durand, D. (2005). The Statistical Significance of Max-Gap Clusters. In: Lagergren, J. (ed.) Comparative Genomics. RCG 2004. Lecture Notes in Computer Science, vol 3388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32290-0_5
DOI: https://doi.org/10.1007/978-3-540-32290-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24455-4
Online ISBN: 978-3-540-32290-0
eBook Packages: Computer Science, Computer Science (R0)