The Statistical Significance of Max-Gap Clusters

Hoberman, Rose; Sankoff, David; Durand, Dannie

doi:10.1007/978-3-540-32290-0_5

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3388)

Included in the following conference series:

RECOMB Workshop on Comparative Genomics

Abstract

Identifying gene clusters, genomic regions that share local similarities in gene organization, is a prerequisite for many different types of genomic analyses, including operon prediction, reconstruction of chromosomal rearrangements, and detection of whole-genome duplications. A number of formal definitions of gene clusters have been proposed, as well as methods for finding such clusters and/or statistical tests for determining their significance. Unfortunately, there is very little overlap between previously published rigorous analytical statistical tests and the definitions used in practice. In this paper, we consider the max-gap cluster: a contiguous region containing a maximal set of homologs, where the number of non-homologous genes between pairs of adjacent homologs is never greater than a predefined, fixed parameter, g. Although this is one of the models most widely used in practice, currently the statistical significance of max-gap clusters can only be evaluated using Monte Carlo simulations because no analytical statistical tests have been developed for it. We give exact expressions for the probability of observing such a cluster by chance, assuming a simple reference-region scenario and random gene order, as well as more efficient methods for approximating this probability. We use these methods to identify which regions of the parameter space yield clusters that are statistically significant. Finally, we discuss some of the challenges in extending this model to whole-genome comparison.
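
For readers who want a concrete picture of the max-gap criterion and of the Monte Carlo baseline mentioned above, the following is a minimal sketch, not the analytical method of the paper. It assumes a simplified setting: a genome of `n` genes in random order, of which `m` are homologs of genes in a reference region, and it asks how often all `m` form a single max-gap cluster with gap parameter `g`. The function names, the parameters `n`, `m`, `trials`, and `seed`, and the restriction to complete clusters are illustrative assumptions.

```python
import random


def is_max_gap_cluster(positions, g):
    """Return True if adjacent marked genes are separated by at most g other genes.

    `positions` holds the indices of the homologs along the genome.
    """
    ordered = sorted(positions)
    # Number of intervening genes between neighbours a < b is b - a - 1.
    return all(b - a - 1 <= g for a, b in zip(ordered, ordered[1:]))


def estimate_cluster_probability(n, m, g, trials=10_000, seed=0):
    """Monte Carlo estimate of the chance that m randomly placed homologs
    in a genome of n genes form one max-gap cluster (assumed null model:
    uniformly random gene order)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        positions = rng.sample(range(n), m)  # m distinct random positions
        if is_max_gap_cluster(positions, g):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    print(estimate_cluster_probability(n=1000, m=5, g=10))
```

A simulation of this kind becomes expensive when the probability of interest is very small, since many trials are needed to observe even one chance cluster; this is one reason exact or approximate analytical expressions are attractive.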

Author information

Authors and Affiliations

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA
Rose Hoberman
Department of Mathematics and Statistics, University of Ottawa, Ontario, Canada
David Sankoff
Departments of Biological Sciences and Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Dannie Durand

Editor information

Editors and Affiliations

KTH, Royal Institute of Technology, Stockholm, Sweden
Jens Lagergren

Cite this paper

Hoberman, R., Sankoff, D., Durand, D. (2005). The Statistical Significance of Max-Gap Clusters. In: Lagergren, J. (ed.) Comparative Genomics. RCG 2004. Lecture Notes in Computer Science, vol 3388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32290-0_5

DOI: https://doi.org/10.1007/978-3-540-32290-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24455-4
Online ISBN: 978-3-540-32290-0
eBook Packages: Computer Science, Computer Science (R0)