Integer Linear Programs for Discovering Approximate Gene Clusters

  • Sven Rahmann
  • Gunnar W. Klau
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4175)


We contribute to the discussion about the concept of approximate conserved gene clusters by presenting a class of definitions that (1) can be written as integer linear programs (ILPs) and (2) allow several variations that include existing definitions such as common intervals, r-windows, and max-gap clusters or gene teams. While the ILP formulation does not directly lead to optimal algorithms, it provides unprecedented generality and is competitive in practice for those cases where efficient algorithms are known. It allows for the first time a non-heuristic study of large approximate clusters in several genomes. Source code and datasets are available at .
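
For orientation, the simplest of the existing definitions named above, common intervals, can be illustrated with a small brute-force sketch. It is not the ILP formulation of the paper. It assumes two genomes given as permutations of the same gene set, and the function name `common_intervals` is hypothetical. A gene set counts as a common interval here if its genes occupy consecutive positions in both genomes.

```python
def common_intervals(genome_a, genome_b):
    """Return all gene sets of size >= 2 that are contiguous in both genomes.

    Assumption: genome_a and genome_b are permutations of the same genes.
    Brute force over all intervals of genome_a, O(n^2) interval checks.
    """
    # Position of every gene in the second genome.
    pos_b = {gene: i for i, gene in enumerate(genome_b)}
    clusters = []
    n = len(genome_a)
    for start in range(n):
        lo = hi = pos_b[genome_a[start]]
        for end in range(start + 1, n):
            p = pos_b[genome_a[end]]
            lo, hi = min(lo, p), max(hi, p)
            # genome_a[start..end] is contiguous in genome_b exactly when
            # its positions there span as many slots as it has genes.
            if hi - lo == end - start:
                clusters.append(frozenset(genome_a[start:end + 1]))
    return clusters


if __name__ == "__main__":
    for cluster in common_intervals([1, 2, 3, 4, 5], [3, 1, 2, 5, 4]):
        print(sorted(cluster))
```

Approximate definitions such as r-windows or max-gap clusters relax the strict contiguity test used in this sketch.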


Keywords: Gene Cluster, Gene Content, Integer Linear Program, Target Function, Genome Selection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sven Rahmann (1, 2)
  • Gunnar W. Klau (3, 4)
  1. Algorithms and Statistics for Systems Biology group, Genome Informatics, Technische Fakultät, Bielefeld University, Bielefeld, Germany
  2. International NRW Graduate School in Bioinformatics and Genome Research
  3. Mathematics in Life Sciences group, Dept. of Mathematics and Computer Science, Free University Berlin, Berlin, Germany
  4. DFG Research Center Matheon “Mathematics for key technologies”, Berlin
