Cluster Editing

  • Sebastian Böcker
  • Jan Baumbach
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7921)

Abstract

The Cluster Editing problem asks us to transform a graph into a disjoint union of cliques using a minimum number of edge modifications (insertions and deletions). Although the problem has been proven NP-complete several times, it has attracted much research from both the theoretical and the applied side. It has inspired numerous algorithms in bioinformatics that aim at clustering entities such as genes, proteins, phenotypes, or patients. In this paper, we review exact and heuristic methods that have been proposed for the Cluster Editing problem, as well as applications of these algorithms to biological problems.
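
To make the problem statement concrete, the following is a minimal sketch and not one of the methods reviewed in the paper. It assumes an unweighted graph on vertices 0..n-1 given as a list of edges, and the function names (editing_cost, partitions, brute_force_cluster_editing) are hypothetical, chosen for illustration. It tries every partition of the vertex set and counts the edge insertions and deletions needed to reach it, so it is only usable on very small graphs.

```python
from itertools import combinations


def editing_cost(n, edges, labels):
    """Count the edge insertions plus deletions needed to turn the graph on
    vertices 0..n-1 into the cluster graph described by labels."""
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        has_edge = frozenset((u, v)) in edge_set
        # A pair needs one edit if it is joined but split across clusters,
        # or placed in the same cluster but not joined.
        if same_cluster != has_edge:
            cost += 1
    return cost


def partitions(n):
    """Yield every partition of 0..n-1 as a list of cluster labels
    (restricted growth strings, so each partition appears exactly once)."""
    def extend(labels, used):
        if len(labels) == n:
            yield list(labels)
            return
        for c in range(used + 1):
            labels.append(c)
            yield from extend(labels, max(used, c + 1))
            labels.pop()
    yield from extend([], 0)


def brute_force_cluster_editing(n, edges):
    """Return (minimum number of edits, one optimal labelling).
    Exhaustive search: illustrative only, exponential in n."""
    best = min(partitions(n), key=lambda labels: editing_cost(n, edges, labels))
    return editing_cost(n, edges, best), best


if __name__ == "__main__":
    # Two triangles joined by a single edge: deleting that edge is optimal.
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print(brute_force_cluster_editing(6, edges))
```

On a path with three vertices, a single edit suffices: either delete one of the two edges or insert the missing one. The exhaustive search is exponential in the number of vertices, which is in line with the problem being NP-complete and motivates the exact and heuristic methods the paper reviews.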


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Sebastian Böcker (1)
  • Jan Baumbach (2)

  1. Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
  2. Computational Biology Research Group, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
