Combinatorial Optimization Algorithms to Mine a Sub-Matrix of Maximal Sum

  • Vincent Branders
  • Pierre Schaus
  • Pierre Dupont
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10785)

Abstract

Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem in which one looks for a rectangular sub-matrix (not necessarily contiguous) whose entries have a maximal sum. Le Van et al. [7] already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (LNS). In this work, we exhibit some key properties of this \(\mathcal{NP}\)-hard problem and define a bounding function such that larger problems can be solved in reasonable time. Using these properties yields an improved CP-LNS implementation, which is evaluated here. Two additional algorithms are also proposed to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and a mixed integer linear programming (MILP) formulation. Practical experiments conducted on both synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the CP-LNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet the MILP approach is arguably the easiest to formulate and can also be competitive.
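To make the problem statement concrete, here is a minimal Python sketch of an exhaustive search for a max-sum sub-matrix. It is not the CP-LNS, CPGC, or MILP method of the paper; it only illustrates the definition on a tiny input. It relies on one observation that follows directly from the definition: once a set of columns is fixed, the best set of rows consists of exactly those rows whose sum over the chosen columns is positive. The function name `max_sum_submatrix` and the example matrix are assumptions made for illustration.

```python
from itertools import combinations


def max_sum_submatrix(matrix):
    """Brute-force illustration (hypothetical helper, not the paper's method).

    Enumerates every non-empty subset of columns; for each one, keeps the
    rows whose sum restricted to those columns is positive. Returns a tuple
    (best_sum, row_indices, column_indices). The empty selection, with sum 0,
    is the fallback when no entry combination is profitable.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    best = (0, [], [])
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            # Sum of each row restricted to the chosen columns.
            row_sums = [sum(matrix[i][j] for j in cols) for i in range(n_rows)]
            # With the columns fixed, a row improves the total iff its sum is positive.
            rows = [i for i, s in enumerate(row_sums) if s > 0]
            total = sum(row_sums[i] for i in rows)
            if total > best[0]:
                best = (total, rows, list(cols))
    return best


# Illustrative data (invented for this sketch).
example = [
    [3, -2, 4],
    [-1, -5, 2],
    [2, 1, -3],
]
best_sum, best_rows, best_cols = max_sum_submatrix(example)
print(best_sum, best_rows, best_cols)  # 8 [0, 1] [0, 2]
```

In this example the best selection uses columns 0 and 2, which are not adjacent, showing that the sub-matrix need not be contiguous. The enumeration is exponential in the number of columns, so it is only usable on very small matrices; this is the kind of cost that bounding functions and dedicated solvers aim to avoid.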

References

  1. Atzmueller, M.: Subgroup discovery. Wiley Interdiscipl. Rev. Data Mining Knowl. Discov. 5(1), 35–49 (2015)
  2. Bentley, J.: Programming pearls: algorithm design techniques. Commun. ACM 27(9), 865–873 (1984)
  3. Cheng, Y., Church, G.M.: Biclustering of expression data. In: ISMB, vol. 8, pp. 93–103 (2000)
  4. Dawande, M., Keskinocak, P., Tayur, S.: On the biclique problem in bipartite graphs (1996)
  5. Fanaee-T, H., Gama, J.: Eigenspace method for spatiotemporal hotspot detection. Expert Syst. 32(3), 454–464 (2015). eXSY-Nov-13-198.R1
  6. Herrera, F., Carmona, C.J., González, P., del Jesus, M.J.: An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29(3), 495–525 (2011)
  7. Le Van, T., van Leeuwen, M., Nijssen, S., Fierro, A.C., Marchal, K., De Raedt, L.: Ranked tiling. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8725, pp. 98–113. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44851-9_7
  8. López-Ibánez, M., Stützle, T.: Automatically improving the anytime behaviour of optimisation algorithms. Eur. J. Oper. Res. 235(3), 569–582 (2014)
  9. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 1(1), 24–45 (2004)
  10. Nemhauser, G.L., Wolsey, L.A.: Integer programming and combinatorial optimization. Wiley, Chichester (1988). Nemhauser, G.L., Savelsbergh, M.W.P., Sigismondi, G.S.: Constraint classification for mixed integer programming formulations. COAL Bull. 20, 8–12 (1992)
  11. OscaR Team: OscaR: Scala in OR (2012). https://bitbucket.org/oscarlib/oscar
  12. Parker, J.S., Mullins, M., Cheang, M.C., Leung, S., Voduc, D., Vickery, T., Davies, S., Fauron, C., He, X., Hu, Z., et al.: Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27(8), 1160–1167 (2009)
  13. Perou, C.M., Sørlie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S., Rees, C.A., Pollack, J.R., Ross, D.T., Johnsen, H., Akslen, L.A., et al.: Molecular portraits of human breast tumours. Nature 406(6797), 747–752 (2000)
  14. Pio, G., Ceci, M., D’Elia, D., Loglisci, C., Malerba, D.: A novel biclustering algorithm for the discovery of meaningful biological correlations between micrornas and their target genes. BMC Bioinform. 14(7), S8 (2013)
  15. Pio, G., Ceci, M., Malerba, D., D’Elia, D.: Comirnet: a web-based system for the analysis of mirna-gene regulatory networks. BMC Bioinform. 16(9), S7 (2015)
  16. Pontes, B., Giráldez, R., Aguilar-Ruiz, J.S.: Biclustering on expression data: a review. J. Biomed. Inform. 57, 163–180 (2015)
  17. de Saint-Marcq, V.l.C., Schaus, P., Solnon, C., Lecoutre, C.: Sparse-sets for domain implementation. In: CP Workshop on Techniques foR Implementing Constraint programming Systems (TRICS), pp. 1–10 (2013)
  18. Takaoka, T.: Efficient algorithms for the maximum subarray problem by distance matrix multiplication. Electron. Not. Theoret. Comput. Sci. 61, 191–200 (2002)
  19. Tamaki, H., Tokuyama, T.: Algorithms for the maximum subarray problem based on matrix multiplication. In: SODA 1998, pp. 446–452 (1998)
  20. Yang, J., Wang, H., Wang, W., Yu, P.: Enhanced biclustering on expression data. In: Proceedings of the Third IEEE Symposium on Bioinformatics and Bioengineering, pp. 321–327. IEEE (2003)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Vincent Branders (1)
  • Pierre Schaus (1)
  • Pierre Dupont (1)
  1. ICTEAM/INGI, Machine Learning Group, Université catholique de Louvain, Louvain-la-Neuve, Belgium
