Combinatorial Optimization Algorithms to Mine a Sub-Matrix of Maximal Sum

  • Conference paper
New Frontiers in Mining Complex Patterns (NFMCP 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10785)

Abstract

Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem: one looks for a rectangular sub-matrix, not necessarily contiguous, whose entries have a maximal sum. Le Van et al. [7] already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (LNS). In this work, we exhibit some key properties of this \(\mathcal {NP}\)-hard problem and define a bounding function such that larger problems can be solved in reasonable time. These properties yield an improved CP-LNS implementation, which is evaluated here. Two additional algorithms are also proposed to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and a mixed integer linear programming (MILP) formulation. Experiments conducted on both synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the CP-LNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet the MILP formulation is arguably the easiest to state and can also be competitive.
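To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the CP-LNS, CPGC, or MILP method of the paper; it only illustrates the objective on a tiny matrix. The function name `max_sum_submatrix` and the example data are hypothetical. The sketch relies on one observation that follows directly from the objective: once a set of columns is fixed, a row improves the total exactly when its sum over those columns is positive. Enumerating all column subsets is exponential in the number of columns, so this is only usable for very small inputs.

```python
from itertools import combinations


def max_sum_submatrix(matrix):
    """Exhaustively search for rows R and columns C maximizing
    the sum of matrix[i][j] over i in R and j in C.

    Returns (best_sum, rows, cols). The empty selection (sum 0) is
    the fallback when no sub-matrix has a positive sum.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    best = (0, [], [])

    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Sum of each row restricted to the chosen columns.
            row_sums = [sum(matrix[i][j] for j in cols) for i in range(n_rows)]
            # For fixed columns, keep exactly the rows with a positive sum.
            rows = [i for i, s in enumerate(row_sums) if s > 0]
            total = sum(row_sums[i] for i in rows)
            if total > best[0]:
                best = (total, rows, list(cols))
    return best


# Hypothetical 3x3 example: the best selection skips the middle row and column.
example = [
    [2, -1, 3],
    [-4, 2, -1],
    [1, -3, 2],
]
print(max_sum_submatrix(example))  # (8, [0, 2], [0, 2])
```

In this example the selected rows and columns are not adjacent, which shows why the problem differs from the classical contiguous maximum subarray problem.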

Notes

  1. Essentially by considering the rows and columns of the matrix as the two sets of nodes of a bipartite graph.

References

  1. Atzmueller, M.: Subgroup discovery. Wiley Interdiscipl. Rev. Data Mining Knowl. Discov. 5(1), 35–49 (2015)

  2. Bentley, J.: Programming pearls: algorithm design techniques. Commun. ACM 27(9), 865–873 (1984)

  3. Cheng, Y., Church, G.M.: Biclustering of expression data. In: ISMB, vol. 8, pp. 93–103 (2000)

  4. Dawande, M., Keskinocak, P., Tayur, S.: On the biclique problem in bipartite graphs (1996)

  5. Fanaee-T, H., Gama, J.: Eigenspace method for spatiotemporal hotspot detection. Expert Syst. 32(3), 454–464 (2015)

  6. Herrera, F., Carmona, C.J., González, P., del Jesus, M.J.: An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29(3), 495–525 (2011)

  7. Le Van, T., van Leeuwen, M., Nijssen, S., Fierro, A.C., Marchal, K., De Raedt, L.: Ranked tiling. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8725, pp. 98–113. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44851-9_7

  8. López-Ibáñez, M., Stützle, T.: Automatically improving the anytime behaviour of optimisation algorithms. Eur. J. Oper. Res. 235(3), 569–582 (2014)

  9. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 1(1), 24–45 (2004)

  10. Nemhauser, G.L., Wolsey, L.A.: Integer programming and combinatorial optimization. Wiley, Chichester (1988). Nemhauser, G.L., Savelsbergh, M.W.P., Sigismondi, G.S.: Constraint classification for mixed integer programming formulations. COAL Bull. 20, 8–12 (1992)

  11. OscaR Team: OscaR: Scala in OR (2012). https://bitbucket.org/oscarlib/oscar

  12. Parker, J.S., Mullins, M., Cheang, M.C., Leung, S., Voduc, D., Vickery, T., Davies, S., Fauron, C., He, X., Hu, Z., et al.: Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27(8), 1160–1167 (2009)

  13. Perou, C.M., Sørlie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S., Rees, C.A., Pollack, J.R., Ross, D.T., Johnsen, H., Akslen, L.A., et al.: Molecular portraits of human breast tumours. Nature 406(6797), 747–752 (2000)

  14. Pio, G., Ceci, M., D’Elia, D., Loglisci, C., Malerba, D.: A novel biclustering algorithm for the discovery of meaningful biological correlations between microRNAs and their target genes. BMC Bioinform. 14(7), S8 (2013)

  15. Pio, G., Ceci, M., Malerba, D., D’Elia, D.: ComiRNet: a web-based system for the analysis of miRNA-gene regulatory networks. BMC Bioinform. 16(9), S7 (2015)

  16. Pontes, B., Giráldez, R., Aguilar-Ruiz, J.S.: Biclustering on expression data: a review. J. Biomed. Inform. 57, 163–180 (2015)

  17. de Saint-Marcq, V.l.C., Schaus, P., Solnon, C., Lecoutre, C.: Sparse-sets for domain implementation. In: CP Workshop on Techniques foR Implementing Constraint programming Systems (TRICS), pp. 1–10 (2013)

  18. Takaoka, T.: Efficient algorithms for the maximum subarray problem by distance matrix multiplication. Electron. Not. Theoret. Comput. Sci. 61, 191–200 (2002)

  19. Tamaki, H., Tokuyama, T.: Algorithms for the maximum subarray problem based on matrix multiplication. In: SODA 1998, pp. 446–452 (1998)

  20. Yang, J., Wang, H., Wang, W., Yu, P.: Enhanced biclustering on expression data. In: Proceedings of the Third IEEE Symposium on Bioinformatics and Bioengineering, pp. 321–327. IEEE (2003)

Author information

Corresponding author

Correspondence to Vincent Branders.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Branders, V., Schaus, P., Dupont, P. (2018). Combinatorial Optimization Algorithms to Mine a Sub-Matrix of Maximal Sum. In: Appice, A., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2017. Lecture Notes in Computer Science, vol 10785. Springer, Cham. https://doi.org/10.1007/978-3-319-78680-3_5

  • DOI: https://doi.org/10.1007/978-3-319-78680-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78679-7

  • Online ISBN: 978-3-319-78680-3

  • eBook Packages: Computer Science, Computer Science (R0)
