Mining a Maximum Weighted Set of Disjoint Submatrices

  • Vincent Branders (corresponding author)
  • Guillaume Derval
  • Pierre Schaus
  • Pierre Dupont
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11828)

Abstract

The objective of the maximum weighted set of disjoint submatrices problem is to discover \(K\) disjoint submatrices that together cover the largest sum of entries of an input matrix. Like the related biclustering problem, it has many practical data-mining applications, such as gene module discovery in bioinformatics. It differs from the maximum weighted submatrix coverage problem introduced in [6] by the explicit formulation of disjunction constraints: submatrices must not overlap. In other words, each matrix entry must be covered by at most one submatrix. The particular case of \(K=1\), called the maximal-sum submatrix problem, was successfully tackled with constraint programming in [5]. Unfortunately, the case of \(K > 1\) is more challenging to solve, as the selection of rows cannot be decided in polynomial time solely from the selection of \(K\) sets of columns. It can be proved to be \(\mathcal{NP}\)-hard. We introduce a hybrid column generation approach that uses constraint programming to generate columns. It is compared to a standard mixed integer linear programming (MILP) formulation through experiments on synthetic datasets. Overall, column generation finds good solutions quickly, whereas the MILP approach cannot handle a large number of variables and constraints.
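To make the problem statement concrete, the following Python sketch evaluates a candidate solution. It is illustrative only and is not the authors' column generation or MILP method. It assumes that a submatrix is represented as a pair (row indices, column indices), and that two submatrices overlap exactly when they share at least one row and at least one column. The function names and the toy matrix are hypothetical. The last helper illustrates the \(K=1\) remark under the assumption that, once the columns are fixed, the best rows are those whose sum over these columns is positive.

```python
from itertools import combinations


def submatrix_weight(matrix, rows, cols):
    """Sum of the entries covered by the submatrix rows x cols."""
    return sum(matrix[i][j] for i in rows for j in cols)


def are_disjoint(submatrices):
    """True if no matrix entry is covered by more than one submatrix.

    Two submatrices share an entry only if they share both a row and a column.
    """
    for (r1, c1), (r2, c2) in combinations(submatrices, 2):
        if set(r1) & set(r2) and set(c1) & set(c2):
            return False
    return True


def objective(matrix, submatrices):
    """Total weight of a set of submatrices; rejects overlapping sets."""
    if not are_disjoint(submatrices):
        raise ValueError("submatrices overlap")
    return sum(submatrix_weight(matrix, r, c) for r, c in submatrices)


def best_rows_for_columns(matrix, cols):
    """K = 1 case (assumption): keep rows with a positive sum over cols."""
    return [i for i, row in enumerate(matrix) if sum(row[j] for j in cols) > 0]


# Hypothetical 4 x 4 input with two positive blocks on the diagonal.
matrix = [
    [3, 2, -1, -4],
    [4, 1, -2, -3],
    [-5, -1, 6, 2],
    [-2, -3, 5, 1],
]

# Candidate solution for K = 2: one submatrix per diagonal block.
solution = [([0, 1], [0, 1]), ([2, 3], [2, 3])]
total = objective(matrix, solution)
```

For this toy matrix the two blocks have weights 10 and 14, so the candidate solution scores 24. A pair such as `([0, 1], [0, 1])` and `([1, 2], [1, 2])` would be rejected because both cover entry (1, 1), while two submatrices that share rows but no columns remain admissible.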

Keywords

Constraint programming, Maximum weighted submatrix, Column generation, Maximum weighted set of disjoint submatrices problem, Bi-cliques, Data-mining

References

  1. Gurobi Optimization, LLC (2018). http://www.gurobi.com
  2. Aloise, D., Hansen, P., Liberti, L.: An improved column generation algorithm for minimum sum-of-squares clustering. Math. Program. 131(1), 195–220 (2012)
  3. Bentley, J.: Programming pearls: algorithm design techniques. Commun. ACM 27(9), 865–873 (1984)
  4. Branders, V., Derval, G., Schaus, P., Dupont, P.: Dataset generator for Mining a maximum weighted set of disjoint submatrices, August 2019. https://doi.org/10.5281/zenodo.3372282
  5. Branders, V., Schaus, P., Dupont, P.: Combinatorial optimization algorithms to mine a sub-matrix of maximal sum. In: Appice, A., Loglisci, C., Manco, G., Masciari, E., Ras, Z.W. (eds.) NFMCP 2017. LNCS (LNAI), vol. 10785, pp. 65–79. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78680-3_5
  6. Derval, G., Branders, V., Dupont, P., Schaus, P.: The maximum weighted submatrix coverage problem: a CP approach. In: Rousseau, L.-M., Stergiou, K. (eds.) CPAIOR 2019. LNCS, vol. 11494, pp. 258–274. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-19212-9_17
  7. Desaulniers, G., Desrosiers, J., Solomon, M.M.: Column Generation, vol. 5. Springer, Boston (2006). https://doi.org/10.1007/b135457
  8. Garey, M.R., Johnson, D.S.: Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York (1990)
  9. Le Van, T., van Leeuwen, M., Nijssen, S., Fierro, A.C., Marchal, K., De Raedt, L.: Ranked tiling. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8725, pp. 98–113. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44851-9_7
  10. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 1(1), 24–45 (2004)
  11. Michel, L., Schaus, P., Van Hentenryck, P.: MiniCP: a lightweight solver for constraint programming (2018). https://minicp.bitbucket.io
  12. OscaR Team: OscaR: Scala in OR (2012). https://bitbucket.org/oscarlib/oscar
  13. Savelsbergh, M.: A branch-and-price algorithm for the generalized assignment problem. Oper. Res. 45(6), 831–841 (1997)
  14. Takaoka, T.: Efficient algorithms for the maximum subarray problem by distance matrix multiplication. Electron. Notes Theoret. Comput. Sci. 61, 191–200 (2002)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. UCLouvain - ICTEAM/INGI, Louvain-la-Neuve, Belgium
