Mining a Maximum Weighted Set of Disjoint Submatrices

  • Vincent Branders
  • Guillaume Derval
  • Pierre Schaus
  • Pierre Dupont
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11828)

Abstract

The objective of the maximum weighted set of disjoint submatrices problem is to discover K disjoint submatrices that together cover the largest sum of entries of an input matrix. Like the related biclustering problem, it has many practical data-mining applications, such as gene module discovery in bioinformatics. It differs from the maximum weighted submatrix coverage problem introduced in [6] by the explicit formulation of disjunction constraints: submatrices must not overlap. In other words, each matrix entry must be covered by at most one submatrix. The particular case of \(K=1\), called the maximal-sum submatrix problem, was successfully tackled with constraint programming in [5]. Unfortunately, the case of \(K > 1\) is more challenging to solve, as the selection of rows cannot be decided in polynomial time solely from the selection of K sets of columns. It can be proved to be \(\mathcal{NP}\)-hard. We introduce a hybrid column generation approach that uses constraint programming to generate columns. It is compared to a standard mixed integer linear programming (MILP) approach through experiments on synthetic datasets. Overall, column generation quickly finds valuable solutions, whereas the MILP approach cannot handle a large number of variables and constraints.
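To make the problem statement concrete, the following minimal Python sketch evaluates a candidate solution. It is an illustration only, not the authors' column generation or MILP model. A submatrix is represented as a pair (set of rows, set of columns), and all function names and the toy matrix are hypothetical. The sketch also shows why the case \(K=1\) is easier: once the columns are fixed, each row contributes independently, so keeping the rows with a positive sum over those columns is optimal.

```python
from itertools import combinations


def overlaps(a, b):
    """Two submatrices cover a common entry iff they share a row AND a column."""
    (rows_a, cols_a), (rows_b, cols_b) = a, b
    return bool(rows_a & rows_b) and bool(cols_a & cols_b)


def total_weight(matrix, submatrices):
    """Sum of the entries covered by a set of pairwise disjoint submatrices."""
    if any(overlaps(a, b) for a, b in combinations(submatrices, 2)):
        raise ValueError("submatrices overlap: an entry is covered more than once")
    return sum(matrix[i][j]
               for rows, cols in submatrices
               for i in rows
               for j in cols)


def best_rows_for_columns(matrix, cols):
    """K = 1 only: for fixed columns, keep every row with a positive partial sum."""
    return frozenset(i for i, row in enumerate(matrix)
                     if sum(row[j] for j in cols) > 0)


# Hypothetical 4 x 4 input with two planted blocks of positive entries.
matrix = [
    [3, 2, -4, -1],
    [4, 1, -2, -3],
    [-5, -1, 6, 2],
    [-2, -3, 1, 5],
]

first = (best_rows_for_columns(matrix, {0, 1}), frozenset({0, 1}))
second = (best_rows_for_columns(matrix, {2, 3}), frozenset({2, 3}))

print(total_weight(matrix, [first, second]))  # 24
```

In this toy instance the two column sets happen to select disjoint row sets, so solving each submatrix independently already yields a feasible solution. In general, for \(K > 1\), two column sets may both favour the same row while sharing a column, and the row choices then interact through the disjunction constraints.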

Keywords

Constraint programming, Maximum weighted submatrix, Column generation, Maximum weighted set of disjoint submatrices problem, Bi-cliques, Data-mining

References

  1. Gurobi Optimization, LLC (2018). http://www.gurobi.com
  2. Aloise, D., Hansen, P., Liberti, L.: An improved column generation algorithm for minimum sum-of-squares clustering. Math. Program. 131(1), 195–220 (2012)
  3. Bentley, J.: Programming pearls: algorithm design techniques. Commun. ACM 27(9), 865–873 (1984)
  4. Branders, V., Derval, G., Schaus, P., Dupont, P.: Dataset generator for Mining a maximum weighted set of disjoint submatrices, August 2019. https://doi.org/10.5281/zenodo.3372282
  5. Branders, V., Schaus, P., Dupont, P.: Combinatorial optimization algorithms to mine a sub-matrix of maximal sum. In: Appice, A., Loglisci, C., Manco, G., Masciari, E., Ras, Z.W. (eds.) NFMCP 2017. LNCS (LNAI), vol. 10785, pp. 65–79. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78680-3_5
  6. Derval, G., Branders, V., Dupont, P., Schaus, P.: The maximum weighted submatrix coverage problem: a CP approach. In: Rousseau, L.-M., Stergiou, K. (eds.) CPAIOR 2019. LNCS, vol. 11494, pp. 258–274. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-19212-9_17
  7. Desaulniers, G., Desrosiers, J., Solomon, M.M.: Column Generation, vol. 5. Springer, Boston (2006). https://doi.org/10.1007/b135457
  8. Garey, M.R., Johnson, D.S.: Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York (1990)
  9. Le Van, T., van Leeuwen, M., Nijssen, S., Fierro, A.C., Marchal, K., De Raedt, L.: Ranked tiling. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8725, pp. 98–113. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44851-9_7
  10. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform. (TCBB) 1(1), 24–45 (2004)
  11. Michel, L., Schaus, P., Van Hentenryck, P.: MiniCP: a lightweight solver for constraint programming (2018). https://minicp.bitbucket.io
  12. OscaR Team: OscaR: Scala in OR (2012). https://bitbucket.org/oscarlib/oscar
  13. Savelsbergh, M.: A branch-and-price algorithm for the generalized assignment problem. Oper. Res. 45(6), 831–841 (1997)
  14. Takaoka, T.: Efficient algorithms for the maximum subarray problem by distance matrix multiplication. Electron. Notes Theoret. Comput. Sci. 61, 191–200 (2002)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. UCLouvain - ICTEAM/INGI, Louvain-la-Neuve, Belgium
