Chapter

Data Mining and Knowledge Discovery for Big Data

Volume 1 of the series Studies in Big Data pp 281-303

A Clustering Approach to Constrained Binary Matrix Factorization

  • Peng Jiang (corresponding author), Department of Computer Science, University of Illinois at Urbana-Champaign
  • Jiming Peng, Department of ISE, University of Illinois at Urbana-Champaign
  • Michael Heath, Department of Computer Science, University of Illinois at Urbana-Champaign
  • Rui Yang, Department of ISE, University of Illinois at Urbana-Champaign

Abstract

Binary matrix factorization (BMF) is the problem of finding two low-rank binary matrices whose product differs as little as possible from a given binary matrix. BMF is an important tool for dimensionality reduction of high-dimensional data sets with binary attributes and has been employed successfully in numerous applications. The existing literature on BMF does not require the matrix product to be binary. We call this setting unconstrained BMF (UBMF), and we call the problem constrained BMF (CBMF) when the matrix product is required to be binary. In this paper, we first introduce two specific variants of CBMF and discuss their relation to other dimensionality reduction models such as UBMF. We then propose alternating update procedures for CBMF. Each iteration of the proposed procedure solves a specific binary linear programming (BLP) problem to update one of the two matrix arguments. We exploit the relationship between the BLP subproblem and clustering to develop an effective 2-approximation algorithm for CBMF when the underlying matrix has very low rank. The proposed algorithm also provides a 2-approximation to rank-1 UBMF. In addition, we develop a randomized algorithm for CBMF and estimate the approximation ratio of the solution it obtains. Numerical experiments show that the proposed algorithm for UBMF finds better solutions in less CPU time than several other algorithms in the literature, and that the solution obtained from CBMF is very close to that of UBMF.
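To make the distinction drawn in the abstract concrete, the following minimal sketch shows how the ordinary product of two binary matrices can fail to be binary (the UBMF setting), and how a single rank-1 factor can be refined by alternating updates. It is an illustration only, not the authors' algorithm: the squared-error objective, the helper names (`matmul`, `is_binary`, `squared_error`, `rank1_alternating`), and the starting vector are assumptions introduced here. In the rank-1 case the product of two binary vectors is always binary, so the constraint that separates CBMF from UBMF holds automatically.

```python
def matmul(U, V):
    """Ordinary integer product of U (m x k) and V (k x n), given as lists of rows."""
    return [[sum(U[i][t] * V[t][j] for t in range(len(V)))
             for j in range(len(V[0]))]
            for i in range(len(U))]


def is_binary(M):
    """True if every entry of M is 0 or 1 (the CBMF requirement on the product)."""
    return all(x in (0, 1) for row in M for x in row)


def squared_error(G, U, V):
    """Assumed objective: sum of squared entrywise differences between G and U*V."""
    P = matmul(U, V)
    return sum((g - p) ** 2
               for g_row, p_row in zip(G, P)
               for g, p in zip(g_row, p_row))


def rank1_alternating(G, u, iters=20):
    """Alternately update binary vectors v and u to reduce the error of G ~ u v^T.

    With u fixed, setting v[j] = 1 changes the squared error of column j by
    sum_i u[i] * (1 - 2*G[i][j]), so v[j] = 1 is chosen only when that change
    is negative. The update of u is symmetric.
    """
    m, n = len(G), len(G[0])
    v = [0] * n
    for _ in range(iters):
        v = [1 if sum(u[i] * (2 * G[i][j] - 1) for i in range(m)) > 0 else 0
             for j in range(n)]
        new_u = [1 if sum(v[j] * (2 * G[i][j] - 1) for j in range(n)) > 0 else 0
                 for i in range(m)]
        if new_u == u:  # no change: a fixed point of the alternating updates
            break
        u = new_u
    return u, v


# Two binary factors whose ordinary product contains a 2: allowed in UBMF,
# ruled out in CBMF.
U = [[1, 1], [0, 1]]
V = [[1, 0], [1, 1]]
product = matmul(U, V)

# A rank-1 binary matrix recovered from a rough starting vector.
G = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
u, v = rank1_alternating(G, [1, 0, 0])
```

Each update in `rank1_alternating` is an exact minimization over one vector with the other held fixed, so the error never increases, but the procedure may stop at a local optimum that depends on the starting vector.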

Keywords

Binary matrix factorization, binary quadratic programming, k-means clustering, approximation algorithm