1 Introduction

Matrix completion is a well-known problem: given the dimensions of a matrix X and some of its elements \(X_{i,j}, (i, j) \in \mathcal {E}\), the goal is to find the remaining elements. Without imposing any further requirements on X, there are infinitely many solutions. In many applications, however, the completion that minimizes the rank:

$$\begin{aligned} \text {min}_Y\, \text {rank}(Y),\,\, \text {subject to } Y_{i,j} = X_{i,j}, (i,j)\in \mathcal {E}, \end{aligned}$$
(1)

often works as well as the best known solvers for problems in the particular domain. Matrix completion has hundreds of applications, especially in recommender systems [3], where the matrix is composed of ratings, with one row per user and one column per product.
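As a toy illustration of (1) (our example, not one taken from the cited literature), consider a \(2\times 2\) matrix with a single unknown entry:

$$\begin{aligned} X = \begin{pmatrix} 1 & 2 \\ 2 & ? \end{pmatrix}. \end{aligned}$$

Any completion with \(? \ne 4\) has rank 2, whereas \(? = 4\) makes the second row twice the first and yields the unique rank-1 completion.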

Two major challenges remain. The first is related to data quality: when a large proportion of the data is missing and matrix completion is used for imputation, it is worth asking whether the remaining data are truly known exactly. The second is related to the rate of convergence and the run-time to a fixed precision: many solvers still require hundreds or thousands of CPU-hours to complete a \(480189\,\times \,17770\) matrix reasonably well.

The first challenge has recently been addressed [8] by considering a variant of the problem with an explicit uncertainty set around each “supposedly known” value. Formally, let X be an \(m \times n\) matrix to be reconstructed. Assume that we wish to fix the elements \((i,j)\in \mathcal {E}\) of X, that for elements \((i,j)\in \mathcal {L}\) we have lower bounds, and that for elements \((i,j) \in \mathcal {U}\) we have upper bounds. The variant of [8] is:

$$\begin{aligned} \min _{X \in \mathbb {R}^{m \times n}} \quad&\text {rank}(X)\\ \text {subject to} \quad&X_{ij} = X^{\mathcal {E}}_{ij}, \ (i,j) \in \mathcal {E}\\&X_{ij} \ge X^{\mathcal {L}}_{ij}, \ (i,j) \in \mathcal {L}\\&X_{ij} \le X^{\mathcal {U}}_{ij}, \ (i,j) \in \mathcal {U}. \end{aligned}$$
(2)

We refer to [8] for a discussion of the superior statistical performance of this variant.
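For concreteness, the constraint data of (2) can be represented by three sparse triplet lists, one per constraint type. The following C++ sketch is our illustration rather than the data layout of [8]; the types Entry and ConstraintData are our own names and are reused in the sketches below.

```cpp
#include <vector>

// One observed entry: an equality value, a lower bound, or an upper bound,
// depending on which list it is stored in.
struct Entry {
  int i, j;      // row and column indices
  double value;  // X^E_ij, X^L_ij, or X^U_ij
};

// The three index sets E, L, U of problem (2), stored as triplet lists.
struct ConstraintData {
  std::vector<Entry> equalities;    // (i,j) in E
  std::vector<Entry> lower_bounds;  // (i,j) in L
  std::vector<Entry> upper_bounds;  // (i,j) in U
};
```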

2 An Algorithm

The second challenge can be addressed using the observation that any rank-r matrix X can be written as a product of two matrices, \(X=L R\), where \(L \in \mathbb {R}^{m \times r}\) and \(R\in \mathbb {R}^{r \times n}\). Let \(L_{i:}\) and \(R_{:j}\) be the i-th row of L and the j-th column of R, respectively. Instead of (2), we shall consider the smooth, non-convex problem

$$\begin{aligned} \min \{f(L,R)\;:\; L\in \mathbb {R}^{m\times r}, \; R\in \mathbb {R}^{r\times n}\}, \end{aligned}$$
(3)

where

$$\begin{aligned} f(L,R) := \tfrac{\mu }{2}\Vert L\Vert _{F}^2 + \tfrac{\mu }{2}\Vert R\Vert _{F}^2 + f_{\mathcal {E}}(L,R) + f_{\mathcal {L}}(L,R) + f_{\mathcal {U}}(L,R), \end{aligned}$$
$$\begin{aligned} f_{\mathcal {E}}(L,R)&:= \tfrac{1}{2}\sum _{(i,j)\in \mathcal {E}}(L_{i:}R_{:j}-X^{\mathcal {E}}_{ij})^2,\\ f_{\mathcal {L}}(L,R)&:= \tfrac{1}{2}\sum _{(i,j)\in \mathcal {L}}(X^{\mathcal {L}}_{ij}-L_{i:}R_{:j})_+^2,\\ f_{\mathcal {U}}(L,R)&:= \tfrac{1}{2}\sum _{(i,j)\in \mathcal {U}}(L_{i:}R_{:j}-X^{\mathcal {U}}_{ij})_+^2, \end{aligned}$$

and \(\xi _+ = \max \{0,\xi \}\). The parameter \(\mu \) helps to prevent scaling issues. Alternatively, one could set \(\mu \) to zero and, from time to time, rescale the matrices L and R so that their product is not changed. The term \(f_\mathcal {E}\) (resp. \(f_\mathcal {U}\), \(f_\mathcal {L}\)) encourages the equality (resp. inequality) constraints to hold.
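Under the assumptions of the previous sketch (dense row-major L and R, and the illustrative ConstraintData type), the objective f of (3) can be evaluated as follows; this is a minimal illustration of the formulas above, not the implementation of [8].

```cpp
#include <algorithm>
#include <vector>

// L is m x r row-major, R is r x n row-major; returns L_{i:} R_{:j}.
double dot_LR(const std::vector<double>& L, const std::vector<double>& R,
              int r, int n, int i, int j) {
  double s = 0.0;
  for (int k = 0; k < r; ++k) s += L[i * r + k] * R[k * n + j];
  return s;
}

// Evaluates f(L, R) of problem (3) for a given penalty mu and constraints D.
double objective(const std::vector<double>& L, const std::vector<double>& R,
                 int n, int r, double mu, const ConstraintData& D) {
  double f = 0.0;
  for (double v : L) f += 0.5 * mu * v * v;  // (mu/2) ||L||_F^2
  for (double v : R) f += 0.5 * mu * v * v;  // (mu/2) ||R||_F^2
  for (const Entry& e : D.equalities) {      // f_E: squared residuals
    double d = dot_LR(L, R, r, n, e.i, e.j) - e.value;
    f += 0.5 * d * d;
  }
  for (const Entry& e : D.lower_bounds) {    // f_L: penalize values below X^L
    double d = std::max(0.0, e.value - dot_LR(L, R, r, n, e.i, e.j));
    f += 0.5 * d * d;
  }
  for (const Entry& e : D.upper_bounds) {    // f_U: penalize values above X^U
    double d = std::max(0.0, dot_LR(L, R, r, n, e.i, e.j) - e.value);
    f += 0.5 * d * d;
  }
  return f;
}
```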

Subsequently, we can apply the alternating parallel coordinate descent method called MACO in [8]. It is based on the observation that although f is not jointly convex in (L, R), it is convex in L for fixed R and convex in R for fixed L. We can hence alternate between fixing R, choosing \(\hat{r}\in \{1,2,\dots ,r\}\) and a set \(\hat{S}\) of rows of L uniformly at random, updating \(L_{i\hat{r}} \leftarrow L_{i\hat{r}} + \delta _{i\hat{r}}\) in parallel for \(i \in \hat{S}\), and then performing the respective steps for R. Further, notice that if we fix \(i\in \{1,2,\dots ,m\}\) and \(\hat{r}\in \{1,2,\dots ,r\}\), and view f as a function of \(L_{i\hat{r}}\) only, it has a Lipschitz continuous gradient with constant \( W_{i\hat{r}}^{\mathcal {L}} = \mu + \sum _{v \;:\;(i, v) \in \mathcal {E}} R_{\hat{r}v}^2 + \sum _{v \;:\;(i, v) \in \mathcal {L}\cup \mathcal {U}} R_{\hat{r}v}^2.\) That is, for all L, R and \(\delta \in \mathbb {R}\), we have \(f(L+\delta E_{i\hat{r}},R) \le f(L,R) + \langle \nabla _L f(L,R), E_{i\hat{r}}\rangle \delta + \tfrac{W_{i\hat{r}}^{\mathcal {L}}}{2}\delta ^2,\) where \(E_{i\hat{r}}\) is the \(m\times r\) matrix with 1 in the \((i, \hat{r})\) entry and zeros elsewhere. Likewise, one can define the constant \(V_{\hat{r}j}^{\mathcal {U}}\) for the coordinate \(R_{\hat{r}j}\). The minimizer of the right hand side of the bound on \(f(L+\delta E_{i\hat{r}},R)\) is hence

$$\begin{aligned} \delta _{i\hat{r}}:= -\tfrac{1}{W_{i\hat{r}}^{\mathcal {L}}} \langle \nabla _L f(L,R), E_{i\hat{r}}\rangle , \end{aligned}$$
(4)

where \(\langle \nabla _L f(L,R), E_{i\hat{r}}\rangle \) equals

$$\begin{aligned} \mu L_{i \hat{r}}&+ \sum _{v \;:\;(i,v) \in \mathcal {E}} (L_{i:} R_{:v} - X^\mathcal {E}_{iv}) R_{\hat{r}v} \\&+ \sum _{v \;:\;(i,v) \in \mathcal {U},\; L_{i:} R_{:v} > X_{iv}^\mathcal {U}} (L_{i:} R_{:v}-X_{iv}^\mathcal {U}) R_{\hat{r}v} \\&- \sum _{v \;:\;(i,v) \in \mathcal {L},\; L_{i:} R_{:v} < X_{iv}^\mathcal {L}} (X_{iv}^\mathcal {L}- L_{i:} R_{:v}) R_{\hat{r}v}. \end{aligned}$$

The minimizer of the right hand side of the bound on \(f(L,R+\delta E_{\hat{r}j})\) is derived in an analogous fashion.
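A minimal sketch of the resulting coordinate step on \(L_{i\hat{r}}\) is given below, again using the illustrative types introduced earlier; the update of \(R_{\hat{r}j}\) is symmetric. For readability the sketch scans the full constraint lists and recomputes the inner products \(L_{i:}R_{:v}\); Sect. 3 describes how a practical implementation avoids this.

```cpp
// Single-coordinate step on L_{i r_hat}: accumulates the Lipschitz constant
// W^L_{i r_hat} and the directional derivative <grad_L f, E_{i r_hat}>, then
// applies the update (4). Returns the applied step delta.
double update_L_coordinate(std::vector<double>& L, const std::vector<double>& R,
                           int n, int r, double mu, const ConstraintData& D,
                           int i, int r_hat) {
  double W = mu;                       // W^L_{i r_hat}
  double g = mu * L[i * r + r_hat];    // <grad_L f(L,R), E_{i r_hat}>
  for (const Entry& e : D.equalities) {
    if (e.i != i) continue;
    double Rv = R[r_hat * n + e.j];
    W += Rv * Rv;
    g += (dot_LR(L, R, r, n, i, e.j) - e.value) * Rv;
  }
  for (const Entry& e : D.lower_bounds) {
    if (e.i != i) continue;
    double Rv = R[r_hat * n + e.j];
    W += Rv * Rv;
    double gap = e.value - dot_LR(L, R, r, n, i, e.j);  // X^L_iv - L_{i:}R_{:v}
    if (gap > 0.0) g -= gap * Rv;      // active lower-bound violation
  }
  for (const Entry& e : D.upper_bounds) {
    if (e.i != i) continue;
    double Rv = R[r_hat * n + e.j];
    W += Rv * Rv;
    double gap = dot_LR(L, R, r, n, i, e.j) - e.value;  // L_{i:}R_{:v} - X^U_iv
    if (gap > 0.0) g += gap * Rv;      // active upper-bound violation
  }
  double delta = -g / W;               // Eq. (4)
  L[i * r + r_hat] += delta;
  return delta;
}
```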

Fig. 1. RMSE as a function of the number of iterations and wall-clock time, respectively, on a well-known \(480189 \times 17770\) matrix, for \(r=20\) and \(\mu =16\).

3 Numerical Experiments

Particular care has been taken to produce a numerically stable and efficient implementation. Algorithmically, the key insight is that Eq. (4) does not require as much computation as it might seem. Let us define matrices \(A\in \mathbb {R}^{m\times r}\) and \(B\in \mathbb {R}^{r\times n}\) such that \(A_{iv}=W_{iv}^\mathcal {L}\) and \(B_{vj}=V_{vj}^{\mathcal {U}}\). After each update of the solution, we also update these matrices. We also store and update sparse residuals, where \((\varDelta _\mathcal {E})_{i,j}\) is \(L_{i:} R_{:j} -X^\mathcal {E}_{ij}\) for \((i,j)\in \mathcal {E}\) and zero elsewhere, and similarly for \(\varDelta _\mathcal {U}\) and \(\varDelta _\mathcal {L}\). Subsequently, the computation of \(\delta _{i \hat{r}}\) or \(\delta _{\hat{r} j}\) is greatly simplified.
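The following sketch (our illustration, with hypothetical names) shows why the caching helps: after a step \(\delta \) on \(L_{i\hat{r}}\), every cached residual in row i changes by \(\delta R_{\hat{r}v}\), so subsequent steps can be assembled from the caches without recomputing any inner products.

```cpp
#include <utility>
#include <vector>

// Cached sparse residuals, stored per row of L: residual_X[i] holds pairs
// (column v, current value of L_{i:}R_{:v} minus the corresponding X_{iv}).
struct ResidualCache {
  std::vector<std::vector<std::pair<int, double>>> residual_E, residual_L, residual_U;
};

// After a step delta on L_{i r_hat}, every cached residual in row i changes by
// delta * R_{r_hat v}; the cached constants A_{iv} = W^L_{iv} depend on R only
// and are therefore unaffected by steps on L.
void apply_L_step(ResidualCache& C, const std::vector<double>& R, int n,
                  int i, int r_hat, double delta) {
  for (auto& p : C.residual_E[i]) p.second += delta * R[r_hat * n + p.first];
  for (auto& p : C.residual_L[i]) p.second += delta * R[r_hat * n + p.first];
  for (auto& p : C.residual_U[i]) p.second += delta * R[r_hat * n + p.first];
}
```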

Our C++ implementation stores all data in shared memory and uses OpenMP multi-threading. Figure 1 presents the evolution of RMSE over time on the well-known \(480189\,\times \,17770\) matrix, for rank \(r=20\), on a machine with 24 cores of Intel X5650 clocked at 2.67 GHz and 24 GB of RAM. There is an almost linear speed-up from 1 to 4 cores and a marginally worse speed-up between 4 and 8 cores. Comparing run-times of algorithms across multiple papers is challenging, especially when some of the implementations run across clusters of computers in a distributed fashion. Nevertheless, the best distributed implementation, which uses a custom matrix-completion-specific platform for distributed computing [4], requires a wall-clock time of 95.8 s per epoch on a 5-node cluster and 121.9 s per epoch on a 10-node cluster, both for rank 25, which over 100 epochs translates to the use of 47900 to 121900 node-seconds on the same \(480189\,\times \,17770\) matrix (denoted N1). For a recent Spark-based implementation [4], the authors report an execution time of 500 s per epoch for ranks between 25 and 50 on a 10-node cluster, with 8 Intel Xeon cores and 32 GB of RAM per node. A run of 100 epochs, which is required to obtain an acceptable precision, hence takes 50000 to 300000 node-seconds. As can be seen in Fig. 1, our algorithm processes the 100 epochs within 500 node-seconds, while using 8 comparable cores. This illustration suggests an improvement of two orders of magnitude in terms of run-time.
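For completeness, a minimal sketch of the shared-memory parallel pass over the sampled rows, in the spirit of the OpenMP implementation described above (the helper update_L_coordinate and the set \(\hat{S}\) are assumptions carried over from the earlier sketches, not the actual code):

```cpp
#include <omp.h>
#include <vector>

// One pass on L: for a fixed column index r_hat and a sampled set of rows S_hat,
// the coordinates L_{i r_hat}, i in S_hat, touch disjoint entries of L and can
// be updated in parallel. The symmetric pass over columns of R is analogous.
void parallel_L_pass(std::vector<double>& L, const std::vector<double>& R,
                     int n, int r, double mu, const ConstraintData& D,
                     const std::vector<int>& S_hat, int r_hat) {
  #pragma omp parallel for schedule(dynamic)
  for (int k = 0; k < static_cast<int>(S_hat.size()); ++k) {
    update_L_coordinate(L, R, n, r, mu, D, S_hat[k], r_hat);
  }
}
```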

4 Conclusions

In conclusion, MACO makes it possible to find stationary points of a non-convex reformulation of matrix completion under uncertainty, an NP-hard problem, rather efficiently. The simple and seemingly obvious addition of inequality constraints to matrix completion appears to improve its statistical performance in a number of applications, such as collaborative filtering under interval uncertainty, robust statistics, event detection [7, 9], and background modelling in computer vision [1, 2, 5, 6]. We hope this may spark further research, both in terms of dealing with uncertainty in matrix completion and in terms of efficient algorithms for the same.