1 Introduction

The goal of low-rank matrix completion is to recover a low-rank matrix from a subset of its (possibly noisy) entries. This problem has recently received increased attention due to the emergence of several challenging applications, such as recommender systems [1, 38] (particularly the famous Netflix challenge [6]), genotype imputation [12, 19], image processing [7, 18, 25] and quantum state tomography [15, 28, 30]. Different approaches, from frequentist to Bayesian methods, have been proposed and studied from theoretical and computational points of view, see for example [2, 3, 8,9,10,11, 21, 23, 24, 26, 29, 34, 37, 40].

From a frequentist point of view, most recent methods are based on penalized optimization. A seminal result can be found in Candès and Recht [9], Candès and Tao [10] for exact matrix completion (the noiseless case), further developed in the noisy case by Candès and Plan [8], Koltchinskii et al. [21], Negahban and Wainwright [33]. Efficient algorithms have also been studied, see for example [17, 31, 34]. In particular, in the notable work [21], the authors studied nuclear-norm penalized estimators and provided reconstruction error rates for their methods. They also showed that these error rates are minimax-optimal (up to a logarithmic factor). Note that the error rate, i.e. the average quadratic error on the entries, for recovering a rank-r matrix of size \( m \times p \) from n observations cannot be better than \( r\max (m,p)/n \) [21].

More recently, in a work by Chen et al. [11], de-biased estimators have been proposed for the problem of noisy low-rank matrix completion. The estimation accuracy of this estimator is shown to be sharp in the sense that it reaches the minimax-optimal rate without any additional logarithmic factor. A sharp bound has also been obtained for a different estimator in Klopp [20]; however, uncertainty quantification is not provided there. More importantly, Chen et al. [11] also provide confidence intervals for the entries of the underlying matrix based on the de-biased estimators. It is noted that conducting uncertainty quantification for matrix completion is not straightforward. This is because, in general, the solutions for matrix completion do not admit a closed form and the distributions of the estimates returned by state-of-the-art algorithms are hard to derive.

On the other hand, uncertainty quantification can be obtained straightforwardly from a Bayesian perspective. More specifically, the unknown matrix is treated as a random variable with a specific prior distribution and statistical inference is carried out using the posterior distribution, for example via credible intervals. Bayesian methods have been studied for low-rank matrix completion mainly from a computational viewpoint [4, 5, 13, 23, 24, 37, 39, 40]. Most Bayesian estimators are based on conjugate priors, which allow the use of Gibbs sampling [4, 37] or Variational Bayes methods [24]. These algorithms are fast enough to handle, and have actually been tested on, large datasets such as Netflix [6] or MovieLens [16]. However, the theoretical understanding of Bayesian estimators is quite limited; to our knowledge, [29] and [3] are the only prominent examples. More specifically, they showed that a Bayesian estimator with a low-rank factorization prior reaches the minimax-optimal rate up to a logarithmic factor, and the paper [3] further shows that the same estimation error rate can be obtained with a Variational Bayesian estimator.

In this paper, to understand the performance of Bayesian approaches when compared to the de-biased estimators, we perform numerical comparisons of the estimation accuracy (the average squared error, the relative squared error and the prediction error, see Sect. 3), considering the de-biased estimator in Chen et al. [11] and the Bayesian methods in [3], for which the statistical properties have been well studied. Furthermore, we examine in detail the behaviour of the confidence intervals obtained by the de-biased estimator and of the Bayesian credible intervals. Interestingly, it is noted that recent works [35, 36] show that Bayesian methods are now the most accurate in practical recommender systems. Although Bayesian methods have become popular for matrix completion, their uncertainty quantification (e.g. credible intervals) has received much less attention in the literature.

The simulation comparisons deliver some interesting messages. More specifically, the de-biased estimator is only on par with the Bayesian estimators in terms of estimation accuracy, even though it does successfully improve upon the estimator being de-biased. On the other hand, the Bayesian approaches are much more stable than the de-biased method and, in addition, they outperform the de-biased estimator especially in the case of small samples. Moreover, we find that the coverage rates of the 95% confidence intervals obtained using the de-biased estimator are lower than those of the 89% equal-tailed credible intervals. This evidence suggests that the Bayesian estimators may actually reach the minimax-optimal rate sharply and that the log-term could be an artifact of the technical proofs (the PAC-Bayesian bounds technique). Furthermore, the concentration rate of the corresponding Bayesian posterior discussed in Alquier and Ridgway [3], which also carries a log-term, might not be tight.

The rest of the paper is structured as follows. In Sect. 2 we present the low-rank matrix completion problem, then introduce the de-biased estimator and the corresponding confidence interval and provide details on the considered Bayesian estimators. In Sect. 3, simulation studies comparing the different methods are presented. We discuss our results and give some concluding remarks in the final section.

2 Low-rank matrix completion

2.1 Model

In this work, we adopt the statistical model commonly studied in the literature for noisy matrix completion [11]. Let \( M^* \in \mathbb {R}^{m\times p} \) be an unknown rank-r matrix of interest. We partially observe some noisy entries of \(M^*\) as

$$\begin{aligned} Y_{ij} = M^*_{ij} + \mathcal {E}_{ij}, \quad (i,j) \in \Omega \end{aligned}$$
(1)

where \(\Omega \subseteq \lbrace 1, \ldots , m \rbrace \times \lbrace 1, \ldots ,p \rbrace \) is a small subset of indexes and the \( \mathcal {E}_{ij} \sim \mathcal {N}(0, \sigma ^2) \) are independent noise terms at the locations \((i,j)\). Under the assumed random sampling model, each index \( (i,j) \) is observed, i.e. included in \( \Omega \), independently with probability \( \kappa \) (in other words, data are missing uniformly at random). The problem of estimating \( M^* \) from \( n = |\Omega | < mp \) observations is called the (noisy) low-rank matrix completion problem.

Let \( \mathcal {P}_\Omega (\cdot ) : \mathbb {R}^{m\times p} \mapsto \mathbb {R}^{m\times p} \) be the orthogonal projection onto the set of observed entries indexed by \(\Omega \), defined by

$$\begin{aligned} \mathcal {P}_\Omega (Y)_{ij} = {\left\{ \begin{array}{ll} Y_{ij}, &{} \text { if } (i,j) \in \Omega , \\ 0 , &{} \text { if } (i,j) \notin \Omega \end{array}\right. } . \end{aligned}$$
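
In code, \( \mathcal {P}_\Omega \) is simply an entrywise mask. A minimal Python/NumPy sketch is given below (our own illustration; representing \(\Omega \) as a boolean mask is our choice). This mask representation is reused in the later sketches.

```python
import numpy as np

def proj_omega(A, mask):
    """P_Omega: keep the entries whose indices are in Omega (mask == True), zero out the rest."""
    return np.where(mask, A, 0.0)
```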

Notations: For a matrix \(A\in \mathbb {R}^{m\times p} \), \( \Vert A\Vert _F = \sqrt{\textrm{trace}(A^\top A)} \) denotes its Frobenius norm and \( \Vert A\Vert _* = \textrm{trace} (\sqrt{A^\top A}) \) denotes its nuclear norm. \( \left[ a \pm b\right] \) denotes the interval \( \left[ a -b, a + b \right] \). We use \( I_q \) to denote the identity matrix of dimension \(q \times q \).

2.2 The de-biased estimator

Let \(\hat{M}\) be either the solution of the following nuclear-norm regularized problem [31]

$$\begin{aligned} \min _{Z\in \mathbb {R}^{m\times p}} \frac{1}{2} \Vert \mathcal {P}_\Omega (Z - Y) \Vert _F^2 + \lambda \Vert Z \Vert _* , \end{aligned}$$

or of the following factorization minimization [17]

$$\begin{aligned} \min _{U\in \mathbb {R}^{m\times r} , V\in \mathbb {R}^{p\times r} } \frac{1}{2} \Vert \mathcal {P}_\Omega (Y - UV^\top ) \Vert _F^2 + \frac{\lambda }{2} \Vert U \Vert ^2_F + \frac{\lambda }{2} \Vert V \Vert ^2_F, \end{aligned}$$
(2)

where \(\lambda >0 \) is a tuning parameter. The optimization problem in (2) can be seen as the problem of finding the MAP (maximum a posteriori) estimate in a Bayesian model where Gaussian priors are placed on the columns of the factors U and V; a detailed discussion can be found in Alquier et al. [4], Fithian and Mazumder [14].
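
For concreteness, a minimal alternating least-squares (ALS) sketch for the factorized objective (2) is given below. This is our own NumPy illustration (the function name, initialization and fixed iteration count are ours), not the softImpute implementation used later in the experiments; each update is a ridge regression on the observed entries of a row or column.

```python
import numpy as np

def als_factorization(Y, mask, r, lam, n_iters=50):
    """Alternating ridge updates for
       min 0.5*||P_Omega(Y - U V^T)||_F^2 + (lam/2)*(||U||_F^2 + ||V||_F^2).
       `mask` flags the observed entries."""
    m, p = Y.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(m, r))
    V = rng.normal(scale=0.1, size=(p, r))
    I = np.eye(r)
    for _ in range(n_iters):
        # Update each row of U: a ridge regression on the observed entries of that row.
        for i in range(m):
            obs = mask[i, :]
            Vo = V[obs, :]
            U[i, :] = np.linalg.solve(Vo.T @ Vo + lam * I, Vo.T @ Y[i, obs])
        # Update each row of V symmetrically, using the observed entries of each column.
        for j in range(p):
            obs = mask[:, j]
            Uo = U[obs, :]
            V[j, :] = np.linalg.solve(Uo.T @ Uo + lam * I, Uo.T @ Y[obs, j])
    return U, V  # the estimate is M_hat = U @ V.T
```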

Given an estimator \( \hat{M} \) as above, the de-biased estimator [11] is defined as

$$\begin{aligned} M^{db} := \textrm{Pr}_{\mathrm{rank-}r} \left[ \hat{M} - \mathcal {P}_\Omega ( \hat{M} - Y ) \right] , \end{aligned}$$
(3)

where \(\textrm{Pr}_{\mathrm{rank-}r} (B) = \arg \min _{A, \textrm{rank}(A) \le r} \Vert A- B\Vert _F \) is the projection onto the set of rank-r matrices.
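
A minimal NumPy sketch of the de-biasing step (3), assuming an initial estimate \( \hat{M} \), the observations Y on a boolean mask, and the target rank r (our own illustration, not the dbMC implementation):

```python
import numpy as np

def debias(M_hat, Y, mask, r):
    """De-biased estimator (3): correct M_hat on the observed entries,
       then project onto the set of rank-r matrices via a truncated SVD."""
    Z = M_hat - np.where(mask, M_hat - Y, 0.0)      # M_hat - P_Omega(M_hat - Y)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]           # best rank-r approximation of Z
```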

Remark 1

The estimation accuracy of the de-biased estimator, provided in Theorem 3 in Chen et al. [11] under some assumptions, is \( \Vert M^{db} - M^* \Vert _F^2 \le c \max (m,p)r \sigma ^2/n \) without any extra log-term, where c is a universal numerical constant.

2.2.1 Confidence interval

Let \( \hat{M} = \hat{U} \hat{\Sigma } \hat{V} ^{\top } \) be the singular value decomposition of \( \hat{M} \). Put

$$\begin{aligned} v_{ij} := \sigma ^2 \left[ U^{db}_{i.} (U^{db\top }U^{db})^{-1}U^{db\top }_{i.} + V^{db}_{j.} (V^{db\top }V^{db})^{-1}V^{db\top }_{j.} \right] /\kappa , \end{aligned}$$
(4)

where

$$\begin{aligned} U^{db} = \hat{U} ( \hat{\Sigma } + (\lambda /\kappa ) I_r )^{1/2} \text { and } V^{db} = \hat{V} ( \hat{\Sigma } + (\lambda /\kappa ) I_r )^{1/2}. \end{aligned}$$

Then, given a significance level \(\alpha \in (0,1) \), the following interval

$$\begin{aligned} \left[ M^{db}_{ij} \pm \Phi ^{-1} (1-\alpha /2) \sqrt{v_{ij}} \right] \end{aligned}$$

is a nearly accurate two-sided \((1-\alpha )\) confidence interval of \(M^*_{ij}\), where \( \Phi (\cdot ) \) is the CDF of the standard normal distribution. This is given in Corollary 1 in Chen et al. [11]. This method is implemented in the R package dbMC [27].
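
A sketch of how such an interval can be computed from (4), assuming \( \sigma \), \( \kappa \) and \( \lambda \) are known; this is our own NumPy/SciPy illustration of the formulas above, not the dbMC implementation:

```python
import numpy as np
from scipy.stats import norm

def entry_ci(M_hat, M_db, i, j, r, sigma, kappa, lam, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for M*_{ij}, following (4)."""
    U, s, Vt = np.linalg.svd(M_hat, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vt[:r, :].T          # rank-r SVD of M_hat
    scale = np.sqrt(s + lam / kappa)                # (Sigma_hat + (lam/kappa) I_r)^{1/2}
    U_db, V_db = U * scale, V * scale
    gU = np.linalg.inv(U_db.T @ U_db)
    gV = np.linalg.inv(V_db.T @ V_db)
    v_ij = sigma**2 * (U_db[i] @ gU @ U_db[i] + V_db[j] @ gV @ V_db[j]) / kappa
    half = norm.ppf(1 - alpha / 2) * np.sqrt(v_ij)  # Phi^{-1}(1 - alpha/2) * sqrt(v_ij)
    return M_db[i, j] - half, M_db[i, j] + half
```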

2.3 The Bayesian estimators

The Bayesian estimator studied in Alquier and Ridgway [3] is given by

$$\begin{aligned} M^B := \int M \rho _\lambda ( M | Y ) dM \end{aligned}$$

where

$$\begin{aligned} \rho _\lambda ( M | Y ) \propto L(Y| M)^\lambda \pi (M) \end{aligned}$$

is the posterior and \(L(Y| M)^\lambda \) is the likelihood raised to the power \(\lambda \). Here \(\lambda \in (0,1)\) is a tuning parameter and \( \pi (M)\) is the prior distribution.

2.3.1 Priors

A popular choice for the priors in Bayesian matrix completion is to assign conditional Gaussian priors to \(U \in \mathbb {R}^{m\times K} \) and \(V \in \mathbb {R}^{p\times K} \) such that

$$\begin{aligned} M = UV^\top = \sum _{k=1}^K U_{.k}V^\top _{.k}, \end{aligned}$$

for a fixed integer \(K \le \min (m,p) \). More specifically, for \(k \in \{1,\ldots ,K \} \), independently

$$\begin{aligned} \begin{aligned} U_{.k}&\sim \mathcal {N} (0, \gamma _k I_m), \\ V_{.k}&\sim \mathcal {N} (0, \gamma _k I_p), \\ \gamma _k^{-1}&\sim \Gamma (a,b), \end{aligned} \end{aligned}$$
(5)

where \( I_q \) is the identity matrix of dimension \(q \times q \) and a, b are tuning parameters. This type of prior is conjugate, so that the conditional posteriors can be derived explicitly in closed form, which allows the use of a Gibbs sampler; see [37] for details. Some reviews and discussion on low-rank factorization priors can be found in Alquier [2], Alquier et al. [4].
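
As an illustration, one draw of M from the prior (5) can be sketched as follows (our own NumPy code; taking Gamma(a, b) with rate parameter b is an assumption about the parametrization):

```python
import numpy as np

def sample_prior_M(m, p, K, a=1.0, b=0.01, seed=0):
    """One draw of M = U V^T from the low-rank factorization prior (5)."""
    rng = np.random.default_rng(seed)
    prec = rng.gamma(shape=a, scale=1.0 / b, size=K)   # gamma_k^{-1} ~ Gamma(a, b) (rate b)
    gamma = 1.0 / prec                                 # per-column variances gamma_k
    U = rng.normal(size=(m, K)) * np.sqrt(gamma)       # U_.k ~ N(0, gamma_k I_m)
    V = rng.normal(size=(p, K)) * np.sqrt(gamma)       # V_.k ~ N(0, gamma_k I_p)
    return U @ V.T
```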

Remark 2

In the case where the rank r is not known, it is natural to take K as large as possible, e.g. \(K = \min (m,p) \), but this may be computationally prohibitive when K is large.

Remark 3

The estimation error for this Bayesian estimator, under some assumptions, given in Corollary 4.2 in Alquier and Ridgway [3], is \( \Vert M^{B} - M^* \Vert _F^2 \le \max (m,p)r \sigma ^2/n \) up to an additional (multiplicative) log-term \(\log (n\max (m,p)) \). It is noted that this rate is also reached in Mai and Alquier [29], with an additional (multiplicative) log-term \(\log (\min (m,p)) \), under a general sampling distribution; however, the authors considered truncated priors.

For a given rank r, we propose to consider the following prior, called the fixed-rank prior,

$$\begin{aligned} \begin{aligned} U_{.k}&\sim \mathcal {N} (0, I_m), \\ V_{.k}&\sim \mathcal {N} (0, I_p), \end{aligned} \end{aligned}$$
(6)

for \(k = 1, \ldots ,r\). This prior is a simplified version of the prior (5). We note that for \(K > r\) the Gibbs sampler for the fixed-rank prior will be faster than the Gibbs sampler for the prior (5). Interestingly, in our simulations the Bayesian estimator with this prior is in some cases slightly better than the one based on the prior (5).
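
To illustrate why this prior is convenient, a minimal Gibbs-sampler sketch for the tempered posterior with the fixed-rank prior (6) is given below: the full conditionals of the rows of U and V are Gaussian. This is our own NumPy illustration, not the code used in the experiments; we take the tempered likelihood exponent to be \(-\lambda \Vert \mathcal {P}_\Omega (Y-UV^\top )\Vert _F^2\), the convention of [3] behind the choice \(\lambda = 1/(4\sigma ^2)\) in Sect. 3 (with a different convention the factor 2*lam below changes accordingly).

```python
import numpy as np

def gibbs_fixed_rank(Y, mask, r, lam, n_iter=500, burn_in=100, seed=0):
    """Gibbs sampler for the tempered posterior with the fixed-rank prior (6).
       Returns posterior samples of M = U V^T collected after burn-in."""
    m, p = Y.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(m, r))
    V = rng.normal(size=(p, r))
    I = np.eye(r)
    samples = []
    for t in range(n_iter + burn_in):
        # Row-wise Gaussian full conditionals:
        # precision = 2*lam*Vo^T Vo + I_r,  mean = precision^{-1} (2*lam*Vo^T y_o).
        for i in range(m):
            Vo, yo = V[mask[i, :], :], Y[i, mask[i, :]]
            cov = np.linalg.inv(2 * lam * Vo.T @ Vo + I)
            U[i, :] = rng.multivariate_normal(cov @ (2 * lam * Vo.T @ yo), cov)
        for j in range(p):
            Uo, yo = U[mask[:, j], :], Y[mask[:, j], j]
            cov = np.linalg.inv(2 * lam * Uo.T @ Uo + I)
            V[j, :] = rng.multivariate_normal(cov @ (2 * lam * Uo.T @ yo), cov)
        if t >= burn_in:
            samples.append(U @ V.T)
    return np.array(samples)   # posterior mean estimate: samples.mean(axis=0)
```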

Remark 4

We remark that the theoretical estimation error for the Bayesian estimator with the fixed-rank prior given in (6) remains unchanged, following Corollary 4.2 in Alquier and Ridgway [3].

2.3.2 Credible intervals

Using the Bayesian approach, credible intervals for the matrix and for functions of it (e.g. its entries) can easily be constructed using Markov chain Monte Carlo (MCMC). Here, we focus on the equal-tailed credible interval for an entry (Fig. 1).

More precisely, the credible intervals are reported as 89% equal-tailed intervals, as recommended by Kruschke [22], McElreath [32] for small posterior samples such as ours with 500 posterior samples. We note that drawing a large posterior sample can be impractical for big data: Salakhutdinov and Mnih [37] report that drawing only 500 observations from the Gibbs sampler took 90 hours on the Netflix dataset. Thus, we focus on 89% equal-tailed credible intervals computed from 500 posterior samples. It is, however, noted that to obtain 95% intervals, an effective posterior sample size of at least 10,000 is recommended [22], which would be computationally costly to run on all of our simulations. A few examples with 10,000 posterior samples are examined in Fig. 2.
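
Given posterior draws of an entry (e.g. from a Gibbs sampler), an equal-tailed credible interval is just a pair of empirical quantiles; a minimal sketch (our own helper):

```python
import numpy as np

def equal_tailed_ci(draws, level=0.89):
    """Equal-tailed credible interval from posterior draws of a single entry."""
    lo, hi = np.quantile(draws, [(1 - level) / 2, 1 - (1 - level) / 2])
    return lo, hi

# e.g. the 89% interval for entry (i, j), given samples of shape (n_samples, m, p):
# lo, hi = equal_tailed_ci(samples[:, i, j], level=0.89)
```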

Fig. 1

Q–Q (quantile–quantile) plots comparing the 10,000 posterior samples for some entries against the standard normal distribution. The top row (from left to right, 3 panels) shows the results from Setting I with \(r = 2,p = 100, \tau = 50\%\). The bottom row (from left to right, 3 panels) shows the results from Setting I with \(r = 2,p = 1000, \tau = 50\%\)

Fig. 2

Plots comparing the limiting Gaussian distributions of the de-biased estimator with the histograms of the 10,000 posterior samples for some entries. The dotted lines are the true values of the entries. The top row (from left to right, 3 panels) shows the results from Setting I with \(r = 2,p = 100, \tau = 50\%\). The bottom row (from left to right, 3 panels) shows the results from Setting I with \(r = 2,p = 1000, \tau = 50\%\)

3 Simulation studies

3.1 Experimental designs

In order to assess the performance of the different estimators, a series of experiments was conducted with simulated data. We fix \( m = 100 \) and vary the other dimension, taking \( p = 100 \) and \( p = 1000 \). The rank r is varied between \(r = 2 \) and \(r = 5 \).

  • Setting I: In the first setting, a rank-r matrix \( M^*_{m\times p} \) is generated as the product of two rank-r matrices,

    $$\begin{aligned} M^* = U^*_{m\times r} V_{p\times r}^{*\top } , \end{aligned}$$

    where the entries of \(U^* \) and \( V^* \) are i.i.d. \( \mathcal {N} (0 , 1) \). With a missing rate \( \tau = 20\%, 50\% \) or \(80\%\), the entries of the observed set are drawn uniformly at random. This sampled set is then corrupted by noise as in (1), where the \( \mathcal {E}_{ij} \) are i.i.d. \( \mathcal {N} (0 , 1) \) (a small generation sketch is given after this list of settings).

  • Setting II: The second series of simulations is similar to the first one, except that the matrix \(M^*\) is no longer rank-r, but it can be well approximated by a rank-r matrix:

    $$\begin{aligned} M^*=U^*_{m\times r} V_{p\times r}^{*\top } + \frac{1}{10} A_{m\times 50} B_{p\times 50}^\top \end{aligned}$$

    where the entries of A and B are i.i.d \( \mathcal {N} (0 , 1) \).

  • Setting III: This setting is similar to Setting I, but here a heavy-tailed noise is used. More specifically, the noise terms \( \mathcal {E}_{ij} \) are i.i.d. from the Student \( t_3 \) distribution with 3 degrees of freedom.

  • Setting IV: The set-up of this setting is also similar to Setting I. However, we consider a more extreme case where the entries of \(U^* \), \( V^* \) and the noise terms \( \mathcal {E}_{ij} \) are all i.i.d. draws from the Student distribution with 3 degrees of freedom.
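
As referenced in the description of Setting I, a minimal NumPy sketch of its data generation is given below (the function name, default values and seed handling are ours):

```python
import numpy as np

def generate_setting_1(m=100, p=100, r=2, missing_rate=0.5, sigma=1.0, seed=0):
    """Setting I: M* = U* V*^T with i.i.d. N(0,1) factors; entries observed
       uniformly at random (missing rate tau) and corrupted by N(0, sigma^2) noise."""
    rng = np.random.default_rng(seed)
    U_star = rng.normal(size=(m, r))
    V_star = rng.normal(size=(p, r))
    M_star = U_star @ V_star.T
    mask = rng.random((m, p)) > missing_rate          # True = observed entry
    Y = np.where(mask, M_star + sigma * rng.normal(size=(m, p)), 0.0)
    return M_star, Y, mask
```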

Remark 5

We note that for the second series of simulations, with approximately low-rank matrices, the theory of the de-biased estimator cannot be used, whereas the theoretical guarantees for the Bayesian estimators are still valid, see [3]. Setting I follows exactly the minimax-optimal regime and thus allows us to assess the accuracy of the considered estimators. The last two settings (III and IV) are model-misspecification set-ups where no theoretical guarantee is available for any of the considered estimators.

The behavior of an estimator (say \(\widehat{M}\)) is evaluated through the average squared error (ase) per entry

$$\begin{aligned} \textrm{ase} := \frac{1}{mp} \Vert \widehat{M} - M^*\Vert _F^2 \end{aligned}$$

and the relative squared error (rse)

$$\begin{aligned} \textrm{rse} := \frac{ \Vert \widehat{M} - M^*\Vert _F^2}{ \Vert M^*\Vert _F^2 }. \end{aligned}$$

We also measure the error in predicting the missing entries by using

$$\begin{aligned} \textrm{Pred} := \frac{ \Vert \mathcal {P}_{\bar{\Omega } } ( \widehat{M} - M^*) \Vert _F^2}{mp- n}, \end{aligned}$$

where \( \bar{\Omega } \) is the set of unobserved entries. For each set-up, we generate 100 data sets (simulation replicates) and report the average and the standard deviation of each error measure for each estimator over the replicates.
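
These three criteria are straightforward to compute; a short sketch (our own helper, reusing the boolean mask of observed entries from Sect. 2.1):

```python
import numpy as np

def errors(M_hat, M_star, mask):
    """ase, rse and Pred as defined above; `mask` flags the observed entries."""
    diff2 = (M_hat - M_star) ** 2
    ase = diff2.mean()                          # average squared error per entry
    rse = diff2.sum() / (M_star ** 2).sum()     # relative squared error
    pred = diff2[~mask].sum() / (~mask).sum()   # error on the mp - n unobserved entries
    return ase, rse, pred
```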

We compare the de-biased estimator (denoted by ‘d.b’), the Bayesian estimator with the fixed-rank prior (6) (denoted by ‘f.Bayes’) and the Bayesian estimator with the (flexible-rank) prior (5) (denoted by ‘Bayes’). As a by-product of calculating the de-biased estimator through the Alternating Least Squares estimator (2), we also report the results for this estimator, denoted by ‘als’.

The ‘als’ estimator is available from the R package ‘softImpute’ [31] and is used with the default options. The ‘d.b’ estimator is run with \( \lambda = 2.5\sigma \sqrt{mp} \) as in Chen et al. [11]. The ‘f.Bayes’ and ‘Bayes’ estimators are used with tuning parameter \(\lambda = 1/(4\sigma ^2) \), and the prior parameters for the ‘Bayes’ estimator are \(K=10, a=1, b=1/100\). The Gibbs samplers for these two Bayesian estimators are run with 500 steps and 100 burn-in steps.

Table 1 Simulation results for Setting I (exact low-rank)
Table 2 Simulation results for Setting II (approximate low-rank)
Table 3 Simulation results for Setting III (heavy tail noise)
Table 4 Simulation results for Setting IV (extreme case)

3.2 Results on estimation accuracy

From the results in Tables 1 and 2, it is clear that the de-biased estimator significantly outperforms the estimator it de-biases. However, the de-biased estimator is at best on par with the Bayesian methods, matching them only in some cases.

More specifically, in Table 1, the de-biased estimator behaves similarly to the Bayesian estimators when the observation rate is high (say \(\tau = 20\%\) or \( 50\% \) missing). In the case of a high missing rate, \(\tau = 80\% \), the de-biased estimator returns highly unstable results; this may be because the estimator it de-biases (here the ‘als’ estimator) is unstable with few observations. However, when the dimension of the matrix increases, the differences between the de-biased estimator and the Bayesian estimators become smaller. This is also observed in the setting of approximately low-rank matrices, see Tables 2, 5 and 6.

The ‘f.Bayes’ method quite often yields the best results in terms of all considered errors (ase, rse and Pred) in the setting of exact low-rank matrices. However, it is noted that in the setting where the true underlying matrix is only approximately low-rank, in Tables 2 and 6, the ‘Bayes’ approach is in some cases slightly better than the ‘f.Bayes’ approach. This can be explained by the fact that the ‘Bayes’ approach employs a kind of approximate low-rank prior, through the Gamma prior on the variances of the factor matrices, and is thus able to adapt to approximate low-rankness.

Results for the model misspecification cases with heavy-tailed noise are given in Tables 3 and 4. Although the Bayesian methods, especially the ‘f.Bayes’ method, yield better results than ‘als’ or ‘d.b’, all methods fail in the case of a high missing rate, \(\tau = 80\%\). This could be due to the fact that the considered methods are all designed for Gaussian noise and are thus not robust to heavy-tailed noise, such as Student noise.

3.3 Results on uncertainty quantification

To examine the uncertainty quantification of the different methods, we simulate a matrix as in Sect. 3.1 and then repeat the observation process 100 times. More precisely, we obtain 100 data sets by replicating the observation process with missing rates \( \tau = 20\%, 50\% \) and \(80\%\) of the entries of \( M^* \) using uniform sampling; each sampled set is then corrupted by noise as in (1), where the \( \mathcal {E}_{ij} \) are i.i.d. \( \mathcal {N} (0 , 1) \).

Table 5 Simulation results for Setting I on empirical coverage rate of the confidence intervals and of the credible intervals of the entries (standard deviation is given in parentheses)

Tables 5 and 6 gather the empirical coverage rates of the confidence intervals and of the credible intervals for all methods over 100 independent experiments. More precisely, we report 95% confidence intervals for the de-biased method. The credible intervals are reported as 89% equal-tailed intervals, see [22, 32], which are suited to small posterior samples such as ours with 500 posterior samples. We note that to obtain 95% intervals, an effective posterior sample size of at least 10,000 is recommended [22]. A few examples from Setting I with 10,000 posterior samples are given in Fig. 2.

A noteworthy conclusion from the results in Tables 5 and 6 is that the coverage rates of the 89% credible intervals are significantly higher than those of the 95% confidence intervals obtained by the de-biased method. The credible intervals of the ‘f.Bayes’ approach show a slightly better coverage rate than those based on the ‘Bayes’ approach. It is also noted that in the setting of approximate low-rankness, Table 6, where we do not have a theoretical guarantee for the de-biased estimator, the coverage rate of the confidence intervals is very low, while the credible intervals still come with reliable coverage rates. These results further explain why the Bayesian methods yield better accuracy, as in Tables 1 and 2.

Table 6 Simulation results for Setting II on empirical coverage rate of the confidence intervals and of the credible intervals of the entries (standard deviation is given in parentheses)
Table 7 Simulation results for Setting III on empirical coverage rate of the confidence intervals and of the credible intervals of the entries (standard deviation is given in parentheses)

In Fig. 2, we compare the limiting Gaussian distributions of the de-biased estimator and the posterior samples of the ‘f.Bayes’ method against the true entries of interest. The limiting Gaussian distribution of the de-biased estimator yields slightly sharper tails than the distribution of the posterior samples. In addition, Fig. 1 displays the Q–Q (quantile–quantile) plots of 10,000 posterior samples of some entries against the standard Gaussian distribution. It shows that the posterior distributions of these entries match the standard Gaussian distribution reasonably well.

Table 8 Simulation results for Setting IV on empirical coverage rate of the confidence intervals and of the credible intervals of the entries (standard deviation is given in parentheses)

Results on the empirical coverage rates of the confidence intervals and of the credible intervals for Settings III and IV with heavy-tailed noise are gathered in Tables 7 and 8. We can see a slight reduction in the empirical coverage rates of all methods compared with the Gaussian noise setting in Table 5. As in Settings I and II, the empirical coverage rates of the confidence intervals decrease quickly as the missing rate \( \tau \) increases, while the empirical coverage rates of the credible intervals remain stable.

4 Discussion and conclusion

In this paper, we have provided extensive numerical comparisons between the de-biased estimator and Bayesian estimators for the problem of low-rank matrix completion. The numerical simulations draw a systematic picture of the behaviour of these estimators. More specifically, in terms of estimation accuracy, the de-biased estimator is comparable to the Bayesian estimators, whereas the Bayesian estimators are much more stable and in some cases outperform the de-biased estimator, especially in the small-sample regime. Moreover, the credible intervals cover the underlying entries quite well, and slightly better than the confidence intervals, in exact low-rank matrix completion. However, in the case of approximate low-rankness, the confidence intervals produced by the de-biased estimators no longer work well. These results are of interest to, and can serve as a guideline for, researchers as well as practitioners in the many areas where one only has access to a few observations.

On the other hand, the results in this work suggest that the considered Bayesian estimators may actually reach the minimax-optimal rate of convergence without an additional logarithmic factor. The extra log-terms could be due to the PAC-Bayesian bounds technique used to prove the theoretical properties of the Bayesian estimators. Moreover, as shown in Alquier and Ridgway [3], the same rate with a log-term is proved for the concentration of the corresponding posterior, and, in view of the observed coverage of the credible intervals, we conjecture that this rate could also be improved. To our knowledge, these important questions remain open.

Last but not least, it is also important to perform comparisons with the Variational Bayesian (VB) method of Lim and Teh [24], whose theoretical guarantees are given in Alquier and Ridgway [3], because this method is very popular for matrix completion with large datasets. This will be the objective of our future work. However, we would like to note that, in a preprint [4], the authors performed some comparisons between the Bayesian approach and the VB method. The message from their work is that VB can be expected to be more or less as accurate as the full Bayesian approach, maybe slightly less so, but that the credibility intervals would be inaccurate (see e.g. Figure 3 in Alquier et al. [4]).