Bayesian Tensor Completion and Decomposition with Automatic CP Rank Determination Using MGP Shrinkage Prior

Tensor completion, which fills in the missing entries of high-dimensional data, has many applications, such as recommender systems and image inpainting. Low-rank CP decomposition is one of the popular methods for tensor completion and is an extension of matrix decomposition to higher order tensors. However, unlike matrix factorization, it is NP-hard to obtain the rank of a CP decomposition directly. In this paper, our objective is to achieve tensor completion and rank determination in CP decomposition simultaneously. This can be done using Bayesian CP decomposition with the Multiplicative Gamma Process (MGP) as the prior distribution. MGP is a shrinkage prior that decays the components as their index increases. Using MGP, the proposed method avoids duplication of components and enables highly accurate rank estimation in Bayesian tensor modeling. In addition, MGP helps to reduce noise sensitivity and estimation time. Numerical experiments using artificial data and image data demonstrate the effectiveness of the proposed method.


Introduction
Much of the data arising in real problems is tensor data. For example, recommender systems are based on customer purchase history data of "customer × merchandise × time" [15], and image processing is based on three-dimensional data of "height × width × channel" [10,20,21]. In addition, tensor data in real problems often have many missing entries due to measurement errors, and the data often contain noise [1,2,10,18,19,33]. To solve such a tensor completion problem, it is often necessary to extract the essential features hidden in the high-dimensional data [6]; the values of the missing entries are then estimated based on the extracted features. Therefore, by analogy with matrix factorization, which is a low-rank approximation method for matrices, we consider decomposing a tensor into components with a small number of degrees of freedom (latent factors). This is called tensor decomposition [17]. There are two standard representative models of tensor decomposition: Tucker decomposition and CP decomposition [7,11,17,32]. CP decomposition approximates an $N$th-order tensor of size $I_1 \times \cdots \times I_N$ by a sum of $R$ rank-1 tensors (each the outer product of $N$ vectors $\mathbf{a}^{(n)}_r$). The matrix of size $I_n \times R$ whose columns are the vectors $\mathbf{a}^{(n)}_r$ is called a factor matrix and corresponds to a latent factor of the decomposition. The minimum value of $R$ for which the decomposition reconstructs the original tensor without error (an exact decomposition) is defined as the CP rank. The Tucker decomposition, on the other hand, is represented by factor matrices and a core tensor that describes the relationships between the factors [33]. Since the Tucker decomposition reduces to CP decomposition when its core tensor is super-diagonal, CP decomposition can be regarded as a more constrained model than Tucker decomposition [16]. CP decomposition is a natural extension of the singular value decomposition, and the super-diagonal entries of the core tensor play a role analogous to singular values. In this paper, we focus on CP decomposition.
A typical way to find a CP decomposition is to minimize a loss function, such as the Euclidean distance between the low-rank CP model and the observed tensor. CANDECOMP/PARAFAC weighted optimization (CPWOPT) formulates CP decomposition with missing data as a weighted least-squares problem [1] and has been applied to extract traffic patterns in Intelligent Transportation Systems (ITS), where missing data is a common problem [24]. However, tensor completion based on unregularized least squares may not have a unique solution and tends to be sensitive to noise when the rank is estimated to be larger than the true rank. In this sense, it suffers from overfitting in tensor decomposition, which causes a severe deterioration of estimation accuracy. In addition, these methods require the rank to be fixed in advance, which incurs a high computational cost for rank selection.
Bayesian approaches are another way to perform tensor decomposition. They infer the posterior probability distribution of the parameters, such as the factor matrices. Unlike optimization methods, they can evaluate not only the estimated values but also their uncertainty. Reported applications include network structure analysis and collaborative filtering using tensor decompositions estimated by MCMC [29,36]. However, the convergence of MCMC-based estimation is very slow.
ARD (Automatic Rank Determination) is a tensor completion technique based on CP decomposition with variational Bayes [37]. ARD performs rank estimation as well as tensor completion and employs a hierarchical Bayesian model with a prior distribution that induces group sparsity over all factor matrices to improve robustness against noise. ARD thus avoids costly rank selection and achieves efficient tensor completion and rank estimation. However, in replicated experiments, ARD often produces duplicated column vectors in the factor matrices. This leads to over-estimation of the CP rank, which deteriorates completion accuracy, increases estimation time, and reduces compression performance.
In this paper, we propose a new tensor completion method based on Bayesian CP decomposition with ARD using the Multiplicative Gamma Process (MGP) shrinkage prior. The MGP shrinkage prior is a distribution under which the core tensor of ARD is shrunk as much as possible [4,26]. Since the core tensor and the factor matrices are linked, when the core tensor decays, the column vectors of the factor matrices become ordered and duplicates are removed. By applying the MGP shrinkage prior to ARD, the proposed method improves the accuracy of rank estimation, reduces estimation time, and enhances robustness to noise.
Our contributions can be summarized as follows:

1. We confirmed by numerical experiments that ARD produces duplicated column vectors in the factor matrices, and we gave a mathematical proof of a property concerning the duplication of bases in a particular situation. Pointing out this issue is a significant contribution, because duplication of the column vectors of the factor matrices leads to an overestimation of the rank, resulting in worse estimation accuracy, longer estimation time, and reduced compression performance.
2. To reduce the model redundancy caused by the duplication of the column vectors of the factor matrices, which is a drawback of ARD, we propose a new probabilistic model using MGP. We also derive a variational Bayesian inference algorithm based on this probabilistic model that simultaneously performs tensor completion, denoising, and rank estimation.
3. Experiments using synthetic data show improved accuracy of rank estimation and reduced estimation error. Experiments on real-world image and traffic data show that the estimation time is significantly reduced.
The rest of this paper is organized as follows: related works are described in "Related Works"; the conventional ARD method and its problems are reviewed in "Review of ARD (Automatic Rank Determination)"; the proposed MGP-ARD method is described in "Proposed Method"; the experiments on the proposed method are described in "Experiment"; and the conclusion is given in "Conclusions".

Preliminaries and Notation
Vectors are represented by lowercase boldface letters $\mathbf{a} \in \mathbb{R}^I$, matrices by uppercase letters $A \in \mathbb{R}^{I \times J}$, and higher order tensors by calligraphic letters $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$. A single entry of a tensor is written $\mathcal{A}_{i_1,\dots,i_N}$. The $(N-1)$th-order subtensor obtained by fixing the mode-$n$ index to $i_n$ is denoted $\mathcal{A}_{i_n}$. The inner product of two tensors is defined as $\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1,\dots,i_N} \mathcal{A}_{i_1,\dots,i_N} \mathcal{B}_{i_1,\dots,i_N}$, and the Frobenius norm as $\|\mathcal{A}\|_F = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle}$. The Hadamard product of matrices $A \in \mathbb{R}^{I \times J}$ and $B \in \mathbb{R}^{I \times J}$ is $A \circledast B \in \mathbb{R}^{I \times J}$, the Kronecker product of $A \in \mathbb{R}^{I \times J}$ and $B \in \mathbb{R}^{K \times L}$ is $A \otimes B \in \mathbb{R}^{IK \times JL}$, and the Khatri-Rao product of $A \in \mathbb{R}^{I \times K}$ and $B \in \mathbb{R}^{J \times K}$ is $A \odot B \in \mathbb{R}^{IJ \times K}$, respectively. In particular, the Hadamard product of a set of matrices is written $\circledast_{n=1}^{N} A^{(n)} = A^{(1)} \circledast \cdots \circledast A^{(N)}$, and the Khatri-Rao product of a set of matrices in reverse order is written $\bigodot_{n=N}^{1} A^{(n)} = A^{(N)} \odot \cdots \odot A^{(1)}$.
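As a concrete reference for this notation, the following NumPy sketch implements the mode-$n$ unfolding and the three matrix products; the helper names (`unfold`, `khatri_rao`) are our own, not from the paper.

```python
# Minimal NumPy sketch of the tensor/matrix products defined above.
import numpy as np

def unfold(T, n):
    """Mode-n unfolding: arrange the mode-n fibers of T as rows of a matrix."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I*J) x K for A (I x K), B (J x K)."""
    I, K = A.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * B.shape[0], K)

A = np.random.randn(4, 3)
B = np.random.randn(5, 3)
assert khatri_rao(A, B).shape == (20, 3)

X = np.random.randn(4, 5, 6)
Y = np.random.randn(4, 5, 6)
inner = np.sum(X * Y)            # tensor inner product <X, Y>
fro = np.sqrt(np.sum(X * X))     # Frobenius norm ||X||_F
had = X * Y                      # Hadamard (element-wise) product
```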

Related Works
There are four approaches to CP rank estimation: supervised learning, optimization, probabilistic (point) estimation, and Bayesian inference.
First, supervised learning can be used for rank estimation. Zhou et al. [41] proposed to use a CNN for CP rank estimation. Note that this approach requires training data.
There are also rank estimation methods based on optimization. Since tensor rank is not a convex function of the tensor, convex relaxations using nuclear norm minimization have been proposed [27]. For example, one nuclear norm for tensors is defined as the sum of the nuclear norms of the mode-$n$ unfolding matrices [10,21,31]. Several studies have applied this to tensor completion [9,14,20].
There are also several methods based on probabilistic approaches. One uses probabilistic CP decomposition to estimate the CP rank by MAP estimation [3], and has been applied to channel estimation in MIMO systems [42]. Another uses the EM algorithm to perform CP rank estimation and image denoising [34]. Although these are also probabilistic approaches, they differ from our method in that they are based on point estimation and therefore cannot infer the uncertainty of the solution.
On the other hand, there are methods that estimate the tensor rank by Bayesian inference. For example, there is a method that obtains the CP rank for binary- and real-valued tensors [26]. It differs from our method in that it uses MCMC, whose convergence is slow. Moreover, it is difficult to incorporate constraints such as smoothness into that method, because its explicit mathematical model of the core tensor makes the equations complicated. There is also a variational Bayesian method that estimates the Tucker rank instead of the CP rank [38]. The ARD method is a variational Bayesian method that can simultaneously perform tensor completion, denoising, and rank estimation [37], and has been applied to spatiotemporal traffic data imputation [8]. However, ARD has the disadvantage of duplicated column vectors in the factor matrices, described in "Introduction", which makes the model redundant.

Review of ARD (Automatic Rank Determination)
ARD is an algorithm for tensor completion based on Bayesian CP decomposition [37]. It can be applied to noisy data and performs rank estimation simultaneously. The model is constructed as a hierarchical Bayesian model with a sparsity-inducing prior distribution, and variational Bayes is used as the inference method.

Modeling
Let $\mathcal{Y}$ be an $N$th-order tensor of size $I_1 \times I_2 \times \cdots \times I_N$ containing missing entries. We define $\Omega$ as the set of indices $(i_1, i_2, \dots, i_N)$ of the observed entries, and $\mathcal{O}$ as the mask tensor whose entries are 1 where observed and 0 where missing. The observed part is defined as
$$\mathcal{Y}_\Omega := \mathcal{Y} \circledast \mathcal{O}. \tag{1}$$
We assume that $\mathcal{Y}$ is observed data obtained by adding noise to a latent tensor $\mathcal{X}$ and formulate it as
$$\mathcal{Y} = \mathcal{X} + \mathcal{E}, \tag{2}$$
where the entries of $\mathcal{E}$ are assumed to follow an independently and identically distributed (i.i.d.) Gaussian distribution. The latent tensor $\mathcal{X}$ is represented by the CP decomposition model
$$\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}^{(1)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r = [\![ A^{(1)}, \dots, A^{(N)} ]\!], \tag{3}$$
where $\circ$ denotes the outer product of vectors. We define $\{A^{(n)}\}_{n=1}^{N}$ as the factor matrices, such that
$$A^{(n)} = \big[\mathbf{a}^{(n)}_1, \dots, \mathbf{a}^{(n)}_R\big] \in \mathbb{R}^{I_n \times R}. \tag{4}$$
The probabilistic model for CP decomposition is then represented by
$$p\big(\mathcal{Y}_\Omega \mid \{A^{(n)}\}_{n=1}^{N}, \tau_c\big) = \prod_{(i_1,\dots,i_N) \in \Omega} \mathcal{N}\!\left(\mathcal{Y}_{i_1,\dots,i_N} \,\middle|\, \sum_{r=1}^{R} \prod_{n=1}^{N} a^{(n)}_{i_n r},\; \tau_c^{-1}\right), \tag{5}$$
where $\tau_c$ is the noise precision. Next, we discuss rank determination in ARD. In general, it is very difficult to estimate the dimension $R$ of the latent space with the least redundancy, i.e., the CP rank [12]; by the definition of CP rank [17], $R$ should be the minimum such value. ARD attempts to determine the CP rank automatically in the process of Bayesian inference by placing a sparsity-inducing prior over all factor matrices. This is based on the ideas of sparse Bayesian learning [30] and automatic relevance determination [22,23,25,35]. We now state this more precisely. For all factor matrices $A^{(n)}$, the prior distribution of ARD is defined as
$$p\big(A^{(n)} \mid \boldsymbol{\lambda}\big) = \prod_{i_n=1}^{I_n} \mathcal{N}\!\big(\mathbf{a}^{(n)}_{i_n,:} \mid \mathbf{0},\, \Lambda^{-1}\big), \quad \Lambda = \mathrm{diag}(\boldsymbol{\lambda}), \tag{6}$$
$$p(\boldsymbol{\lambda}) = \prod_{r=1}^{R} \mathrm{Ga}(\lambda_r \mid c_0, d_0), \tag{7}$$
where $\mathrm{Ga}(\cdot \mid a, b)$ is the Gamma distribution with shape $a$ and rate $b$. The prior distribution of the factor matrices (Eq. (6)) has mean 0, so their values approach 0 as the precision $\lambda_r$ increases. Furthermore, since all factor matrices share the precision parameters $\boldsymbol{\lambda}$, the inverse precisions $\lambda_r^{-1}$ can be interpreted as the super-diagonal entries of the core tensor. This prior is sparsity-inducing, because it drives the unimportant components of the factor matrices to zero; the number of components remaining after inference corresponds to the CP rank. Because of this sparsity, the model is robust to noise, so denoising is also achieved by this prior distribution. With this prior distribution, ARD can derive the CP rank efficiently.
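To make the generative model concrete, here is a minimal NumPy sketch that samples a small third-order tensor from the ARD model above. The hyperparameter values and the fixed noise precision are illustrative choices of ours, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
dims, R = (30, 30, 30), 5
c0, d0 = 2.0, 2.0        # illustrative Gamma hyperparameters (our choice)
tau_c = 100.0            # noise precision, fixed here for simplicity

lam = rng.gamma(c0, 1.0 / d0, size=R)                   # lambda_r ~ Ga(c0, rate=d0)
A = [rng.normal(0.0, 1.0 / np.sqrt(lam), size=(I, R))   # rows ~ N(0, diag(lam)^-1)
     for I in dims]

# X = sum_r a_r^(1) o a_r^(2) o a_r^(3), built with einsum
X = np.einsum('ir,jr,kr->ijk', A[0], A[1], A[2])
Y = X + rng.normal(0.0, 1.0 / np.sqrt(tau_c), size=dims)  # Y = X + noise
O = rng.random(dims) < 0.5                                # mask: ~50% observed
Y_obs = Y * O                                             # Y_Omega = Y * O (Hadamard)
```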
The precision $\tau_c$ of the CP decomposition model is also given a prior distribution, defined as
$$p(\tau_c) = \mathrm{Ga}(\tau_c \mid a_0, b_0). \tag{8}$$
In summary, defining the unobserved latent parameters as $\Theta = \{A^{(1)}, \dots, A^{(N)}, \boldsymbol{\lambda}, \tau_c\}$, the probabilistic model of ARD is
$$p(\mathcal{Y}_\Omega, \Theta) = p\big(\mathcal{Y}_\Omega \mid \{A^{(n)}\}_{n=1}^{N}, \tau_c\big) \prod_{n=1}^{N} p\big(A^{(n)} \mid \boldsymbol{\lambda}\big)\, p(\boldsymbol{\lambda})\, p(\tau_c). \tag{9}$$

Inference

Next, we give an overview of the inference method. The goal of Bayesian inference is to derive the posterior distribution of the parameters,
$$p(\Theta \mid \mathcal{Y}_\Omega) = \frac{p(\mathcal{Y}_\Omega, \Theta)}{\int p(\mathcal{Y}_\Omega, \Theta)\, d\Theta}. \tag{10}$$
The missing entries are estimated by calculating the predictive distribution,
$$p\big(\mathcal{Y}_{i_1,\dots,i_N} \mid \mathcal{Y}_\Omega\big) = \int p\big(\mathcal{Y}_{i_1,\dots,i_N} \mid \Theta\big)\, p(\Theta \mid \mathcal{Y}_\Omega)\, d\Theta. \tag{11}$$
In Eqs. (10) and (11), we need to compute multiple integrals over the parameters $\Theta$, which is very difficult for complex latent variables. In variational Bayes, to find a distribution $q(\Theta)$ that approximates the true posterior $p(\Theta \mid \mathcal{Y}_\Omega)$, we derive $q$ by minimizing the KL divergence, which satisfies
$$\mathrm{KL}\big(q(\Theta)\,\|\,p(\Theta \mid \mathcal{Y}_\Omega)\big) = \ln p(\mathcal{Y}_\Omega) - \mathcal{L}(q), \tag{12}$$
so that minimizing the KL divergence is equivalent to maximizing the variational lower bound $\mathcal{L}(q)$. In this study, we employ the mean-field approximation for $q(\Theta)$, defined as
$$q(\Theta) = \prod_{j} q_j(\Theta_j). \tag{13}$$
Computing Eqs. (12) and (13) under the mean-field approximation, the optimal factors satisfy
$$\ln q_j(\Theta_j) = \mathbb{E}_{q(\Theta \setminus \Theta_j)}\big[\ln p(\mathcal{Y}_\Omega, \Theta)\big] + \mathrm{const}, \tag{14}$$
where the expectation is taken with respect to all variables except $\Theta_j$. From Eq. (14), the parameters $\Theta = \{A^{(1)}, \dots, A^{(N)}, \boldsymbol{\lambda}, \tau_c\}$ can be inferred alternately; since there are dependencies among the variables, they must be obtained by an iterative method.
After obtaining the approximate posterior distribution, the approximate predictive distribution can be obtained from
$$p\big(\mathcal{Y}_{i_1,\dots,i_N} \mid \mathcal{Y}_\Omega\big) \approx \int p\big(\mathcal{Y}_{i_1,\dots,i_N} \mid \Theta\big)\, q(\Theta)\, d\Theta, \tag{15}$$
achieving tensor completion together with an assessment of its uncertainty.
In ARD, the rank is obtained simultaneously with tensor completion. In the process of Bayesian inference, each update of $\boldsymbol{\lambda}$ affects the new posterior distributions of all the factor matrices $A^{(n)}$, which in turn affect the next update of $\boldsymbol{\lambda}$. Therefore, when $\lambda_r$ becomes very large, the prior distribution forces the $r$th component of every $A^{(n)}$ toward zero. The tensor rank can then be obtained by counting the number of non-zero components of the factor matrices.
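A minimal sketch of this counting rule (the threshold value and helper name are our own choices, not the paper's):

```python
import numpy as np

def effective_rank(factors, tol=1e-4):
    """Count components whose column norm exceeds `tol` in every factor matrix."""
    norms = np.array([np.linalg.norm(A, axis=0) for A in factors])  # shape (N, R)
    keep = (norms > tol).all(axis=0)
    return int(keep.sum()), keep

# Usage: R_hat, keep = effective_rank(A_list)
#        A_list = [A[:, keep] for A in A_list]   # prune the zeroed components
```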

Overestimation of the CP Rank by ARD
In this section, we discuss a phenomenon in which ARD generates redundancy in the CP decomposition model. In our experiments, we found that ARD can produce duplicated column vectors in the factor matrices, and that this is difficult to fix even with more iterations. An example of duplication is shown in Fig. 1, which displays the factor matrix obtained by ARD when completing a third-order tensor with true rank 3 and size 30 × 30 × 30. ARD estimated the CP rank as 5, with redundant bases. This phenomenon occurs quite frequently: over 100 trials of the same experiment, the estimated rank exceeded the true rank of 3 in as many as 40 trials. It can also be shown mathematically that, under very specific conditions, duplication in the column vectors of all but one factor matrix forces the remaining factor matrix to degenerate as well.
Theorem 1 Let the $n$th factor matrix $A^{(n)}$ be as defined in Eq. (4). If the factor matrices $A^{(k)}$ $(k = 1, \dots, n-1, n+1, \dots, N)$ all have rank 1, then the rank of the posterior mean of $A^{(n)}$ is less than or equal to 1.

Proof
In the proof, we show that the approximate posterior $q(A^{(n)})$ is Gaussian and that its mean matrix $\tilde{A}^{(n)}$, whose rows are the mean vectors $\tilde{\mathbf{a}}^{(n)}_{i_n,:}$, has rank 1 or less. By computing Eq. (14), the approximate posterior distribution of the factor matrix $A^{(n)}$ is
$$q\big(A^{(n)}\big) = \prod_{i_n=1}^{I_n} \mathcal{N}\!\left(\mathbf{a}^{(n)}_{i_n,:} \,\middle|\, \tilde{\mathbf{a}}^{(n)}_{i_n,:},\; V^{(n)}_{i_n}\right), \tag{16}$$
and its parameters can be calculated by
$$V^{(n)}_{i_n} = \left(\mathbb{E}_q[\tau_c]\, \mathbb{E}_q\!\big[A^{(\setminus n)T}_{i_n} A^{(\setminus n)}_{i_n}\big] + \mathbb{E}_q[\Lambda]\right)^{-1}, \quad \tilde{\mathbf{a}}^{(n)}_{i_n,:} = \mathbb{E}_q[\tau_c]\, V^{(n)}_{i_n}\, \mathbb{E}_q\big[A^{(\setminus n)}_{i_n}\big]^{T} \mathrm{vec}\big(\mathcal{Y}_{i_n} \circledast \mathcal{O}_{i_n}\big), \tag{17}$$
where $\mathcal{Y}_{i_n}$ denotes the $(N-1)$th-order subtensor of the observation tensor obtained by fixing the mode-$n$ index to $i_n$, $\mathcal{O}_{i_n}$ denotes the corresponding subtensor of the mask tensor, $A^{(\setminus n)}_{i_n}$ is the Khatri-Rao product $\bigodot_{k \ne n} A^{(k)}$ restricted to the rows observed together with $i_n$, and $\mathrm{vec}(\cdot)$ is the vectorization of a tensor. Stacking the row means, the mean matrix can be calculated from Eqs. (16) and (17) as
$$\tilde{A}^{(n)} = \mathbb{E}_q[\tau_c]\, Y_{(n)}\, \mathbb{E}_q\big[A^{(\setminus n)}\big]\, V^{(n)}, \tag{18}$$
where $Y_{(n)}$ is the mode-$n$ unfolding of $\mathcal{Y} \circledast \mathcal{O}$. If all $A^{(k)}$ $(k \ne n)$ have rank 1, then their Khatri-Rao product $A^{(\setminus n)}$ also has rank at most 1. From $\mathrm{rank}(XY) \le \min\big(\mathrm{rank}(X), \mathrm{rank}(Y)\big)$ and Eq. (18), $\mathrm{rank}\big(\tilde{A}^{(n)}\big) \le 1$, so the rank of the mean of $A^{(n)}$ is less than or equal to 1. ◻

Since basis duplication is equivalent to rank reduction, Theorem 1 shows mathematically that basis duplication in the $n$th factor matrix is induced by basis duplication in the other factor matrices. Note, however, that because this theorem assumes the very restrictive condition of rank 1, the claim of basis duplication in general rests essentially on experimental evidence.
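The key step of the proof can be checked numerically: the Khatri-Rao product of rank-1 matrices has rank at most 1, and multiplying by it cannot raise the rank of the mean in Eq. (18). The following toy check is ours, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

def rank1(I, R):
    """Random rank-1 matrix of shape (I, R)."""
    return np.outer(rng.normal(size=I), rng.normal(size=R))

def khatri_rao(A, B):
    """Column-wise Kronecker product."""
    I, K = A.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * B.shape[0], K)

B, C = rank1(20, 5), rank1(10, 5)
KR = khatri_rao(B, C)                      # shape (200, 5)
print(np.linalg.matrix_rank(KR))           # -> 1

Y_n = rng.normal(size=(30, 200))           # stands in for the unfolding Y_(n)
V = rng.normal(size=(5, 5))                # stands in for the posterior covariance
A_mean = Y_n @ KR @ V                      # the structure of Eq. (18)
print(np.linalg.matrix_rank(A_mean))       # -> 1
```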
Overestimation of the CP rank causes at least three kinds of issues in applications. First, it directly reduces the data compression performance of the CP decomposition. Second, it reduces robustness to noise, because the extra components beyond the true rank may fit the noise. Third, it increases the computational cost of the algorithm, since the computational complexity of ARD grows with $R^3$.

Proposed Method
In this section, we propose a new tensor completion/decomposition method that uses a prior distribution under which the core tensor is decayed by a Multiplicative Gamma Process (MGP). We call the proposed method MGP-ARD. Owing to the effect of the MGP shrinkage prior, MGP-ARD mitigates the model redundancy problem of ARD.

Modeling
MGP-ARD is a method in which the MGP distribution [26] is set as the prior of the precisions $\boldsymbol{\lambda}$ in ARD (see Eq. (6)). The MGP prior used in the proposed method is defined as
$$p(\lambda_r \mid \tau_r) = \mathrm{Ga}(\lambda_r \mid c_0, \tau_r), \quad \tau_r = \prod_{l=1}^{r} \delta_l, \tag{19}$$
where each $\delta_l$ follows a gamma distribution truncated to $(0, 1)$. Note that we employ a formulation of MGP that differs somewhat from the original one. Equation (19) expresses that the precision increases with the index $r$, so the core tensor decays: since each $\delta_l \in (0, 1)$, the rate parameter $\tau_r$ of the gamma distribution decreases monotonically in $r$, and hence $\lambda_r$ tends to increase with $r$, so the core tensor entries $\lambda_r^{-1}$ decay. An overview of the decay of the core tensor is illustrated in Fig. 2.

Fig. 2 Overview of the decay of the core tensor by MGP. The rate parameter $\tau_r$ of the gamma distribution decreases as the index $r$ increases; thus $\lambda_r$ increases with $r$, resulting in a decrease of the core tensor entries $\lambda_r^{-1}$.

MGP is based on the idea of nonparametric factor analysis [4]. In nonparametric factor analysis, to resolve the unidentifiability [28] caused by the rotational invariance of factor analysis, a gamma process is introduced such that the entries of the loading matrix decay to zero as the column index increases, achieving a significant reduction in the number of parameters. MGP-ARD avoids duplication of bases by ordering the factor matrices, thus reducing the model redundancy that is an issue in ARD. An overview of this improvement is illustrated in Fig. 3.
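A minimal sketch of sampling from the MGP prior in Eq. (19), assuming each $\delta_l$ follows a gamma distribution truncated to $(0,1)$ and that $\tau_r$ is the rate parameter; the hyperparameter values are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
R, c0, a_delta = 10, 2.0, 2.0

def truncated_gamma_01(shape_param, size):
    """Rejection-sample Ga(shape, 1) truncated to (0, 1)."""
    out = np.empty(size)
    for i in range(size):
        x = rng.gamma(shape_param)
        while x >= 1.0:
            x = rng.gamma(shape_param)
        out[i] = x
    return out

delta = truncated_gamma_01(a_delta, R)   # delta_l in (0, 1)
tau = np.cumprod(delta)                  # tau_r = prod_{l<=r} delta_l, decreasing in r
lam = rng.gamma(c0, 1.0 / tau)           # lambda_r ~ Ga(c0, rate=tau_r)
print(1.0 / lam)                         # "core" magnitudes 1/lambda_r tend to decay
```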
It is also expected that the order invariance of CP decomposition is resolved in the proposed method, which narrows down the solution space of parameters and achieves efficient estimation.
The graphical model of MGP-ARD is shown in Fig. 4. The parameters of MGP-ARD are $\Theta = \{A^{(1)}, \dots, A^{(N)}, \boldsymbol{\lambda}, \boldsymbol{\delta}, \tau_c\}$, and the probabilistic model of MGP-ARD is defined, using Eq. (19), as
$$p(\mathcal{Y}_\Omega, \Theta) = p\big(\mathcal{Y}_\Omega \mid \{A^{(n)}\}_{n=1}^{N}, \tau_c\big) \prod_{n=1}^{N} p\big(A^{(n)} \mid \boldsymbol{\lambda}\big) \prod_{r=1}^{R} p(\lambda_r \mid \tau_r) \prod_{l=1}^{R} p(\delta_l)\, p(\tau_c). \tag{20}$$
The prior distributions of $A^{(n)}$ and $\tau_c$ other than the MGP shrinkage prior are identical to those of ARD, i.e., Eqs. (6) and (8).
The approximate posterior distribution of the factor matrix $A^{(n)}$ is
$$q\big(A^{(n)}\big) = \prod_{i_n=1}^{I_n} \mathcal{N}\!\left(\mathbf{a}^{(n)}_{i_n,:} \,\middle|\, \tilde{\mathbf{a}}^{(n)}_{i_n,:},\; V^{(n)}_{i_n}\right), \tag{21}$$
and its parameters can be calculated by
$$V^{(n)}_{i_n} = \left(\mathbb{E}_q[\tau_c]\, \mathbb{E}_q\!\big[A^{(\setminus n)T}_{i_n} A^{(\setminus n)}_{i_n}\big] + \mathbb{E}_q[\Lambda]\right)^{-1}, \quad \tilde{\mathbf{a}}^{(n)}_{i_n,:} = \mathbb{E}_q[\tau_c]\, V^{(n)}_{i_n}\, \mathbb{E}_q\big[A^{(\setminus n)}_{i_n}\big]^{T} \mathrm{vec}\big(\mathcal{Y}_{i_n} \circledast \mathcal{O}_{i_n}\big).$$
Focusing on the expression for the posterior covariance $V^{(n)}_{i_n}$, we can see that it is controlled by the noise precision $\tau_c$ of the CP decomposition model: if $\tau_c$ is large, the contribution of the model term involving $A^{(\setminus n)}$ is large, and if it is small, the contribution of $\Lambda$, the term related to the decay (MGP), becomes dominant. Focusing on the expression for the posterior mean $\tilde{\mathbf{a}}^{(n)}_{i_n,:}$, we can see that the observed data $\mathcal{Y}$ are coupled with the factor matrices $A^{(\setminus n)}$ of the model.
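The following sketch implements a simplified version of this update for a fully observed third-order tensor, plugging posterior means into the expectations; the exact update also adds the posterior covariance terms and restricts each row to its observed entries. All names are ours.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product."""
    I, K = A.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * B.shape[0], K)

def update_factor(Y, A, n, lam, tau_c):
    """Simplified mean-field update of A^(n), fully observed case."""
    N = Y.ndim
    others = [A[k] for k in range(N) if k != n]
    KR = others[0]                       # Khatri-Rao product of the other factors,
    for M in others[1:]:                 # ordered to match the C-order unfolding below
        KR = khatri_rao(KR, M)
    Yn = np.moveaxis(Y, n, 0).reshape(Y.shape[n], -1)      # mode-n unfolding
    V = np.linalg.inv(tau_c * (KR.T @ KR) + np.diag(lam))  # shared posterior covariance
    return tau_c * Yn @ KR @ V, V                          # posterior mean, covariance
```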
The approximate posterior distribution of the factor-matrix precision $\lambda_r$ is
$$q(\lambda_r) = \mathrm{Ga}\big(\lambda_r \mid c^M_r, d^M_r\big), \tag{22}$$
and its parameters can be calculated by
$$c^M_r = c_0 + \frac{1}{2} \sum_{n=1}^{N} I_n, \quad d^M_r = \mathbb{E}_q[\tau_r] + \frac{1}{2} \sum_{n=1}^{N} \mathbb{E}_q\big[\mathbf{a}^{(n)T}_{:,r} \mathbf{a}^{(n)}_{:,r}\big].$$
Focusing on the expression for $d^M_r$, the first term $\mathbb{E}_q[\tau_r]$ is related to the decay (MGP), and the second term $\mathbb{E}_q[\mathbf{a}^{(n)T}_{:,r} \mathbf{a}^{(n)}_{:,r}]$ is related to the model. In other words, in ARD, when the $r$th component of the factor matrices, $\|\mathbf{a}_{:,r}\|^2_2$, becomes small, the precision of the $r$th component increases and induces sparsity; in MGP-ARD, the decay mechanism $\tau_r$ additionally affects the precision.
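A matching sketch of the $\lambda_r$ update, again with plug-in moments (the naming is ours):

```python
import numpy as np

def update_lambda(A, tau_mgp, c0):
    """A: list of factor matrices; tau_mgp: E[tau_r] from the MGP terms, shape (R,)."""
    sq = sum(np.sum(M * M, axis=0) for M in A)      # sum_n E[a_{:,r}^T a_{:,r}]
    c_M = c0 + 0.5 * sum(M.shape[0] for M in A)     # c_r^M (same for every r)
    d_M = tau_mgp + 0.5 * sq                        # d_r^M: MGP term + model term
    return c_M / d_M                                # posterior mean E[lambda_r]
```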
The approximate posterior distribution of the decay mechanism $\delta_l$ is
$$q(\delta_l) = \mathrm{TGa}_{(0,1)}\big(\delta_l \mid a^M_l, b^M_l\big), \tag{23}$$
a gamma distribution truncated to $(0, 1)$, and its parameters can be calculated by
$$a^M_l = a^\delta_0 + c_0 (R - l + 1), \quad b^M_l = b^\delta_0 + \sum_{r=l}^{R} \mathbb{E}_q[\lambda_r] \prod_{l' \le r,\, l' \ne l} \mathbb{E}_q[\delta_{l'}],$$
where $a^\delta_0$ and $b^\delta_0$ are the hyperparameters of the prior of $\delta_l$. The approximate posterior distribution of the precision $\tau_c$ of the CP decomposition model is
$$q(\tau_c) = \mathrm{Ga}\big(\tau_c \mid a^M, b^M\big), \tag{24}$$
and its parameters can be calculated by
$$a^M = a_0 + \frac{|\Omega|}{2}, \quad b^M = b_0 + \frac{1}{2}\, \mathbb{E}_q\Big[\big\|\mathcal{O} \circledast (\mathcal{Y} - \mathcal{X})\big\|_F^2\Big].$$
Focusing on the expression for $b^M$, the second term represents the error between the observed data $\mathcal{Y}$ and the latent tensor $\mathcal{X}$, i.e., the model built from the factor matrices $A^{(n)}$. Next, we discuss the variational lower bound. The convergence of the algorithm is determined by
$$\big|\mathcal{L}(q)^{(t)} - \mathcal{L}(q)^{(t-1)}\big| < \epsilon, \tag{25}$$
where $\epsilon$ is the convergence threshold and $t$ is the iteration index. The variational lower bound $\mathcal{L}(q)$ can be calculated by
$$\mathcal{L}(q) = \mathbb{E}_{q(\Theta)}\big[\ln p(\mathcal{Y}_\Omega, \Theta)\big] + H\big(q(\Theta)\big), \tag{26}$$
where $H(q(\Theta))$ denotes the entropy of $q(\Theta)$. Finally, we discuss the specific formula for the predictive distribution.
The purpose of the predictive distribution is tensor completion. The predictive distribution can be approximately calculated by
$$p\big(\mathcal{Y}_{i_1,\dots,i_N} \mid \mathcal{Y}_\Omega\big) \approx \mathcal{T}\big(\mathcal{Y}_{i_1,\dots,i_N} \mid \tilde{y}_{i_1,\dots,i_N},\, s_{i_1,\dots,i_N},\, 2a^M\big), \tag{27}$$
where $\mathcal{T}$ is Student's t-distribution, the mean is $\tilde{y}_{i_1,\dots,i_N} = \sum_{r=1}^{R} \prod_{n=1}^{N} \tilde{a}^{(n)}_{i_n r}$, and the scale $s_{i_1,\dots,i_N}$ is computed from the posterior covariances $V^{(n)}_{i_n}$ and the ratio $b^M / a^M$. The process of deriving Eq. (27) is described in [5]. An overview of the algorithm is shown in Algorithm 1. As in ARD, when the value of the $r$th component of the factor matrices becomes zero (falls below a threshold) during inference, the $r$th component is removed. The number of remaining components in the factor matrices is the rank, enabling rank determination.
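For point prediction, the missing entries can simply be filled with the mean of Eq. (27), i.e., the CP reconstruction from the posterior mean factors; a sketch for the third-order case (names ours):

```python
import numpy as np

def predict_mean(A):
    """X_hat from posterior mean factor matrices A = [A1, A2, A3]."""
    return np.einsum('ir,jr,kr->ijk', A[0], A[1], A[2])

# Usage (Y: observed tensor, O: boolean mask of observed entries):
# X_hat = predict_mean(A)
# completed = np.where(O, Y, X_hat)
```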

Computational Complexity
The computational cost of updating the factor matrices $A^{(n)}$ in Eq. (21) is $O(NR^2M + R^3 \sum_n I_n)$, where $N$ is the order of the tensor, $M$ denotes the number of observed entries (i.e., the input data size), and $R$ is the number of latent components in each $A^{(n)}$, i.e., the model complexity or tensor rank. The computational cost of updating the hyperparameters $\boldsymbol{\lambda}$ in Eq. (22) is $O(R^2 \sum_n I_n)$.
The computational cost of updating the hyperparameters $\boldsymbol{\delta}$ in Eq. (23) is $O(R^2)$, and that of the hyperparameter $\tau_c$ in Eq. (24) is $O(R^2 M)$. Therefore, the overall complexity of our algorithm is $O(NR^2M + R^3 \sum_n I_n)$, which scales linearly with the data size but polynomially with the model complexity. The algorithm thus depends strongly on the tensor rank, i.e., the model complexity $R$. The computational complexity of ARD is of the same order as the proposed method. However, unlike ARD, MGP-ARD does not overestimate $R$, so the computation time is shorter and the algorithm is faster; for example, running with $R = 8$ instead of the true rank $R = 5$ inflates the $R^3$ term by a factor of $(8/5)^3 \approx 4$.

Experiment
In this section, we describe experiments to verify the effectiveness of the proposed MGP-ARD. Since MGP-ARD is proposed to correctly estimate the CP rank, which ARD tends to overestimate, the following points are verified using both artificial data and real data.
1. By estimating the CP rank for artificial tensor data, we examine whether the estimation accuracy of MGP-ARD is improved compared to ARD [37], the original method. We also examine whether the estimation error is reduced by not overestimating the rank in noisy tensor completion problems.
2. We attempt to recover incomplete images with noise. We verify that the estimation time is reduced by suppressing the overestimation of the rank while maintaining high estimation accuracy, comparing against ARD [37] and MGP-a [26].

Experiments on Artificial Data (Rank is Known)
In tensor completion, we verify whether the duplication of column vectors is resolved and the accuracy of rank estimation is improved compared to the existing original ARD method. To check the accuracy of the rank estimation, we use synthetic data. In this experiment, the tensors are of order 3 with sizes 30 × 30 × 30 and 20 × 40 × 10, and the true rank is 3 or 5, respectively. To investigate robustness to missing entries, we experiment with different observation rates (0.2, 0.5, and 0.9). The SNR is 20 dB, and 50 trials are performed for each experimental pattern. The convergence threshold is $1.0 \times 10^{-5}$, and the estimation accuracy is measured by $\mathrm{RSE} = \|\hat{\mathcal{X}} - \mathcal{X}_{\mathrm{true}}\|_F / \|\mathcal{X}_{\mathrm{true}}\|_F$. The initial value of R is twice the true rank (Table 1).
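For reference, the RSE metric and the SNR-controlled noise used in these experiments can be computed as follows (standard definitions; variable names ours):

```python
import numpy as np

def rse(X_hat, X_true):
    """Relative squared error ||X_hat - X_true||_F / ||X_true||_F."""
    return np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)

def add_noise(X, snr_db, rng):
    """Add i.i.d. Gaussian noise at a given SNR in dB."""
    signal_power = np.mean(X ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return X + rng.normal(0.0, np.sqrt(noise_power), X.shape)
```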
The experimental results are shown in Fig. 5. In all cases, the rank estimation accuracy of MGP-ARD (the proposed method) was higher than that of ARD (the original method). In particular, when the missing rate is low, ARD tends to overestimate the rank, because the data then contain a larger total amount of noise, whereas MGP-ARD still estimates the true rank, resulting in a significant difference in estimation accuracy.
Next, we also compared ARD and MGP-ARD at various noise levels: SNR of 0, 10, and 20 dB. The size of the tensor is 30 × 30 × 30, the missing rate is 0.5, the true rank is 3 or 5, and each combination is tried 50 times. The experimental results are shown in Fig. 6 and Table 2. In all cases, the rank estimation accuracy of MGP-ARD was higher than that of ARD. The distribution of the proposed method's estimates also changes less in response to noise than that of the conventional method.
The actual improvement in duplication is shown in Fig. 7, which shows the factor matrices finally obtained by ARD and MGP-ARD when the true rank is 5, the observation rate is 0.5, and the SNR is 20 dB. In ARD, the number of components in the obtained factor matrix is 8 (the 1st and 5th and the 7th and 8th components overlap), so the rank estimate is 8. In contrast, the number of components in the factor matrix obtained by MGP-ARD is 5, giving a rank estimate of 5. Since the true rank is 5, MGP-ARD estimates the rank accurately, while ARD overestimates it by 3. This is caused by the duplication in the column vectors of the factor matrix of ARD, which MGP-ARD resolves. Furthermore, the RSE of MGP-ARD is lower than that of ARD, suggesting that sensitivity to noise is avoided by not overestimating the rank.

Figure 8 shows the convergence of the ARD and MGP-ARD algorithms, i.e., the variational lower bound and the estimated rank, for a tensor of size 30 × 30 × 30 with true rank 3, observation rate 0.5, and SNR 20 dB. From Fig. 8, the convergence of MGP-ARD was slightly slower than that of ARD, but the final variational lower bound of MGP-ARD was better. In addition, the estimated rank of MGP-ARD decreased sharply in the early iterations.

Figure 9 shows the estimation time against the rank for ARD and MGP-ARD, with true ranks 5, 10, 15, and 20, size 30 × 30 × 30, SNR 20 dB, and missing rate 0.5. MGP-ARD is faster than ARD at all ranks, and the gap in computation time widens as the true rank increases. This is because the computational complexity of both methods is dominated by $O(R^3)$.

Table 1 shows the mean and median RSE of the proposed method (MGP-ARD) and the existing method (ARD). Both the mean and median RSE were lower for MGP-ARD under most conditions. This suggests that MGP-ARD does not overestimate the rank, thereby reducing the redundancy of the model and avoiding sensitivity to noise.

Experiments with Image Data (Rank is Unknown)
We also experiment with MGP-ARD on real data. In this experiment, we try to recover missing, noisy images (image inpainting). Since the true rank is not known, we mainly evaluate estimation accuracy and time.
First, we use 8 images with uniformly random missing pixels. The observation rate is 0.1 (90% missing) and the SNR is 10 dB. The original images and the corrupted images with missing entries and noise are shown in Fig. 10. We experiment with image inpainting using MGP-a [26], ARD [37], and the proposed MGP-ARD. We also use PSNR and SSIM as additional assessment measures. The initial value of R is set to 100.
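The image metrics could be computed as follows; we assume scikit-image here, since the paper does not specify its implementation:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
img_true = rng.random((256, 256, 3))                      # placeholder original image
img_hat = np.clip(img_true + 0.05 * rng.normal(size=img_true.shape), 0.0, 1.0)

psnr = peak_signal_noise_ratio(img_true, img_hat, data_range=1.0)
ssim = structural_similarity(img_true, img_hat, channel_axis=-1, data_range=1.0)
print(psnr, ssim)
```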
The completion results are shown in Fig. 10, and the recovery performance (RSE, PSNR, SSIM) and runtime are summarized in Table 3. From Table 3, MGP-ARD is the fastest of the three methods while maintaining high performance. MGP-ARD is only slightly less accurate than ARD, and the difference is not distinguishable by the naked eye in the inpainting results in Fig. 10. In terms of estimation time, MGP-ARD achieved completion in about half the time of ARD. This can be attributed to the fact that the computation time is proportional to $R^3$: resolving the redundancy reduces R, which in turn reduces the computation time.
Next, we also experimented with images in which a contiguous area is completely missing. We use two images (baboon, sailboat) in which a large rectangular region in the upper left corner is missing. The original and corrupted images are shown in Fig. 11. The algorithms, convergence threshold, initial R, and metrics are the same as in the uniformly missing images experiment.
The completion results are shown in Fig. 11, and the recovery performance (RSE, PSNR, SSIM) and runtime are summarized in Table 4. Table 4 shows that MGP-ARD has the highest accuracy and the fastest estimation for both images. Figure 11 shows that MGP-ARD performed the smoothest completion due to its low-rank property, which is considered to increase the completion accuracy.
In summary, we confirmed that MGP-ARD significantly reduces the estimation time while maintaining the same level of estimation accuracy compared to ARD in image inpainting.

Experiments with Traffic Data (Rank is Unknown)
With the operation of Intelligent Transportation Systems (ITS), the analysis of large-scale traffic data in urban centers is becoming more and more important. In general, traffic data contain information about time, space, and individual attributes, and the amount of data is enormous. Such high-dimensional data with multiple characteristics can be regarded as tensor data. A common problem in analyzing such data is missing values caused by hardware/software or communication network failures. In this experiment, we use MGP-ARD to perform tensor completion on incomplete traffic data and estimate the missing values.
In this section, we conduct numerical experiments on a traffic speed data set collected in Guangzhou, China. This experiment is based on the work of Chen et al. [8]. The data set is available at https://doi.org/10.5281/zenodo.1205229. It consists of travel speeds observed at 10-min intervals (144 time intervals per day) from 214 road segments over 2 months (61 days from August 1, 2016, to September 30, 2016). The speed data can thus be organized as a third-order tensor of size 214 × 61 × 144 (road segment × day × time interval). The initial value of R is set to 100.
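Organizing the raw speed records into this tensor is a simple reshape; the layout below follows the description above, with placeholder data and our own variable names:

```python
import numpy as np

rng = np.random.default_rng(0)
speed = rng.random((214, 61 * 144))     # placeholder for the raw segment-by-time matrix
tensor = speed.reshape(214, 61, 144)    # road segment x day x 10-min interval
mask = rng.random(tensor.shape) > 0.7   # keep ~30% of entries (70% missing)
Y_obs = tensor * mask
```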
The completion results are shown in Fig. 12, and the recovery performance (RSE) and runtime are summarized in Table 5. Table 5 shows that MGP-ARD achieves better estimation with a faster estimation time. Figure 12 also shows that MGP-ARD estimates better than ARD around 1:00 p.m. In summary, we confirmed that MGP-ARD significantly reduces the estimation time and improves estimation accuracy compared to ARD in traffic speed data completion.

Fig. 12 Completion results of traffic speed data (km/h). The graph shows the data with 70% of the entries missing, the completion results by ARD and MGP-ARD, and the data before removal (True). The range of the data is 3 days.

Conclusions
We proposed a method to avoid the model redundancy of ARD, the original Bayesian CP decomposition, and to achieve more accurate tensor completion and rank estimation. To this end, we proposed a new Bayesian CP decomposition framework called MGP-ARD, in which the MGP prior distribution is set such that the core tensor decays. The model redundancy addressed here refers to the duplication of the column vectors of the factor matrices, which causes the original ARD method to overestimate the rank. MGP-ARD imposes an ordering on the factor matrices, eliminating the duplication of their column vectors. The avoidance of model redundancy leads to improved robustness to noise and reduced estimation time.
The effectiveness of the proposed method was confirmed by experiments on synthetic and real data. In tensor completion on synthetic data, we confirmed that the rank estimation accuracy is improved over the original method by removing the duplication of the column vectors of the factor matrices; this also avoided sensitivity to noise. In tensor completion on real data, we mainly investigated the accuracy and estimation time of completion for image inpainting, and confirmed that the estimation time was reduced while maintaining high estimation accuracy. Together with its robustness to noise, MGP-ARD thus provides not only rank estimation and tensor completion but also tensor decomposition.
Future work includes the extension of this study to other tensor decomposition models, such as tensor train and tensor ring decompositions [13,39,40].
Funding This work was partially supported by JSPS KAKENHI Grant Numbers 20H04208 and 20H04249.

Availability of Data and Materials
Our methods are validated on public data sets.

Conflict of Interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.