Abstract
In recent years, a vast amount of data has been accumulated across various fields in industry and academia, and with the rise of artificial intelligence and machine learning technologies, there is growing demand for knowledge discovery and high-precision prediction from such data. However, real-world data is diverse: it includes network data that represent relationships, data with multiple modalities or views, and data that is distributed across multiple institutions and requires a certain level of information confidentiality.
In recent years, a vast amount of data has been accumulated across various fields in industry and academia, and with the rise of artificial intelligence and machine learning technologies, there is growing demand for knowledge discovery and high-precision prediction from such data. However, real-world data is diverse: it includes network data that represent relationships, data with multiple modalities or views, and data that is distributed across multiple institutions and requires a certain level of information confidentiality. There is also data whose analysis requires extracting latent features in complex subspaces. Therefore, analysis methods that can handle such diversity are needed. In this chapter, we introduce effective methods for such data using novel numerical analysis techniques.
This chapter is organized as follows. Section 4.1 gives an overview of several spectral methods for unsupervised dimensionality reduction and clustering. Section 4.2 describes a recent advanced dimensionality reduction method based on complex moment-based subspace and matrix trace optimization. Section 4.3 shows methods that can utilize data relationships with multiple views simultaneously. In Sect. 4.4, we describe so-called data collaboration analysis that can securely utilize data distributed across multiple institutions.
In this chapter, we denote the numerical dataset of interest by \(X = [{\boldsymbol{x}}_1, {\boldsymbol{x}}_2, \dots , {\boldsymbol{x}}_n]^\textrm{T} \in \mathbb {R}^{n \times m}\), where n is the number of samples (or data objects) and m is the number of features (or attributes). m is also referred to as dimensionality of the data objects. We use the symbol \(:=\) for definition. We denote \([n] := \{1,2,\dots ,n\}\). For matrix A, the (i, j)-element is denoted by \([A]_{i,j}\). \(I_n\) and \(O_{n,m}\) denote the identity matrix of order n and the \(n \times m\) zero matrix, respectively.
1 Spectral Methods for Machine Learning
In this section, we describe spectral methods for unsupervised dimensionality reduction and clustering. Spectral methods involve the decomposition or representation of data or relationships in terms of their spectral components. Many spectral methods rely on computing eigenspaces of matrices derived from the data in some form. The spectral methods introduced in this section rely on a graph that is either natively given or computed from similarities between the data objects.
Let \(\mathcal {G}:=(\mathcal {V},\mathcal {E})\) be a (weighted) undirected graph, where \(\mathcal {V}\) is the set of vertices and \(\mathcal {E}\) is the set of edges. Here we assume that \(\mathcal {G}\) is natively given, as in, for example, social network analysis. Well-known approaches to forming the graph from the data matrix X are shown at the end of this section. Let W be the (symmetric) adjacency matrix of \(\mathcal {G}\), and let \(w_{i,j}\) be the (i, j) element of W. When \((i,j) \in \mathcal {E}\), a positive weight is assigned to \(w_{i,j}\); otherwise \(w_{i,j} = 0\). D is the diagonal matrix whose ith diagonal element is \(d_i := \sum _{j} w_{i,j}\) (for \(i \in [n]\)). \(L := D - W\) is called the graph Laplacian and is known to be positive semi-definite. The number of zero eigenvalues of L coincides with the number of connected components of \(\mathcal {G}\). When \(\mathcal {G}\) is connected, all elements of the eigenvector corresponding to the single zero eigenvalue are constant. In this section, we assume that \(\mathcal {G}\) is connected, for simplicity. For other cases, refer to [45].
The Laplacian eigenmap [1] is a dimensionality reduction method that minimizes a matrix trace involving L. For dimensionality reduction to \(\ell \) dimensions, the objective function is
\[ \min _{U \in \mathbb {R}^{n \times \ell }} \textrm{Tr}(U^{\textrm{T}} L U) \]
with the constraint \(U^{\textrm{T}} D U = I_\ell \). The solution of the minimization can be obtained by computing the eigenvectors corresponding to the \(\ell \) smallest non-zero eigenvalues of the generalized eigenvalue problem \(L \boldsymbol{u} = \lambda D \boldsymbol{u}\). This minimization problem can be regarded as a continuous relaxation of the normalized cut problem [45]. The ith row vector of U is regarded as the \(\ell \)-dimensional representation of the ith data object.
Spectral clustering [34, 45] is one of the most popular clustering methods. It applies k-means to the low-dimensional representation obtained by the Laplacian eigenmap. The locality preserving projection (LPP) [12] can be regarded as a linear approximation of the Laplacian eigenmap. Its objective function is
\[ \min _{Z \in \mathbb {R}^{m \times \ell }} \textrm{Tr}(Z^{\textrm{T}} X^{\textrm{T}} L X Z) \]
with the constraint \(Z^{\textrm{T}} X^{\textrm{T}} D X Z = I_\ell \). The algorithm of spectral clustering is shown in Algorithm 1.
![An algorithm for spectral clustering. The input reads data matrix X belongs to script n times m, and the output reads cluster memberships C and includes 5 steps.](http://media.springernature.com/lw685/springer-static/image/chp%3A10.1007%2F978-981-99-9772-5_4/MediaObjects/538215_1_En_4_Figa_HTML.png)
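The Laplacian eigenmap step that underlies spectral clustering can be sketched in a few lines of NumPy. This is a minimal illustration rather than the chapter's Algorithm 1: the function name `laplacian_eigenmap`, the toy graph, and the use of a sign split in place of k-means (reasonable only for two clusters) are assumptions made here. The generalized problem \(L \boldsymbol{u} = \lambda D \boldsymbol{u}\) is solved through the equivalent symmetric problem \(D^{-1/2} L D^{-1/2} \boldsymbol{v} = \lambda \boldsymbol{v}\) with \(\boldsymbol{u} = D^{-1/2} \boldsymbol{v}\).

```python
import numpy as np

def laplacian_eigenmap(W, ell):
    """Embed the vertices of a connected graph into ell dimensions.

    Solves L u = lambda D u via the symmetric problem
    D^{-1/2} L D^{-1/2} v = lambda v, with u = D^{-1/2} v.
    """
    d = W.sum(axis=1)                        # vertex degrees d_i
    L = np.diag(d) - W                       # graph Laplacian L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    _, V = np.linalg.eigh(L_sym)             # eigenvalues in ascending order
    # Skip the first eigenvector (zero eigenvalue); keep the next ell.
    return d_inv_sqrt[:, None] * V[:, 1:ell + 1]

# Hypothetical toy graph: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

U = laplacian_eigenmap(W, ell=1)
# For two clusters, the sign of the single coordinate stands in for k-means.
labels = (U[:, 0] > 0).astype(int)
```

For more than two clusters, the rows of `U` would be passed to a k-means routine, as in spectral clustering.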
Modularity maximization [33] is a popular formulation of graph clustering and is commonly used for community detection in social network graphs. It is a combinatorial optimization problem. In modularity maximization, a random graph model is introduced. \(p_{i,j}\) is the expected weight of the edge between vertices i and j in the random graph model (in the unweighted case, \(p_{i,j}\) is the probability that the edge occurs). Modularity Q is defined as
\[ Q := \frac{1}{\textrm{Vol}(\mathcal {G})} \sum _{i,j} (w_{i,j} - p_{i,j}) \, \delta (g_i, g_j). \]
Here, \(\textrm{Vol}(\mathcal {G}) := \sum _{i} D_{i,i}\), \(g_i\) is the cluster index of the ith vertex, and \(\delta (g_i, g_j) = 1\) if \(g_i = g_j\) and 0 otherwise. A large modularity indicates that the vertices within each cluster are more densely connected than expected under the given random graph model. In the Chung–Lu random graph model [33], we define \(p_{i,j}\) as \(p_{i,j} = \frac{1}{\textrm{Vol}(\mathcal {G})} {D_{i,i} D_{j,j}}\) so that the expected degrees of the random graph become the same as those of \(\mathcal {G}\). The modularity matrix is defined as
\[ M := W - \frac{1}{\textrm{Vol}(\mathcal {G})} \boldsymbol{d} \boldsymbol{d}^{\textrm{T}}, \]
where \(\boldsymbol{d}\) is the vector formed by the diagonal elements of D. Here we introduce a cluster membership matrix U: \([U]_{i,j} = 1\) when the ith vertex belongs to the jth cluster and \([U]_{i,j} = 0\) otherwise. Using this, modularity can be written as
\[ Q = \textrm{Tr}(U^{\textrm{T}} M U). \]
Here we omitted the constant factor. When we relax the binary constraint and allow continuous values (with the constraint \(U^{\textrm{T}} U = I_\ell \)), the solution is obtained by computing the eigenvectors corresponding to the \(\ell \) largest eigenvalues of the standard eigenvalue problem
\[ M \boldsymbol{u} = \lambda \boldsymbol{u}. \]
This dimensionality reduction method is not as widely used as the Laplacian eigenmap, but could be an alternative to it. It is proposed as an anomaly detection method for densely connected subgraphs [9].
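As a rough illustration of the spectral relaxation of modularity, the following sketch builds a modularity matrix under the Chung–Lu model, assuming the standard form \(W - \boldsymbol{d}\boldsymbol{d}^{\textrm{T}}/\textrm{Vol}(\mathcal {G})\) and the normalization \(1/\textrm{Vol}(\mathcal {G})\) for Q. The function names, the toy graph, and the two-way split by the sign of the leading eigenvector are choices made for this example only.

```python
import numpy as np

def modularity_matrix(W):
    """Modularity matrix under the Chung-Lu model: W - d d^T / Vol(G)."""
    d = W.sum(axis=1)
    return W - np.outer(d, d) / d.sum()

def modularity(W, g):
    """Modularity of the labeling g, normalized by Vol(G) (assumed here)."""
    same_cluster = g[:, None] == g[None, :]
    return (modularity_matrix(W) * same_cluster).sum() / W.sum()

# Hypothetical toy graph: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

M = modularity_matrix(W)
_, V = np.linalg.eigh(M)                 # eigenvalues in ascending order
leading = V[:, -1]                       # eigenvector of the largest eigenvalue
g = (leading > 0).astype(int)            # two-way split by sign
Q = modularity(W, g)
```

Placing every vertex in one cluster gives zero modularity under this definition, so a positive `Q` indicates that the split is better than no split.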
Now we introduce methods for constructing a graph from the data matrix X. One of the most popular approaches is to build a k-nearest neighbor (kNN) graph. Given pairwise similarities, the kNN graph sets \((i,j) \in \mathcal {E}\) if vertex i is one of the k nearest neighbors of vertex j. Because this relation is not necessarily symmetric, a symmetrization step is performed [45]. A brute-force computation of all pairwise similarities (or distances) requires \(O(n^2)\) similarity evaluations, which is not tractable for large n. Approximation methods such as recursive Lanczos bisection [2] and kNN descent [5] aim to reduce this computational complexity.
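The brute-force construction described above can be sketched as follows. The function name `knn_graph`, the use of Euclidean distance, unit edge weights, and symmetrization by taking an edge whenever either endpoint selects the other are all assumptions of this example; other symmetrization rules (such as mutual kNN) are equally possible.

```python
import numpy as np

def knn_graph(X, k):
    """Brute-force kNN graph with unit weights (O(n^2) distance evaluations)."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    np.fill_diagonal(dist2, np.inf)                      # no self-neighbors
    nbrs = np.argsort(dist2, axis=1)[:, :k]              # k nearest per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0    # directed kNN edges
    return np.maximum(A, A.T)                            # symmetrize

# Hypothetical one-dimensional data with two well-separated groups.
X = np.array([[0.0], [1.0], [2.5], [10.0], [11.0], [12.5]])
W = knn_graph(X, k=1)
```

The resulting `W` can serve as the adjacency matrix in the spectral methods of this section.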
2 Complex Moment-Based Supervised Eigenmap for Dimensionality Reduction
Increasing the number of features (dimensionality) may seem to lead to better performance. In practice, however, adding more features can degrade performance, a phenomenon known as the curse of dimensionality. Dimensionality reduction reduces the number of features to mitigate this problem. With dimensionality reduction methods, we can speed up algorithms, reduce the risk of overfitting and improve prediction accuracy.
In this section, we briefly introduce dimensionality reduction methods based on matrix trace optimization. These methods use a few eigenvectors to construct a low-dimensional space while preserving certain properties of the original data. We also introduce a complex moment-based supervised eigenmap [17] that uses a large number of eigenvectors to improve recognition performance.
2.1 Dimensionality Reduction Based on a Matrix Trace Optimization
Let m and n be the numbers of features and samples of the training dataset \(X = [{\boldsymbol{x}}_1, {\boldsymbol{x}}_2, \dots , {\boldsymbol{x}}_n]^\textrm{T} \in \mathbb {R}^{n \times m}\). Here, we consider linear and nonlinear dimensionality reduction methods that construct low-dimensional data \(Y = [{\boldsymbol{y}}_1, {\boldsymbol{y}}_2, \dots , {\boldsymbol{y}}_n]^\textrm{T} \in \mathbb {R}^{n \times \ell }\), which retain some of the properties of the original data. Specifically, linear dimensionality reduction methods reduce the original data X to the low-dimensional data Y using a linear map \(B \in \mathbb {R}^{m \times \ell }\), i.e.
\[ Y = XB. \]
In each dimensionality reduction method based on a matrix trace optimization, a symmetric matrix \(A_1 \in \mathbb {R}^{m \times m}\) and a symmetric positive definite matrix \(A_2 \in \mathbb {R}^{m \times m}\) are defined. Then, the linear map B is formulated through the minimization or maximization of a matrix trace:
\[ \min _{B} \ \text{or} \ \max _{B} \ \textrm{Tr}(B^\textrm{T} A_1 B) \quad \text{s.t.} \quad B^\textrm{T} A_2 B = I_\ell . \]
This is solved by \(\ell \) eigenvectors of the corresponding generalized eigenvalue problem:
\[ A_1 {\boldsymbol{t}}_i = \lambda _i A_2 {\boldsymbol{t}}_i. \tag{4.1} \]
Here, we have \(B = [{\boldsymbol{t}}_1, {\boldsymbol{t}}_2, \dots , {\boldsymbol{t}}_\ell ]\).
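To make the link between the trace optimization and the generalized eigenvalue problem concrete, the sketch below reduces \(A_1 \boldsymbol{t} = \lambda A_2 \boldsymbol{t}\) to a standard symmetric problem through a Cholesky factorization of \(A_2\). It assumes the constraint \(B^\textrm{T} A_2 B = I_\ell \); the function name `trace_optimization_map` and the random test matrices are hypothetical.

```python
import numpy as np

def trace_optimization_map(A1, A2, ell, maximize=False):
    """Return ell eigenpairs of A1 t = lambda A2 t with B^T A2 B = I.

    A2 = R R^T (Cholesky), so C = R^{-1} A1 R^{-T} is symmetric and
    its eigenvectors v give t = R^{-T} v.
    """
    R = np.linalg.cholesky(A2)               # lower triangular factor
    R_inv = np.linalg.inv(R)
    C = R_inv @ A1 @ R_inv.T
    C = (C + C.T) / 2.0                      # remove rounding asymmetry
    lam, V = np.linalg.eigh(C)               # ascending eigenvalues
    idx = np.arange(len(lam))[::-1][:ell] if maximize else np.arange(ell)
    return lam[idx], R_inv.T @ V[:, idx]

# Hypothetical symmetric A1 and symmetric positive definite A2.
rng = np.random.default_rng(0)
Xc = rng.standard_normal((50, 4))
Xc -= Xc.mean(axis=0)
A1 = Xc.T @ Xc                               # e.g. a scatter matrix
G = rng.standard_normal((4, 4))
A2 = G @ G.T + 4.0 * np.eye(4)

lam, B = trace_optimization_map(A1, A2, ell=2, maximize=True)
Y = Xc @ B                                   # low-dimensional data Y = X B
```

With `A2` set to the identity and `A1` the scatter matrix of centered data, this maximization corresponds to PCA.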
Principal component analysis (PCA) [24, 39] and locality preserving projections (LPP) [12] are two typical unsupervised dimensionality reduction methods in this class. PCA aims to maximize the variance of the projected vectors, while LPP aims to preserve the local similarity of the original data. Discriminant analysis is the typical supervised method; it maximizes the between-class scatter and reduces the within-class scatter. A family of discriminant analysis methods has been proposed for supervised dimensionality reduction, including Fisher discriminant analysis (FDA) [7, 8], local FDA (LFDA) [43], semi-supervised LFDA (SELF) [44] and locality adaptive discriminant analysis (LADA) [28].
Nonlinear dimensionality reduction methods, which use a nonlinear map and the kernel trick [42], are widely used as improvements over linear dimensionality reduction methods. Nonlinear dimensionality reduction methods transform the original data X to \(\phi (X) = [\phi ({\boldsymbol{x}}_1),\phi ({\boldsymbol{x}}_2), \dots , \phi ({\boldsymbol{x}}_n)]^\textrm{T}\) with a nonlinear kernel function and reduce the dimension of \(\phi (X)\) using a nonlinear map \(\widetilde{B}\) such that \(Y = \phi (X)\widetilde{B}\). With appropriate nonlinear functions, nonlinear dimensionality reduction methods are expected to improve the recognition performance.
In general, we set \(\widetilde{B} = \phi (X)^\textrm{T}\widehat{B}\) with \(\widehat{B} \in \mathbb {R}^{n \times \ell }\) and directly set the Gram matrix \(K = \phi (X) \phi (X)^\textrm{T} \in \mathbb {R}^{n \times n}\) without computing \(\phi (X)\) to reduce the computational costs. The Gaussian kernel, polynomial kernel and sigmoid kernel are commonly used as the kernel functions.
2.2 A Complex Moment-Based Supervised Eigenmap
As described in Sect. 4.2.1, most of the existing dimensionality reduction methods use only \(\ell \) eigenvectors to construct the low-dimensional space of dimension \(\ell \), which may lose information that is useful for classification. To overcome this information loss, a complex moment-based supervised eigenmap (CMSE) for dimensionality reduction has recently been proposed [17]. CMSE allows us to achieve better recognition performance by using a complex moment-based subspace that includes \(d > \ell \) eigenvectors, where d can be set independently of the dimension \(\ell \) of the low-dimensional space.
The basic concepts of CMSE for high recognition performance are summarized as follows:
- Use the complex moment-based subspace \(\mathcal {S}_\Omega \), which is equivalent to the invariant subspace with respect to the multiple eigenvectors.
- Use the novel minimization problem that combines the matrix trace derived from the dimensionality reduction methods and the squared error straightforwardly using the ground truth data \(Z \in \mathbb {R}^{n \times \ell }\).
Let \(A_1\) and \(A_2\) be the matrices used in a given dimensionality reduction method such as LPP or LFDA. Based on the concept of complex moment-based eigensolvers [14, 40, 41], we also let \(\mathcal {S}_\Omega \) be a complex moment-based subspace with respect to a given real interval \(\Omega = [a,b] \subset \mathbb {R}\) defined by
\[ \mathcal {S}_\Omega := \mathcal {R}(S), \quad S := [S_0, S_1, \dots , S_{M-1}], \quad S_k := \frac{1}{2 \pi \textrm{i}} \oint _\Gamma z^k (z A_2 - A_1)^{-1} A_2 V \, \textrm{d}z, \tag{4.2} \]
where \(L, M \in \mathbb {N}_+, V \in \mathbb {R}^{m \times L}\) and \(\Gamma \) is a positively oriented Jordan curve around \(\Omega \). Then, the subspace \(\mathcal {S}_\Omega = \mathcal {R}(S)\) is equivalent to the subspace spanned by the eigenvectors \({\boldsymbol{t}}_i\) corresponding to the eigenvalues \(\lambda _i \in \Omega \) of (4.1), that is,
\[ \mathcal {S}_\Omega = \textrm{span} \{ {\boldsymbol{t}}_i \mid \lambda _i \in \Omega \}. \]
Then, using \(d = \textrm{rank}(\mathcal {S}_\Omega ) \ge \ell \) eigenvectors, CMSE introduces the following minimization problem on \(\mathcal {S}_\Omega \) to obtain the linear map \(B \in \mathbb {R}^{m \times \ell }\):
with
where the objective function E(B) combines a matrix trace derived from dimensionality reduction methods and a squared error straightforwardly using the ground truth data Z. Here, \(\mu \in [0,1]\) is a weight parameter for both terms and \(f(\cdot )\) is a (meromorphic) weight function of each eigenvector for minimization.
In (4.3), the column vectors of the linear map B are constrained by \(A_2\)-orthonormal bases of the complex moment-based subspace \(\mathcal {S}_\Omega \). Let \(U \in \mathbb {R}^{m \times d}\) be an \(A_2\)-orthogonal matrix (\(U^\textrm{T}A_2U=I_d\)) whose columns are \(A_2\)-orthonormal bases of the complex moment-based subspace \(\mathcal {S}_\Omega \) and \(T = U^\textrm{T}A_1U\). Then, the minimization problem (4.3) can be written as
where \(B = UC\). A minimization problem with an orthogonal constraint (4.4) is called an unbalanced orthogonal Procrustes (UOP) problem, which is solved using an iterative method [6, 38].
In practice, the contour integral (4.2) is approximated by a numerical integration rule such as the N-point trapezoidal rule, as follows:
using symmetric property. In addition, to improve the numerical stability, we apply a low-rank approximation of \(\widehat{S}\) with a truncated singular value decomposition on an \(A_2\)-inner product:
where \(\widehat{\Sigma }\) is a diagonal matrix whose diagonal entries are the larger part of the singular values, i.e., \(\sigma _i/\sigma _1 \ge \delta \) \((\sigma _1 \ge \sigma _2 \ge \dots \ge \sigma _{LM})\). The columns of \(\widehat{U}\) and \(\widehat{W}\) are the corresponding left and right singular vectors. Let \(\widehat{d}\) be a numerical rank, \(\sigma _{\widehat{d}} / \sigma _1 \ge \delta > \sigma _{\widehat{d}+1} / \sigma _1\). Then, the UOP problem (4.4) is rewritten as
where \(\widehat{T} = \widehat{U}^\textrm{T} A_1 \widehat{U}\) and the map is obtained from \(B = \widehat{U} \widehat{C}\).
The algorithm of CMSE is summarized in Algorithm 2. One of the most time-consuming parts of CMSE is computing the solutions of the N/2 linear systems with L right-hand sides in (4.5) and Step 2 of Algorithm 2 as follows:
Since CMSE has hierarchical parallelism for solving these linear systems, it achieves high parallel efficiency, as demonstrated in [48].
![An algorithm for complex moment-based supervised eigenmap. The input has training dataset and parameters, and the output reads linear eigenmap B belongs to script R m times l and includes 4 steps.](http://media.springernature.com/lw685/springer-static/image/chp%3A10.1007%2F978-981-99-9772-5_4/MediaObjects/538215_1_En_4_Figb_HTML.png)
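The subspace construction at the core of CMSE can be illustrated with a small NumPy sketch. It is not the chapter's Algorithm 2: it covers only the quadrature of the contour integral and a numerical-rank check, and several details are assumptions made here, namely a circular contour through a and b, half-shifted trapezoidal nodes, solving all N linear systems instead of exploiting symmetry to solve N/2, and a plain SVD instead of one in the \(A_2\)-inner product. The function name `moment_subspace` and the diagonal test matrices are hypothetical.

```python
import numpy as np

def moment_subspace(A1, A2, V, a, b, M=2, N=32):
    """Approximate S = [S_0, ..., S_{M-1}] by the N-point trapezoidal rule.

    S_k ~ sum_j w_j z_j^k (z_j A2 - A1)^{-1} A2 V on a circle around [a, b].
    """
    gamma, rho = (a + b) / 2.0, (b - a) / 2.0        # center and radius
    theta = 2.0 * np.pi * (np.arange(N) + 0.5) / N
    z = gamma + rho * np.exp(1j * theta)             # quadrature nodes
    w = rho * np.exp(1j * theta) / N                 # quadrature weights
    S = [np.zeros(V.shape, dtype=complex) for _ in range(M)]
    for zj, wj in zip(z, w):
        # One linear system per node, with L right-hand sides.
        Yj = np.linalg.solve(zj * A2 - A1, A2 @ V)
        for k in range(M):
            S[k] += wj * zj ** k * Yj
    # Nodes come in conjugate pairs, so the sum is real up to rounding.
    return np.hstack(S).real

# Hypothetical test problem: eigenvalues 1, 2, 3 lie in [0, 4]; 10, 11 do not.
rng = np.random.default_rng(0)
A1 = np.diag([1.0, 2.0, 3.0, 10.0, 11.0])
A2 = np.eye(5)
V = rng.standard_normal((5, 2))                      # L = 2 columns

S = moment_subspace(A1, A2, V, a=0.0, b=4.0)
sigma = np.linalg.svd(S, compute_uv=False)
d_hat = int(np.sum(sigma / sigma[0] >= 1e-10))       # numerical rank
```

In this toy problem the eigenvectors are the coordinate axes, so the rows of `S` belonging to the eigenvalues outside the interval are numerically zero, and `d_hat` counts the eigenvalues inside it.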
3 Multi-view Data Analysis
Multi-view data are ubiquitous in real-world applications. Multi-view data refers to data that includes multiple sets of features or representations, each providing a different perspective or view of the objects or entities being observed. For instance, pictures often have textual tags and descriptions and the same semantic meaning can be represented in multilingual forms. Each individual view has its specific property in these data, while different views often contain complementary information [47]. Multi-view learning can be roughly divided into supervised and unsupervised methods. Here, we mainly introduce the unsupervised method, i.e., multi-view clustering, which has emerged as a powerful tool for exploring the underlying structure of data from different sources.
Next, we introduce three kinds of multi-view clustering methods, including multi-kernel learning, multi-view graph clustering and multi-view subspace clustering.
3.1 Multi-Kernel Learning
Multi-kernel learning, which aims to optimally combine a group of predefined kernels to improve clustering performance, has been widely applied to multi-view data [47]. The general approach for multi-view data is to use a linear combination of several kernel functions. Since the views differ in importance, the weights of the different kernels should be taken into consideration. An auto-weighted multi-kernel method has been proposed to weight the views and kernels simultaneously [51]. First, Kernel Principal Component Analysis (KPCA) is used on each view to reduce the dimension of the multi-view data. Then, the designed weighted Gaussian kernel is applied to the low-dimensional multi-view data. The weighted Gaussian kernel integrates the advantages of the Gaussian kernel and the polynomial kernel, and is formulated as
The above formula can be expanded based on the binomial theorem as follows.
Thus, the weighted Gaussian kernel is a combination of d Gaussian kernels with different widths in the range from \(\sigma ^2\) to \(\sigma ^2/s\). Weight \(R^{d-s}\) is used to reflect the importance of the \(d-s\) kernel and enlarge the linear translation of distance between points.
For multi-view data with m views, n samples and k clusters, the objective function based on the K-means and weighted Gaussian kernel is formulated as
where \(c_j^v\) is the cluster center, and \(\delta _{ij}\) is the indicator variable with \(\delta _{ij}=1\) if \(x_i \in c_j\), otherwise \(\delta _{ij}=0\). This step drives the weight of each view and cluster center. After finite iterations, it arrives at the final clustering result.
By replacing the weighted Gaussian kernel \(K(x,y)=\phi (x)\cdot \phi (y)\), the above formula can be rewritten as
The above formula inherits the properties of K-means and kernel, while the weighted Gaussian kernel integrates the advantages of the Gaussian kernel and Polynomial kernel.
3.2 Multi-view Graph Clustering
The objective of multi-view graph clustering is to find a fusion graph across all views and then use other methods such as spectral clustering on the fusion graph to obtain the final clustering result. Here, we introduce a parameter-free Auto-weighted Multiple Graph Learning (AMGL) method [37], which implements the automatic allocation of weights by modifying the traditional spectral clustering model without any hyperparameters.
In spectral clustering, based on the Laplacian matrix L, the objective function can be defined as follows.
\[ \min _{F^\textrm{T} F = I} \textrm{Tr}(F^\textrm{T} L F) \]
Based on the above formula and replacing L with \(L^v\), i.e., the Laplacian matrix in each view v, a new general framework for multiple graph learning is proposed:
\[ \min _{F^\textrm{T} F = I} \sum _{v} \sqrt{\textrm{Tr}(F^\textrm{T} L^v F)}, \]
where no weight factors are explicitly defined. By taking the derivative of the Lagrange function of the above formula, the weight of each view \(\alpha ^v\) can be obtained based on F as
\[ \alpha ^v = \frac{1}{2 \sqrt{\textrm{Tr}(F^\textrm{T} L^v F)}}. \tag{4.7} \]
Then, the objective function of the AMGL method is set as
\[ \min _{F^\textrm{T} F = I} \sum _{v} \alpha ^v \, \textrm{Tr}(F^\textrm{T} L^v F). \]
In the above formula, \(\alpha ^v\) depends on F through Eq. (4.7), which motivates an alternating optimization strategy that computes F and \(\alpha ^v\) iteratively.
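The alternating strategy can be sketched as follows, assuming the weight update \(\alpha ^v = 1/(2\sqrt{\textrm{Tr}(F^\textrm{T} L^v F)})\) and an F-step that takes the eigenvectors of the weighted sum of Laplacians for its k smallest eigenvalues. The function names, the fixed iteration count, the small constant `eps` guarding the square root, and the two toy views are assumptions of this example.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def amgl(Ws, k, n_iter=20, eps=1e-12):
    """Alternate between the embedding F and the view weights alpha."""
    Ls = [laplacian(W) for W in Ws]
    alpha = np.ones(len(Ls)) / len(Ls)               # uniform start
    for _ in range(n_iter):
        # F-step: spectral embedding of the weighted fusion of Laplacians.
        L_fused = sum(a * L for a, L in zip(alpha, Ls))
        _, V = np.linalg.eigh(L_fused)               # ascending eigenvalues
        F = V[:, :k]
        # alpha-step: closed-form weights computed from the current F.
        alpha = np.array(
            [0.5 / np.sqrt(np.trace(F.T @ L @ F) + eps) for L in Ls]
        )
    return F, alpha

def two_triangles(bridge):
    """Hypothetical view: two triangles joined by an edge of weight bridge."""
    W = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
        W[i, j] = W[j, i] = 1.0
    W[2, 3] = W[3, 2] = bridge
    return W

Ws = [two_triangles(0.1), two_triangles(0.5)]
F, alpha = amgl(Ws, k=2)
```

The rows of `F` would then be clustered, for example with k-means, to obtain the final result.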
3.3 Multi-view Subspace Clustering
Multi-view subspace clustering aims to learn a unified representation or a latent space from the data of all views. Then, the unified representation is fed into an off-the-shelf clustering model to obtain the clustering results. Here, we introduce a non-negative matrix factorization (NMF)-based method.
NMF aims to find two non-negative matrices \(U \in \mathbb {R}^{n\times p}\) and \(V \in \mathbb {R}^{p\times m}\) that adequately approximate the original matrix \(X \in \mathbb {R}^{n\times m}\). The reconstruction process can be formulated as the following optimization problem.
\[ \min _{U \ge 0, \, V \ge 0} \Vert X - UV \Vert _F^2 \]
Here, U is termed as the basis matrix, while V is the indicator matrix, and p denotes the desired reduced dimension. Due to the non-negative constraints, it can learn a part-based representation.
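A single-view NMF can be sketched with the multiplicative update rules of Lee and Seung. The text does not specify a solver, so this choice is an assumption, as are the function name `nmf`, the random initialization, the constant `eps` that avoids division by zero, and the toy matrix.

```python
import numpy as np

def nmf(X, p, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (n x m, non-negative) by U V with U, V >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, p)) + eps
    V = rng.random((p, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        errors.append(np.linalg.norm(X - U @ V) ** 2)
    return U, V, errors

# Hypothetical non-negative data of rank 2.
rng = np.random.default_rng(1)
X = rng.random((8, 2)) @ rng.random((2, 6))
U, V, errors = nmf(X, p=2)
```

The multi-view variant discussed next adds a term that pulls each view-specific factor toward a common consensus matrix; that extension is not implemented here.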
For multi-view data, the objective is to combine multi-view information in the NMF framework. Generally, a common indicator matrix \(V^*\) is enforced in the NMF among different views to perform multi-view clustering. One of the widely used methods is to push each view-dependent indicator matrix \(V^v\) toward a common indicator matrix \(V^*\) [29]. The optimization problem is formulated as
The constraint \(\Vert U^{v}_{.,k}\Vert _1=1\) is to guarantee that \(V^{v}\) is within the same range for different views. After obtaining the consensus matrix \(V^*\), the cluster label of the data point i can be computed as \(\arg \max _k V^*_{i,k}\).
4 Data Collaboration Analysis
In various real-world applications, there is a growing demand for integrated data analysis, in which datasets owned by multiple parties in a distributed manner are analyzed collaboratively. For example, in medical data analysis for rare diseases, it was reported that when the analysis is conducted using only data from a single institution, the accuracy is insufficient because of the small sample size [30]. However, sharing the original medical data is difficult because of privacy concerns, and even if it could be achieved, cross-institutional and cross-border communication would incur huge costs. Therefore, methods for privacy-preserving analysis, in which datasets are collaboratively analyzed without sharing the original data, are attracting attention.
In this section, we briefly introduce typical privacy-preserving methods and describe a data collaboration analysis.
4.1 Privacy-Preserving Integrated Data Analysis
A compelling application scenario for privacy-preserving integrated data analysis is medical data analysis involving multiple institutions. Consider a scenario where there are several municipalities, each with its own set of patients or data samples. Within each municipality, patients receive medical examinations or treatments at multiple medical institutions. There are a variety of medical data (i.e., features), such as red blood cell count and white blood cell count, assigned to each patient and the type of data varies by institution. Therefore, medical data are partitioned by samples into multiple municipalities (i.e., horizontal data partitioning) and by features into multiple medical institutions (i.e., vertical data partitioning).
If the analysis is conducted using only data in a single institution (local analysis), the accuracy may be insufficient because of the small sample size and limited features. By integrating data from multiple medical institutions in multiple municipalities (centralized analysis), one can achieve a highly accurate analysis; however, data sharing is difficult because of data confidentiality requirements. Thus, privacy-preserving integrated data analysis for horizontally and vertically partitioned data is essential.
Cryptographic computation is one of the most well-known methods used for ensuring privacy preservation [4, 11, 23]. Cryptographic methods can compute a function over distributed data while retaining the privacy of the data. Any given function can be computed by applying fully homomorphic encryption [10]. However, this method is not feasible for a machine learning model construction of large datasets because of the large computational cost even with the latest implementations [3, 50].
In the context of model construction, a typical technology for privacy-preserving integrated data analysis is federated learning. The concept of federated learning systems was first proposed by Google [25], typically for Android phone model updates [31]. Federated learning is primarily based on (deep) neural networks and updates the model iteratively.
To update the model, federated stochastic gradient descent (FedSGD) and federated averaging (FedAvg) are typical strategies [31]. FedSGD is a direct extension of the stochastic gradient descent method. In each iteration of the gradient descent method, each party locally computes a gradient from the shared model using its local dataset and sends it to the server. The shared gradients are averaged and used to update the model. In contrast, in FedAvg, each party performs more than one batch update using its local dataset and sends the updated model to the server. Then, the shared models are averaged to update the model. Federated learning, including more recent methods such as FedProx [27] and FedCodl [36], requires cross-institutional communication in each iteration. For more details, we refer to [26, 46] and references therein.
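The FedAvg loop described above can be sketched on a toy linear regression problem. This is a simplified illustration and not the implementation of [31]: it uses a linear model instead of a neural network, full-batch gradient steps instead of mini-batches, and weighting of the local models by local sample size. All names, data, and hyperparameters are hypothetical.

```python
import numpy as np

def local_update(w, X, y, lr, n_steps):
    """Several local gradient steps on the squared loss, from the shared w."""
    w = w.copy()
    for _ in range(n_steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(parties, dim, rounds=50, lr=0.1, n_steps=5):
    """Each round: parties update locally, the server averages the models."""
    w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in parties], dtype=float)
    for _ in range(rounds):                  # one communication per round
        local_models = [local_update(w, X, y, lr, n_steps) for X, y in parties]
        w = np.average(local_models, axis=0, weights=sizes)
    return w

# Hypothetical horizontally partitioned data with a common linear model.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
parties = []
for n_i in (40, 60):
    X_i = rng.standard_normal((n_i, 3))
    parties.append((X_i, X_i @ w_true))

w_fed = fedavg(parties, dim=3)
```

Setting `n_steps=1` and averaging gradients rather than models would give a full-batch analogue of FedSGD. In both cases the parties communicate with the server in every round.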
Recently, a non-model share-type federated learning approach called data collaboration (DC) analysis has been proposed [18, 21]. Instead of sharing models as in the federated learning described above, the DC analysis centralizes dimensionality-reduced intermediate representations. The centralized intermediate representations are transformed into incorporable forms called collaboration representations. Then, the collaboration representations are analyzed as a single dataset. The DC analysis does not require iterative cross-institutional communications.
The DC analysis has been extended to interpretable model construction [15], novelty detection [22], feature selection [49] and survival analysis [20]. Privacy and accuracy of DC analysis were analyzed in [13]. In addition, the identifiability of the shared intermediate representation, which is essential for analyzing personal information, was analyzed in [19]. The paper [19] proposed a non-readily identifiable DC analysis that realizes privacy-preserving analysis by sharing only non-readily identifiable intermediate representations.
4.2 Data Collaboration Analysis
Let m and n denote the numbers of features (dimensionality of each data object) and training data samples. Let \(X = [{\boldsymbol{x}}_{1}, {\boldsymbol{x}}_{2}, \dots , {\boldsymbol{x}}_{n}]^\textrm{T} \in \mathbb {R}^{n \times m}\) and \(Y = [{\boldsymbol{y}}_1, {\boldsymbol{y}}_2, \dots , {\boldsymbol{y}}_n]^\textrm{T} \in \mathbb {R}^{n \times \ell }\) be the training dataset and the corresponding ground truth or labels. Here, for privacy-preserving integrated data analysis on multiple parties, we introduce the algorithm of the DC analysis for supervised learning of horizontally partitioned data, that is, data samples are partitioned into c parties as follows:
\[ X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_c \end{bmatrix}, \quad Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_c \end{bmatrix}. \]
Then, the ith party has a partial dataset and the corresponding ground truth,
\[ X_i \in \mathbb {R}^{n_i \times m}, \quad Y_i \in \mathbb {R}^{n_i \times \ell }, \]
where \(n = \sum _{i=1}^c n_i\). Note that DC analysis is also applicable to datasets with partially common features [32] and to horizontally and vertically partitioned data [21].
DC analysis operates in two roles: worker and master. Workers have the private dataset \(X_{i}\) and corresponding ground truth \(Y_i\), which must be analyzed without sharing \(X_{i}\). Master supports the collaboration analysis.
First, all workers generate the same anchor data \(X^\textrm{anc} \in \mathbb {R}^{r \times m}\), which is shareable data consisting of public data or randomly constructed dummy data. Although random anchor data function well for DC analysis in general [18, 21, 22], using anchor data whose distribution is close to that of the raw dataset can improve the recognition performance [16]. Then, each worker constructs intermediate representations,
\[ \widetilde{X}_i = f_i(X_i) \in \mathbb {R}^{n_i \times \widetilde{m}_i}, \quad \widetilde{X}_i^\textrm{anc} = f_i(X^\textrm{anc}) \in \mathbb {R}^{r \times \widetilde{m}_i}, \]
where \(\widetilde{m}_{i} < m\), and centralizes them to the master. Here, we can use unsupervised and supervised dimensionality reduction methods, e.g., those in Sect. 4.2.
At the master side, the mapping function \(g_i\) for the collaboration representation is constructed so that \(g_i(\widetilde{X}_i^\textrm{anc}) \approx g_{i'}(\widetilde{X}_{i'}^\textrm{anc})\) \((i, i' = 1, 2, \dots , c)\) in some sense. In practice, \(g_i\) is set as a linear function \(g_i(\widetilde{X}_i^\textrm{anc}) = \widetilde{X}_i^\textrm{anc} G_i\) with \(G_i \in \mathbb {R}^{\widetilde{m}_i \times \widehat{m}}\) and is constructed using the following minimal perturbation problem:
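This can be solved by a singular value decomposition (SVD)-based algorithm for total least squares problems. Let
\[ [\widetilde{X}_1^\textrm{anc}, \widetilde{X}_2^\textrm{anc}, \dots , \widetilde{X}_c^\textrm{anc}] \approx U_{\widehat{m}} \Sigma _{\widehat{m}} V_{\widehat{m}}^\textrm{T} \]
be the rank \(\widehat{m}\) approximation based on SVD. Then, the target matrix \(G_i\) is obtained as follows:
\[ G_i = (\widetilde{X}_i^\textrm{anc})^\dagger U_{\widehat{m}} C, \]
where \(\dagger \) denotes the Moore–Penrose inverse and \(C \in \mathbb {R}^{\widehat{m} \times \widehat{m}}\) is a nonsingular matrix; for example, \(C=I_{\widehat{m}}\) and \(C=\Sigma _{\widehat{m}}\) are used in practice. The collaboration representations are analyzed as a single dataset, that is,
\[ \widehat{X} = \begin{bmatrix} \widehat{X}_1 \\ \widehat{X}_2 \\ \vdots \\ \widehat{X}_c \end{bmatrix}, \quad \widehat{X}_i = g_i(\widetilde{X}_i) = \widetilde{X}_i G_i, \]
with the shared ground truth \(Y_i\), using a supervised machine learning or deep learning method to construct the model function h of the collaboration representation \(\widehat{X}\). Functions \(g_i\) and h are returned to the ith worker.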
Let \(X_i^\textrm{test} \in \mathbb {R}^{s_i \times m}\) be a test dataset of the ith party. For the prediction phase, the prediction result \(Y_i^\textrm{pred}\) of \(X_i^\textrm{test}\) is obtained by the following equation:
\[ Y_i^\textrm{pred} = h(g_i(f_i(X_i^\textrm{test}))) \]
through the intermediate and collaboration representations.
![An algorithm for data collaboration analysis has input, output, and four steps. Input reads X i belongs to script R n i times m, Y i belongs to script R n i times l, and X i test individually.](http://media.springernature.com/lw685/springer-static/image/chp%3A10.1007%2F978-981-99-9772-5_4/MediaObjects/538215_1_En_4_Figc_HTML.png)
The algorithm of the DC analysis is summarized in Algorithm 3, where \(g_i\) is set by (4.9). The DC analysis requires only three cross-institutional communications (Steps 1, 4 and 9 in Algorithm 3), which is a major advantage over federated learning.
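The overall flow of DC analysis can be sketched end to end with NumPy. This is a toy illustration under strong assumptions and not Algorithm 3 itself: the private maps \(f_i\) are random linear projections, \(C = I_{\widehat{m}}\), the model h is plain least squares, and the data are constructed to lie in a two-dimensional subspace so that the alignment of the anchor representations is exact. In general the alignment is only approximate. All variable names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, q, m_tilde, r = 6, 2, 3, 30          # features, latent rank, reduced dim, anchors
basis = rng.standard_normal((q, m))     # toy data lie in a q-dimensional subspace
w_true = rng.standard_normal(m)

def sample(n):
    """Draw n toy samples and their (noise-free, linear) targets."""
    X = rng.standard_normal((n, q)) @ basis
    return X, X @ w_true

# Worker side: private data, shared anchor data, private maps f_i(X) = X F_i.
parties = [sample(20), sample(25)]
X_anc = rng.standard_normal((r, q)) @ basis
Fs = [rng.standard_normal((m, m_tilde)) for _ in parties]
inter = [X @ F for (X, _), F in zip(parties, Fs)]        # shared with master
inter_anc = [X_anc @ F for F in Fs]                      # shared with master

# Master side: rank-m_hat SVD of the concatenated anchor representations.
m_hat = q
U, _, _ = np.linalg.svd(np.hstack(inter_anc), full_matrices=False)
U_hat = U[:, :m_hat]
# G_i = pinv(anchor representation) U_hat C with C = I, via truncated lstsq.
Gs = [np.linalg.lstsq(A, U_hat, rcond=1e-10)[0] for A in inter_anc]

# Collaboration representation analyzed as a single dataset; h by least squares.
X_hat = np.vstack([Xt @ G for Xt, G in zip(inter, Gs)])
Y = np.concatenate([y for _, y in parties])
h = np.linalg.lstsq(X_hat, Y, rcond=None)[0]

# Prediction at worker 0 through f_0, g_0 and h.
X_test, y_test = sample(5)
y_pred = X_test @ Fs[0] @ Gs[0] @ h
```

Only `inter`, `inter_anc` and the labels leave the workers in this sketch; the raw data and the maps `Fs` stay local, which mirrors the communication pattern described above.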
As in [13], the DC analysis has the following double privacy layer for the protection of private data \(X_{i}\):
- No one can infer the private data \(X_{i}\) under the protocol;
- Even if \(f_{i}\) is stolen, the private data \(X_{i}\) is still protected regarding \(\varepsilon \)-DR privacy [35].
Under the protocol (Algorithm 3), the function \(f_i\) for the intermediate representation is private and cannot be inferred by others, because no other party possesses both the input and the output of \(f_i\). Therefore, it is impossible to infer the original data \(X_i\) only from the shared intermediate representation \(\widetilde{X}_i = f_i(X_i)\). In addition, the function \(f_i\) is set to a dimensionality reduction function such that \(\widetilde{m}_i < m\). Therefore, it is impossible to obtain the original data \(X_i\) exactly from \(\widetilde{X}_i = f_i(X_i)\) even when using \(f_i\).
References
M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv. Neural Inf. Proc. Syst. 14, 585–591 (2001)
J. Chen, H.-R. Fang, Y. Saad, Fast approximate KNN graph construction for high dimensional data via recursive Lanczos bisection. J. Mach. Learn. Res. 10, 1989–2012 (2009)
I. Chillotti, N. Gama, M. Georgieva, M. Izabachene, Faster fully homomorphic encryption: bootstrapping in less than 0.1 seconds, in International Conference on the Theory and Application of Cryptology and Information Security (Springer, 2016), pp. 3–33
H. Cho, D.J. Wu, B. Berger, Secure genome-wide association analysis using multiparty computation. Nat. Biotechnol. 36(6), 547–551 (2018)
W. Dong, C. Moses, K. Li, Efficient k-nearest neighbor graph construction for generic similarity measures, in Proceedings of the 20th International Conference on World Wide Web (2011), pp. 577–586. https://doi.org/10.1145/1963405.1963487
L. Eldén, H. Park, A procrustes problem on the Stiefel manifold. Numer. Math. 82(4), 599–619 (1999)
R.A. Fisher, The use of multiple measurements in taxonomic problems. Ann. Hum. Genet. 7(2), 179–188 (1936)
K. Fukunaga, Introduction to Statistical Pattern Recognition (Academic, 2013)
Y. Futamura, X. Ye, A. Imakura, T. Sakurai, Spectral anomaly detection in large graphs using a complex moment-based eigenvalue solver. ASCE-ASME J. Risk Uncertain. Eng. Syst. A 6(2) (2020). https://doi.org/10.1061/ajrua6.0001054
C. Gentry, Fully homomorphic encryption using ideal lattices, in Proceedings of the 41st annual ACM Symposium on Theory of Computing (2009), pp. 169–178
R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, J. Wernsing, Cryptonets: applying neural networks to encrypted data with high throughput and accuracy, in Proceedings of the 33rd International Conference on Machine Learning (2016), pp. 201–210
X. He, P. Niyogi, Locality preserving projections. Adv. Neural Inf. Proc. Syst. 16, 153–160 (2004)
A. Imakura, A. Bogdanova, T. Yamazoe, K. Omote, T. Sakurai, Accuracy and privacy evaluations of collaborative data analysis, in Proceedings of the 2nd AAAI Workshop on Privacy-Preserving Artificial Intelligence (2021)
A. Imakura, L. Du, T. Sakurai, Relationships among contour integral-based methods for solving generalized eigenvalue problems. Jpn. J. Ind. Appl. Math. 33(3), 721–750 (2016)
A. Imakura, H. Inaba, Y. Okada, T. Sakurai, Interpretable collaborative data analysis on distributed data. Expert Syst. Appl. 177, 114891 (2021)
A. Imakura, M. Kihira, Y. Okada, T. Sakurai, Another use of SMOTE for interpretable data collaboration analysis. Expert Syst. Appl. 228, 120385 (2023)
A. Imakura, M. Matsuda, X. Ye, T. Sakurai, Complex moment-based supervised eigenmap for dimensionality reduction, in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (2019), pp. 3910–3918
A. Imakura, T. Sakurai, Data collaboration analysis framework using centralization of individual intermediate representations for distributed data sets. ASCE-ASME J. Risk Uncertain. Eng. Syst. A 6, 04020018 (2020)
A. Imakura, T. Sakurai, Y. Okada, T. Fujii, T. Sakamoto, H. Abe, Non-readily identifiable data collaboration analysis for multiple datasets including personal information. Inf. Fusion 98, 101826 (2023)
A. Imakura, R. Tsunoda, R. Kagawa, K. Yamagata, T. Sakurai, DC-COX: data collaboration Cox proportional hazards model for privacy-preserving survival analysis on multiple parties. J. Biomed. Inf. 137, 104264 (2023)
A. Imakura, X. Ye, T. Sakurai, Collaborative data analysis: non-model sharing-type machine learning for distributed data, in Knowledge Management and Acquisition for Intelligent Systems (2021), pp. 14–29
A. Imakura, X. Ye, T. Sakurai, Collaborative novelty detection for distributed data by a probabilistic method, in Asian Conference on Machine Learning (2021), pp. 932–947
S. Jha, L. Kruger, P. McDaniel, Privacy preserving clustering, in European Symposium on Research in Computer Security (Springer, 2005), pp. 397–417
I.T. Jolliffe, Principal component analysis and factor analysis, in Principal component analysis (Springer, 1986), pp. 115–128
J. Konečný, H.B. McMahan, F.X. Yu, P. Richtárik, A.T. Suresh, D. Bacon, Federated learning: Strategies for improving communication efficiency, in NIPS Workshop on Private Multi-Party Machine Learning (2016)
Q. Li, Z. Wen, Z. Wu, S. Hu, N. Wang, B. He, A survey on federated learning systems: vision, hype and reality for data privacy and protection (2019). arXiv:1907.09693
T. Li, A.K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, V. Smith, Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2, 429–450 (2020)
X. Li, M. Chen, F. Nie, Q. Wang, Locality adaptive discriminant analysis, in Proceedings of the 26th International Joint Conference on Artificial Intelligence (AAAI Press, 2017), pp. 2201–2207
J. Liu, C. Wang, J. Gao, J. Han, Multi-view clustering via joint nonnegative matrix factorization, in Proceedings of the 2013 SIAM International Conference on Data Mining (2013), pp. 252–260
D. Mascalzoni, A. Paradiso, M. Hansson, Rare disease research: breaking the privacy barrier. Appl. Transl. Genom. 3(2), 23–29 (2014)
H.B. McMahan, E. Moore, D. Ramage, S. Hampson, et al., Communication-efficient learning of deep networks from decentralized data (2016). arXiv:1602.05629
A. Mizoguchi, A. Imakura, T. Sakurai, Application of data collaboration analysis to distributed data with misaligned features. Inf. Med. Unlocked 32, 101013 (2022)
M.E.J. Newman, Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (2006). https://doi.org/10.1103/PhysRevE.74.036104
A. Ng, M. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Proc. Syst. 14, 849–856 (2001)
H. Nguyen, D. Zhuang, P.-Y. Wu, M. Chang, Autogan-based dimension reduction for privacy preservation. Neurocomputing 384, 94–103 (2020)
X. Ni, X. Shen, H. Zhao, Federated optimization via knowledge codistillation. Expert Syst. Appl. 191, 116310 (2022)
F. Nie, J. Li, X. Li, et al., Parameter-free auto-weighted multiple graph learning: a framework for multiview clustering and semi-supervised classification, in Proceedings of the 25th International Joint Conference on Artificial Intelligence (2016), pp. 1881–1887
H. Park, A parallel algorithm for the unbalanced orthogonal procrustes problem. Parallel Comput. 17(8), 913–923 (1991)
K. Pearson, LIII. On lines and planes of closest fit to systems of points in space. London, Edinburgh Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901)
T. Sakurai, Y. Futamura, A. Imakura, T. Imamura, Scalable eigen-analysis engine for large-scale eigenvalue problems, in Advanced Software Technologies for Post-Peta Scale Computing: The Japanese Post-Peta CREST Research Project (Springer, 2019), pp. 37–57
T. Sakurai, H. Sugiura, A projection method for generalized eigenvalue problems using numerical integration. J. Comput. Appl. Math. 159(1), 119–128 (2003)
B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10(5), 1299–1319 (1998)
M. Sugiyama, Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis. J. Mach. Learn. Res. 8, 1027–1061 (2007)
M. Sugiyama, T. Idé, S. Nakajima, J. Sese, Semi-supervised local Fisher discriminant analysis for dimensionality reduction. Mach. Learn. 78, 35–61 (2010)
U. von Luxburg, A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
Q. Yang, Y. Liu, T. Chen, Y. Tong, Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10(2), Article 12 (2019)
Y. Yang, H. Wang, Multi-view clustering: a survey. Big Data Min. Anal. 1(2), 83–107 (2018)
T. Yano, Y. Futamura, A. Imakura, T. Sakurai, Efficient implementation of a dimensionality reduction method using a complex moment-based subspace, in The International Conference on High Performance Computing in Asia-Pacific Region (2021), pp. 83–89
X. Ye, H. Li, A. Imakura, T. Sakurai, Distributed collaborative feature selection based on intermediate representation, in Proceedings of the 28th International Joint Conference on Artificial Intelligence (2019), pp. 4142–4149
J. Zalonis, F. Armknecht, B. Grohmann, M. Koch, Report: state of the art solutions for privacy preserving machine learning in the medical context (2022). arXiv:2201.11406
P. Zhang, Y. Yang, B. Peng, M. He, Multi-view clustering algorithm based on variable weight and MKL, in Proceedings of the International Joint Conference on Rough Sets (2017), pp. 599–610
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
About this chapter
Cite this chapter
Sakurai, T., Futamura, Y., Imakura, A., Ye, X. (2024). Numerical Analysis for Data Relationship. In: Ikeda, K., et al. Advanced Mathematical Science for Mobility Society. Springer, Singapore. https://doi.org/10.1007/978-981-99-9772-5_4
DOI: https://doi.org/10.1007/978-981-99-9772-5_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9771-8
Online ISBN: 978-981-99-9772-5
eBook Packages: Computer Science (R0)