Graph refining via iterative regularization framework
Abstract
Graph-based methods have been widely applied to clustering problems. The mainstream pipeline for these methods is to build an affinity matrix first and then use a spectral clustering method to construct a graph. Existing studies of this pipeline mainly focus on how to build a good affinity matrix, while the spectral method is treated only as a final step that achieves the clustering task. However, the quality of the constructed graph has a significant influence on the clustering results. Unlike most existing works, this paper focuses on how to refine the original graph into a good graph, given the number of clusters. We show that the spectral clustering method has a block-structure-preserving property when the number of clusters is given as prior knowledge. Based on this property, we provide an iterative regularization framework to refine the original graph. The regularization framework is built on a well-designed reproducing kernel Hilbert space for vector-valued functions (RKHSvv), which facilitates kernel tricks on graph reconstruction. The elements of the RKHSvv are multiple-output affinity functions. We show that finding an optimal multiple-output function is equivalent to constructing a graph, and that the affinity matrix of such a graph can be obtained as the product of a kernel matrix and an unknown coefficient matrix.
Keywords
Graph-based · Affinity matrix · Spectral clustering method · Regularization framework

1 Introduction

We provide a graph reconstruction strategy that can refine the original graph, given the number of clusters. This strategy is based on the block-structure-preserving property of spectral clustering.

We formulate an iterative regularization framework to implement the graph refining strategy, and design a new reproducing kernel Hilbert space of vector-valued functions as the hypothesis space for this framework. To the best of our knowledge, this paper is the first to introduce RKHSvv to the study of clustering problems.

We provide an effective graph refining model based on the iterative regularization framework.
2 Related work
As an extensive review of graph clustering and regularization frameworks is beyond the scope of this paper, we review only the work related to our approach, including graph-based clustering and the regularization framework; reproducing kernel theory is reviewed as well.
Graph clustering partitions the vertices of a graph while taking its edge structure into account: each cluster should contain many internal edges, and relatively few edges should connect different clusters [19, 21]. Generally speaking, given a dataset, the goal of clustering is to divide the dataset into multiple categories, so that the elements assigned to a particular class are similar or connected in some predefined sense. However, not all graphs have a natural cluster structure. If the structure of the graph is completely uniform, with the edges distributed evenly over the vertex set, the clustering produced by any algorithm is arbitrary [26].
Regularization has become a main theme of machine learning and statistics. It provides an intuitive and principled tool for learning from high-dimensional data. The consistency between practical algorithms and the general theory has been studied in depth by means of Euclidean norms or regularized Hilbertian norms [27, 28, 29]. Owing to their advantages in dealing with nonlinear problems, kernel methods have been widely used in the literature [30, 31, 32].
Reproducing kernel theory has significant applications in integral equations, differential equations, probability, and statistics [33]. In recent years, this theory has been applied to various model problems by many authors [34, 35, 36]. The simplest and most practical approach to multi-task learning is regularized multi-task learning, in which the solutions to related tasks are encouraged to be close to each other. Due to its general and simple formulation, regularized multi-task learning has been applied to various types of learning problems, such as regression and classification [37]. Some works [38, 39, 40] generalized the RKHS of scalar-valued functions to the vector-valued case to deal with multi-task learning [41].
Our work can be regarded as an extension of this generalization to the clustering problem. Nie et al. [42] provided methods to reconstruct a graph from the original graph, which is related to ours in the problem it considers. Compared with [42], our work in this paper can be considered a more general formulation, of which [42] is a special case.
3 The proposed approach
In this section, we first introduce the block-structure-preserving property and then propose the iterative regularization framework for graph refining based on the reproducing kernel Hilbert space for vector-valued functions, followed by the method and optimization with which the block structure can enhance the model.
3.1 Block structure preserving
We have the following lemma about block diagonalization:
Lemma 1
[43] If \(\mathbf {A} \in \mathbb {R}^{N \times N}\) and the spectrum of \(\mathbf {A}\) is \(\sigma (\mathbf {A}) = \{\mu _1,\mu _2, \dots , \mu _s\}\), then there exist a nonsingular matrix \(\mathbf {P}\) and a set of matrices \((\mathbf {A}_1, \mathbf {A}_2, \dots , \mathbf {A}_s)\) such that \(\mathbf {P}^{-1} \mathbf {A} \mathbf {P} = diag(\mathbf {A}_1, \mathbf {A}_2, \dots , \mathbf {A}_s)\) and the spectrum of \(\mathbf {A}_i\) is \(\sigma (\mathbf {A}_i) = \{\mu _i\}\). That is, \(\mathbf {P}^{-1} \mathbf {A} \mathbf {P} = diag(\mu _1 \mathbf {v}_1 \mathbf {v}_1^T, \mu _2 \mathbf {v}_2 \mathbf {v}_2^T, \dots , \mu _s \mathbf {v}_s \mathbf {v}_s^T)\), where \(\mu _i \mathbf {v}_i \mathbf {v}_i^T = \mathbf {A}_i\).
From Lemma 1, we easily obtain the following theorem about \(\mathbf {B}\).
Theorem 1
If \(\mathbf {B} = \mathbf {Y} \mathbf {Y}^T \in \mathbb {R}^{N \times N}\) with \(\mathbf {Y} \in \mathbb {R}^{N \times C}\) and \(\mathbf {Y}^T\mathbf {Y} = \mathbf {I}\), then there exist a nonsingular matrix \(\mathbf {P}\) and a set of unit vectors \((\mathbf {v}_1, \mathbf {v}_2, \dots , \mathbf {v}_C)\) that satisfy \(\mathbf {P}^{-1} \mathbf {B} \mathbf {P} = diag( \mathbf {v}_1 \mathbf {v}_1^T, \mathbf {v}_2 \mathbf {v}_2^T, \dots , \mathbf {v}_C \mathbf {v}_C^T)\).
Proof
Since \(rank(\mathbf {B}) = rank(\mathbf {Y} \mathbf {Y}^{T})=C\), we have \(\sigma (\mathbf {B}) = \{\mu _1,\mu _2, \dots , \mu _C\}\), where \(\mu _1 = \mu _2 = \cdots = \mu _C = 1\). From Lemma 1, there exist a nonsingular matrix \(\mathbf {P}\) and a set of unit vectors \((\mathbf {v}_1, \mathbf {v}_2, \dots , \mathbf {v}_C)\) such that \(\mathbf {P}^{-1} \mathbf {B} \mathbf {P} = diag( \mathbf {v}_1 \mathbf {v}_1^T, \mathbf {v}_2 \mathbf {v}_2^T, \dots , \mathbf {v}_C \mathbf {v}_C^T)\). \(\square\)
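The spectral claim in the proof can be checked numerically: for any \(\mathbf {Y}\) with orthonormal columns, \(\mathbf {B} = \mathbf {Y}\mathbf {Y}^T\) is a rank-C orthogonal projection whose nonzero eigenvalues all equal 1. A minimal sketch (the matrix sizes N = 8, C = 3 are illustrative):

```python
import numpy as np

# Random N x C matrix with orthonormal columns (via reduced QR), N = 8, C = 3
rng = np.random.default_rng(0)
Y, _ = np.linalg.qr(rng.standard_normal((8, 3)))

B = Y @ Y.T  # B = Y Y^T, an orthogonal projection of rank C

# The spectrum is {1 (C times), 0 (N - C times)}, as used in the proof
eigvals = np.sort(np.linalg.eigvalsh(B))[::-1]
```

Since \(\mathbf {B}\) is a projection, it also satisfies \(\mathbf {B}^2 = \mathbf {B}\), which is another quick way to see that its eigenvalues can only be 0 or 1.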
We also have the following theorem about \(\mathbf {B}\) when \(\mathbf {A}\) is block diagonal.
Theorem 2
Given a block diagonal affinity matrix \(\mathbf {A}\) with C blocks, denote its Laplacian matrix by \(\mathbf {L}_{\mathbf {A}}\). If \(\mathbf {Y}^*\) is the optimal solution of Eq. (1) with Laplacian matrix \(\mathbf {L}_{\mathbf {A}}\), then the affinity matrix \(\mathbf {B} = \mathbf {Y}^* \mathbf {Y}^{*T}\) is also block diagonal, and its block structure is the same as that of \(\mathbf {A}\) (see Fig. 2).
Proof
Theorem 2 reveals that spectral clustering has a Block Structure Preserving property [44]: given an original affinity matrix \(\mathbf {A}\) that is block diagonal, the affinity matrix constructed by performing spectral clustering on \(\mathbf {L}_{\mathbf {A}}\) is also block diagonal and has the same block structure as the original affinity matrix \(\mathbf {A}\) (shown in Fig. 2).
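The property can be illustrated with a small numerical example: build a block diagonal affinity matrix, take the C eigenvectors of its unnormalized Laplacian with smallest eigenvalues as \(\mathbf {Y}^*\), and observe that \(\mathbf {B} = \mathbf {Y}^*\mathbf {Y}^{*T}\) has the same block structure (a sketch; block sizes 3 and 4 and the all-ones affinities are illustrative):

```python
import numpy as np

# Block diagonal affinity with C = 2 blocks (sizes 3 and 4)
A = np.zeros((7, 7))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

# Unnormalized graph Laplacian L_A = D - A
L = np.diag(A.sum(axis=1)) - A

# Spectral step: Y* spans the eigenvectors of the C smallest eigenvalues
eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalue order
Y = eigvecs[:, :2]

B = Y @ Y.T
# B is block diagonal with the same 3/4 structure as A:
# cross-block entries are (numerically) zero, within-block entries are not
```

Here the two smallest eigenvalues are both zero, one per connected component, so \(\mathbf {B}\) is the projection onto the span of the two block indicator vectors, which is itself block diagonal.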
3.2 The iterative regularization framework for graph refining
The second term of the objective function minimizes the difference between \(\mathbf {Z}\) and \(\mathbf {F}\). If \(\mathbf {Z} = \mathbf {Y}^{*}\mathbf {Y}^{*T}\), then from the analysis in Sect. 3.1 we know that the graph of \(\mathbf {Z}\) always has C clusters. The minimization in the second term thus preserves the C-cluster structure for the graph of \(\mathbf {F}\). The remaining two constraints on f ensure that the learned function is nonnegative and normalized.
Note that our regularization framework can be put into an iterative optimization procedure in which Eqs. (3) and (4) are solved alternately. We give this optimization procedure in Algorithm 1; it preserves the C-cluster structure for the graph of \(\mathbf {F}\).
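The alternation can be sketched as follows. This is a minimal illustration, not the paper's actual update rules: the refining step here is a hypothetical convex blend of the original graph \(\mathbf {A}\) and the C-cluster graph \(\mathbf {Z}\), standing in for the loss-dependent solution of Eq. (3):

```python
import numpy as np

def refine_graph(A, C, lam=0.5, T=15):
    """Alternate a spectral step and a (simplified) graph-refining step.

    A   : initial affinity matrix (symmetric, nonnegative)
    C   : given number of clusters
    lam : weight of the block-structure term (illustrative blend)
    T   : number of outer iterations
    """
    F = A.astype(float).copy()
    for _ in range(T):
        # Spectral step: Z = Y* Y*^T from the C smallest Laplacian eigenvectors
        S = (F + F.T) / 2.0                  # symmetrize before the spectral step
        L = np.diag(S.sum(axis=1)) - S       # unnormalized Laplacian
        _, V = np.linalg.eigh(L)             # eigenvalues in ascending order
        Z = V[:, :C] @ V[:, :C].T
        # Refining step (stand-in for Eq. (3)): pull F toward both the
        # original graph A and the C-cluster graph Z
        F = (A + lam * Z) / (1.0 + lam)
        F = np.clip(F, 0.0, None)                # nonnegativity constraint
        F = F / F.sum(axis=1, keepdims=True)     # row normalization
    return F
```

On a block diagonal input the iterate stays block diagonal, nonnegative, and row-normalized, which mirrors the constraints described above.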
3.3 Hypothesis space of RKHSvv
In general cases, the regularization framework is considered in a setting of binary classification or regression, in which the hypothesis space is defined as a Hilbert space of scalar-valued functions. By defining a reproducing kernel [45] on such a space, a reproducing kernel Hilbert space (RKHS) [33] is obtained. Although treating the hypothesis space as a general RKHS has yielded numerous achievements in machine learning [34, 35, 36, 46], scalar-valued outputs are limiting in tasks that need vector-valued outputs. To deal with this limitation, the generalization from the scalar-valued RKHS to the vector-valued one, called RKHSvv, has been introduced and applied in the literature [37, 38]. We use an RKHSvv as the hypothesis space \(\mathscr {H}\) for Eq. (3). To define such an RKHSvv formally, we first define a reproducing kernel on a Hilbert space of vector-valued functions.
Definition 1
(Vector-valued reproducing kernel) Let \((\mathscr {H}, \langle \cdot ,\cdot \rangle _{\mathscr {H}})\) be a Hilbert space of functions from a certain input space \(\mathscr {X}\) to \(\mathscr {S}\). A function \(k:\mathscr {X} \times \mathscr {X} \rightarrow \mathbb {R}\) is called a reproducing kernel for \(\mathscr {H}\) if, for all \(\mathbf {x} \in \mathscr {X}\), \(\mathbf {c} \in \mathscr {S}\), and \(f \in \mathscr {H}\), we have \(k(\mathbf {x}, \cdot )\mathbf {c} \in \mathscr {H}\) and the reproducing property holds: \(\langle f(\mathbf {x}),\mathbf {c}\rangle _{\mathscr {S}} = \langle f, k(\mathbf {x}, \cdot )\mathbf {c}\rangle _{\mathscr {H}}\).
We can also generalize the concept of positive definiteness:
Definition 2
It is easy to verify that common scalar-valued PD kernels, e.g., the linear kernel and the RBF kernel, are also vector-valued PD.
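The scalar half of that claim is easy to verify numerically: the Gram matrices of the linear and RBF kernels are positive semidefinite, which is the ingredient needed for vector-valued positive definiteness. A quick sanity check, not a proof (the sample sizes and bandwidth are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))  # 20 points in R^5

K_lin = X @ X.T  # linear kernel Gram matrix

# RBF (Gaussian) kernel Gram matrix with bandwidth sigma = 1
sq = np.sum(X**2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # pairwise squared distances
K_rbf = np.exp(-D2 / 2.0)

# Both Gram matrices are PSD: smallest eigenvalue >= 0 (up to roundoff)
min_lin = np.linalg.eigvalsh(K_lin).min()
min_rbf = np.linalg.eigvalsh(K_rbf).min()
```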
Definition 3
A function \(k(\mathbf {x}, \cdot )\mathbf {c}\) in an RKHSvv \(\mathscr {H}\) has the reproducing property:
Corollary 1
Proof
From the definition of RKHSvv, for all \(\mathbf {x} \in \mathscr {X}\) and \(f \in \mathscr {H}\), there exists a bounded linear functional \(\phi _{\mathbf {x}}[f] = \langle f(\mathbf {x}), \mathbf {c}\rangle _{\mathscr {S}}\). Then, according to the Riesz representation theorem, there also exists a function \(k(\mathbf {x}, \cdot ) \mathbf {c} \in \mathscr {H}\) such that \(\phi _{\mathbf {x}}[f] = \langle k(\mathbf {x}, \cdot ) \mathbf {c}, f \rangle _{\mathscr {H}}\). \(\square\)
Next, we clarify some differences between our definition of RKHSvv and previous work in multi-task learning [37, 38, 39, 40, 47, 48]. In those works, the reproducing kernel is defined as a function of the form \(\varGamma :\mathscr {X} \times \mathscr {X} \rightarrow \mathscr {S}^2\). According to the definition of \(\mathscr {S}\), the output is an M-by-M matrix. The reason for this formulation is that if \(\varGamma\) is separable, a structure matrix \(\mathbf {A} \in \mathscr {S}^2\) can be separated from \(\varGamma\): \(\mathbf {A} k(\mathbf {x}, \cdot ) = \varGamma (\mathbf {x}, \cdot )\), where \(k(\mathbf {x}, \cdot )\) is a scalar-valued kernel. \(\mathbf {A}\) is very useful for multi-task learning, since it encodes the structural information of the tasks. In our affinity reconstruction problem, however, such information is not needed: since the function in the RKHSvv itself describes the structure of the input affinities, we do not need to define a matrix-output reproducing kernel.
Similar to the scalar setting [35], we can easily obtain a representer theorem for RKHSvv.
Theorem 3
By the representer theorem of RKHSvv, the graph refining iterative regularization framework can be rewritten in a reproducing kernel Hilbert space. The significance of the theorem is that it shows that a whole range of learning algorithms have solutions that can be expressed as expansions in terms of the training examples.
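As noted in the abstract, the representer expansion implies that the refined affinity matrix factors as the product of a kernel matrix and a coefficient matrix, \(\mathbf {F} = \mathbf {K} \varTheta\). A minimal numerical sketch; the Gaussian kernel, the identity target graph, and the ridge-stabilized least-squares fit of the coefficients are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 4))  # 10 samples in R^4

# Gaussian kernel Gram matrix K over the samples
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / 2.0)

# Any affinity matrix in the span of the kernel sections is K @ Theta;
# here we recover the coefficients for an illustrative target graph
F_target = np.eye(10)
Theta = np.linalg.solve(K + 1e-8 * np.eye(10), F_target)  # ridge-stabilized
F = K @ Theta
```

The point is only the factorization: once \(\mathbf {K}\) is fixed, learning the graph reduces to learning the coefficient matrix \(\varTheta\).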
3.4 Block structure enhanced model
For an implementation of our proposed regularization framework, we provide a specific model by giving the definitions for the loss functions \(\mathscr {L}_1, \mathscr {L}_2\) and the affinity measurement function g.
3.5 Optimization for block structure enhanced model
4 Experiments
In this section, we evaluate the performance of the BSE model on both synthetic and real data. In the case of regularization, a form of capacity control leads to choosing an optimal fixed parameter for a given dataset. The key point of our work is to define and bound the capacity of the regularization framework for the block structure enhanced model. In the experiments, we use fixed hyperparameters \(\lambda _1 = 0.1, \lambda _2 = 0.01\). We observe that in practice our affinity matrix converges from random initialization within a few iterations, so the number of iterations is also fixed, to \(T = 15\). In the experiment on synthetic data, we compare the results of BSE with those of CLR [42]. In the experiment on real data, we compare the results of BSE with those of CLR [42] and LSR [20].
4.1 Refining results on block diagonal synthetic data
4.2 Refining results on real data
Datasets We use two popular facial databases: Extended Yale Database B (YaleB) [49] and the AR database [50]. For YaleB, we use the first 10 classes; each class contains 64 images. The images are resized to \(32 \times 32\). We also test a subset of AR that consists of 1400 clean faces distributed over 50 male and 50 female subjects. All AR images are downsized and normalized from \(165 \times 120\) to \(55 \times 40\). For computational efficiency, we also perform principal component analysis (PCA) to reduce the dimensionality of YaleB and AR, preserving \(98\%\) of the energy.
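The PCA preprocessing step, keeping just enough components to preserve 98% of the energy, can be sketched as follows (a minimal SVD-based version; the data shape is illustrative):

```python
import numpy as np

def pca_keep_energy(X, energy=0.98):
    """Project X (n_samples x n_features) onto the smallest number of
    principal components that preserve the given fraction of total variance."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; squared singular values give component energies
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    d = int(np.searchsorted(ratio, energy) + 1)  # smallest d reaching the target
    return Xc @ Vt[:d].T

X = np.random.default_rng(3).standard_normal((100, 50))
Z = pca_keep_energy(X)
```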
In the experiments, we find that both the BSE with \(\mathbf {K} = \mathbf {I}\) and the BSE with a Gaussian kernel converge after only 3 iterations on average, and the maximum number of iterations is less than 6.
5 Conclusions
In this paper, we provide an iterative regularization framework to refine a graph given the number of clusters. We design a new reproducing kernel Hilbert space of vector-valued functions as the hypothesis space for this regularization framework. Moreover, we provide a specific graph refining model based on the observed block structure enhancement effect. The experimental results on synthetic and real data show the competitiveness of our method compared with CLR and LSR. The exhaustive analyses of the experimental results with different attributes demonstrate the capabilities of our method.
Acknowledgements
This research was supported by the Shenzhen Research Council (Grant No. JCYJ20160406161948211, JCYJ20160226201453085, JSGG20150331152017052, JCYJ20160531194006833), by the National Natural Science Foundation of China (Grant No. 61672183, 61272366, 61672444), and by the Science and Technology Planning Project of Guangdong Province (Grant No. 2016B090918047).
Compliance with ethical standards
Conflicts of interest
The authors declare that they have no competing interests.
References
1. Peng Q, Chen YM et al (2016) A hybrid of local and global saliencies for detecting image salient region and appearance. IEEE Trans Syst Man Cybern Syst 47:1–12
2. Chen WS, Zhang C, Chen S (2013) Geometric distribution weight information modeled using radial basis function with fractional order for linear discriminant analysis method. Adv Math Phys 2013:885–905
3. Liu RZ, Tang YY, Fang B (2014) Topological coding and its application in the refinement of SIFT. IEEE Trans Cybern 44(11):2155–2166
4. Chen L, Chen CL, Lu M (2011) A multiple-kernel fuzzy c-means algorithm for image segmentation. IEEE Trans Syst Man Cybern B 41(5):1263–1274
5. He Z, You X, Tang YY (2008) Writer identification of Chinese handwriting documents using hidden Markov tree model. Pattern Recogn 41(4):1295–1307
6. Helli B, Moghaddam ME (2010) A text-independent Persian writer identification based on feature relation graph (FRG). Pattern Recogn 43(6):2199–2209
7. He Z, You X, Zhou L, Cheung Y, Jianwei D (2010) Writer identification using fractal dimension of wavelet subbands in Gabor domain. Integr Comput Aided Eng 17(17):157–165
8. Freeman WT, Willsky AS, Sudderth EB (2006) Graphical models for visual object recognition and tracking. Massachusetts Institute of Technology
9. Yuan D, Lu X, Li D, Liang Y, Zhang X (2018) Particle filter re-detection for visual tracking via correlation filters. Multimed Tools Appl, pp 1–25
10. Ma X, Liu Q et al (2016) Visual tracking via exemplar regression model. Knowl Based Syst 106:26–37
11. Jing XY, Zhu X et al (2015) Super-resolution person re-identification with semi-coupled low-rank discriminant dictionary learning. In: IEEE conference on computer vision and pattern recognition, pp 695–704
12. Ou W, Yuan D, Liu Q, Cao Y (2018) Object tracking based on online representative sample selection via non-negative least square. Multimed Tools Appl 77(9):10569–10587
13. Ng AY, Jordan MI, Weiss Y et al (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2:849–856
14. He Z, Chung AC (2010) 3D B-spline wavelet-based local standard deviation (BWLSD): its application to edge detection and vascular segmentation in magnetic resonance angiography. Int J Comput Vis 87(3):235–265
15. Wu F, Jing XY et al (2015) Multi-view low-rank dictionary learning for image classification. Pattern Recogn 50(C):143–154
16. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
17. Cheng C, Li Z (2016) An efficient segmentation method based on dynamic graph merging. Int J Wavelets Multiresolut Inf Process
18. Chen L, Li J, Chen CL (2013) Regional multifocus image fusion using sparse representation. Opt Express 21(4):5182–5197
19. Elhamifar E, Vidal R (2009) Sparse subspace clustering. In: CVPR
20. Lu CY, Min H, Zhao ZQ, Zhu L, Huang DS, Yan S (2012) Robust and efficient subspace segmentation via least squares regression. In: ECCV
21. Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by low-rank representation. In: ICML
22. Chen L, Liu L, Philip Chen CL (2016) A robust bi-sparsity model with non-local regularization for mixed noise reduction. Inf Sci 354:101–111
23. Ge Q, Jing X, Wu F, Wei Z, Xiao L, Shao W, Dong Y, Li H (2016) Structure-based low-rank model with graph nuclear norm regularization for noise removal
24. Chen WS, Yuen PC, Xie X (2011) Kernel machine-based rank-lifting regularized discriminant analysis method for face recognition. Neurocomputing 74(17):2953–2960
25. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
26. Schaeffer SE (2007) Survey: graph clustering. Comput Sci Rev 1(1):27–64
27. Poggio T, Shelton CR (2002) On the mathematical foundations of learning. Am Math Soc 39(1):1–49
28. Vapnik VN (1998) Statistical learning theory, vol 1. Wiley, New York
29. Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. Adv Comput Math 13(1):1–50
30. Jia C (2015) An operator approach to analysis of conditional kernel canonical correlation. Int J Wavelets Multiresolut Inf Process 13(4)
31. Chen WS, Zhao Y et al (2016) Supervised kernel nonnegative matrix factorization for face recognition. Neurocomputing 205:165–181
32. Li X, Liu Q et al (2016) A multi-view model for visual tracking via correlation filters. Knowl Based Syst 113:88–99
33. Berlinet A, Thomas-Agnan C (2011) Reproducing kernel Hilbert spaces in probability and statistics. Springer, New York
34. De Vito E, Rosasco L, Caponnetto A, Piana M, Verri A (2004) Some properties of regularized kernel methods. J Mach Learn Res 5:1363–1390
35. Schölkopf B, Herbrich R, Smola AJ (2001) A generalized representer theorem. In: Computational learning theory
36. Belkin M, Niyogi P, Sindhwani V (2005) On manifold regularization. In: AISTATS
37. Evgeniou T, Micchelli CA, Pontil M (2005) Learning multiple tasks with kernel methods. J Mach Learn Res 6:615–637
38. Dinuzzo F, Ong CS, Pillonetto G, Gehler PV (2011) Learning output kernels with block coordinate descent. In: ICML
39. Jawanpuria P, Lapin M, Hein M, Schiele B (2015) Efficient output kernel learning for multiple tasks. In: NIPS
40. Ciliberto C, Poggio T, Rosasco L (2015) Convex learning of multiple tasks and their structure. In: ICML
41. He Z, Li X, You X, Tao D, Tang YY (2016) Connected component model for multi-object tracking. IEEE Trans Image Process 25(8):3698–3711
42. Nie F, Wang X, Jordan MI, Huang H (2016) The constrained Laplacian rank algorithm for graph-based clustering. In: AAAI
43. Koliha JJ (2001) Block diagonalization. Math Bohemica 126(1):237–246
44. Hao Y, Lei H, Tar SXD (2005) Block structure preserving model order reduction. In: IEEE international behavioral modeling and simulation workshop
45. Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68(3):337–404
46. Huang J, You X, Yuan Y, Yang F, Lin L (2010) Rotation invariant iris feature extraction using Gaussian Markov random fields with non-separable wavelet. Neurocomputing 73(4–6):883–894
47. Yuan D, Lu X, Li D, He Z, Luo N (2017) Multiple feature fused for visual tracking via correlation filters. In: International conference on security, pattern analysis, and cybernetics, pp 88–93
48. Jing XY, Wu F, Zhu X, Dong X, Ma F, Li Z (2016) Multi-spectral low-rank structured dictionary learning for face recognition. Pattern Recogn 59:14–25
49. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227
50. Martinez AM (1998) The AR face database. CVC technical report, 24