Abstract
A novel binning and learning framework is presented for analyzing and applying large data sets for which no explicit distribution parameterization is available, and which can only be assumed to be generated by underlying probability density functions (PDFs) lying on a nonparametric statistical manifold. For model discretization, a uniform sampling-based partition of the data space is used to bin flat-distributed data sets, while quantile-based binning is adopted for data sets with complex distributions, reducing the average number of under-smoothed bins in the histograms. A compactified histogram embedding is designed so that the Fisher–Riemannian structured multinomial manifold is compatible with the intrinsic geometry of the nonparametric statistical manifold, providing a computationally efficient model space for calculating information distances between binned distributions. In particular, instead of searching for an optimal bin number, we apply multiple random partitions of the data space to embed the associated data sets onto a product multinomial manifold, integrating the complementary bin information through an information metric built from factor geodesic distances and further alleviating the over-smoothing problem. Using the metric equipped on the embedded submanifold, we extend classical manifold learning and dimension estimation algorithms to metric-adaptive versions that facilitate lower-dimensional Euclidean embedding. The effectiveness of our method is verified by visualization of data sets drawn from known manifolds, visualization and recognition on a subset of the ALOI object database, and Gabor feature-based face recognition on the FERET database.
Notes
ALOI: Amsterdam Library of Object Images, available from http://staff.science.uva.nl/~aloi/.
FERET face database, available from http://www.nist.gov/humanid/feret.
References
Carter KM, Reich R, Finn WG, Hero AO (2009) FINE: Fisher information non-parametric embedding. IEEE Trans Pattern Anal Mach Intell 31(11):2093–2098
Zhang Z, Chow TWS, Zhao MB (2013) Trace ratio optimization-based semi-supervised nonlinear dimensionality reduction for marginal manifold visualization. IEEE Trans Knowl Data Eng 25(5):1148–1161
Lebanon G (2005) Information geometry, the embedding principle, and document classification. In: 2nd International Symposium on Information Geometry and its Applications, 1–8
Donoho D (2000) High-dimensional data analysis: The curses and blessings of dimensionality, Aide-Memoire of a Lecture at AMS conference on Math Challenges of 21st Century. http://www-stat.stanford.edu/~donoho/Lectures/AMS2000/AMS2000.html
Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Proc. Int’l Conf. Database Theory, pp 217–235
Fu Y, Li Z, Huang TS, Katsaggelos AK (2008) Locally adaptive subspace and similarity metric learning for visual data clustering and retrieval. Comput Vis Image Underst 110(3):390–402
Van Der Maaten LJP, Postma EO, Van Den Herik HJ (2009) Dimensionality reduction: A comparative review. TiCC TR 2009-005
Balasubramanian M, Schwartz EL (2002) The Isomap algorithm and topological stability. Science 295:7
Lafon S, Lee AB (2006) Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Trans Pattern Anal Mach Intell 28(9):1393–1403
Van der Maaten L (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Morrison A, Ross G, Chalmers M (2003) Fast multidimensional scaling through sampling, springs and interpolation. Inf Vis 2(1):68–77
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326
Belkin M, Niyogi P (2002) Laplacian Eigenmaps and spectral techniques for embedding and clustering. Neural Inf Process Systems 14:585–591
Donoho DL, Grimes C (2005) Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci 102(21):7426–7431
Orsenigo C, Vercellis C (2013) A comparative study of nonlinear manifold learning methods for cancer microarray data classification. Expert Syst Appl 40(6):2189–2197
Xie B, Mu Y, Tao DC, Huang KZ (2011) m-SNE: multiview stochastic neighbor embedding. IEEE Trans Systems Man Cybern Part B 41(4):1088–1096
Lebanon G (2005) Riemannian geometry and statistical machine learning. PhD thesis, Carnegie Mellon University
Lee S-M, Abbott AL, Araman PA (2007) Dimensionality reduction and clustering on statistical manifolds. In: Proceedings of IEEE International Conference on CVPR, pp 1–7
Nielsen F (2013) Pattern learning and recognition on statistical manifolds: an information-geometric review. Lect Notes Comput Sci 7953:1–25
Zou J, Liu CC, Zhang Y, Lu GF (2013) Object recognition using Gabor co-occurrence similarity. Pattern Recogn 46(1):434–448
Zhang Y, Liu CC (2013) Gabor feature-based face recognition on product gamma manifold via region weighting. Neurocomputing 117(6):1–11
Amari S, Nagaoka H (2000) Methods of information geometry. AMS and Oxford U. Press, USA
Mio W, Badlyans D, Liu XW (2005) A computational approach to Fisher information geometry with applications to image analysis. Lect Notes Comput Sci 3757:18–33
Zhang J, Hästö P (2006) Statistical manifold as an affine space: a functional equation approach. J Math Psychol 50(1):60–65
Brunelli R, Mich O (2001) Histograms analysis for image retrieval. Pattern Recogn 34(8):1625–1637
Dias R (2011) Nonparametric estimation: smoothing and visualization. http://www.ime.unicamp.br/~dias/SDV.pdf
Elgammal A, Duraiswami R, Davis LS (2003) Efficient kernel density estimation using the fast gauss transform with applications to color modeling and tracking. IEEE Trans Pattern Anal Mach Intell 25:1499–1504
He K, Meeden G (1997) Selecting the number of bins in a histogram: a decision theoretic approach. J Stat Plann Inference 61(1):49–59
Leow WK, Li R (2004) The analysis and applications of adaptive-binning color histograms. Comput Vis Image Underst 94(1–3):67–91
Shimazaki H, Shinomoto S (2007) A method for selecting the bin size of a time histogram. Neural Comput 19(6):1503–1527
Čencov NN (1982) Statistical decision rules and optimal inference. American Mathematical Society
Young RA, Lesperance RM (2001) The Gaussian Derivative model for spatial-temporal vision: II. Cortical data. Spat Vis 14(3,4):321–389
Mukhopadhyay ND, Chatterjee S (2011) High dimensional data analysis using multivariate generalized spatial quantiles. J Multivar Anal 102:768–780
Liu WF, Tao DC (2013) Multiview Hessian regularization for image annotation. IEEE Trans Image Process 22(7):2676–2687
Liu WF, Tao DC (2014) Multiview Hessian discriminative sparse coding for image annotation. Comput Vis Image Underst 118(1):50–60
Jost J (2002) Riemannian geometry and geometric analysis. Springer, Berlin
Srivastava A, Jermyn IH, Joshi S (2007) Riemannian analysis of probability density functions with applications. In: Proceedings of IEEE CVPR’07, pp 1–8
Kamiński M, Zygierewicz J, Kuś R, Crone N (2005) Analysis of multichannel biomedical data. Acta Neurobiol Exp (Wars) 65:443–452
Skopenkov A (2001) Embedding and knotting of manifolds in Euclidean spaces. In: Young N, Choi Y (eds) Surveys in contemporary mathematics. London Math Soc Lect Notes 347:248–342
Carter KM, Hero AO, Raich R (2007) De-biasing for intrinsic dimension estimation. In: Proceedings of IEEE Statistical Signal Processing Workshop, pp 601–605
Levina E, Bickel PJ (2005) Maximum likelihood estimation of intrinsic dimension. Neural Inf Process Systems 17:777–784
Nguyen GH, Bouzerdoum A, Phung SL (2009) Learning pattern classification tasks with imbalanced data sets. Pattern Recognition, IN-TECH Publishing, 193–208
Barbehenn M, Munchen MG (1998) A note on the complexity of Dijkstra’s algorithm for graphs with weighted vertices. IEEE Trans Comput 47:263
Juang CF, Sun WK, Chen GC (2009) Object detection by color histogram-based fuzzy classifier with support vector learning. Neurocomputing 72:2464–2476
Mika S, Rätsch G, Weston J, Schölkopf B, Müller KR (1999) Fisher discriminant analysis with kernels. In: Proceedings of IEEE Workshop on Neural Networks for Signal Processing, pp 41–48
Van der Maaten LJP (2007) An introduction to dimensionality reduction using Matlab. Report MICC 07-07, Maastricht University
Shen L, Bai L, Fairhurst M (2007) Gabor wavelets and generalized discriminant analysis for face identification and verification. Image Vis Comput 25(5):553–563
Durrett R (1996) Probability: theory and examples, 2nd edn. International Thomson Publishing Company, New York
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant No. 71171003) and the Anhui Natural Science Foundation (Grant No. KJ2011B022).
Appendices
Appendix A
Proof of Lemma 1. For any s, we need to prove
where \(\left\| {\varOmega_{{k_{s} }}^{s} } \right\| = \left| {Y_{{\left( {k_{s} + 1} \right)}}^{s} - Y_{{\left( {k_{s} } \right)}}^{s} } \right|\), and \(0 = Y_{(0)}^{s} , \ldots ,Y_{(k)}^{s} ,Y_{(k + 1)}^{s} , \ldots ,Y_{(n)}^{s} ,Y_{(n + 1)}^{s} = 1\) are the order statistics of \(0 = Y_{0}^{s} ,Y_{1}^{s} , \ldots ,Y_{n}^{s} ,Y_{n + 1}^{s} = 1.\) Since the density function of the k-th order statistic \(Y_{(k)}^{s}\) is \(f_{{Y_{(k)}^{s} }} \left( y \right) = \frac{{\varGamma \left( {n + 1} \right)}}{{\varGamma \left( k \right)\varGamma \left( {n - k + 1} \right)}}y^{k - 1} \left( {1 - y} \right)^{n - k} ,\;0 \le y \le 1,\)
one can obtain \({\mathbb{E}}\left( {Y_{(k)}^{s} } \right) = \frac{{\varGamma \left( {n + 1} \right)\varGamma \left( {k + 1} \right)}}{{\varGamma \left( k \right)\varGamma \left( {n + 2} \right)}} = \frac{k}{n + 1},\)
where \(1 \le k \le n,\) and \(\varGamma ( \cdot )\) is the Gamma function. So \({\mathbb{E}}\left| {Y_{(k)}^{s} - Y_{(k - 1)}^{s} } \right| = {\mathbb{E}}\left( {Y_{(k)}^{s} } \right) - {\mathbb{E}}\left( {Y_{(k - 1)}^{s} } \right) = \frac{1}{n + 1}\), and \(\mathop {\sup }\limits_{1 < k \le n} {\mathbb{E}}\left| {Y_{(k)}^{s} - Y_{(k - 1)}^{s} } \right| \to 0,\;\;\left( {n \to \infty } \right),\) that is,
In addition, we can obtain
and
These results show that for any fixed s,
namely,
This completes the proof. □
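The two identities used above, namely \(Y_{(k)}^{s} \sim\) Beta\((k, n - k + 1)\) with \({\mathbb{E}}(Y_{(k)}^{s}) = k/(n+1)\) and an expected gap of \(1/(n+1)\) between consecutive order statistics, can be checked numerically. A minimal Monte Carlo sketch (not part of the proof; the sample sizes are arbitrary):

```python
import numpy as np

# Numerical check: for n i.i.d. Uniform(0,1) variables, the k-th order
# statistic Y_(k) ~ Beta(k, n-k+1), so E[Y_(k)] = k/(n+1) and the
# expected gap between consecutive order statistics is 1/(n+1),
# which is what drives sup_k E|Y_(k) - Y_(k-1)| -> 0 as n grows.
rng = np.random.default_rng(42)
n, trials, k = 9, 200_000, 3
Y = np.sort(rng.random((trials, n)), axis=1)
mean_k = Y[:, k - 1].mean()                     # estimate of E[Y_(k)]
mean_gap = (Y[:, k - 1] - Y[:, k - 2]).mean()   # estimate of E[Y_(k) - Y_(k-1)]
print(mean_k, k / (n + 1))      # both close to 0.3
print(mean_gap, 1 / (n + 1))    # both close to 0.1
```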
Appendix B
Proof of Lemma 2. For any \(x \in {\mathcal{X}}\), let \(C\left( {x,\varvec{Y}_{n} } \right) \subset \aleph \left( {\varvec{Y}_{n} } \right)\) be the minimum covering of \(\left[ {{\mathbf{0}},x} \right]\), and \(\delta_{ + } \left( {x,\varvec{Y}_{n} } \right)\) be the relative complement of \(\left[ {{\mathbf{0}},x} \right]\) in \(C\left( {x,\varvec{Y}_{n} } \right)\), that is,
By the definitions of \(F_{{H_{i} \left( {\varvec{Y}_{n} } \right)}} \left( x \right)\) and \(F_{{X_{i} }} \left( x \right)\), we can obtain
and then
Because \(x_{i1} , \ldots ,x_{{in_{i} }}\) are i.i.d., the expectation of the right-hand side of the above equation can be expressed as
Applying Fubini's theorem [48], we obtain
Since \(p_{i} \left( x \right)\) is continuous on \({\mathcal{X}}\), there exists \(M_{i} \in \left( {0,1} \right]\) such that \(0 \le p_{i} \left( x \right) \le M_{i}\) for all \(x\), and then
where \({\mathbb{E}}_{i}\), \({\mathbb{E}}_{{\varvec{Y}_{n} }}\) and \({\mathbb{E}}\) are the expectation operators corresponding to the distributions determined by the PDFs \(p_{i} \left( x \right)\), \(p_{{\varvec{Y}_{n} }} \left( {\varvec{y}_{n} } \right) = 1\) and \(p_{{\varvec{x}_{i1} ,\varvec{Y}_{n} }} \left( {x,\varvec{y}_{n} } \right)\), respectively, and \(\text{Vol}\left( \cdot \right)\) denotes the volume of a region of \({\mathcal{X}}\). For any \(x \in {\mathcal{X}}\) and \(\varvec{Y}_{n} \in U_{n} \left( {\mathcal{X}} \right)\), using the above definition, we can find
So \(\text{Vol}\left( {\delta_{ + } \left( {x,\varvec{Y}_{n} } \right)} \right) \le \left[ {\Delta \left( {\varvec{Y}_{n} } \right)} \right]^{d}\) (a.e.). According to Lemma 1, we can obtain
The conclusion follows. □
Appendix C
Proof of Proposition 1. According to the Glivenko–Cantelli theorem [48], one can get
which implies that \(F_{{X_{i} }} \left( x \right)\) converges to \(F_{i} \left( x \right)\) in probability as \(n_{i} \to \infty\). Lemma 2 shows that \(F_{{H_{i} \left( {\varvec{Y}_{n} } \right)}} \left( x \right)\) converges to \(F_{{X_{i} }} \left( x \right)\) in probability as \(n \to \infty\). So the conclusion holds. □
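The Glivenko–Cantelli convergence invoked above can be illustrated numerically. The sketch below (illustrative only, for Uniform(0,1), where \(F(x) = x\)) computes the sup-distance between the empirical CDF and the true CDF for a small and a large sample:

```python
import numpy as np

# Glivenko-Cantelli illustration: sup_x |F_n(x) - F(x)| shrinks as the
# sample size grows; here F is the Uniform(0,1) CDF, F(x) = x.
rng = np.random.default_rng(7)

def sup_ecdf_error(n):
    x = np.sort(rng.random(n))
    i = np.arange(1, n + 1)
    # At each order statistic the ECDF jumps from (i-1)/n to i/n, so the
    # supremum is attained at one of these jump points.
    return max(np.max(i / n - x), np.max(x - (i - 1) / n))

small, large = sup_ecdf_error(100), sup_ecdf_error(100_000)
print(small, large)  # sup error shrinks as n grows
```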
Appendix D
Proof of Proposition 2. We first take \(\varvec{y} = \left( {y_{1} , \ldots ,y_{n} } \right)\) as \(n\) real variables in \((\mathop {\mathcal{X}}\limits^{o} )^{n} = \left( {{\mathbf{0}},{\mathbf{1}}} \right)^{n}\). Based on the previous definitions, for a data set \(X_{i}\) of X, \(n_{{k_{1} , \ldots ,k_{d} }}^{i}\) is continuous almost everywhere (a.e.) w.r.t. \(\varvec{y}\) for fixed \(k_{1} , \ldots ,k_{d}\). Hence \(H_{i} \left( \varvec{y} \right)\) defined by Eq. (16) is also a.e. continuous in \((\mathop {\mathcal{X}}\limits^{o} )^{n}\) w.r.t. \(\varvec{y}\) for all \(k_{1} , \ldots ,k_{d}\). Since the function \(\text{dist}_{g} \left( { \cdot , \cdot } \right)\) described by Eq. (6) is continuous on \({\bar{\mathcal{P}}}_{m} \times {\bar{\mathcal{P}}}_{m}\), \(\text{dist}_{g} \left( {H_{i} \left( \varvec{y} \right),H_{j} \left( \varvec{y} \right)} \right)\) is a.e. continuous w.r.t. \(\varvec{y}\) when another data set \(X_{j}\) of X is also considered. Since \(\varvec{Y}_{n}^{(1)} , \ldots ,\varvec{Y}_{n}^{(T)}\) are i.i.d., \(\text{dist}_{g} \left( {\varphi_{{\varvec{Y}_{n}^{(1)} }} \left( {X_{i} } \right),\varphi_{{\varvec{Y}_{n}^{(1)} }} \left( {X_{j} } \right)} \right), \ldots ,\text{dist}_{g} \left( {\varphi_{{\varvec{Y}_{n}^{(T)} }} \left( {X_{i} } \right),\varphi_{{\varvec{Y}_{n}^{(T)} }} \left( {X_{j} } \right)} \right)\) are i.i.d. almost surely; in addition, we can obtain
where \(\varphi_{{\varvec{Y}_{n}^{(t)} }} \left( \cdot \right)\) is defined in Eq. (17). The strong law of large numbers [48] implies
Due to the continuity of \(\text{dist}_{g} \left( { \cdot , \cdot } \right)\), we can obtain
Thus, we can get \(\text{Diss}\left( {\bar{H}_{i,n,T}^{\varepsilon } ,\bar{H}_{j,n,T}^{\varepsilon } } \right) \to T^{ - 1} \cdot \sum\nolimits_{t = 1}^{T} {\text{dist}_{g} \left( {\varphi_{{\varvec{Y}_{n}^{(t)} }} \left( {X_{i} } \right),\varphi_{{\varvec{Y}_{n}^{(t)} }} \left( {X_{j} } \right)} \right)}\) as \(\varepsilon \to 0\), and the conclusion follows. □
Cite this article
Zhang, Y., Liu, C. & Zou, J. Histogram-based embedding for learning on statistical manifolds. Pattern Anal Applic 19, 21–40 (2016). https://doi.org/10.1007/s10044-014-0379-5