Histogram-based embedding for learning on statistical manifolds

  • Theoretical Advances
  • Published in Pattern Analysis and Applications

Abstract

A novel binning and learning framework is presented for analyzing large data sets for which no explicit distribution parameterization is available and which can only be assumed to be generated by underlying probability density functions (PDFs) lying on a nonparametric statistical manifold. To discretize the models, a uniform sampling-based partition of the data space is used to bin flatly distributed data sets, while quantile-based binning is adopted for complexly distributed data sets to reduce, on average, the number of under-smoothed bins in the histograms. A compactified histogram embedding is designed so that the Fisher–Riemannian structured multinomial manifold is compatible with the intrinsic geometry of the nonparametric statistical manifold, providing a computationally efficient model space for computing information distances between binned distributions. In particular, rather than searching for an optimal bin number, we use multiple random partitions of the data space to embed the associated data sets onto a product multinomial manifold, integrating the complementary bin information through an information metric built from the factor geodesic distances and further alleviating the over-smoothing problem. Using the metric equipped on the embedded submanifold, we extend classical manifold learning and dimension estimation algorithms to metric-adaptive versions that facilitate lower-dimensional Euclidean embedding. The effectiveness of our method is verified by visualization of data sets drawn from known manifolds, visualization and recognition on a subset of the ALOI object database, and Gabor feature-based face recognition on the FERET database.
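
The central computation outlined above can be illustrated in a few lines: bin a data set (here with quantile-based bin edges), compactify the histogram so that it lies in the interior of the probability simplex, and measure a geodesic (Fisher–Rao) distance on the multinomial manifold via the square-root embedding. The sketch below is ours, not the authors' code: the bin count, the smoothing constant `eps` (standing in for the compactification parameter), and all function names are assumptions, and the paper's distance of Eq. (6) may differ from the sphere-embedding formula used here by a constant factor.

```python
import numpy as np

def histogram_embedding(samples, edges, eps=1e-6):
    """Bin a 1-D sample and map the histogram into the (compactified)
    probability simplex; eps keeps every bin strictly positive and stands
    in for the compactification constant (an assumption of this sketch)."""
    counts, _ = np.histogram(samples, bins=edges)
    smoothed = counts + eps
    return smoothed / smoothed.sum()

def fisher_rao_distance(p, q):
    """Geodesic distance on the multinomial manifold via the square-root
    (sphere) embedding: d(p, q) = 2 * arccos(sum_k sqrt(p_k * q_k))."""
    inner = np.clip(np.sqrt(p * q).sum(), -1.0, 1.0)
    return 2.0 * np.arccos(inner)

# Usage: two data sets binned on a shared quantile-based partition (32 bins).
rng = np.random.default_rng(0)
x_i = rng.normal(0.0, 1.0, size=2000)
x_j = rng.normal(0.5, 1.2, size=2000)
edges = np.quantile(np.concatenate([x_i, x_j]), np.linspace(0.0, 1.0, 33))
d = fisher_rao_distance(histogram_embedding(x_i, edges),
                        histogram_embedding(x_j, edges))
print(f"geodesic distance between binned distributions: {d:.4f}")
```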


Notes

  1. ALOI: Amsterdam Library of Object Images, available from http://staff.science.uva.nl/~aloi/.

  2. FERET face database, available from http://www.nist.gov/humanid/feret.

References

  1. Carter KM, Reich R, Finn WG, Hero AO (2009) FINE: Fisher information non-parametric embedding. IEEE Trans Pattern Anal Mach Intell 31(11):2093–2098

  2. Zhang Z, Chow TWS, Zhao MB (2013) Trace ratio optimization-based semi-supervised nonlinear dimensionality reduction for marginal manifold visualization. IEEE Trans Knowl Data Eng 25(5):1148–1161

  3. Lebanon G (2005) Information geometry, the embedding principle, and document classification. In: 2nd International Symposium on Information Geometry and its Applications, 1–8

  4. Donoho D (2000) High-dimensional data analysis: The curses and blessings of dimensionality, Aide-Memoire of a Lecture at AMS conference on Math Challenges of 21st Century. http://www-stat.stanford.edu/~donoho/Lectures/AMS2000/AMS2000.html

  5. Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Proc. Int’l Conf. Database Theory, pp 217–235

  6. Fu Y, Li Z, Huang TS, Katsaggelos AK (2008) Locally adaptive subspace and similarity metric learning for visual data clustering and retrieval. Comput Vis Image Underst 110(3):390–402

  7. Van Der Maaten LJP, Postma EO, Van Den Herik HJ (2009) Dimensionality reduction: A comparative review. TiCC TR 2009-005

  8. Balasubramanian M, Schwartz EL (2002) The Isomap algorithm and topological stability. Science 295:7

  9. Lafon S, Lee AB (2006) Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Trans Pattern Anal Mach Intell 28(9):1393–1403

  10. Van der Maaten L (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605

  11. Morrison A, Ross G, Chalmers M (2003) Fast multidimensional scaling through sampling, springs and interpolation. Inf Vis 2(1):68–77

  12. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326

  13. Belkin M, Niyogi P (2002) Laplacian Eigenmaps and spectral techniques for embedding and clustering. Neural Inf Process Systems 14:585–591

  14. Donoho DL, Grimes C (2005) Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci 102(21):7426–7431

  15. Orsenigo C, Vercellis C (2013) A comparative study of nonlinear manifold learning methods for cancer microarray data classification. Expert Syst Appl 40(6):2189–2197

  16. Xie B, Mu Y, Tao DC, Huang KZ (2011) m-SNE: multiview stochastic neighbor embedding. IEEE Trans Systems Man Cybern Part B 41(4):1088–1096

  17. Lebanon G (2005) Riemannian geometry and statistical machine learning. PhD thesis, Carnegie Mellon University

  18. Lee S-M, Abbott AL, Araman PA (2007) Dimensionality reduction and clustering on statistical manifolds. In: Proceedings of IEEE International Conference on CVPR, pp 1–7

  19. Nielsen F (2013) Pattern learning and recognition on statistical manifolds: an information-geometric review. Lect Notes Comput Sci 7953:1–25

  20. Zou J, Liu CC, Zhang Y, Lu GF (2013) Object recognition using Gabor co-occurrence similarity. Pattern Recogn 46(1):434–448

  21. Zhang Y, Liu CC (2013) Gabor feature-based face recognition on product gamma manifold via region weighting. Neurocomputing 117(6):1–11

  22. Amari S, Nagaoka H (2000) Methods of information geometry. AMS and Oxford U. Press, USA

  23. Mio W, Badlyans D, Liu XW (2005) A computational approach to Fisher information geometry with applications to image analysis, 3757. Springer, Berlin, pp 18–33

  24. Zhang J, Hästö P (2006) Statistical manifold as an affine space: a functional equation approach. J Math Psychol 50(1):60–65

  25. Brunelli R, Mich O (2001) Histograms analysis for image retrieval. Pattern Recogn 34(8):1625–1637

  26. Dias R (2011) Nonparametric estimation: smoothing and visualization. http://www.ime.unicamp.br/~dias/SDV.pdf

  27. Elgammal A, Duraiswami R, Davis LS (2003) Efficient kernel density estimation using the fast gauss transform with applications to color modeling and tracking. IEEE Trans Pattern Anal Mach Intell 25:1499–1504

  28. He K, Meeden G (1997) Selecting the number of bins in a histogram: a decision theoretic approach. J Stat Plann Inference 61(1):49–59

  29. Leow WK, Li R (2004) The analysis and applications of adaptive-binning color histograms. Comput Vis Image Underst 94(1–3):67–91

  30. Shimazaki H, Shinomoto S (2007) A method for selecting the bin size of a time histogram. Neural Comput 19(6):1503–1527

  31. Čencov NN (1982) Statistical decision rules and optimal inference. American Mathematical Society

  32. Young RA, Lesperance RM (2001) The Gaussian Derivative model for spatial-temporal vision: II. Cortical data. Spat Vis 14(3,4):321–389

  33. Mukhopadhyay ND, Chatterjee S (2011) High dimensional data analysis using multivariate generalized spatial quantiles. J Multivar Anal 102:768–780

  34. Liu WF, Tao DC (2013) Multiview Hessian regularization for image annotation. IEEE Trans Image Process 22(7):2676–268

  35. Liu WF, Tao DC (2014) Multiview Hessian discriminative sparse coding for image annotation. Comput Vis Image Underst 118(1):50–60

  36. Jost J (2002) Riemannian geometry and geometric analysis. Springer, Berlin

  37. Srivastava A, Jermyn IH, Joshi S (2007) Riemannian analysis of probability density functions with applications. In: Proceedings of IEEE CVPR’07, pp 1–8

  38. Kamiński M, Zygierewicz J, Kuś R, Crone N (2005) Analysis of multichannel biomedical data. Acta Neurobiol Exp (Wars) 65:443–452

  39. Skopenkov A (2001) Embedding and knotting of manifolds in Euclidean spaces. In: Young N, Choi Y (ed.) Surveys in contemporary mathematics. London Math. Soc. Lect. Notes 347 (2): 48–342

  40. Carter KM, Hero AO, Raich R (2007) De-biasing for intrinsic dimension estimation. In: Proceedings of IEEE Statistical Signal Processing Workshop, pp 601–605

  41. Levina E, Bickel PJ (2005) Maximum likelihood estimation of intrinsic dimension. Neural Inf Process Systems 17:777–784

  42. Nguyen GH, Bouzerdoum A, Phung SL (2009) Learning pattern classification tasks with imbalanced data sets. Pattern Recognition, IN-TECH Publishing, 193–208

  43. Barbehenn M, Munchen MG (1998) A note on the complexity of Dijkstra’s algorithm for graphs with weighted vertices. IEEE Trans Comput 47:263

  44. Juang CF, Sun WK, Chen GC (2009) Object detection by color histogram-based fuzzy classifier with support vector learning. Neurocomputing 72:2464–2476

  45. Mika S, Ratsch G, Weston J, Scholkopf B, Muller KR (1999) Fisher discriminant analysis with kernels. IEEE International Workshop on Neural Networks for Signal Processing, pp 41–48

  46. Van der Maaten LJP (2007) An introduction to dimensionality reduction using matlab. Report MICC 07-07 2, Hotelling

  47. Shen L, Bai L, Fairhurst M (2007) Gabor wavelets and generalized discriminant analysis for face identification and verification. Image Vis Comput 25(5):553–563

  48. Durrett R (1996) Probability: theory and examples, 2nd edn. International Thomson Publishing Company, New York

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant No. 71171003) and the Anhui Natural Science Foundation (Grant No. KJ2011B022).

Author information

Corresponding author

Correspondence to Yue Zhang.

Appendices

Appendix A

Proof of Lemma 1. For any s, we need to prove

$$\max \left\{ \mathbb{E}\left( \left\| \Omega_{k_{s}}^{s} \right\| \right) : s = 1, \ldots, d;\; 1 \le k_{s} \le n + 1 \right\} \to 0 \quad \text{as } n \to \infty,$$
(29)

where \(\left\| \Omega_{k_{s}}^{s} \right\| = \left| Y_{(k_{s}+1)}^{s} - Y_{(k_{s})}^{s} \right|\), and \(0 = Y_{(0)}^{s}, \ldots, Y_{(k)}^{s}, Y_{(k+1)}^{s}, \ldots, Y_{(n)}^{s}, Y_{(n+1)}^{s} = 1\) are the order statistics of \(0 = Y_{0}^{s}, Y_{1}^{s}, \ldots, Y_{n}^{s}, Y_{n+1}^{s} = 1\). Since the density function of the k-th order statistic \(Y_{(k)}^{s}\) is

$$f_{k}(y) = \frac{n!}{(k-1)!\,(n-k)!}\, y^{k-1} (1-y)^{n-k}, \qquad 0 \le y \le 1,\; 1 \le k \le n,$$
(30)

one can obtain

$$\mathbb{E}\left( Y_{(k)}^{s} \right) = \frac{\Gamma(n+1)\,\Gamma(k+1)}{\Gamma(k)\,\Gamma(n+2)} = \frac{k}{n+1},$$
(31)

where \(1 \le k \le n\) and \(\Gamma(\cdot)\) is the Gamma function. Hence \(\mathbb{E}\left| Y_{(k)}^{s} - Y_{(k-1)}^{s} \right| = \mathbb{E}\left( Y_{(k)}^{s} \right) - \mathbb{E}\left( Y_{(k-1)}^{s} \right) = \frac{1}{n+1}\), and \(\sup_{1 < k \le n} \mathbb{E}\left| Y_{(k)}^{s} - Y_{(k-1)}^{s} \right| \to 0\) as \(n \to \infty\); that is,

$$\left\| \Omega_{k_{s}}^{s} \right\| = \left| Y_{(k_{s})}^{s} - Y_{(k_{s}-1)}^{s} \right| \xrightarrow{L_{1}} 0 \quad (n \to \infty), \qquad 1 < k_{s} \le n.$$
(32)

In addition, we have

$$\mathbb{E}\left| Y_{(1)}^{s} - Y_{(0)}^{s} \right| = \mathbb{E}\left| Y_{(1)}^{s} \right| = \frac{1}{n+1} \to 0 \quad (n \to \infty),$$
(33)

and

$$\mathbb{E}\left| Y_{(n+1)}^{s} - Y_{(n)}^{s} \right| = \mathbb{E}\left( Y_{(n+1)}^{s} \right) - \mathbb{E}\left( Y_{(n)}^{s} \right) = 1 - \frac{n}{n+1} \to 0 \quad (n \to \infty).$$
(34)

These results show that, for any fixed s,

$$\left\| \Omega_{k_{s}}^{s} \right\| \xrightarrow{L_{1}} 0 \quad (n \to \infty), \qquad k_{s} = 1, \ldots, n+1,$$
(35)

namely,

$$\max_{s} \left\{ \left\| \Omega_{k_{s}}^{s} \right\| : k_{s} = 1, \ldots, n+1 \right\} \xrightarrow{L_{1}} 0 \quad (n \to \infty).$$
(36)

This completes the proof. □
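
A quick numerical check of Lemma 1 (ours, not part of the paper; all names and constants below are assumptions): Eq. (31) gives \(\mathbb{E}(Y_{(k)}^{s}) = k/(n+1)\), so every expected gap between consecutive order statistics of a uniform sample equals \(1/(n+1)\), and the quantity in Eq. (29) vanishes as n grows. A minimal Monte Carlo sketch in NumPy:

```python
import numpy as np

def mean_gaps(n, trials=2000, rng=None):
    """Monte Carlo estimate of E|Y_(k) - Y_(k-1)|, k = 1, ..., n+1, for
    Y_1, ..., Y_n i.i.d. Uniform(0, 1) with Y_(0) = 0 and Y_(n+1) = 1."""
    rng = rng or np.random.default_rng(0)
    y = np.sort(rng.random((trials, n)), axis=1)
    y = np.hstack([np.zeros((trials, 1)), y, np.ones((trials, 1))])
    return np.diff(y, axis=1).mean(axis=0)          # one value per gap k

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    gaps = mean_gaps(n, rng=rng)
    # each expected gap is 1/(n+1), so the maximum over k tends to 0 with n
    print(f"n={n:5d}  max_k E||Omega_k|| ~ {gaps.max():.5f}  (1/(n+1) = {1/(n+1):.5f})")
```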

Appendix B

Proof of Lemma 2. For any \(x \in \mathcal{X}\), let \(C\left( x, \mathbf{Y}_{n} \right) \subset \aleph\left( \mathbf{Y}_{n} \right)\) be the minimum covering of \(\left[ \mathbf{0}, x \right]\), and let \(\delta_{+}\left( x, \mathbf{Y}_{n} \right)\) be the relative complement of \(\left[ \mathbf{0}, x \right]\) in \(C\left( x, \mathbf{Y}_{n} \right)\), that is,

$$\delta_{+}\left( x, \mathbf{Y}_{n} \right) = \left\{ z : z \in C\left( x, \mathbf{Y}_{n} \right) \wedge z \notin \left[ \mathbf{0}, x \right] \right\}.$$
(37)

By the definitions of \(F_{H_{i}\left( \mathbf{Y}_{n} \right)}\left( x \right)\) and \(F_{X_{i}}\left( x \right)\), we obtain

$$F_{H_{i}\left( \mathbf{Y}_{n} \right)}\left( x \right) - F_{X_{i}}\left( x \right) = \sum_{j=1}^{n_{i}} 1_{\left\{ x_{ij} \in \delta_{+}\left( x, \mathbf{Y}_{n} \right) \right\}},$$
(38)

and then

$$\mathbb{E}\left( \left| F_{H_{i}\left( \mathbf{Y}_{n} \right)}\left( x \right) - F_{X_{i}}\left( x \right) \right| \right) = \mathbb{E}\left( \sum_{j=1}^{n_{i}} 1_{\left\{ x_{ij} \in \delta_{+}\left( x, \mathbf{Y}_{n} \right) \right\}} \right).$$
(39)

Because \(x_{i1}, \ldots, x_{in_{i}}\) are i.i.d., the expectation on the right-hand side of the above equation can be expressed as

$$\sum_{j=1}^{n_{i}} \mathbb{E}\left( 1_{\left\{ x_{ij} \in \delta_{+}\left( x, \mathbf{Y}_{n} \right) \right\}} \right) = n_{i} \cdot \mathbb{E}\left( 1_{\left\{ x_{ij} \in \delta_{+}\left( x, \mathbf{Y}_{n} \right) \right\}} \right).$$
(40)

Applying Fubini's theorem [48], we get

$$\mathbb{E}\left( 1_{\left\{ x_{ij} \in \delta_{+}\left( x, \mathbf{Y}_{n} \right) \right\}} \right) = \int_{(\overset{\circ}{\mathcal{X}})^{n}} \mathbb{E}_{i}\left( 1_{\left\{ x_{ij} \in \delta_{+}\left( x, \mathbf{y}_{n} \right) \right\}} \right) \prod_{j=1}^{n} \mathrm{d}y_{j}.$$
(41)

Since \(p_{i}\left( x \right)\) is continuous on \(\mathcal{X}\), there exists \(M_{i} \in \left( 0, 1 \right]\) such that \(0 \le p_{i}\left( x \right) \le M_{i}\) for all \(x\), and then

$$\begin{aligned} \mathbb{E}\left( 1_{\left\{ x_{ij} \in \delta_{+}\left( x, \mathbf{Y}_{n} \right) \right\}} \right) &= \int_{(\overset{\circ}{\mathcal{X}})^{n}} \mathbb{E}_{i}\left( 1_{\left\{ x_{ij} \in \delta_{+}\left( x, \mathbf{y}_{n} \right) \right\}} \right) \prod_{j=1}^{n} \mathrm{d}y_{j} \\ &\le \int_{(\overset{\circ}{\mathcal{X}})^{n}} \left( M_{i} \cdot \int_{\delta_{+}\left( x, \mathbf{y}_{n} \right)} \mathrm{d}z \right) \prod_{j=1}^{n} \mathrm{d}y_{j} \\ &= M_{i} \cdot \mathbb{E}_{\mathbf{Y}_{n}}\left( \mathrm{Vol}\left( \delta_{+}\left( x, \mathbf{Y}_{n} \right) \right) \right), \end{aligned}$$
(42)

where \(\mathbb{E}_{i}\), \(\mathbb{E}_{\mathbf{Y}_{n}}\) and \(\mathbb{E}\) are the expectation operators corresponding to the distributions determined by the PDFs \(p_{i}\left( x \right)\), \(p_{\mathbf{Y}_{n}}\left( \mathbf{y}_{n} \right) = 1\) and \(p_{x_{i1}, \mathbf{Y}_{n}}\left( x, \mathbf{y}_{n} \right)\), respectively, and \(\mathrm{Vol}\left( \cdot \right)\) denotes the volume of a subset of \(\mathcal{X}\). For any \(x \in \mathcal{X}\) and \(\mathbf{Y}_{n} \in U_{n}\left( \mathcal{X} \right)\), using the above definition, we find

$$\delta_{+}\left( x, \mathbf{Y}_{n} \right) \subset \left[ x^{1}, x^{1} + \Delta\left( \mathbf{Y}_{n} \right) \right] \times \ldots \times \left[ x^{d}, x^{d} + \Delta\left( \mathbf{Y}_{n} \right) \right] \cap \mathcal{X} \quad \left( \text{a.e.} \right).$$
(43)

So \(\mathrm{Vol}\left( \delta_{+}\left( x, \mathbf{Y}_{n} \right) \right) \le \left[ \Delta\left( \mathbf{Y}_{n} \right) \right]^{d}\) (a.e.). According to Lemma 1, we obtain

$$\mathrm{Vol}\left( \delta_{+}\left( x, \mathbf{Y}_{n} \right) \right) \xrightarrow{L_{1}} 0 \quad \left( n \to \infty \right).$$
(44)

The conclusion follows. □
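
To make the covering argument concrete, here is a one-dimensional analogue of Eq. (38) (our illustration, not the paper's d-dimensional construction; all names are assumptions): the histogram-based CDF counts the samples falling in the minimum covering of \([0, x]\) by the random partition, so its excess over the empirical count is exactly the number of samples landing in \(\delta_{+}(x, \mathbf{Y}_{n})\), which shrinks as the partition is refined.

```python
import numpy as np

def covering_excess(x, samples, n_partition, rng):
    """Samples counted by the covering of [0, x] but not by [0, x] itself,
    i.e. the number of points in delta_+(x, Y_n) for one random partition."""
    edges = np.sort(np.concatenate([[0.0, 1.0], rng.random(n_partition)]))
    right_edge = edges[np.searchsorted(edges, x, side="left")]   # closes the covering
    return np.sum(samples <= right_edge) - np.sum(samples <= x)

rng = np.random.default_rng(1)
samples = rng.random(500)           # one data set X_i on [0, 1]
x = 0.37
for n in (5, 50, 500, 5000):
    excess = [covering_excess(x, samples, n, rng) for _ in range(200)]
    print(f"n={n:5d}  mean excess count = {np.mean(excess):.3f}")
```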

Appendix C

Proof of Proposition 1. According to the Glivenko–Cantelli theorem [48], we have

$$\sup_{x} \left| F_{X_{i}}\left( x \right) - F_{i}\left( x \right) \right| \xrightarrow{\mathrm{a.s.}} 0 \quad \left( n_{i} \to \infty \right),$$
(45)

which implies that \(F_{X_{i}}\left( x \right)\) converges to \(F_{i}\left( x \right)\) in probability as \(n_{i} \to \infty\). Lemma 2 shows that \(F_{H_{i}\left( \mathbf{Y}_{n} \right)}\left( x \right)\) converges to \(F_{X_{i}}\left( x \right)\) in probability as \(n \to \infty\). Hence the conclusion holds. □

Appendix D

Proof of Proposition 2. We first regard \(\mathbf{y} = \left( y_{1}, \ldots, y_{n} \right)\) as n real variables in \((\overset{\circ}{\mathcal{X}})^{n} = \left( \mathbf{0}, \mathbf{1} \right)^{n}\). Based on the previous definitions, for a data set \(X_{i}\) of X, \(n_{k_{1}, \ldots, k_{d}}^{i}\) is continuous almost everywhere (a.e.) w.r.t. \(\mathbf{y}\) for fixed \(k_{1}, \ldots, k_{d}\). Furthermore, \(H_{i}\left( \mathbf{y} \right)\), defined by Eq. (16), is also a.e. continuous in \((\overset{\circ}{\mathcal{X}})^{n}\) w.r.t. \(\mathbf{y}\) for all \(k_{1}, \ldots, k_{d}\). Since the function \(\mathrm{dist}_{g}\left( \cdot, \cdot \right)\) described by Eq. (6) is continuous on \(\bar{\mathcal{P}}_{m} \times \bar{\mathcal{P}}_{m}\), \(\mathrm{dist}_{g}\left( H_{i}\left( \mathbf{y} \right), H_{j}\left( \mathbf{y} \right) \right)\) is a.e. continuous w.r.t. \(\mathbf{y}\) when another data set \(X_{j}\) of X is also considered. Since \(\mathbf{Y}_{n}^{(1)}, \ldots, \mathbf{Y}_{n}^{(T)}\) are i.i.d., \(\mathrm{dist}_{g}\left( \varphi_{\mathbf{Y}_{n}^{(1)}}\left( X_{i} \right), \varphi_{\mathbf{Y}_{n}^{(1)}}\left( X_{j} \right) \right), \ldots, \mathrm{dist}_{g}\left( \varphi_{\mathbf{Y}_{n}^{(T)}}\left( X_{i} \right), \varphi_{\mathbf{Y}_{n}^{(T)}}\left( X_{j} \right) \right)\) are i.i.d. almost surely; in addition, we obtain

$$\mathbb{E}\left( \mathrm{dist}_{g}\left( \varphi_{\mathbf{Y}_{n}^{(t)}}\left( X_{i} \right), \varphi_{\mathbf{Y}_{n}^{(t)}}\left( X_{j} \right) \right) \right) = \mathbb{E}\left( \mathrm{dist}_{g}\left( \varphi_{\mathbf{Y}_{n}}\left( X_{i} \right), \varphi_{\mathbf{Y}_{n}}\left( X_{j} \right) \right) \right), \qquad t = 1, \ldots, T,$$
(46)

where \(\varphi_{\mathbf{Y}_{n}^{(t)}}\left( \cdot \right)\) is defined in Eq. (17). The strong law of large numbers [48] implies

$$T^{-1} \sum_{t=1}^{T} \mathrm{dist}_{g}\left( \varphi_{\mathbf{Y}_{n}^{(t)}}\left( X_{i} \right), \varphi_{\mathbf{Y}_{n}^{(t)}}\left( X_{j} \right) \right) \xrightarrow{\mathrm{a.s.}} \mathbb{E}\left( \mathrm{dist}_{g}\left( H_{i}\left( \mathbf{Y}_{n} \right), H_{j}\left( \mathbf{Y}_{n} \right) \right) \right) \quad \left( T \to \infty \right).$$
(47)

Due to the continuity of \(\mathrm{dist}_{g}\left( \cdot, \cdot \right)\), we obtain

$$\mathrm{dist}_{g}\left( H_{i}^{\varepsilon}\left( \mathbf{Y}_{n}^{(t)} \right), H_{j}^{\varepsilon}\left( \mathbf{Y}_{n}^{(t)} \right) \right) \to \mathrm{dist}_{g}\left( \varphi_{\mathbf{Y}_{n}^{(t)}}\left( X_{i} \right), \varphi_{\mathbf{Y}_{n}^{(t)}}\left( X_{j} \right) \right) \quad \left( \varepsilon \to 0 \right), \qquad t = 1, \ldots, T.$$
(48)

Thus \(\mathrm{Diss}\left( \bar{H}_{i,n,T}^{\varepsilon}, \bar{H}_{j,n,T}^{\varepsilon} \right) \to T^{-1} \sum_{t=1}^{T} \mathrm{dist}_{g}\left( \varphi_{\mathbf{Y}_{n}^{(t)}}\left( X_{i} \right), \varphi_{\mathbf{Y}_{n}^{(t)}}\left( X_{j} \right) \right)\) as \(\varepsilon \to 0\), from which the conclusion follows. □
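
A compact sketch of the quantity analysed in Proposition 2 (ours, one-dimensional, with assumed names and constants; the paper's \(\mathrm{dist}_{g}\) of Eq. (6) may differ by a constant factor from the sphere-embedding formula used here): the dissimilarity is the average of the factor geodesic distances over T i.i.d. random partitions, and by Eq. (47) this average stabilises as T grows.

```python
import numpy as np

def factor_distance(counts_p, counts_q, eps=1e-6):
    """Fisher-Rao geodesic distance between two binned distributions via the
    square-root (sphere) embedding; eps stands in for the compactification."""
    p = (counts_p + eps) / (counts_p + eps).sum()
    q = (counts_q + eps) / (counts_q + eps).sum()
    return 2.0 * np.arccos(np.clip(np.sqrt(p * q).sum(), -1.0, 1.0))

def dissimilarity(x_i, x_j, n=32, T=20, rng=None):
    """Average of factor geodesic distances over T i.i.d. random partitions
    Y_n^(1), ..., Y_n^(T), i.e. the left-hand side of Eq. (47)."""
    rng = rng or np.random.default_rng(0)
    lo, hi = min(x_i.min(), x_j.min()), max(x_i.max(), x_j.max())
    dists = []
    for _ in range(T):
        # one random partition of [lo, hi]: n uniform cut points plus both ends
        edges = np.sort(np.concatenate([[lo, hi], rng.uniform(lo, hi, n)]))
        ci, _ = np.histogram(x_i, bins=edges)
        cj, _ = np.histogram(x_j, bins=edges)
        dists.append(factor_distance(ci, cj))
    return float(np.mean(dists))

rng = np.random.default_rng(2)
x_i, x_j = rng.normal(0.0, 1.0, 1500), rng.normal(0.3, 1.0, 1500)
for T in (5, 50, 500):
    print(f"T={T:4d}  Diss ~ {dissimilarity(x_i, x_j, T=T, rng=rng):.4f}")
```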

About this article

Cite this article

Zhang, Y., Liu, C. & Zou, J. Histogram-based embedding for learning on statistical manifolds. Pattern Anal Applic 19, 21–40 (2016). https://doi.org/10.1007/s10044-014-0379-5
