Abstract
A novel binning and learning framework is presented for analyzing and applying large data sets for which no explicit distribution parameterization is available, and which can only be assumed to be generated by underlying probability density functions (PDFs) lying on a nonparametric statistical manifold. For model discretization, a uniform sampling-based partition of the data space is used to bin flat-distributed data sets, while quantile-based binning is adopted for data sets with complex distributions, reducing the average number of under-smoothed bins in the histograms. A compactified histogram embedding is designed so that the Fisher–Riemannian structured multinomial manifold is compatible with the intrinsic geometry of the nonparametric statistical manifold, providing a computationally efficient model space for calculating information distances between binned distributions. In particular, instead of searching for an optimal bin number, we apply multiple random partitions of the data space to embed the associated data sets onto a product multinomial manifold, integrating the complementary bin information through an information metric built from factor geodesic distances and further alleviating the over-smoothing problem. Using the metric equipped on the embedded submanifold, we extend classical manifold learning and dimension estimation algorithms to metric-adaptive versions that facilitate lower-dimensional Euclidean embedding. The effectiveness of our method is verified by visualization of data sets drawn from known manifolds, visualization and recognition on a subset of the ALOI object database, and Gabor feature-based face recognition on the FERET database.
Notes
ALOI: Amsterdam Library of Object Images, available from http://staff.science.uva.nl/~aloi/.
FERET face database, available from http://www.nist.gov/humanid/feret.
References
Carter KM, Reich R, Finn WG, Hero AO (2009) FINE: Fisher information non-parametric embedding. IEEE Trans Pattern Anal Mach Intell 31(11):2093–2098
Zhang Z, Chow TWS, Zhao MB (2013) Trace ratio optimization-based semi-supervised nonlinear dimensionality reduction for marginal manifold visualization. IEEE Trans Knowl Data Eng 25(5):1148–1161
Lebanon G (2005) Information geometry, the embedding principle, and document classification. In: 2nd International Symposium on Information Geometry and its Applications, 1–8
Donoho D (2000) High-dimensional data analysis: The curses and blessings of dimensionality, Aide-Memoire of a Lecture at AMS conference on Math Challenges of 21st Century. http://www-stat.stanford.edu/~donoho/Lectures/AMS2000/AMS2000.html
Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Proc. Int’l Conf. Database Theory, pp 217–235
Fu Y, Li Z, Huang TS, Katsaggelos AK (2008) Locally adaptive subspace and similarity metric learning for visual data clustering and retrieval. Comput Vis Image Underst 110(3):390–402
Van Der Maaten LJP, Postma EO, Van Den Herik HJ (2009) Dimensionality reduction: A comparative review. TiCC TR 2009-005
Balasubramanian M, Schwartz EL (2002) The Isomap algorithm and topological stability. Science 295:7
Lafon S, Lee AB (2006) Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization. IEEE Trans Pattern Anal Mach Intell 28(9):1393–1403
Van der Maaten L (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Morrison A, Ross G, Chalmers M (2003) Fast multidimensional scaling through sampling, springs and interpolation. Inf Vis 2(1):68–77
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326
Belkin M, Niyogi P (2002) Laplacian Eigenmaps and spectral techniques for embedding and clustering. Neural Inf Process Systems 14:585–591
Donoho DL, Grimes C (2005) Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci 102(21):7426–7431
Orsenigo C, Vercellis C (2013) A comparative study of nonlinear manifold learning methods for cancer microarray data classification. Expert Syst Appl 40(6):2189–2197
Xie B, Mu Y, Tao DC, Huang KZ (2011) m-SNE: multiview stochastic neighbor embedding. IEEE Trans Systems Man Cybern Part B 41(4):1088–1096
Lebanon G (2005) Riemannian geometry and statistical machine learning. PhD thesis, Carnegie Mellon University
Lee S-M, Abbott AL, Araman PA (2007) Dimensionality reduction and clustering on statistical manifolds. In: Proceedings of IEEE International Conference on CVPR, pp 1–7
Nielsen F (2013) Pattern learning and recognition on statistical manifolds: an information-geometric review. Lect Notes Comput Sci 7953:1–25
Zou J, Liu CC, Zhang Y, Lu GF (2013) Object recognition using Gabor co-occurrence similarity. Pattern Recogn 46(1):434–448
Zhang Y, Liu CC (2013) Gabor feature-based face recognition on product gamma manifold via region weighting. Neurocomputing 117(6):1–11
Amari S, Nagaoka H (2000) Methods of information geometry. AMS and Oxford U. Press, USA
Mio W, Badlyans D, Liu XW (2005) A computational approach to Fisher information geometry with applications to image analysis. Lect Notes Comput Sci 3757:18–33
Zhang J, Hästö P (2006) Statistical manifold as an affine space: a functional equation approach. J Math Psychol 50(1):60–65
Brunelli R, Mich O (2001) Histograms analysis for image retrieval. Pattern Recogn 34(8):1625–1637
Dias R (2011) Nonparametric estimation: smoothing and visualization. http://www.ime.unicamp.br/~dias/SDV.pdf
Elgammal A, Duraiswami R, Davis LS (2003) Efficient kernel density estimation using the fast gauss transform with applications to color modeling and tracking. IEEE Trans Pattern Anal Mach Intell 25:1499–1504
He K, Meeden G (1997) Selecting the number of bins in a histogram: a decision theoretic approach. J Stat Plann Inference 61(1):49–59
Leow WK, Li R (2004) The analysis and applications of adaptive-binning color histograms. Comput Vis Image Underst 94(1–3):67–91
Shimazaki H, Shinomoto S (2007) A method for selecting the bin size of a time histogram. Neural Comput 19(6):1503–1527
Čencov NN (1982) Statistical decision rules and optimal inference. American Mathematical Society
Young RA, Lesperance RM (2001) The Gaussian Derivative model for spatial-temporal vision: II. Cortical data. Spat Vis 14(3,4):321–389
Mukhopadhyay ND, Chatterjee S (2011) High dimensional data analysis using multivariate generalized spatial quantiles. J Multivar Anal 102:768–780
Liu WF, Tao DC (2013) Multiview Hessian regularization for image annotation. IEEE Trans Image Process 22(7):2676–2687
Liu WF, Tao DC (2014) Multiview Hessian discriminative sparse coding for image annotation. Comput Vis Image Underst 118(1):50–60
Jost J (2002) Riemannian geometry and geometric analysis. Springer, Berlin
Srivastava A, Jermyn IH, Joshi S (2007) Riemannian analysis of probability density functions with applications. In: Proceedings of IEEE CVPR’07, pp 1–8
Kamiński M, Zygierewicz J, Kuś R, Crone N (2005) Analysis of multichannel biomedical data. Acta Neurobiol Exp (Wars) 65:443–452
Skopenkov A (2001) Embedding and knotting of manifolds in Euclidean spaces. In: Young N, Choi Y (eds) Surveys in contemporary mathematics. London Math Soc Lect Notes 347:248–342
Carter KM, Hero AO, Raich R (2007) De-biasing for intrinsic dimension estimation. In: Proceedings of IEEE Statistical Signal Processing Workshop, pp 601–605
Levina E, Bickel PJ (2005) Maximum likelihood estimation of intrinsic dimension. Neural Inf Process Systems 17:777–784
Nguyen GH, Bouzerdoum A, Phung SL (2009) Learning pattern classification tasks with imbalanced data sets. Pattern Recognition, IN-TECH Publishing, 193–208
Barbehenn M, Munchen MG (1998) A note on the complexity of Dijkstra’s algorithm for graphs with weighted vertices. IEEE Trans Comput 47:263
Juang CF, Sun WK, Chen GC (2009) Object detection by color histogram-based fuzzy classifier with support vector learning. Neurocomputing 72:2464–2476
Mika S, Rätsch G, Weston J, Schölkopf B, Müller KR (1999) Fisher discriminant analysis with kernels. In: Proceedings of IEEE Workshop on Neural Networks for Signal Processing, pp 41–48
Van der Maaten LJP (2007) An introduction to dimensionality reduction using Matlab. Report MICC 07-07, Maastricht University
Shen L, Bai L, Fairhurst M (2007) Gabor wavelets and generalized discriminant analysis for face identification and verification. Image Vis Comput 25(5):553–563
Durrett R (1996) Probability: theory and examples, 2nd edn. International Thomson Publishing Company, New York
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant No. 71171003) and the Anhui Natural Science Foundation (Grant No. KJ2011B022).
Appendices
Appendix A
Proof of Lemma 1. For any s, we need to prove
where \(\left\| {\varOmega_{{k_{s} }}^{s} } \right\| = \left| {Y_{{\left( {k_{s} + 1} \right)}}^{s} - Y_{{\left( {k_{s} } \right)}}^{s} } \right|\), and \(0 = Y_{(0)}^{s} , \ldots ,Y_{(k)}^{s} ,Y_{(k + 1)}^{s} , \ldots ,Y_{(n)}^{s} ,Y_{(n + 1)}^{s} = 1\) are the order statistics of \(0 = Y_{0}^{s} ,Y_{1}^{s} , \ldots ,Y_{n}^{s} ,Y_{n + 1}^{s} = 1.\) Since the density function of the k-th order statistic \(Y_{(k)}^{s}\) is \(f_{{Y_{(k)}^{s} }} \left( y \right) = \frac{{\varGamma \left( {n + 1} \right)}}{{\varGamma \left( k \right)\varGamma \left( {n - k + 1} \right)}}y^{k - 1} \left( {1 - y} \right)^{n - k} ,\;0 \le y \le 1,\)
one can obtain \({\mathbb{E}}\left( {Y_{(k)}^{s} } \right) = \frac{{\varGamma \left( {n + 1} \right)\varGamma \left( {k + 1} \right)}}{{\varGamma \left( k \right)\varGamma \left( {n + 2} \right)}} = \frac{k}{n + 1},\)
where \(1 \le k \le n,\) and \(\varGamma ( \cdot )\) is the Gamma function. So \({\mathbb{E}}\left| {Y_{(k)}^{s} - Y_{(k - 1)}^{s} } \right| = {\mathbb{E}}\left( {Y_{(k)}^{s} } \right) - {\mathbb{E}}\left( {Y_{(k - 1)}^{s} } \right) = \frac{1}{n + 1}\), and \(\mathop {\sup }\limits_{1 < k \le n} {\mathbb{E}}\left| {Y_{(k)}^{s} - Y_{(k - 1)}^{s} } \right| \to 0,\;\;\left( {n \to \infty } \right),\) that is,
In addition, we can obtain
and
These results show that for any fixed s,
namely,
This completes the proof. □
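The two identities used above, namely \(Y_{(k)}^{s} \sim\) Beta\((k, n - k + 1)\) with \({\mathbb{E}}(Y_{(k)}^{s}) = k/(n+1)\) and an expected gap of \(1/(n+1)\) between consecutive order statistics, can be checked numerically. A minimal Monte Carlo sketch (not part of the proof; the sample sizes are arbitrary):

```python
import numpy as np

# Numerical check: for n i.i.d. Uniform(0,1) variables, the k-th order
# statistic Y_(k) ~ Beta(k, n-k+1), so E[Y_(k)] = k/(n+1) and the
# expected gap between consecutive order statistics is 1/(n+1),
# which is what drives sup_k E|Y_(k) - Y_(k-1)| -> 0 as n grows.
rng = np.random.default_rng(42)
n, trials, k = 9, 200_000, 3
Y = np.sort(rng.random((trials, n)), axis=1)
mean_k = Y[:, k - 1].mean()                     # estimate of E[Y_(k)]
mean_gap = (Y[:, k - 1] - Y[:, k - 2]).mean()   # estimate of E[Y_(k) - Y_(k-1)]
print(mean_k, k / (n + 1))      # both close to 0.3
print(mean_gap, 1 / (n + 1))    # both close to 0.1
```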
Appendix B
Proof of Lemma 2. For any \(x \in {\mathcal{X}}\), let \(C\left( {x,\varvec{Y}_{n} } \right) \subset \aleph \left( {\varvec{Y}_{n} } \right)\) be the minimum covering of \(\left[ {{\mathbf{0}},x} \right]\), and \(\delta_{ + } \left( {x,\varvec{Y}_{n} } \right)\) be the relative complement of \(\left[ {{\mathbf{0}},x} \right]\) in \(C\left( {x,\varvec{Y}_{n} } \right)\), that is,
By the definitions of \(F_{{H_{i} \left( {\varvec{Y}_{n} } \right)}} \left( x \right)\) and \(F_{{X_{i} }} \left( x \right)\), we can obtain
and then
Because \(x_{i1} , \ldots ,x_{{in_{i} }}\) are i.i.d., the expectation of the right-hand side of the above equation can be expressed as
Applying Fubini's theorem [48], we obtain
Since \(p_{i} \left( x \right)\) is continuous on \({\mathcal{X}}\), there exists \(M_{i} \in \left( {0,1} \right]\) such that \(0 \le p_{i} \left( x \right) \le M_{i}\) for all \(x\), and then
where \({\mathbb{E}}_{i}\), \({\mathbb{E}}_{{\varvec{Y}_{n} }}\) and \({\mathbb{E}}\) are the expectation operators corresponding to the distributions determined by the PDFs \(p_{i} \left( x \right)\), \(p_{{\varvec{Y}_{n} }} \left( {\varvec{y}_{n} } \right) = 1\) and \(p_{{\varvec{x}_{i1} ,\varvec{Y}_{n} }} \left( {x,\varvec{y}_{n} } \right)\), respectively, and \(\text{Vol}\left( \cdot \right)\) denotes the volume of a region of \({\mathcal{X}}\). For any \(x \in {\mathcal{X}}\) and \(\varvec{Y}_{n} \in U_{n} \left( {\mathcal{X}} \right)\), using the above definition, we can find
So \(\text{Vol}\left( {\delta_{ + } \left( {x,\varvec{Y}_{n} } \right)} \right) \le \left[ {\Delta \left( {\varvec{Y}_{n} } \right)} \right]^{d}\) (a.e.). According to Lemma 1, we can obtain
The conclusion follows. □
Appendix C
Proof of Proposition 1. According to the Glivenko–Cantelli theorem [48], one can get
which implies that \(F_{{X_{i} }} \left( x \right)\) converges to \(F_{i} \left( x \right)\) in probability as \(n_{i} \to \infty\). Lemma 2 shows that \(F_{{H_{i} \left( {\varvec{Y}_{n} } \right)}} \left( x \right)\) converges to \(F_{{X_{i} }} \left( x \right)\) in probability as \(n \to \infty\). So the conclusion holds. □
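The Glivenko–Cantelli convergence invoked above can be illustrated numerically. The sketch below (illustrative only, for Uniform(0,1), where \(F(x) = x\)) computes the sup-distance between the empirical CDF and the true CDF for a small and a large sample:

```python
import numpy as np

# Glivenko-Cantelli illustration: sup_x |F_n(x) - F(x)| shrinks as the
# sample size grows; here F is the Uniform(0,1) CDF, F(x) = x.
rng = np.random.default_rng(7)

def sup_ecdf_error(n):
    x = np.sort(rng.random(n))
    i = np.arange(1, n + 1)
    # At each order statistic the ECDF jumps from (i-1)/n to i/n, so the
    # supremum is attained at one of these jump points.
    return max(np.max(i / n - x), np.max(x - (i - 1) / n))

small, large = sup_ecdf_error(100), sup_ecdf_error(100_000)
print(small, large)  # sup error shrinks as n grows
```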
Appendix D
Proof of Proposition 2. We first take \(\varvec{y} = \left( {y_{1} , \ldots ,y_{n} } \right)\) as \(n\) real variables in \((\mathop {\mathcal{X}}\limits^{o} )^{n} = \left( {{\mathbf{0}},{\mathbf{1}}} \right)^{n}\). Based on the previous definitions, for a data set \(X_{i}\) of X, \(n_{{k_{1} , \ldots ,k_{d} }}^{i}\) is continuous almost everywhere (a.e.) w.r.t. \(\varvec{y}\) for fixed \(k_{1} , \ldots ,k_{d}\). Hence \(H_{i} \left( \varvec{y} \right)\) defined by Eq. (16) is also a.e. continuous in \((\mathop {\mathcal{X}}\limits^{o} )^{n}\) w.r.t. \(\varvec{y}\) for all \(k_{1} , \ldots ,k_{d}\). Since the function \(\text{dist}_{g} \left( { \cdot , \cdot } \right)\) described by Eq. (6) is continuous on \({\bar{\mathcal{P}}}_{m} \times {\bar{\mathcal{P}}}_{m}\), \(\text{dist}_{g} \left( {H_{i} \left( \varvec{y} \right),H_{j} \left( \varvec{y} \right)} \right)\) is a.e. continuous w.r.t. \(\varvec{y}\) when another data set \(X_{j}\) of X is also considered. Since \(\varvec{Y}_{n}^{(1)} , \ldots ,\varvec{Y}_{n}^{(T)}\) are i.i.d., \(\text{dist}_{g} \left( {\varphi_{{\varvec{Y}_{n}^{(1)} }} \left( {X_{i} } \right),\varphi_{{\varvec{Y}_{n}^{(1)} }} \left( {X_{j} } \right)} \right), \ldots ,\text{dist}_{g} \left( {\varphi_{{\varvec{Y}_{n}^{(T)} }} \left( {X_{i} } \right),\varphi_{{\varvec{Y}_{n}^{(T)} }} \left( {X_{j} } \right)} \right)\) are i.i.d. almost surely; in addition, we can obtain
where \(\varphi_{{\varvec{Y}_{n}^{(t)} }} \left( \cdot \right)\) is defined in Eq. (17). The strong law of large numbers [48] implies
Due to the continuity of \(\text{dist}_{g} \left( { \cdot , \cdot } \right)\), we can obtain
Thus, we can get \(\text{Diss}\left( {\bar{H}_{i,n,T}^{\varepsilon } ,\bar{H}_{j,n,T}^{\varepsilon } } \right) \to T^{ - 1} \cdot \sum\nolimits_{t = 1}^{T} {\text{dist}_{g} \left( {\varphi_{{\varvec{Y}_{n}^{(t)} }} \left( {X_{i} } \right),\varphi_{{\varvec{Y}_{n}^{(t)} }} \left( {X_{j} } \right)} \right)}\) as \(\varepsilon \to 0\), and the conclusion follows. □
Cite this article
Zhang, Y., Liu, C. & Zou, J. Histogram-based embedding for learning on statistical manifolds. Pattern Anal Applic 19, 21–40 (2016). https://doi.org/10.1007/s10044-014-0379-5