
A Fast Algorithm to Estimate the Square Root of Probability Density Function

  • Conference paper
Research and Development in Intelligent Systems XXXIII (SGAI 2016)


A fast maximum likelihood estimator based on a linear combination of Gaussian kernels is introduced to represent the square root of a probability density function. It is shown that, if the kernel centres and kernel width are known, the underlying problem can be formulated as a Riemannian optimization problem. The first-order Riemannian geometry of the sphere manifold and vector transport are explored, and the well-known Riemannian conjugate gradient algorithm is then used to estimate the model parameters. For completeness, the k-means clustering algorithm and a grid search are applied to determine the kernel centres and the kernel width, respectively. Illustrative examples demonstrate that the proposed approach is effective in constructing an estimate of the square root of a probability density function.
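The abstract describes the estimator only in outline. Below is a minimal Python/NumPy sketch of the optimisation step. It rests on several assumptions that are not taken from the paper. First, the model is assumed to be \(\sqrt{p(\varvec{x})}=\sum _{i}\beta _i K_{\sigma }(\varvec{x},\varvec{c}_i)\), so the unit-integral condition becomes \(\varvec{\beta }^\mathrm{T}\varvec{Q}\varvec{\beta }=1\) with \(q_{i,j}\) as derived in Appendix A. Second, the change of variable \(\varvec{w}=\varvec{L}^\mathrm{T}\varvec{\beta }\), with the Cholesky factorisation \(\varvec{Q}=\varvec{L}\varvec{L}^\mathrm{T}\), is assumed to map this constraint onto the unit sphere.

On the sphere, the Riemannian gradient is taken as the Euclidean gradient projected onto the tangent space. The retraction renormalises, and vector transport is again a projection. The conjugate direction uses a Polak-Ribière+ coefficient with Armijo backtracking. The names `gauss_kernel` and `fit_sqrt_density`, the jitter, the PR+ rule and the stopping tolerances are hypothetical choices. The centres and the width are fixed by hand here rather than chosen by k-means and grid search.

```python
import numpy as np


def gauss_kernel(X, C, sigma):
    """Kernel matrix K[n, i] = (2 pi sigma^2)^(-m/2) exp(-||x_n - c_i||^2 / (2 sigma^2))."""
    m = X.shape[1]
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2) ** (m / 2)


def fit_sqrt_density(X, C, sigma, iters=500, tol=1e-9):
    """Maximise sum_n log (k_n^T beta)^2 s.t. beta^T Q beta = 1 by Riemannian CG on the unit sphere."""
    M = len(C)
    K = gauss_kernel(X, C, sigma)                   # (N, M) kernel design matrix
    Q = gauss_kernel(C, C, np.sqrt(2) * sigma)      # q_ij = K_{sqrt(2) sigma}(c_i, c_j), Appendix A
    L = np.linalg.cholesky(Q + 1e-10 * np.eye(M))   # Q ~ L L^T (tiny jitter for stability)
    A = np.linalg.solve(L, K.T).T                   # A = K L^{-T}, so K beta = A w with w = L^T beta

    def cost(w):                                    # negative log-likelihood of p(x_n) = (A w)_n^2
        return -np.sum(np.log((A @ w) ** 2 + 1e-300))

    def rgrad(w):                                   # Euclidean gradient projected onto T_w S^{M-1}
        g = -2.0 * A.T @ (1.0 / (A @ w))
        return g - (w @ g) * w

    def retract(w, v):                              # move in the tangent space, then renormalise
        u = w + v
        return u / np.linalg.norm(u)

    def transport(w, v):                            # vector transport: project v onto T_w S^{M-1}
        return v - (w @ v) * w

    w = L.T @ np.ones(M)                            # start from equal positive weights beta
    w /= np.linalg.norm(w)
    f, g = cost(w), rgrad(w)
    d = -g
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        slope = g @ d
        if slope >= 0:                              # not a descent direction: restart
            d, slope = -g, -(g @ g)
        t = 1.0
        while t > 1e-12:                            # Armijo backtracking along the retraction
            w_new = retract(w, t * d)
            f_new = cost(w_new)
            if f_new <= f + 1e-4 * t * slope:
                break
            t *= 0.5
        else:
            break                                   # line search failed: keep current w
        g_new = rgrad(w_new)
        beta_pr = max(0.0, g_new @ (g_new - transport(w_new, g)) / (g @ g))  # Polak-Ribiere+
        d = -g_new + beta_pr * transport(w_new, d)
        done = abs(f - f_new) < tol * max(1.0, abs(f))
        w, f, g = w_new, f_new, g_new
        if done:
            break
    return np.linalg.solve(L.T, w)                  # map back: beta = L^{-T} w


# Illustrative usage on synthetic 1-D data (all values hypothetical).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(1.5, 0.8, 300)])[:, None]
C = np.linspace(-4.0, 4.0, 9)[:, None]              # stand-in for k-means centres
sigma = 0.7                                         # stand-in for a grid-searched width
beta = fit_sqrt_density(X, C, sigma)
sqrt_p = lambda x: gauss_kernel(x, C, sigma) @ beta  # estimate of sqrt(p(x)); p = sqrt_p(x) ** 2
```

In this parameterisation the constraint holds exactly at every iterate, apart from the small Cholesky jitter. The squared model \(p(\varvec{x})\) therefore integrates to one without any post-hoc renormalisation.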





Author information



Corresponding author

Correspondence to Xia Hong.


Appendix A


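To evaluate \(q_{i,j}=\int K_{\sigma }\big (\varvec{x},\varvec{c}_i \big ) K_{\sigma }\big (\varvec{x},\varvec{c}_j \big ) d\varvec{x}\), where \(K_{\sigma }(\varvec{x},\varvec{c})=(2\pi \sigma ^2)^{-m/2}\exp \big (-\Vert \varvec{x}-\varvec{c}\Vert ^2/(2\sigma ^2)\big )\) is the Gaussian kernel of width \(\sigma \), write \(\varvec{x}=[x_1,\ldots ,x_m]^\mathrm{T}\) and \(\varvec{c}_i=[c_{i,1},\ldots ,c_{i,m}]^\mathrm{T}\). Then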

$$\begin{aligned}&q_{i,j}=\frac{1}{ (2 \pi \sigma ^2)^{m} } \int ... \int \exp \left( -\frac{\Vert \varvec{x}- \varvec{c}_i \Vert ^2 }{2\sigma ^2} -\frac{\Vert \varvec{x}- \varvec{c}_j \Vert ^2 }{2\sigma ^2} \right) \nonumber \\&\ \ \ dx_1...dx_m\nonumber \\&=\frac{1}{ (2 \pi \sigma ^2)^{m} } \prod _{l=1}^{m} \int \exp \Big ( -\frac{(x_l-c_{i,l})^2}{2\sigma ^2} - \frac{(x_l- c_{j,l})^2}{2\sigma ^2} \Big )dx_l \end{aligned}$$

in which

$$\begin{aligned}&\int \exp \Big ( -\frac{(x_l-c_{i,l})^2}{2\sigma ^2} - \frac{(x_l-c_{j,l})^2}{2\sigma ^2} \Big )dx_l \nonumber \\&=\int \exp \Big ( -\frac{ x_l^2- (c_{i,l} +c_{j,l} )x_l +(c_{i,l}^2 +c_{j,l}^2)/2 }{ \sigma ^2 } \Big )dx_l\nonumber \\&=\exp \Big (-\frac{ \frac{ c_{j,l}^2 + c_{i,l}^2 }{2 }-\big (\frac{ c_{i,l} +c_{j,l} }{2} \big )^2 }{ \sigma ^2 } \Big ) \nonumber \\&\ \ \ \times \int \exp \Big ( -\frac{ \big [x_l-(c_{i,l} +c_{j,l} )/2 \big ]^2}{ \sigma ^2 } \Big )dx_l \end{aligned}$$

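Since a Gaussian density integrates to one, \(\int \frac{1}{\sqrt{2\pi s^2}} \exp \Big (-\frac{(x_l-c)^2}{2s^2} \Big )dx_l =1\) for any \(c\) and \(s>0\). Taking \(s^2=\sigma ^2/2\) and \(c=(c_{i,l}+c_{j,l})/2\) shows that the remaining integral equals \(\sqrt{\pi \sigma ^2}\). Hence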

$$\begin{aligned}&\ \ \int \exp \Big ( -\frac{(x_l-c_{i,l})^2}{2\sigma ^2}- \frac{(x_l-c_{j,l})^2}{2\sigma ^2} \Big )dx_l\nonumber \\&=\sqrt{ \pi \sigma ^2 }\exp \Big (-\frac{ \frac{ c_{j,l}^2 + c_{i,l}^2 }{2 }-\big (\frac{ c_{i,l} +c_{j,l}}{2} \big )^2 }{ \sigma ^2 } \Big )\nonumber \\&=\sqrt{ \pi \sigma ^2 }\exp \Big (-\frac{ (c_{j,l} - c_{i,l})^2 }{4 \sigma ^2 } \Big ) \end{aligned}$$


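Substituting the one-dimensional result into the product over \(l=1,\ldots ,m\) in the expression for \(q_{i,j}\) gives

$$\begin{aligned} q_{i,j}= \frac{ (\pi \sigma ^2)^{m/2}}{ (2 \pi \sigma ^2)^{m} } \exp \left( -\frac{\Vert \varvec{c}_i- \varvec{c}_j \Vert ^2 }{4\sigma ^2} \right) = \frac{ 1}{ (4 \pi \sigma ^2)^{m/2} } \exp \left( -\frac{\Vert \varvec{c}_i- \varvec{c}_j \Vert ^2 }{4\sigma ^2} \right) =K_{\sqrt{2}\sigma }\big (\varvec{c}_i,\varvec{c}_j\big ) \end{aligned}$$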
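Every step of this derivation is elementary, so the closed form is easy to check numerically. The following Python/NumPy sketch approximates \(q_{i,j}\) for \(m=2\) by a Riemann sum on a fine grid. It then compares the result with \(K_{\sqrt{2}\sigma }(\varvec{c}_i,\varvec{c}_j)\), using the Gaussian kernel normalised by \((2\pi \sigma ^2)^{-m/2}\). The helper names `gauss_kernel` and `q_numeric`, the test centres, \(\sigma \) and the grid settings are arbitrary illustrative choices.

```python
import numpy as np


def gauss_kernel(x, c, sigma):
    """Isotropic Gaussian kernel K_sigma(x, c) = (2 pi sigma^2)^(-m/2) exp(-||x - c||^2 / (2 sigma^2))."""
    m = c.shape[-1]
    d2 = np.sum((x - c) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2) ** (m / 2)


def q_numeric(ci, cj, sigma, half_width=10.0, n=801):
    """Approximate q_ij = int K_sigma(x, c_i) K_sigma(x, c_j) dx for m = 2 by a Riemann sum on a square grid."""
    t = np.linspace(-half_width, half_width, n)
    x1, x2 = np.meshgrid(t, t, indexing="ij")
    pts = np.stack([x1, x2], axis=-1)                # (n, n, 2) grid of points x
    area = (t[1] - t[0]) ** 2                        # area element dx_1 dx_2
    return np.sum(gauss_kernel(pts, ci, sigma) * gauss_kernel(pts, cj, sigma)) * area


ci, cj, sigma = np.array([0.5, -1.0]), np.array([-0.2, 0.7]), 0.8
numeric = q_numeric(ci, cj, sigma)
closed = gauss_kernel(ci, cj, np.sqrt(2) * sigma)    # K_{sqrt(2) sigma}(c_i, c_j)
print(f"numeric = {numeric:.10f}, closed form = {closed:.10f}")
```

Because the integrand factorises over coordinates, agreement in two dimensions also exercises the one-dimensional identity used above. Changing `ci`, `cj` or `sigma` is a quick way to probe other settings.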


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Hong, X., Gao, J. (2016). A Fast Algorithm to Estimate the Square Root of Probability Density Function. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXIII. SGAI 2016. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47174-7

  • Online ISBN: 978-3-319-47175-4

  • eBook Packages: Computer Science, Computer Science (R0)
