Pareto Density Estimation: A Density Estimation for Knowledge Discovery

  • Alfred Ultsch
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Pareto Density Estimation (PDE), as defined in this work, is a method for estimating probability density functions using hyperspheres. The radius of the hyperspheres is derived by optimizing information while minimizing set size. It is shown that PDE is a very good estimate for data containing clusters of Gaussian structure. The behavior of the method is demonstrated with respect to cluster overlap, number of clusters, different variances in different clusters, and application to high-dimensional data. For high-dimensional data, PDE is found to be appropriate for the purpose of cluster analysis. The method is tested successfully on a difficult high-dimensional real-world problem: stock picking in falling markets.
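
The general idea of estimating density with hyperspheres can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names are hypothetical, and choosing the radius as a fixed percentile of the pairwise distances is an assumption made only for illustration, since the abstract does not state how the information-optimal radius is computed.

```python
import numpy as np


def pairwise_distances(data):
    """Euclidean distance matrix for an (n_points, n_dims) array."""
    diffs = data[:, None, :] - data[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def percentile_radius(data, pct=20.0):
    """Hypothetical radius rule: a percentile of all pairwise distances.

    The 20th percentile is an illustrative assumption, not a value
    taken from the abstract.
    """
    dists = pairwise_distances(data)
    upper = np.triu_indices(len(data), k=1)  # each pair counted once
    return float(np.percentile(dists[upper], pct))


def hypersphere_density(data, radius):
    """Fraction of all points lying inside the hypersphere of the given
    radius centred on each data point (an unnormalised density value)."""
    dists = pairwise_distances(data)
    counts = (dists <= radius).sum(axis=1)  # includes the centre point
    return counts / len(data)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two Gaussian clusters in 5 dimensions (synthetic data).
    cluster_a = rng.normal(loc=0.0, scale=1.0, size=(100, 5))
    cluster_b = rng.normal(loc=6.0, scale=1.0, size=(100, 5))
    points = np.vstack([cluster_a, cluster_b])

    r = percentile_radius(points)
    density = hypersphere_density(points, r)
    print("radius:", r)
    print("mean density value:", density.mean())
```

Points in the dense core of a cluster receive larger values than points on the fringe, which is the property a density-based cluster analysis relies on. The full distance matrix needs memory quadratic in the number of points, so this sketch only suits small data sets.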


Keywords (added by machine, not by the author): Density Estimation; High Dimensional Data; True Density; Variable Kernel; Probability Density Estimation





Copyright information

© Springer-Verlag Berlin · Heidelberg 2005

Authors and Affiliations

  • Alfred Ultsch (1)
  1. Databionics Research Group, University of Marburg, Marburg, Germany
