Pareto Density Estimation: A Density Estimation for Knowledge Discovery
Pareto Density Estimation (PDE), as defined in this work, is a method for estimating probability density functions using hyperspheres. The radius of the hyperspheres is derived by optimizing information while minimizing set size. It is shown that PDE is a very good estimate for data containing clusters of Gaussian structure. The behavior of the method is demonstrated with respect to cluster overlap, number of clusters, different variances in different clusters, and application to high-dimensional data. For high-dimensional data, PDE is found to be appropriate for the purpose of cluster analysis. The method is tested successfully on a difficult high-dimensional real-world problem: stock picking in falling markets.
Keywords: Density Estimation, High Dimensional Data, True Density, Variable Kernel, Probability Density Estimation
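As a rough illustration of density estimation with hyperspheres, the following minimal Python sketch counts, for each data point, how many points fall inside a hypersphere of fixed radius centered on it. The abstract does not state how the radius is computed, so the sketch simply assumes it is taken as a low percentile of the pairwise distances. The function names `hypersphere_counts` and `radius_from_percentile` and the default percentile are hypothetical choices for this example, not the paper's definition of the Pareto radius.

```python
import numpy as np


def pairwise_distances(data):
    """Euclidean distance matrix between all rows of `data`."""
    data = np.asarray(data, dtype=float)
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def radius_from_percentile(data, percentile=20.0):
    """Assumed radius rule: a percentile of the distinct pairwise distances.

    The percentile value is an illustrative parameter, not taken from the paper.
    """
    dist = pairwise_distances(data)
    upper = np.triu_indices(dist.shape[0], k=1)  # each pair counted once
    return float(np.percentile(dist[upper], percentile))


def hypersphere_counts(data, radius):
    """Number of points inside the hypersphere around each point.

    The count includes the center point itself. Larger counts indicate
    denser regions; the counts are unnormalized density estimates.
    """
    dist = pairwise_distances(data)
    return (dist <= radius).sum(axis=1)


if __name__ == "__main__":
    # Three nearby points and one isolated point.
    points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
    print(hypersphere_counts(points, radius=0.5))
```

In this toy data set the three nearby points each receive a higher count than the isolated point, which is the property a cluster analysis would exploit. The full distance matrix needs memory quadratic in the number of points, so the sketch is only suitable for small samples.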