Online estimation of discrete, continuous, and conditional joint densities using classifier chains

Geilke, Michael; Karwath, Andreas; Frank, Eibe; Kramer, Stefan

doi:10.1007/s10618-017-0546-6

Published: 25 November 2017

Volume 32, pages 561–603, (2018)

Data Mining and Knowledge Discovery

Michael Geilke¹,
Andreas Karwath¹,
Eibe Frank² &
…
Stefan Kramer¹

921 Accesses
3 Citations
1 Altmetric

Abstract

We address the problem of estimating discrete, continuous, and conditional joint densities online, i.e., for each update the algorithm is given only the current example and its current estimate. The proposed family of online density estimators, estimation of densities online (EDO), uses classifier chains to model dependencies among features, where each classifier in the chain estimates the probability of one particular feature. Because a single chain may not provide a reliable estimate, we also consider ensembles of classifier chains and ensembles of weighted classifier chains. For all density estimators, we provide consistency proofs and propose algorithms to perform certain inference tasks. We evaluate the estimators empirically in several experiments on datasets of up to several million instances. In the discrete case, we compare our estimators to density estimates computed by Bayesian structure learners. In the continuous case, we compare them to a state-of-the-art online density estimator. Our experiments demonstrate that, even though it is designed to work online, EDO delivers estimators whose accuracy is competitive with that of other density estimators (batch Bayesian structure learners on discrete datasets and the state-of-the-art online density estimator on continuous datasets). Besides achieving similar performance in these cases, EDO can also estimate densities with mixed types of variables, i.e., discrete and continuous random variables.
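
The chain idea in the abstract can be illustrated with the chain rule, f(x1, ..., xn) = f(x1) · f(x2 | x1) · ... · f(xn | x1, ..., xn-1), where each factor is handled by one estimator in the chain. The following is a minimal sketch under stated assumptions, not the authors' EDO implementation: it handles discrete features only, it replaces the per-feature classifiers with Laplace-smoothed count tables conditioned on all preceding features, and it averages several randomly ordered chains with uniform weights. The names CountChain, ChainEnsemble, alpha, and n_chains are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


class CountChain:
    """Online joint estimate over discrete features via the chain rule.

    Assumption: smoothed count tables stand in for the classifiers.
    """

    def __init__(self, order, domains, alpha=1.0):
        self.order = list(order)   # feature indices in chain order
        self.domains = domains     # domains[i] lists the values of feature i
        self.alpha = alpha         # Laplace smoothing constant (assumption)
        self.counts = [defaultdict(float) for _ in self.order]
        self.totals = [defaultdict(float) for _ in self.order]

    def _context(self, x, pos):
        # Values of all features that precede position pos in the chain.
        return tuple(x[f] for f in self.order[:pos])

    def update(self, x):
        """Update with one example; no past examples are stored."""
        for pos, feat in enumerate(self.order):
            ctx = self._context(x, pos)
            self.counts[pos][(ctx, x[feat])] += 1.0
            self.totals[pos][ctx] += 1.0

    def density(self, x):
        """Product of the smoothed conditional estimates along the chain."""
        p = 1.0
        for pos, feat in enumerate(self.order):
            ctx = self._context(x, pos)
            k = len(self.domains[feat])
            num = self.counts[pos].get((ctx, x[feat]), 0.0) + self.alpha
            den = self.totals[pos].get(ctx, 0.0) + self.alpha * k
            p *= num / den
        return p


class ChainEnsemble:
    """Uniform average over chains with random feature orders."""

    def __init__(self, domains, n_chains=3, seed=0):
        rng = random.Random(seed)
        self.chains = []
        for _ in range(n_chains):
            order = list(range(len(domains)))
            rng.shuffle(order)
            self.chains.append(CountChain(order, domains))

    def update(self, x):
        for chain in self.chains:
            chain.update(x)

    def density(self, x):
        return sum(c.density(x) for c in self.chains) / len(self.chains)


# Usage: feed a small stream one example at a time, then query the estimate.
domains = [[0, 1], [0, 1], [0, 1, 2]]
estimator = ChainEnsemble(domains, n_chains=4, seed=1)
for example in [(0, 0, 0)] * 20 + [(1, 0, 1)] * 5:
    estimator.update(example)
print(estimator.density((0, 0, 0)), estimator.density((1, 1, 2)))
```

Because every conditional table is normalized per context, each chain's estimate sums to one over the full domain, and so does the uniform average. A weighted ensemble, as mentioned in the abstract, would replace the uniform average with learned per-chain weights.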

Notes

Below we define the problem in a more general way to consider also drift and recurrent distributions, but we focus only on the most fundamental problem of estimating a single distribution from a stream in this paper.
https://github.com/geilke/mideo.
Please note that we also compared the online density estimator with a corresponding batch version. The results are available in Online Resource 1.
Please note that the problem of having too few examples for accurately estimating the CPTs could be less prominent when the CPTs are replaced by decision trees (Friedman and Goldszmidt 1996; Su and Zhang 2006).
Unfortunately, even after several emails, the authors of RS-Forest did not respond to our request to share their program.

References

Bauer E, Kohavi R (1999) An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach Learn 36(1–2):105–139
Bifet A, Holmes G, Pfahringer B, Kranen P, Kremer H, Jansen T, Seidl T (2010) MOA: massive online analysis, a framework for stream classification and clustering. J Mach Learn Res Proc Track 11:44–50
Blum A (1996) On-line algorithms in machine learning. In: Proceedings of the workshop on On-line Algorithms, Dagstuhl. Springer, pp 306–325
Buchwald F, Girschick T, Frank E, Kramer S (2010) Fast conditional density estimation for quantitative structure-activity relationships. In: Proceedings of the twenty-fourth AAAI conference on artificial intelligence, pp 1268–1273
Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games. Cambridge University Press, Cambridge
Chakraborty S (2008) Some applications of Dirac’s delta function in statistics for more than one random variable. Appl Appl Math Int J (AAM) 3(1):42–54
Cheng MY, Gasser T, Hall P (1999) Nonparametric density estimation under unimodality and monotonicity constraints. J Comput Graph Stat 8(1):1–21
Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York
Davies S, Moore AW (2002) Interpolating conditional density trees. In: Uncertainty in artificial intelligence, pp 119–127
Dembczynski K, Cheng W, Hüllermeier E (2010) Bayes optimal multilabel classification via probabilistic classifier chains. In: International conference on machine learning, pp 279–286
Dembczynski K, Waegeman W, Hüllermeier E (2012) An analysis of chaining in multi-label classification. In: Proceedings of the 20th European conference on artificial intelligence (ECAI 2012), pp 294–299
Dembczynski K, Kotlowski W, Waegeman W, Busa-Fekete R, Hüllermeier E (2016) Consistency of probabilistic classifier trees. In: Proceedings of the 2016 European conference on machine learning and knowledge discovery in databases (ECML PKDD 2016), pp 511–526
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Knowledge discovery and data mining, pp 71–80
Elgammal A, Duraiswami R, Davis LS (2003) Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking. IEEE Trans Pattern Anal Mach Intell 25:1499–1504
Frank E, Bouckaert RR (2009) Conditional density estimation with class probability estimators. In: Proceedings of first Asian conference on machine learning, pp 65–81
Frank E, Kramer S (2004) Ensembles of nested dichotomies for multi-class problems. In: Proceedings of the 21st international conference of machine learning, pp 305–312
Friedman N, Goldszmidt M (1996) Learning Bayesian networks with local structure. In: Proceedings of the twelfth annual conference on uncertainty in artificial intelligence (UAI ’96), pp 252–262
Gama J, Pinto C (2006) Discretization from data streams: applications to histograms and data mining. In: SAC, pp 662–667
Geilke M, Karwath A, Frank E, Kramer S (2013) Online estimation of discrete densities. In: Proceedings of the 13th IEEE international conference on data mining, pp 191–200
Geilke M, Karwath A, Kramer S (2014) A probabilistic condensed representation of data for stream mining. In: Proceedings of the 2014 international conference on data science and advanced analytics (DSAA 2014), IEEE, pp 297–303
Geilke M, Karwath A, Kramer S (2015) Modeling recurrent distributions in streams using possible worlds. In: Proceedings of the 2015 international conference on data science and advanced analytics (DSAA 2015), pp 1–9
Goldberger J, Roweis ST (2004) Hierarchical clustering of a mixture model. Adv Neural Inf Process Syst 17:505–512
Hall P, Presnell B (1999) Density estimation under constraints. J Comput Graph Stat 8(2):259–277
Holmes MP, Gray AG, Isbell CL Jr (2012) Fast nonparametric conditional density estimation. CoRR abs/1206.5278
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Knowledge discovery and data mining, pp 97–106
Hwang JN, Lay SR, Lippman A (1994) Nonparametric multivariate density estimation: a comparative study. IEEE Trans Signal Process 42(10):2795–2810
Kim J, Scott CD (2012) Robust kernel density estimation. J Mach Learn Res 13:2529–2565
Kristan M, Leonardis A (2010) Online discriminative kernel density estimation. In: International conference on pattern recognition, pp 581–584
Kristan M, Leonardis A, Skocaj D (2011) Multivariate online kernel density estimation with Gaussian kernels. Pattern Recogn 44(10–11):2630–2642
Kumar A, Vembu S, Menon AK, Elkan C (2013) Beam search algorithms for multilabel learning. Mach Learn 92(1):65–89
Lambert CG, Harrington SE, Harvey CR, Glodjo A (1999) Efficient on-line nonparametric kernel density estimation. Algorithmica 25(1):37–57
Littlestone N (1987) Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Mach Learn 2(4):285–318
Liu H, Lafferty JD, Wasserman LA (2007) Sparse nonparametric density estimation in high dimensions using the rodeo. In: Proceedings of the eleventh international conference on artificial intelligence and statistics, pp 283–290
Mann TP (2006) Numerically stable hidden Markov model implementation. HMM Scaling Tutor, pp 1–8.
Melançon G, Philippe F (2004) Generating connected acyclic digraphs uniformly at random. Inf Process Lett 90(4):209–213
Motwani R, Raghavan P (1995) Randomized algorithms. Cambridge University Press, New York
Peherstorfer B, Pflüger D, Bungartz H (2014) Density estimation with adaptive sparse grids for large data sets. In: Proceedings of the 2014 SIAM international conference on data mining, pp 443–451
Ram P, Gray AG (2011) Density estimation trees. In: Knowledge discovery and data mining, pp 627–635
Rau MM, Seitz S, Brimioulle F, Frank E, Friedrich O, Gruen D, Hoyle B (2015) Accurate photometric redshift probability density estimation—method comparison and application. Monthly Notices R Astron Soc 452(4):3710–3725
Raykar VC, Duraiswami R (2006) Fast optimal bandwidth selection for kernel density estimation. In: Proceedings of the sixth SIAM international conference on data mining, pp 524–528
Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359
Scott DW, Sain SR (2004) Multi-dimensional density estimation. Elsevier, Amsterdam, pp 229–263
Scutari M (2010) Learning Bayesian networks with the bnlearn R package. J Stat Softw 35(3):1–22
Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc Ser B (Methodol) 53(3):683–690
Su J, Zhang H (2006) Full Bayesian network classifiers. In: Proceedings of the twenty-third international conference on machine learning, pp 897–904
Valiant LG (1984) A theory of the learnable. Commun ACM 27(11):1134–1142
Vapnik V, Mukherjee S (1999) Support vector method for multivariate density estimation. In: Neural information processing systems, pp 659–665
Wan R, Wang L (2010) Clustering over evolving data stream with mixed attributes. J Comput Inf Syst 6:1555–1562
Wang X, Wang Y (2015) Nonparametric multivariate density estimation using mixtures. Stat Comput 25(2):349–364
Wied D, Weißbach R (2012) Consistency of the kernel density estimator: a survey. Stat Papers 53(1):1–21
Wu K, Zhang K, Fan W, Edwards A, Yu PS (2014) RS-forest: a rapid density estimator for streaming anomaly detection. In: Proceedings of the 14th international conference on data mining, pp 600–609
Zhou A, Cai Z, Wei L, Qian W (2003) M-kernel merging: towards density estimation over data streams. In: Proceedings of the eighth international conference on database systems for advanced applications, IEEE computer society, pp 285–292
Zliobaite I, Bifet A, Read J, Pfahringer B, Holmes G (2015) Evaluation methods and decision theory for classification of streaming data with temporal dependence. Mach Learn 98(3):455–482


Acknowledgements

We would like to thank the editor and the anonymous reviewers for their comments. They improved the presentation, readability, and quality of this paper substantially. We are particularly grateful to the anonymous reviewer who proposed the exponentiated gradient investment strategy for weighting the classifier chains.

Author information

Authors and Affiliations

Johannes Gutenberg-Universität Mainz, Staudingerweg 9, 55128, Mainz, Germany
Michael Geilke, Andreas Karwath & Stefan Kramer
Department of Computer Science, The University of Waikato, Hamilton, 3240, New Zealand
Eibe Frank

Corresponding author

Correspondence to Michael Geilke.

Additional information

Responsible editor: Hendrik Blockeel.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 63 KB)

Supplementary material 2 (pdf 209 KB)

Supplementary material 3 (pdf 80 KB)

About this article

Cite this article

Geilke, M., Karwath, A., Frank, E. et al. Online estimation of discrete, continuous, and conditional joint densities using classifier chains. Data Min Knowl Disc 32, 561–603 (2018). https://doi.org/10.1007/s10618-017-0546-6

Received: 18 December 2015
Accepted: 09 November 2017
Published: 25 November 2017
Issue Date: May 2018
DOI: https://doi.org/10.1007/s10618-017-0546-6
