Fast adaptive kernel density estimator for data streams

Boedihardjo, Arnold P.; Lu, Chang-Tien; Chen, Feng

doi:10.1007/s10115-013-0712-0

Regular Paper
Published: 12 December 2013

Volume 42, pages 285–317, (2015)
Knowledge and Information Systems

Arnold P. Boedihardjo¹,
Chang-Tien Lu² &
Feng Chen³

Abstract

The probability density function (PDF) is an effective data model for a variety of stream mining tasks. As such, accurate estimates of the PDF are essential to reducing the uncertainties and errors associated with mining results. The nonparametric adaptive kernel density estimator (AKDE) provides accurate, robust, and asymptotically consistent estimates of a PDF. However, due to AKDE’s extensive computational requirements, it cannot be directly applied to the data stream environment. This paper describes the development of an AKDE approximation approach that heeds the constraints of the data stream environment and supports efficient processing of multiple queries. To this end, this work proposes (1) the concept of local regions to provide a partition-based variable bandwidth to capture local density structures and enhance estimation quality; (2) a suite of linear-pass methods to construct the local regions and kernel objects online; (3) an efficient multiple queries evaluation algorithm; (4) a set of approximate techniques to increase the throughput of multiple density queries processing; and (5) a fixed-size memory time-based sliding window that updates the kernel objects in linear time. Comprehensive experiments were conducted with real-world and synthetic data sets to validate the effectiveness and efficiency of the approach.
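
For orientation, the sketch below shows the classical batch form of an adaptive kernel density estimator, in the sample-point style described by Silverman (1986): a fixed-bandwidth pilot estimate assigns each sample a local bandwidth factor, so kernels widen in sparse regions and narrow in dense ones. It is not the approximation method proposed in this paper. The function names, the Gaussian kernel, the sensitivity parameter `alpha = 0.5`, and the sample values are all assumptions chosen for illustration. Evaluating a single query touches every sample and the pilot step is quadratic in the number of samples, which hints at why the exact estimator is costly in a streaming setting.

```python
import math

def gaussian(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def fixed_kde(x, samples, h):
    """Fixed-bandwidth kernel density estimate at x (used as the pilot)."""
    n = len(samples)
    return sum(gaussian((x - s) / h) for s in samples) / (n * h)

def adaptive_kde(samples, h, alpha=0.5):
    """Return a density function with per-sample bandwidths h * lambda_i.

    lambda_i = (pilot(x_i) / g) ** (-alpha), where g is the geometric
    mean of the pilot densities at the sample points.
    """
    n = len(samples)
    # Pilot density at each sample point: O(n^2) work overall.
    pilot = [fixed_kde(s, samples, h) for s in samples]
    g = math.exp(sum(math.log(p) for p in pilot) / n)
    lambdas = [(p / g) ** (-alpha) for p in pilot]

    def density(x):
        total = 0.0
        for s, lam in zip(samples, lambdas):
            bw = h * lam  # local bandwidth for this sample
            total += gaussian((x - s) / bw) / bw
        return total / n

    return density

# Illustrative usage: a tight cluster near 0 and one isolated point at 5.
samples = [0.0, 0.1, 0.2, 0.3, 5.0]
f = adaptive_kde(samples, h=0.5)
dense_value = f(0.15)   # density inside the cluster
sparse_value = f(5.0)   # density at the isolated point
```

Setting `alpha` to 0 makes every bandwidth factor equal to 1, which reduces the estimator to the ordinary fixed-bandwidth form.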

Author information

Authors and Affiliations

U.S. Army Corps of Engineers, Alexandria, VA, USA
Arnold P. Boedihardjo
Computer Science Department, Virginia Tech, Falls Church, VA, USA
Chang-Tien Lu
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA
Feng Chen

Corresponding author

Correspondence to Arnold P. Boedihardjo.

About this article

Cite this article

Boedihardjo, A.P., Lu, CT. & Chen, F. Fast adaptive kernel density estimator for data streams. Knowl Inf Syst 42, 285–317 (2015). https://doi.org/10.1007/s10115-013-0712-0

Received: 29 November 2012
Revised: 27 August 2013
Accepted: 15 November 2013
Published: 12 December 2013
Issue Date: February 2015
DOI: https://doi.org/10.1007/s10115-013-0712-0
