
KDE based outlier detection on distributed data streams in multimedia network

Multimedia Tools and Applications

Abstract

Multimedia networks hold the promise of facilitating large-scale, real-time data processing in complex environments. Their foreseeable applications will help protect and monitor military, environmental, safety-critical, or domestic infrastructures and resources. Cloud infrastructures promise high-performance, cost-effective solutions to large-scale data processing problems. This paper focuses on real-time outlier detection over distributed data streams and proposes KDEDisStrOut, a kernel density estimation (KDE) based outlier detection algorithm implemented on Storm. The paper first formalizes the outlier detection problem using the kernel density estimation technique and incrementally updates the data transported between the child nodes and the coordinator node, which reduces the communication cost. It then adopts an exponential decay policy to keep pace with the transient and evolving nature of stream data, adaptively changing the weights of the data in the sliding window so that the data analysis is more reasonable. Theoretical analysis and experiments on Storm with synthetic and real data show that KDEDisStrOut is efficient and effective compared with existing outlier detection algorithms and is better suited to data streams.
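
The abstract does not state the estimator or the decay formula, so the following is only a minimal single-node sketch of the general idea: a sliding window whose points are weighted by exponential decay, a weighted kernel density estimate, and a density threshold for flagging outliers. The Gaussian kernel, the fixed bandwidth, the threshold rule, and every name (`DecayedKDEWindow`, `decay_rate`, `is_outlier`) are assumptions made for illustration. This is not the KDEDisStrOut algorithm itself, and it leaves out the Storm topology and the child/coordinator communication.

```python
import math
from collections import deque


class DecayedKDEWindow:
    """Sliding window with an exponentially decayed Gaussian KDE (illustrative only)."""

    def __init__(self, window_size, bandwidth, decay_rate):
        # deque(maxlen=...) evicts the oldest point once the window is full.
        self.window = deque(maxlen=window_size)
        self.bandwidth = bandwidth      # assumed fixed kernel bandwidth h
        self.decay_rate = decay_rate    # assumed decay constant (lambda)

    def add(self, timestamp, value):
        """Append one observation (timestamp, value) to the window."""
        self.window.append((timestamp, value))

    def density(self, x, now):
        """Weighted KDE at x; a point of age (now - t) gets weight exp(-lambda * age)."""
        weighted_sum = 0.0
        weight_total = 0.0
        for t, v in self.window:
            w = math.exp(-self.decay_rate * (now - t))
            u = (x - v) / self.bandwidth
            weighted_sum += w * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
            weight_total += w
        if weight_total == 0.0:
            return 0.0
        return weighted_sum / (weight_total * self.bandwidth)

    def is_outlier(self, x, now, threshold):
        """Flag x when its estimated density falls below the assumed threshold."""
        return self.density(x, now) < threshold


# Usage: a stream that hovers around 10, then a far-away reading.
kde = DecayedKDEWindow(window_size=20, bandwidth=0.5, decay_rate=0.1)
for t in range(20):
    kde.add(t, 10.0 + 0.1 * (t % 5))

normal_flag = kde.is_outlier(10.2, now=19, threshold=0.05)
far_flag = kde.is_outlier(50.0, now=19, threshold=0.05)
```

Normalizing by the sum of the weights keeps the estimate a proper density however much the older points have decayed. A larger `decay_rate` makes the estimate follow recent data more closely, at the cost of a noisier estimate.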

Acknowledgments

This work was supported by the Key Projects in the National Science & Technology Pillar Program during the Twelfth Five-year Plan Period under Grant No. 2015BAK07B03 and by the National “Twelfth Five-Year” Plan for Science & Technology Support under Grant No. 2013BAH18F02.

Author information

Corresponding author

Correspondence to Tao Huang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

This work was done while Zhigao Zheng was at the National Engineering Research Center for E-learning of Central China Normal University.

About this article

Cite this article

Zheng, Z., Jeong, HY., Huang, T. et al. KDE based outlier detection on distributed data streams in multimedia network. Multimed Tools Appl 76, 18027–18045 (2017). https://doi.org/10.1007/s11042-016-3681-y

  • DOI: https://doi.org/10.1007/s11042-016-3681-y
