Multimedia Tools and Applications

Volume 76, Issue 17, pp 18027–18045

KDE based outlier detection on distributed data streams in multimedia network

  • Zhigao Zheng
  • Hwa-Young Jeong
  • Tao Huang
  • Jiangbo Shu


Multimedia networks hold the promise of facilitating large-scale, real-time data processing in complex environments. Their foreseeable applications will help protect and monitor military, environmental, safety-critical, or domestic infrastructures and resources. Cloud infrastructures promise to provide high-performance and cost-effective solutions to large-scale data processing problems. This paper focuses on real-time outlier detection over distributed data streams and proposes KDEDisStrOut, a kernel density estimation (KDE) based outlier detection algorithm implemented in Storm. It first formalizes the outlier detection problem using kernel density estimation and transmits data between the child nodes and the coordinator node as incremental updates, which reduces the communication cost. It then adopts an exponential decay policy to keep pace with the transient and evolving nature of stream data, adaptively adjusting the weights of different data in the sliding window so that the analysis better reflects recent data. Theoretical analysis and experiments on Storm with synthetic and real data show that KDEDisStrOut is efficient and effective compared with existing outlier detection algorithms, and is better suited to data streams.
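The abstract names three mechanisms: a KDE-based outlier score, incremental updates from child nodes to a coordinator, and exponential decay over a sliding window. The Python sketch below combines them in a single process to show how they fit together. It is not the authors' KDEDisStrOut implementation and does not use Storm; the class names `ChildNode` and `Coordinator`, the decay form exp(-λ·age), the fixed Gaussian `bandwidth`, and the density `threshold` are all illustrative assumptions, since the abstract does not specify them.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    value: float      # scalar observation (assumption: 1-D stream values)
    timestamp: float  # arrival time


@dataclass
class ChildNode:
    """Hypothetical child node: buffers local arrivals and ships only the delta."""
    pending: list = field(default_factory=list)

    def observe(self, value: float, timestamp: float) -> None:
        self.pending.append(Point(value, timestamp))

    def sync(self) -> list:
        # Incremental update: send only points that arrived since the last sync,
        # then clear the buffer so nothing is transmitted twice.
        delta, self.pending = self.pending, []
        return delta


@dataclass
class Coordinator:
    """Hypothetical coordinator holding a time-based sliding window."""
    window: float     # window length W, in the same units as timestamps
    decay: float      # decay rate lambda; weight = exp(-lambda * age)
    bandwidth: float  # Gaussian kernel bandwidth h (fixed here for simplicity)
    points: list = field(default_factory=list)

    def merge(self, delta: list, now: float) -> None:
        self.points.extend(delta)
        # Evict points that have slid out of the window.
        self.points = [p for p in self.points if now - p.timestamp < self.window]

    def density(self, x: float, now: float) -> float:
        """Exponentially decayed, weight-normalized Gaussian KDE evaluated at x."""
        norm = 1.0 / (math.sqrt(2.0 * math.pi) * self.bandwidth)
        weighted_sum = 0.0
        total_weight = 0.0
        for p in self.points:
            w = math.exp(-self.decay * (now - p.timestamp))  # older points count less
            u = (x - p.value) / self.bandwidth
            weighted_sum += w * norm * math.exp(-0.5 * u * u)
            total_weight += w
        return weighted_sum / total_weight if total_weight > 0 else 0.0

    def is_outlier(self, x: float, now: float, threshold: float) -> bool:
        # Flag x when its estimated density falls below the threshold.
        return self.density(x, now) < threshold


# Usage: two child nodes observe values clustered around 10.0-10.4.
children = [ChildNode(), ChildNode()]
coord = Coordinator(window=100.0, decay=0.01, bandwidth=0.5)
for t in range(50):
    children[t % 2].observe(10.0 + 0.1 * (t % 5), float(t))
now = 50.0
for child in children:
    coord.merge(child.sync(), now)
print(coord.is_outlier(10.2, now, threshold=0.01))  # → False
print(coord.is_outlier(25.0, now, threshold=0.01))  # → True
```

In a Storm topology the child nodes and the coordinator would presumably run as separate bolts, with the `sync()` deltas emitted as tuples. Keeping raw points at the coordinator is the simplest choice for illustration, but it costs O(n) per density query, so a practical deployment would likely summarize the window instead.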


Keywords: Kernel density estimation · Distributed data stream · Stream analysis · Exponential decay policy



This work is supported by the Key Projects in the National Science & Technology Pillar Program during the Twelfth Five-Year Plan Period under Grant No. 2015BAK07B03, and by the National “Twelfth Five-Year” Plan for Science & Technology Support under Grant No. 2013BAH18F02.

Compliance with ethical standards

Conflict of interest

The authors declare no conflict of interest.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Zhigao Zheng (1)
  • Hwa-Young Jeong (2)
  • Tao Huang (3), corresponding author
  • Jiangbo Shu (3)
  1. School of Computer & Technology, Huazhong University of Science and Technology, Wuhan, China
  2. Humanitas College, Kyung Hee University, Seoul, South Korea
  3. National Engineering Research Center for E-learning, Central China Normal University, Wuhan, China
