
Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Conventional classification algorithms are not well suited to the inherent uncertainty, potential concept drift, volume, and velocity of streaming data. Specialized algorithms are needed to obtain efficient and accurate classifiers for uncertain data streams. In this paper, we first introduce Distributed Extreme Learning Machine (DELM), an optimization of ELM for large matrix operations over large datasets. We then present Weighted Ensemble Classifier Based on Distributed ELM (WE-DELM), an online, one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. A probability world model is built to transform uncertain streaming data into certain streaming data. Base classifiers are learned using DELM, and their weights are updated dynamically according to their classification results. WE-DELM improves both the efficiency of model learning and the accuracy of classification. Experimental results show that WE-DELM achieves better performance on several evaluation criteria, including efficiency, accuracy, and speedup.
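The abstract does not spell out how DELM distributes the ELM computation. One common way to parallelize ELM training (the pattern used by MapReduce-style ELM variants) relies on the fact that the regularized output weights β = (I/C + HᵀH)⁻¹HᵀT depend on the training data only through HᵀH and HᵀT, and both are sums of per-partition terms. The minimal Python/NumPy sketch below illustrates that decomposition under this assumption. It is not the authors' DELM implementation: the function names (`hidden_layer`, `partial_gram`, `distributed_elm_fit`, `predict`), the sigmoid activation, and the regularization constant `C` are hypothetical, illustrative choices.

```python
import numpy as np


def hidden_layer(X, W, b):
    """Sigmoid hidden-layer output H = g(XW + b) of a single-hidden-layer ELM."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def partial_gram(X_part, T_part, W, b):
    """'Map' step (hypothetical): one worker's contribution H_i^T H_i and H_i^T T_i."""
    H = hidden_layer(X_part, W, b)
    return H.T @ H, H.T @ T_part


def distributed_elm_fit(partitions, n_hidden, C=1.0, seed=0):
    """Fit ELM output weights from row partitions [(X_1, T_1), (X_2, T_2), ...].

    Since H^T H = sum_i H_i^T H_i and H^T T = sum_i H_i^T T_i, each partition
    can be processed independently and the partial results summed ('reduce').
    """
    rng = np.random.default_rng(seed)
    n_features = partitions[0][0].shape[1]
    n_outputs = partitions[0][1].shape[1]
    # Input weights and biases are drawn at random and never trained (ELM property).
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)

    HtH = np.zeros((n_hidden, n_hidden))
    HtT = np.zeros((n_hidden, n_outputs))
    for X_i, T_i in partitions:  # on a cluster, these would run as parallel tasks
        part_HtH, part_HtT = partial_gram(X_i, T_i, W, b)
        HtH += part_HtH
        HtT += part_HtT

    # Regularized least squares: beta = (I/C + H^T H)^{-1} H^T T
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtH, HtT)
    return W, b, beta


def predict(X, W, b, beta):
    """Return the class index with the largest ELM output for each row of X."""
    return np.argmax(hidden_layer(X, W, b) @ beta, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    T = np.eye(2)[y]  # one-hot target matrix
    parts = [(X[i::3], T[i::3]) for i in range(3)]  # three row partitions
    W, b, beta = distributed_elm_fit(parts, n_hidden=40, C=10.0)
    print("training accuracy:", (predict(X, W, b, beta) == y).mean())
```

With the same random seed, fitting on three row partitions yields the same β (up to floating-point error) as fitting on the whole dataset at once, which is what allows the per-partition work to run independently before a single small L×L solve.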

Author information

Correspondence to Dong-Hong Han.

Additional information

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61173029 and 61272182.

About this article

Cite this article

Han, DH., Zhang, X. & Wang, GR. Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine. J. Comput. Sci. Technol. 30, 874–887 (2015). https://doi.org/10.1007/s11390-015-1566-6

  • DOI: https://doi.org/10.1007/s11390-015-1566-6
