The Journal of Supercomputing, Volume 72, Issue 10, pp 3927–3959

Improvised methods for tackling big data stream mining challenges: case study of human activity recognition

  • Simon Fong (email author)
  • Kexing Liu
  • Kyungeun Cho
  • Raymond Wong
  • Sabah Mohammed
  • Jinan Fiaidhi


Big data stream mining is both a new hype and a practical computational challenge, rooted in the data streams that are prevalent in today's applications. It is well known that data streams originating from monitoring sensors accumulate continuously to a very large volume, which makes traditional batch-based model induction algorithms infeasible for real-time data mining or just-in-time data analytics. In this position paper, following a new data stream mining methodology called stream-based holistic analytics and reasoning in parallel (SHARP), we examine a list of data analytic challenges together with improvised methods for addressing them. In particular, two types of decision tree algorithms, batch-mode and incremental-mode, are tested on sensor data that represents a typical big data stream. We investigate whether, and to what extent, two improvised methods (outlier removal and balancing of imbalanced class distributions) affect prediction performance in big data stream mining. SHARP is founded on incremental learning, which does not require all the training data to be loaded into memory. This fundamental concept needs to be supported not only by the decision tree algorithms but also by the other improvised methods, which are usually applied at the preprocessing stage. This paper sheds some light on this area, which data analysts often overlook when it comes to big data stream mining.
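To illustrate what it means for a preprocessing step to be incremental, the following is a minimal sketch of one-pass outlier removal. It is not the method evaluated in the paper: the z-score rule, Welford's running statistics, and the names `RunningStats`, `filter_outliers`, `z_max`, and `warmup` are all assumptions chosen for illustration. The point is only that each reading is judged against a constant-size summary of the stream seen so far, so the full training set never has to sit in memory.

```python
import math
from typing import Iterable, Iterator


class RunningStats:
    """Welford's online mean/variance: constant memory, one update per reading."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two readings are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def filter_outliers(
    stream: Iterable[float], z_max: float = 3.0, warmup: int = 10
) -> Iterator[float]:
    """Yield readings whose z-score against the stream so far is within z_max.

    Hypothetical rule for illustration: the first `warmup` readings are always
    accepted so that the running statistics have something to work with.
    """
    stats = RunningStats()
    for x in stream:
        std = stats.std()
        is_outlier = (
            stats.n >= warmup and std > 0 and abs(x - stats.mean) > z_max * std
        )
        if not is_outlier:
            stats.update(x)  # only accepted readings shape the statistics
            yield x


# Made-up sensor readings with one obvious spike.
readings = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0, 0.9, 1.1, 50.0, 1.05]
kept = list(filter_outliers(readings))
```

Because rejected readings never update the statistics, this sketch would keep rejecting values after a genuine shift in the signal; a windowed or decaying summary would be one way to handle that, at the cost of an extra parameter.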


Keywords: Data stream mining; Big data; Very fast decision tree; Resampling; Sensor data



The authors are thankful for the financial support from the research grant “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF)”, Grant No. MYRG2015-00128-FST, offered by the University of Macau, FST, and RDAO.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Simon Fong (1), email author
  • Kexing Liu (1)
  • Kyungeun Cho (2)
  • Raymond Wong (3)
  • Sabah Mohammed (4)
  • Jinan Fiaidhi (4)

  1. Department of Computer and Information Science, University of Macau, Macau, China
  2. Department of Multimedia Engineering, Dongguk University, Seoul, Korea
  3. School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
  4. Department of Computer Science, Lakehead University, Thunder Bay, Canada
