Challenges in Mining Big Data Streams

  • Veena Tayal
  • Ritesh Srivastava
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 847)

Abstract

Big data refers to data of very large volume, of heterogeneous types, and drawn from many different sources. Such data is complex in nature and keeps growing. Dealing with big data is an emerging area of research that is expanding rapidly across all domains of engineering and the medical sciences. A major challenge in analyzing big data originates from the sources that generate it: they produce data at very high speed and with a changing data distribution, so classical methods are unable to process it. This paper discusses the characteristics, challenges, and issues of big data mining. It also presents examples from fields such as medicine, finance, social networking sites, and stock exchanges to illustrate the applications and importance of big data mining. The paper explains the use of parallel computing in data mining, as well as the associated security issues and how to deal with them. Finally, it discusses the challenges of mining big streaming data subject to concept drift.
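Why a changing data distribution (concept drift) defeats a model trained once can be made concrete with a small sketch. The code below is a simplified version of the Drift Detection Method (DDM) of Gama et al. (2004), named here as an illustration and not presented as the method of this paper. It tracks an online classifier's running error rate p and its standard deviation s, remembers the point where p + s was lowest, and reports drift when p + s exceeds p_min + 3·s_min. The class `SimpleDDM`, its parameters, and the synthetic error stream are hypothetical, and DDM's intermediate "warning" level is omitted.

```python
import math
from typing import List


class SimpleDDM:
    """Simplified Drift Detection Method (after Gama et al., 2004).

    Monitors the running error rate of an online classifier and reports
    drift when it rises significantly above the lowest level seen so far.
    Parameter names are illustrative, not taken from any library.
    """

    def __init__(self, min_instances: int = 30, drift_level: float = 3.0) -> None:
        self.min_instances = min_instances  # samples required before testing
        self.drift_level = drift_level      # std-devs above the minimum that signal drift
        self.reset()

    def reset(self) -> None:
        """Forget all statistics, e.g. after a drift has been handled."""
        self.n = 0
        self.p = 0.0           # running error rate
        self.s = 0.0           # standard deviation of the error rate
        self.p_min = math.inf  # error rate at the lowest observed p + s
        self.s_min = math.inf  # standard deviation at that point

    def update(self, error: int) -> bool:
        """Feed 1 for a misclassification, 0 for a correct prediction.

        Returns True when drift is detected; statistics are then reset.
        """
        self.n += 1
        self.p += (error - self.p) / self.n  # incremental mean of the error indicator
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return False
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()
            return True
        return False


def simulate_stream(length: int = 2000, change_point: int = 1000) -> List[int]:
    """Hypothetical error stream: 10% errors before change_point, 50% after.

    Returns the indices at which drift was reported.
    """
    detector = SimpleDDM()
    detections = []
    for i in range(length):
        error = int(i % 10 == 0) if i < change_point else int(i % 2 == 0)
        if detector.update(error):
            detections.append(i)
    return detections


if __name__ == "__main__":
    print(simulate_stream())  # a single index shortly after 1000
```

A batch model trained once on the first part of such a stream would keep its outdated decision rule after the change point; pairing an online learner with a detector like this one lets it retrain or reset when drift is signalled.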

Keywords

Big data, Data mining, Machine learning, Online learning

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. CSE Department, FET, MRIIRS, Faridabad, India
