DataSpeak: Data Extraction, Aggregation, and Classification Using Big Data Novel Algorithm

  • Venkatesh Gauri Shankar
  • Bali Devi
  • Sumit Srivastava
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 810)

Abstract

The growing number of computing devices generates a huge amount of data. Because this data comes in many varieties, processing and analyzing it is a major challenge in big data analytics. Data consistency and scalability are also major problems for large datasets. We propose a novel approach, “DataSpeak”, for data extraction, aggregation, and classification. We used k-Nearest Neighbors (kNN) with Spark as the reference and derived our approach from a modified version of that algorithm. We evaluated the approach on large datasets from several domains, including travel and tourism, placement papers, movies and historical data, and smartphones. To assess the capability and accuracy of our algorithm, we used cross validation, precision, recall, and a comparative statistical analysis against the existing algorithm. Compared with the existing algorithm in the same domain, our approach provides faster data access and more efficient data extraction in less time. For data aggregation and classification, our approach achieves 98% data aggregation and classification based on the data structure.
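
The abstract names plain k-Nearest Neighbors as the reference algorithm and cross validation, precision, and recall as the evaluation measures. The following is a minimal sketch of that baseline and evaluation only. It is not the DataSpeak algorithm, whose modifications are not described here, and it runs on a single machine rather than on Spark. The toy records, the domain labels attached to them, the function names, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    is an assumption; the abstract does not name a distance measure.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(data, folds=3, k=3, positive="travel"):
    """Run fold-wise cross validation; return (precision, recall) for `positive`."""
    tp = fp = fn = 0
    for i in range(folds):
        # Every `folds`-th record starting at i is held out for testing.
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        for features, actual in test:
            predicted = knn_predict(train, features, k)
            if predicted == positive and actual == positive:
                tp += 1
            elif predicted == positive:
                fp += 1
            elif actual == positive:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy records: two well-separated clusters standing in for
# feature vectors extracted from two of the domains named in the abstract.
TOY_DATA = [
    ((1.0, 1.0), "travel"), ((1.2, 0.8), "travel"), ((0.9, 1.1), "travel"),
    ((1.1, 1.3), "travel"), ((0.8, 0.9), "travel"), ((1.3, 1.0), "travel"),
    ((5.0, 5.0), "movies"), ((5.2, 4.8), "movies"), ((4.9, 5.1), "movies"),
    ((5.1, 5.3), "movies"), ((4.8, 4.9), "movies"), ((5.3, 5.0), "movies"),
]

precision, recall = cross_validate(TOY_DATA)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the two toy clusters are far apart, every held-out record is classified correctly and both measures come out at 1.0. Real datasets would need feature extraction first, and a comparison against a modified algorithm would reuse the same `cross_validate` loop with a different predictor.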

Keywords

Big data, Big data analytics, Classification, kNN, Spark framework

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Venkatesh Gauri Shankar (1)
  • Bali Devi (2)
  • Sumit Srivastava (1)

  1. SCIT, Manipal University Jaipur, Jaipur, India
  2. CSE, Jayoti Vidyapeeth Women’s University, Jaipur, India