Big Data Analytics: A Comparison of Tools and Applications

  • Imane El Alaoui
  • Youssef Gahi
  • Rochdi Messoussi
  • Alexis Todoskoff
  • Abdessamad Kobi
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 37)


With the ever-increasing volume and variety of data, traditional data processing tools have become unsuitable for the big data context. This has driven the creation of specific processing tools that are well aligned with emerging needs. However, it is often hard to choose an adequate solution, as the wide list of available tools is continuously changing. To address this, we present in this paper both a literature review and a technical comparison of the best-known analytics tools in order to help map them to different needs. Moreover, we underline how important choosing the appropriate tool is for different kinds of applications, and especially for smart city environments.


Big data analytics tools, Big data tools’ comparison, Smart cities



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Laboratoire des Systèmes de Télécommunications et Ingénierie de La Décision, University of Ibn Tofail, Kenitra, Morocco
  2. Laboratoire Angevin de Recherche en Ingénierie des Systèmes, University of Angers, Angers, France
  3. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
  4. Ecole Nationale des Sciences Appliquées, University of Ibn Tofail, Kenitra, Morocco
