Big Data Analytics: A Comparison of Tools and Applications

  • Conference paper
Innovations in Smart Cities and Applications (SCAMS 2017)

Abstract

With the ever-increasing volume and variety of data, traditional data processing tools have become unsuitable for the big data context. This has pushed toward the creation of dedicated processing tools that are well aligned with emerging needs. However, it is often hard to choose an adequate solution, as the wide list of available tools is continuously changing. To address this, we present in this paper both a literature review and a technical comparison of the best-known analytics tools in order to help map them to different needs. Moreover, we underline how important choosing the appropriate tool is for different kinds of applications, especially in smart city environments.

Author information

Corresponding author

Correspondence to Imane El Alaoui.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

El Alaoui, I., Gahi, Y., Messoussi, R., Todoskoff, A., Kobi, A. (2018). Big Data Analytics: A Comparison of Tools and Applications. In: Ben Ahmed, M., Boudhir, A. (eds) Innovations in Smart Cities and Applications. SCAMS 2017. Lecture Notes in Networks and Systems, vol 37. Springer, Cham. https://doi.org/10.1007/978-3-319-74500-8_54

  • DOI: https://doi.org/10.1007/978-3-319-74500-8_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74499-5

  • Online ISBN: 978-3-319-74500-8

  • eBook Packages: Engineering, Engineering (R0)
