A Distributed, Scalable Computing Facility for Big Data Analytics in Atmospheric Physics

  • Reena Bharathi
  • S. C. Shirwaikar
  • Vilas Kharat
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 721)


Technological advancements in computing and communication have led to a flood of data from domains such as healthcare, social networks, Internet commerce and finance. Over the past few years, a growing share of this data has come from scientific applications, whether generated by simulated experiments or collected by sensors. This development calls for new architectural models for data acquisition, storage and large-scale data analytics.

In this paper, we present a distributed and scalable computing facility, built from low-cost machines, that supports analytics of large scientific data sets. The facility consists of three sequential modules, namely data pre-processing, data analytics and data post-processing. Together these modules form a big data value chain, which we illustrate through a case study in atmospheric physics.
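The abstract names the three sequential modules but does not describe how they are implemented. Below is a minimal, hypothetical Python sketch of such a value chain under stated assumptions:

- The raw records are comma-separated latitude/longitude pairs.
- The analytics module performs k-means clustering expressed as map and reduce functions. This choice is suggested by the MapReduce and Clustering keywords but is not confirmed here.

All function names are illustrative, not the authors' code. The map, shuffle and reduce steps run locally in a loop. On a Hadoop cluster, the framework would distribute them across nodes.

```python
"""Hypothetical sketch (not the authors' implementation) of a three-module
value chain: pre-processing -> MapReduce-style k-means -> post-processing.
Assumes records are 'lat,lon' strings; everything runs locally."""
from collections import defaultdict
from math import dist
from typing import Iterable

Point = tuple[float, float]


def preprocess(raw_lines: Iterable[str]) -> list[Point]:
    """Module 1: parse 'lat,lon' records, silently skipping malformed lines."""
    points: list[Point] = []
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        try:
            points.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue
    return points


def kmeans_map(point: Point, centroids: list[Point]) -> tuple[int, Point]:
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points: list[Point]) -> Point:
    """Reduce step: the new centroid is the mean of the points for one key."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def analyse(points: list[Point], centroids: list[Point], max_iter: int = 10) -> list[Point]:
    """Module 2: iterate map -> shuffle -> reduce until centroids stop moving."""
    for _ in range(max_iter):
        groups: dict[int, list[Point]] = defaultdict(list)  # shuffle: group by key
        for p in points:
            key, value = kmeans_map(p, centroids)
            groups[key].append(value)
        new_centroids = list(centroids)  # empty clusters keep their old centroid
        for key, values in groups.items():
            new_centroids[key] = kmeans_reduce(values)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def postprocess(points: list[Point], centroids: list[Point]) -> dict[int, int]:
    """Module 3: summarise cluster sizes for reporting or plotting."""
    counts: dict[int, int] = defaultdict(int)
    for p in points:
        counts[kmeans_map(p, centroids)[0]] += 1
    return dict(counts)


if __name__ == "__main__":
    raw = ["18.5,73.8", "18.6,73.9", "bad line", "10.0,60.0", "10.2,60.1"]
    pts = preprocess(raw)
    final = analyse(pts, [pts[0], pts[2]])
    print(final, postprocess(pts, final))
```

A real deployment on Hadoop would replace the in-memory `groups` dictionary with the framework's shuffle. It would also run one MapReduce job per iteration, which is the main cost of iterative algorithms on plain MapReduce.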


Keywords: Data analytics · Big data · MapReduce · Clustering · Hadoop · Atmospheric physics



The experimental data sets for this work were obtained from the Atmospheric Physics Research Lab, Nowrosjee Wadia College, Pune. The authors would like to thank Dr. Gajanan Aher and his team for their enthusiastic support and guidance.



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  • Reena Bharathi (1)
  • S. C. Shirwaikar (1)
  • Vilas Kharat (1)
  1. Department of Computer Science, Savitribai Phule Pune University, Pune, India
