Cluster Computing, Volume 21, Issue 1, pp 189–204

A Gaussian process based big data processing framework in cluster computing environment

  • Gunasekaran Manogaran
  • Daphne Lopez


Machine learning algorithms play a vital role in predicting disease outbreaks driven by climate change. Dengue outbreaks are caused by improper maintenance of water storage, lack of urbanization, deforestation, and lack of vaccination and awareness. Moreover, the number of dengue cases varies with the climate season. There is therefore a need for a prediction model that relates dengue outbreaks to climate change. To model the dengue outbreak, this paper applies a Gaussian process regression (GPR) model that uses the seasonal averages of several climate parameters: maximum temperature, minimum temperature, precipitation, wind, relative humidity, and solar radiation. The number of dengue cases and the climate data for each block of Tamil Nadu, India, are collected from the Integrated Disease Surveillance Project and Global Weather Data for SWAT Inc, respectively. Local Moran’s I spatial autocorrelation is used to identify hotspot regions, and the dengue outbreak and its hotspots are visualized geographically with ArcGIS 10.1. The day-wise big climate data are collected and stored in a Hadoop cluster computing environment. The MapReduce framework reduces the day-wise climate data into seasonal climate averages for winter, summer, and monsoon. The seasonal climate data and the number of dengue incidences (health data) are integrated by geo-location (latitude and longitude). GPR is then used to build the dengue prediction model from the integrated climate and health data. The proposed Gaussian process based prediction model is compared with other machine learning approaches: multiple regression, support vector machine, and random forests. Experimental results demonstrate the effectiveness of the Gaussian process based prediction framework.


Cluster computing, Hadoop cluster, MapReduce, Dengue, Disease, Gaussian process, Local Moran, Spatial autocorrelation, Weather data and climate change
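The aggregation and integration steps described in the abstract (day-wise climate records reduced to seasonal averages, then joined with dengue counts by latitude and longitude) can be illustrated with a minimal, single-process sketch in plain Python. It is not the authors' Hadoop job. The record layout, the function names, the sample values, and the month-to-season mapping are all assumptions made for illustration; the abstract does not state the season boundaries used in the paper.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical month-to-season mapping; the season boundaries
# actually used in the paper are not given in the abstract.
SEASON_BY_MONTH = {
    1: "winter", 2: "winter",
    3: "summer", 4: "summer", 5: "summer",
    6: "monsoon", 7: "monsoon", 8: "monsoon", 9: "monsoon",
    10: "monsoon", 11: "monsoon", 12: "monsoon",
}


def map_record(record):
    """Map step: emit ((lat, lon, season, parameter), value) pairs for one day."""
    month = int(record["date"].split("-")[1])
    season = SEASON_BY_MONTH[month]
    for name, value in record["readings"].items():
        yield (record["lat"], record["lon"], season, name), value


def seasonal_averages(records):
    """Group mapped values by key (shuffle) and average each group (reduce)."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


def integrate(averages, dengue_cases):
    """Join seasonal climate averages with dengue counts on (lat, lon, season).

    Locations without a matching dengue count are dropped. Exact equality on
    coordinates is a simplification; real data would need a tolerance or a
    block identifier.
    """
    features = defaultdict(dict)
    for (lat, lon, season, name), mean in averages.items():
        features[(lat, lon, season)][name] = mean
    rows = []
    for key, climate in features.items():
        if key in dengue_cases:
            lat, lon, season = key
            rows.append({"lat": lat, "lon": lon, "season": season,
                         **climate, "cases": dengue_cases[key]})
    return rows


# Made-up sample data, for illustration only.
daily = [
    {"date": "2014-01-15", "lat": 12.9, "lon": 79.1,
     "readings": {"max_temp": 30.0, "precipitation": 0.0}},
    {"date": "2014-02-10", "lat": 12.9, "lon": 79.1,
     "readings": {"max_temp": 32.0, "precipitation": 2.0}},
    {"date": "2014-07-01", "lat": 12.9, "lon": 79.1,
     "readings": {"max_temp": 36.0, "precipitation": 8.0}},
]
cases = {(12.9, 79.1, "winter"): 4}

averages = seasonal_averages(daily)
table = integrate(averages, cases)
```

In a real Hadoop deployment the grouping would be done by the framework's shuffle phase across nodes rather than by an in-memory dictionary. The resulting table, with one row per location and season, is the kind of input a regression model such as GPR could then be fitted on.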



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. School of Information Technology & Engineering, VIT University, Vellore, India
