Predictive ability of covariate-dependent Markov models and classification tree for analyzing rainfall data in Bangladesh

  • Sultan Mahmud
  • M. Ataharul Islam
Original Paper


This study compares several parametric regressive models for bivariate binary data with a machine learning technique, the classification tree. The data are the sequential occurrences of rainfall on consecutive days, with the binary outcome variable defined as the occurrence or non-occurrence of rainfall. Each pair of consecutive days is classified as rainfall on both days, rainfall on one of the two days, or no rainfall on either day. The occurrence of rainfall on consecutive days is analyzed for the period from 1980 to 2014 using statistical models with covariate dependence and a classification tree. The covariates are relative humidity, minimum temperature, maximum temperature, sea level pressure, sunshine hours, and cloud cover. Five regions of Bangladesh are considered, and one station is selected from each region on the basis of two criteria: (i) it contains fewer missing values and (ii) it is geographically representative of the regional characteristics. Several measures are used to compare the Markov chain-based models with the classification tree. For yearly data, both the Markov model and the classification tree perform satisfactorily. The seasonal data, however, show variation in rainfall. In some seasons, such as the monsoon, pre-monsoon, and post-monsoon, both models perform equally well, but in the winter season the Markov model performs poorly and the classification tree fails to work. The Markov model also performs consistently across seasons and performs better than the classification tree. These results demonstrate that covariate-dependent Markov models can be used as classifiers in place of the classification tree: their predictive ability, which rests on the Markovian assumption, is either better than or equal to that of the classification tree. The joint models also consistently show better predictive performance than the regressive model for whole-year data as well as for several seasons.
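The outcome coding and the Markovian assumption described above can be illustrated with a minimal sketch. It is not the authors' implementation: it omits the covariates entirely (the models in the study let the transition probabilities depend on covariates), and the 10-day record `wet` and the function names are hypothetical, introduced only for illustration.

```python
from collections import Counter


def pair_outcomes(wet):
    """Label each pair of consecutive days by how many of the two days were wet."""
    labels = {
        0: "no rain on both days",
        1: "rain on one day",
        2: "rain on both days",
    }
    return [labels[a + b] for a, b in zip(wet, wet[1:])]


def transition_probabilities(wet):
    """Estimate P(today = j | yesterday = i) for a first-order two-state chain.

    Covariate-free simplification: plain relative frequencies of transitions.
    """
    counts = Counter(zip(wet, wet[1:]))
    probs = {}
    for i in (0, 1):
        row_total = counts[(i, 0)] + counts[(i, 1)]
        for j in (0, 1):
            probs[(i, j)] = counts[(i, j)] / row_total if row_total else float("nan")
    return probs


# Hypothetical daily record: 1 = rainfall occurred, 0 = no rainfall
wet = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]

outcome_counts = Counter(pair_outcomes(wet))
probs = transition_probabilities(wet)

print(outcome_counts)
print(probs)
```

In a covariate-dependent version, each row of the transition matrix would instead come from a regression of today's outcome on the covariates, fitted separately for each state of the previous day.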


Daily rainfall, Markov model, Logistic regression, Model comparison, Classification tree, Predictive ability




Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Statistical Research and Training, University of Dhaka, Dhaka, Bangladesh
