
Analysis of District-Level Monsoon Rainfall Patterns in India: A Pilot Study

  • Conference paper
Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2017)


Agricultural activities in India rely heavily on monsoon rainfall during July–September every year. The India Meteorological Department has been issuing rainfall forecasts since 1886. These predictions, made at the country or broad-region level, have limited benefit because individual areas may see wide variations even when the overall average for India remains stable. This study explored the possibility of grouping districts into clusters as a more granular yet cohesive unit for rainfall forecasting, using weather and atmospheric variables for the past 12 months. Principal Component Analysis (PCA) was used to reduce data dimensionality before creating an optimal cluster solution. Subsequently, a set of cluster-level linear regression models was found to perform better than a single regression model based on the entire sample. While district-level predictions showed limited value, the sequential combination of unsupervised and supervised techniques showed promising results at an overall level. These results will serve as a strong baseline for the planned extension of this pilot study, which will use advanced machine learning techniques to further improve prediction performance.
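
The sequence described in the abstract (PCA, then clustering, then one linear regression per cluster compared against a single pooled regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, k-means stands in for the unspecified clustering method, and the choice of 3 components and 2 clusters, along with all function names, is an assumption made for illustration only.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardize columns of X and project onto the leading principal axes."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means (an assumed stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def fit_ols(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def sse(y, y_hat):
    """Sum of squared errors."""
    return float(np.sum((y - y_hat) ** 2))

# Synthetic "districts": two groups whose rainfall responds differently
# to the first predictor (purely illustrative, not real data).
rng = np.random.default_rng(42)
n, p = 200, 6
X = rng.normal(size=(n, p))
X[:100] += 3.0
slope = np.where(np.arange(n) < 100, 2.0, -1.0)
y = slope * X[:, 0] + rng.normal(scale=0.5, size=n)

scores = pca_scores(X, n_components=3)   # step 1: reduce dimensionality
labels = kmeans(scores, k=2)             # step 2: cluster the districts

# Step 3a: one pooled regression on the whole sample.
pooled_sse = sse(y, predict_ols(fit_ols(scores, y), scores))

# Step 3b: a separate regression for each cluster.
cluster_sse = 0.0
for j in np.unique(labels):
    mask = labels == j
    coef = fit_ols(scores[mask], y[mask])
    cluster_sse += sse(y[mask], predict_ols(coef, scores[mask]))

print("pooled SSE:", pooled_sse)
print("cluster-level SSE:", cluster_sse)
```

Note that an in-sample comparison like this one always favours the cluster-level models, because each cluster could reuse the pooled coefficients and do no worse. A fair comparison would evaluate both approaches on held-out years or districts.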



  1. Defined as at least 10 days ahead as per UK Meteorological Office definition.

  2. +1 was used to tackle the 0 values.

  5. Long Period Average (LPA) is defined as the average actual monsoon rainfall for 1951–2000.




This work was undertaken as part of the Master of Technology in Enterprise Business Analytics program at the Institute of Systems Science, National University of Singapore, under the guidance of Dr. Rita Chakravarti. The authors would like to thank Dr. Chakravarti for her guidance and the two anonymous reviewers for their valuable suggestions on this paper.


Corresponding author

Correspondence to Sougata Deb.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Deb, S., Acebedo, C.M.L., Yu, J., Dhanapal, G., Periasamy, N. (2017). Analysis of District-Level Monsoon Rainfall Patterns in India: A Pilot Study. In: Phon-Amnuaisuk, S., Ang, SP., Lee, SY. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2017. Lecture Notes in Computer Science, vol 10607. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69455-9

  • Online ISBN: 978-3-319-69456-6

  • eBook Packages: Computer Science (R0)
