Mining Weather Data Using Fuzzy Cluster Analysis

  • Zhijian Liu
  • Roy George


The need to analyze the vast quantities of weather data being collected has led to the development of new data mining tools and techniques. Mining these data can produce new insights into weather, climatological, and environmental trends that have both scientific and practical significance. This chapter discusses the challenges posed by weather databases and examines the use of fuzzy clustering for analyzing such data. It proposes an extension of the fuzzy K-Means clustering algorithm to account for the spatio-temporal nature of weather data. It introduces an unsupervised fuzzy clustering algorithm based on fuzzy K-Means and defines a cluster validity index that is used to determine an optimal number of clusters. These techniques are validated on weather data from the South Central US and on global climate data (sea level pressure). The algorithm is shown to identify and preserve interesting phenomena in the weather data.
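
The abstract does not spell out the chapter's spatio-temporal extension or its validity index, so the following is only a minimal sketch of the general idea under stated assumptions: it uses the standard textbook fuzzy K-Means (fuzzy c-means) update rules and, as a stand-in for the chapter's own index, the well-known Xie-Beni validity index (lower is better). The function names `fuzzy_kmeans`, `xie_beni`, and `best_k`, the fuzzifier `m = 2`, and the Euclidean distance are illustrative choices, not taken from the chapter.

```python
import math
import random


def fuzzy_kmeans(points, k, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy K-Means on a list of equal-length numeric tuples.

    Assumption: plain Euclidean distance, with no spatio-temporal weighting.
    Returns (centers, u) where u[i][j] is the membership of point i in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centroid update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update from relative distances to all centers.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], c), 1e-12) for c in centers]
            new_row = [
                1.0 / sum((dist[j] / dist[l]) ** (2.0 / (m - 1.0))
                          for l in range(k))
                for j in range(k)]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index (a stand-in validity index): compactness / separation."""
    n = len(points)
    compact = sum(u[i][j] ** m * math.dist(points[i], c) ** 2
                  for i in range(n) for j, c in enumerate(centers))
    sep = min(math.dist(a, b) ** 2
              for x, a in enumerate(centers) for b in centers[x + 1:])
    # Coincident centers give zero separation; treat that as the worst score.
    return compact / (n * sep) if sep > 0 else float("inf")


def best_k(points, candidates=range(2, 6)):
    """Run the clustering for each candidate k (k >= 2) and keep the lowest index."""
    scores = {k: xie_beni(points, *fuzzy_kmeans(points, k)) for k in candidates}
    return min(scores, key=scores.get), scores
```

Calling `best_k(points)` on a list of feature tuples (for example, hypothetical per-station temperature and pressure readings) returns the candidate cluster count with the lowest index together with all scores. Choosing the number of clusters by scanning candidates against a validity index is the general pattern the abstract describes; the specific index and distance used here are substitutes.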


Keywords: Fuzzy Cluster; Weather Data; Validity Index; Cluster Validity Index; Fuzzy Cluster Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Zhijian Liu (1)
  • Roy George (1)
  1. Army Center for Research in Information Science, Department of Computer Science, Clark Atlanta University, Atlanta
