Abstract
The problem of downscaling the effects of global-scale climate variability into predictions of local hydrology has important implications for water resource management. Our research aims to identify predictive relationships that can be used to integrate solar and ocean-atmospheric conditions into forecasts of regional water flows. In recent work we developed an induction technique called second-order table compression, in which learning is viewed as a process that transforms a table of training data into a second-order table (one whose entries are sets of atomic values) with fewer rows by merging rows in consistency-preserving ways. Here, we apply second-order table compression to generate predictive models of future water inflows to Lake Okeechobee, a primary source of water supply for south Florida. We also describe SORCER, a second-order table compression learning system, and compare its performance with three well-established data mining techniques: neural networks, decision tree learning, and association rule mining. On average, SORCER gives more accurate results than the other methods, with average accuracy between 49% and 56% in predicting inflows discretized into four ranges. We discuss the implications of these results and the practical issues in assessing results from data mining models to guide decision-making.
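To make the idea of merging rows in consistency-preserving ways concrete, the following is a minimal sketch, not the SORCER algorithm itself. It assumes a simple greedy strategy: two rows with the same class are merged by taking the union of their entries, and the merge is kept only if the merged row covers no training example of a different class. The attribute names, values, and class labels in the toy data are hypothetical and are not taken from the Lake Okeechobee data set.

```python
from itertools import combinations

def covers(row_sets, example):
    """A second-order row covers an example if each value lies in the matching set."""
    return all(value in entry for value, entry in zip(example, row_sets))

def compress(examples):
    """Greedily merge same-class rows while staying consistent with the training data.

    examples: list of (attribute_tuple, label) pairs.
    Returns a list of (tuple_of_frozensets, label) rows.
    Assumption: this greedy pairwise scheme is illustrative only.
    """
    # Start from a second-order table whose entries are singleton sets.
    rows = [(tuple(frozenset([v]) for v in attrs), label) for attrs, label in examples]
    rows = list(dict.fromkeys(rows))  # drop exact duplicates, keep order

    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(rows)), 2):
            (a, label_a), (b, label_b) = rows[i], rows[j]
            if label_a != label_b:
                continue
            # Candidate row: entry-wise union of the two rows.
            candidate = tuple(x | y for x, y in zip(a, b))
            # Reject the merge if it would cover an example of another class.
            if any(covers(candidate, attrs) and label != label_a
                   for attrs, label in examples):
                continue
            rows = [r for k, r in enumerate(rows) if k not in (i, j)]
            rows.append((candidate, label_a))
            merged = True
            break  # restart the scan over the updated table
    return rows

# Hypothetical toy data: (climate index state, solar activity) -> inflow class.
examples = [
    (("el_nino", "high"), "wet"),
    (("el_nino", "low"), "wet"),
    (("la_nina", "high"), "dry"),
    (("la_nina", "low"), "dry"),
    (("neutral", "high"), "wet"),
    (("neutral", "low"), "dry"),
]

table = compress(examples)
for entries, label in table:
    print([sorted(entry) for entry in entries], "->", label)
```

On this toy table the six training rows compress to four: the two "el_nino" rows merge, the two "la_nina" rows merge, and the "neutral" rows stay separate because merging either of them with a same-class row would cover an example of the other class.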
Cite this article
Hewett, R. Data Mining for Generating Predictive Models of Local Hydrology. Applied Intelligence 19, 157–170 (2003). https://doi.org/10.1023/A:1026005922241