Abstract
In the batch learning setting, for many commonly used attribute evaluation functions it suffices to consider only a reduced number of threshold candidates when discretizing the value range of a numerical attribute. We show that the same techniques are also efficiently applicable in the on-line learning scheme. Only constant time per example is needed to determine the changes in data grouping. Hence, one can apply multi-way splits, e.g., in the standard approach to decision tree learning from data streams. We also briefly consider the modifications needed to cope with drifting concepts. Our empirical evaluation demonstrates that the reduction in threshold candidates is often high for the important attributes. In a data stream, both the number of potential cut points and the reduced number of threshold candidates are observed to grow logarithmically.
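To make the idea of a reduced candidate set concrete, the following is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the reduction in question is the classic boundary-point criterion: a cut between two adjacent attribute values can be skipped when all examples on both sides of it belong to one and the same class. The class `BoundaryPointTracker` and its method names are hypothetical. Only the per-example count update is constant time here; the candidate lists are recomputed by a full sorted scan, which the paper's on-line technique is meant to avoid.

```python
from collections import Counter, defaultdict


class BoundaryPointTracker:
    """Toy tracker (hypothetical): per-value class counts and cut candidates."""

    def __init__(self):
        # Maps each distinct attribute value to a Counter of class labels.
        self.bins = defaultdict(Counter)

    def add(self, value, label):
        # Recording one example is a single hash-map update.
        self.bins[value][label] += 1

    def cut_points(self):
        # Every midpoint between adjacent distinct values is a potential cut.
        values = sorted(self.bins)
        return [(a + b) / 2 for a, b in zip(values, values[1:])]

    def boundary_points(self):
        # Drop a cut when both neighbouring bins are pure and share a class.
        values = sorted(self.bins)
        kept = []
        for a, b in zip(values, values[1:]):
            left, right = self.bins[a], self.bins[b]
            same_pure = (
                len(left) == 1 and len(right) == 1 and set(left) == set(right)
            )
            if not same_pure:
                kept.append((a + b) / 2)
        return kept


# Illustrative stream of (value, class label) pairs; the data is made up.
tracker = BoundaryPointTracker()
stream = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b"), (4.0, "a"), (5.0, "a")]
for value, label in stream:
    tracker.add(value, label)

all_cuts = tracker.cut_points()
reduced_cuts = tracker.boundary_points()
```

In this toy stream the cut between the values 1.0 and 2.0 is dropped because both neighbouring bins contain only class "a", while the remaining cuts survive because at least one neighbouring bin differs in class or is mixed.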
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Elomaa, T., Lehtinen, P. (2008). Maintaining Optimal Multi-way Splits for Numerical Attributes in Data Streams. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_49
DOI: https://doi.org/10.1007/978-3-540-68125-0_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)