Incrementally Optimized Decision Tree for Mining Imperfect Data Streams
The Very Fast Decision Tree (VFDT) is one of the most important classification algorithms for real-time data stream mining. However, imperfections in data streams, such as noise and imbalanced class distributions, occur in real-world applications and degrade the performance of VFDT. Traditional sampling techniques and post-pruning may be impractical for a non-stopping data stream. To deal with the adverse effects of imperfect data streams, we propose an incremental optimization model that can be integrated into the decision tree model for data stream classification. The resulting algorithm, the Incrementally Optimized Very Fast Decision Tree (I-OVFDT), balances prediction accuracy, tree size and learning time, and dynamically reduces both error and tree size. Furthermore, two new Functional Tree Leaf strategies are introduced for I-OVFDT that result in superior performance compared to VFDT and its variants. The new model works especially well for imperfect data streams. I-OVFDT is an anytime algorithm that can be integrated into existing VFDT extensions that use the Hoeffding bound for node splitting. The experimental results show that I-OVFDT achieves higher accuracy and a more compact tree than other existing data stream classification methods.
Keywords: Data stream mining, decision tree, classification, optimized very fast decision tree, incremental optimization
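For readers unfamiliar with the Hoeffding bound that the abstract refers to, the following is a minimal sketch of the standard split test used by VFDT-style trees. It is not the I-OVFDT optimization itself, which the abstract does not specify. The function names, the `tie_threshold` default and the example values are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).

    value_range (R): range of the split heuristic, e.g. log2(#classes) for
                     information gain.
    delta:           allowed probability that the chosen attribute is wrong.
    n:               number of examples observed at the leaf.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf has seen enough examples to split.

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon has shrunk below the tie threshold (assumed default),
    meaning the two candidates are too close to be worth waiting for.
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain > eps) or (eps < tie_threshold)


if __name__ == "__main__":
    # Hypothetical heuristic values for a two-class problem (R = 1).
    print(should_split(0.30, 0.10, value_range=1.0, delta=1e-7, n=1000))
    print(should_split(0.30, 0.28, value_range=1.0, delta=1e-7, n=1000))
```

Because the bound shrinks as more examples arrive at a leaf, a split that is rejected early can be accepted later without revisiting past data, which is what makes the test suitable for an unbounded stream.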