Incrementally Optimized Decision Tree for Mining Imperfect Data Streams

  • Hang Yang
  • Simon Fong
Part of the Communications in Computer and Information Science book series (CCIS, volume 293)

Abstract

The Very Fast Decision Tree (VFDT) is one of the most important classification algorithms for real-time data stream mining. However, real-world data streams are often imperfect: they contain noise and imbalanced class distributions, and these imperfections degrade the performance of VFDT. Traditional sampling techniques and post-pruning may be impractical for a data stream that never stops. To counter the adverse effects of imperfect data streams, we propose an incremental optimization model that can be integrated into the decision tree model for data stream classification. The resulting algorithm, the Incrementally Optimized Very Fast Decision Tree (I-OVFDT), balances performance in terms of prediction accuracy, tree size and learning time, and dynamically reduces both error and tree size. Furthermore, I-OVFDT is extended with two new Functional Tree Leaf strategies that give it superior performance compared to VFDT and its variant algorithms. The new model works especially well for imperfect data streams. I-OVFDT is an anytime algorithm that can be integrated into existing VFDT-extended algorithms that use the Hoeffding bound for node splitting. The experimental results show that I-OVFDT achieves higher accuracy and a more compact tree than other existing data stream classification methods.
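For readers unfamiliar with the Hoeffding-bound node splitting that VFDT and its extensions share, the following is a minimal sketch of the standard VFDT split test, not the I-OVFDT optimization itself. With probability 1 − δ, the observed mean of a variable with range R over n samples lies within ε = sqrt(R² ln(1/δ) / (2n)) of its true mean, so a leaf splits once the best attribute's gain exceeds the runner-up's by more than ε, or once ε falls below a tie threshold τ. The function names, the fixed τ = 0.05 and δ = 1e-7 are illustrative assumptions (common defaults in Hoeffding-tree implementations); the sketch does not model how I-OVFDT adjusts this process incrementally.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with range `value_range` lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 num_classes: int, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Hypothetical VFDT-style split test for a leaf that has seen n examples.

    For information gain, the range R is log2(num_classes).
    Split if the best attribute beats the runner-up by more than epsilon,
    or if epsilon has shrunk below the (assumed, fixed) tie threshold,
    meaning the two candidates are too close to be worth waiting on.
    """
    value_range = math.log2(num_classes)
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain > epsilon) or (epsilon < tie_threshold)


if __name__ == "__main__":
    # Two classes, 200 examples: epsilon is about 0.20.
    print(round(hoeffding_bound(1.0, 1e-7, 200), 4))
    print(should_split(0.50, 0.10, 200, 2))     # clear winner: split
    print(should_split(0.30, 0.25, 200, 2))     # too close, keep waiting
    print(should_split(0.30, 0.299, 5000, 2))   # tie broken by threshold
```

Because ε shrinks as 1/sqrt(n), a near-tie between two attributes would otherwise stall a leaf indefinitely; the tie threshold caps that wait, which is the knob that tree-size and accuracy trade-offs such as those discussed above act upon.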

Keywords

Data stream mining, decision tree classification, optimized very fast decision tree, incremental optimization

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hang Yang (1)
  • Simon Fong (1)
  1. Department of Computer and Information Science, University of Macau, Taipa, Macau SAR, China