Advances in Knowledge Discovery and Data Mining

Volume 5012 of the series Lecture Notes in Computer Science, pp. 296-307

Handling Numeric Attributes in Hoeffding Trees

  • Bernhard Pfahringer, University of Waikato
  • Geoffrey Holmes, University of Waikato
  • Richard Kirkby, University of Waikato




For conventional machine learning classification algorithms, handling numeric attributes is relatively straightforward. Both unsupervised and supervised solutions exist: they either segment the data into pre-defined bins or sort the data and search for the best split points. Unfortunately, none of these solutions carries over particularly well to a data stream environment. Several authors have proposed solutions for data streams, but as yet these have not been compared empirically. In this paper we investigate a range of methods for multi-class tree-based classification in which numeric attributes are handled as the tree is constructed. To this end, we extend an existing approach based on a simple Gaussian approximation. We then compare this method with four approaches from the literature, arriving at eight final algorithm configurations for testing. The solutions range from perfectly accurate and memory-intensive to highly approximate. All methods are tested using the Hoeffding tree classification algorithm. Surprisingly, the experimental comparison shows that the most approximate methods produce the most accurate trees, because they allow the tree to grow faster.
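To make the idea of a Gaussian approximation concrete, here is a minimal sketch under stated assumptions: each class's values of one numeric attribute are summarized by a normal distribution whose mean and variance are updated incrementally (Welford's method), and a fixed number of evenly spaced candidate thresholds between the observed minimum and maximum are scored by information gain, using the normal CDF to estimate how many examples of each class fall on each side. The names `GaussianEstimator` and `best_split`, the default of 10 candidates, and the choice of information gain as the split criterion are illustrative assumptions, not the paper's exact algorithm.

```python
import math


class GaussianEstimator:
    """Constant-memory summary of one class's values for one numeric attribute.

    Hypothetical helper: keeps count, mean, sum of squared deviations (M2),
    and min/max, updated per example with Welford's method.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0
        self.min = math.inf
        self.max = -math.inf

    def add(self, x):
        # Welford's incremental update of mean and M2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    def std(self):
        # Sample standard deviation; 0.0 when fewer than two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def prob_below(self, t):
        # P(X <= t) under the fitted normal distribution.
        sd = self.std()
        if sd == 0.0:
            return 1.0 if self.mean <= t else 0.0
        return 0.5 * (1.0 + math.erf((t - self.mean) / (sd * math.sqrt(2.0))))


def entropy(counts):
    """Shannon entropy (bits) of a vector of possibly fractional class counts."""
    total = sum(counts)
    if total <= 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def best_split(estimators, num_candidates=10):
    """Return (threshold, info_gain) for the best evenly spaced candidate.

    estimators: dict mapping class label -> GaussianEstimator.
    Returns (None, 0.0) if no candidate yields positive gain.
    """
    seen = [e for e in estimators.values() if e.n > 0]
    lo = min(e.min for e in seen)
    hi = max(e.max for e in seen)
    classes = list(estimators)
    parent = [estimators[c].n for c in classes]
    total = sum(parent)
    h_parent = entropy(parent)
    best_t, best_gain = None, 0.0
    for i in range(1, num_candidates + 1):
        t = lo + (hi - lo) * i / (num_candidates + 1)
        # Estimated per-class counts on each side of threshold t.
        left = [estimators[c].n * estimators[c].prob_below(t) for c in classes]
        right = [estimators[c].n - l for c, l in zip(classes, left)]
        h_split = (sum(left) * entropy(left) + sum(right) * entropy(right)) / total
        gain = h_parent - h_split
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


if __name__ == "__main__":
    est = {"a": GaussianEstimator(), "b": GaussianEstimator()}
    for x in [1, 2, 3, 4, 5]:
        est["a"].add(x)
    for x in [10, 11, 12, 13, 14]:
        est["b"].add(x)
    print(best_split(est))
```

The appeal of this design for streams is that memory per class and attribute stays fixed (five numbers) no matter how many examples arrive, at the cost of only approximating the true class distributions. In a Hoeffding tree, gain estimates like these would additionally be compared using the Hoeffding bound before a split is committed; that step is omitted from this sketch.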