A Study of Random Linear Oracle Ensembles

  • Amir Ahmad
  • Gavin Brown
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5519)

Abstract

Random Linear Oracle (RLO) ensembles of Naive Bayes classifiers show excellent performance [12]. In this paper, we investigate the reasons for the success of RLO ensembles. Our study suggests that the decomposition of most classes of the dataset into two subclasses each is the reason for the success of the RLO method. This leads to the development of a new output-manipulation-based ensemble method, Random Subclasses (RS). In the proposed method, we use the RLO framework to create new subclasses from each subset of data points that belong to the same class, and we treat each subclass as a class of its own. The comparative study suggests that RS performs similarly to the RLO method, whereas RS is statistically better than or similar to Bagging and AdaBoost.M1 for most of the datasets. The similar performance of RLO and RS suggests that the creation of local structures (subclasses) is the main reason for the success of RLO. Another conclusion of this study is that RLO is more useful for classifiers with limited flexibility in their class boundaries (linear classifiers, for example). These classifiers cannot learn complex class boundaries. Creating subclasses produces new class boundaries that are easier to learn.
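The subclass-creation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the random hyperplane is drawn, so the sketch assumes it is the perpendicular bisector of two randomly chosen training points, and the function names (`random_hyperplane`, `make_subclasses`, `to_original_class`) are hypothetical.

```python
import random


def random_hyperplane(points, rng):
    """Return (w, offset) for a random hyperplane w.x = offset.

    Assumption: the hyperplane is the perpendicular bisector of two
    randomly sampled training points (not stated in the abstract).
    """
    a, b = rng.sample(points, 2)
    w = [ai - bi for ai, bi in zip(a, b)]              # normal vector
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]      # midpoint lies on the plane
    offset = sum(wi * mi for wi, mi in zip(w, mid))
    return w, offset


def side(x, w, offset):
    """Return 1 if x lies on the positive side of the hyperplane, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= offset else 0


def make_subclasses(X, y, rng):
    """Relabel each point as (original class, side of the hyperplane).

    Every original class is thereby split into at most two subclasses,
    and each subclass is treated as a class of its own.
    """
    w, offset = random_hyperplane(X, rng)
    sub_labels = [(label, side(x, w, offset)) for x, label in zip(X, y)]
    return sub_labels, (w, offset)


def to_original_class(sub_label):
    """Map a predicted subclass back to its original class."""
    return sub_label[0]


X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = ["a", "a", "a", "b", "b", "b"]
sub_labels, hyperplane = make_subclasses(X, y, random.Random(0))
```

Under these assumptions, one ensemble member would be a base classifier (Naive Bayes, for example) trained on `sub_labels`, with its predictions mapped back through `to_original_class`; repeating the procedure with different random hyperplanes and combining the mapped predictions would give the ensemble.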

Keywords

Classifier Ensemble, Naive Bayes, Clusters, Subclasses

References

  1. Alpaydin, E.: Combined 5 x 2 cv F Test for Comparing Supervised Classification Learning Algorithms. Neural Computation 11(8), 1885–1892 (1999)
  2. Breiman, L.: Bagging Predictors. Machine Learning 24(2), 123–140 (1996)
  3. Dietterich, T.G.: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation 10, 1895–1923 (1998)
  4. Domingos, P., Pazzani, M.: On the Optimality of the Simple Bayesian Classifier under Zero-one Loss. Machine Learning 29, 103–130 (1997)
  5. Eick, C.F., Nidal, Z.: Using Supervised Clustering to Enhance Classifiers. In: Hacid, M.-S., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds.) ISMIS 2005. LNCS, vol. 3488, pp. 248–256. Springer, Heidelberg (2005)
  6. Freund, Y.: Boosting a Weak Learning Algorithm By Majority. Information and Computation 121(2), 256–285 (1995)
  7. Freund, Y., Schapire, R.E.: A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences 55(1), 119–139 (1997)
  8. Hand, D.J., Yu, K.: Idiot’s Bayes - Not so Stupid After All. International Statistical Review 69, 385–399 (2001)
  9. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, Hoboken (2004)
  10. Kuncheva, L.I., Rodriguez, J.J.: Classifier Ensembles with a Random Linear Oracle. IEEE Trans. on Knowledge and Data Engineering 19(4), 500–508 (2007)
  11. Rish, I., Hellerstein, J., Jayram, T.: An Analysis of Naive Bayes on Low-Entropy Distributions. Tech. Report RC91994, IBM T. J. Watson Research Center (2001)
  12. Rodriguez, J.J., Kuncheva, L.I.: Naive Bayes Ensembles with a Random Oracle. In: Haindl, M., Kittler, J., Roli, F. (eds.) MCS 2007. LNCS, vol. 4472, pp. 450–458. Springer, Heidelberg (2007)
  13. Tumer, K., Ghosh, J.: Error Correlation and Error Reduction in Ensemble Classifiers. Connect. Sci. 8(3), 385–404 (1996)
  14. Vilalta, R., Achari, M.R., Eick, C.F.: Class Decomposition via Clustering: A New Framework for Low Variance Classifiers. In: ICDM 2003 (2003)
  15. Vilalta, R., Rish, I.: A Decomposition of Classes via Clustering to Explain and Improve Naive Bayes. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) ECML 2003. LNCS, vol. 2837, pp. 444–455. Springer, Heidelberg (2003)
  16. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Amir Ahmad (1)
  • Gavin Brown (1)
  1. School of Computer Science, University of Manchester, Manchester, UK