
Random Projection Ensemble Classifiers

  • Alon Schclar
  • Lior Rokach
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 24)

Abstract

We introduce a novel ensemble model based on random projections. The contribution of random projections is twofold. First, their randomness provides the diversity that is required for the construction of an ensemble model. Second, random projections embed the original dataset into a space of lower dimension while preserving its geometrical structure up to a given distortion. This reduces the computational complexity of constructing the model as well as the complexity of the classification. Furthermore, the dimensionality reduction removes noisy features from the data and represents the information inherent in the raw data with a small number of features; this noise removal increases the accuracy of the classifier.
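The distortion guarantee referred to above is commonly formalized by the Johnson-Lindenstrauss lemma (the abstract does not state the bound explicitly, so this is the standard reading rather than a quote): for any 0 < ε < 1 and any n points in ℝ^d, a suitably scaled random linear map f: ℝ^d → ℝ^k with k = O(ε⁻² log n) satisfies, with high probability and for every pair of points u, v,

  (1 − ε)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)‖u − v‖²,

so all pairwise distances are preserved up to a factor of 1 ± ε.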

The proposed scheme was tested using WEKA-based procedures applied to 16 benchmark datasets from the UCI repository.
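To make the scheme concrete, the following is a minimal sketch of such an ensemble in Python. It assumes Gaussian random projections, decision trees as the base inducer, and plain majority voting, with scikit-learn standing in for the WEKA procedures mentioned above; the paper's actual projection distribution, base classifier, and combination rule may differ.

import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.tree import DecisionTreeClassifier


class RandomProjectionEnsemble:
    """Ensemble of base classifiers, each trained on its own random projection."""

    def __init__(self, n_members=10, n_components=5, random_state=0):
        self.n_members = n_members        # number of ensemble members
        self.n_components = n_components  # target (reduced) dimension
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.RandomState(self.random_state)
        self.members_ = []
        for _ in range(self.n_members):
            # Each member draws its own projection matrix: the randomness
            # supplies the diversity, the projection the dimension reduction.
            proj = GaussianRandomProjection(n_components=self.n_components,
                                            random_state=rng.randint(2**31 - 1))
            reduced = proj.fit_transform(X)
            clf = DecisionTreeClassifier(random_state=rng.randint(2**31 - 1))
            clf.fit(reduced, y)
            self.members_.append((proj, clf))
        return self

    def predict(self, X):
        # Combine the members by plain majority vote (assumes integer labels).
        votes = np.array([clf.predict(proj.transform(X))
                          for proj, clf in self.members_]).astype(int)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(),
                                   axis=0, arr=votes)


# Tiny usage example on a toy dataset (not one of the paper's 16 UCI sets):
if __name__ == "__main__":
    from sklearn.datasets import load_iris
    X, y = load_iris(return_X_y=True)
    model = RandomProjectionEnsemble(n_members=10, n_components=2).fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())

Decision trees are a natural base inducer for this kind of scheme because they are fast and unstable, so small perturbations of the input representation translate into diverse ensemble members.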

Keywords

Ensemble methods · Random projections · Classification · Pattern recognition



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Alon Schclar¹
  • Lior Rokach¹

  1. Department of Information System Engineering and Deutsche Telekom Research Laboratories, Ben-Gurion University, Beer-Sheva, Israel
