Scalable Expressive Ensemble Learning Using Random Prism: A MapReduce Approach

  • Frederic Stahl
  • David May
  • Hugo Mills
  • Max Bramer
  • Mohamed Medhat Gaber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9070)


The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as in commercial applications. Ensemble learners are often applied to reduce the influence of noise in the data. However, most ensemble learners are based on decision tree classifiers, which are themselves affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular decision-tree-based Random Forests classifier. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism does not scale well to large training data. This paper presents a thorough discussion of Random Prism and of a recently proposed parallel version of it, called Parallel Random Prism, which is based on the MapReduce programming paradigm. The paper provides, for the first time, a theoretical analysis of the proposed technique and an in-depth experimental study showing that Parallel Random Prism scales well with the number of training examples, the number of data features and the number of processors. The expressiveness of the decision rules that our technique produces makes it a natural choice for Big Data applications in which informed decision making increases the user’s trust in the system.
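The overall scheme described in the abstract, bagging-style base classifiers trained independently (the map phase) and combined by majority vote (the reduce phase), can be sketched in a few lines. The sketch below is illustrative only: it substitutes a simplified one-rule base learner for the PrismTC rule induction that Random Prism actually uses, and plain Python functions stand in for Hadoop mappers and reducers; all function names here are hypothetical.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # Bagging: sample with replacement, same size as the original set
    return [rng.choice(data) for _ in data]

def induce_one_rule(train):
    # Hypothetical stand-in for a PrismTC base learner: pick the single
    # (attribute, value) test whose covered examples are purest; predict
    # the covered majority class, else the majority class of the rest.
    best = None
    n_attrs = len(train[0][0])
    for a in range(n_attrs):
        for v in {x[a] for x, _ in train}:
            covered = [y for x, y in train if x[a] == v]
            rest = [y for x, y in train if x[a] != v]
            if not covered or not rest:
                continue
            purity = Counter(covered).most_common(1)[0][1] / len(covered)
            if best is None or purity > best[0]:
                best = (purity, a, v,
                        Counter(covered).most_common(1)[0][0],
                        Counter(rest).most_common(1)[0][0])
    _, a, v, cls, default = best
    return lambda x: cls if x[a] == v else default

def map_phase(data, n_learners, seed=0):
    # Each "mapper" trains one base classifier on its own bootstrap sample;
    # in the real system these would run on separate MapReduce workers.
    rng = random.Random(seed)
    return [induce_one_rule(bootstrap(data, rng)) for _ in range(n_learners)]

def reduce_phase(classifiers, x):
    # The "reducer" combines the base predictions by majority vote
    votes = Counter(c(x) for c in classifiers)
    return votes.most_common(1)[0][0]

# Toy categorical dataset: the class follows the first attribute
data = [(("a", "p"), 1), (("a", "q"), 1), (("b", "p"), 0), (("b", "q"), 0)] * 5
ensemble = map_phase(data, n_learners=7)
print(reduce_phase(ensemble, ("a", "q")))  # expected: 1
```

Because each bootstrap sample and base classifier is independent of the others, the map phase parallelizes trivially, which is the property that lets Parallel Random Prism scale with the number of processors.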


Keywords: Random Forest · Base Classifier · Computing Node · Training Instance · Ensemble Classifier



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Frederic Stahl (1)
  • David May (2)
  • Hugo Mills (1)
  • Max Bramer (3)
  • Mohamed Medhat Gaber (4)
  1. School of Systems Engineering, University of Reading, Reading, UK
  2. Real Time Information Systems Ltd, Chichester, UK
  3. School of Computing, University of Portsmouth, Portsmouth, UK
  4. School of Computing Science and Digital Media, Robert Gordon University, Aberdeen, UK
