Abstract
Ensemble learning techniques generate multiple classifiers, so-called base classifiers, whose classification results are combined to increase the overall classification accuracy. In most ensemble classifiers the base classifiers are based on the Top Down Induction of Decision Trees (TDIDT) approach. An alternative approach to the induction of rule-based classifiers is the Prism family of algorithms. Prism algorithms produce modular classification rules that do not necessarily fit into a decision tree structure. Prism classification rulesets achieve a classification accuracy that is comparable with, and sometimes higher than, that of decision tree classifiers when the data is noisy and large. Yet Prism still suffers from overfitting on noisy and large datasets. In practice ensemble techniques tend to reduce overfitting; however, no ensemble learner exists for modular classification rule inducers such as the Prism family of algorithms. This article describes the first development of an ensemble learner based on the Prism family of algorithms, with the aim of enhancing Prism's classification accuracy by reducing overfitting.
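The general idea of combining base classifiers can be illustrated with a minimal sketch. It assumes bagging-style bootstrap sampling and majority voting as the combination scheme, and it uses a hypothetical single-attribute rule learner as a stand-in base classifier. It is not the Random Prism algorithm described in the paper, and all function names are illustrative.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) instances with replacement (bagging-style sampling)."""
    return [rng.choice(data) for _ in data]


def train_single_attribute_rules(sample, feature_index):
    """Hypothetical stand-in base learner: map each value of one attribute
    to the class most often seen with it; unseen values get the sample's
    most frequent class."""
    counts = {}
    for features, label in sample:
        counts.setdefault(features[feature_index], Counter())[label] += 1
    default = Counter(label for _, label in sample).most_common(1)[0][0]
    table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return lambda features: table.get(features[feature_index], default)


def train_ensemble(data, n_classifiers, seed=0):
    """Train each base classifier on its own bootstrap sample and on a
    randomly chosen attribute (an assumption made for this sketch)."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    ensemble = []
    for _ in range(n_classifiers):
        sample = bootstrap_sample(data, rng)
        feature_index = rng.randrange(n_features)
        ensemble.append(train_single_attribute_rules(sample, feature_index))
    return ensemble


def predict(ensemble, features):
    """Combine the base classifiers' outputs by majority vote."""
    votes = Counter(classifier(features) for classifier in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: (attribute values, class label).
    data = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
            (("rainy", "mild"), "yes"), (("rainy", "cool"), "yes")]
    ensemble = train_ensemble(data, n_classifiers=11)
    print(predict(ensemble, ("rainy", "mild")))
```

Because each base classifier sees a different sample, their individual errors tend to differ, which is why voting can reduce the overfitting of any single classifier.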
Copyright information
© 2011 Springer-Verlag London Limited
About this paper
Cite this paper
Stahl, F., Bramer, M. (2011). Random Prism: An Alternative to Random Forests. In: Bramer, M., Petridis, M., Nolle, L. (eds) Research and Development in Intelligent Systems XXVIII. SGAI 2011. Springer, London. https://doi.org/10.1007/978-1-4471-2318-7_1
DOI: https://doi.org/10.1007/978-1-4471-2318-7_1
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2317-0
Online ISBN: 978-1-4471-2318-7
eBook Packages: Computer Science, Computer Science (R0)