Parallel induction algorithms for data mining
In the last decade there has been explosive growth in the generation and collection of data. However, the quality of the information inferred from this voluminous data has not kept pace with its size. One reason is that the computational complexity of the algorithms used to extract information from the data is normally proportional to the number of input data items, resulting in prohibitive execution times on large data sets. Parallelism is one solution to this problem. In this paper we present preliminary results of experiments in parallelising C4.5, a classification-rule learning system that uses decision trees as its model representation. C4.5 has been used as a base model for investigating methods of parallelising induction algorithms. The experiments assess the potential for reducing execution time by exploiting the parallelism in the algorithm.
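The abstract does not say which parallelisation strategy the experiments use. As a hedged illustration of one common data-parallel approach (an assumption, not necessarily the authors' method), the sketch below splits the training records across workers. Each worker counts (attribute value, class) pairs on its share, and the merged counts are used to evaluate the gain-ratio criterion that C4.5 uses to choose a test at a tree node. This counting pass is the part whose cost grows with the number of input items. All function names (`local_counts`, `parallel_gain_ratio`, etc.) are hypothetical. The sketch handles only categorical attributes without missing values and omits C4.5's other refinements.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2

# Hypothetical record format: (attribute_value, class_label) for one
# categorical attribute. Continuous attributes and missing values are omitted.

def local_counts(partition):
    """Count (attribute value, class) pairs on one data partition."""
    return Counter(partition)

def merge_counts(tables):
    """Sum the per-partition count tables into global counts."""
    total = Counter()
    for table in tables:
        total.update(table)  # Counter.update adds counts
    return total

def entropy(counts):
    """Shannon entropy (bits) of a collection of non-negative counts."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def gain_ratio(counts):
    """C4.5-style gain ratio (information gain / split information)."""
    n = sum(counts.values())
    by_class = Counter()
    by_value = {}
    for (value, label), c in counts.items():
        by_class[label] += c
        by_value.setdefault(value, Counter())[label] += c
    base = entropy(by_class.values())
    remainder = sum(sum(cc.values()) / n * entropy(cc.values())
                    for cc in by_value.values())
    split_info = entropy([sum(cc.values()) for cc in by_value.values()])
    gain = base - remainder
    return gain / split_info if split_info > 0 else 0.0

def parallel_gain_ratio(records, n_workers=4):
    """Partition the records, count each partition concurrently, then merge."""
    chunk = max(1, -(-len(records) // n_workers))  # ceiling division
    partitions = [records[i:i + chunk] for i in range(0, len(records), chunk)]
    # Threads only show the structure; in CPython, real speed-up would need
    # processes or separate machines holding their own partitions.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        tables = list(pool.map(local_counts, partitions))
    return gain_ratio(merge_counts(tables))

records = [("a", "yes")] * 4 + [("b", "no")] * 4
print(parallel_gain_ratio(records, n_workers=3))  # → 1.0
```

Only the small count tables, not the records, need to be combined. This is why this style of split evaluation parallelises well. The result is identical to computing `gain_ratio(Counter(records))` serially.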
Published in:
- Book title: Advances in Intelligent Data Analysis: Reasoning about Data
- Book subtitle: Second International Symposium, IDA-97, London, UK, August 4–6, 1997, Proceedings
- Pages: 437–445
- Series: Lecture Notes in Computer Science
- Publisher: Springer Berlin Heidelberg
- Copyright holder: Springer-Verlag Berlin Heidelberg