Porting Decision Tree Algorithms to Multicore Using FastFlow

  • Marco Aldinucci
  • Salvatore Ruggieri
  • Massimo Torquati
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6321)

Abstract

The entire computer hardware industry has embraced multicores. On these machines, extreme optimisation of sequential algorithms is no longer sufficient to harness the machine's full power, which can only be exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them well suited to parallelisation. This paper presents an approach for the easy-yet-efficient porting of an implementation of the C4.5 algorithm to multicores. The parallel port requires minimal changes to the original sequential code and achieves up to a 7× speedup on an Intel dual quad-core machine.

Keywords

parallel classification; C4.5; multicores; structured parallel programming; streaming

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Marco Aldinucci (1)
  • Salvatore Ruggieri (2)
  • Massimo Torquati (2)
  1. Computer Science Department, University of Torino, Italy
  2. Computer Science Department, University of Pisa, Italy
