Decision Tree Induction Methods for Distributed Environment

Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 59)


As the amount of information grows rapidly, there is strong interest in efficient distributed computing systems, including Grids, public-resource computing systems, P2P systems, and cloud computing. In this paper we take a detailed look at the problem of modeling and optimizing network computing systems for parallel decision tree induction methods. First, we present a comprehensive discussion of these induction methods, with a special focus on their parallel versions. Next, we propose a generic optimization model of a network computing system that can be used for a distributed implementation of parallel decision trees. To illustrate our work, we provide results of numerical experiments showing that the distributed approach enables a significant improvement in system throughput.


decision tree, parallel machine learning, distributed computing





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  1. Chair of Systems and Computer Networks, Wrocław University of Technology, Wrocław, Poland
