Parallel Data Mining Revisited. Better, Not Faster

  • Zaenal Akbar
  • Violeta N. Ivanova
  • Michael R. Berthold
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7619)

Abstract

In this paper we argue that parallel and/or distributed compute resources can be used differently: instead of focusing on speeding up algorithms, we propose to focus on improving accuracy. In a nutshell, the goal is to tune data mining algorithms to produce better results in the same time rather than producing similar results a lot faster. We discuss a number of generic ways of tuning data mining algorithms and elaborate on two prominent examples in more detail. A series of exemplary experiments is used to illustrate the effect such use of parallel resources can have.
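The core idea is to spend parallel compute on exploring a wider portion of the search space within the same wall-clock budget, rather than on finishing sooner. The sketch below is a hypothetical illustration of that principle only, not the algorithms elaborated in the paper: a greedy search whose per-step candidate pool grows with the number of available workers, so extra cores buy a better solution instead of a shorter runtime. The objective `score`, the perturbation scheme, and the `width` parameter are assumptions made for this example.

```python
# Hypothetical sketch (not the paper's algorithm): use parallel workers to widen
# the candidate pool that a greedy search evaluates per step, so a fixed
# wall-clock budget yields a better result rather than a faster one.
import random
from concurrent.futures import ProcessPoolExecutor


def score(candidate, data):
    """Assumed stand-in for a model-quality measure (higher is better)."""
    return -sum((x - candidate) ** 2 for x in data)


def greedy_step(current, data, width, pool):
    """Evaluate `width` random refinements in parallel and keep the best one."""
    candidates = [current + random.uniform(-1.0, 1.0) for _ in range(width)]
    scores = list(pool.map(score, candidates, [data] * len(candidates)))
    best_score, best_candidate = max(zip(scores, candidates))
    return best_candidate if best_score > score(current, data) else current


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(5.0, 2.0) for _ in range(1000)]
    estimate = 0.0
    with ProcessPoolExecutor() as pool:
        for _ in range(20):
            # With more cores, `width` can grow without increasing the time per
            # step: the extra resources improve the search, not the speed.
            estimate = greedy_step(estimate, data, width=8, pool=pool)
    print(f"final estimate: {estimate:.3f}")
```

With a single worker the loop degenerates to plain greedy hill climbing; adding workers lets `width` grow at roughly constant time per step, which is the "better results in the same time" trade-off described in the abstract.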


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Zaenal Akbar¹
  • Violeta N. Ivanova¹
  • Michael R. Berthold¹

  1. Nycomed-Chair for Bioinformatics and Information Mining, Dept. of Computer and Information Science, University of Konstanz, Konstanz, Germany
