Parallel Data Mining Revisited. Better, Not Faster

  • Conference paper
Advances in Intelligent Data Analysis XI (IDA 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7619)

Abstract

In this paper we argue that parallel and/or distributed compute resources can be used differently: instead of focusing on speeding up algorithms, we propose to focus on improving accuracy. In a nutshell, the goal is to tune data mining algorithms to produce better results in the same time rather than producing similar results a lot faster. We discuss a number of generic ways of tuning data mining algorithms and elaborate on two prominent examples in more detail. A series of exemplary experiments is used to illustrate the effect such use of parallel resources can have.
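The abstract does not spell out the tuning strategies, so the following is only an illustrative sketch of the "better, not faster" idea and not the authors' method. It assumes a greedy set cover as the base algorithm and a hypothetical `width` parameter: instead of committing to the single best choice at each step, the search keeps the `width` most promising partial solutions. Each partial solution could be expanded by a separate worker, so the extra search costs roughly the same wall-clock time on parallel hardware; the sketch runs sequentially for simplicity. The function name `widened_set_cover` and the toy instance are assumptions made for this example.

```python
from typing import FrozenSet, Iterable, List, Sequence, Tuple


def widened_set_cover(
    universe: Iterable[int],
    subsets: Sequence[Iterable[int]],
    width: int = 1,
) -> List[int]:
    """Return indices of subsets that cover `universe`.

    width=1 behaves like the plain greedy heuristic; larger widths keep
    several partial covers alive per step (a beam search). Illustrative only.
    """
    target = frozenset(universe)
    sets = [frozenset(s) & target for s in subsets]
    if frozenset().union(*sets) != target:
        raise ValueError("subsets do not cover the universe")
    if not target:
        return []

    # Each beam entry is (chosen subset indices, elements covered so far).
    beam: List[Tuple[FrozenSet[int], FrozenSet[int]]] = [(frozenset(), frozenset())]
    while True:
        candidates = {}
        for chosen, covered in beam:  # each entry could go to its own worker
            for i, s in enumerate(sets):
                if i in chosen:
                    continue
                new_covered = covered | s
                if len(new_covered) == len(covered):
                    continue  # subset adds nothing new to this partial cover
                candidates[chosen | {i}] = new_covered

        # Rank by coverage (descending); break ties by index for determinism.
        ranked = sorted(
            candidates.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0]))
        )
        for chosen, covered in ranked:
            if covered == target:
                return sorted(chosen)
        beam = ranked[:width]


# Toy instance on which the greedy choice (the largest subset) is misleading.
toy_sets = [{1, 2, 3, 4}, {1, 3, 5}, {2, 4, 6}]
greedy_cover = widened_set_cover(range(1, 7), toy_sets, width=1)
wider_cover = widened_set_cover(range(1, 7), toy_sets, width=2)
print(len(greedy_cover), len(wider_cover))
```

On this toy instance the greedy run needs all three subsets, while the run with `width=2` finds the two-subset cover in the same number of search steps. A wider search does not guarantee a better result on every instance; the sketch only shows how additional parallel effort can be spent on solution quality rather than on speed.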

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Akbar, Z., Ivanova, V.N., Berthold, M.R. (2012). Parallel Data Mining Revisited. Better, Not Faster. In: Hollmén, J., Klawonn, F., Tucker, A. (eds) Advances in Intelligent Data Analysis XI. IDA 2012. Lecture Notes in Computer Science, vol 7619. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34156-4_4

  • DOI: https://doi.org/10.1007/978-3-642-34156-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34155-7

  • Online ISBN: 978-3-642-34156-4

  • eBook Packages: Computer Science, Computer Science (R0)
