Skip to main content

Widened KRIMP: Better Performance through Diverse Parallelism

  • Conference paper
Advances in Intelligent Data Analysis XIII (IDA 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8819))

Included in the following conference series:

Abstract

We demonstrate that the previously introduced Widening framework is applicable to state-of-the-art Machine Learning algorithms. Using Krimp, an itemset mining algorithm, we show that parallelizing the search finds better solutions in nearly the same time as the original, sequential/greedy algorithm. We also introduce Reverse Standard Candidate Order (RSCO) as a candidate ordering heuristic for Krimp.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, vol. 1215, pp. 487–499 (1994)

    Google Scholar 

  2. Akbar, Z., Ivanova, V.N., Berthold, M.R.: Parallel data mining revisited. Better, not faster. In: Hollmén, J., Klawonn, F., Tucker, A. (eds.) IDA 2012. LNCS, vol. 7619, pp. 23–34. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  3. Akl, S.G.: Parallel real-time computation: Sometimes quantity means quality. In: Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks, I-SPAN 2000, pp. 2–11. IEEE (2000)

    Google Scholar 

  4. Arlia, D., Coppola, M.: Experiments in parallel clustering with DBSCAN. In: Sakellariou, R., Keane, J.A., Gurd, J.R., Freeman, L. (eds.) Euro-Par 2001. LNCS, vol. 2150, pp. 326–331. Springer, Heidelberg (2001)

    Chapter  Google Scholar 

  5. Bache, K., Lichman, M.: UCI Machine Learning Repository (2013)

    Google Scholar 

  6. Berthold, M.R., Cebron, N., Dill, F., Gabriel, T.R., Kötter, T., Meinl, T., Ohl, P., Sieb, C., Thiel, K., Wiswedel, B.: KNIME: The Konstanz Information Miner. In: Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R. (eds.) Data Analysis, Machine Learning and Applications - Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V (GfKL 2007), Berlin, Germany. Studies in Classification, Data Analysis, and Knowledge Organization, pp. 319–326 (2007)

    Google Scholar 

  7. Böhm, C., Noll, R., Plant, C., Wackersreuther, B., Zherdin, A.: Data mining using graphics processing units. In: Hameurlain, A., Küng, J., Wagner, R. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems I. LNCS, vol. 5740, pp. 63–90. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  8. Borgelt, C., Kruse, R.: Induction of association rules: Apriori implementation. In: Compstat, pp. 395–400. Springer (2002)

    Google Scholar 

  9. Chan, P., Stolfo, S.J.: Experiments on multistrategy learning by meta-learning. In: Proceedings of the Second International Conference on Information and Knowledge Management, pp. 314–323 (1993)

    Google Scholar 

  10. Coenen, F.: LUCS-KDD DN software (2003)

    Google Scholar 

  11. Dhillon, I.S., Modha, D.S.: A data-clustering algorithm on distributed memory multiprocessors. In: Zaki, M.J., Ho, C.-T. (eds.) KDD 1999. LNCS (LNAI), vol. 1759, pp. 245–260. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  12. Drosou, M., Pitoura, E.: Comparing diversity heuristics. Technical report, Technical Report 2009-05. Computer Science Department, University of Ioannina (2009)

    Google Scholar 

  13. Erkut, E.: The discrete p-dispersion problem. European Journal of Operational Research 46(1), 48–60 (1990)

    Article  MathSciNet  MATH  Google Scholar 

  14. Farivar, R., Rebolledo, D., Chan, E., Campbell, R.: A parallel implementation of k-means clustering on GPUs. In: Proceedings of International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), pp. 340–345 (2008)

    Google Scholar 

  15. Ivanova, V.N., Berthold, M.R.: Diversity-driven widening. In: Proceedings of the 12th International Symposium on Intelligent Data Analysis (IDA 2013) (2013)

    Google Scholar 

  16. Jaccard, P.: Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin del la Société Vaudoise des Sciences Naturelles (1901)

    Google Scholar 

  17. Kantabutra, S., Couch, A.L.: Parallel k-means clustering algorithm on nows. NECTEC Technical Journal 1(6), 243–247 (2000)

    Google Scholar 

  18. Liu, G., Lu, H., Yu, J.X., Wei, W., Xiao, X.: AFOPT: An efficient implementation of pattern growth approach. In: Proceedings of the ICDM Workshop on Frequent Itemset Mining Implementations (2003)

    Google Scholar 

  19. Lowerre, B.T.: The HARPY speech recognition system. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA (1976)

    Google Scholar 

  20. Meinl, T.: Maximum-Score Diversity Selection. PhD thesis, University of Konstanz (July 2010)

    Google Scholar 

  21. Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)

    Article  MATH  Google Scholar 

  22. Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., Johannes, R.S.: Using the adap learning algorithm to forecast the onset of diabetes mellitus. In: Proceedings of the Symposium on Computer Applications and Medical Care, vol. 261, p. 265 (1988)

    Google Scholar 

  23. Stoffel, K., Belkoniene, A.: Parallel k/h-means clustering for large data sets. In: Amestoy, P.R., Berger, P., Daydé, M., Duff, I.S., Frayssé, V., Giraud, L., Ruiz, D. (eds.) Euro-Par 1999. LNCS, vol. 1685, pp. 1451–1454. Springer, Heidelberg (1999)

    Chapter  Google Scholar 

  24. Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: Mining itemsets that compress. Data Mining and Knowledge Discovery 23(1), 169–214 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  25. Wolberg, W.H., Mangasarian, O.L.: Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences 87(23), 9193–9196 (1990)

    Article  MATH  Google Scholar 

  26. Zhao, W., Ma, H., He, Q.: Parallel k-Means Clustering Based on MapReduce. In: Jaatun, M.G., Zhao, G., Rong, C. (eds.) Cloud Computing. LNCS, vol. 5931, pp. 674–679. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sampson, O., Berthold, M.R. (2014). Widened KRIMP: Better Performance through Diverse Parallelism. In: Blockeel, H., van Leeuwen, M., Vinciotti, V. (eds) Advances in Intelligent Data Analysis XIII. IDA 2014. Lecture Notes in Computer Science, vol 8819. Springer, Cham. https://doi.org/10.1007/978-3-319-12571-8_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-12571-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12570-1

  • Online ISBN: 978-3-319-12571-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics