Widened KRIMP: Better Performance through Diverse Parallelism

  • Oliver Sampson
  • Michael R. Berthold
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8819)

Abstract

We demonstrate that the previously introduced Widening framework is applicable to state-of-the-art machine learning algorithms. Using Krimp, an itemset mining algorithm, we show that parallelizing the search finds better solutions in nearly the same time as the original sequential, greedy algorithm. We also introduce Reverse Standard Candidate Order (RSCO) as a candidate ordering heuristic for Krimp.
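The widening idea in the abstract can be illustrated with a small sketch. Instead of following one greedy path, several partial solutions are refined at each step, and a diverse subset of the best refinements is kept. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`widened_search`, `select_diverse`), the threshold-based diversity filter using Jaccard distance, and the toy objective are all hypothetical. The score only stands in for Krimp's actual compressed-size (MDL) objective. As with Krimp's compressed size, lower scores are better, and `width=1` reduces to plain greedy search.

```python
from typing import Callable, FrozenSet, Iterable, List

Solution = FrozenSet[int]


def jaccard_distance(a: Solution, b: Solution) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_diverse(pool: Iterable[Solution], score: Callable[[Solution], float],
                   width: int, min_dist: float) -> List[Solution]:
    """Keep up to `width` of the best-scoring candidates (lower is better),
    skipping any candidate closer than `min_dist` to one already kept.
    (Hypothetical diversity rule chosen for illustration.)"""
    chosen: List[Solution] = []
    for cand in sorted(pool, key=lambda s: (score(s), sorted(s))):
        if all(jaccard_distance(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
            if len(chosen) == width:
                break
    return chosen


def widened_search(start: Solution, refine: Callable[[Solution], Iterable[Solution]],
                   score: Callable[[Solution], float], width: int,
                   min_dist: float, steps: int) -> Solution:
    """Refine every solution on the frontier, keep a diverse top-`width`
    subset, and return the best solution seen. width=1 is plain greedy.
    Refinements of different frontier members are independent; this is
    where parallel workers would be used. Here they run sequentially."""
    frontier = [start]
    best = start
    for _ in range(steps):
        pool = {r for s in frontier for r in refine(s)}
        if not pool:
            break
        frontier = select_diverse(pool, score, width, min_dist)
        if score(frontier[0]) < score(best):
            best = frontier[0]
    return best


# Hypothetical toy problem: grow a set of items 0..3; lower cost is better.
def toy_refine(s: Solution) -> List[Solution]:
    return [s | {i} for i in range(4) if i not in s]


def toy_cost(s: Solution) -> float:
    if s == {1, 2}:
        return 1.0  # best solution, reachable only via {1} or {2}
    if 0 in s:
        return 5.0  # {0} looks best after one step: a greedy trap
    return 6.0


greedy = widened_search(frozenset(), toy_refine, toy_cost, width=1, min_dist=0.5, steps=2)
widened = widened_search(frozenset(), toy_refine, toy_cost, width=3, min_dist=0.5, steps=2)
print(sorted(greedy), toy_cost(greedy))    # [0] 5.0
print(sorted(widened), toy_cost(widened))  # [1, 2] 1.0
```

In the toy run, greedy search commits to `{0}` and never recovers. Width 3 keeps `{1}` and `{2}` alive long enough to reach `{1, 2}`.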

Keywords

Association Rule, Code Table, Standard Cover, Direct Placement, Candidate Itemsets
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
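Krimp tries candidate itemsets one at a time in a fixed order and keeps each one only if it improves compression, so the candidate order matters. The sketch below shows Krimp's Standard Candidate Order (support descending, then cardinality descending, then lexicographic) alongside a reversed variant. Treating RSCO as the exact reverse of the standard order is an assumption made here for illustration, because the abstract does not define it. The helper names and the example data are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List

Itemset = FrozenSet[int]


def support_counts(candidates: Iterable[Itemset],
                   transactions: List[Itemset]) -> Dict[Itemset, int]:
    """Absolute support: number of transactions containing the itemset."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}


def standard_candidate_order(candidates: Iterable[Itemset],
                             support: Dict[Itemset, int]) -> List[Itemset]:
    """Krimp's Standard Candidate Order: support descending, then
    cardinality descending, then lexicographically ascending."""
    return sorted(candidates, key=lambda s: (-support[s], -len(s), sorted(s)))


def reverse_standard_candidate_order(candidates: Iterable[Itemset],
                                     support: Dict[Itemset, int]) -> List[Itemset]:
    # ASSUMPTION: RSCO modelled as the exact reverse of the Standard
    # Candidate Order; the paper's definition may differ.
    return standard_candidate_order(candidates, support)[::-1]


# Hypothetical example database and candidate itemsets.
transactions = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({2, 3}), frozenset({1})]
candidates = [frozenset(c) for c in ({1, 3}, {1, 2, 3}, {2, 3}, {1, 2})]
support = support_counts(candidates, transactions)

print([sorted(c) for c in standard_candidate_order(candidates, support)])
# [[1, 2], [2, 3], [1, 2, 3], [1, 3]]
print([sorted(c) for c in reverse_standard_candidate_order(candidates, support)])
# [[1, 3], [1, 2, 3], [2, 3], [1, 2]]
```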

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Oliver Sampson (1)
  • Michael R. Berthold (1)

  1. Bioinformatics and Information Mining, Department of Computer and Information Science, University of Konstanz, Germany