Widened KRIMP: Better Performance through Diverse Parallelism

Sampson, Oliver; Berthold, Michael R.

doi:10.1007/978-3-319-12571-8_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8819)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

We demonstrate that the previously introduced Widening framework is applicable to state-of-the-art Machine Learning algorithms. Using Krimp, an itemset mining algorithm, we show that parallelizing the search finds better solutions in nearly the same time as the original, sequential/greedy algorithm. We also introduce Reverse Standard Candidate Order (RSCO) as a candidate ordering heuristic for Krimp.
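The idea that exploring several search paths in parallel can beat a single greedy path can be illustrated with a small sketch. This is not Krimp and not the authors' Widening framework: the candidate data, the coverage-based `score`, and the names `refine` and `widened_search` are all hypothetical, the "parallel" paths are simulated sequentially, and the selection step is a plain top-k cut with no diversity measure.

```python
from itertools import chain

# Toy candidates (hypothetical data, not from the paper): name -> items covered.
CANDIDATES = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 5},
    "C": {3, 4, 6},
}

def score(model):
    """Number of distinct items covered by the chosen candidates."""
    return len(set(chain.from_iterable(CANDIDATES[name] for name in model)))

def refine(model):
    """All models obtained by adding one unused candidate."""
    return [model | {name} for name in CANDIDATES if name not in model]

def widened_search(width, steps):
    """Keep the `width` best models per step; width=1 is plain greedy search."""
    beam = [frozenset()]
    for _ in range(steps):
        # Pool refinements of every model in the beam, dropping duplicates.
        pool = {m for model in beam for m in refine(model)}
        if not pool:
            break
        # Deterministic ranking: higher score first, then names alphabetically.
        ranked = sorted(pool, key=lambda m: (-score(m), sorted(m)))
        beam = ranked[:width]
    return beam[0]

greedy = widened_search(width=1, steps=2)
wide = widened_search(width=2, steps=2)
print(sorted(greedy), score(greedy))  # ['A', 'B'] 5
print(sorted(wide), score(wide))      # ['B', 'C'] 6
```

With a width of 1 the search commits to the locally best candidate `A` and ends with a coverage of 5. With a width of 2 it also keeps `B` alive, which lets it reach the better pair `B`, `C` with a coverage of 6, at the cost of scoring more refinements per step.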

Author information

Authors and Affiliations

Bioinformatics and Information Mining, Department of Computer and Information Science, University of Konstanz, Germany
Oliver Sampson & Michael R. Berthold

Editor information

Editors and Affiliations

Department of Computer Science, KU Leuven, 3001, Heverlee, Belgium
Hendrik Blockeel & Matthijs van Leeuwen
Brunel University, UB8 3PH, Uxbridge, UK
Veronica Vinciotti

About this paper

Cite this paper

Sampson, O., Berthold, M.R. (2014). Widened KRIMP: Better Performance through Diverse Parallelism. In: Blockeel, H., van Leeuwen, M., Vinciotti, V. (eds) Advances in Intelligent Data Analysis XIII. IDA 2014. Lecture Notes in Computer Science, vol 8819. Springer, Cham. https://doi.org/10.1007/978-3-319-12571-8_24

DOI: https://doi.org/10.1007/978-3-319-12571-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12570-1
Online ISBN: 978-3-319-12571-8
eBook Packages: Computer Science, Computer Science (R0)
