Bucket Selection: A Model-Independent Diverse Selection Strategy for Widening

  • Conference paper
Advances in Intelligent Data Analysis XVI (IDA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10584)

Abstract

When using a greedy algorithm for finding a model, as is the case in many data mining algorithms, there is a risk of getting caught in local extrema, i.e., suboptimal solutions. Widening is a technique for enhancing greedy algorithms by using parallel resources to broaden the search in the model space. The most important component of widening is the selector, a function that chooses the next models to refine. Ideally, this selector enforces diversity within the selected set of models so that parallel workers explore sufficiently different parts of the model space and do not end up mimicking a simple beam search. Previous publications have shown that this works well for problems with a suitable distance measure for the models, but if no such measure is available, applying widening is challenging. In addition, these approaches require extensive sequential computations for diverse subset selection, making the entire process much slower than the original greedy algorithm. In this paper we propose the bucket selector, a model-independent randomized selection strategy. We find that (a) the bucket selector is considerably faster and not significantly worse when a diversity measure exists and (b) it performs better than existing selection strategies in cases without a diversity measure.
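The widening loop described in the abstract can be illustrated with a small sketch on set cover, the problem mentioned in the notes. This is not the authors' implementation: the abstract does not specify how the bucket selector works internally, so `bucket_selector` below is a hypothetical stand-in (random assignment to k buckets, best model per bucket) that only demonstrates the idea of a randomized selector needing no distance measure. All function names and the toy instance are assumptions made for illustration.

```python
import random

def covered(model, subsets):
    """Elements covered by a model (a frozenset of subset indices)."""
    return set().union(*(subsets[i] for i in model))

def refine(model, subsets):
    """One-step refinements: add any subset that covers something new."""
    have = covered(model, subsets)
    return [model | {i} for i, s in enumerate(subsets)
            if i not in model and s - have]

def top_k_selector(candidates, k, score, rng):
    """Keep the k best-scoring models; with k > 1 this is a beam search."""
    return sorted(candidates, key=score, reverse=True)[:k]

def bucket_selector(candidates, k, score, rng):
    """HYPOTHETICAL randomized selector (not taken from the paper):
    throw each candidate into one of k buckets at random and keep the
    best model of every non-empty bucket. No model distance is needed."""
    buckets = [[] for _ in range(k)]
    for c in candidates:
        buckets[rng.randrange(k)].append(c)
    return [max(b, key=score) for b in buckets if b]

def widened_greedy(subsets, universe, k, selector, seed=0):
    """Greedy set cover that refines up to k models per step."""
    rng = random.Random(seed)
    score = lambda m: len(covered(m, subsets))
    frontier = [frozenset()]
    while True:
        done = [m for m in frontier if score(m) == len(universe)]
        if done:
            return done[0]  # all frontier models have the same size
        # De-duplicate and sort so that runs are reproducible.
        candidates = sorted({r for m in frontier for r in refine(m, subsets)},
                            key=sorted)
        if not candidates:
            return None  # the subsets cannot cover the universe
        frontier = selector(candidates, k, score, rng)

# Toy instance on which plain greedy is suboptimal: it first takes the
# largest subset and then needs both remaining ones.
universe = {1, 2, 3, 4, 5, 6}
subsets = [{1, 2, 3, 4}, {1, 3, 5}, {2, 4, 6}]

greedy = widened_greedy(subsets, universe, k=1, selector=top_k_selector)
beam = widened_greedy(subsets, universe, k=3, selector=top_k_selector)
bucket = widened_greedy(subsets, universe, k=3, selector=bucket_selector)
print(len(greedy), len(beam), len(bucket))
```

With k = 1 the loop degenerates to the ordinary greedy algorithm, which needs three subsets on this instance, while the width-3 search finds the two-subset cover. The result of the randomized selector depends on the seed; the sketch only guarantees that it returns a valid cover.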

Notes

  1. Due to the nature of the set cover problem and the chosen heuristic, models with the same score occur frequently and other tie-breaking methods may be feasible. This is, however, out of scope of this work.

  2. http://openjdk.java.net/projects/code-tools/jmh/ (1/26/2017).

Acknowledgements

This work was partially funded by BMBF (grant 031A535C) and the Konstanz Research School Chemical Biology.

Corresponding author

Correspondence to Alexander Fillbrunn.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Fillbrunn, A., Wörteler, L., Grossniklaus, M., Berthold, M.R. (2017). Bucket Selection: A Model-Independent Diverse Selection Strategy for Widening. In: Adams, N., Tucker, A., Weston, D. (eds) Advances in Intelligent Data Analysis XVI. IDA 2017. Lecture Notes in Computer Science, vol 10584. Springer, Cham. https://doi.org/10.1007/978-3-319-68765-0_8

  • DOI: https://doi.org/10.1007/978-3-319-68765-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68764-3

  • Online ISBN: 978-3-319-68765-0

  • eBook Packages: Computer Science (R0)
