Abstract
Bagging and boosting are two popular ensemble methods that typically achieve better accuracy than a single classifier. On massive datasets, however, both techniques run into limits, because the size of the dataset itself becomes a bottleneck. Voting many classifiers, each built on a small subset of the data ("pasting small votes"), is a promising approach to learning from massive datasets: it can exploit the power of bagging and boosting while potentially scaling to data too large for either. We propose a framework for building hundreds or thousands of such classifiers on small subsets of data in a distributed environment. Experiments show that this approach is fast, accurate, and scalable to massive datasets.
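The core idea of pasting small votes can be sketched in a few lines: draw many small random subsets ("bites") of the training data, fit an independent classifier to each, and classify new points by plurality vote. The sketch below uses a toy nearest-centroid learner and synthetic data purely for illustration; the function names (`paste_rvotes`, `vote`) and the base learner are assumptions of this sketch, not the framework described in the paper, which builds decision trees on distributed subsets.

```python
import random
from collections import Counter

def centroid_fit(points, labels):
    """Toy base learner: store the per-class feature means."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def centroid_predict(model, x):
    """Predict the class whose centroid is nearest to x."""
    dist = lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x))
    return min(model, key=dist)

def paste_rvotes(points, labels, n_voters=25, bite_size=8, seed=0):
    """Train n_voters classifiers, each on a small random bite of the data."""
    rng = random.Random(seed)
    voters = []
    for _ in range(n_voters):
        idx = [rng.randrange(len(points)) for _ in range(bite_size)]
        voters.append(centroid_fit([points[i] for i in idx],
                                   [labels[i] for i in idx]))
    return voters

def vote(voters, x):
    """Combine the small classifiers by plurality vote."""
    return Counter(centroid_predict(m, x) for m in voters).most_common(1)[0][0]

# Two well-separated synthetic classes.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
y = [0, 0, 0, 1, 1, 1]
ensemble = paste_rvotes(data, y)
print(vote(ensemble, (0.1, 0.1)), vote(ensemble, (5.0, 5.0)))
```

Because each voter touches only a small bite, the training steps are independent and embarrassingly parallel, which is what makes the distributed variant attractive on massive datasets.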
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chawla, N.V., Hall, L.O., Bowyer, K.W., Moore, T.E., Kegelmeyer, W.P. (2002). Distributed Pasting of Small Votes. In: Roli, F., Kittler, J. (eds) Multiple Classifier Systems. MCS 2002. Lecture Notes in Computer Science, vol 2364. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45428-4_5
Print ISBN: 978-3-540-43818-2
Online ISBN: 978-3-540-45428-1