Machine learning involves optimizing a loss function on unlabeled data points given examples of labeled data points, where the loss function measures the performance of a learning algorithm. We give an overview of techniques, called reductions, for converting a problem of minimizing one loss function into a problem of minimizing another, simpler loss function. This tutorial discusses how to create robust reductions that perform well in practice. The reductions discussed here can be used to solve any supervised learning problem with a standard binary classification or regression algorithm available in any machine learning toolkit. We also discuss common design flaws in folklore reductions.
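As a rough illustration of what a reduction looks like, the sketch below implements one-against-all, a folklore reduction from multiclass classification to binary classification. It is a minimal sketch under stated assumptions, not necessarily the construction this chapter recommends: the function names, the toy centroid learner, and the deterministic tie-breaking rule are all hypothetical choices made for the example. Any binary learner with the same interface could be substituted.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Point = Sequence[float]
BinaryClassifier = Callable[[Point], int]
BinaryLearner = Callable[[List[Tuple[Point, int]]], BinaryClassifier]


def train_centroid_learner(examples: List[Tuple[Point, int]]) -> BinaryClassifier:
    """Hypothetical stand-in binary learner (an assumption of this sketch).

    Predicts 1 when a point is strictly closer to the mean of the positive
    examples than to the mean of the negative examples, and 0 otherwise.
    """
    def mean(points: List[Point]) -> List[float]:
        return [sum(coord) / len(points) for coord in zip(*points)]

    def dist2(a: Point, b: Point) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    pos = mean([x for x, y in examples if y == 1])
    neg = mean([x for x, y in examples if y == 0])
    return lambda x: 1 if dist2(x, pos) < dist2(x, neg) else 0


def one_against_all_train(
    examples: List[Tuple[Point, Hashable]],
    learner: BinaryLearner,
) -> Dict[Hashable, BinaryClassifier]:
    """Train one binary classifier per label: 'is the label k, or not?'"""
    labels = sorted({y for _, y in examples})
    classifiers = {}
    for k in labels:
        # Relabel the multiclass data as a binary problem for label k.
        relabeled = [(x, 1 if y == k else 0) for x, y in examples]
        classifiers[k] = learner(relabeled)
    return classifiers


def one_against_all_predict(
    classifiers: Dict[Hashable, BinaryClassifier], x: Point
) -> Hashable:
    """Predict a label from the binary classifiers' votes.

    Ties (several classifiers say 1, or none do) are broken by taking the
    smallest candidate label. This is a simplification chosen to keep the
    example deterministic; random tie-breaking is a common alternative.
    """
    claimed = [k for k, clf in classifiers.items() if clf(x) == 1]
    candidates = claimed if claimed else list(classifiers)
    return min(candidates)
```

A call such as `one_against_all_train(data, train_centroid_learner)` returns one classifier per label, and `one_against_all_predict` combines their outputs into a single multiclass prediction. The multiclass learner never sees anything but binary problems, which is the sense in which the harder task is reduced to the simpler one.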
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Beygelzimer, A., Langford, J., Zadrozny, B. (2008). Machine Learning Techniques—Reductions Between Prediction Quality Metrics. In: Liu, Z., Xia, C.H. (eds) Performance Modeling and Engineering. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79361-0_1
DOI: https://doi.org/10.1007/978-0-387-79361-0_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-79360-3
Online ISBN: 978-0-387-79361-0
eBook Packages: Computer Science (R0)