
Machine Learning Techniques—Reductions Between Prediction Quality Metrics

Chapter in Performance Modeling and Engineering

Machine learning involves optimizing a loss function on unlabeled (test) data points given examples of labeled (training) data points, where the loss function measures the performance of the learned predictor. We give an overview of techniques, called reductions, for converting a problem of minimizing one loss function into a problem of minimizing another, simpler loss function. This tutorial discusses how to create robust reductions that perform well in practice. The reductions discussed here can be used to solve any supervised learning problem with a standard binary classification or regression algorithm available in any machine learning toolkit. We also discuss common design flaws in folklore reductions.
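
As an illustration of the reduction style the abstract describes, here is a minimal sketch of the classic one-against-all reduction, which converts a k-class classification problem into k binary classification problems and is one of the folklore reductions the chapter analyzes. The sketch assumes scikit-learn's LogisticRegression as the base binary learner; the OneAgainstAll class and its method names are illustrative, not code from the chapter.

    # One-against-all: reduce a k-class problem to k binary problems,
    # one per class ("is the label c or not?"), then predict the class
    # whose binary classifier is most confident.
    # (Illustrative sketch; any binary learner with fit/decision_function works.)
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    class OneAgainstAll:
        def __init__(self, base_learner=LogisticRegression):
            self.base_learner = base_learner
            self.classes_ = None
            self.models_ = []

        def fit(self, X, y):
            self.classes_ = np.unique(y)
            # Train one binary classifier per class on the relabeled data.
            self.models_ = [
                self.base_learner().fit(X, (y == c).astype(int))
                for c in self.classes_
            ]
            return self

        def predict(self, X):
            # Stack per-class confidence scores into an (n_samples, k) matrix
            # and predict the class with the highest score.
            scores = np.column_stack([m.decision_function(X) for m in self.models_])
            return self.classes_[np.argmax(scores, axis=1)]

    # Usage on a toy 3-class problem:
    from sklearn.datasets import load_iris
    X, y = load_iris(return_X_y=True)
    print((OneAgainstAll().fit(X, y).predict(X) == y).mean())

This reduction exhibits the kind of design flaw the chapter examines: a small binary error rate can translate into a multiclass error rate roughly k-1 times larger in the worst case, which is what motivates the more robust alternatives discussed in the chapter, such as weighted one-against-all and filter trees.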





Copyright information

© 2008 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Beygelzimer, A., Langford, J., Zadrozny, B. (2008). Machine Learning Techniques—Reductions Between Prediction Quality Metrics. In: Liu, Z., Xia, C.H. (eds) Performance Modeling and Engineering. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79361-0_1


  • DOI: https://doi.org/10.1007/978-0-387-79361-0_1

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-79360-3

  • Online ISBN: 978-0-387-79361-0

  • eBook Packages: Computer Science (R0)
