Machine Learning, Volume 92, Issue 2–3, pp 479–502

Multi-stage classifier design

  • Kirill Trapeznikov
  • Venkatesh Saligrama
  • David Castañón


In many classification systems, sensing modalities have different acquisition costs, and it is often unnecessary to use every modality to classify the majority of examples. We study a multi-stage system in a prediction-time cost-reduction setting: the full data is available for training, but for a test example, measurements in a new modality can be acquired at each stage for an additional cost. We seek decision rules that reduce the average measurement acquisition cost. We formulate an empirical risk minimization (ERM) problem for a multi-stage reject classifier, wherein the stage-k classifier either classifies a sample using only the measurements acquired so far or rejects it to the next stage, where more attributes can be acquired for a cost. Restricting ourselves to the binary classification setting, we show that the optimal reject classifier at each stage is a combination of two binary classifiers, one biased towards positive examples and the other towards negative examples. We use this parameterization to construct a stage-by-stage global surrogate risk, develop an iterative algorithm in the boosting framework, and present convergence and generalization results. We test our approach on synthetic, medical, and explosives-detection datasets. Our results demonstrate that substantial cost reduction is achievable without a significant sacrifice in accuracy.
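The stage-wise reject rule described in the abstract can be sketched as follows. This is a minimal illustration only: the thresholded classifiers, acquisition costs, and the agree/disagree reject criterion below are assumptions for exposition, not the paper's trained boosted models.

```python
# Hypothetical sketch of a multi-stage reject classifier: at stage k,
# two binary classifiers -- one biased toward positive examples (h_pos)
# and one toward negative examples (h_neg) -- are evaluated on the
# measurements acquired so far.  If they agree, the sample is classified
# immediately; if they disagree, it is rejected to stage k+1, where a
# new modality is acquired for an additional cost.

def multistage_classify(x, stages, acquisition_costs):
    """Classify x through a cascade of reject stages.

    stages: list of (h_pos, h_neg) pairs; each h maps the feature
            prefix acquired so far to +1 or -1.  The last stage must
            classify (it cannot reject).
    Returns (label, total_acquisition_cost).
    """
    total_cost = 0.0
    features = []
    for k, (h_pos, h_neg) in enumerate(stages):
        total_cost += acquisition_costs[k]   # pay for modality k
        features.append(x[k])                # acquire measurement k
        yp, yn = h_pos(features), h_neg(features)
        if yp == yn or k == len(stages) - 1:
            # both biased classifiers agree (or no stages remain):
            # commit using only the features acquired so far
            return (yp if yp == yn else h_pos(features)), total_cost
        # disagreement -> reject to the next, more expensive stage
    raise RuntimeError("unreachable")

# Toy cascade: stage 0 thresholds a cheap feature two ways (the two
# biases), stage 1 uses a costly second feature and must decide.
h0_pos = lambda f: 1 if f[0] > 0.2 else -1   # biased toward +1
h0_neg = lambda f: 1 if f[0] > 0.8 else -1   # biased toward -1
h1 = lambda f: 1 if f[0] + f[1] > 1.0 else -1

stages = [(h0_pos, h0_neg), (h1, h1)]
costs = [1.0, 10.0]

# An easy example is decided at stage 0 for cost 1.0; an ambiguous one
# (0.2 < f[0] <= 0.8) is rejected and pays for the second modality.
easy_label, easy_cost = multistage_classify([0.9, 0.5], stages, costs)
hard_label, hard_cost = multistage_classify([0.5, 0.8], stages, costs)
```

Easy examples never pay for the expensive modality, which is the source of the average-cost reduction the paper targets.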


Keywords: Multi-stage classification · Sequential decision · Boosting · Cost-sensitive learning



This work is partially supported by the U.S. DHS Award 2008-ST-061-ED000, NSF Grant 0932114 and NGA Grant HM1582-09-1-0037.



Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Kirill Trapeznikov (1)
  • Venkatesh Saligrama (1)
  • David Castañón (1)

  1. Boston, USA
