Optimization and Engineering, Volume 14, Issue 2, pp 225–250

Robust formulations for clustering-based large-scale classification

  • Saketha Nath Jagarlapudi
  • Aharon Ben-Tal
  • Chiranjib Bhattacharyya

Abstract

Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures with CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high-density regions, or clusters, from the individual class-conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation depends heavily on second-order statistics. However, this formulation, and in general any such relaxation that depends on second-order moments, is susceptible to moment-estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular, a generic way of making such formulations robust to moment-estimation errors is illustrated using two novel confidence sets. An important contribution is to show that, when either of the confidence sets is employed, the robust variant of the formulation can be posed as a second-order cone program for the special case of clusters with spherical normal distributions. Empirical results show that the robust formulations achieve accuracies comparable to those obtained with the true moments, even when the moment estimates are erroneous. The results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.
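
The pipeline described above (cluster each class, summarize every cluster by its first- and second-order moments, and enforce a Chebyshev-based second-order-cone constraint per cluster) can be sketched as follows. This is only an illustrative sketch under assumptions, not the paper's exact robust formulation: it uses the standard Chebyshev relaxation y_j (w^T mu_j + b) >= 1 - xi_j + kappa ||Sigma_j^{1/2} w|| with kappa = sqrt(eta / (1 - eta)), and the library choices (scikit-learn for k-means, CVXPY for the SOCP) and all constants are assumptions made for illustration.

```python
# Illustrative sketch only: clustering-based classification with a
# Chebyshev-style second-order-cone constraint per cluster (assumed form,
# not the paper's robust formulation).
import numpy as np
import cvxpy as cp
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic two-class data (stand-in for a large dataset).
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.7, size=(500, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.7, size=(500, 2))


def cluster_moments(X, n_clusters=5):
    """Cluster one class and return the (mean, covariance) of each cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    return [(X[labels == k].mean(axis=0), np.cov(X[labels == k], rowvar=False))
            for k in range(n_clusters)]


clusters = [(+1.0, mu, S) for mu, S in cluster_moments(X_pos)] + \
           [(-1.0, mu, S) for mu, S in cluster_moments(X_neg)]

eta = 0.9                            # required per-cluster classification probability (assumed)
kappa = np.sqrt(eta / (1.0 - eta))   # Chebyshev-based multiplier (assumed standard relaxation)
C = 1.0                              # slack penalty (assumed value)
d = 2

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(len(clusters), nonneg=True)

constraints = []
for j, (y, mu, Sigma) in enumerate(clusters):
    # SOC constraint: y_j (w^T mu_j + b) >= 1 - xi_j + kappa * ||Sigma_j^{1/2} w||.
    L = np.linalg.cholesky(Sigma + 1e-8 * np.eye(d))
    constraints.append(y * (w @ mu + b) >= 1 - xi[j] + kappa * cp.norm(L.T @ w))

# Large-margin objective over clusters rather than over individual points.
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), constraints)
problem.solve()

print("w =", w.value, "b =", b.value)
```

In the spherical-normal special case mentioned in the abstract, Sigma_j = sigma_j^2 I, so the cone term reduces to kappa * sigma_j * ||w|| and only a scalar second-order statistic per cluster is needed.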

Keywords

Confidence sets · Robustness · Large dataset classification · SOCP · Chebyshev inequality

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Saketha Nath Jagarlapudi (1, 2)
  • Aharon Ben-Tal (3)
  • Chiranjib Bhattacharyya (1)

  1. Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore, India
  2. Dept. of Computer Science and Engg., Indian Institute of Technology Bombay, Mumbai, India
  3. Faculty of Industrial Engineering and Management, Technion, Haifa, Israel