Learning to classify with missing and corrupted features
 Ofer Dekel,
 Ohad Shamir,
 Lin Xiao
Abstract
A common assumption in supervised machine learning is that the training examples provided to the learning algorithm are statistically identical to the instances encountered later on, during the classification phase. This assumption is unrealistic in many real-world situations where machine learning techniques are used. We focus on the case where features of a binary classification problem, which were available during the training phase, are either deleted or become corrupted during the classification phase. We prepare for the worst by assuming that the subset of deleted and corrupted features is controlled by an adversary, and may vary from instance to instance. We design and analyze two novel learning algorithms that anticipate the actions of the adversary and account for them when training a classifier. Our first technique formulates the learning problem as a linear program. We discuss how the particular structure of this program can be exploited for computational efficiency and we prove statistical bounds on the risk of the resulting classifier. Our second technique addresses the robust learning problem by combining a modified version of the Perceptron algorithm with an online-to-batch conversion technique, and also comes with statistical generalization guarantees. We demonstrate the effectiveness of our approach with a set of experiments.
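The second technique described above, a Perceptron variant that anticipates adversarial feature deletion, can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the paper's actual algorithm: it models the adversary as greedily zeroing the k features that contribute most to the margin, and triggers the standard Perceptron update whenever that worst-case margin is non-positive.

```python
import numpy as np

def worst_case_margin(w, x, y, k):
    """Margin y * <w, x'> after an adversary deletes (zeroes) up to k
    features, chosen greedily as those contributing most to the margin."""
    contrib = y * w * x
    # The adversary removes the k features most helpful to the classifier,
    # but only those whose contribution is actually positive.
    drop = np.argsort(contrib)[-k:]
    helpful = contrib[drop]
    return contrib.sum() - helpful[helpful > 0].sum()

def robust_perceptron(X, Y, k, epochs=10):
    """Perceptron variant: update whenever the worst-case margin
    (under deletion of up to k features) is non-positive."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            if worst_case_margin(w, x, y, k) <= 0:
                w += y * x
    return w
```

The intuition is that a classifier surviving greedy worst-case deletions must spread its weight across redundant features, rather than relying on any small subset that the adversary could remove.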
Journal
Machine Learning
Volume 81, Issue 2, pp 149–178
Cover Date
2010-11-01
DOI
10.1007/s10994-009-5124-8
Print ISSN
0885-6125
Online ISSN
1573-0565
 Publisher
 Springer US
 Keywords

 Adversarial environment
 Binary classification
 Deleted features
 Authors

 Ofer Dekel ^{(1)}
 Ohad Shamir ^{(2)}
 Lin Xiao ^{(1)}
 Author Affiliations

1. Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
 2. The Hebrew University, Jerusalem, 91904, Israel