Learning to classify with missing and corrupted features
 Ofer Dekel,
 Ohad Shamir,
 Lin Xiao
Abstract
A common assumption in supervised machine learning is that the training examples provided to the learning algorithm are statistically identical to the instances encountered later on, during the classification phase. This assumption is unrealistic in many real-world situations where machine learning techniques are used. We focus on the case where features of a binary classification problem, which were available during the training phase, are either deleted or become corrupted during the classification phase. We prepare for the worst by assuming that the subset of deleted and corrupted features is controlled by an adversary, and may vary from instance to instance. We design and analyze two novel learning algorithms that anticipate the actions of the adversary and account for them when training a classifier. Our first technique formulates the learning problem as a linear program. We discuss how the particular structure of this program can be exploited for computational efficiency and we prove statistical bounds on the risk of the resulting classifier. Our second technique addresses the robust learning problem by combining a modified version of the Perceptron algorithm with an online-to-batch conversion technique, and also comes with statistical generalization guarantees. We demonstrate the effectiveness of our approach with a set of experiments.
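To make the second technique concrete, the following is a minimal illustrative sketch of a Perceptron variant that anticipates adversarial feature deletion. It assumes a simple worst-case adversary that zeroes up to k features per instance before each update; the function name, the adversary model, and the update details here are illustrative assumptions, not the paper's actual algorithm or its analysis.

```python
import numpy as np

def robust_perceptron(X, y, k=1, epochs=10):
    """Illustrative sketch: Perceptron trained against a simulated
    adversary that deletes up to k features at classification time.

    Before each update, the adversary zeroes the k features that
    contribute most to a correct prediction, so the learner is forced
    to spread weight across redundant features. (Hypothetical variant
    for illustration only; the paper's update rule differs.)
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in range(n):
            x = X[i].copy()
            # Adversary removes the k features most helpful to the
            # learner: largest positive contributions y_i * w_j * x_j.
            contrib = y[i] * w * x
            for j in np.argsort(contrib)[-k:]:
                if contrib[j] > 0:
                    x[j] = 0.0
            # Standard Perceptron update on the corrupted instance.
            if y[i] * np.dot(w, x) <= 0:
                w += y[i] * x
    return w
```

On a toy dataset with redundant features, the learned weight vector distributes mass over several coordinates, so the classifier still separates the data even after any single feature is deleted.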
 Journal

Machine Learning
Volume 81, Issue 2, pp. 149–178
 Cover Date
 2010-11-01
 DOI
 10.1007/s10994-009-5124-8
 Print ISSN
 0885-6125
 Online ISSN
 1573-0565
 Publisher
 Springer US
 Keywords

 Adversarial environment
 Binary classification
 Deleted features
 Authors

 Ofer Dekel ^{(1)}
 Ohad Shamir ^{(2)}
 Lin Xiao ^{(1)}
 Author Affiliations

 1. Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
 2. The Hebrew University, Jerusalem, 91904, Israel