Asuncion, A., & Newman, D. J. (2007). *UCI machine learning repository*.

Bennett, K. P. (1999). Combining support vector and mathematical programming methods for classification. In *Advances in kernel methods: support vector learning* (pp. 307–326). Cambridge: MIT Press.

Boyd, S., & Vandenberghe, L. (2004). *Convex optimization*. Cambridge: Cambridge University Press.

Carr, R. D., & Lancia, G. (2000). *Compact vs. exponential-size LP relaxations* (Technical Report SAND2000-2170). SANDIA Report, September 2000.

Cesa-Bianchi, N., Conconi, A., & Gentile, C. (2004). On the generalization ability of on-line learning algorithms. *IEEE Transactions on Information Theory*, *50*(9), 2050–2057.

Dalvi, N., Domingos, P., Mausam, Sanghai, S., & Verma, D. (2004). Adversarial classification. In *Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining (KDD)* (pp. 99–108). New York: ACM.

Dekel, O., & Shamir, O. (2008). Learning to classify with missing and corrupted features. In *Proceedings of the twenty-fifth international conference on machine learning*.

Dietterich, T. G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. *Journal of Artificial Intelligence Research*, *2*, 263–286.

Gamble, E. S., Macskassy, S. A., & Minton, S. (2007). Classification with pedigree and its applicability to record linkage. In *Workshop on text-mining & link-analysis*.

Globerson, A., & Roweis, S. (2006). Nightmare at test time: robust learning by feature deletion. In *Proceedings of the 23rd international conference on machine learning* (pp. 353–360).

Hastie, T., Tibshirani, R., & Friedman, J. (2001). *The elements of statistical learning*. Berlin: Springer.

Joachims, T. (1998). Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges, & A. Smola (Eds.), *Advances in kernel methods—support vector learning*. Cambridge: MIT Press.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. *Proceedings of the IEEE*, *86*(11), 2278–2324.

Littlestone, N. (1991). Redundant noisy attributes, attribute errors, and linear-threshold learning using winnow. In *Proceedings of the fourth annual workshop on computational learning theory* (pp. 147–156).

Lowd, D., & Meek, C. (2005). Good word attacks on statistical spam filters. In *Proceedings of the second conference on email and anti-spam (CEAS)*.

McAllester, D. A. (2003). Simplified PAC-Bayesian margin bounds. In *Proceedings of the sixteenth annual conference on computational learning theory* (pp. 203–215).

Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. *Psychological Review*, *65*, 386–407.

Teo, C.-H., Globerson, A., Roweis, S., & Smola, A. J. (2008). Convex learning with invariances. In *Advances in neural information processing systems 21*.

Trefethen, L. N., & Bau, D. (1997). *Numerical linear algebra*. Philadelphia: SIAM.

Vapnik, V. N. (1998). *Statistical learning theory*. New York: Wiley.

Wittel, G., & Wu, S. (2004). On attacking statistical spam filters. In *Proceedings of the first conference on email and anti-spam (CEAS)*.

Wright, S. J. (1997). *Primal-dual interior-point methods*. Philadelphia: SIAM.
