Sequential approaches for learning datum-wise sparse representations
 Gabriel Dulac-Arnold,
 Ludovic Denoyer,
 Philippe Preux,
 Patrick Gallinari
Abstract
In supervised classification, data representation is usually considered at the dataset level: one looks for the “best” representation of data, assuming it to be the same for all the data in the data space. We propose a different approach where the representations used for classification are tailored to each datum in the data space. One immediate goal is to obtain sparse datum-wise representations: our approach learns to build a representation specific to each datum that contains only a small subset of the features, thus allowing classification to be fast and efficient. This representation is obtained through a sequential decision process that chooses which features to acquire before classifying a particular point; this process is learned through algorithms based on Reinforcement Learning.
The proposed method performs well on an ensemble of medium-sized sparse classification problems. It offers an alternative to global sparsity approaches, and is a natural framework for sequential classification problems. The method extends easily to a whole family of sparsity-related problems that would otherwise require developing specific solutions. This is the case in particular for cost-sensitive and limited-budget classification, where feature acquisition is costly and is often performed sequentially. Finally, our approach can handle non-differentiable loss functions or combinatorial optimization encountered in more complex feature selection problems.
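The sequential decision process described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `run_episode`, `episode_cost`, and `toy_policy`, the convention that unacquired features are seen as 0, and the trade-off weight `lam` are all assumptions made for illustration, and the hand-written policy merely stands in for one that would be learned with Reinforcement Learning.

```python
from typing import Callable, List, Sequence, Tuple

# An action is either ("acquire", feature_index) or ("classify", label).
Action = Tuple[str, int]
# A policy maps (observed values, acquired mask) to an action.
Policy = Callable[[List[float], List[bool]], Action]


def run_episode(x: Sequence[float], policy: Policy) -> Tuple[int, List[int]]:
    """Classify one datum by acquiring its features one at a time.

    Returns the predicted label and the list of acquired feature indices.
    """
    n = len(x)
    observed = [0.0] * n   # assumption: unacquired features are seen as 0
    mask = [False] * n     # mask[j] is True once feature j has been acquired
    acquired: List[int] = []
    for _ in range(n + 1):  # at most n acquisitions, then one classification
        kind, arg = policy(observed, mask)
        if kind == "classify":
            return arg, acquired
        if mask[arg]:
            raise ValueError(f"feature {arg} was already acquired")
        observed[arg] = x[arg]
        mask[arg] = True
        acquired.append(arg)
    raise RuntimeError("policy never chose a classification action")


def episode_cost(pred: int, label: int, acquired: Sequence[int],
                 lam: float = 0.1) -> float:
    """0/1 error plus lam per acquired feature (an assumed trade-off)."""
    return float(pred != label) + lam * len(acquired)


def toy_policy(observed: List[float], mask: List[bool]) -> Action:
    """Hand-written stand-in for a learned policy (hypothetical)."""
    if not mask[0]:
        return ("acquire", 0)
    if abs(observed[0]) >= 1.0:      # feature 0 alone is decisive
        return ("classify", int(observed[0] > 0))
    if not mask[1]:
        return ("acquire", 1)        # ambiguous case: also read feature 1
    return ("classify", int(observed[0] + observed[1] > 0))


easy = run_episode([2.0, -5.0, 3.0], toy_policy)   # uses feature 0 only
hard = run_episode([0.5, -2.0, 3.0], toy_policy)   # uses features 0 and 1
```

With this toy policy the first datum is classified after reading a single feature, while the second requires two, which is the sense in which the representation is sparse per datum rather than globally. Replacing the constant `lam` with a per-feature cost would give a rough analogue of the cost-sensitive setting mentioned above.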
 Title
 Sequential approaches for learning datum-wise sparse representations
 Journal

Machine Learning
Volume 89, Issue 1-2, pp 87-122
 Cover Date
 2012-10-01
 DOI
 10.1007/s10994-012-5306-7
 Print ISSN
 0885-6125
 Online ISSN
 1573-0565
 Publisher
 Springer US
 Keywords

 Classification
 Feature selection
 Sparsity
 Sequential models
 Reinforcement learning
 Authors

 Gabriel Dulac-Arnold ^{(1)}
 Ludovic Denoyer ^{(1)}
 Philippe Preux ^{(2)}
 Patrick Gallinari ^{(1)}
 Author Affiliations

 1. UPMC, LIP6, Université Pierre et Marie Curie, Case 169, 4 Place Jussieu, 75005, Paris, France
 2. LIFL (UMR CNRS) & INRIA Lille Nord-Europe, Université de Lille, Villeneuve d’Ascq, France