Signature Selection for Grouped Features with a Case Study on Exon Microarrays
When features are grouped, it is desirable to perform feature selection groupwise in addition to selecting individual features. This is typically the case for data obtained by modern high-throughput genomic profiling technologies such as exon microarrays, which measure gene expression at fine resolution. Exons are disjoint subsequences corresponding to coding regions in genes, and exon microarrays enable us to study events in which exons are used differently, called alternative splicing, which is presumed to contribute to the development of diseases. To identify such events, all exons that belong to a relevant gene may have to be selected, perhaps with different weights assigned to them to detect the most relevant ones. In this chapter we discuss feature selection methods that handle grouped features. Our focus will be on a popular shrinkage method, the lasso, and its variants, which are based on regularized regression with generalized linear models. Data from exon microarrays will be used for a case study.
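To make groupwise selection concrete, the following is a minimal sketch, not the chapter's own implementation, of the standard group lasso and one common variant, the sparse group lasso, fitted by proximal gradient descent with a squared-error loss (the Gaussian GLM). The function names, the toy data, and the "gene"/"exon" grouping are hypothetical illustrations. The group penalty uses the usual sqrt(p_g) weighting and sets an entire group (all exons of a gene) to zero at once; the optional L1 term (alpha > 0) can additionally zero individual features inside a selected group. Other GLM losses, such as the logistic loss, would need a different gradient and step size and are not covered here.

```python
import math


def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||v||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def group_soft_threshold(v, t):
    """Block soft-thresholding: the proximal operator of t * ||v||_2.

    Shrinks the whole block toward zero and sets it exactly to zero
    when its Euclidean norm does not exceed t.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def sparse_group_lasso(X, y, groups, lam, alpha=0.0, n_iter=500):
    """Proximal gradient (ISTA) for the penalized least-squares objective

        1/(2n) ||y - X b||^2
          + (1 - alpha) * lam * sum_g sqrt(p_g) * ||b_g||_2
          + alpha * lam * ||b||_1

    alpha = 0 gives the plain group lasso. `groups` is a list of
    disjoint index lists covering all columns of X (hypothetical layout).
    """
    n, p = len(X), len(X[0])
    # ||X||_F^2 / n upper-bounds the Lipschitz constant of the gradient,
    # so 1/L is a safe (if conservative) step size. Assumes X is not all zeros.
    lipschitz = sum(x * x for row in X for x in row) / n
    step = 1.0 / lipschitz
    b = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the squared-error term: X^T (X b - y) / n
        resid = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        z = [b[j] - step * grad[j] for j in range(p)]
        # Prox of the combined penalty: elementwise shrinkage first, then groupwise.
        z = soft_threshold(z, step * alpha * lam)
        new_b = [0.0] * p
        for idx in groups:
            weight = math.sqrt(len(idx))
            block = group_soft_threshold([z[j] for j in idx],
                                         step * (1.0 - alpha) * lam * weight)
            for j, value in zip(idx, block):
                new_b[j] = value
        b = new_b
    return b


# Toy example with hypothetical data: two "genes" with two "exons" each.
X = [[2.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
groups = [[0, 1], [2, 3]]
coef = sparse_group_lasso(X, [4.0, 2.0, 0.2, -0.2], groups, lam=0.2)
print([round(c, 3) for c in coef])  # [1.747, 0.874, 0.0, 0.0]
```

Here the second "gene" is dropped as a whole while both exons of the first are kept with different magnitudes. Calling the same function with, say, alpha=0.5 lets the L1 term also zero a weak exon inside a selected gene; in practice lam and alpha would typically be chosen by cross-validation rather than fixed by hand.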
Keywords: Penalized regression, Lasso, Group lasso, Sparsity, Convex regularization
This work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project C1.