Abstract
This paper presents a novel feature selection method for the classification of high-dimensional data, such as microarray data. It incorporates partial supervision to smoothly favor the selection of certain dimensions (genes) on a new dataset to be classified. The dimensions to favor are selected beforehand from similar datasets in large microarray databases, hence performing inductive transfer learning at the feature level. The technique relies on a feature selection method embedded within a regularized linear model estimation. A practical approximation of this technique reduces to linear SVM learning with iterative input rescaling, where the scaling factors depend on the dimensions selected from the related datasets; the final selection may depart from those whenever necessary to optimize the classification objective. Experiments on several microarray datasets show that the proposed method improves both the stability of the selected gene lists, with respect to sampling variation, and the classification performance.
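The "linear SVM learning with iterative input rescaling" approximation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sub-gradient SVM solver, the prior-boost factor `beta`, and the multiplicative `|w|` update (in the spirit of zero-norm approximation methods) are all assumptions, shown on synthetic data rather than microarray data.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Train a linear SVM by sub-gradient descent on the regularized
    hinge loss. Labels y must be in {-1, +1}. (Illustrative solver.)"""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # samples violating the margin
        grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def rescaled_svm_selection(X, y, prior, beta=2.0, n_iter=5):
    """Partially supervised selection by iterative input rescaling:
    dimensions flagged in `prior` (selected on related datasets) start
    with a larger scaling factor; after each SVM fit the factors are
    multiplied by |w|, shrinking inputs the classifier finds irrelevant,
    so the final selection may depart from the prior if the data demand it."""
    d = X.shape[1]
    z = np.where(prior, beta, 1.0)  # partial supervision: favor prior genes
    for _ in range(n_iter):
        w, b = train_linear_svm(X * z, y)
        z = z * np.abs(w)           # multiplicative rescaling update
        z /= z.max() + 1e-12        # keep factors in a stable range
    return z                        # large z -> selected dimension

# Toy example: 2 informative features out of 20, prior knows only one of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
prior = np.zeros(20, dtype=bool)
prior[0] = True  # suppose gene 0 was selected on a related dataset
z = rescaled_svm_selection(X, y, prior)
print(np.argsort(z)[-2:])  # indices of the two highest-scoring dimensions
```

Note that the prior only biases the starting scaling factors; the repeated `|w|` updates let the data override it, which mirrors the "smooth" partial supervision described in the abstract.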
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Helleputte, T., Dupont, P. (2009). Feature Selection by Transfer Learning with Linear Regularized Models. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science(), vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_52
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8