
Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 285)


Abstract

This paper proposes an unsupervised gene selection algorithm based on the singular value decomposition (SVD) to identify the most informative genes in a cancer gene expression dataset. Such genes are important for many tasks, including cancer clustering and classification, data compression, and sample characterization. The proposed algorithm exploits the clustering capability of the SVD to find the natural groupings of the genes. The most informative genes are then determined by selecting the genes closest to the centers of their corresponding clusters. These genes are used to construct a new (pruned) dataset that contains the same samples but has lower dimensionality. Experimental results on standard datasets from cancer research show that the proposed algorithm can reliably improve the performance of the SVD and k-means algorithms in cancer clustering tasks.
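The selection pipeline summarized above can be sketched in Python with NumPy. This is a minimal, hypothetical reconstruction, not the author's implementation. The abstract does not specify how the SVD assigns genes to clusters, so the sketch assumes that (a) each gene's expression profile (a row of the genes × samples matrix `A`) is mean-centered before the SVD, (b) each gene joins the leading singular direction on which it has the largest absolute loading in U_k, and (c) distances to a cluster center are measured in the rank-k coordinates U_k Σ_k. The function names `svd_gene_clusters`, `select_genes`, and `prune` and the parameter `genes_per_cluster` are illustrative only.

```python
import numpy as np


def svd_gene_clusters(A, k):
    """Cluster the genes (rows of A) using a rank-k SVD.

    Assumptions (not taken from the paper): rows are mean-centered first,
    and each gene joins the singular direction with the largest absolute
    loading in U_k.
    """
    if not 1 <= k <= min(A.shape):
        raise ValueError("k must be between 1 and min(A.shape)")
    A_c = A - A.mean(axis=1, keepdims=True)      # center each gene's profile
    U, s, _ = np.linalg.svd(A_c, full_matrices=False)
    labels = np.argmax(np.abs(U[:, :k]), axis=1)  # cluster index per gene
    coords = U[:, :k] * s[:k]                    # gene coordinates U_k * Sigma_k
    return coords, labels


def select_genes(A, k, genes_per_cluster=1):
    """Return sorted row indices of the genes closest to each cluster center."""
    coords, labels = svd_gene_clusters(A, k)
    selected = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue                             # a direction may attract no genes
        center = coords[members].mean(axis=0)
        dist = np.linalg.norm(coords[members] - center, axis=1)
        closest = members[np.argsort(dist)[:genes_per_cluster]]
        selected.extend(int(i) for i in closest)
    return sorted(selected)


def prune(A, k, genes_per_cluster=1):
    """Build the pruned dataset: same samples (columns), fewer genes (rows)."""
    idx = select_genes(A, k, genes_per_cluster)
    return A[idx, :], idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((200, 30))                    # hypothetical data: 200 genes x 30 samples
    A_pruned, idx = prune(A, k=5, genes_per_cluster=3)
    print(A_pruned.shape)                        # at most (15, 30)
```

The returned `A_pruned` keeps every sample (column) but only the selected genes (rows), so it can be handed to a sample-clustering method, for example scikit-learn's `KMeans(n_clusters=c).fit_predict(A_pruned.T)`, where the transpose puts samples in rows. Here `k` and `genes_per_cluster` are free parameters; the abstract does not state how the paper chooses them.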



Acknowledgments

The author would like to thank the reviewers for their useful comments. This research was supported by the Ministry of Higher Education of Malaysia and Universiti Teknologi Malaysia under Exploratory Research Grant Scheme R.J130000.7828.4L095.

Author information

Corresponding author

Correspondence to Andri Mirzal.


Copyright information

© 2014 Springer Science+Business Media Singapore

About this paper

Cite this paper

Mirzal, A. (2014). SVD Based Gene Selection Algorithm. In: Herawan, T., Deris, M., Abawajy, J. (eds) Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). Lecture Notes in Electrical Engineering, vol 285. Springer, Singapore. https://doi.org/10.1007/978-981-4585-18-7_26


  • DOI: https://doi.org/10.1007/978-981-4585-18-7_26


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-4585-17-0

  • Online ISBN: 978-981-4585-18-7

  • eBook Packages: Engineering, Engineering (R0)
