Gene Extraction Based on Sparse Singular Value Decomposition

  • Xiangzhen Kong
  • Jinxing Liu (email author)
  • Chunhou Zheng
  • Junliang Shang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9771)


In this paper, we develop a new feature extraction method based on sparse singular value decomposition (SSVD). We apply the SSVD algorithm to select characteristic genes from a Colorectal Cancer (CRC) genomic dataset, and the resulting differentially expressed genes are then evaluated with tools based on Gene Ontology. As a gene extraction method, SSVD is also compared with existing feature extraction methods: independent component analysis (ICA), p-norm robust feature extraction (PRFE), and sparse principal component analysis (SPCA). The experimental results show that the SSVD method outperforms these existing algorithms.
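
The gene selection idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a genes-by-samples matrix, uses fixed soft thresholds (the hypothetical parameters `lam_u` and `lam_v`) rather than any data-driven penalty selection, and extracts only one sparse singular vector pair. The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink each entry of x toward zero by lam (lasso proximal operator)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_rank1_svd(X, lam_u=0.0, lam_v=0.0, n_iter=100, tol=1e-8):
    """Rank-one sparse SVD via alternating soft-thresholded updates.

    X is assumed to be (genes x samples). lam_u and lam_v are hypothetical
    fixed thresholds chosen for illustration, not settings from the paper.
    Returns (u, s, v); u and v have unit norm unless fully thresholded.
    """
    # Initialize with the leading ordinary singular vectors.
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    for _ in range(n_iter):
        u_old = u
        # Sample-side update: threshold X^T u, then normalize.
        v = soft_threshold(X.T @ u, lam_v)
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            return np.zeros(X.shape[0]), 0.0, np.zeros(X.shape[1])
        v = v / norm_v
        # Gene-side update: threshold X v, then normalize.
        u = soft_threshold(X @ v, lam_u)
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:
            return np.zeros(X.shape[0]), 0.0, np.zeros(X.shape[1])
        u = u / norm_u
        if np.linalg.norm(u - u_old) < tol:
            break
    s = float(u @ X @ v)
    return u, s, v


def select_genes(u, k):
    """Indices of the k genes with the largest absolute loading in u."""
    return np.argsort(-np.abs(u))[:k]


# Synthetic example (assumed data): 50 genes, 8 samples, 5 informative genes.
rng = np.random.default_rng(0)
u_true = np.zeros(50)
u_true[:5] = 1.0 / np.sqrt(5)
v_true = np.ones(8) / np.sqrt(8)
X = 10.0 * np.outer(u_true, v_true) + 0.01 * rng.standard_normal((50, 8))

u, s, v = sparse_rank1_svd(X, lam_u=0.5)
top = select_genes(u, 5)
print(sorted(top.tolist()))
```

Genes whose loadings in `u` are shrunk exactly to zero are discarded, so the nonzero entries act as the extracted characteristic genes. In practice the thresholds would need to be tuned to the data.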


Keywords: Singular value decomposition; Gene extraction; Sparse constraint; Gene Ontology



This work was supported in part by grants from the National Science Foundation of China (Nos. 61572284, 61502272, 61572283); the Shenzhen Municipal Science and Technology Innovation Council (No. JCYJ20140417172417174); and the Natural Science Foundation of Shandong Province (No. BS2014DX004).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Xiangzhen Kong (1)
  • Jinxing Liu (1, 2), email author
  • Chunhou Zheng (1)
  • Junliang Shang (1)
  1. School of Information Science and Engineering, Qufu Normal University, Rizhao, China
  2. Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
