Semi-supervised Feature Extraction for RNA-Seq Data Analysis

  • Jin-Xing Liu
  • Yong Xu
  • Ying-Lian Gao
  • Dong Wang
  • Chun-Hou Zheng
  • Jun-Liang Shang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9227)


Effectively identifying differentially expressed genes from RNA-Seq data is an urgent task. In this paper, we propose a novel method, semi-supervised feature extraction, to analyze RNA-Seq data. The scheme has four steps. First, we construct a graph Laplacian matrix and refine it using the labeled samples. Second, we find semi-supervised optimal maps by solving a generalized eigenvalue problem. Third, we solve an optimization problem with a joint L2,1-norm constraint to obtain a projection matrix. Finally, we identify differentially expressed genes based on the projection matrix. Results on real RNA-Seq data sets demonstrate the feasibility and effectiveness of the method.
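The four steps can be illustrated with a minimal NumPy sketch. The abstract does not specify the graph construction, the label-refinement rule, or the solver, so everything below is an assumption made for illustration: a binary k-nearest-neighbour affinity, a refinement rule that connects same-class labeled samples and disconnects different-class ones, and an iteratively reweighted least-squares loop for an L2,1-regularized regression. The function names (`build_affinity`, `spectral_maps`, `l21_regression`, `rank_genes`) and parameters (`k`, `alpha`, `n_iter`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def build_affinity(X, labels, k=3):
    """Binary kNN affinity matrix, refined with the available labels.

    X: (n_samples, n_genes) expression matrix.
    labels: integer class per sample, with -1 marking unlabeled samples.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean
    np.fill_diagonal(dist, np.inf)                       # exclude self-matches
    nn = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)                               # symmetrize
    # Assumed refinement rule: connect labeled pairs of the same class,
    # disconnect labeled pairs of different classes.
    known = labels >= 0
    both = known[:, None] & known[None, :]
    same = labels[:, None] == labels[None, :]
    W[both & same] = 1.0
    W[both & ~same] = 0.0
    np.fill_diagonal(W, 0.0)
    return W


def spectral_maps(W, n_maps):
    """Solve W y = lambda D y (equivalently L y = (1 - lambda) D y, L = D - W).

    Uses the symmetric form D^{-1/2} W D^{-1/2} so plain eigh suffices.
    Assumes a connected graph, so only the top eigenvector is trivial.
    """
    d = np.maximum(W.sum(axis=1), 1e-12)
    d_isqrt = 1.0 / np.sqrt(d)
    S = d_isqrt[:, None] * W * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(S)                       # ascending eigenvalues
    order = np.argsort(vals)[::-1][1:n_maps + 1]         # skip the trivial one
    return d_isqrt[:, None] * vecs[:, order]


def l21_regression(X, Y, alpha=1.0, n_iter=30, eps=1e-8):
    """Minimize ||X A - Y||_F^2 + alpha * ||A||_{2,1} by reweighted least squares."""
    XtX, XtY = X.T @ X, X.T @ Y
    g = np.ones(X.shape[1])
    for _ in range(n_iter):
        A = np.linalg.solve(XtX + alpha * np.diag(g), XtY)
        row_norms = np.sqrt(np.sum(A ** 2, axis=1))
        g = 1.0 / (2.0 * np.maximum(row_norms, eps))     # reweighting step
    return A


def rank_genes(A):
    """Order genes by the L2 norm of their row in the projection matrix."""
    return np.argsort(-np.linalg.norm(A, axis=1))


if __name__ == "__main__":
    # Synthetic data only: 20 samples, 50 genes, the first 5 genes shifted
    # in class 1. Eight samples are labeled; the rest are marked -1.
    rng = np.random.default_rng(0)
    truth = np.repeat([0, 1], 10)
    X = rng.normal(size=(20, 50))
    X[truth == 1, :5] += 4.0
    labels = np.full(20, -1)
    labels[[0, 1, 2, 3, 10, 11, 12, 13]] = truth[[0, 1, 2, 3, 10, 11, 12, 13]]

    Xc = X - X.mean(axis=0)
    Y = spectral_maps(build_affinity(Xc, labels), n_maps=1)
    A = l21_regression(Xc, Y, alpha=1.0)
    print("top-ranked gene indices:", rank_genes(A)[:5])
```

The L2,1 penalty shrinks whole rows of the projection matrix toward zero, which is why row norms serve as per-gene scores here. Solving the p × p system directly is only practical for small gene counts; a real RNA-Seq matrix would likely need a different solver.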


Feature extraction; L2,1-norm constraint; Spectral regression; RNA-Seq data analysis



This work was supported in part by the NSFC under grant Nos. 61370163 and 61272339; China Postdoctoral Science Foundation funded project, No. 2014M560264; Shandong Provincial Natural Science Foundation, under grant Nos. ZR2013FL016 and BS2014DX004; Shenzhen Municipal Science and Technology Innovation Council (Nos. JCYJ20140417172417174, CXZZ20140904154910774 and JCYJ20140904154645958).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Jin-Xing Liu (1, 2)
  • Yong Xu (1, corresponding author)
  • Ying-Lian Gao (3)
  • Dong Wang (2)
  • Chun-Hou Zheng (4)
  • Jun-Liang Shang (2)

  1. Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
  2. School of Information Science and Engineering, Qufu Normal University, Rizhao, China
  3. Library of Qufu Normal University, Qufu Normal University, Rizhao, China
  4. School of Mechanical Engineering and Automation, Anhui University, Hefei, China
