Gene selection using independent variable group analysis for tumor classification

  • Original Article
Neural Computing and Applications

Abstract

Microarrays can measure the expression levels of thousands of genes simultaneously. Consequently, gene expression data from DNA microarrays are characterized by many measured variables (genes) but only a few samples. One important application of gene expression data is sample classification. In statistical terms, the very large number of predictors (variables) relative to the small number of samples makes most classical “class prediction” methods inapplicable. This problem can generally be avoided by selecting only the relevant features, or by extracting new features that contain the maximal information about the class label from the original data. In this paper, we propose a new gene selection method based on independent variable group analysis (IVGA). We first use t-statistics to select a subset of genes from the original data. Then, we use IVGA to select the key genes for tumor classification from this subset. Finally, we use a support vector machine (SVM) to classify tumors based on the key genes selected by IVGA. To validate its effectiveness, the proposed method is applied to three different DNA microarray data sets. The prediction results show that our method is efficient and feasible.
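The first step of the pipeline, t-statistic filtering, can be sketched in plain Python as follows. The abstract does not say which t-statistic is used, so this sketch assumes Welch's two-sample statistic, t_g = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2), computed per gene between the two tumor classes (labels +1 and -1). The function names, the cut-off `k`, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic for one gene; a and b need >= 2 values each."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Constant expression within both classes: avoid division by zero.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def select_top_genes(X, y, k):
    """Return the indices of the k genes (columns of X) with the largest |t|.

    X is a list of samples (rows), each a list of gene expression values;
    y holds the class label (+1 or -1) of each sample.
    """
    n_genes = len(X[0])
    abs_t = []
    for g in range(n_genes):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == -1]
        abs_t.append(abs(welch_t(pos, neg)))
    # Stable sort: genes with equal |t| keep their original order.
    ranked = sorted(range(n_genes), key=lambda g: abs_t[g], reverse=True)
    return ranked[:k]


# Toy data: 6 samples x 3 genes; only gene 0 separates the classes clearly.
X = [
    [5.0, 1.0, 3.0],
    [6.0, 1.2, 3.2],
    [5.5, 0.9, 2.8],
    [1.0, 1.1, 3.0],
    [1.5, 1.0, 3.2],
    [0.5, 0.8, 2.8],
]
y = [1, 1, 1, -1, -1, -1]
print(select_top_genes(X, y, 2))  # [0, 1]
```

Ranking by |t| keeps genes that are strongly up- or down-regulated in either class; on real microarray data `k` would typically be a few hundred genes rather than two.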
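The second step relies on IVGA, which partitions variables into groups so that variables within a group are mutually dependent while different groups are approximately independent; the original IVGA algorithm searches over such groupings by optimizing a model-based cost function. The sketch below is only a crude, swapped-in stand-in for that idea, not IVGA itself: it assumes that an absolute Pearson correlation of at least a threshold (0.8 here, an arbitrary choice) signals dependence, links such genes transitively with union-find, and then keeps one gene per group. The rule of keeping the highest-scoring gene of each group (for example, the one with the largest |t|) and all function names are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    return cov / (su * sv)


def group_genes(X, threshold=0.8):
    """Group genes (columns of X) whose pairwise |r| >= threshold, transitively.

    Correlation is only a proxy for the dependence that IVGA models.
    """
    n_genes = len(X[0])
    cols = [[row[g] for row in X] for g in range(n_genes)]
    parent = list(range(n_genes))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if abs(pearson(cols[i], cols[j])) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for g in range(n_genes):
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())


def pick_representatives(groups, scores):
    """Hypothetical rule: keep the highest-scoring gene of each group."""
    return [max(group, key=lambda g: scores[g]) for group in groups]


# Toy data: gene 1 = 2 * gene 0 (fully dependent); gene 2 is uncorrelated with both.
X = [[1.0, 2.0, 1.0], [2.0, 4.0, -1.0], [3.0, 6.0, -1.0], [4.0, 8.0, 1.0]]
groups = group_genes(X)
print(groups)  # [[0, 1], [2]]
print(pick_representatives(groups, scores=[0.5, 3.0, 1.0]))  # [1, 2]
```

Keeping a single gene per group removes redundant, mutually dependent genes before classification; a faithful implementation would replace the correlation threshold with IVGA's own grouping cost.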
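For the final step, the abstract specifies an SVM but not its kernel or training procedure. As a self-contained stand-in, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method (Shalev-Shwartz, Singer and Srebro) on the L2-regularized hinge loss, handling the bias by appending a constant feature of 1, so the bias is also regularized (a common simplification). The regularization strength `lam`, the epoch count, the seed, and the toy "key gene" data are assumptions; in practice a mature SVM library would normally be used.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    X: list of samples (lists of floats); y: labels in {+1, -1}.
    Returns the weight vector, whose last entry acts as the bias.
    """
    rng = random.Random(seed)  # fixed seed for reproducible shuffling
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xi = list(X[i]) + [1.0]  # append constant feature for the bias
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink step from the regularizer (zeroes w on the first update).
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge-loss sub-gradient step for a margin violation.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Classify one sample by the sign of the linear decision function."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


# Toy data: two "key genes" per sample, two tumor classes.
X = [[2.0, 1.5], [1.5, 2.5], [3.0, 2.0], [-2.0, -1.0], [-1.5, -2.5], [-3.0, -2.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A linear kernel is a reasonable default when only a handful of selected genes remain, since the feature space is already small; a nonlinear kernel would require a dual-form solver instead of this primal sketch.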

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 30700161 & 30900321, the Foundation for Young Scientist of Shandong Province, China under Grant No. 2008BS01010, and the LIESMARS Special Research Funding.

Author information

Correspondence to Chun-Hou Zheng.

About this article

Cite this article

Zheng, CH., Chong, YW. & Wang, HQ. Gene selection using independent variable group analysis for tumor classification. Neural Comput & Applic 20, 161–170 (2011). https://doi.org/10.1007/s00521-010-0513-2

  • DOI: https://doi.org/10.1007/s00521-010-0513-2
