Classification consistency analysis for bootstrapping gene selection

  • ICONIP2006
Neural Computing and Applications

Abstract

Consistency modelling for gene selection is a new topic emerging from recent cancer bioinformatics research. The result of an operation such as classification, clustering, or gene selection on a training set often differs markedly from the result of the same operation on a testing set, which presents a serious consistency problem. In practice, this inconsistency in microarray datasets prevents many typical gene selection methods from working properly for cancer diagnosis and prognosis. To address this problem, this paper proposes a new concept of classification consistency and applies it to the microarray gene selection problem using a bootstrapping approach, with encouraging results.
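
The inconsistency described in the abstract can be illustrated with a small sketch. It is not the paper's definition of classification consistency or its bootstrapping procedure, neither of which is given in the abstract. It assumes synthetic two-class data, a simple mean-difference ranking score, and repeated random half-splits as a stand-in for resampling; the function names (`make_data`, `top_k_genes`, `selection_overlap`) and all parameter values are hypothetical. The sketch measures how much the gene sets selected on two disjoint halves of the same dataset agree.

```python
import numpy as np

def make_data(n=60, p=500, informative=10, shift=1.0, seed=1):
    """Synthetic stand-in for a microarray: n samples, p genes, two classes."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(n, p))
    # Only the first `informative` genes differ between the two classes.
    X[y == 1, :informative] += shift
    return X, y

def top_k_genes(X, y, k):
    """Return the indices of the k genes with the highest two-class score."""
    a, b = X[y == 0], X[y == 1]
    # Absolute difference of class means over the summed class standard deviations.
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    return set(np.argsort(score)[-k:].tolist())

def selection_overlap(X, y, k=10, rounds=50, seed=0):
    """Mean Jaccard overlap of gene sets selected on two disjoint random halves."""
    rng = np.random.default_rng(seed)
    overlaps = []
    for _ in range(rounds):
        half_a, half_b = [], []
        # Split each class separately so both halves contain both classes.
        for label in (0, 1):
            idx = rng.permutation(np.flatnonzero(y == label))
            mid = len(idx) // 2
            half_a.extend(idx[:mid])
            half_b.extend(idx[mid:])
        genes_a = top_k_genes(X[half_a], y[half_a], k)
        genes_b = top_k_genes(X[half_b], y[half_b], k)
        overlaps.append(len(genes_a & genes_b) / len(genes_a | genes_b))
    return float(np.mean(overlaps))

if __name__ == "__main__":
    for shift in (0.0, 1.0, 3.0):
        X, y = make_data(shift=shift)
        print(f"shift={shift}: mean overlap = {selection_overlap(X, y):.2f}")
```

An overlap near 1 means both halves select nearly the same genes; an overlap near 0 means the selection depends mostly on which samples were drawn, which is the kind of train/test disagreement the abstract calls a consistency problem.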

Acknowledgments

The research presented in this paper was partially funded by the New Zealand Foundation for Research, Science and Technology under grant NERF/AUTX02-01.

Author information

Corresponding author

Correspondence to Shaoning Pang.

About this article

Cite this article

Pang, S., Havukkala, I., Hu, Y. et al. Classification consistency analysis for bootstrapping gene selection. Neural Comput & Applic 16, 527–539 (2007). https://doi.org/10.1007/s00521-007-0110-1

  • DOI: https://doi.org/10.1007/s00521-007-0110-1