Neural Computing and Applications, Volume 16, Issue 6, pp 527–539

Classification consistency analysis for bootstrapping gene selection

  • Shaoning Pang
  • Ilkka Havukkala
  • Yingjie Hu
  • Nikola Kasabov
ICONIP2006

Abstract

Consistency modelling for gene selection is a new topic emerging from recent cancer bioinformatics research. The results of operations such as classification, clustering, or gene selection on a training set often differ markedly from the results of the same operations on a testing set, which presents a serious consistency problem. In practice, the inconsistency of microarray datasets prevents many typical gene selection methods from working properly for cancer diagnosis and prognosis. To address this problem, this paper proposes a new concept of classification consistency and applies it to the microarray gene selection problem using a bootstrapping approach, with encouraging results.

Notes

Acknowledgments

The research presented in the paper was partially funded by the New Zealand Foundation for Research, Science and Technology under the grant: NERF/AUTX02-01.

Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  • Shaoning Pang (1)
  • Ilkka Havukkala (1)
  • Yingjie Hu (1)
  • Nikola Kasabov (1)

  1. Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland, New Zealand
