A Framework for Multi-class Learning in Micro-array Data Analysis

  • Nicoletta Dessì
  • Barbara Pes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5651)

Abstract

A large pool of techniques has already been developed for analyzing micro-array datasets, but less attention has been paid to multi-class classification problems. In this context, selecting features and evaluating classifiers may be hard, since only a few training examples are available in each class. This paper presents a framework for multi-class learning that first learns a classifier within each class independently and then groups all the relevant features into a single dataset. In the next step, this dataset is given as input to a classification algorithm that learns a global classifier across the classes. We analyze two micro-array datasets using the proposed framework. Results demonstrate that our approach is capable of identifying a small number of influential genes within each class, while the global classifier across the classes performs better than existing multi-class learning methods.
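The two-stage pipeline described in the abstract can be illustrated with a short Python sketch. This is a hypothetical, minimal illustration, not the authors' implementation. The abstract does not name the per-class learner or the global classifier, so the sketch makes two assumptions. First, a one-vs-rest ranking by signal-to-noise ratio stands in for the per-class step, and it keeps the `top_k` genes of each class. Second, a nearest-centroid rule serves as the global classifier and is trained on the union of the selected genes. The function names (`snr`, `select_per_class`, `merge_features`, `fit_centroids`, `predict`) and the toy data are illustrative only.

```python
"""Hypothetical sketch of a two-stage multi-class pipeline.

Stage 1 ranks genes separately for each class (one-vs-rest) and keeps the
top_k per class; stage 2 trains one global classifier on the union of the
selected genes. ASSUMPTIONS (not from the paper): the signal-to-noise ratio
is the per-class ranking, and nearest-centroid is the global classifier.
"""
from statistics import mean, pstdev


def snr(values_in, values_out):
    """One-vs-rest signal-to-noise ratio |mu_in - mu_out| / (sd_in + sd_out)."""
    spread = pstdev(values_in) + pstdev(values_out)
    return abs(mean(values_in) - mean(values_out)) / (spread or 1e-12)


def select_per_class(X, y, top_k):
    """Return {label: gene indices} holding the top_k genes of each class."""
    n_genes = len(X[0])
    selected = {}
    for label in sorted(set(y)):
        scored = []
        for g in range(n_genes):
            inside = [row[g] for row, lab in zip(X, y) if lab == label]
            outside = [row[g] for row, lab in zip(X, y) if lab != label]
            scored.append((snr(inside, outside), g))
        # Highest score first; ties broken by the lower gene index.
        scored.sort(key=lambda s: (-s[0], s[1]))
        selected[label] = [g for _, g in scored[:top_k]]
    return selected


def merge_features(selected):
    """Group all per-class genes into a single sorted feature list."""
    return sorted({g for genes in selected.values() for g in genes})


def fit_centroids(X, y, genes):
    """Assumed global classifier: one mean profile per class on the merged genes."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [[row[g] for g in genes] for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, genes, sample):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    x = [sample[g] for g in genes]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


if __name__ == "__main__":
    # Invented toy data: genes 0, 1, 2 mark classes A, B, C; genes 3, 4 are noise.
    X = [
        [9.0, 1.0, 1.0, 5.0, 2.0], [8.0, 1.2, 0.9, 4.0, 3.0],  # class A
        [1.0, 9.0, 1.1, 5.0, 2.5], [1.2, 8.5, 1.0, 4.5, 2.0],  # class B
        [0.9, 1.1, 9.0, 4.0, 3.0], [1.1, 0.8, 8.8, 5.0, 2.5],  # class C
    ]
    y = ["A", "A", "B", "B", "C", "C"]
    selected = select_per_class(X, y, top_k=1)
    genes = merge_features(selected)
    centroids = fit_centroids(X, y, genes)
    print(selected)  # {'A': [0], 'B': [1], 'C': [2]}
    print(predict(centroids, genes, [8.7, 1.0, 1.0, 4.8, 2.2]))  # A
```

The sketch ranks genes once per class rather than once for the whole problem. As a result, a gene that separates only one small class from the others can still enter the merged feature set.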

Keywords

Micro-array data analysis, Multi-class learning, Feature Selection

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Nicoletta Dessì (1)
  • Barbara Pes (1)
  1. Dipartimento di Matematica e Informatica, Università degli Studi di Cagliari, Cagliari, Italy

Personalised recommendations