A Study on the Importance of Differential Prioritization in Feature Selection Using Toy Datasets

  • Chia Huey Ooi
  • Shyh Wei Teng
  • Madhu Chetty
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)

Abstract

Previous empirical work has shown the effectiveness of differential prioritization in feature selection prior to molecular classification. We now propose to determine the theoretical basis for the concept of differential prioritization through mathematical analyses of the characteristics of predictor sets found using different values of the DDP (degree of differential prioritization) from realistic toy datasets. Mathematical analyses based on analytical measures such as the distance between classes are performed on these predictor sets. We demonstrate that the optimal value of the DDP is capable of forming a predictor set consisting of classes of features that are well separated and highly correlated with the target classes – a characteristic of a truly optimal predictor set. From these analyses, the necessity of adjusting the DDP based on the dataset of interest is confirmed mathematically, indicating that the DDP-based feature selection technique is superior to both simplistic rank-based selection and state-of-the-art equal-priorities scoring methods. Applying similar analyses to real-life multiclass microarray datasets, we obtain further proof of the theoretical significance of the DDP for practical applications.

Keywords

Feature Selection; Marker Gene; Feature Selection Technique; Optimal Classification Accuracy; Ovum Concept
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Chia Huey Ooi (1)
  • Shyh Wei Teng (1)
  • Madhu Chetty (1)
  1. Faculty of Information Technology, Monash University, Australia
