Dimension Reduction of Gene Expression Data for Designing Optimized Rule Base Classifier

  • Amit Paul
  • Jaya Sil
  • Chitrangada Das Mukhopadhyay
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 266)


The paper highlights the need for dimension reduction of voluminous gene expression microarray data in order to develop a robust classifier that predicts whether patients carry cancerous genes. The proposed algorithm builds a fuzzy rule-based classifier with an optimized rule set without sacrificing much classification accuracy. The gene expression matrix is first discretized using linguistic values. The importance factor of each gene is then evaluated; it represents the degree of presence of a unique linguistic value of the gene in both the disease and nondisease classes. The initial fuzzy rule base consists of the higher-ranking genes, and other genes are gradually included in the rule base in order to achieve maximum classification accuracy. An optimum rule set is thus built from the important genes for classifying the test data set. The proposed methodology has been successfully demonstrated on a lung cancer classification problem with gene expression data from 97 smokers with lung cancer and 90 without. The results are promising even though most of the genes are removed from the original data.
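The pipeline summarized above (discretize, score genes, grow the rule base greedily) can be illustrated with a minimal Python sketch. The abstract does not give the actual membership functions, importance formula, or rule format, so everything here is an assumption made for illustration: crisp equal-width bins stand in for fuzzy linguistic values, `importance` is a hypothetical score (the largest gap between the two classes in how often a label occurs), each gene contributes one majority-class rule per label, and accuracy is measured on the same toy samples used to build the rules.

```python
from collections import Counter

LABELS = ("low", "medium", "high")  # assumed linguistic values


def discretize(values):
    """Map expression values to labels with equal-width bins (assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(LABELS) or 1.0  # avoid zero width for constant genes
    return [LABELS[min(int((v - lo) / width), len(LABELS) - 1)] for v in values]


def importance(labels, classes):
    """Hypothetical score: largest between-class gap in label frequency."""
    pos = Counter(l for l, c in zip(labels, classes) if c == 1)
    neg = Counter(l for l, c in zip(labels, classes) if c == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    return max(abs(pos[l] / n_pos - neg[l] / n_neg) for l in LABELS)


def gene_rule(labels, classes):
    """For each label, predict the majority class observed with that label."""
    votes = {l: Counter() for l in LABELS}
    for l, c in zip(labels, classes):
        votes[l][c] += 1
    return {l: (cnt.most_common(1)[0][0] if cnt else None)
            for l, cnt in votes.items()}


def predict(rules, sample_labels):
    """Majority vote over the selected genes' rules; ties go to class 0."""
    fired = [rules[g][sample_labels[g]] for g in rules]
    return int(fired.count(1) > fired.count(0))


def accuracy(rules, disc, classes):
    """Fraction of samples whose predicted class matches the known class."""
    hits = sum(
        predict(rules, {g: disc[g][i] for g in rules}) == classes[i]
        for i in range(len(classes))
    )
    return hits / len(classes)


def build_rule_base(matrix, classes):
    """Add genes in importance order; keep a gene only if accuracy improves."""
    disc = {g: discretize(v) for g, v in matrix.items()}
    ranked = sorted(disc, key=lambda g: importance(disc[g], classes), reverse=True)
    rules, best = {}, -1.0
    for g in ranked:
        trial = dict(rules)
        trial[g] = gene_rule(disc[g], classes)
        acc = accuracy(trial, disc, classes)
        if acc > best:
            rules, best = trial, acc
    return rules, best


# Toy data (made up): g1 separates the classes, g2 carries no class signal.
matrix = {
    "g1": [0.10, 0.20, 0.15, 0.90, 0.95, 0.85],
    "g2": [0.50, 0.10, 0.90, 0.50, 0.10, 0.90],
}
classes = [0, 0, 0, 1, 1, 1]  # 1 = disease, 0 = nondisease
rules, best = build_rule_base(matrix, classes)
print(sorted(rules), best)
```

On this toy input only the informative gene is retained, which mirrors the idea of discarding most genes while keeping accuracy. A faithful implementation would replace the crisp bins with fuzzy membership functions and evaluate accuracy on held-out test data.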


Fuzzy rule base, Linguistic variable, Fuzzy importance



Copyright information

© Springer India 2014

Authors and Affiliations

  • Amit Paul (1)
  • Jaya Sil (2)
  • Chitrangada Das Mukhopadhyay (3)
  1. Computer Science and Engineering, St. Thomas College of Engineering and Technology, Khidirpore, India
  2. Computer Science and Technology, Bengal Engineering and Science University, Shibpur, India
  3. Health Care Science and Technology, Bengal Engineering and Science University, Shibpur, India
