Sample Classification Based on Gene Subset Selection

  • Sunanda Das
  • Asit Kumar Das
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 410)


Microarray datasets contain genetic information about patients, and analyzing them can reveal new findings about the cause and subsequent treatment of a disease. Many techniques are used in gene analysis to extract biologically relevant information from such datasets. In this paper, two concepts from database technology, functional dependency and the closure of an attribute, are applied to find the most important gene subset, on the basis of which the samples in the gene datasets are classified as normal or diseased. The dependency of a gene is defined as the number of genes that depend on it, determined by measuring gene similarity over the collected samples. The closure of a gene is computed from its gene dependency set and indicates how many genes are logically implied by it. Finally, the smallest set of genes whose closure logically implies all genes in the dataset is selected for sample classification.
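The closure and selection steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dependency sets are taken as given (the abstract does not specify the similarity measure that produces them), the names `closure`, `select_genes`, and `deps` are hypothetical, and a greedy covering heuristic stands in for the search for the smallest gene set, so it may return more genes than the true minimum.

```python
from collections import deque


def closure(gene, deps):
    """Return every gene reachable from `gene` through the dependency sets.

    `deps` maps a gene to the set of genes assumed to depend on it.
    The gene itself is always part of its own closure.
    """
    seen = {gene}
    queue = deque([gene])
    while queue:
        current = queue.popleft()
        for dependent in deps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def select_genes(deps):
    """Greedily pick genes until their closures cover every gene.

    Assumption: greedy covering is used here as an approximation of the
    minimum gene set; it is not guaranteed to be optimal.
    """
    all_genes = set(deps)
    for dependents in deps.values():
        all_genes |= set(dependents)

    closures = {g: closure(g, deps) for g in all_genes}
    uncovered = set(all_genes)
    selected = []
    while uncovered:
        # sorted() makes tie-breaking deterministic across runs
        best = max(sorted(all_genes), key=lambda g: len(closures[g] & uncovered))
        selected.append(best)
        uncovered -= closures[best]
    return selected


# Toy dependency sets, invented purely for illustration
deps = {
    "g1": {"g2", "g3"},
    "g2": {"g4"},
    "g3": set(),
    "g4": set(),
    "g5": {"g6"},
    "g6": set(),
}

print(closure("g1", deps))
print(select_genes(deps))
```

In this toy input, the closure of `g1` reaches `g2`, `g3`, and `g4`, so `g1` and `g5` together imply all six genes. The selected genes would then serve as the features passed to whatever classifier is used to label samples.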


Gene selection · Gene dependency · Closure of a gene · Sample classification



Copyright information

© Springer India 2016

Authors and Affiliations

  1. Department of Computer Science and Engineering, Neotia Institute of Technology, Management and Science, South 24-Pargana, India
  2. Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Howrah, India
