An Approach Towards Most Cancerous Gene Selection from Microarray Data

  • Sunanda Das
  • Asit Kumar Das
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 33)


Microarray gene datasets are often very high-dimensional, which causes problems such as degraded performance in data access, data manipulation and query processing. Dimensionality reduction tackles this problem efficiently and helps reveal the intrinsic properties hidden in the dataset. Rough set theory (RST) has therefore been used to select only the relevant attributes of the dataset, called a reduct, which are sufficient to characterize the information system. The investigation has been carried out on a publicly available microarray dataset. The analysis revealed that rough sets, using the concept of dependency among genes, can extract the dominant genes, in terms of reducts, that play an important role in causing the disease. Experimental results show the effectiveness of the algorithm.
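The abstract does not spell out the reduct-generation algorithm, so the following is only a hedged illustration of the general rough set idea it relies on, not the authors' method. It computes the standard dependency degree γ_P(D) = |POS_P(D)| / |U| (the fraction of samples whose class is fully determined by the gene subset P) and uses a greedy QuickReduct-style forward selection that adds genes until the subset's dependency matches that of all genes. The function names (`partition`, `dependency`, `quick_reduct`), the lowest-index tie-breaking rule and the toy table are assumptions made for illustration. The sketch also assumes the expression values have already been discretized (for example into low/mid/high bins), since rough sets operate on discrete attribute values.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Hashable, Sequence

Row = Sequence[Hashable]


def partition(rows: Sequence[Row], attrs: Sequence[int]) -> list[list[int]]:
    """Group sample indices into equivalence classes of IND(attrs):
    samples with identical values on every gene in attrs share a block."""
    blocks: dict[tuple, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows: Sequence[Row], labels: Sequence[Hashable], attrs: Sequence[int]) -> float:
    """gamma_attrs(D) = |POS_attrs(D)| / |U|.
    A block belongs to the positive region if all its samples share one class label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows: Sequence[Row], labels: Sequence[Hashable]) -> list[int]:
    """Greedy forward selection (QuickReduct-style, hypothetical variant):
    repeatedly add the gene giving the highest dependency until the subset
    reaches the dependency of the full gene set. Ties go to the lowest column
    index; always adding a gene guarantees termination."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct: list[int] = []
    current = dependency(rows, labels, reduct)
    while current < target:
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
        current = dependency(rows, labels, reduct)
    return reduct


# Hypothetical toy table: 6 samples x 3 discretized genes (columns 0, 1, 2).
rows = [
    ("high", "low", "mid"),
    ("high", "high", "mid"),
    ("low", "low", "high"),
    ("low", "high", "low"),
    ("mid", "low", "low"),
    ("mid", "high", "high"),
]
labels = ["tumour", "tumour", "normal", "normal", "tumour", "normal"]

print(quick_reduct(rows, labels))  # prints [0, 1]
```

Here genes 0 and 2 each give γ = 4/6 on their own, gene 1 gives 0, and adding gene 1 to gene 0 separates every sample, so the greedy pass stops at two genes. Greedy selection of this kind does not guarantee a minimal reduct; it trades optimality for avoiding an exhaustive search over gene subsets, which is what makes it practical on high-dimensional microarray tables.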


Keywords: Dependency among genes, Similarity measure, Reduct generation



Copyright information

© Springer India 2015

Authors and Affiliations

  1. Management and Science, Neotia Institute of Technology, Calcutta, India
  2. Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Howrah, India
