Hybrid Firefly Based Simultaneous Gene Selection and Cancer Classification Using Support Vector Machines and Random Forests

  • Atulji Srivastava
  • Saurabh Chakrabarti
  • Subrata Das
  • Shameek Ghosh
  • V. K. Jayaraman
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 201)


Microarray cancer gene expression datasets are high dimensional, which makes efficient computational analysis difficult. In this study, we address the problem of simultaneous gene selection and robust classification of cancerous samples by presenting two hybrid algorithms, namely Discrete Firefly based Support Vector Machines (DFA-SVM) and DFA-Random Forests (DFA-RF), with weighted gene ranking as heuristics. The performance of the algorithms is then tested on two cancer gene expression datasets retrieved from the Kent Ridge Biomedical Dataset Repository. Our results show that both DFA-SVM and DFA-RF can help extract more informative genes, which aids in building high-performance prediction models.


Cancer classification; Weighted gene ranking; Firefly algorithm; Support vector machines; Random forests





VKJ gratefully acknowledges the Department of Science and Technology (DST), New Delhi, India for financial support.



Copyright information

© Springer India 2013

Authors and Affiliations

  • Atulji Srivastava (1)
  • Saurabh Chakrabarti (2)
  • Subrata Das (2)
  • Shameek Ghosh (3)
  • V. K. Jayaraman (3)
  1. Dr. D.Y. Patil Biotechnology and Bioinformatics Institute, Padmashree Dr. D.Y. Patil University, Pune, India
  2. Department of Computer Science, University of Pune, Pune, India
  3. Evolutionary Computing and Image Processing Group, Center for Development of Advanced Computing (CDAC), Pune University Campus, Pune, India
