Rough Set Based Variable Tolerance Attribute Selection on High-Dimensional Microarray Imbalanced Data

  • Arunkumar Chinnaswamy (corresponding author)
  • Ramakrishnan Srinivasan
  • Sooraj M. Poolakkaparambil
Original Article


Attribute selection and the classification of imbalanced datasets are two challenging machine learning problems that attract considerable attention from industry and academia. Rough set models have gained importance in recent years because they require neither prior information nor assumptions about the data. Conventional rough set models handle only discretized data, whereas real-world applications use real-valued data. Discretizing real-valued data loses information and changes the characteristics of the whole dataset. Tolerance-based rough sets are one solution proposed for this problem. This paper proposes a variable tolerance rough set method that computes a tolerance value for each attribute, in contrast to the traditional fixed tolerance rough set, which uses a single fixed value for attribute selection. The class-imbalanced dataset is normalized and converted to a balanced dataset, and a correlation-based filter is used to reduce its dimensionality. The proposed method is then applied to the dimensionality-reduced balanced dataset. The statistical measures computed in the experiments indicate that the proposed method performs better than the fixed tolerance rough set.
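To make the contrast between a fixed and a per-attribute tolerance concrete, the following is a minimal sketch of a tolerance rough set dependency degree in Python. It is not the authors' implementation. It assumes the commonly used range-normalised similarity between attribute values, and it takes the per-attribute thresholds as an input (the hypothetical `tol` dictionary), because the abstract does not state how each tolerance value is computed. The function names and the toy data are likewise illustrative.

```python
from typing import Dict, Sequence


def similarity(x: float, y: float, lo: float, hi: float) -> float:
    """Range-normalised similarity in [0, 1]; 1 means identical values."""
    if hi == lo:
        return 1.0
    return 1.0 - abs(x - y) / (hi - lo)


def dependency_degree(
    data: Sequence[Sequence[float]],
    labels: Sequence[int],
    attrs: Sequence[int],
    tol: Dict[int, float],
) -> float:
    """Fraction of objects whose tolerance class contains a single label.

    Two objects are related when, for every attribute a in attrs, their
    similarity on a is at least tol[a]. The thresholds in tol are an
    assumption of this sketch, supplied by the caller.
    """
    n = len(data)
    ranges = {a: (min(r[a] for r in data), max(r[a] for r in data)) for a in attrs}
    positive = 0
    for i in range(n):
        pure = True
        for j in range(n):
            related = all(
                similarity(data[i][a], data[j][a], *ranges[a]) >= tol[a]
                for a in attrs
            )
            if related and labels[j] != labels[i]:
                pure = False  # a related object has a different class
                break
        positive += pure
    return positive / n


# Toy data (illustrative only): attribute 0 separates the classes well,
# attribute 1 only does so under a stricter tolerance.
data = [[0.10, 0.90], [0.15, 0.10], [0.80, 0.85], [0.90, 0.20]]
labels = [0, 0, 1, 1]

fixed = {0: 0.8, 1: 0.8}      # one threshold shared by all attributes
variable = {0: 0.8, 1: 0.95}  # a separate threshold per attribute

gamma_fixed = dependency_degree(data, labels, [1], fixed)
gamma_variable = dependency_degree(data, labels, [1], variable)
```

In this toy case attribute 1 has a dependency degree of 0 under the shared threshold and 1 under the stricter per-attribute threshold, which illustrates why a single fixed tolerance may misjudge individual attributes. An attribute selection procedure would typically use such a dependency degree to rank or greedily add attributes.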


Rough set, Attribute selection, Microarray, Fixed tolerance, Variable tolerance



We would like to express our deep gratitude to the Management team and Chairperson of the Department of Computer Science and Engineering, Amrita University, India, and to the Management, Secretary and Principal of Dr. Mahalingam College of Engineering and Technology, Pollachi, India, for supporting us in carrying out this research work.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Arunkumar Chinnaswamy (1, corresponding author)
  • Ramakrishnan Srinivasan (2)
  • Sooraj M. Poolakkaparambil (1)
  1. Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, India
  2. Department of Information Technology, Dr. Mahalingam College of Engineering and Technology, Pollachi, India
