Abstract
Selecting informative gene identifiers from microarray databases is a challenging task, because a microarray contains a large number of gene attributes for only a few samples. This article proposes a novel fuzzy-rough set-based gene expression feature selection method for human diabetes patients that uses a fuzzy-rough reduct under a multi-granular space. First, a fuzzy multi-granular gain is computed from the expression datasets via fuzzy entropy, which reduces the dimension of the database. Thereafter, features are selected from the microarray using the fuzzy-rough reduct and information gain with respect to their expression patterns. To reduce the computational cost, a decision-making scheme is designed using a rough approximation of a fuzzy concept within a multi-granulation framework. Finally, we identify associations among genes whose expression patterns differ markedly between the control state and the diabetic state, using a modified fuzzy-rough nearest neighbour classifier (FRNNC). Five standard diabetic microarray datasets are used to quantify the efficiency of the designed FRNNC model; the results are validated with the F-measure using the NCBI diabetes gene expression database, and the model performs better than existing methods.
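The abstract does not state the authors' multi-granular gain or their exact reduct formulas. The sketch below is only a minimal illustration of the generic idea of selecting genes through a fuzzy-rough reduct, assuming a standard dependency-based greedy search (QuickReduct style). It assumes a per-gene similarity `max(0, 1 - |a(x) - a(y)| / range(a))`, the min t-norm to combine genes, the Łukasiewicz implicator for the lower approximation and crisp class labels. All function names and the toy data are hypothetical, and the multi-granulation step is not modelled.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def feature_similarity(col: Sequence[float]) -> Matrix:
    """Assumed fuzzy similarity R_a(x, y) = max(0, 1 - |a(x) - a(y)| / range(a))."""
    span = (max(col) - min(col)) or 1.0  # constant feature: avoid division by zero
    return [[max(0.0, 1.0 - abs(u - v) / span) for v in col] for u in col]


def dependency(sims: List[Matrix], subset: Sequence[int], labels: Sequence[str]) -> float:
    """Fuzzy-rough dependency degree gamma_B of the class labels on gene subset B.

    R_B combines the per-gene relations with the min t-norm. With the
    Lukasiewicz implicator and crisp labels, the positive-region membership
    of x reduces to min over differently labelled y of 1 - R_B(x, y).
    """
    n = len(labels)
    total = 0.0
    for x in range(n):
        pos = 1.0
        for y in range(n):
            if labels[y] != labels[x]:
                r = min((sims[a][x][y] for a in subset), default=1.0)
                pos = min(pos, 1.0 - r)
        total += pos
    return total / n


def quick_reduct(data: Matrix, labels: Sequence[str],
                 eps: float = 1e-12) -> Tuple[List[int], float]:
    """Greedy forward search: repeatedly add the gene that most raises gamma."""
    n_features = len(data[0])
    sims = [feature_similarity([row[a] for row in data]) for a in range(n_features)]
    target = dependency(sims, range(n_features), labels)  # gamma of all genes
    reduct: List[int] = []
    best = dependency(sims, reduct, labels)
    while best < target - eps:
        candidates = [a for a in range(n_features) if a not in reduct]
        if not candidates:
            break
        scores = {a: dependency(sims, reduct + [a], labels) for a in candidates}
        a_best = max(scores, key=scores.get)
        if scores[a_best] <= best + eps:  # no remaining gene improves gamma
            break
        reduct.append(a_best)
        best = scores[a_best]
    return reduct, best


if __name__ == "__main__":
    # Hypothetical toy expression matrix: rows are samples, columns are genes.
    data = [[0.1, 5.0, 0.9], [0.2, 4.0, 0.8], [0.9, 5.0, 0.1], [0.8, 4.5, 0.2]]
    labels = ["control", "control", "diabetic", "diabetic"]
    print(quick_reduct(data, labels))
```

On the toy matrix, genes 0 and 2 carry the same class information, so the search keeps only one of them and then adds gene 1, which is needed to reach the dependency of the full gene set.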
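The abstract names a modified FRNNC and F-measure validation but does not describe the modification. As a point of reference only, the sketch below shows the unmodified fuzzy-rough nearest neighbour rule in the style of Jensen and Cornelis (2011): for a test sample, each class receives the mean of its lower and upper approximation memberships computed over the k nearest training samples, and the highest-scoring class wins. The averaged per-gene similarity, `k = 3`, the Łukasiewicz implicator and min t-norm (with crisp labels), the function names and the labels `"control"`/`"diabetic"` are all assumptions. `f_measure` is the usual F1 score of one positive class.

```python
from typing import List, Sequence


def similarity(x: Sequence[float], y: Sequence[float], ranges: Sequence[float]) -> float:
    """Assumed similarity: mean over genes of max(0, 1 - |x_a - y_a| / range_a)."""
    return sum(max(0.0, 1.0 - abs(a - b) / r) for a, b, r in zip(x, y, ranges)) / len(ranges)


def frnn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                 test_x: Sequence[Sequence[float]], k: int = 3) -> List[str]:
    """Fuzzy-rough nearest neighbour classification (Jensen and Cornelis style)."""
    ranges = [(max(col) - min(col)) or 1.0 for col in zip(*train_x)]
    classes = sorted(set(train_y))
    preds = []
    for t in test_x:
        # k most similar training samples as (similarity, label) pairs
        neigh = sorted(((similarity(z, t, ranges), c) for z, c in zip(train_x, train_y)),
                       key=lambda p: p[0], reverse=True)[:k]
        best_class, best_score = classes[0], -1.0
        for c in classes:
            # Lower approximation: 1 - similarity to the closest neighbour outside c.
            lower = min((1.0 - r for r, cz in neigh if cz != c), default=1.0)
            # Upper approximation: similarity to the closest neighbour inside c.
            upper = max((r for r, cz in neigh if cz == c), default=0.0)
            score = (lower + upper) / 2.0
            if score > best_score:
                best_class, best_score = c, score
        preds.append(best_class)
    return preds


def f_measure(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    train_x = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
    train_y = ["control", "control", "diabetic", "diabetic"]
    preds = frnn_predict(train_x, train_y, [[0.15, 0.85], [0.85, 0.15]])
    print(preds, f_measure(["control", "diabetic"], preds, "diabetic"))
```

Restricting both approximations to the k nearest neighbours keeps the cost per test sample close to that of ordinary k-NN, while the lower approximation still penalises a class when a highly similar sample of another class is among the neighbours.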
Ethics declarations
Conflict of interest
There is no conflict of interest in this research.
About this article
Cite this article
Ghosh, S., Ghosh, A. A Novel Human Diabetes Biomarker Recognition Approach Using Fuzzy Rough Multigranulation Nearest Neighbour Classifier Model. Interdiscip Sci Comput Life Sci 12, 461–475 (2020). https://doi.org/10.1007/s12539-020-00391-7
DOI: https://doi.org/10.1007/s12539-020-00391-7