Abstract
Schizophrenia is a debilitating psychiatric disorder that can significantly affect a patient’s quality of life and lead to permanent brain damage. Although medical research has identified certain genetic risk factors, the specific pathogenesis of the disorder remains unclear. Much of the existing research relies on magnetic resonance imaging, and few studies have examined gene expression profiles covering a large number of screened genes. Moreover, the high dimensionality of genetic data makes accurate modeling difficult, and traditional gene screening techniques cannot reliably determine the precise number of key genes associated with schizophrenia. To address these challenges, this study presents a novel feature selection strategy based on heuristic feature fusion and a multi-objective optimization genetic algorithm, with the goals of improving classification performance and identifying the key gene subset for schizophrenia diagnostics. The approach first applies a filter-based feature selection method to reduce data dimensionality and then uses a multi-objective optimization genetic algorithm as a wrapper for the classification task. By deliberately combining the filter and wrapper methods, the strategy exploits the strengths of each, yielding higher classification accuracy and a more efficient selection of relevant genes. The approach significantly improved classification results on 11 of the 14 datasets examined, and its performance on the remaining three was comparable to that of existing methods. Furthermore, visualization and enrichment analyses have confirmed the practicality of the proposed method as a promising tool for the early detection of schizophrenia.
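The abstract describes the pipeline only at a high level: a filter stage that fuses heuristic feature scores to shrink the gene set, followed by a multi-objective genetic algorithm that acts as a wrapper. The following is a hypothetical, standard-library-only sketch of that filter-then-wrapper pattern, not the authors' implementation. Several choices are assumptions made purely for illustration: the "feature fusion" is stood in for by summing the ranks of a Welch t-statistic and an absolute mean difference; the two objectives are leave-one-out accuracy of a nearest-centroid classifier (maximized) and subset size (minimized); and survival uses NSGA-II-style Pareto-front ranking without crowding distance. All function names, parameter values (`pop_size`, `generations`, `p_mut`) and the toy data are hypothetical.

```python
import random

def _mean(values):
    values = list(values)
    return sum(values) / len(values)

def _var(values):
    """Sample variance (n - 1 denominator); expects a list."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def welch_t(a, b):
    """Absolute Welch t-statistic of one gene between two sample groups."""
    se = (_var(a) / len(a) + _var(b) / len(b)) ** 0.5
    return abs(_mean(a) - _mean(b)) / se if se > 0 else 0.0

def _ranks(scores):
    """Rank position of each score (0 = highest)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    where = {i: pos for pos, i in enumerate(order)}
    return [where[i] for i in range(len(scores))]

def fused_filter(X, y, keep):
    """Filter stage (assumed fusion rule): keep genes with the best summed rank over two scores."""
    t_scores, diff_scores = [], []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        t_scores.append(welch_t(a, b))
        diff_scores.append(abs(_mean(a) - _mean(b)))
    r1, r2 = _ranks(t_scores), _ranks(diff_scores)
    return sorted(range(len(X[0])), key=lambda g: (r1[g] + r2[g], g))[:keep]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dist = {}
        for lab in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == lab]
            cent = [_mean(r[g] for r in rows) for g in genes]
            dist[lab] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, cent))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)

def dominates(a, b):
    """Pareto dominance for objective tuples that are all maximized."""
    return all(p >= q for p, q in zip(a, b)) and a != b

def pareto_fronts(scores):
    """Split indices into successive non-dominated fronts."""
    remaining, fronts = list(range(len(scores))), []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def ga_select(X, y, candidates, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Wrapper stage: bi-objective GA (max accuracy, min subset size) on bitmasks."""
    rng, cache = random.Random(seed), {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            genes = [g for g, bit in zip(candidates, mask) if bit]
            cache[key] = (loo_accuracy(X, y, genes), -len(genes))
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.sample(pop, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            children.append([(not b) if rng.random() < p_mut else b for b in child])  # bit-flip mutation
        merged = pop + children
        order = [i for f in pareto_fronts([score(m) for m in merged]) for i in f]
        pop = [merged[i] for i in order[:pop_size]]  # elitist survival, best fronts first
    scores = [score(m) for m in pop]
    best = {(scores[i][0], -scores[i][1],
             tuple(g for g, bit in zip(candidates, pop[i]) if bit))
            for i in pareto_fronts(scores)[0]}
    return sorted(best, key=lambda t: (-t[0], t[1]))

if __name__ == "__main__":
    # Hypothetical toy data: 40 samples x 30 genes; only genes 0-2 differ by class.
    rng = random.Random(1)
    y = [0] * 20 + [1] * 20
    X = [[rng.gauss(0, 1) + (2.0 * lab if g < 3 else 0.0) for g in range(30)] for lab in y]
    candidates = fused_filter(X, y, keep=10)
    for acc, size, genes in ga_select(X, y, candidates):
        print(f"LOO accuracy={acc:.2f}  genes={size}  {genes}")
```

Because the wrapper returns a Pareto front of accuracy/size trade-offs rather than one fixed-size subset, the number of retained genes is chosen from that front instead of being set in advance. This is one plausible way to handle the difficulty, noted in the abstract, of fixing the precise number of key genes beforehand.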
Data availability
The data underlying this article are available in the publicly accessible Gene Expression Omnibus (GEO) database. All datasets were derived from sources in the public domain:
GSE12649: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12649.
GSE12654: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12654.
GSE12679: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12679.
GSE17612: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17612.
GSE21138: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE21138.
GSE21935: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE21935.
GSE26927: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE26927.
GSE35974: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE35974.
GSE35977: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE35977.
GSE35978: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE35978.
GSE53987: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53987.
GSE62191: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE62191.
GSE87610: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87610.
GSE93987: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE93987.
Funding
The work was jointly supported by the Zhejiang Provincial Natural Science Foundation of China (No. LY21F020017), the National Natural Science Foundation of China (No. U20A20386), the Science and Technology Program of Zhejiang Province (Nos. 2022C03043 and 2022C01016), and the Guangdong Basic and Applied Basic Research Foundation (No. 2022A1515110570).
Author information
Contributions
ZC: Methodology, Software, Visualization, Writing – original draft. RG: Investigation, Methodology, Writing – review & editing, Funding acquisition, Project administration. CW: Methodology, Formal analysis, Supervision, Funding acquisition. AE: Formal analysis, Investigation, Data curation, Writing – review & editing. XF: Formal analysis, Writing – review & editing, Visualization. WM: Conceptualization, Writing – review & editing. FQ: Conceptualization, Formal analysis, Writing – review & editing, Funding acquisition. GJ: Data curation, Writing – review & editing. XF: Investigation, Data curation, Writing – review & editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Z., Ge, R., Wang, C. et al. Identification of important gene signatures in schizophrenia through feature fusion and genetic algorithm. Mamm Genome (2024). https://doi.org/10.1007/s00335-024-10034-7
DOI: https://doi.org/10.1007/s00335-024-10034-7