Abstract
Schizophrenia (SCZ) is a chronic, severe, and complex psychiatric disorder that affects all aspects of personal functioning. While SCZ has a very strong biological component, there are still no objective diagnostic tests. Lately, special attention has been given to epigenetic biomarkers in SCZ. In this study, we introduce a three-step, automated machine learning (AutoML)-based, data-driven, biomarker discovery pipeline approach, using genome-wide DNA methylation datasets and laboratory validation, to deliver a highly performing, blood-based epigenetic biosignature of diagnostic clinical value in SCZ. Publicly available blood methylomes from SCZ patients and healthy individuals were analyzed via AutoML, to identify SCZ-specific biomarkers. The methylation of the identified genes was then analyzed by targeted qMSP assays in blood gDNA of 30 first-episode drug-naïve SCZ patients and 30 healthy controls (CTRL). Finally, AutoML was used to produce an optimized disease-specific biosignature based on patient methylation data combined with demographics. AutoML identified a SCZ-specific set of novel gene methylation biomarkers including IGF2BP1, CENPI, and PSME4. Functional analysis investigated correlations with SCZ pathology. Methylation levels of IGF2BP1 and PSME4, but not CENPI were found to differ, IGF2BP1 being higher and PSME4 lower in the SCZ group as compared to the CTRL group. Additional AutoML classification analysis of our experimental patient data led to a five-feature biosignature including all three genes, as well as age and sex, that discriminated SCZ patients from healthy individuals [AUC 0.755 (0.636, 0.862) and average precision 0.758 (0.690, 0.825)]. In conclusion, this three-step pipeline enabled the discovery of three novel genes and an epigenetic biosignature bearing potential value as promising SCZ blood-based diagnostics.
Introduction
Schizophrenia (SCZ) is a chronic, severe, and debilitating psychiatric disorder with a complex and heterogeneous genetic and neurobiological background, that influences early brain development and affects all areas of personal functioning [1,2,3,4]. The disorder is typically characterized by “positive” (e.g., delusions, hallucinations, disorganized behavior, and thinking), “negative” (e.g., loss of motivation and interest, social withdrawal, anhedonia, affective flattening), and “cognitive” symptoms and is associated with serious disability and functional impairment, as well as with a much higher risk of physical and mental comorbidities and a much lower overall life expectancy [1,2,3,4].
The pathophysiology of SCZ is extremely complex and, despite the qualitative research of the past decades, only partly understood. The neurodevelopmental hypothesis of SCZ postulates that the risk of developing the disorder lies in a heritable risk and additional environmental exposures that occur throughout development [5, 6]. Accordingly, although SCZ features a very strong genetic component with a heritability of about 80%, a very broad range of additional environmental factors and stressors has been suggested to contribute to the neurodevelopment of the disorder. Hence, only 50% of monozygotic twins of patients develop the disease, pointing to a significant role of epigenetic modifications. Such epigenetic modifications include DNA methylation, histone modifications, and non-coding RNAs [7]. Accumulating knowledge suggests that altered DNA methylation profiles of several genes are implicated in the pathogenesis of SCZ and related psychiatric disorders. For example, studies on the reelin gene (RELN), an important GABAergic candidate gene, revealed increased methylation levels of the RELN promoter in different brain areas of patients with SCZ [8] and in the peripheral blood of SCZ patients [9]. Regarding the dopaminergic pathway, methylation studies have focused on the catechol-O-methyltransferase gene (COMT). In an epigenetic analysis of post-mortem human brains, authors found significant hypomethylation of the COMT promoter in SCZ patients [10], a finding also noted in the saliva DNA of SCZ patients [11]. Methylation analysis of the serotonin signaling receptor genes, 5-hydroxytryptamine receptor 1A (HTR1A) and 5-hydroxytryptamine receptor 2A (HTR2A), revealed significant associations between hypermethylation and SCZ in brain [12] or blood [13]. Furthermore, multiple studies have shown that blood DNA methylation of the brain-derived neurotrophic factor gene (BDNF) is related to SCZ [14, 15], although altered BDNF methylation was not observed in brain tissue [16].
Among many others, urbanicity, viral infections, maternal immune activation, obstetric complications, nutrient deprivation, parental absence, toxins, childhood trauma, cannabis exposure, etc. have been all repeatedly shown to be related to a higher risk of developing SCZ [1,2,3,4]. The long-lasting impact of such environmental factors and stressors on neurodevelopment is suggested to be mainly mediated by epigenetic modifications that alter genome function [6].
Although there is mounting evidence for several functional, neuroanatomical, and molecular alterations in SCZ, the routine diagnosis and treatment follow-up of the disorder still depend mainly on clinical examination, psychometric scales, and medical history. This fact underlines the urgent need for new reliable, routinely applicable, and easily accessible objective biomarkers for the early diagnosis and treatment of SCZ [17, 18]. However, despite the increasing research interest in biomarkers in psychiatry, very few have been established in clinical practice and many findings remain unconfirmed [19]. Lately, special attention has been given to blood-based, epigenetic biomarkers in SCZ [20].
However, single biomarkers are unlikely to bear the accuracy and validity required to become clinically relevant. Instead, genome-wide methylation analyses through microarrays or next-generation sequencing (NGS) enable the study of a vast number of CpG sites, i.e., methylation-relevant regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide along the 5’ → 3’ direction of the linear base sequence, providing high-dimensional datasets. Machine learning (ML) techniques for their analysis offer the opportunity to build biosignatures (panels of biomarkers) of personalized clinical importance [21,22,23]. Moreover, automated machine learning (AutoML), which automates the machine learning process end-to-end, is increasingly popular, as it enables deep exploitation of omics datasets by applying multiple algorithms and performing effective feature selection. In particular, the JADBio platform used in this study is a tool purpose-built for biomedical research, with a proven capacity to effectively build predictive models by analyzing relatively small datasets, as patient datasets often are [24,25,26,27,28,29,30,31]. It can automate the pre-analysis steps, including data integration, preprocessing, cleaning, and engineering (feature construction); the analysis steps, including algorithm selection, training of the models, and hyperparameter optimization; and the post-analysis steps, including interpretation, explanation, and visualization of the analysis process and the output model. The main advantages of AutoML tools are that they require no coding knowledge, reduce analysis time, and minimize human-caused mistakes. Our research group has already published important AutoML-driven results and predictive models in COVID-19 [32], diabetes [33], Alzheimer’s disease [34], breast cancer [35], and suicide risk amongst depressive patients [36].
In this study, we have used a three-step, AutoML-based, data-driven biomarker discovery pipeline approach, using data from genome-wide DNA methylation datasets to build specific biosignatures, followed by pilot clinical validation via methylation analysis in the blood of SCZ patients and healthy controls. We deliver highly performing, blood-based epigenetic biosignatures that may hold promise for clinically relevant interventions and could promote the understanding of disease pathophysiology.
Subjects and methods
This study uses a three-step approach to discover, validate, and deliver a highly performing, blood-based epigenetic biosignature of diagnostic clinical value in SCZ (cf. Figure 1). Overall, our pipeline focuses on the exploitation and knowledge mining of high-throughput methylome datasets to establish specific disease biosignatures to be validated in clinical practice and promote the understanding of SCZ pathophysiology.
Step 1—in silico analysis
Through an AutoML-aided, data-driven biomarker discovery approach using data from high-throughput microarray blood DNA methylome datasets, we first identified SCZ-specific gene methylation biomarkers. Genes identified were further investigated for their biological relevance to known SCZ pathophysiological pathways.
Data sources
The Gene Expression Omnibus (GEO) database [37] was used to retrieve publicly available, blood-based DNA methylation data from SCZ patients and healthy individuals. The GEO database was searched using “Schizophrenia” as a keyword, “Methylation profiling by array” as the study type, and “Homo sapiens” as the organism of interest. In total, 19 studies were found; among them, only those using the Infinium Human Methylation 27k, 450k, or 850k BeadChip arrays, using blood as the study tissue, and providing adequate normalized data were selected for further analysis. Three studies met these criteria, namely GSE41037 [38], GSE157252 [39], and GSE41169 [38]. Dataset information is presented in Table 1.
Biomarker identification
Feature selection performed via AutoML identifies a minimum subset of features bearing the maximal classifying ability between groups. JADBio, version 1.2.8 [24], an AutoML technology specially designed for analyzing high-dimensional biological datasets, was employed to build SCZ-specific biosignatures based on the retrieved high-dimensional methylation data and the demographic information, as previously described [32, 34]. JADBio applies to low- or high-sample data, as well as to high-dimensional or low-scale omics data, and produces accurate predictive models while estimating each model’s out-of-sample performance after bootstrap correction and cross-validation. JADBio preprocesses the data (mean imputation, mode imputation, constant removal, and standardization) and then tries several predictive algorithms, such as Classification Random Forests, Support Vector Machines, Ridge Logistic Regression, and Classification Decision Trees. Specifically, for small sample sizes, it employs a stratified, repeated K-fold cross-validation protocol with the BBC-CV algorithm, which exhibits small estimation variance and removes estimation bias. The main idea of BBC-CV is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models [25].
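The bootstrap idea behind BBC-CV can be illustrated with a minimal Python sketch. This is not the JADBio implementation: the function names (`accuracy`, `bbc_cv_estimate`), the use of accuracy instead of AUC as the metric, and the number of bootstrap rounds are all assumptions made for brevity. The sketch takes the pooled out-of-sample predictions of each configuration, repeatedly selects the best configuration on a bootstrap sample, and scores that choice only on the samples left out of the bootstrap, so that no model is retrained.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (stand-in metric; the study optimized AUC)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bbc_cv_estimate(y_true, oos_preds, n_boot=200, seed=0):
    """Bootstrap-corrected performance of the 'pick the best configuration' procedure.

    oos_preds[c][i] is the out-of-sample (cross-validated) prediction of
    configuration c for sample i. No model is refit inside this function.
    """
    rng = random.Random(seed)
    n = len(y_true)

    def score(indices, config):
        return accuracy([y_true[i] for i in indices],
                        [oos_preds[config][i] for i in indices])

    out_of_bag_scores = []
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]      # sample with replacement
        in_bag_set = set(in_bag)
        out_bag = [i for i in range(n) if i not in in_bag_set]
        if not out_bag:                                    # extremely unlikely; skip
            continue
        # Select the winning configuration using in-bag samples only ...
        best = max(range(len(oos_preds)), key=lambda c: score(in_bag, c))
        # ... and evaluate that choice on samples it was not selected on.
        out_of_bag_scores.append(score(out_bag, best))

    if not out_of_bag_scores:
        raise ValueError("no bootstrap round produced out-of-bag samples")
    return sum(out_of_bag_scores) / len(out_of_bag_scores)
```

Comparing this estimate with the naive maximum of the per-configuration cross-validated scores gives a sense of how much optimism the selection step would otherwise introduce.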
For each AutoML analysis, we used extensive model tuning effort, chose the area under the curve (AUC) metric for optimization of biosignature performance, and set the classifier maximum size to three features (cf. Table 1, legend). The predictive power of each biosignature was assessed using the AUC and average precision (also known as area under the precision-recall curve) metrics.
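For readers less familiar with the two metrics, the following Python sketch computes them from first principles. It is an illustration only, not the code used by JADBio; the function names are hypothetical, labels are assumed to be coded 1 (SCZ) and 0 (CTRL), and the average-precision function assumes that all scores are distinct.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Each positive/negative pair counts 1 if ordered correctly, 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values obtained at the rank of each positive sample."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank      # precision among the top `rank` samples
    return total / hits
```

An AUC of 0.5 corresponds to chance-level ranking, whereas the chance level of average precision equals the proportion of positive samples, which is why both metrics are usually reported together.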
Biological interpretation
To explore the CpG sites included in the specific biosignatures built by AutoML, each CpG was allocated to its corresponding gene with an extended window size of 20 kb downstream and upstream. Identified genes and respective proteins were further studied to unfold relevance to SCZ pathology. Gene Ontology (GO) and pathway data for each gene were retrieved via the GeneCards Suite [40]. The GO analysis covers three domains: molecular function, the elemental activities of a gene product at the molecular level; cellular component, the parts of a cell or its extracellular environment; biological process, chemical reactions, or other events that are involved and are pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms. Also, subcellular location and tissue expression distribution were retrieved via the UniProtKB database information [41].
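The CpG-to-gene allocation with a 20 kb flanking window can be sketched as follows. This is a simplified illustration, not the annotation procedure actually used: the function name, the tuple layout, and the gene names and coordinates in the usage note are all hypothetical.

```python
WINDOW_BP = 20_000  # 20 kb upstream and downstream, as described in the text


def genes_near_cpg(cpg_chrom, cpg_pos, genes, window=WINDOW_BP):
    """Return the names of genes whose extended interval contains the CpG.

    `genes` is an iterable of (name, chromosome, start, end) tuples with
    base-pair coordinates on the same genome build as the CpG position.
    """
    hits = []
    for name, chrom, start, end in genes:
        if chrom == cpg_chrom and start - window <= cpg_pos <= end + window:
            hits.append(name)
    return hits
```

For example, with the made-up annotation `[("GENE_A", "chr1", 100_000, 150_000)]`, a CpG at chr1:165,000 is allocated to GENE_A, while one at chr1:175,000 falls outside the extended window and is left unassigned.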
Then, we employed the multiple UniReD tool [42, 43] to identify the functional relationships of respective proteins. Multiple UniReD is a mining tool of published biomedical literature that associates the proteins of interest (query list) to a list of reference proteins that are known and verified to be involved in the disease under investigation (reference list) [44,45,46]. Multiple UniReD produces a score for each protein of interest that signifies its relatedness to the proteins in the reference list. The higher the score the higher the functional association of the proteins of interest to the reference list proteins. In particular, multiple UniReD searches for the existence of any association in UniReD clusters between the protein pairs of the two lists (reference and query list). If the pair is present in a UniReD cluster, the association will obtain a score of 1 and UniReD will continue to the next pair of reference and query proteins. If a protein has not been analyzed by UniReD, no results will be retrieved. If multiple UniReD cannot find any association within the clusters, it will proceed to the next step in order to search for paralogues. More specifically, multiple UniReD will search for a co-occurrence of a paralogue of the query protein with the reference protein in a UniReD cluster. If such a pair is present, the association will obtain a score of 0.5. In the case that UniReD does not predict any association within a query paralogue and the reference protein, then multiple UniReD will investigate whether the query protein is part of a complex. In this case, it will search for co-occurrences of the reference protein in the complex. If such an association is confirmed, the association will obtain a score of 0.5. Finally, if none of the above applies, UniReD will search for a relation between the orthologues of reference and query proteins. If such an association is documented, it will obtain a score of 0.5. 
When none of the aforementioned cases are applicable, no score is assigned to the specific protein pair [43]. To build our reference list of genes with a known and established role in SCZ, and to include an adequate number of genes to cover multiple pathways, we searched the literature for the most cited candidate genes and also drew on descriptive genetic reviews [47,48,49,50,51]. We then performed a further focused study for each identified protein. A list of 33 protein-coding genes known for their implication in SCZ pathophysiology according to the literature was built and used in multiple UniReD to test the associations of the identified features. The reference genes used are presented in Suppl. Table S1.
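The scoring cascade described above can be summarized in a short Python sketch. It is one possible reading of the text, not the multiple UniReD source code: the data layout (`kb` with `analyzed`, `clusters`, `paralogues`, `complexes`, and `orthologues` entries), the function names, the interpretation of the complex step as a co-occurrence between the reference protein and a member of the query protein's complex, and the summation of pair scores into a per-protein total are all assumptions.

```python
def linked(a, b, clusters):
    """True if proteins a and b co-occur in at least one cluster."""
    return any(a in c and b in c for c in clusters)


def pair_score(query, ref, kb):
    """Score one query/reference pair following the cascade in the text."""
    if query not in kb["analyzed"]:
        return None                                   # protein unknown: no result
    clusters = kb["clusters"]
    if linked(query, ref, clusters):
        return 1.0                                    # direct co-occurrence
    if any(linked(p, ref, clusters) for p in kb["paralogues"].get(query, ())):
        return 0.5                                    # via a paralogue of the query
    if any(linked(m, ref, clusters) for m in kb["complexes"].get(query, ())):
        return 0.5                                    # via a complex partner (assumed reading)
    q_orth = kb["orthologues"].get(query, ())
    r_orth = kb["orthologues"].get(ref, ())
    if any(linked(q, r, clusters) for q in q_orth for r in r_orth):
        return 0.5                                    # via orthologues of both proteins
    return None                                       # no association found


def total_score(query, reference_list, kb):
    """Sum of pair scores over the reference list (summation is an assumption)."""
    scores = (pair_score(query, r, kb) for r in reference_list)
    return sum(s for s in scores if s is not None)
```

Under this reading, a query protein scored against a reference list of 33 proteins can reach at most 33, attained only if it co-occurs directly with every reference protein.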
Finally, in an attempt to identify possible protein-protein interaction networks among features and the 33 protein-coding genes, we performed a functional protein association network analysis using the STRING database [52]. STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, knowledge transfer between organisms, and interactions aggregated from other (primary) databases.
Step 2—methylation analysis
Following the in silico analysis, the methylation profiles of the selected genes were investigated using the SYBR Green-based quantitative methylation-specific polymerase chain reaction (qMSP) method in blood-derived genomic DNA (gDNA) samples of SCZ patients compared to healthy individuals, in order to validate the initial biosignatures in a laboratory setting and to assess their clinical validity. Finally, further AutoML analysis of methylation and clinical data led to optimized diagnostic biosignatures of clinical importance.
Clinical Samples
Our study’s groups consisted of 30 first-onset, drug-naive SCZ patients and 30 age- and sex-matched healthy individuals (control group, CTRL) without other mental or physical comorbidities. Screening included a thorough physical and neurological examination, routine blood laboratory tests, urine toxicology screen, electrocardiogram (ECG), and a structured face-to-face clinical interview. Inclusion criteria of SCZ patients included: age 18 - 45 years, first episode, drug naïve, being able to sign informed consent and diagnosis of schizophrenia, schizoaffective disorder, schizophreniform disorder, brief psychotic disorder, or psychotic disorder not otherwise specified confirmed individually by two separate experienced psychiatrists according to the diagnostic criteria of the fourth edition of the Diagnostic and Statistical Manual of Mental Disorder-IV Text Revision (DSM-IV-TR) and the Mini International Neuropsychiatric Interview plus. The operational criteria for the first episode were: first episode of uninterrupted positive symptoms, no matter the duration, no symptom remission for 1 month or longer duration. Exclusion criteria included the presence or self-reported history of any chronic or acute physical and Axis I mental co-morbidities, body mass index (BMI) values beyond 18-30 kg/m2, frequent usage of any either illicit or prescribed drugs or over-the-counter medications, drinking of more than 100 g of alcohol per week, abnormal physical and neurological examinations, basic blood laboratory test values deviating from the normal range, positive urine toxicology screen, pregnancy, nursing, and pathological initial ECG. Hypothyroidism in the euthyroid state through hormonal substitution, as well as hypertension in the normotensive state through antihypertensive medication, did not serve as exclusion criteria. SCZ patients were recruited in the Dept. of Psychiatry at the Aristotle University of Thessaloniki. 
Psychopathology symptom severity of SCZ patients was evaluated with the Positive and Negative Syndrome Scale (PANSS). Age- and sex-matched healthy control individuals were recruited from the blood donation unit of the University General Hospital of Alexandroupolis using the same exclusion criteria. Demographic and psychometric data of both SCZ and CTRL groups are presented in Suppl. Table S2. The study was approved by the Ethics Review Committee (ERC) of the Aristotle University of Thessaloniki and the University General Hospital of Alexandroupolis, Greece, and was conducted according to the ethical principles of the 1964 Declaration of Helsinki and its later amendments. After a full oral and written explanation of the purpose and procedures of the investigation, written informed consent was obtained from each patient and healthy control participant before initiating the screening procedure and enrollment in the study.
qMSP analysis
Methylation analysis was performed in 30 SCZ samples and 30 CTRL samples. Blood samples were obtained in EDTA-coated tubes between 09:00 and 12:00 of each study day. Blood was stored at −20 °C until further analysis. Genomic DNA (gDNA) from peripheral blood was extracted using the QIAamp DNA Blood Mini kit (Qiagen, Germany) according to the manufacturer’s instructions. The gDNA quantity was then checked with a Nanodrop Spectrophotometer (Thermo Fisher Scientific, UK), and the gDNA was stored at −20 °C. Bisulfite conversion of DNA was performed with the EZ DNA Methylation-Gold™ Kit (ZYMO Research, USA) as suggested by the manufacturer. In each reaction, CpGenome Human methylated and non-methylated DNA controls (Merck Millipore, Germany) were included as positive and negative control samples, respectively. The converted gDNA was stored at −80 °C, ready for methylation analysis.
A methylation-independent PCR assay for the β-actin gene (ACTB) was used to verify the sufficient quality and quantity of converted gDNA. Methylation levels of investigated genes were analyzed using quantitative SYBR Green-based methylation-specific PCR (qMSP) assays. Primers specific for the methylated sequence of each gene, all found in the gene body, were newly designed using the MethPrimer software [53]. Primer sequences are provided in Suppl. Table S3. To set up robust qMSP assays, extensive optimization was performed. Specificity and cross-reactivity of primers were evaluated using unconverted gDNA and converted methylated and non-methylated DNA controls. The analytical specificity of qMSP assays was evaluated by using mixes of methylated and non-methylated DNA standards (100%, 50%, 10%, 1%, 0%). The analytical sensitivity of assays was evaluated using serial dilutions of methylated and non-methylated DNA controls in H2O. The reproducibility (calculated as coefficients of variation, CVs), efficiency, and linearity were also evaluated in order to complete the validation file of the established assays. All samples were run in duplicate. The results were calculated using the Rotor-Gene 6000 Series Software 1.7 (Qiagen). The analysis was performed according to the relative quantification (RQ) method, $RQ = 2^{-\Delta\Delta C_T}$ [54]. Specifically, $\Delta\Delta C_T$ values were generated for each target after normalization to ACTB values, using the 100% methylated control as a calibrator.
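The relative quantification step can be written out as a short Python sketch. It assumes, as is conventional for this method, that duplicate threshold cycle (CT) values are averaged before the calculation; the function name and argument layout are illustrative and do not reproduce the Rotor-Gene software workflow.

```python
from statistics import mean


def relative_quantification(sample_target_cts, sample_actb_cts,
                            calibrator_target_cts, calibrator_actb_cts):
    """Return RQ = 2 ** (-ddCT) for one sample and one target gene.

    Each argument is a list of replicate CT values (duplicates in this study).
    The calibrator is the 100% methylated control DNA.
    """
    # Normalize the target to the ACTB reference within each DNA sample.
    delta_ct_sample = mean(sample_target_cts) - mean(sample_actb_cts)
    delta_ct_calibrator = mean(calibrator_target_cts) - mean(calibrator_actb_cts)
    # Express the sample relative to the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

With hypothetical values, a sample whose ACTB-normalized target signal appears one cycle later than that of the calibrator yields RQ = 0.5, i.e., half the methylated template relative to the fully methylated control, assuming 100% amplification efficiency.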
Statistical analysis
Initially, the Kolmogorov–Smirnov test was applied to check for normality in the distribution of continuous methylation data. Due to the lack of normality in our data, the Mann–Whitney U test was used for pairwise comparisons between the CTRL and SCZ groups, and the Spearman correlation test was used to examine associations between methylation levels and clinical or demographic variables. Continuous variables are expressed as median (minimum-maximum) or mean ± standard deviation (SD). Categorical variables are shown as absolute frequencies (percentages). In all tests performed, statistical significance was set at a two-sided p-value < .05. Statistical analysis was carried out with the jamovi version 2.3 statistical software package retrieved from https://www.jamovi.org.
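As an illustration of the rank-based comparison, the sketch below computes the Mann–Whitney U statistic with midranks for tied values. It is not the jamovi implementation and it does not compute p-values; the function names are hypothetical.

```python
def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1        # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y); the reported U is usually the smaller of the two."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y
```

The two statistics always sum to the product of the group sizes, so a U value close to zero indicates almost complete separation of the two groups.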
Step 3—AutoML analysis
For our AutoML classification analysis, we employed the AutoML technology JADBio [24] to produce optimized disease-specific biosignatures based on our experimental patient methylation data combined with demographic data (cf. Table 3, Legend). For the analysis, we used extensive model tuning effort and we chose the AUC metric for optimization of performance. The AUC is also used to assess the overall performance of the classification model.
Results
In silico-built schizophrenia-specific biosignatures
Genome-wide blood methylation microarray data from the GSE41037, GSE41169, and GSE157252 studies retrieved from GEO were analyzed using the AutoML platform JADBio to build SCZ-specific methylation biosignature models. The methylation dataset is uploaded to the platform, and analysis is performed automatically by applying multiple machine learning algorithms and performing effective feature selection. The analysis steps include algorithm selection, training of the models, and hyperparameter optimization, as well as the post-analysis of the output model. The biosignature delivered includes a minimum subset of features bearing the maximal classifying ability between groups. The models’ performance metrics are calculated automatically, following internal validation. Features of each model and a performance overview as well as the feature selection method and the predictive algorithm are presented in Table 1. Among the three datasets, the GSE41037 dataset biosignature and its included biomarkers (https://app.jadbio.com/share/bc38a8a0-b7c7-4c08-bd04-8433c1b14cc3) were selected for further analysis and clinical validation in our clinical cohort due to the favorable combination of the larger study group size, the high performance demonstrated in silico (cf. Figure 2) and the greater score in functional associations to SCZ (see below).
Biomarker biological interpretation
Genes featured in the GSE41037 and GSE157252 dataset biosignatures were found to be involved in several biological processes such as sex differentiation, centromere complex assembly, regulation of cytokine production, neuron migration, nervous system development, DNA repair, cellular response to DNA damage stimulus, and protein maturation by [2Fe-2S] cluster transfer (cf. Table 2). According to GeneCards, the molecular function of CENPI and PSME4 is related to protein binding in the nucleus of the cell, while the molecular function of IGF2BP1 is associated with nucleic acid binding, also in the nucleus of the cell. The molecular function of CISD3 is related to metal ion binding in the cytoplasm and mitochondria of the cell, and that of MDGA1 is associated with protein binding in the extracellular space.
To examine the potential role of the protein products of the genes featured in the GSE41037 and GSE157252 dataset biosignatures in the pathophysiology of SCZ, we utilized the literature mining tool multiple UniReD to assess functional associations between proteins, as previously applied [33]. For this analysis, we used a list of 33 protein-coding genes with a known role in SCZ according to literature (cf. Supplementary Table S1). All genes were found to be associated with SCZ pathways. In particular, IGF2BP1 reached a score of 23, MDGA1 reached a score of 15, CENPI reached a score of 10, CISD3 reached a score of 9, and PSME4 a score of 2 (Fig. 3).
In addition, we performed a functional protein association network analysis among IGF2BP1, PSME4, CENPI, MDGA1, CISD3, and the 33 protein-coding genes with a known role in SCZ used in the multiple UniReD analysis, and only CENPI displayed an association with UFD1L (cf. Supplementary Fig. S1).
Laboratory methylation analyses in SCZ patients and CTRL individuals
Following the in silico analysis, the methylation profiles of the IGF2BP1, PSME4, and CENPI genes were investigated in SCZ patients compared to CTRL, to validate the biomarkers’ performance in discriminating SCZ in a clinical setting. Methylation levels of IGF2BP1 and PSME4, but not CENPI, were found to differ in a statistically significant way, IGF2BP1 being higher and PSME4 lower in the SCZ group as compared to the CTRL group (IGF2BP1: U = 156, nSCZ = nCTRL, p = 0.007; PSME4: U = 123, nSCZ = nCTRL, p = 0.015; CENPI: U = 196, nSCZ = nCTRL, p = 0.138) (cf. Fig. 4). Similarly, when groups were split according to sex, methylation levels of IGF2BP1 and PSME4, but not CENPI, were again found to differ in a statistically significant way, IGF2BP1 being higher and PSME4 lower in the SCZ female group as compared to the CTRL female group (IGF2BP1: U = 15, nSCZ = nCTRL, p = 0.001; PSME4: U = 12, nSCZ = nCTRL, p = 0.001; CENPI: U = 47, nSCZ = nCTRL, p = 0.160). No statistically significant differences were noticed between the male groups (cf. Supplementary Table S4). Also, no statistically significant associations were noticed between methylation levels of IGF2BP1, PSME4, and CENPI and age, or any other clinical/demographic data (cf. Supplementary Table S4).
Performance of single biomarkers in the diagnosis of schizophrenia
Among the three featured GSE41037 dataset biosignature genes, IGF2BP1 showed the best AUC in the diagnosis of SCZ (AUC = 0.718) while CENPI and PSME4 showed low AUC (AUC = 0.367 and AUC = 0.340, respectively). ROC curves are presented in Suppl. Figure S3.
AutoML predictive analysis in the diagnosis of schizophrenia
Our experimental data were further analyzed by the JADBio AutoML platform to produce diagnostic biosignatures validated in clinical samples. In our AutoML classification analysis, the task was to predict SCZ versus health by combining the blood methylation measurements with demographic data. In this AutoML analysis, JADBio trained 3017 different machine learning pipelines (also called configurations), corresponding to different model types. Each one was employed many times during cross-validation (a repeated 10-fold CV without dropping, max. repeats = 20), leading to fitting 241,360 model instances (https://app.jadbio.com/share/5a59c593-9b99-48dc-9131-0447641ea556). The AutoML analysis produced a best-performing five-feature biosignature via the Classification Random Forests algorithm, including the methylation status of all three genes, age, and sex, which was able to discriminate the groups with an AUC of 0.755 (0.636, 0.862), an accuracy of 0.720 (0.658, 0.779), and an average precision of 0.758 (0.690, 0.825) (cf. Fig. 5). The methylation biomarkers were shown to bear higher importance for the model’s performance than age and sex, as depicted by the feature importance plot (Fig. 5D). In another AutoML classification analysis, in which only age and sex were used as possible features, the biosignature produced resulted in an AUC of 0.545 (0.399, 0.686), bearing no classification power.
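The cross-validation scheme and the reported number of fitted models can be related with a small Python sketch. The splitter below is a generic stratified, repeated K-fold generator and not the JADBio code. The value of 8 repeats is an inference: 241,360 / 3017 = 80 fits per configuration, which equals 10 folds times 8 repeats and is compatible with the stated maximum of 20 repeats, but the number of repeats actually run is not reported in the text.

```python
import random


def stratified_repeated_kfold(labels, k=10, repeats=8, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repeat.

    Samples of each class are shuffled and dealt round-robin into the k folds,
    so that every fold preserves the class proportions as closely as possible.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)
            for position, index in enumerate(members):
                folds[position % k].append(index)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


# 30 SCZ (1) and 30 CTRL (0) samples, as in the clinical cohort.
cohort_labels = [1] * 30 + [0] * 30
splits = list(stratified_repeated_kfold(cohort_labels, k=10, repeats=8))
fits_per_configuration = len(splits)                 # 10 folds x 8 repeats (inferred)
total_model_fits = 3017 * fits_per_configuration     # configurations x fits each
```

With 60 samples and 10 folds, each model is trained on 54 samples and tested on 6 (3 per class), which illustrates why repeated splitting and bias correction matter for cohorts of this size.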
Discussion
In the management of SCZ, there is a long-recognized need for new objective biomarkers that can improve diagnostic accuracy. In the present study, we introduce an AutoML-based pipeline that can identify disease-specific epigenetic biomarkers through feature selection and deliver highly performing methylation-based biosignatures with the aim of aiding the development of clinical tools for the accurate diagnosis of SCZ in blood. Therefore, as a first step, we analyzed in silico publicly available, high-throughput microarray blood methylation datasets from SCZ patients and healthy individuals through AutoML predictive analysis. Feature selection of AutoML analysis revealed three SCZ-specific methylation CpGs in IGF2BP1, CENPI, and PSME4 genes that combined in a model can predict SCZ in the methylome database. The methylation profile of the identified genes was then analyzed, in step 2, by targeted qMSP assays in blood gDNA of 30 first-episode SCZ patients and 30 healthy controls (CTRL). Finally, at step 3, AutoML analysis of our experimental clinical data combined with demographics led to a best-performing five-feature biosignature including all three genes, age, and sex, that was able to discriminate drug-naïve SCZ patients from healthy individuals with high AUC and precision. It is known and empirically expected that SCZ occurs more frequently in males and that young adulthood is the typical age of onset for SCZ. Therefore, age and sex were expected as potential predictive factors in our ML model. Still, the methylation biomarkers were indicated to bear higher importance in the model’s performance.
UniReD analysis was performed in order to reveal whether the AutoML-identified genes were associated with SCZ. It is important to mention that, while STRING is a standard bioinformatic tool used to identify possible protein-protein interaction networks by integrating information from various sources (literature, experiments, databases, and genome context), multiple UniReD, used here as an additional approach, is a novel computational tool that is able not only to identify known associations between proteins described in the biomedical literature, but also to predict novel interactions that are not yet experimentally documented. While STRING analysis revealed scarce associations, UniReD results demonstrated some relevance, especially for IGF2BP1, which showed the highest score when analyzed against a list of genes with an established role in SCZ pathophysiology. Indeed, members of the IGF2BP family, including IGF2BP1, are key regulators of neuronal development, neuronal cell migration, and specification [55], being implicated in cytoskeletal signaling, translational control, and the Wnt/Hedgehog/Notch pathways [56, 57]. IGF2BP1 and CENPI expression have also been found to be closely related to abnormal psychomotor behavior in SCZ [58]. In addition, CENPI expression was found to be significantly dysregulated between unaffected biological siblings and affected SCZ individuals [59]. Our approach, employing a data-driven, hypothesis-free way to build classifying biosignatures, pointed to these three genes as bearing, in combination, high classifying power. Therefore, their involvement in SCZ pathophysiological pathways is worth further attention for a deeper understanding of SCZ biology, through functional in vitro or in vivo studies.
Previous studies leveraged either biological entities such as gut microbiota, blood gene expression, methylation, and SNPs, or neuroimaging and electroencephalogram data, and employed ML tools for SCZ discrimination with very promising results [60,61,62,63,64,65]. The predictive abilities of the models built were in the range of AUC 0.780 to 0.993. Chan et al. developed, via the LASSO regression method, a serum protein-based biosignature of 26 biomarkers that efficiently discriminated first-onset, drug-naive SCZ patients from controls, reaching an AUC of 0.970 [60]. Lin et al. combined G72 genetic variation and its protein levels and developed a naive Bayes-based biosignature using G72 rs1421292 and G72 protein for identifying SCZ with high discriminative power (AUC = 0.935) [61]. Trakadis et al., using whole exome sequencing data, developed a biosignature of 372 genes via the XGBoost algorithm that can efficiently predict individuals at high risk for SCZ (AUC = 0.950) [62]. In the study of Chen et al., an epigenetic signature based on blood DNA methylation data was built that differentiated SCZ from healthy controls and from other disorders such as bipolar disorder (AUC = 0.780) [63]. Ke et al. developed a biosignature via support vector machines without feature selection, using multi-biological data as input features. Among them, the top 5% discriminative features included gut microbiota features (Lactobacillus, Haemophilus, and Prevotella), blood features (superoxide dismutase level, monocyte-lymphocyte ratio, and neutrophil count), and electroencephalogram features (nodal local efficiency, nodal efficiency, and nodal shortest path length in the temporal and frontal-parietal brain areas). This biosignature also showed high discriminative power (AUC = 0.970) [64]. Zhu et al. built an expression-based biosignature via SVM including the expression profile of six genes (GNAI1, FYN, PRKCA, YWHAZ, PRKCB, and LYN) that can differentiate SCZ patients from healthy individuals with the highest discriminative power so far (AUC = 0.993) [65]. More recently, a blood-based machine learning case-control classifier using DNA methylation data, built by applying sparse partial least squares discriminant analysis on Human Methylation 450 K array data, demonstrated an AUC of 0.67 in discriminating SCZ from healthy individuals [66]. In addition, treatment-resistant SCZ was distinguished from non-resistant disease with a high accuracy of 88.3% by a risk score model based on the blood methylation of 5 genes (LOC404266, LOXL2, CERK, CHMP7, and SLC17A9) identified by an ML algorithm applied to genome-wide methylation dataset analysis [67].
Here, in order to build methylation-based biosignatures, we employed, for the first time, AutoML using the JADBio platform. As we previously proposed [32, 34], this approach presents two major advantages for further developments in biomarker discovery. (a) It produces high-performing classifiers with low feature numbers via feature selection, i.e., automatic calculations for identifying the minimum number of features, within a dataset of some thousands of features, that retain the maximum classifying power. (b) It has been shown to shield against typical methodological pitfalls in data analysis that lead to overfitting and overestimating performance and, therefore, to misleading results, particularly in small-sample biomedical datasets. The performance in terms of sensitivity/specificity of our model, as shown in laboratory validation, is comparable to that reported for other models so far. Still, the low number of features in our model and the minimally invasive approach, along with the relatively simpler qPCR technology, are significant advantages for a potential diagnostic test to be implemented in clinical practice. Reducing the feature size of a signature by feature selection via AutoML is a great advantage towards more cost-effective assays with fewer technical requirements for multiplexing, moving from multi-dimensional omics results to simpler classifiers. Furthermore, the capacity of the AutoML-aided approach chosen here to refrain from performance overestimation addresses the main observed concern in transferring data-driven models into real life.
It was shown that, on typical biomedical datasets, JADBio identifies signatures with just a handful of molecular quantities, while maintaining competitive predictive performance. Furthermore, an advantage of machine learning classification tools like JADBio is that features that do not demonstrate statistically significant changes between groups (possibly due to the small group of patients), such as the CENPI gene methylation status in our study, may be selected in a model together with other features, to add to a combined classification performance. At the same time, JADBio reliably estimates the performance of the models from the training data alone, without losing samples to validation [24, 25]. Upon further extensive analytical and clinical validation, the signatures built can offer feasible solutions for laboratory tests that could be applied in a standardly equipped diagnostic lab.
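The point that a feature without a group difference of its own can still add to combined performance can be illustrated with a toy example. In the sketch below, which uses entirely synthetic values and hypothetical feature names, feature B has an identical distribution in both groups, yet subtracting it from feature A removes shared noise and improves discrimination. This is only one possible mechanism (a so-called suppressor pattern) and is not a claim about how CENPI methylation contributes in our model.

```python
def auc(labels, scores):
    """Area under the ROC curve via pair counting; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic data: the same four noise values occur in both groups.
noise = [-1.5, -0.5, 0.5, 1.5]
labels = [1] * len(noise) + [0] * len(noise)

# Feature A carries the group signal plus noise.
feature_a = [y + n for y, n in zip(labels, noise + noise)]
# Feature B measures only the noise, so it shows no group difference.
feature_b = noise + noise

# A simple linear combination: A minus B cancels the shared noise.
combined = [a - b for a, b in zip(feature_a, feature_b)]

auc_a = auc(labels, feature_a)
auc_b = auc(labels, feature_b)
auc_combined = auc(labels, combined)
print(auc_a, auc_b, auc_combined)
```

Taken alone, feature B performs at chance level, while the combination separates the two groups better than feature A does by itself.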
Epigenetic modifications, including DNA methylation patterns, play an important role in regulating gene expression. These modifications can affect gene expression by promoting or inhibiting the binding of transcription factors and other regulatory proteins to DNA. Environmental factors such as diet, stress, pathogens, toxins, and lifestyle have been shown to trigger epigenetic changes in an exposure- and/or a disease-related manner. A combination of genetic and environmental risk factors is thought to influence the normal processes of brain development and maturation, manifesting in early adulthood as a range of neurotransmitter, circuit, and connectivity disorders, like SCZ [68]. In SCZ and other multifactorial neuropsychiatric diseases, epigenetic processes have been shown to mediate the effects of environmental risks, but may also interact with the genomic risks associated with these diseases [69]. Specifically, there is evidence that prenatal exposure to certain environmental factors, such as obstetric complications [70], malnutrition [71], or infections [72], is associated with an increased risk of SCZ. Urban upbringing [73], cannabis use during adolescence [74], and childhood adversity [75] are also considered environmental risk factors for SCZ. Understanding environmental risk factors can enhance the predictive power of models assessing schizophrenia risk. Including variables related to prenatal or childhood exposures, psychosocial stressors, or substance use may improve the accuracy of risk prediction models. Future research should aim to integrate genetic, epigenetic, and environmental factors to provide a more comprehensive understanding of the complex interplay contributing to the development of schizophrenia. This approach will contribute to the development of more effective prevention and intervention strategies.
Changes in blood cell composition can influence DNA methylation patterns. Blood is a heterogeneous tissue composed of various cell types, including white blood cells (leukocytes), red blood cells, and platelets. Each cell type has a distinct DNA methylation profile, and alterations in the proportion of different cell types within the blood can contribute to changes in overall DNA methylation patterns. Previous studies have shown that SCZ is associated with elevated WBC counts (i.e., higher total WBC, lymphocyte, neutrophil, basophil, eosinophil, and monocyte counts) [76]. Changes in blood cell composition could therefore possibly explain the DNA methylation alterations found in this study.
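The arithmetic behind this possible confounding can be sketched as follows: the methylation level measured in whole blood is approximately the proportion-weighted mean of the cell-type-specific levels, so a shift in proportions alone changes the bulk value. All numbers, cell-type labels, and the function name below are hypothetical and serve only as a worked example; they are not measurements from this study.

```python
def bulk_methylation(proportions, cell_type_beta):
    """Proportion-weighted mean methylation (beta value) of a mixed sample."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type proportions must sum to 1")
    return sum(proportions[c] * cell_type_beta[c] for c in proportions)

# Hypothetical beta values at one CpG, identical in both groups per cell type.
beta = {"neutrophils": 0.20, "lymphocytes": 0.60, "monocytes": 0.40}

# Hypothetical leukocyte proportions; only the composition differs.
control = {"neutrophils": 0.55, "lymphocytes": 0.35, "monocytes": 0.10}
shifted = {"neutrophils": 0.65, "lymphocytes": 0.25, "monocytes": 0.10}

bulk_control = bulk_methylation(control, beta)
bulk_shifted = bulk_methylation(shifted, beta)
print(f"control: {bulk_control:.2f}, shifted: {bulk_shifted:.2f}")
```

In this example the bulk value falls from 0.36 to 0.32 although no cell type changed its own methylation level, which is why adjusting for estimated cell-type proportions is worth considering in future validation.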
The relatively small group size of SCZ patients and healthy individuals participating in the experimental validation part of our study represents a limitation and possibly prevents further significant differences in IGF2BP1, CENPI, and PSME4 methylation levels from emerging. Nevertheless, although sample size is one of many important design elements contributing to successful biomarker discovery, the use of AutoML can mitigate such limitations and aids robust data extrapolation from small cohorts. Future validation of the built biosignatures in a larger group of patients should be conducted in order to confirm their clinical value and demonstrate performance in terms of sensitivity/specificity in the real-world setting. Further studies should also address the value of the methylation of each of the identified genes, with both bioinformatic and experimental indications pointing to IGF2BP1 as the one of greatest interest. IGF2BP1, being a member of the IGF2BP family, is implicated in cytoskeletal signaling, translational control, Wnt/Hedgehog/Notch pathways, regulation of cytokine production, and neuronal development, migration, and specification [56, 57]. PSME4, on the other hand, is involved in the CDK-mediated phosphorylation and removal of Cdc6 pathway, in DNA repair, and in the cellular response to DNA damage stimulus [77]. As such, both genes deserve further attention through experimental approaches aiming to unfold their potential participation in SCZ pathogenetic processes, which could explain their value in discriminating SCZ patients. Identifying biomarkers that can be used as diagnostics or predictors of treatment response (theranostics) in people with SCZ will be an important step towards being able to provide personalized management of this complex mental disorder [67].
In addition, these biomarkers should also be tested against other diagnostic entities that share genetic overlaps with SCZ, such as bipolar disorder, depression, epilepsy, autism, and multiple sclerosis [78, 79].
Blood-based methylation biomarkers could offer a minimally invasive approach to early diagnosis and prediction of treatment response upon prospective evaluation [66, 67]. Our study demonstrates the potential of AutoML-driven data approaches in discovering novel blood-based epigenetic biomarkers in SCZ while also informing disease biology.
Data availability
The data utilized in the in silico part of the study are publicly available and the clinical data are available upon reasonable request.
References
McCutcheon RA, Reis Marques T, Howes OD. Schizophrenia—an overview. JAMA Psychiatry. 2020;77:201–10.
Jauhar S, Johnstone M, McKenna PJ. Schizophrenia. Lancet. 2022;399:473–86.
Kahn RS, Sommer IE, Murray RM, Meyer-Lindenberg A, Weinberger DR, Cannon TD, et al. Schizophrenia. Nat Rev Disease Primers. 2015;1:15067.
Owen MJ, Sawa A, Mortensen PB. Schizophrenia. Lancet. 2016;388:86–97.
Fatemi SH, Folsom TD. The neurodevelopmental hypothesis of schizophrenia, revisited. Schizophr Bull. 2009;35:528–48.
Khavari B, Cairns MJ. Epigenomic dysregulation in schizophrenia: in search of disease etiology and biomarkers. Cells. 2020;9:1837.
Kuehner JN, Bruggeman EC, Wen Z, Yao B. Epigenetic regulations in neuropsychiatric disorders. Front Genet. 2019;10:268.
Tochigi M, Iwamoto K, Bundo M, Komori A, Sasaki T, Kato N, et al. Methylation status of the reelin promoter region in the brain of schizophrenic patients. Biological psychiatry. 2008;63:530–3.
Nabil Fikri RM, Norlelawati AT, Nour El-Huda AR, Hanisah MN, Kartini A, Norsidah K, et al. Reelin (RELN) DNA methylation in the peripheral blood of schizophrenia. J Psychiatric Res. 2017;88:28–37.
Abdolmaleky HM, Cheng K-h, Faraone SV, Wilcox M, Glatt SJ, Gao F, et al. Hypomethylation of MB-COMT promoter is a major risk factor for schizophrenia and bipolar disorder. Human Mol Genet. 2006;15:3132–45.
Nohesara S, Ghadirivasfi M, Mostafavi S, Eskandari M-R, Ahmadkhaniha H, Thiagalingam S, et al. DNA hypomethylation of MB-COMT promoter in the DNA derived from saliva in schizophrenia and bipolar disorder. J Psychiatric Res. 2011;45:1432–8.
Cheah S-Y, Lawford BR, Young RM, Morris CP, Voisey J. mRNA expression and DNA methylation analysis of serotonin receptor 2A (HTR2A) in the human schizophrenic brain. Genes. 2017;8:14.
Carrard A, Salzmann A, Malafosse A, Karege F. Increased DNA methylation status of the serotonin receptor 5HTR1A gene promoter in schizophrenia and bipolar disorder. J Affective Disorders. 2011;132:450–3.
Ikegame T, Bundo M, Sunaga F, Asai T, Nishimura F, Yoshikawa A, et al. DNA methylation analysis of BDNF gene promoters in peripheral blood cells of schizophrenia patients. Neurosci Res. 2013;77:208–14.
Kordi-Tamandani DM, Sahranavard R, Torkamanzehi A. DNA methylation and expression profiles of the brain-derived neurotrophic factor (BDNF) and dopamine transporter (DAT1) genes in patients with schizophrenia. Mol Biol Rep. 2012;39:10889–93.
Cheah S-Y, McLeay R, Wockner LF, Lawford BR, Young RM, Morris CP, et al. Expression and methylation of BDNF in the human brain in schizophrenia. World J Biol Psychiatry. 2017;18:392–400.
Pickard BS. Schizophrenia biomarkers: translating the descriptive into the diagnostic. J Psychopharmacol. 2015;29:138–43.
Weickert CS, Weickert TW, Pillai A, Buckley PF. Biomarkers in schizophrenia: a brief conceptual consideration. Dis Markers. 2013;35:3–9.
Lozupone M, La Montagna M, D’Urso F, Daniele A, Greco A, Seripa D, et al. The role of biomarkers in psychiatry. Adv Exp Med Biol. 2019;1118:135–62.
Guidotti A, Auta J, Davis JM, Dong E, Gavin DP, Grayson DR, et al. Toward the identification of peripheral epigenetic biomarkers of schizophrenia. J Neurogenet. 2014;28:41–52.
Guncar G, Kukar M, Notar M, Brvar M, Cernelc P, Notar M, et al. An application of machine learning to haematological diagnosis. Sci Rep. 2018;8:411.
Abdar M, Ksiazek W, Acharya UR, Tan RS, Makarenkov V, Plawiak P. A new machine learning technique for an accurate diagnosis of coronary artery disease. Comput Methods Programs Biomed. 2019;179:104992.
Elaziz MA, Hosny KM, Salah A, Darwish MM, Lu S, Sahlol AT. New machine learning method for image-based diagnosis of COVID-19. PLoS ONE. 2020;15:e0235187.
Tsamardinos I, Charonyktakis P, Papoutsoglou G, Borboudakis G, Lakiotaki K, Zenklusen JC, et al. Just Add Data: automated predictive modeling for knowledge discovery and feature selection. NPJ Precis Oncol. 2022;6:38.
Tsamardinos I, Greasidou E, Borboudakis G. Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation. Mach Learn. 2018;107:1895–922.
Paparazzo E, Geracitano S, Lagani V, Bartolomeo D, Aceto MA, D’Aquila P, et al. A blood-based molecular clock for biological age estimation. Cells. 2022;12:32.
Jacob SG, Sulaiman MMBA, Bennet B. Feature signature discovery for autism detection: an automated machine learning based feature ranking framework. Comput Intell Neurosci. 2023;2023:6330002.
Deutsch L, Stres B. The importance of objective stool classification in fecal 1H-NMR metabolomics: exponential increase in stool crosslinking is mirrored in systemic inflammation and associated to fecal acetate and methionine. Metabolites. 2021;11:172.
Danilatou V, Nikolakakis S, Antonakaki D, Tzagkarakis C, Mavroidis D, Kostoulas T, et al. Outcome prediction in critically-Ill patients with venous thromboembolism and/or cancer using machine learning algorithms: external validation and comparison with scoring systems. Int J Mol Sci. 2022;23:7132.
Rounis K, Makrakis D, Papadaki C, Monastirioti A, Vamvakas L, Kalbakis K, et al. Correction: prediction of outcome in patients with non-small cell lung cancer treated with second line PD-1/PDL-1 inhibitors based on clinical parameters: Results from a prospective, single institution study. PLoS ONE. 2023;18:e0294382.
Bowler S, Papoutsoglou G, Karanikas A, Tsamardinos I, Corley MJ, Ndhlovu LC. A machine learning approach utilizing DNA methylation as an accurate classifier of COVID-19 disease severity. Sci Rep. 2022;12:17480.
Papoutsoglou G, Karaglani M, Lagani V, Thomson N, Roe OD, Tsamardinos I, et al. Automated machine learning optimizes and accelerates predictive modeling from COVID-19 high throughput datasets. Sci Rep. 2021;11:15107.
Karaglani M, Panagopoulou M, Baltsavia I, Apalaki P, Theodosiou T, Iliopoulos I, et al. Tissue-specific methylation biosignatures for monitoring diseases: an in silico approach. Int J Mol Sci. 2022;23:2959.
Karaglani M, Gourlia K, Tsamardinos I, Chatzaki E. Accurate blood-based diagnostic biosignatures for alzheimer’s disease via automated machine learning. J Clin Med. 2020;9:3016.
Panagopoulou M, Karaglani M, Manolopoulos VG, Iliopoulos I, Tsamardinos I, Chatzaki E. Deciphering the methylation landscape in breast cancer: diagnostic and prognostic biosignatures through automated machine learning. Cancers (Basel). 2021;13:1677.
Adamou M, Antoniou G, Greasidou E, Lagani V, Charonyktakis P, Tsamardinos I, et al. Toward automatic risk assessment to support suicide prevention. Crisis. 2019;40:249–56.
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 2012;41:D991–D995.
Horvath S, Zhang Y, Langfelder P, Kahn RS, Boks MP, van Eijk K, et al. Aging effects on DNA methylation modules in human brain and blood tissue. Genome Biol. 2012;13:R97.
Piao YH, Cui Y, Rami FZ, Li L, Karamikheirabad M, Kang SH, et al. Methylome-wide association study of patients with recent-onset psychosis. Clin Psychopharmacol Neurosci. 2022;20:462–73.
Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards Suite: from gene data mining to disease genome sequence analyses. Curr Protocols Bioinform. 2016;54:1.30.1–1.30.33.
Boutet E, Lieberherr D, Tognolli M, Schneider M, Bairoch A. UniProtKB/Swiss-Prot. Methods Mol Biol (Clifton, NJ). 2007;406:89–112.
Theodosiou T, Papanikolaou N, Savvaki M, Bonetto G, Maxouri S, Fakoureli E, et al. UniProt-Related Documents (UniReD): assisting wet lab biologists in their quest on finding novel counterparts in a protein network. NAR Genom Bioinform. 2020;2:lqaa005.
Baltsavia I, Theodosiou T, Papanikolaou N, Pavlopoulos GA, Amoutzias GD, Panagopoulou M, et al. Prediction and ranking of biomarkers using multiple UniReD. Int J Mol Sci. 2022;23:11112.
Savvaki M, Kafetzis G, Kaplanis SI, Ktena N, Theodorakis K, Karagogeos D. Neuronal, but not glial, Contactin 2 negatively regulates axon regeneration in the injured adult optic nerve. Eur J Neurosci. 2021;53:1705–21.
Kalafatakis I, Kalafatakis K, Tsimpolis A, Giannakeas N, Tsipouras M, Tzallas A, et al. Using the Allen gene expression atlas of the adult mouse brain to gain further insight into the physiological significance of TAG-1/Contactin-2. Brain Struct Funct. 2020;225:2045–56.
Antuamwine BB, Bosnjakovic R, Hofmann-Vega F, Wang X, Theodosiou T, Iliopoulos I, et al. N1 versus N2 and PMN-MDSC: a critical appraisal of current concepts on tumor-associated neutrophils and new directions for human oncology. Immunol Rev. 2023;314:250–79.
Boulenouar H, Benhatchi H, Guermoudi F, Oumiloud AH, Rahoui A. An actualized screening of schizophrenia-associated genes. Egyptian J Med Human Genet. 2022;23:81.
Zhan N, Sham PC, So H-C, Lui SSY. The genetic basis of onset age in schizophrenia: evidence and models. Front Genet. 2023;14:1163361.
Johansson AS, Owe-Larsson B, Hetta J, Lundkvist GB. Altered circadian clock gene expression in patients with schizophrenia. Schizophrenia Res. 2016;174:17–23.
Meltzer HY, Li Z, Huang M, Prus A. Serotonergic mechanisms in schizophrenia: evolution and current concepts. Curr Psychosis Therapeutics Reports. 2006;4:12–9.
Wagh VV, Vyas P, Agrawal S, Pachpor TA, Paralikar V, Khare SP. Peripheral blood-based gene expression studies in schizophrenia: a systematic review. Front Genet. 2021;12:736483.
Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49:D605–D612.
Li LC, Dahiya R. MethPrimer: designing primers for methylation PCRs. Bioinformatics (Oxford, England). 2002;18:1427–31.
Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods (San Diego, Calif). 2001;25:402–8.
Ravanidis S, Kattan FG, Doxakis E. Unraveling the pathways to neuronal homeostasis and disease: mechanistic insights into the role of RNA-binding proteins and associated factors. Int J Mol Sci. 2018;19:2280.
Nielsen J, Christiansen J, Lykke-Andersen J, Johnsen AH, Wewer UM, Nielsen FC. A family of insulin-like growth factor II mRNA-binding proteins represses translation in late development. Mol Cell Biol. 1999;19:1262–70.
Huang H, Weng H, Sun W, Qin X, Shi H, Wu H, et al. Recognition of RNA N(6)-methyladenosine by IGF2BP proteins enhances mRNA stability and translation. Nat Cell Biol. 2018;20:285–95.
Zhang Y, You X, Li S, Long Q, Zhu Y, Teng Z, et al. Peripheral blood leukocyte RNA-seq identifies a set of genes related to abnormal psychomotor behavior characteristics in patients with schizophrenia. Med Sci Monitor: Int Med J Exp Clin Res. 2020;26:e922426.
Glatt SJ, Stone WS, Nossova N, Liew CC, Seidman LJ, Tsuang MT. Similarities and differences in peripheral blood gene-expression signatures of individuals with schizophrenia and their first-degree biological relatives. Am J Med Genet B: Neuropsychiatric Genet. 2011;156B:869–87.
Chan MK, Krebs MO, Cox D, Guest PC, Yolken RH, Rahmoune H, et al. Development of a blood-based molecular biomarker test for identification of schizophrenia before disease onset. Transl Psychiatry. 2015;5:e601.
Lin E, Lin CH, Lai YL, Huang CH, Huang YJ, Lane HY. Combination of G72 genetic variation and G72 protein level to detect schizophrenia: machine learning approaches. Front Psychiatry. 2018;9:566.
Trakadis YJ, Sardaar S, Chen A, Fulginiti V, Krishnan A. Machine learning in schizophrenia genomics, a case-control study using 5,090 exomes. Am J Med Genet B: Neuropsychiatric Genet. 2019;180:103–12.
Chen J, Zang Z, Braun U, Schwarz K, Harneit A, Kremer T, et al. Association of a reproducible epigenetic risk profile for schizophrenia with brain methylation and function. JAMA Psychiatry. 2020;77:628–36.
Ke PF, Xiong DS, Li JH, Pan ZL, Zhou J, Li SJ, et al. An integrated machine learning framework for a discriminative analysis of schizophrenia using multi-biological data. Sci Rep. 2021;11:14636.
Zhu L, Wu X, Xu B, Zhao Z, Yang J, Long J, et al. The machine learning algorithm for the diagnosis of schizophrenia on the basis of gene expression in peripheral blood. Neurosci Lett. 2021;745:135596.
Gunasekara CJ, Hannon E, MacKay H, Coarfa C, McQuillin A, Clair DS, et al. A machine learning case–control classifier for schizophrenia based on DNA methylation in blood. Transl Psychiatry. 2021;11:412.
Lu AK, Lin JJ, Tseng HH, Wang XY, Jang FL, Chen PS, et al. DNA methylation signature aberration as potential biomarkers in treatment-resistant schizophrenia: Constructing a methylation risk score using a machine learning method. J Psychiatric Res. 2023;157:57–65.
Millan MJ, Andrieux A, Bartzokis G, Cadenhead K, Dazzan P, Fusar-Poli P, et al. Altering the course of schizophrenia: progress and perspectives. Nat Rev Drug Discov. 2016;15:485–515.
Hannon E, Dempster E, Viana J, Burrage J, Smith AR, Macdonald R, et al. An integrated genetic-epigenetic analysis of schizophrenia: evidence for co-localization of genetic associations and differential DNA methylation. Genome Biol. 2016;17:176.
Cannon M, Jones PB, Murray RM. Obstetric complications and schizophrenia: historical and meta-analytic review. Am J psychiatry. 2002;159:1080–92.
Sarris J, Logan AC, Akbaraly TN, Amminger GP, Balanzá-Martínez V, Freeman MP, et al. Nutritional medicine as mainstream in psychiatry. Lancet Psychiatry. 2015;2:271–4.
Arias I, Sorlozano A, Villegas E, de Dios Luna J, McKenney K, Cervilla J, et al. Infectious agents associated with schizophrenia: a meta-analysis. Schizophrenia Res. 2012;136:128–36.
Vassos E, Pedersen CB, Murray RM, Collier DA, Lewis CM. Meta-analysis of the association of urbanicity with schizophrenia. Schizophrenia Bull. 2012;38:1118–23.
Arseneault L, Cannon M, Witton J, Murray RM. Causal association between cannabis and psychosis: examination of the evidence. Br J Psychiatry : the journal of mental science. 2004;184:110–7.
Liang H, Olsen J, Yuan W, Cnattingus S, Vestergaard M, Obel C, et al. Early life bereavement and schizophrenia: a nationwide cohort study in Denmark and Sweden. Medicine (Baltimore). 2016;95:e2434.
Gao Z, Li B, Guo X, Bai W, Kou C. The association between schizophrenia and white blood cells count: a bidirectional two-sample Mendelian randomization study. BMC Psychiatry. 2023;23:271.
Yazgili AS, Ebstein F, Meiners S. The proteasome activator PA200/PSME4: an emerging new player in health and disease. Biomolecules. 2022;12:1150.
Smeland OB, Bahrami S, Frei O, Shadrin A, O’Connell K, Savage J, et al. Genome-wide analysis reveals extensive genetic overlap between schizophrenia, bipolar disorder, and intelligence. Mol Psychiatry. 2020;25:844–53.
Lee SH, Ripke S, Neale BM, Faraone SV, Purcell SM, Perlis RH, et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet. 2013;45:984–94.
Acknowledgements
Greece 2.0, National Recovery and Resilience Plan Flagship program TAEDR-0535850 supported this work.
Funding
Greece 2.0, National Recovery and Resilience Plan Flagship program TAEDR-0535850 supported this work.
Author information
Contributions
MK, AA, and EC conceived and designed the study. VPB and EC supervised the study. MK and AA wrote the manuscript and performed data analysis. MP, TT and II assisted with statistical methodology and interpretation of data. EP, PA, PB, and KZ contributed to the sample and data collection. EC acquired the funding. II, VPB and EC revised and contributed to the final version of the manuscript. All authors have reviewed and approved the final manuscript.
Ethics declarations
Competing interests
EC and TT are co-founders of ABCureD PC, while all other authors declare no actual or potential conflict of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Karaglani, M., Agorastos, A., Panagopoulou, M. et al. A novel blood-based epigenetic biosignature in first-episode schizophrenia patients through automated machine learning. Transl Psychiatry 14, 257 (2024). https://doi.org/10.1038/s41398-024-02946-4
DOI: https://doi.org/10.1038/s41398-024-02946-4