Unveiling new genetic insights in rheumatoid arthritis for drug discovery through Taxonomy3 analysis

Kozlowska, Justyna; Humphryes-Kirilov, Neil; Pavlovets, Anastasia; Connolly, Martin; Kuncheva, Zhana; Horner, Jonathan; Manso, Ana Sousa; Murray, Clare; Fox, J. Craig; McCarthy, Alun

doi:10.1038/s41598-024-64970-0

Unveiling new genetic insights in rheumatoid arthritis for drug discovery through Taxonomy3 analysis

Article
Open access
Published: 19 June 2024

Volume 14, article number 14153, (2024)
Cite this article

Download PDF

You have full access to this open access article

Scientific Reports

Unveiling new genetic insights in rheumatoid arthritis for drug discovery through Taxonomy3 analysis

Download PDF

Justyna Kozlowska¹,
Neil Humphryes-Kirilov¹,
Anastasia Pavlovets¹,
Martin Connolly¹,
Zhana Kuncheva¹,
Jonathan Horner¹,
Ana Sousa Manso¹,
Clare Murray¹,
J. Craig Fox¹ &
…
Alun McCarthy¹

789 Accesses
59 Altmetric
8 Mentions
Explore all metrics

Abstract

Genetic support for a drug target has been shown to increase the probability of success in drug development, with the potential to reduce attrition in the pharmaceutical industry alongside discovering novel therapeutic targets. It is therefore important to maximise the detection of genetic associations that affect disease susceptibility. Conventional statistical methods such as genome-wide association studies (GWAS) only identify some of the genetic contribution to disease, so novel analytical approaches are required to extract additional insights. C4X Discovery has developed Taxonomy3, a unique method for analysing genetic datasets based on mathematics that is novel in drug discovery. When applied to a previously published rheumatoid arthritis GWAS dataset, Taxonomy3 identified many additional novel genetic signals associated with this autoimmune disease. Follow-up studies using tool compounds support the utility of the method in identifying novel biology and tractable drug targets with genetic support for further investigation.

Genomics-driven drug discovery based on disease-susceptibility genes

Article Open access 10 March 2021

Identification of potential drug targets for rheumatoid arthritis from genetic insights: a Mendelian randomization study

Article Open access 11 September 2023

Network and pathway expansion of genetic disease associations identifies successful drug targets

Article Open access 01 December 2020

Introduction

Attrition in drug development is a major issue for the pharmaceutical industry, particularly for complex diseases, with significant R&D resources spent on projects that do not deliver new medicines for patients. It is now widely accepted that genetic support for a therapeutic drug target leads to an increased probability of successful delivery to patients of a new medicine, with the potential to significantly reduce attrition^1,2. These observations have directed the focus of drug discovery groups towards results from genetic studies of disease. As opposed to rare diseases, which are often caused by the dysfunction of a single gene, common diseases are complex traits influenced by the added contribution of many genetic variants. Conventional analyses of Genome-wide association studies (GWAS) have generated thousands of associations, initially based on the study of single-nucleotide polymorphisms (SNPs) arrays but increasingly using whole exome—or whole genome-sequence data³. A critically important observation is that > 90% of GWAS variants identified fall in non-coding regions of the genome and thus do not directly affect the coding sequence of a gene but rather accumulate in DNA regulatory elements and could for example disrupt binding sites for transcription factors likely regulating the expression levels of genes in a cell type-specific manner. Therefore, it is unclear which genes these variants regulate and in which cell types or physiological contexts this regulation occurs. This has hindered the translation of GWAS findings and insights into clinical interventions⁴.

Another key current limitation of using GWAS datasets to provide insights into relevant disease biology is that no analysis method can extract all the genetic information relevant to a disease embedded in the genetics, leading to the concept of ‘missing heritability.’ i.e. the genetic component of disease susceptibility that is calculated from family or twin studies, but cannot be fully accounted for by the known genetic associations detected.

Many explanations for this missing heritability have been proposed, including: the existence of many rare mutations with strong effect not captured by the standard GWAS analysis methodologies⁵, gene–gene interactions with strong effect (where the individual genes are not in themselves sufficiently strongly associated with the disease to be detected)^6,7,8, or ‘omnigenic’ models proposing that many thousands of genes with small effect impact disease susceptibility such that even current meta-analyses are insufficiently powered to detect most of the signals^6,7,8.

Based on twin and family studies, heritability of rheumatoid arthritis (RA) is estimated at ~ 60%, showing a significant impact of genetic variation on disease aetiology⁹. A number of RA GWASs have been published, with datasets that have included over 300,000 people^10,11,12. The largest of these studies identified ~ 75 loci associated with diagnosis of RA (11 of which were novel) and while this is an important contribution to understanding the aetiology of the disease, the authors calculate that the total of all these findings accounts for 40–50% of the heritability thought to affect development of RA, leaving significant missing heritability still to be identified¹².

Whatever the reasons for the missing heritability, it is clear that many different analytical approaches are required to maximise the extraction of genetic insights from disease datasets. To this end, C4X Discovery has implemented a unique method (called Taxonomy3) for analysing and visualising human genetic datasets for drug target identification. Taxonomy3 represents a significant departure from traditional GWAS. It employs a probabilistic mathematical framework to visualise high-dimensional genetic data. At its core, the method utilises individualized divergences and Log Bayes Factors (LBFs), an approach that transforms genotype data matrices into a data matrix and facilitates a nuanced comparison between genetic profiles. Distinct from conventional GWAS techniques that predominantly focus on frequency and association, Taxonomy3 allows to visualise the genetic architecture. In addition, Taxonomy3 scales principal components according to their corresponding eigenvalues, facilitating the interpretation of signal importance. For instance, the first principal component is scaled by the largest eigenvalue since it explains a component with the most variability. This allows for direct comparisons with subsequent scaled principal components, which diminish in size as they account for decreasing variability associated with case–control separation. That way it accentuates the evidential weight of genetic variants, thereby providing a refined visual aid, especially beneficial for complex traits like RA. This capability to handle complexities inherent in genetic data marks Taxonomy3 as a powerful tool for uncovering genetic targets. This methodological shift facilitates the identification of subtle genetic signals that might be overlooked in traditional analyses, making it particularly suitable for complex autoimmune diseases, where genetic contributions are diverse and often polygenic. The method has previously been applied to a range of complex disorders and clinical applications: Stevens-Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN)¹³, drug-induced skin blistering disorders^14,15, cutaneous adverse drug reactions¹⁶, carbamazepine hypersensitivity¹⁷, and phase 2a weight loss clinical trial data¹⁸. In this paper, we present details of application of Taxonomy3 to a rheumatoid arthritis case/control dataset, show the additional genetic insights generated by the method, and provide examples where the findings allow the potential initiation of drug discovery programmes based on drug targets identified through these new genetic findings.

Methods

Overview of Taxonomy3 method

Taxonomy3 is based on correlations of individualized divergences (named Log Bayes Factors, LBFs)¹⁹, and their Principal Component Analysis (PCA): the mathematical principle of the method has been described previously^20,21,22. Briefly, for each genotype in a case/control group, the LBF is calculated, and for all subjects, each genotype is replaced by the calculated LBF for that genotype (for details see Supplementary Information). In this way, the initial genotype data matrix is transformed into a LBF matrix of the same dimension, representing the information contributed by each subject and SNP pertaining to the overall case/control distinction. LBFs in this context compare the likelihood of the data under two different hypotheses (for example, presence or absence of a genetic association). The log transformation is used for several reasons, including stabilising variance and making the scale of the data more manageable, which is especially important given the high dimensionality and complexity of GWAS data. This process is crucial for generating variance, both between and within groups, which is a key aspect of identifying significant genetic signals. A critical aspect of Taxonomy3 is the computation of individualized log divergences for each variable. This involves comparing the parameters of the marginal distribution of that variable under two different models. For instance, in a case–control study, these two models could represent the case group and the control group. Divergences are used in the literature to measure how different entities (e.g. SNPs) are from a representative benchmark such as the mean or median of the studied group. An ‘individualized divergence’ provides the researcher with a measure of how similar/different an individual entity is from that benchmark. Studying those individual divergences provides the researcher with an opportunity to explore not only coarse differences between groups but to pinpoint granular differences within the groups. Divergences may be defined in many different ways depending on the underlying scientific question and the explored statistical hypothesis^21,23. Throughout this study ‘LBFs’ refer to the observed LBFs that are calculated directly from the data without adjusting for the frequency of variables. This approach is suitable when the aim is to capture the impact of rare genetic variants that are often underestimated in genetic analyses. LBFs are real numbers and have additive properties allowing the use of linear algebra tools. In Taxonomy3, LBFs are used to generate what is effectively a new empirical genetic model from the SNP data. This model is then used in further analysis steps such as eigen decomposition of correlations of LBFs (PCA), which is the preferred multivariate analysis method as it produces independent sets of correlated variables. To accurately determine how much each variable contributes to case/control distinction, we projected variable loadings on to the observed case/control direction in relevant dimensional space. Other datatypes (gene expression, clinical variables etc.) can also be transformed into LBFs, which in turn can be co-analysed alongside genotype LBFs²¹.

The principal component results are visualised using biplots to display the relative position of subjects and variables. From this, variables can be identified that are important for discriminating cases from controls. Statistical significance of the results is assessed by permutation of the case/control labels and re-analysis, which is performed 10 000 times. For each variable, p-values are obtained by comparing the true observed projected loading to the Gaussian mixture model (Mixmod software²⁴) fitted to the distribution of the permuted projected loadings. The Family Wise Error Rate Šidák correction for multiple testing is used to define the genome-wide threshold for significant variables using the exact number of tests being carried out.

Taxonomy3 analysis data management

Subjects and phenotype data

We analysed the rheumatoid arthritis (RA) dataset from the Welcome Trust Case Control Consortium (WTCCC), comprising RA cases (n = 1999) recruited from sites across the UK¹¹. As controls, we studied UK National Blood Service (NBS) controls (n = 1480) from the same source. Both cases and controls were genotyped with the Affymetrix 500 K SNP chip (For more information on the cases and controls, see¹¹). The WTCCC has limited phenotype data on the disease samples: disease status, age, sex and broad geographical region within Britain. The downloaded data were stored and analysed on the Amazon Web Services cloud computing facilities. To ensure security, the data were encrypted, stored behind a HIPAA-compliant firewall, and access to the data was restricted to a defined list of IP addresses.

Genotypes

The latest annotations file for the discontinued Affymetrix 500 K DNA chip was obtained from the manufacturer. We used genotypes derived using the Chiamo algorithm as in the original WTCCC analysis, discarding genotypes having a call probability lower than 90%¹¹.

Univariate quality control

The case and control data were first subject to conventional data quality control. The objective of this procedure was to detect and remove potential biases that could undermine the analysis.

Sex: Checking for sex discrepancies is critical to Taxonomy3 as LBFs are not defined in the same way for X-linked SNPs in males and females. Sex was inferred for each patient from intensities of X-linked genotypes and X chromosome heterozygosity and checked against the reported sex.

Missing values: Subjects with a missing value rate above 5% were removed from the analysis.

Relatedness: The objective of this analysis was to detect and remove inter-related subjects/samples from subsequent analyses. We selected a total of 35,941 highly variable SNPs having a high Minor Allele Frequency (above 48%). We then determined the percentage of identical genotypes in all possible pairs of subjects.

Variables: Variables were excluded from the analysis for any of the following reasons:

1.
Monomorphic in the whole population
2.
Missing > 5% of values
3.
SNPs departing from Hardy Weinberg Equilibrium in the control group (threshold p = 10^–8).

In classical GWASs, SNPs having a low minor allele frequency (MAF) are usually removed due to power considerations. This is not necessary in Taxonomy3, as the method is not affected by rare variants.

HLA imputation

Human leukocyte (HLA) genotyping is not available in the WTCCC RA dataset, therefore imputation of antigen (HLA) genotypes from SNP genotype data was performed in-house using the HIBAG imputation method²⁵ with pre-trained parameter estimates specific to Affymetrix500k for European ancestry.

Genetic ancestry

Taxonomy3’s focus on individual variations and the relative weighting of genetic variants can make it more sensitive to subtle population-level differences. This could potentially accentuate the differences arising from population structure and is accounted for in an additional QC step—HapMap co-analysis, which is similar to PCA-based techniques used in traditional GWAS. In Taxonomy3, however, genetic ancestries other than the main genetic ancestry that is being analysed are visualised and excluded and only the main homogenous population is progressed.

We carried out a co-analysis of the HapMap data with the RA (cases) and NBS (healthy controls) datasets, looking to define an ethnically homogenous cluster of subjects. The objective of the analysis was to position RA and NBS subjects within a Caucasian/non-Caucasian ethnic contrast and to define an ethnically homogenous subgroup of Caucasians. Cohorts included in HapMap co-analysis were as follows:

Reference population: HapMap Caucasians (CEU)
Contrast populations: HapMap Chinese (CHB), Japanese (JPT), Tuscan (TSI) and Africans (YRI)
Unknown subjects: RA and NBS subjects

Appropriately matched case and control populations for the trait being studied were selected using probabilistically defined regions using the Mahalanobis distance around centres determined by k-means clusters for the reference population of interest. An appropriate α percentile was selected to retain the most subjects while minimising population heterogeneity. This approach was used to exclude subjects introducing bias due to non-disease related patient stratification and potentially confounding the case/control distinction.

SNP-to-gene mapping

A custom SNP-to-gene mapping pipeline was used, which implemented both FUMA²⁶ and variant to gene (V2G) from Open Targets Genetics²⁷ to gather evidence. First, SNPs in high LD (r² ≥ 0.6) with Taxonomy3 SNPs were extracted from the EUR population of 1000 Genomes Data Phase 3²⁸. These SNPs were then mapped to genes (“Tax3 genes”) by genomic distance (< 40 kb from gene boundary + 1 kb promoter), predicted functional consequences (variant effect prediction—VEP²⁹), significant cis e/pQTLs (Open Targets Genetics collection) in relevant tissues, and chromatin mapping (Open Targets Genetics collection). These measures were then weighted and used to calculate a prioritisation score TOPSIS from the MCDA R package³⁰ based on a theoretical best and worst SNP-to-gene mapping. Standard evidence weights were based on Open Targets Genetics V2G scoring (functional prediction: 0.35, QTL evidence: 0.35, genomic distance: 0.2, chromatin proximity: 0.1). A TOPSIS score threshold of 0.4 was implemented to represent high quality mappings, whereby the SNP occurs within a gene or has a significant eQTL. This pipeline is available on GitHub (https://github.com/c4x-discovery/tax3-ra-publication).

Assessment of novelty

To assess the novelty of the findings at the SNP and gene level, data curated by the Open Targets³¹ and Open Targets Genetics²⁷ platforms were used. All reported GWAS results related to EFO_0000685 (rheumatoid arthritis) in European populations were downloaded using the Open Targets Genetics API (November 2021) and lead SNPs were subjected to our SNP-to-gene mapping pipeline. Open Targets Indirect Disease Association Scores for Tax3 genes were investigated for novelty at the gene level (Open Targets download version 06.21).

Bioinformatic assessment of drug tractability

Target tractability details were downloaded from Open Targets³¹ (Open Targets download version 11.21) to identify potential modalities to drug genetic targets directly or via relevant biological interactions. Scores were allocated to terms to aid visualisation of druggable targets. (Approved Drug: 1, Advanced Clinical: 0.9, Phase 1 Clinical: 0.8, Structure with Ligand: 0.5, UniProt loc high conf: 0.5, Literature: 0.5, GO CC high conf: 0.45, High-Quality Ligand: 0.4, UniProt loc med conf: 0.4, UniProt Ubiquitination: 0.4, High-Quality Pocket: 0.3, UniProt SigP or TMHMM: 0.3, Database Ubiquitination: 0.3, Med-Quality Pocket: 0.2, GO CC med conf: 0.2, Half-life Data: 0.2, Druggable Family: 0.1, Human Protein Atlas loc: 0.1, Small Molecule Binder: 0.1).

Network and pathway enrichment

Network expansion was conducted on the protein-coding Tax3 genes from SNP-to-gene mapping, using experimental protein–protein interaction data from IntAct³² (downloaded from Open Targets). A minimum IntAct score threshold of 0.5 was used to identify medium–high confidence known protein–protein interactions between Tax3 genes, and interactors were included that interacted with at least 2 Tax3 genes. This provided a network of potential interactors that may be occurring in relevant cells and tissues. Disease-relevant gene expression data was obtained from Genevestigator³³. 511 mRNA-seq samples were retrieved from studies relevant to RA, including only human samples labelled as RA or healthy control. This relevant set included blood and synovial tissues and cell types. Gene expression-based clustering was performed on Tax3 genes + interactors using WGCNA³⁴ and clusters were analysed in the context of the protein–protein interaction network. Network and pathway enrichment was performed using the anRichment R package³⁴ using the built-in “GO” and “biosys” collections. Network visualisation and clustering was performed within R using igraph³⁵ and visNetwork³⁶.

Cell assays

Preliminary validation of putative targets was performed using tool compounds in a cytokine release assay with peripheral blood mononuclear cells (PBMCs). Briefly, human PBMCs from healthy donors (n = 3) were prepared from buffy coats and resuspended in RPMI-1640 containing 10% FBS, 1% penicillin/streptomycin, 2 mM L-glutamine and 50 μM 2-Mercaptoethanol. 1 × 10⁵ cells were added per well to a 96-well flat-bottomed plate, and for stimulated wells, anti-CD3 (final concentration 0.25 µg/mL) was added to cells immediately prior to seeding to the plate. Compounds (BML-210, CTLA-4 Fc Chimera, C4X_17358 and Merimepodib) were solubilised in DMSO with the final vehicle concentration in wells of 0.1%. Treatments were added in triplicate to wells in a final volume of 100 µL per well. Cells were cultured for 72 h at 37 °C, 5% CO₂. At the end of the culture period, cells were stained for viability using eBioscience Fixable Viability Dye eFluor 780, and supernatants were collected for subsequent assessment of cytokine production by multiplex using ThermoFisher custom ProcartaPlex kits.

Results

Univariate quality control

Subjects and patients: sex discrepancies were discovered in 19 subjects, 2 subjects had a missing value rate above 5%, and 5 subjects were found to be related: these subjects were removed from the analysis. A total of 26 subjects, predominantly RA patients, were removed due to univariate QC deviations.

Variables: The following steps were combined to establish the variables that were used in subsequent analyses (Table 1):

Table 1 Univariate QC procedures for SNPs (using the 90% genotype call probability threshold).

Full size table

We obtained the latest annotations file from Affymetrix pertaining to their discontinued 500 K chip. Table S1 in the Supplementary Information shows descriptive statistics for merged NBS & RA datasets and Supplementary Table S2 shows the chromosomal location of available variables.

From an initial list of 500,306 variables, the final QC’d data for HapMap co-analysis included 480,785 variables.

HapMap co-analysis

Chiamo generated genotype calls were used. A total of 480,785 SNPs shared by all datasets, non-monomorphic and having a total missing value rate lower than 5%, were analysed.

Figure 1a shows the global Caucasian/non-Caucasian contrast. Subjects were somewhat dispersed, but the majority overlapped with the CEU cluster.

Figure 1b shows the genetic ancestry boundaries for cases and controls defined using a Mahalanobis distance from the Caucasian cluster centre. A percentile value of α = 0.2 was selected to define subjects in both cohorts using their respective ellipses.

The RA population in the WTCCC dataset was heavily skewed towards females. To reduce the chance of spurious associations cases and controls were matched with respect to sex by balancing the cohort to achieve an odds ratio of 1.

The output from the HapMap co-analysis gives a final population of 2005 subjects (Table 2) Subsequent Taxonomy3 analyses were restricted to these subjects.

Table 2 Final case/control cohorts.

Full size table

Taxonomy3 analysis results

The primary output of Taxonomy3 analysis is shown in the biplot in Fig. 2. The cases and controls are well separated by the 1st principal component (PC1), and the case/control dummy variable, which is a categorical representation used to differentiate between case and control groups in the dataset. When we observe the alignment of this dummy variable in our PCA biplots, particularly against principal components, it indicates which components are most influential in distinguishing between cases and controls. In Fig. 2a that variable is almost horizontally aligned with the X-axis, showing that case/control separation is the biggest source of variation in the dataset and that it is captured almost exclusively in PC1 (Fig. 2a). This is confirmed by the Scree plot (Fig. 2b) which shows that most variation between case and controls is accounted for by PC1, with much smaller contributions from the other components. Therefore, variables projecting from the central origin along the case/control axis are relevant to discriminating cases and controls. This significant portion of explained variation underscores the effectiveness of Taxonomy3 in delineating between case and control groups. 2nd principal component (PC2) reveals a clustering pattern splitting both cases and controls into three distinct subgroups. Figure 2c and d show the loci driving the separation in PC1 (loading 1) and PC2 (loading 2), respectively. Signals found on PC2 are orthogonal to the case/control dummy variable and subsequently to the risk SNPs found on PC1, therefore have a very modest contribution to case–control separation. Figure 2e introduces a projected loadings plot with dimension 5. Projected loadings refer to the visualisation technique used in PCA to display the contribution of each variable to the principal components. Dimension 5 denotes the fifth principal component that has been extracted from the dataset. This component is part of the higher-dimensional space that PCA creates to capture the variance in the data pertinent to case/controls distinction. Here, dimension 5 captures most of the signal that distinguishes between the case and control groups and additional dimensions (dim6, dim7, etc.) do not contribute to this distinction (Supplementary Fig. 1).

Permutation analysis was used to determine statistical significance of these findings. A total of 10,000 permutations of case/control labels and re-analysis were carried out. Permutation analysis plays a critical role in distinguishing true genetic associations from false positives. By randomly shuffling the case and control labels of our dataset and recalculating the association statistics for each SNP, we can create a null distribution of associations that would be expected by chance alone. By comparing our original association results to this null distribution, we can more accurately assess the likelihood that the observed associations are due to genuine biological signals rather than random fluctuations. This rigorous statistical method helps to reinforce the validity of our findings and is an essential step in the fine-mapping process to pinpoint potential causal variants. Based on the inspection of the Scree plot (Fig. 2b), 5 PCA components were retained. The Šidák genome wide significance threshold was used (alpha = 5%, p value = 1.13e−07). The resulting Manhattan plot is shown in Fig. 2f. Statistically significant loci were spread across the genome with a large LD block located on chromosome 6. Many of the loci on chromosome 6 are located within the major histocompatibility complex (MHC) region as indicated by HLA imputed variables (on the Manhattan plot Fig. 3e plotted as chromosome HLA).

Interpretation of Taxonomy3 findings (SNP-to-gene)

Taxonomy3 genetic findings were mapped to genes using the parameters and pipeline described in Methods. The combined Taxonomy3 analysis revealed 173 SNPs that exceeded the Šidák significance threshold. As expected, and consistent with variants detected by conventional analysis, all of these SNPs except for one SNP (rs1800416) are located in non-coding regions of the genome. 109 significant SNPs mapped to the HLA region on chr6. The HLA region presents a challenge for SNP-to-gene mapping, as it is highly polymorphic, has a very high gene density and a low recombination rate resulting in a strong LD structure. Using the SNP-to-gene thresholds detailed in methods, a list of 233 genes was obtained (190 protein-coding genes).

Novelty of Taxonomy3 findings

The novelty of Tax3 SNPs were assessed in two ways, by colocalising Taxonomy3 genetic signals with published GWAS results, and using gene-level disease association scores from Open Targets³¹ to explore known associations (genetic and others) between Tax3 genes and RA and related autoimmune disorders.

First, Tax3 SNPs were compared to significant GWAS results by matching exact rsIDs or matching rsIDs to SNPs in high LD (r² ≥ 0.6 in 1 KG Ph3 EUR). Using this method, 2 loci within the MHC region were matched (rs6457620/HLA-DQB1 and rs9268557/HLA-DRA) and 1 locus on chr1 (rs6679677/PTPN22). To expand this genetic mapping, significant GWAS SNPs were mapped to genes using the same pipeline as Tax3 SNPs. This identified 6 additional loci outside the MHC region mapping to the same top genes, RTN4IP1 and SUPT3H on chr6, ACOXL on chr2, TTC34 on chr1, LYZL1 on chr10 and AMOTL1 on chr11.

For gene-level disease association, 3 related autoimmune disorders were selected (psoriasis, inflammatory bowel disease (IBD) and systemic lupus erythematosus) based on their potential overlap of disease aetiology with RA and the understanding of heritable elements for these diseases. The scores are hierarchically cumulative, so the autoimmune disease scores represent an accumulated score for all autoimmune disorders according to Open Targets³¹. Figure 3a displays a heatmap of these scores, excluding 86 Tax3 genes that have a value of 0 for all selected scores. SNP-to-gene evidence (TOPSIS) is also displayed, and MHC region genes are labelled. Rows and columns are hierarchically clustered using hclust from the fastcluster³⁷ R package. This analysis revealed 3 additional weak known genetic associations with RA (chr2 rs4338920 LRP1B, chr13 rs17086772 SLC46A3 and chr4 rs17669915 LEF1). These weak associations are from GWAS for non-European populations, case-case comparative GWAS and similar disease GWAS. Other interesting genes without known genetic association with RA include 2 genes targeted by drugs in RA clinical trials (PDE5A: Dipyridamole, IMPDH1: Mizoribine), 1 additional gene targeted by drugs in clinical trials for other autoimmune diseases (ADRA1: Isoxsuprine for multiple sclerosis), 6 genes with known genetic association to autoimmune diseases but not RA (C1QTNF6, BCL2L11, KAZN, ETV3, MACROD2, PROK2, KCNE4), and 3 genes from pathways linked to IBD (PROK2, CYTH4, RAC2).

Taxonomy3 findings have been shown to support many existing genetic associations with RA, reveal unknown genetic associations with genes that are known to be associated with RA or related autoimmune disorders, and provide 150 novel, genetically associated RA targets for further validation and exploration.

In-silico exploration of Taxonomy3 findings

Bioinformatics analysis was performed to interpret and prioritise the Tax3 genes. The 190 protein-coding genes underwent enrichment analysis as described in Methods. The top significantly enriched terms included Antigen processing and presentation (KEGG) (FDR p = 3.96e−03), MHC class II antigen presentation (REACTOME) (FDR p = 8.27e−03), and many KEGG terms for autoimmune and infectious diseases, including Staphylococcus aureus infection (FDR p = 3.96e−03), Autoimmune thyroid disease (FDR p = 8.27e−03), Systemic lupus erythematosus (FDR p = 1.69e−02), Asthma (FDR p = 8.27e−03), and Rheumatoid arthritis (p = 1.69e−02). These results were highly dominated with HLA genes, so a separate enrichment was performed excluding genes from the MHC region. The top terms for this analysis were transferase activity (p = 1.53e−03) and bone morphogenic protein (BMP) signalling pathway (p = 1.53e−03), the latter of which has been linked to autoimmune disease³⁸.

Genetic association can only elucidate part of the mechanistic network for complex diseases, and our inclusive SNP-to-gene mapping pipeline will include false positives, so in order to contextualise our gene list and encourage dropout of false positives, a network expansion and clustering approach was undertaken. Experimentally validated protein–protein interactions (PPI) from IntAct³² were obtained between genes within the 190 protein-coding Tax3 genes, and also between interactor proteins and at least 2 Tax3 genes. The resulting PPI network included 114 Tax3 genes and 359 interactors. Network visualisation and clustering revealed some peripheral clusters, some clusters formed of pairs of Tax3 nodes with many common interactors, and a large central subnetwork of interconnected nodes (Fig. 3b). These interactions are mainly from in-vitro studies, so represent potential protein–protein interactions that could occur in a physiological context. To interpret this network and identify relevant clusters and subnetworks, 2 complementary approaches were undertaken, PPI network community identification using network clustering methods within igraph³⁵, and network coexpression clustering using WGCNA³⁴.

PPI subnetworks were identified using the cluster walktrap method from the igraph R package³⁵. This identified 20 clusters with at least 5 members. For coexpression clustering, relevant gene expression data was downloaded from Genevestigator³³. 511 patient samples from RA studies were selected, representing multiple disease-relevant tissue types, and 233 Tax3 genes + 349 interactors underwent network coexpression clustering using WGCNA³⁴. This identified 14 clusters, 13 of which had distinct eigengene expression profiles in RA-relevant tissues. Correlation analysis of cluster eigengenes revealed 3 overall cluster groups (Fig. 3c) representing targets that were 1: widely expressed but highest in synovial membrane and monocytes, and low in fibroblast synoviocytes, 2: synovial membrane-specific, 3: highest expression in T-cells and monocytes, showing that Tax3 genes are enriched for genes expressed in highly disease-relevant cell types. 5 of these coexpression clusters had significantly enriched pathway terms, using a combined collection of GO, KEGG and REACTOME terms (Fig. 3d). The tan cluster was highly enriched for MHC class II genes, and many diseases for which MHC-II factors are associated. The purple cluster was enriched for inflammatory response genes (FDR p = 4.53e−02). This cluster did not feature any direct protein–protein interactions, but expanding this cluster for direct interactors displayed multiple purple interactor genes that may provide additional druggable targets to perturb the RA-specific inflammatory network identified through Tax3 genetic associations (Fig. 3e). Genes mapped from the MHC region represent distinct subnetworks of this analysis, as illustrated in Supplementary Fig. S2). This analysis provides potential drug targets that could be taken forward into a variety of assays to explore and validate their role in disease aetiology.

Small-molecule druggability (tractability) assessment was also conducted on network nodes using data from the Open Targets platform³¹ as described in the Methods. 3 additional interactor genes were identified that are reported to be targeted by drugs in clinical trials for autoimmune diseases (FGFR3: Masitinib for RA, KCNA3: Dalfampridine for multiple sclerosis, PSMB5: Bortezomib and Ixazomib for autoimmune thrombocytopenic purpura) In order to progress the analysis and validate potential therapeutic targets, genes with available tool compounds were selected that may impact and disrupt the disease network.

In vitro validation of Taxonomy3 findings

In an attempt to validate the novel genetic associations from Taxonomy3 analysis in a preliminary cellular context, we identified those putative RA targets which were entirely novel, and those established RA targets without a previous genetic link that have been previously found to be expressed in leukocytes. Publicly available tool compounds were available for MEF2B and IMPDH1, and an internal compound (C4X_17358) was synthesized as an inhibitor of the potassium channel Kv1.3 (IC₅₀ 2 nM determined via SyncroPatch, data not shown), of which the Tax3 gene KCNE4 is a functional co-factor (Table 3). These compounds were tested in the PBMC validation assay as described in the Methods. The study was designed to investigate potential effects of the tool compounds on immune cells within the PBMC population and so, anti-CD3 was used as the stimulus such that the co-stimulatory signal would be provided by antigen presenting cells within the culture rather than bypassing this by addition of anti-CD28. Anti-CD3 stimulation resulted in a robust increase in T-cell proliferation (Supplementary Fig. S3). CTLA-4 Fc Chimera was included in the study as a positive control^39,40. Whilst it is not certain that the putative genetic variants modulate the mapped genes in PBMCs specifically, using tool compounds to investigate potential modulation of inflammatory pathways in these cells would provide justification to investigate this further.

Table 3 Taxonomy3 targets to be tested in PBMC validation assay.

Full size table

Following 72 h of treatment, no compound was found to reduce cell viability below 75% (Fig. 4). Pro-inflammatory cytokine measurements were carried out for (interleukins) IL-17a, IL-6, interferon gamma (IFNγ) and tumour necrosis factor alpha (TNFα) (Fig. 4a–d) in cells stimulated with anti-CD3. BML-210 treatment resulted in a trend to further increase inflammatory cytokine release with increasing compound concentration (Fig. 4a). Comparatively, treatment with C4X_17358, merimepodib or CTLA-4 Fc Chimera resulted in a trend toward a decrease in pro-inflammatory cytokines with increasing compound concentration (Fig. 4b–d).

Discussion

To date, divergences have been used in assessing diagnostic kit performance⁴⁵, and more recently they have been used in Bayesian Analysis of Gene Essentiality (BAGEL) methodology to analyse pooled CRISPR studies⁴⁶. Taxonomy3 is the first application of which we are aware that applies individualised LBFs to human genetic data linked to a binary outcome to be used in drug discovery. Replacing individual genotypes with the corresponding LBF and analysing the LBF matrix using PCA generates a number of outputs:

Variables of interest relevant for case/control discrimination;
Heterogeneity in the case/control populations, allowing for sub-groups within the populations to be identified;
Variables of interest relevant for sub-group separation.

As this is a different approach to exploring genetic data, it is not surprising that the outputs are not identical to the output of the conventional GWAS analysis. In the original analysis of this RA case/control dataset¹¹, two peaks were seen, one on chromosome 6 in the MHC region, and one on chromosome 1. The Manhattan plot from our Taxonomy3 analysis (Fig. 2a,e) shows that we have replicated these two findings and also greatly increased the number of significant associations. Unlike traditional GWAS, which primarily seeks to estimate parameters (like odds ratios) associated with SNP-trait linkages, Taxonomy3 places emphasis on deriving conclusions based on the data and available evidence. This shift in focus allows Taxonomy3 to provide insights into the distinctions between groups and the structure of individual observations within these groups. Comparison of our findings with the data currently available on Open Targets shows that while some of our additional findings have been observed in other, subsequent larger studies or meta-analyses (which provides additional validation for our methodology), there are other findings that provide truly novel insights.

Bioinformatic analysis of the Tax3 genes showed significant clustering in immune-related pathways (Fig. 3b), providing good validation for the newly identified genes. Likewise, co-expression clustering analysis showed that changes of expression of the genes were enriched in T-cells, synovial tissues and monocytes—highly disease-relevant cell types (Fig. 3c).

Suitable tool compounds were only available for a few of the Tax3 genes—Kv1.3 (KCNE4), IMPDH1 and MEF2B: two of these are of particular note:

KCNE4: Association with this gene was identified through Taxonomy3 analysis. This gene has not previously been associated with RA, and codes for an accessory protein modulating the activity of Kv1.3^47,48. This K⁺ channel is important for the functioning of T_EM cells, which have a key role in maintaining the autoimmune drive in RA^49,50,51. For this reason, inhibiting this channel has been proposed as a possible target for various inflammatory diseases^52,53,54 including RA. We consider that the addition of genetic support for the pathway from our Taxonomy3 analysis significantly increases the likelihood of successful clinical development for inhibitors of Kv1.3.

MEF2B: This gene mapped from a SNP identified in Taxonomy3 analysis. The gene has not been associated with RA in published GWAS, and there is limited data to implicate the gene in RA. Examination of the tool compound BML-210, that blocks the interaction of MEF2 with histone deacetylase (HDAC), in in vitro models showed consistent effects of inhibition of MEF2B on immune cell function. This result underlines the ability of Taxonomy3 analysis to generate novel genetic insights, adding significantly to our knowledge of disease aetiology, and flagging novel drug targets with genetic support.

Association with NR4A3 was also identified in Taxonomy3 analysis passing the Benjamini–Hochberg genome wide significance threshold (false discovery rate correction). This nuclear receptor is known to impact expression of FoxP3, the key gene required for regulatory T-cell (Treg) formation^55,56,57. A number of publications have demonstrated a dysfunction of Treg cells in RA patients^58,59,60, and therapeutic modulation of Tregs for RA has been proposed⁶¹. Unfortunately, no suitable tool compounds were available to probe the functioning of this gene in the PBMC model.

There are a number of limitations to this study. Firstly, we have not been able to analyse other datasets (potentially from other genetic ancestries) to examine the translation of our findings to other populations. Secondly, as with all genetic studies, the inherent uncertainties of SNP-to-gene mapping means that the genes described here may not be the true genes involved in the genetic susceptibility detected by Taxonomy3. However, we have maximised the chance of including the causal gene by using a combination of positional and functional mapping in our SNP-to-gene process and downstream triaging to identify the genes with a high probability of impacting disease aetiology. Furthermore, as non-coding variants are thought to regulate genes in a cell type specific manner, it is possible that some or all of the subset of Tax3 genes we examined in the PMBC assay (MEF2B, Kv1.3 (KCNE4), IMPDH1) are in fact regulated by the variants detected, in a different cell type (e.g. fibroblast synoviocytes). Once confidence in a SNP-gene mapping is achieved, along with demonstrating clear impact of the target in a disease relevant context (conduct of additional studies extending beyond the remit of this publication) validation of the genetics will need to be confirmed e.g. show differential regulation of the gene in question depending on which allele is present. As the genotypes for the PBMC donors were not available this could not be examined alongside the tool compound examination. Extension of the studies to investigate the role of the genetic targets in PBMC from donors with RA, as well as identification of the cell types involved in the response for the individual targets, would also provide a greater level of support for the relevance of the targets in the disease setting.

In this paper, we describe a unique method of analysing genetic datasets. The results of our Taxonomy3 analyses present an opportunity for initiation of drug discovery programmes for treating RA, the identified novel targets (e.g. MEF2B) benefitting from the increased chances of success due to their genetic association with the disease, subject to further experimental validation. The method can be applied to all datasets with well-defined dichotomous cohorts e.g. case/control, mild/severe, responders/non-responders, and has the added benefit that other data types (e.g. mRNA expression, clinical data, longitudinal phenotypes) can all be converted into LBFs and co-analysed with genetic data²¹. The method can identify novel genes or pathways of interest for a disease (or other phenotype of interest) leading to innovative drug discovery programmes with the added confidence of genetic support.

Data availability

The data that support the findings of this study are available from Wellcome Trust Case Control Consortium but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for that project was provided by the Wellcome Trust under award 076113. Accession numbers for data used in this study: EGAD00000000007 - WTCCC1 project Rheumatoid arthritis (RA) samples, EGAD00010000250 - NBS control samples.

Code availability

The mathematical foundation of Taxonomy3 and its algorithms have been published previously^19,20,21,22. The code used to run permutation analysis and bioinformatics analysis makes use of open-source software packages and is available at https://github.com/c4x-discovery/tax3-ra-publication. All analyses were performed on a cluster of Linux machines (Amazon Web Services).

References

Nelson, M. R. et al. The support of human genetic evidence for approved drug indications. Nat. Genet. 47, 856–860 (2015).
Article CAS PubMed Google Scholar
King, E. A., Davis, J. W. & Degner, J. F. Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval. PLoS Genet. 15, e1008489 (2019).
Article PubMed PubMed Central Google Scholar
Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Prim. 1, 1–21 (2021).
Google Scholar
Cano-Gamez, E. & Trynka, G. From GWAS to function: Using functional genomics to identify the mechanisms underlying complex diseases. Front. Genet. 11, 505357 (2020).
Article Google Scholar
Manolio, T. A. Genomewide association studies and assessment of the risk of disease. N. Engl. J. Med. 363, 166–176 (2010).
Article CAS PubMed Google Scholar
Mathieson, I. The omnigenic model and polygenic prediction of complex traits. Am. J. Hum. Genet. 108, 1558–1563 (2021).
Article CAS PubMed PubMed Central Google Scholar
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: From polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Article CAS PubMed PubMed Central Google Scholar
Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022-1034.e6 (2019).
Article CAS PubMed PubMed Central Google Scholar
MacGregor, A. J. et al. Characterizing the quantitative genetic contribution to rheumatoid arthritis using data from twins. Arthritis Rheum 43, 30–37 (2000).
Article CAS PubMed Google Scholar
Stahl, E. A. et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat Genet 42, 508–514 (2010).
Article CAS PubMed PubMed Central Google Scholar
Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature 447, 661–678 (2007).
Article Google Scholar
The RACI consortium et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
Article Google Scholar
Pirmohamed, M. et al. Investigation into the multidimensional genetic basis of drug-induced Stevens-Johnson syndrome and toxic epidermal necrolysis. Res. Rep. 8, 1661 (2007).
CAS Google Scholar
Bowman, C. & Delrieu, O. Immunogenetics of drug-induced skin blistering disorders. Part I: Perspective. Pharmacogenomics 10, 601–621 (2009).
Article CAS PubMed Google Scholar
Bowman, C. & Delrieu, O. Immunogenetics of drug-induced skin blistering disorders. Part II: Synthesis. Pharmacogenomics 10, 779–816 (2009).
Article CAS PubMed Google Scholar
Bowman, C. & Delrieu, O. Correlation laplacians, haplotype networks and residual pharmacogenetics.
Bowman, C. Genetic ‘skylines’ in the aetiology of graded pharmaceutical phenotypes. Geom.-Driven Stat. Cut. Edge Appl. (2017).
Bowman, C., Delrieu, O. & Roger, J. Filtering pharmacogenetic signals.
Kullback, S. & Leibler, R. A. On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951).
Article MathSciNet Google Scholar
Delrieu, O. & Bowman, C. Visualizing gene determinants of disease in drug discovery. Pharmacogenomics 7, 311–329 (2006).
Article CAS PubMed Google Scholar
Delrieu, O. & Bowman, C. E. On using the correlations of divergences. Syst. Biol. Stat. Bioinf. 27, 35 (2007).
Google Scholar
Delrieu, O. & Bowman, C. E. Visualisation of gene and pathway determinants of disease. Quant. Biol. Shape Anal. Wavelets 21–24 (2005).
Bowman, C. E. Individualised divergences, in Geometry Driven Statistics, 337–355 (Wiley, 2015). https://doi.org/10.1002/9781118866641.ch17
Biernacki, C., Celeux, G., Govaert, G. & Langrognet, F. Model-based cluster and discriminant analysis with the MIXMOD software. Comput. Stat. Data Anal. 51, 587–600 (2006).
Article MathSciNet Google Scholar
Zheng, X. et al. HIBAG—HLA genotype imputation with attribute bagging. Pharmacogenomics J. 14, 192–200 (2014).
Article CAS PubMed Google Scholar
Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Article ADS PubMed PubMed Central Google Scholar
Ghoussaini, M. et al. Open targets genetics: Systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Res. 49, D1311–D1320 (2021).
Article CAS PubMed Google Scholar
The 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Article Google Scholar
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Article PubMed PubMed Central Google Scholar
Bigaret, S., Hodgett, R. E., Meyer, P., Mironova, T. & Olteanu, A.-L. Supporting the multi-criteria decision aiding process: R and the MCDA package. EURO J. Decis. Process. 5, 169–194 (2017).
Article Google Scholar
Ochoa, D. et al. Open targets platform: Supporting systematic drug–target identification and prioritisation. Nucleic Acids Res. 49, D1302–D1310 (2021).
Article CAS PubMed Google Scholar
del Toro, N. et al. The IntAct database: Efficient access to fine-grained molecular interaction data. Nucleic Acids Res. 50, D648–D653 (2022).
Article PubMed Google Scholar
Hruz, T. et al. Genevestigator V3: A reference expression database for the meta-analysis of transcriptomes. Adv. Bioinform. 2008, 1–5 (2008).
Article Google Scholar
Langfelder, P. & Horvath, S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008).
Article Google Scholar
Csardi, G. & Nepusz, T. The igraph software package for complex network research. 10 (2006).
Almende B. V. et al. visNetwork: Network visualization using ‘vis.js’ library version 2.1.2 from CRAN. https://rdrr.io/cran/visNetwork/ (2022).
Müllner, D. fastcluster: Fast hierarchical, agglomerative clustering routines for R and python. J. Stat. Softw. 53, 1–18 (2013).
Article Google Scholar
Eixarch, H. et al. Inhibition of the BMP signaling pathway ameliorated established clinical symptoms of experimental autoimmune encephalomyelitis. Neurotherapeutics 17, 1988–2003 (2020).
Article CAS PubMed PubMed Central Google Scholar
Lei, C. et al. Association of the CTLA-4 gene with rheumatoid arthritis in Chinese Han population. Eur. J. Hum. Genet. 13, 823–828 (2005).
Article PubMed Google Scholar
Cutolo, M., Sulli, A., Paolino, S. & Pizzorni, C. CTLA-4 blockade in the treatment of rheumatoid arthritis: An update. Expert Rev. Clin. Immunol. 12, 417–425 (2016).
Article CAS PubMed Google Scholar
Jayathilaka, N. et al. Inhibition of the function of class IIa HDACs by blocking their interaction with MEF2. Nucleic Acids Res. 40, 5378–5388 (2012).
Article CAS PubMed PubMed Central Google Scholar
Serrano-Albarrás, A. et al. Fighting rheumatoid arthritis: Kv13 as a therapeutic target. Biochem. Pharmacol. 165, 214–220 (2019).
Article PubMed Google Scholar
Tong, X. et al. Merimepodib, an IMPDH inhibitor, suppresses replication of Zika virus and other emerging viral pathogens. Antiviral Res. 149, 34–40 (2018).
Article CAS PubMed Google Scholar
Ratcliffe, A. J. Inosine 5’-monophosphate dehydrogenase inhibitors for the treatment of autoimmune diseases. Curr. Opin. Drug Discov. Dev. 9, 595–605 (2006).
CAS Google Scholar
Weissler, A. M. & Bailey, K. R. A critique on contemporary reporting of likelihood ratios in test power analysis. Mayo Clin. Proc. 79, 1317–1318 (2004).
Article PubMed Google Scholar
Hart, T., Brown, K. R., Sircoulomb, F., Rottapel, R. & Moffat, J. Measuring error rates in genomic perturbation screens: Gold standards for human functional genomics. Mol. Syst. Biol. 10, 733 (2014).
Article PubMed PubMed Central Google Scholar
Solé, L. et al. The C-terminal domain of Kv1.3 regulates functional interactions with the KCNE4 subunit. J. Cell Sci. 129, 4265–4277 (2016).
Article PubMed PubMed Central Google Scholar
Solé, L. et al. KCNE4 suppresses Kv1.3 currents by modulating trafficking, surface expression and channel gating. J. Cell Sci. 122, 3738–3748 (2009).
Article PubMed Google Scholar
Brennan, F. M. et al. Resting CD4+effector memory T cells are precursors of bystander-activated effectors: A surrogate model of rheumatoid arthritis synovial T-cell function. Arthritis Res. Ther. 10, R36 (2008).
Article PubMed PubMed Central Google Scholar
Matsuki, F. et al. CD45RA−Foxp3low non-regulatory T cells in the CCR7−CD45RA−CD27+CD28+ effector memory subset are increased in synovial fluid from patients with rheumatoid arthritis. Cell. Immunol. 290, 96–101 (2014).
Article CAS PubMed Google Scholar
Toh, M.-L. & Miossec, P. The role of T cells in rheumatoid arthritis: New subsets and new targets. Curr. Opin. Rheumatol. 19, 284–288 (2007).
Article PubMed Google Scholar
Azam, P., Sankaranarayanan, A., Homerick, D., Griffey, S. & Wulff, H. Targeting effector memory T cells with the small molecule Kv1.3 Blocker PAP-1 suppresses allergic contact dermatitis. J. Investig. Dermatol. 127, 1419–1429 (2007).
Article CAS PubMed Google Scholar
Beeton, C. et al. Targeting effector memory T cells with a selective peptide inhibitor of Kv1.3 channels for therapy of autoimmune diseases. Mol. Pharmacol. 67, 1369–1381 (2005).
Article CAS PubMed Google Scholar
Wulff, H. et al. The voltage-gated Kv1.3 K(+) channel in effector memory T cells as new target for MS. J. Clin. Invest. 111, 1703–1713 (2003).
Article CAS PubMed PubMed Central Google Scholar
Bandukwala, H. S. & Rao, A. ‘Nurr’ishing treg cells: Nr4a transcription factors control Foxp3 expression. Nat. Immunol. 14, 201–203 (2013).
Article CAS PubMed PubMed Central Google Scholar
Bending, D. & Ono, M. From stability to dynamics: Understanding molecular mechanisms of regulatory T cells through Foxp3 transcriptional dynamics. Clin. Exp. Immunol. 197, 14–23 (2019).
Article CAS PubMed Google Scholar
Won, H. Y. & Hwang, E. S. Transcriptional modulation of regulatory T cell development by novel regulators NR4As. Arch. Pharm. Res. 39, 1530–1536 (2016).
Article CAS PubMed Google Scholar
Chavele, K.-M. & Ehrenstein, M. R. Regulatory T-cells in systemic lupus erythematosus and rheumatoid arthritis. FEBS Lett. 585, 3603–3610 (2011).
Article CAS PubMed Google Scholar
Ehrenstein, M. R. et al. Compromised function of regulatory T cells in rheumatoid arthritis and reversal by anti-TNFα therapy. J. Exp. Med. 200, 277–285 (2004).
Article CAS PubMed PubMed Central Google Scholar
Leipe, J., Skapenko, A., Lipsky, P. E. & Schulze-Koops, H. Regulatory T cells in rheumatoid arthritis. Arthritis Res. Therapy 7, 93 (2005).
Article CAS Google Scholar
Esensten, J. H., Wofsy, D. & Bluestone, J. A. Regulatory T cells as therapeutic targets in rheumatoid arthritis. Nat. Rev. Rheumatol. 5, 560–565 (2009).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank Olivier Delrieu and Clive Bowman for designing the concept of the analysis method, stimulating discussions and for creating the Taxonomy3 analysis pipeline.

Author information

Authors and Affiliations

C4X Discovery Ltd, Manchester One, 53 Portland Street, Manchester, M1 3LD, UK
Justyna Kozlowska, Neil Humphryes-Kirilov, Anastasia Pavlovets, Martin Connolly, Zhana Kuncheva, Jonathan Horner, Ana Sousa Manso, Clare Murray, J. Craig Fox & Alun McCarthy

Authors

Justyna Kozlowska
View author publications
You can also search for this author in PubMed Google Scholar
Neil Humphryes-Kirilov
View author publications
You can also search for this author in PubMed Google Scholar
Anastasia Pavlovets
View author publications
You can also search for this author in PubMed Google Scholar
Martin Connolly
View author publications
You can also search for this author in PubMed Google Scholar
Zhana Kuncheva
View author publications
You can also search for this author in PubMed Google Scholar
Jonathan Horner
View author publications
You can also search for this author in PubMed Google Scholar
Ana Sousa Manso
View author publications
You can also search for this author in PubMed Google Scholar
Clare Murray
View author publications
You can also search for this author in PubMed Google Scholar
J. Craig Fox
View author publications
You can also search for this author in PubMed Google Scholar
Alun McCarthy
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Design of the analysis, J.K.; interpretation of the results, J.K. A.M., M.C., N.HK., J. C.F.; writing of the manuscript, J.K., A.M., N.HK., M.C., J.C.F.; figure and table preparation, J.K, N.HK., M.C.; bioinformatics analysis, N.HK.; data analysis, A.P., N.HK., J.K. M.C.; collection and/or assembly of data, J.K.; Biology project management, A.S.M.; final approval of the manuscript, C.M; all authors revised the manuscript.

Corresponding author

Correspondence to Justyna Kozlowska.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Kozlowska, J., Humphryes-Kirilov, N., Pavlovets, A. et al. Unveiling new genetic insights in rheumatoid arthritis for drug discovery through Taxonomy3 analysis. Sci Rep 14, 14153 (2024). https://doi.org/10.1038/s41598-024-64970-0

Download citation

Received: 11 January 2023
Accepted: 14 June 2024
Published: 19 June 2024
DOI: https://doi.org/10.1038/s41598-024-64970-0
Springer Nature Limited

Unveiling new genetic insights in rheumatoid arthritis for drug discovery through Taxonomy3 analysis

Abstract

Similar content being viewed by others

Genomics-driven drug discovery based on disease-susceptibility genes

Identification of potential drug targets for rheumatoid arthritis from genetic insights: a Mendelian randomization study

Network and pathway expansion of genetic disease associations identifies successful drug targets

Introduction

Methods

Overview of Taxonomy3 method

Taxonomy3 analysis data management

Subjects and phenotype data

Genotypes

Univariate quality control

HLA imputation

Genetic ancestry

SNP-to-gene mapping

Assessment of novelty

Bioinformatic assessment of drug tractability

Network and pathway enrichment

Cell assays

Results

Univariate quality control

HapMap co-analysis

Taxonomy3 analysis results

Interpretation of Taxonomy3 findings (SNP-to-gene)

Novelty of Taxonomy3 findings

In-silico exploration of Taxonomy3 findings

In vitro validation of Taxonomy3 findings

Discussion

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Information.

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation