Introduction

Malaria remains a major cause of morbidity and mortality in endemic regions, especially sub-Saharan Africa1. The most recent World Health Organization (WHO) World Malaria Report estimated about 249 million malaria cases and 608,000 deaths, with 95% of the cases and deaths occurring in sub-Saharan Africa2. As a result of the growing threat of parasite and vector resistance to antimalarial drugs and insecticides3,4,5,6,7,8,9,10,11,12,13,14,15, attention is now focused, more than ever, on other measures that can be deployed for malaria control. Unfortunately, vaccine development has not been spared from parasite evasive mechanisms. Antigenic variation in different Plasmodium falciparum antigens16,17,18 is a major challenge that impedes the efficacy of the available malaria vaccines. The two approved malaria vaccines provide only moderate (~ 30–50%) protection against malaria with multiple boosts19,20,21, which calls for a deeper understanding of the parasite as well as of appropriate host responses.

Plasmodium falciparum has a unique, complex life cycle that criss-crosses the human host and the invertebrate mosquito vector22. Advancement of this life cycle in the different hosts involves dynamic and tight regulation of the expression of genes involved in the human immune response and of genes in the parasite genome23,24,25,26,27,28,29,30,31,32. Hence, understanding the functional interaction between humans and the parasite is pertinent to dissecting the host–pathogen interactome. The genomic architecture of the human host and the parasite provides the platform that shapes host–pathogen interaction, and this communication is a selective force on the host immune response and on parasite evasive tactics33,34,35,36. Understanding this interaction therefore requires an in-depth analysis of the host–pathogen transcriptomic profile. Many studies have been designed to understand the parasite transcriptome/genome31,37; however, studies that decipher the transcriptome of the human host under antigenic assault are limited. Current findings have shown the role of signaling and pathogen recognition receptors in the immune response to disease, but the role of regulatory elements, especially in Pf infection, is currently unknown. A recent report concluded that epigenetically reprogrammed monocytes potentially drive disease outcome through IL-10, CD163 and CD20638, but the regulatory elements involved in the over- or under-expression of these genes are yet to be fully understood. This presents an opportunity to explore regulatory elements as potential tools for disease surveillance or diagnosis, to the benefit of disease prevention and control.

At present, there are increasing reports of the involvement of regulatory players in modulating the expression of genes at the transcriptional, post-transcriptional, translational and post-translational levels39,40,41 in different species and disease states42. Such regulatory players include long non-coding RNAs (lncRNAs), microRNAs (miRNAs) and transcription factors (TFs). LncRNAs are a class of non-coding transcripts, defined by a length threshold of > 200 nucleotides and found within or between coding genes43. They have been implicated in regulating different pathological and biological functions at the transcriptional and epigenetic levels, thereby influencing the host immune response42. Characterization of lncRNAs and their target genes in different species is important in delineating their functions in interspecies “crosstalk”. MiRNAs, on the other hand, are shorter than lncRNAs and attach to the 5′ or 3′ untranslated region or the coding region of a messenger RNA to modify gene expression, and ultimately the protein products translated from the targeted genes43. Early diagnosis of infection and quick intervention are key strategies in malaria control. With an array of miRNAs and lncRNAs serving as biomarkers in various conditions such as cardiovascular diseases, cancer, diabetes and even infectious diseases44,45,46, there is an increased opportunity to add these regulatory elements (REs) to the arsenal for malaria control. Many studies on malaria have attempted to correlate differential expression of these REs with different clinical manifestations in order to identify host biomarkers that can serve diagnostic or therapeutic purposes, but none has been found yet.

Research on the role of host lncRNAs during Plasmodium infection is limited. Some studies have evaluated the role of P. falciparum lncRNA and how it regulates var gene activation in the malaria parasite47. On the host side, one study demonstrated the upregulation of 132 and downregulation of 159 lncRNAs in BALB/c mice infected with P. yoelii, among which four (ENSMUSG00000111521.1, XLOC_038009, XLOC_058629 and XLOC_065676) were found to be associated with malaria severity by interacting with dendritic/regulatory T cells48. Three human miRNAs (miR-451, miR-223 and let-7i) were shown to negatively regulate malaria infection and, at the same time, to be highly expressed in both sickle cell carriers (HbAS) and those with sickle cell disease (HbSS)49. miR-451 was found to be highly expressed in both parasitized and unparasitized red blood cells (RBCs)50. However, a different study found miR-451, miR-106, miR-16, miR-92, miR-7b, miR-144, miR-142, let-7f, let-7a, and miR-91 downregulated, while miR-223 and miR-19b were upregulated, in the RBCs of individuals infected with malaria51,52. No study, however, has examined a combination of regulatory elements such as lncRNAs, miRNAs and TFs together.

With an abundance of binding sites for miRNA and messenger RNA (mRNA), lncRNAs can act as competing endogenous RNAs (ceRNAs) and are significant players in post-transcriptional gene expression53,54, while TFs are proteins capable of altering or activating gene-expression levels55,56,57,58. Multiple miRNA databases such as miRWalk59, miRNet60, and TargetScan61 compute potential miRNA-mRNA interactions, while the role of individual miRNAs can be inferred through functional analysis with Gene Ontology (GO)62. Similarly, lncRNA resources and binding-prediction tools, including NONCODE63, lncRNA2Target64 and lncTar65, have been useful in guiding bench experiments. Current algorithms, which rely on base-pair complementarity, evolutionary conservation, and thermodynamic stability of binding regions, have been shown to be useful in predicting miRNA, lncRNA and TF binding sites on target genes53,54,66.

Therefore, we set out to perform transcriptomic profiling to identify differentially expressed genes in individuals with different malaria conditions, characterize the associated lncRNAs and miRNAs, and infer their interactions with immune genes.

Results

Differentially expressed genes in various malaria conditions

From the 107 RNAseq datasets retrieved from the SRA database and analyzed, a total of 11,221 genes were identified, of which an estimated 5534 were differentially expressed among individuals with severe, symptomatic or asymptomatic malaria and uninfected individuals. A distinct pattern of expression was observed among the different clinical definitions of disease: individuals with severe and symptomatic malaria from both Gambia and Burkina Faso had similar expression patterns, with the same genes upregulated (e.g. UBB, TENTSC, HSP90AA1, FCGR3B, CYBB, SOD2) and downregulated (e.g. DCAF12, IFITM3, R3HDM4, CNN2, CFAP65, ATP283). The reverse pattern was observed among individuals with asymptomatic malaria infection and uninfected controls. Of the 5534 DEGs, 3928 genes were upregulated in individuals with severe and symptomatic malaria and downregulated in those with asymptomatic malaria or no infection. Conversely, 1604 genes were upregulated in the asymptomatic or uninfected individuals and downregulated in the severe/symptomatic groups (Fig. 1). One hundred and thirty genes showed no change in their level of expression (Fig. 2). The network interactome of the identified genes with various miRNA targets revealed a moderate degree of interaction (a maximum of 4 genes connecting with one miRNA): the KMT2B, IDF2, TTYH3 and NTRK2 genes interacted with hsa-mir-331, and the EGFR, MBD6, ZNF385A and TTYH3 genes interacted with hsa-mir-24 (Fig. 3).

Figure 1

Venn diagram (http://bioinformatics.psb.ugent.be/webtools/Venn) of DEGs in different malaria clinical manifestations. NB: Genes that were upregulated in individuals with severe and symptomatic malaria (3928) were downregulated in those with asymptomatic malaria or no infection. Conversely, genes that were upregulated in individuals with asymptomatic malaria or no infection (1604) were downregulated in severe/symptomatic malaria conditions, while 130 genes showed no change in their level of expression.

Figure 2

Differentially expressed genes in individuals with various Plasmodium falciparum clinical manifestations. Individuals with symptomatic and severe malaria showed similar patterns of up/downregulated genes, and this pattern was reversed in those with asymptomatic or no malaria infection. NB: GSev: Gambia severe; BFSymp: Burkina Faso symptomatic; Tu: Tanzania uninfected; Ti: Tanzania infected; Muprv: Mali uninfected pre-vaccinated; Mupov: Mali uninfected post-vaccinated; Miprv: Mali infected pre-vaccinated; Mipov: Mali infected post-vaccinated. Following trimming of adapters and low-quality reads (Trimmomatic) and alignment to the human reference genome (UCSC hg38) with Hisat2, DEGs were visualized with DESeq2 (https://bioconductor.org/packages/release/bioc/html/DESeq2.html).

Figure 3

Network interactome (https://www.mirnet.ca/miRNet/home.xhtml) between DEGs (pink) and differentially expressed miRNAs (blue). Among all the DEGs and differentially expressed miRNAs, these are the only ones that interacted with one another, indicating a pathway possibly implicated in malaria. In this interactome, hsa-mir-331 interacts with and regulates the KMT2B, IDF2, TTYH3 and NTRK2 genes, and hsa-mir-24 interacts with the EGFR, MBD6, ZNF385A and TTYH3 genes.

Differentially expressed miRNA interacting with target genes and lncRNAs

A total of 171 miRNAs were identified, of which 141 were differentially expressed; these were found across all human chromosomes except chromosome 21. Of these, 78 were upregulated and 63 downregulated in the severe/symptomatic group, with the reverse pattern in the asymptomatic/uninfected group. Among the upregulated miRNAs, enriched biological processes included positive regulation of cell cycle (adjusted p-value 0.00000153), negative regulation of protein metabolism (adjusted p-value 0.0000194) and response to drug (adjusted p-value 0.00197) (Fig. 4a). Biological processes identified among the downregulated miRNAs included regulation of RNA metabolic process (adjusted p-value 0.000144), regulation of gene expression (adjusted p-value 0.000201), regulation of transcription (adjusted p-value 0.000393), and cellular response to extracellular stimulus (adjusted p-value 0.000951) (Fig. 5a). Twenty-five molecular functions were identified for the upregulated miRNAs, including enzyme binding (adjusted p-value 0.0454) and protein complex binding (adjusted p-value 0.0454) (Fig. 4b), while 20 molecular functions, including neutral amino acid transmembrane transporter activity (adjusted p-value 0.00527) and RNA binding (adjusted p-value 0.0408), were identified among the downregulated miRNAs (Fig. 5b). Of the 16 cellular components identified for the upregulated miRNAs, cytosol (adjusted p-value 0.000179), mitochondrial outer membrane (adjusted p-value 0.000837) and nucleoplasm (adjusted p-value 0.000895) topped the list (Fig. 4c), while ribonucleoprotein complex (adjusted p-value 0.0000549), nucleus (adjusted p-value 0.0309) and spliceosome complex (adjusted p-value 0.0467) were among the cellular components observed for the downregulated miRNAs (Fig. 5c).

Figure 4

Enriched biological processes (a), molecular functions (b) and cellular components (c) of upregulated miRNAs from individuals with different malaria clinical outcomes. All pathways were generated using the DAVID tool (https://david.ncifcrf.gov/tools.jsp), and verified on the iDEP platform (https://idepsite.wordpress.com/pathways/). Note that the list of genes for the pathways represents those that are significant after correction.

Figure 5

Enriched biological processes (a), molecular functions (b) and cellular components (c) of downregulated miRNAs from individuals with different malaria clinical outcomes. All pathways were generated using the DAVID tool (https://david.ncifcrf.gov/tools.jsp), and verified on the iDEP platform (https://idepsite.wordpress.com/pathways/). Note that the list of genes for the pathways represents those that are significant after correction.

Of all identified miRNAs, five stood out (hsa-mir-32, hsa-mir-25, hsa-mir-221, hsa-mir-29 and hsa-mir-148) and were found to have multiple interactions with various genes from the same dataset. Of these five, hsa-mir-221 had the most connections, to 16 genes, including DDIT4, ICAM1, CDKN1C, p27 and FOS. The second most connected miRNA was hsa-mir-29a, with connections to 14 genes (including CDK6, CD276, CDC42, PTEN, DNMT3A and CXXC6), while the least connected was hsa-mir-32, with only one interaction (Fig. 6). Furthermore, all five miRNAs except hsa-mir-32 interacted with multiple lncRNAs, for example hsa-mir-221 with RUNDC3A, TMEM147, FGDS and ARHGA27P1; hsa-mir-29a with STAG3L5P and MALAT1; hsa-mir-25 with NEAT1 and LINC02275; and hsa-mir-148 with MMP25, XIST and ERVK13 (Fig. 7).
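The notion of a "most connected" miRNA used here is node degree in a bipartite interaction network. As a minimal sketch, the degree ranking can be computed from an edge list. The edge list below contains only the miRNA-lncRNA pairs named in the text, so it is a partial, illustrative network and not the full miRNet output; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Only the miRNA-lncRNA pairs named in the text; the full network is larger.
EDGES: List[Tuple[str, str]] = [
    ("hsa-mir-221", "RUNDC3A"), ("hsa-mir-221", "TMEM147"),
    ("hsa-mir-221", "FGDS"), ("hsa-mir-221", "ARHGA27P1"),
    ("hsa-mir-29a", "STAG3L5P"), ("hsa-mir-29a", "MALAT1"),
    ("hsa-mir-25", "NEAT1"), ("hsa-mir-25", "LINC02275"),
    ("hsa-mir-148", "MMP25"), ("hsa-mir-148", "XIST"),
    ("hsa-mir-148", "ERVK13"),
]


def node_degrees(edges: List[Tuple[str, str]]) -> Counter:
    """Count the distinct partners of each miRNA (duplicate edges ignored)."""
    unique_edges = set(edges)
    return Counter(mirna for mirna, _ in unique_edges)


def rank_by_degree(edges: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Sort miRNAs from most to least connected; ties broken by name."""
    degrees = node_degrees(edges)
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for mirna, degree in rank_by_degree(EDGES):
        print(f"{mirna}\t{degree}")
```

A miRNA with no recorded partner, such as hsa-mir-32 in this partial list, simply has degree zero.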

Figure 6

Network interaction (https://www.mirnet.ca/miRNet/home.xhtml) between the most unique miRNAs (blue) and target genes (pink). Hsa-mir-221 (16 genes), hsa-mir-29a (13 genes) and hsa-mir-148b (6 genes) had the most connections, while hsa-mir-342 and hsa-mir-32 were the least connected (one gene each). It is postulated that the miRNAs and genes with the most interactions could serve diagnostic or therapeutic purposes in managing disease.

Figure 7

Network interaction (https://www.mirnet.ca/miRNet/home.xhtml) between predominant miRNAs (blue) and some lncRNAs (pink). Various interactions were observed in this visualized display.

Characterization of identified long non-coding RNAs

A total of 2,028 lncRNAs were identified, of which 608 showed differential expression, with 586 upregulated and 22 downregulated. Applying further stringency left only 522 upregulated and 14 downregulated lncRNAs (Supplementary Tables S1 and S2). Among the highly expressed lncRNAs, the number of exons ranged from 2 to 32, while among the downregulated lncRNAs it ranged from 2 to 6. Some of the highly expressed lncRNAs, such as SLC7A11, were found, through interactions with genes like CD43, CDK41 and ZBTB47, to be involved in antigen-specific activation of T cells and regulation of transcription by RNA polymerase II, while LINC01524 is involved in CD28 co-stimulation when it interacts with the CD292, IRGQ, HOOK3, BRD4 and AP1M1 genes (Supplementary Tables S1 and S2).

Biological, molecular and cellular functions of the differentially expressed genes

The pathway analyses of the top 100 upregulated genes identified 68 biological processes (Fig. 8a), including axon guidance (p-value 2.52E−07), cell–cell adhesion (p-value 3.78E−05), transmembrane receptor protein tyrosine kinase signaling pathway (p-value 4.33E−05), and homophilic cell adhesion via plasma membrane adhesion molecules (p-value 1.87E−04). On the other hand, enrichment analyses of the top 100 downregulated genes identified 26 biological processes (Fig. 8b), of which oxygen transport (p-value 6.64E−09), hydrogen peroxide catabolic process (p-value 1.43E−07) and cellular oxidant detoxification (p-value 2.42E−05) were the most enriched.

Figure 8

Biological processes identified among the top 100 (a) upregulated and (b) downregulated genes from individuals with different clinical malaria manifestations. All pathways were generated using the DAVID tool (https://david.ncifcrf.gov/tools.jsp), and verified on the iDEP platform (https://idepsite.wordpress.com/pathways/). Note that the list of genes for the pathways represents those that are significant after correction.

Thirty-six (36) molecular functions were identified from these upregulated genes; extracellular matrix constituent lubricant activity (p-value 2.34E−04), extracellular matrix structural constituent (p-value 5.83E−04) and protein binding involved in cell–cell adhesion (p-value 0.00823392) were among the most significant (Fig. 9a). However, only 16 molecular functions were identified in the probed downregulated genes, of which oxygen transport activity (p-value 9.54E−11), haptoglobin binding (p-value 6.55E−10) and organic acid binding (p-value 1.20E−09) were more enriched than the others (Fig. 9b).

Figure 9

Molecular functions identified among the top 100 (a) upregulated and (b) downregulated genes from individuals with different clinical malaria manifestations. All pathways were generated using the DAVID tool (https://david.ncifcrf.gov/tools.jsp), and verified on the iDEP platform (https://idepsite.wordpress.com/pathways/). Note that the list of genes for the pathways represents those that are significant after correction.

Integral component of membrane (p-value 5.65E−08), plasma membrane (p-value 1.79E−07) and receptor complex (p-value 9.95E−07) were the most enriched cellular components of the upregulated genes (Fig. 10a), while only 11 cellular components were identified among the downregulated ones, with hemoglobin complex (p-value 2.32E−11) and haptoglobin-hemoglobin complex (p-value 8.28E−10) the most enriched (Fig. 10b). KEGG pathway analysis identified axon guidance, protein digestion and absorption, morphine addiction, and calcium and oxytocin signaling pathways among the upregulated genes, while the MAPK signaling pathway, amyotrophic lateral sclerosis and tuberculosis were the pathways most often identified among the downregulated genes (Fig. 11).

Figure 10

Cellular components identified among the top 100 (a) upregulated and (b) downregulated genes from individuals with different clinical malaria manifestations. All pathways were generated using the DAVID tool (https://david.ncifcrf.gov/tools.jsp), and verified on the iDEP platform (https://idepsite.wordpress.com/pathways/). Note that the list of genes for the pathways represents those that are significant after correction.

Figure 11

KEGG pathway analysis of upregulated genes (red) and downregulated genes (blue). Note that the list of genes for the pathways represents those that are significant after correction.

Discussion

Plasmodium falciparum infection remains a major public health challenge in many countries, especially in sub-Saharan Africa, where children under five years of age and pregnant women are the most vulnerable and most affected, and where infection leads to severe complications and outcomes if not promptly diagnosed and treated. Understanding how immune response genes and regulatory factors affect clinical outcome in different disease classifications could point to new diagnostic biomarkers and druggable targets.

We used a computational approach to elucidate differentially expressed genes and multi-omic regulatory factors (miRNAs and lncRNAs) that potentially contribute to disease outcome in different types of clinical malaria from the perspective of the human host. We identified a large number of genes that were differentially expressed across the four clinical categories. Interestingly, the severe and symptomatic disease categories showed similar expression patterns, with the same genes upregulated and downregulated in both groups. Among the upregulated genes, C-C motif chemokine 11 (CCL11) was the most upregulated; it is one of the chemokine genes clustered on chromosome 17, displays chemotactic activity for eosinophils, and participates in the innate immune response67. It is predicted to induce the production of reactive oxygen species (ROS) in microglial cells and to mediate the homing of neutrophils to damaged or infected areas, amplifying the inflammatory response in the process68. Notably, elevated plasma levels of CCL11 have been associated with severe COVID-19 in hospitalized patients67. Another highly expressed gene is the mitochondrial RNR2L10 (mt-rnr2l10) gene, which is involved in the regulation and execution of apoptotic cell death during inflammation69. Hence, we surmise that these highly expressed genes in individuals with severe and symptomatic malaria are significant contributors to the inflammatory process mediating a robust innate response, underpinned by an effective downstream adaptive immune response to disease, and ultimately to clinical outcomes.

The gamma globin gene 2 (HBG2), among others, was found to be downregulated. This gene, alongside HBG1, is normally expressed in the fetal liver, bone marrow and spleen, and is replaced by adult hemoglobin (HbA) after birth70. Being linked to iron and oxygen binding, it is important for the effective transport of oxygen via heme, thereby preventing or reducing cell death. When it is downregulated, as observed here, cell death is likely to increase, which may contribute to the severity of disease. In this study, we did not find C6KTB7 or C6KTD2, which others identified71 as differentially expressed in malaria patients. A possible reason is a difference in the demographics of the participants from whom the data were obtained, since signatures of immune response are shaped partly by the array of pathogens that challenge individuals residing in a given geographic area.

Of interest is the identification of several miRNAs that were either upregulated or downregulated in our study. Five of the miRNAs (hsa-mir-32, hsa-mir-25, hsa-mir-221, hsa-mir-29 and hsa-mir-148) were notable in that they bind to different target genes and may regulate their expression during infection. To the best of our knowledge, this is the first report identifying these five miRNAs as differentially expressed and as regulating different host genes such as CDK6, CD276, CDC42, DNMT3A and CXXC6. CDK6 is known to phosphorylate and regulate tumor suppressor proteins, and its altered expression has been observed in prostate cancer72, stomach cancer73, and coronary artery disease74. CD276 belongs to the immunoglobulin superfamily of proteins and participates in the regulation of the T-cell-mediated immune response75. It is highly expressed, so it is no surprise that hsa-mir-29 binds to its 3′-UTR, resulting in even higher expression that ultimately leads to inflammatory response and cell death. However, one study found an inverse relationship between hsa-mir-29 and the expression of the cytochrome P450 2C19 (CYP2C19) gene45, in which hsa-mir-29 was reported to suppress CYP2C19 expression.

A substantial number of lncRNAs were also found to be associated with disease. Of these, two (SLC7A11 and LINC01524) were found to interact with and stimulate several immune genes, such as CD43, CDK41, ZBTB47, CD292 and IRGQ, which are known to be involved in antigen-specific activation of T cells and regulation of transcription. Other lncRNAs that stood out in the number of target immune genes they bind include LINC02275, ERICH3, LINC01120, FOXD2 and LINC01816. We therefore propose that these lncRNAs can regulate the expression of these genes, and that their ability to drive over- or under-expression could be exploited as biomarkers for point-of-care diagnostic tools that distinguish the different forms of malaria infection.

Our pathway analyses of differentially expressed genes revealed several terms, such as cell–cell adhesion, transmembrane receptor protein tyrosine kinase signaling pathway, homophilic cell adhesion, protein binding, oxygen transport, hydrogen peroxide catabolic process, cellular oxidant detoxification, hemoglobin complex and haptoglobin-hemoglobin complex, that were enriched among the up-/downregulated genes. Multiple genes were involved in the different pathways. For example, the EGFR, EPHAS, and ROS1 genes, which are implicated in lung cancer susceptibility76,77,78, were involved in positive regulation of kinase activity and cell–cell adhesion, among others. Of particular interest is the identification of the cell–cell adhesion pathway. Malaria severity is known to be enhanced by the clustering and adhesion of infected cells to endothelial cells (cytoadherence), rosetting with uninfected erythrocytes, and platelet-mediated clumping of infected erythrocytes, all of which play key roles in malaria pathogenesis79.

Evaluating the network connections between the DEGs and miRNAs, between differentially expressed miRNAs and their target genes, and between miRNAs and lncRNAs revealed interesting networks among these pairs. For instance, interactions were observed between the KMT2B/IDF2/TTYH3/NTRK2 genes and hsa-mir-331, and between EGFR/MBD6/ZNF385A/TTYH3 and hsa-mir-24. The most connected miRNA was hsa-mir-221, which was observed to target 16 genes including ICAM1, followed by hsa-mir-29a, which binds to 14 genes (including CD276, CDC42, and CXXC6). These two miRNAs were also connected to several lncRNAs. Taken together, it is reasonable to propose that these two miRNAs play a significant role in the human immune response to malaria and contribute to different clinical outcomes. Since they modify and/or regulate the expression of these genes and lncRNAs, targeting them for diagnostic purposes, to discriminate individuals with severe/symptomatic malaria from those with asymptomatic infection or no infection, would add to the arsenal in the battle against malaria.

This is a computational study that used RNAseq transcriptome data to identify DEGs, miRNAs and lncRNAs involved in malaria infection and to examine how they modulate disease status and severity (outcome). Our study is limited by the inability to validate the results with clinical cases, since at the time of this study we did not have malaria samples from individuals with different clinical manifestations. Nevertheless, we postulate that the findings can serve as the basis for subsequent in vitro and in vivo characterization of individuals with different malaria clinical outcomes, as planned in our follow-up study. These results could also serve as additional tools for dissecting the Pf host-parasite interactome.

Methodology

RNAseq data acquisition

A total of 107 raw human RNAseq datasets in FASTQ format were retrieved from the Sequence Read Archive (SRA) database. The datasets were obtained from individuals with severe malaria (The Gambia; n = 20)80, symptomatic malaria (Burkina Faso; n = 15)81 and asymptomatic infection (Mali; n = 16)82, in addition to uninfected controls (Tanzania; n = 20; and Mali; n = 36)82,83. All libraries were paired-end, with an average of 103 million reads (Table 1) and read counts ranging from 21 to 103 million. Data from studies in which infection status at the time of sample collection was not properly defined, or for which the status of malaria infection (uninfected, asymptomatic, symptomatic or severe) was unknown, were excluded from this analysis. The infection status of the samples used was confirmed by both clinical (symptoms and microscopy) and molecular techniques (conventional and/or real-time PCR), as detailed in the original articles80,82,83.

Table 1 Countries, accession numbers and infection status of retrieved RNAseq data.

Genetic history in Africa is largely partitioned by geography and language, with evidence for genetic structuring by geography provided by Fan et al.84. The countries from which these samples were sourced fall under the Nilo-Saharan lineages, which cut across the western part of Africa (including The Gambia, Mali and Burkina Faso) and some parts of East Africa, including Tanzania. In addition, malaria parasite transmission dynamics in the countries where the data were generated were broadly similar, with the exception of The Gambia85, where transmission intensity is considerably lower. It is well known that the immune response to malaria and its clinical outcomes are largely shaped by the transmission intensity of the parasite, the level of exposure (giving rise to different numbers of malaria episodes), and the different clones of malaria parasites circulating in an area.

Gene expression analysis of RNAseq data

Low-quality reads and any remaining adapters were trimmed from the RNAseq data using Trimmomatic version 0.3886. Hisat2 version 2.2.1 was used to align the quality-trimmed reads to the human (UCSC hg38) reference genome87. The resulting BAM files were used as input for generating gene counts with htseq-count version 0.9.188, using the unstranded option and the annotated hg38 gene transfer format (GTF) file. Gene expression analysis of the gene counts mapping to the human genome was performed using DESeq2 (version 1.34.0)89. The resulting p-values were adjusted using the Benjamini and Hochberg approach for controlling the false discovery rate. Genes with an adjusted p-value of < 0.05 in DESeq2 were designated differentially expressed genes (DEGs) involved in the human response to the clinical manifestations of malaria. Furthermore, gene counts were filtered and normalized, and low-expression genes, defined as genes with < 0.5 counts per million across all samples, were removed. Samples were clustered by infection status (severe, symptomatic, asymptomatic and uninfected) and DEGs were visualized in a heatmap.

Of all the studies, only the one from The Gambia provided the specific ages of the study participants; the studies from the other three countries described only whether participants were adults or children. Age is an important factor in the immune response, and the samples from Burkina Faso and The Gambia were from children while those from Tanzania and Mali were of mixed age. In addition, the clinical manifestations in the first two countries were either severe or symptomatic, whereas in the last two they were either asymptomatic or uninfected. We therefore corrected for age in our analyses and grouped the samples from severe/symptomatic individuals (Gambia/Burkina Faso) together and those from asymptomatic/uninfected individuals (Tanzania/Mali) together. Because the samples from Mali were from asymptomatic vaccinated individuals, we also corrected for vaccination. Further, since our datasets came from different studies, countries, and years, we applied batch-effect correction using Harman. Harman aids the removal of batch-related noise from samples processed in different batches while constraining the risk of removing biologically relevant signal90.
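The two numerical rules in this pipeline, the counts-per-million filter and the Benjamini and Hochberg adjustment, can be sketched in a few lines. This is an illustrative pure-Python sketch and not the DESeq2 implementation used in the study; the function names and the toy inputs in the usage lines are assumptions.

```python
from typing import Dict, List


def low_expression_genes(count_matrix: Dict[str, List[int]],
                         min_cpm: float = 0.5) -> List[str]:
    """Return genes whose counts per million fall below min_cpm in every sample.

    count_matrix maps a gene name to its raw counts, one entry per sample.
    """
    n_samples = len(next(iter(count_matrix.values())))
    # Library size of each sample = sum of counts over all genes.
    library_sizes = [sum(counts[s] for counts in count_matrix.values())
                     for s in range(n_samples)]
    dropped = []
    for gene, counts in count_matrix.items():
        cpm = [counts[s] * 1_000_000 / library_sizes[s] for s in range(n_samples)]
        if all(value < min_cpm for value in cpm):
            dropped.append(gene)
    return dropped


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    toy_counts = {"geneA": [1_999_998, 3_999_997], "geneB": [2, 2], "geneC": [0, 1]}
    print(low_expression_genes(toy_counts))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene is then called a DEG when its adjusted p-value is below 0.05, matching the threshold stated above.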

Functional and pathway analysis of human DEGs in malaria clinical conditions

Functional enrichment analysis of the top differentially expressed genes was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) to determine the molecular functions, biological processes and cellular components associated with these genes. Molecular functions, biological processes and cellular components were catalogued at a false discovery rate (FDR) of 0.1 and a p-value of ≤ 0.05. The generated pathways were further confirmed using iDEP (Ge et al., 2018) without altering the parameters. KEGG91 was used to generate significant pathways with a p-value of ≤ 0.05 and an FDR of 0.1.
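The idea behind this kind of over-representation analysis can be illustrated with a plain hypergeometric tail probability followed by the p-value and FDR cut-offs given above. This is a simplified stand-in: DAVID reports its own modified Fisher exact statistic (the EASE score), which is more conservative, and the function names and toy numbers here are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           drawn: int, overlap: int) -> float:
    """P(X >= overlap) when `drawn` genes are sampled from `population`
    genes, of which `annotated` carry the term of interest."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(comb(annotated, k) * comb(population - annotated, drawn - k)
               for k in range(overlap, upper + 1))
    return tail / total


def keep_term(p_value: float, fdr: float,
              p_max: float = 0.05, fdr_max: float = 0.1) -> bool:
    """Apply the cut-offs stated in the text: p <= 0.05 and FDR <= 0.1."""
    return p_value <= p_max and fdr <= fdr_max


if __name__ == "__main__":
    # Toy numbers: 10 genes in total, 3 annotated, 3 selected, all 3 annotated.
    print(hypergeom_enrichment_p(10, 3, 3, 3))
```

The FDR passed to `keep_term` would come from a multiple-testing adjustment over all tested terms, which is not shown here.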

Differentially expressed miRNA prediction in different disease manifestations

miRNAs with a p-value of < 0.05 and a log fold change of ≥ 2 were taken to be upregulated, while those with a p-value of < 0.05 and a log fold change of ≤ −2 were taken to be downregulated. Selected miRNAs were used to probe the human reference genome for genes and long non-coding RNA regions that they bind at the 5′-UTR, CDS or 3′-UTR, using miRNet60, TargetScan61 and miRTarBase92. Networks of miRNA interactions with target genes and with lncRNAs were constructed separately using the miRNet web-based platform60. Additionally, functional enrichment of the differentially expressed miRNAs was carried out using DAVID to determine the molecular functions, biological processes and cellular components connected with these miRNAs.
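The classification rule in the first sentence can be written as a small function. This is a minimal sketch under the thresholds stated above; the function name and the example values are hypothetical.

```python
def classify_mirna(log_fold_change: float, p_value: float,
                   fc_cutoff: float = 2.0, p_cutoff: float = 0.05) -> str:
    """Label a miRNA as 'up', 'down' or 'unchanged' from its log fold
    change and p-value, using the thresholds given in the text."""
    if p_value >= p_cutoff:
        return "unchanged"
    if log_fold_change >= fc_cutoff:
        return "up"
    if log_fold_change <= -fc_cutoff:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for lfc, p in [(2.4, 0.001), (-3.1, 0.02), (1.2, 0.001), (2.4, 0.2)]:
        print(lfc, p, classify_mirna(lfc, p))
```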

Identification of lncRNAs in disease populations

Putative lncRNAs that might affect the expression of immune genes, and thereby lead to varied disease manifestations, were selected from among the differentially expressed products. To locate the lncRNAs on the genome, the NONCODE database, a robust repository for non-coding RNAs, was used63. Because of the large number of non-coding RNAs in most vertebrate genomes, some criteria were applied to filter out any non-coding RNAs with even a slight potential to code for a protein: (a) lncRNAs with more than two exons were selected, on the presumption that high-quality transcripts have a high number of exons; (b) transcripts with a length of ≥ 200 bp were selected; (c) a coding-non-coding index (CNCI) score of < 0 was used to discriminate lncRNAs without coding ability from those with it93,94. All lncRNAs that met these three criteria were selected, and a second validation was done using the LGC toolkit in the LncBook suite95, which characterizes and identifies features of lncRNAs, including transcript length. LncBook collates lncRNAs from both pre-existing databases and experimentally verified, community-curated transcripts, making it a reliable database for determining the coding potential of lncRNAs. LncRNAs that this toolkit identified as having coding potential were dropped from the list.
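The three selection criteria above amount to a simple filter over transcript records. The following is a minimal sketch, assuming each transcript comes with an exon count, a length and a CNCI score already computed; the `Transcript` class, its field names and the example records are hypothetical, and the second-pass LGC check is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    name: str
    exon_count: int
    length_bp: int
    cnci_score: float  # negative scores indicate non-coding potential


def passes_lncrna_filter(t: Transcript) -> bool:
    """Apply criteria (a)-(c) exactly as worded in the text."""
    return (t.exon_count > 2           # (a) more than two exons
            and t.length_bp >= 200     # (b) length of at least 200 bp
            and t.cnci_score < 0)      # (c) CNCI score below zero


def select_lncrnas(transcripts: List[Transcript]) -> List[str]:
    """Return the names of transcripts that pass all three criteria."""
    return [t.name for t in transcripts if passes_lncrna_filter(t)]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    candidates = [
        Transcript("tx_keep", exon_count=4, length_bp=1500, cnci_score=-0.3),
        Transcript("tx_short", exon_count=3, length_bp=150, cnci_score=-0.2),
        Transcript("tx_coding", exon_count=5, length_bp=2200, cnci_score=0.4),
    ]
    print(select_lncrnas(candidates))
```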