Background

Malaria, caused by unicellular protozoan Plasmodium spp. parasites, is an ancient disease and remains a major threat to human health and wellbeing. Five species of Plasmodium are currently recognized as causing human malaria, of which the most lethal is P. falciparum (Pf). In 2015, the World Health Organization estimated that the maximal annual burden imposed by malaria, whilst decreasing, still stands at 214 million (range 149–303 million) cases that result in 438,000 (range 236,000–623,000) deaths [1]. Drug resistance to frontline antimalarials continues to arise and spread, exacerbated by slow progress in the introduction of alternatives. Properly efficacious vaccines remain a hope, not a likelihood. Against this background, genome-based research on malaria seeks to provide new avenues for therapeutic or prophylactic development based upon biological insights such as the identification of new drug targets and vaccine candidates.

The landmark of the completion of the genome sequence of a laboratory strain of Pf was achieved over a decade ago [2] (Fig. 1). This has since been accompanied, thanks to plummeting costs and advances in next-generation sequencing (NGS) technologies, by the whole-genome sequencing (WGS) of a wide range of species representing all the major clades of the genus, although the genomes of all known human infectious Plasmodium species remain to be sequenced [3]. However, the combination of NGS and WGS has enabled the development of innovative large-scale genomic studies, for example, for genomic epidemiology [4]. Such population genomics studies, fueled by collaborative consortia (for example, the Malaria Genomic Epidemiology Network (MalariaGEN; http://www.malariagen.net)), have allowed the dynamics of global and local population structures to be assessed and adaptive change in parasite genomes to be monitored in response to threats, such as artemisinin (ART). This is especially true for single-nucleotide polymorphisms (SNPs), and while other aspects of genome variation (such as indels and copy number variation) might currently lag behind, the gaps in the database are known and are firmly in the sights of researchers.

Fig. 1

Major advances in omics-related fields. This figure highlights landmark studies providing key insights into parasite makeup, development, and pathogenesis (yellow boxes) as well as crucial technical advances (blue boxes) since the first Plasmodium genomes were published in 2002 [2, 5, 12, 13, 27, 29, 31, 39, 40, 42, 43, 48–50, 53, 54, 57, 66, 114, 115, 151, 153–178]. AID auxin-inducible degron, ART artemisinin, cKD conditional knockdown, CRISPR clustered regularly interspaced short palindromic repeats, DD destabilization domain, K13 kelch13, Pb P. berghei, Pf P. falciparum, TSS transcription start site, TF transcription factor, ZNF zinc finger nuclease

The template Plasmodium genomes have provided the substrate for the application of an explosion of other post-genome survey technologies that have been almost exclusively applied to Pf, such as transcriptomics, proteomics, metabolomics, and lipidomics, and that map the general and stage-specific characteristics of malaria parasites. These data are warehoused in expensive but critical community web sites such as PlasmoDB (http://www.Plasmodb.org). This in turn has been exploited by ever improving forward and reverse genetic capabilities to assign function to genes, steadily reducing the >60 % of genes of unknown function that were originally catalogued [2]. Advances that will be highlighted in this review include the unraveling of the molecular mechanisms of parasite resistance to ART and the functional identification of some of the histone-modifying enzymes that write the epigenetic code (such as Pf histone deacetylase 2 (PfHDA2)) and the proteins that read it (such as Pf heterochromatin protein 1 (PfHP1)) that, with others (such as RNaseII), play a significant role in the regulation of antigenic variation and commitment to sexual development.

Furthermore, the genomes of the host and of a growing number of mosquito vectors have been characterized in both increasing number and depth, permitting meta-analyses of these genomes in combination with Plasmodium infection. These studies have revealed important loci associated with resistance to the malaria parasite in the host and vector, respectively [5, 6], and indicate the genomic hotspots in the genetic arms race that malaria has stimulated.

We also review the recent advances in this very active area of malaria genomics and control of gene expression and emphasize any benefits that these advances may have for the development of therapies and interventions (Table 1).

Table 1 Key advances from recent omics studies

Human genomics

The infrastructure required to effectively collect, collate, and analyze large genomes for epidemiological studies (that is, genome-wide association studies (GWAS)) is so costly that it is best achieved in consortia. These can work at such a scale that analyses are powered to a degree that GWAS findings become more certain and the global context of the effect of, for example, human genetics on susceptibility to malaria is more reliably resolved. The African Genome Variation Project recognizes the significant diversity of ethnicities and, therefore, genotypes and, through WGS, imputation, and SNP mapping, seeks to build a database through which disease incidence and outcome can be reliably associated with haplotypes [7]. Already, such wider analyses have confirmed SNP associations with five well-known traits, including hemoglobinopathies and glucose-6-phosphate dehydrogenase (G6PD) deficiency, but have refuted 22 others that had been linked by smaller-scale studies [8]. This study also showed opposing effects of G6PD on different fatal consequences of malaria infection, revealing a hitherto unsuspected complexity of associations. Ongoing analyses have revealed new, although unsurprising, examples of haplotypes of loci associated with protection from severe malaria, such as the glycophorin locus on human chromosome 4 [8, 9].
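The single-SNP test at the core of a case-control GWAS can be sketched as follows. This is a minimal illustration, not the pipeline of any study cited here: it computes an allelic odds ratio and a Pearson chi-square statistic from a 2×2 table of allele counts. The function name and the counts are hypothetical, and a real analysis would also correct for population structure and for the very large number of SNPs tested.

```python
import math

def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds_ratio, chi_square, p_value) for a 2x2 allele-count table.

    Rows are cases/controls, columns are alternate/reference allele counts.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Odds of carrying the alternate allele in cases relative to controls.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Pearson chi-square for a 2x2 table (1 degree of freedom, no continuity correction).
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
                   * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi_square = numerator / denominator
    # Upper tail of the chi-square distribution with 1 d.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value

# Invented counts: the alternate allele is rarer among severe-malaria cases.
odds_ratio, chi_square, p_value = allelic_association(30, 170, 60, 140)
```

An odds ratio below 1, as in this invented table, is the pattern expected for a protective allele. Consortium-scale sample sizes matter because the chi-square statistic grows with the total count n for a fixed effect size.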

Vector genomics

In Africa, malaria is mainly transmitted by female Anopheles gambiae (Ag) mosquitoes. Approaches to understanding the role of Ag mosquito genomics in malaria transmission have been similar to those of the African Genome Variation Project. Thus, the Ag1000G project (https://www.malariagen.net/projects/ag1000g) involves 35 working groups that have sampled Ag mosquitoes from 13 malaria-endemic countries and that aim to establish the levels of Ag genome diversity, establish population structures, and link these to the ecology of disease transmission. The Anopheles vector genome is very dynamic. Comparative vector genomics has revealed rapid gene gain and loss compared with Drosophila and significant intragenus diversity and mixing in genes involved in both insecticide resistance and antimalarial immunity [10, 11]. The nature and extent of such diversity precludes the application of classic GWAS approaches, and a novel approach of phenotype-driven, pooled sequencing coupled with linkage mapping in carefully selected founder colonies has been used to map vector phenotypes. This study recently revealed TOLL11 as a gene that protects African mosquitoes against Pf infection [6].

Parasite genomics

Full genome sequences are now available for many strains of Pf [2], Plasmodium vivax [12], and Plasmodium knowlesi [13] among the human infectious parasites. Primate and rodent infectious species that are frequently used as model parasites have also been sequenced and include Plasmodium berghei (Pb), Plasmodium cynomolgi, Plasmodium chabaudi and Plasmodium yoelii [14]. Recently, the genomes of seven further primate infectious species have become available, demonstrating the close relationship between Pf and chimpanzee infectious species [15]. The typical Plasmodium genome consists of 14 linear chromosomes of aggregate size of approximately 22 megabases encoding >5000 protein-coding genes. The core, conserved genome of about 4800 such genes occupies the central chromosomal regions whilst the multi-gene families (at least some of which are associated with antigenic variation) are largely distributed to the subtelomeric regions. Non-coding RNA (ncRNA) genes [16] and antisense transcription [17, 18] are being catalogued in Pf but this catalogue probably remains incomplete as only blood-stage parasites have been seriously investigated in this regard and ncRNAs remain largely of unknown significance.

One key feature of Pf is its evolution in the face of human-imposed selection pressures in the form of drugs and potentially vaccines. Such pressure has consistently resulted in the emergence of drug-resistant parasites. There is a huge potential global reservoir of genome variation upon which selection may act. In an initial analysis of 227 parasite samples collected at six different locations in Africa, Asia, and Oceania, MalariaGEN, the Oxford-based genomic epidemiology network, identified more than 86,000 exonic SNPs. This initial SNP catalogue is described in detail by Manske and colleagues [19]. Currently (27 July 2016), the MalariaGEN database states that for the Pf Community Project, it has data on 3488 samples from 43 separate locations in 23 countries and the number of high-quality, filtered exonic SNPs has increased to more than 900,000. All of this variation represents diversity from which fitter and perhaps more deadly parasites can be selected. Modern NGS and WGS have enabled comparative and population genomics approaches that have been used to reveal important features of emerging parasite populations, for example, in response to drugs.

Parasite development and pathogenesis

Within their mammalian host and mosquito vector, Plasmodium parasites complete a remarkable lifecycle, alternating between asexual and sexual replication (Fig. 2). Throughout the Plasmodium lifecycle, gene expression is orchestrated by a variety of mechanisms, including epigenetic, transcriptional, post-transcriptional, and translational control. Owing to the absence of most canonical eukaryotic transcription factors in the Plasmodium genome [2], epigenetic control has long been recognized to play an important role in gene expression regulation.

Fig. 2

Plasmodium life cycle. After a mosquito bite, malaria parasites are deposited into the host’s skin and within minutes are carried via the bloodstream into the liver, where through asexual proliferation within the hepatocytes tens of thousands of merozoites are produced. Following hepatocyte rupture, merozoites are released into the bloodstream where they can invade the host’s red blood cells (RBC), leading to the initiation of the intra-erythrocytic development cycle (IDC). During the IDC (lasting about 48–72 h in human and about 24 h in rodent malaria parasites), Plasmodium parasites multiply asexually through the completion of several morphologically distinct stages within the RBCs. After RBC invasion, malaria parasites develop via the ring and trophozoite stage into schizonts, each containing a species-specific number of merozoites (typically 10–30). Upon schizont rupture, merozoites are released into the bloodstream, where they can invade new RBCs and initiate a new IDC. However, a small fraction of ring-stage parasites sporadically differentiate into male or female gametocytes, which are responsible for initiating transmission back to the mosquito. Through another mosquito blood meal gametocytes are taken up into the mosquito midgut where they are activated and form male (eight per gametocyte) and female (one) gametes. Following fertilization, the zygote undergoes meiosis (and therefore true sexual recombination) and develops into a motile, tetraploid ookinete that traverses the midgut and forms an oocyst. Via another round of asexual proliferation inside the oocyst several thousands of new haploid sporozoites are generated that, upon their release, colonize the mosquito salivary glands, poised to initiate a new infection of another mammalian host

Epigenetics lies at the very heart of gene expression, regulating access of the transcriptional machinery to chromatin [20] via (1) post-translational modifications (PTMs) of histones, (2) nucleosome occupancy, and (3) global chromatin architecture. In the past decade, various histone PTMs have been identified throughout the Plasmodium lifecycle (reviewed in [21]) and the existing catalog of modifications in Pf was recently extended to 232 distinct PTMs, 88 unique to Plasmodium [22]. The majority of detected PTMs show dynamic changes across the intra-erythrocytic development cycle (IDC), likely mirroring changes within chromatin organization linked to its transcriptional status. Methylation and acetylation of N-terminal histone tails are by far the most studied regulatory PTMs, linked either to a transcriptionally active chromatin structure (that is, euchromatin) or to transcriptionally inert heterochromatin. In Pf, various genes encoding putative epigenetic modulators (that is, proteins catalyzing either the addition or removal of histone PTM marks) have been identified [23], but only a few have been subjected to more detailed investigation [24, 25]. Many of the histone modifiers are essential for Plasmodium development, making them promising targets for antimalarial drugs [26]. In Pf, conditional knockdown of HDA2, a histone lysine deacetylase (HDAC) catalyzing the removal of acetyl groups from acetylated histone 3 lysine 9 (H3K9ac), resulted in elevated H3K9ac levels in previously defined heterochromatin regions [27]. H3K9ac is an epigenetic mark associated with transcriptionally active euchromatin [28] and HDA2 depletion resulted in the transcriptional activation of genes located in heterochromatin regions, leading to impaired asexual growth and an increased gametocyte conversion [27].
Interestingly, genes found to be dysregulated by HDA2 knockdown are also known to be associated with HP1, a key epigenetic player binding to tri-methylated H3K9 (H3K9me3), linked to transcriptionally repressed chromatin. Strikingly, conditional knockdown of PfHP1 recapitulated, to a much greater extent, the phenotype observed in HDA2-knockdown mutants [29]. HP1 is believed to act as a recruitment platform for histone lysine methyltransferases (HKMTs), required for maintenance and spreading of H3K9me3 marks [30], which is consistent with the reduction of H3K9me3 observed in HP1 knockdown cells [29]. In addition, bromodomain protein 1 (BDP1) was found to bind to H3K9ac and H3K14ac marks within transcription start sites (TSSs) in Pf, among them predominantly invasion-related genes (Fig. 3a), and BDP1-knockdown parasites consistently failed to invade new erythrocytes. BDP1 also appears to act as a recruitment platform for other effector proteins such as BDP2 and members of the apicomplexan AP2 (ApiAP2) transcription factor family [31].

Fig. 3

Malaria parasite genomic components involved in pathogenesis. a The expression of invasion-related genes is regulated through epigenetic and post-transcriptional mechanisms. Bromodomain protein 1 (BDP1) binds to H3K9ac marks within the promoter region of genes associated with red blood cell (RBC) invasion (as well as other gene families not depicted here [31]), enabling their transcription. This is likely achieved through the recruitment of BDP2 and transcription factors (TFs) of the ApiAP2 family. Following transcription during the trophozoite stage, mRNAs encoding invasion-related proteins are bound by ALBA1 functioning as translation repressor. After progression to the schizont stage, ALBA1 is released, allowing the timely synthesis of proteins required for merozoite invasion of RBCs. b Experimental findings either directly from studies on ap2-g or from epigenetically regulated var genes are suggestive of an epigenetically controlled mechanism regulating ap2-g transcription. In sexually committed parasites, ap2-g is characterized by H3K4me2/3 and H3K9ac histone marks and most likely contains histone variants H2A.Z and H2B.Z located in its promoter region. BDPs are believed to bind to H3K9ac, facilitating ap2-g transcription. ApiAP2-G drives expression of genes required for sexual development through binding to a 6/8-mer upstream DNA motif. ap2-g expression itself is believed to be amplified through an autoregulatory feedback loop where ApiAP2-G binds to its own promoter that also contains ApiAP2-G motifs. In asexual blood-stage parasites, ap2-g is transcriptionally silenced by heterochromatin protein 1 (HP1) binding to H3K9me3 histone marks (located in repressive loci in the nuclear periphery). Histone deacetylase 2 (HDA2) catalyzes the removal of H3K9ac from active ap2-g, facilitating ap2-g silencing.
c Monoallelic expression of one of the approximately 60 members of the erythrocyte membrane protein 1 (EMP1)-encoding var genes is regulated through epigenetic silencing of all but one var gene copy. The active var is marked by euchromatin post-translational modifications H3K4me2/3 and H3K9ac and histone variants H2A.Z/H2B.Z located in its promoter region, as well as H3K36me3 covering the whole var gene body but absent from the promoter region. Transcription of noncoding RNAs associated with the active var gene is facilitated by upstream as well as intronic promoters. All other silenced var genes cluster into perinuclear repressive loci and are characterized by HP1 binding to H3K9me3 marks. var gene silencing also involves SET2/vs-dependent placing of H3K36me3 histone marks in promoter regions and is marked by the absence of non-coding RNAs, likely safeguarded through RNaseII exonuclease activity. In addition, other histone code modulators such as HDA2, SET10, and SIR2A/B are likely involved in epigenetic var gene regulation. d Mutations in kelch13 (K13) were found to be the major contributors to artemisinin (ART) resistance identified in drug-resistant parasites in the laboratory as well as in field isolates. kelch13 mutations appear to arise in a complex array of background mutations (that is, mutations in genes encoding ferredoxin (FD), apicoplast ribosomal protein S10 (ARPS10), multidrug resistance protein 2 (MDR2), and chloroquine resistance transporter (CRT)), not yet detected in African parasites. In addition, elevated phosphatidylinositol-3-kinase (PI3K) levels have been observed in ART-resistant parasites and PI3K signaling has been implicated in the unfolded protein response observed in ART-resistant parasites.
H2A.Z/H2B.Z, orange/yellow-paired quarter circles; H3K4me2/3, light green circles; H3K9ac, dark green circles; H3K9me3, red circles; H3K36me3, blue circles; canonical nucleosomes, grey globes; ApiAP2-G binding motif, light blue line; ncRNAs, wobbly red lines; mRNAs, wobbly black lines. AP2n other TFs belonging to the ApiAP2 DNA binding protein family, ncRNA non-coding RNA, TFs transcription factors

In addition to histone PTMs, nucleosome organization plays a critical role in gene expression regulation in Plasmodium. In general, heterochromatin is substantially enriched in nucleosomes compared with euchromatin [32] and active promoters and intergenic regions in Pf show markedly reduced nucleosome occupancy [33]. In addition, common transcript features such as TSSs, transcription termination sites, and splice donor/acceptor sites show clearly distinguishable nucleosome positioning in Pf [34], but previously described dynamic changes in nucleosome positioning [32] appeared to be mostly restricted to TSSs during the IDC [34]. Uniquely in Plasmodium spp., canonical histones in intergenic regions are replaced by histone variant H2A.Z [28], which, in concert with the apicomplexan-specific H2B.Z, establishes a H2A.Z/H2B.Z double-variant nucleosome subtype enriched at AT-rich promoter regions and correlates with open chromatin and active gene transcription [35].

Within the confined space of the nucleus, chromosomes are tightly packed into a three-dimensional structure. This three-dimensional architecture allows interaction between otherwise distant chromatin regions possessing regulatory function and facilitates contacts with other nuclear sub-compartments such as the nucleolus and the nuclear envelope [36]. Until recently, knowledge of the chromosome architecture and chromatin interactions in Plasmodium was mostly restricted to single genomic loci based on fluorescence in situ hybridization experiments [37]. However, recent advances in deep-sequencing technologies [38] have for the first time enabled the genome-wide profiling of chromosome interactions at kilobase resolution in Plasmodium [37, 39]. In contrast to other eukaryotic organisms, the Pf nucleus appears to lack clearly defined chromosome territories and chromatin interactions are mainly restricted to intra-chromosomal contacts showing a clear distance-related dependency [37, 39]. Inter-chromosomal interactions are mostly absent in Pf and restricted to centromeres, telomeres, ribosomal DNA (rDNA) loci, and internal as well as subtelomeric-localized var genes (further discussed in the next section). This observed clustering appears to coincide with transcriptional activity of each cluster. Interestingly, using three-dimensional chromatin modeling, the highly transcribed rDNA genes were proposed to be localized to the nuclear periphery, which was previously mainly associated with transcriptionally silenced heterochromatin [40], indicative of perinuclear transcriptionally active compartments [37].
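The distance-related dependency of intra-chromosomal contacts can be quantified with a simple fit. The sketch below assumes, as is common in chromosome conformation analyses but not stated in the studies discussed here, that contact frequency falls off roughly as a power law of genomic distance. It estimates the exponent by least squares on log-transformed values. The function name and the synthetic counts are illustrative only.

```python
import math

def fit_power_law(distances_kb, contact_counts):
    """Fit count = a * distance**b by least squares in log-log space; return (a, b)."""
    xs = [math.log(d) for d in distances_kb]
    ys = [math.log(c) for c in contact_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of log(count) on log(distance).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic contact counts that decay exactly as 1/distance (not real data).
distances = [10, 20, 40, 80, 160]
counts = [1000 / d for d in distances]
scale, exponent = fit_power_law(distances, counts)
```

On real contact maps, locus pairs whose observed counts lie far above the fitted curve, such as the clustered var genes or rDNA loci described above, are the candidates for specific long-range interactions.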

Transcription itself is initiated through binding of the transcriptional machinery to promoter regions in the nucleus, resulting in the synthesis of pre-mRNA molecules, which, following extensive processing and nuclear export, leads to the accumulation of mature mRNAs in the parasite cytosol [41]. A recent study found evidence for stage-specific transcription initiation from distinct TSSs of otherwise identical transcriptional units, giving rise to developmentally regulated mRNA isoforms [42]. While most canonical eukaryotic transcription factors are absent from the Plasmodium genome [2], the ApiAP2 family of DNA-binding proteins comprises by far the largest group of transcription factors in malaria parasites [43]. A collection of ApiAP2 proteins is expressed throughout all stages of the IDC [44], while other ApiAP2 proteins are expressed outside the IDC [45–47]. ApiAP2s appear to be among the main drivers of developmental progress throughout most Plasmodium lifecycle stages and their disruption abolishes or greatly reduces parasite development [45, 46]. They bind in a sequence-specific fashion to motifs generally distributed upstream of open reading frames (ORFs) and individual AP2s may have widespread influence; PfAP2-O has been shown to bind upstream of >500 genes (roughly 10 % of the parasite ORFs), potentially influencing a wide range of cellular activities [48].
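A simple computational counterpart to this kind of target prediction is to scan the upstream region of each gene for a binding motif on both strands. The following sketch is illustrative only: the motif "TGCATG", the gene names, and the sequences are placeholders invented for the example and are not the binding site of any ApiAP2 protein. Binding inferred from sequence alone would still need experimental confirmation.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))

def genes_with_motif(upstream_seqs, motif):
    """Return sorted gene names whose upstream sequence contains the motif on either strand."""
    motif = motif.upper()
    motif_rc = reverse_complement(motif)
    hits = []
    for gene, seq in upstream_seqs.items():
        seq = seq.upper()
        # A match to the reverse complement means the motif lies on the opposite strand.
        if motif in seq or motif_rc in seq:
            hits.append(gene)
    return sorted(hits)

# Placeholder upstream sequences (hypothetical, far shorter than real promoters).
upstream = {
    "gene_1": "AATTTGCATGAATT",  # motif on the forward strand
    "gene_2": "TTTTCATGCATTTT",  # motif on the reverse strand
    "gene_3": "ATATATATATATAT",  # no motif
}
targets = genes_with_motif(upstream, "TGCATG")
fraction = len(targets) / len(upstream)
```

Applied genome-wide, the same fraction gives a rough estimate of how much of the ORF complement a single factor could influence, comparable to the roughly 10 % quoted above.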

Through forward genetic screens and comparative genomics, ApiAP2-G was discovered to function as a conserved master regulator of sexual commitment in Pf and Pb. ApiAP2-G binds to a conserved 6/8-mer nucleotide motif enriched upstream of gametocyte-specific genes and ap2-g itself, leading to an autoregulatory feedback loop [49, 50] (Fig. 3b). ApiAP2-G2, another ApiAP2 family member, acts downstream of ApiAP2-G during sexual development, functioning as a transcriptional repressor blocking expression of genes required for asexual development and influencing gametocyte sex ratios [50, 51]. During the asexual IDC, ap2-g displays characteristics of epigenetically silenced heterochromatin, such as H3K9me3 marks, binding to HP1 and localization to the nuclear periphery (reviewed in [52]) (Fig. 3b). However, the previously mentioned knockdowns of both PfHDA2 and HP1 resulted in increased gametocyte conversion, likely as a direct consequence of the loss of H3K9me3 marks and H3K9 hyperacetylation leading to ap2-g transcriptional activation [27, 29]. This opens the possibility of a bet-hedging mechanism for sexual commitment in Plasmodium, regulating stochastic, low-level activation of ap2-g sensitive to environmental stimuli, as has been shown for several blood-stage-expressed genes [52, 53]. PTMs such as lysine acetylation are not restricted to histones and a recent study has demonstrated that the “acetylome” impacts >1000 proteins and intriguingly is highly enriched in the ApiAP2 transcription factor family [54, 55], although the functional consequences of these PTMs have yet to be established.

Following their synthesis, eukaryotic mRNAs are processed and are finally translated by the ribosomal machinery. Translation has long been a focus of malaria research, not only because it represents a promising target for antimalarial drugs but also for its potential regulatory features [56]. The lack of correlation between transcript and protein level observed throughout the Plasmodium lifecycle has fueled researchers’ interest in post-transcriptional and translational control for decades [57]. Many features of post-transcriptional/translational control in malaria parasites are similar to the mechanisms found in other eukaryotes [41]. However, the advent of ribosome profiling [58] has enabled in-depth genome-wide analysis of the Plasmodium translatome. Throughout the IDC, transcription and translation are tightly coupled and only 8 % (approximately 300 transcripts) of the transcriptome was found to be translationally regulated [59]. These genes were found to be involved in merozoite egress and invasion, and while transcript level peaked during the late stages of the IDC, maximal translation was observed during the early ring stage. This observation resembles a general feature of gene expression in Plasmodium, whereby for a set of genes transcription and translation are uncoupled and mRNA translation occurs during a later developmental time point when compared with maximal transcriptional activity, most notably in female gametocytes [46, 60–64]. This is especially true for genes required for developmental progression and provides the parasite with the capability of rapid and timely protein synthesis without the need for preceding de novo mRNA synthesis. Recently, PfALBA1, a member of the DNA/RNA-binding Alba protein family, was postulated to act as master regulator during the Pf IDC, controlling translation of invasion-related transcripts (Fig. 3a) as well as regulating mRNA homeostasis of approximately 100 transcripts in blood-stage parasites [65].
In contrast to findings by Caro and colleagues [59], an earlier study using polysome profiling found a discrepancy between steady-state mRNA level and polysome-associated mRNAs among 30 % of genes (1280 transcripts) during the Pf IDC, indicative of translationally controlled gene expression [66]. Additionally, the results of this study, as well as the findings of others, suggest upstream ORF translation and stop codon read-through in Pf [67–69], but the genome-wide extent of such mechanisms in Plasmodium spp. remains controversial [59]. Hence, expansion of these studies to other parasite life stages, such as the gametocyte, where translational control is firmly established, would surely give further insights into the extent of translation regulation in Plasmodium.
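How such comparisons of mRNA abundance and ribosome occupancy can flag translationally regulated genes is sketched below. The approach shown, a log2 translational efficiency (ribosome footprint density divided by mRNA abundance) compared against the median across genes, is a generic one and is an assumption here rather than the method of the cited studies. The function names, the pseudocount, the threshold, and all values are hypothetical.

```python
import math
from statistics import median

def translational_efficiency(footprint_rpkm, mrna_rpkm, pseudocount=1.0):
    """Log2 ratio of ribosome footprint density to mRNA abundance."""
    return math.log2((footprint_rpkm + pseudocount) / (mrna_rpkm + pseudocount))

def flag_translationally_regulated(genes, threshold=1.0):
    """Return sorted gene names whose log2 TE differs from the median by more than threshold.

    genes maps a gene name to a (footprint_rpkm, mrna_rpkm) tuple.
    """
    te = {name: translational_efficiency(fp, mrna) for name, (fp, mrna) in genes.items()}
    centre = median(te.values())
    return sorted(name for name, value in te.items() if abs(value - centre) > threshold)

# Invented values: gene_C is abundantly transcribed but poorly ribosome-bound.
example = {
    "gene_A": (100.0, 100.0),
    "gene_B": (210.0, 200.0),
    "gene_C": (15.0, 255.0),
    "gene_D": (50.0, 55.0),
    "gene_E": (90.0, 80.0),
}
flagged = flag_translationally_regulated(example)
```

The proportion of genes flagged depends strongly on the threshold and on how occupancy is measured, which may be one reason why estimates such as 8 % and 30 % differ between studies.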

In addition to canonical protein-coding mRNAs, a vast number of genes encoding different ncRNAs have been identified within the Plasmodium genome in recent years, which are believed to exert a variety of regulatory functions (reviewed in [70]). Circular RNAs (circRNAs) are among the newest members of the still expanding catalogue of existing ncRNAs in Plasmodium [17]. Host microRNAs (miRNAs) have been shown to regulate parasite translation [71], and circRNAs therefore might act as a sponge for host miRNAs, a mechanism described in other organisms [72]. Recent studies have especially increased our knowledge of the role of ncRNAs in var gene regulation (discussed in the next section) but, nevertheless, the biological role of the vast majority of these ncRNA species remains unclear.

Immune evasion

In their attempts to occupy a diverse range of host environments, protozoan parasites of the Plasmodium genus have evolved a plethora of molecular mechanisms to evade the host adaptive immune response. The host immune response to Plasmodium infection is dependent upon both host and parasite genomics and the developmental stage and phenotype of the invading parasite [73–75]. In the best-studied example in Plasmodium, virulence of Pf is attributed largely to monoallelic expression of just one of approximately 60 var genes that encode variant copies of the surface antigen, Pf erythrocyte membrane protein 1 (PfEMP1). The ability to switch expression from one var gene to another enables the invading parasite to alternate between phenotypes of variable cytoadherent and immunogenic properties [76–78]. PfEMP1 proteins are expressed at parasite-induced knobs on the infected erythrocyte surface, which are electron-dense features comprising many parasite proteins anchored to the erythrocyte cytoskeleton. Failure to present PfEMP1 in such knob structures greatly reduces the ability of the infected erythrocyte to bind to its specific host receptor [79].

Pf var gene regulation is complex and includes mechanisms of gene regulation such as chromosomal organization and subnuclear compartmentalization [80, 81], endogenous var gene clustering and var promoter–intron pairing [82, 83], transcriptional gene silencing via exoribonuclease-mediated RNA degradation [84], histone variant exchange at var promoters [85, 86], the effect of trans antisense long non-coding RNAs (lncRNAs) [87], and the presence or absence of histone modifications and their associated histone-modifying enzymes [27, 29, 40, 87–92] (Fig. 3c). Interest in delineating these mechanisms has continued, and even grown, as more research in the post-genomic era has highlighted the important differential role of the 5′ upstream promoter families, which subdivide the var genes into five classes (upsA to upsE) that correlate closely with the severity of malaria infection in the human host [93–98]. Pf var gene promoters are also essential components of the gene silencing mechanism and monoallelic expression. The upsC var promoter in particular is necessary to maintain chromosome-internal var genes in their silenced state and recently has been proposed to do so through the interaction of cis-acting MEE2-like sequence motifs and MEE2-interacting factors to reinforce var gene transcriptional repression [75, 83].

Monoallelic var gene transcription is also associated with the presence of H3K9me3 repressive marks at silent var gene loci (Fig. 3c). This histone modification is predicted but not proven to be imposed by the HKMT PfSET3 and is associated with perinuclear repressive centers and the binding of PfHP1, stimulating heterochromatin formation [40, 89, 90, 92]. Conditional disruption of one of these essential proteins, HP1, disrupts singular var gene expression and dysregulates antigenic variation [29]. In addition, conditional knockdown of PfHDA2 has been shown to result in a dramatic loss of monoallelic var gene expression [27]. This implicated PfHDA2 as an upstream regulator of HP1 binding as it facilitates the establishment of the H3K9me3 mark. The indispensable role of the dynamic histone lysine methylation of Plasmodium chromatin by histone lysine demethylases (HKDMs) and HKMTs in controlling the transcription of nearly all var genes has also been demonstrated. Knockout of the Pf hkmt gene encoding SET2/SETvs (vs, variant-silencing) resulted in reduced presence of the repressive H3K36me3 mark at the TSSs and intronic promoters of all var gene subtypes (Fig. 3c). Loss of this SETvs-dependent histone modification resulted in the loss of monoallelic var gene expression and expression of the entire var repertoire [98]. Furthermore, SETvs can directly interact with the C-terminal domain of RNA polymerase II, with SETvs disruption resulting in a loss of binding to RNA polymerase II and var gene switching [99].

Pf upsA-type var gene expression is also regulated by PfRNaseII, a chromatin-associated exoribonuclease. An inverse relationship exists between transcript levels of PfRNaseII and upsA-type var genes, with an increase in the latter corresponding to incidences of severe malaria in infected patients [84]. PfRNaseII is proposed to control upsA-type var gene transcription by marking TSSs and intronic promoter regions, degrading potential full-length transcripts to produce short-lived cryptic RNA molecules that are then further degraded by the exosome immediately upon expression (Fig. 3c). Disruption of the pfrnaseII gene resulted in loss of this degradation and the generation of full-length upsA var gene transcripts and intron-derived antisense lncRNA. These data illustrate the relationship between PfRNaseII and the control of monoallelic var gene transcription and suggest a correlation between lncRNA and var gene activation in Pf [84]. The role of lncRNAs in Pf var gene activation was again investigated in a study by Amit-Avraham and colleagues [87], which demonstrated dose-dependent transcriptional activation of var genes by overexpression of their individual antisense lncRNA transcripts. Disruption of antisense lncRNA expression by peptide nucleic acids resulted in downregulation of active var gene transcripts and induced var gene switching. The exact mechanism by which antisense lncRNAs act to promote the active transcription of a var gene is unknown. It has been postulated that antisense var transcripts may recruit chromatin-modifying enzymes that in turn would affect gene accessibility for the Pf transcriptional machinery. Antisense var gene lncRNAs would also contain a complementary sequence to var gene intronic insulator-like pairing elements that bind specific nuclear-binding proteins, therefore blocking the silencing activity of pairing elements by hybridization [87, 100].

The Plasmodium helical interspersed subtelomeric protein (PHIST) family of genes, which is unique to Pf, has also been implicated in the regulation of immune evasion as a result of its ability to bind to the intracellular acidic terminal segment of PfEMP1. Conditional knockdown of the essential PHIST protein PfE1605w reduced the capability of the infected host erythrocyte to adhere to the CD36 endothelial receptor, an important virulence feature of Pf. This study highlighted the importance not only of var genes and their controlled expression but also of other genes that are associated with anchoring PfEMP1 to the erythrocyte surface and creating the Plasmodium cytoadherence complex [101].

The list of regulatory mechanisms underlying var gene monoallelic expression is vast and much more may still be discovered in this area. However, immune evasion in the Plasmodium genus is not confined to Pf or var gene regulation. Indeed, var gene expression is exclusive to Pf, with much still to be gleaned in the areas of immune evasion in human malaria parasites such as P. vivax, P. knowlesi, Plasmodium ovale and Plasmodium malariae [13, 102–105]. In addition, PfEMP1 is just one of a number of variant surface antigens (VSAs) known to be expressed at the host erythrocyte surface upon infection with Pf, although it is the best characterized. Pf-infected erythrocytes also express VSAs of the multi-copy gene families of the proteins repetitive interspersed family (RIFIN), subtelomeric variable open reading frame (STEVOR), and Pf Maurer’s cleft 2 transmembrane (PfMC-2TM) [106]. The roles of these protein families in antigenic variation and pathology are generally poorly defined but are being elucidated; for example, RIFINs are implicated in the severity of Pf malaria in African children with blood group A. This tendency toward increased malaria pathogenicity is a result of their expression at the surface of infected host erythrocytes, from which they bind uninfected erythrocytes (preferentially, blood group A) to form rosette structures and mediate binding to the host microvasculature [107]. Finally, the combined roles of HP1 and HDA2 in regulating both single var gene expression and the transcriptional regulator ApiAP2-G suggest that the two processes share epigenetic regulatory mechanisms and that Plasmodium immune evasion and transmission to new hosts are inextricably linked [27, 29].

Immune evasion is not restricted to blood-stage Plasmodium; when the parasite passes through the mosquito it must also combat a sophisticated innate immune system that is very effective in reducing the parasite burden experienced by the vector. A forward genetic screen and WGS were used to identify the key parasite factor, the surface protein PfS47 (found on the surface of the ookinete as it penetrates the mosquito midgut), that appears to interact with and suppress the vector innate immune system [108]. PfS47 is thought to suppress signaling through the c-Jun N-terminal kinase (JNK) pathway that is critical to an effective immune response [109]. WGS demonstrated that PfS47 has a distinct population structure linked to global distribution. PfS47 is rapidly evolving and is selected to achieve JNK suppression in diverse mosquito species, which is a key step in the adaptation of Pf to transmission in different vectors and thereby contributes to its broad global distribution [110].

Artemisinin resistance

The goals of MalariaGEN characterize a new approach to understanding parasite population biology. Through the generation and, these days more critically, the management and analysis of the colossal datasets that result from WGS of large numbers of samples, a well-organized study can draw meaningful conclusions. This was applied to perhaps the most serious threat to malaria control that has emerged in recent years: resistance to ART. Using these datasets in meta-analyses with clinical data describing the individual WGS-sequenced samples and outcomes of ART treatment allowed a path to be charted that associated SNPs with treatment features (such as delayed clearance) [111] and identified candidate genes [112]: in both studies a region of chromosome 13 was implicated (Fig. 3d). The precise gene encoding the protein KELCH13 was identified by a combination of “old-fashioned” selection of drug-resistant parasites in the laboratory followed by WGS and comparative genomics of the sensitive parental parasites and the progeny, as well as WGS of ART-resistant field isolates [113, 114]. The role of the kelch13 mutations in ART resistance was proven by direct genome engineering of kelch13 to generate resistant parasites [115, 116]. kelch13 SNPs have been used to map the alarmingly rapid spread of resistance throughout Southeast Asia [116] and it is clear that there is already significant but distinct kelch13 heterogeneity in African Pf strains, although there is no evidence of ART resistance [117–121]. However, in-depth analysis of Southeast Asian ART-resistant parasite genomes [122] revealed a complex array of background mutations (Fig. 3d) in a variety of genes (encoding ferredoxin (FD), apicoplast ribosomal protein S10 (ARPS10), multidrug resistance protein 2 (MDR2), and Pf chloroquine resistance transporter (CRT)) that are not yet described in African parasites, which might explain why ART resistance is not (yet) a threat to the use of ART in that continent [121].
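
The kind of SNP-to-phenotype association described above can be illustrated with a minimal sketch. It assumes a single candidate SNP and a binary outcome (delayed versus normal parasite clearance) and applies a one-sided Fisher's exact test to the resulting 2 × 2 table. The function name and all counts are hypothetical and are not taken from the cited studies, which used genome-wide analyses rather than a single-SNP test.

```python
from math import comb


def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows: isolates carrying the mutant allele / the wild-type allele.
    Columns: delayed clearance / normal clearance.
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    mutant_total = a + b      # isolates carrying the mutant allele
    delayed_total = a + c     # isolates showing delayed clearance
    n = a + b + c + d         # all isolates
    denom = comb(n, delayed_total)
    upper = min(mutant_total, delayed_total)
    p_value = 0.0
    for k in range(a, upper + 1):
        # Probability that exactly k mutant isolates fall in the delayed group.
        p_value += comb(mutant_total, k) * comb(n - mutant_total, delayed_total - k) / denom
    return p_value


# Hypothetical counts, for illustration only (not data from any study).
p = fisher_exact_one_sided(a=40, b=10, c=5, d=45)
print(f"one-sided p-value: {p:.3g}")
```

In a genome-wide setting a test of this kind would be repeated for every SNP, so a correction for multiple testing would be needed before any single p-value could be interpreted.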

A further puzzle was the large number of independent SNPs that seemed capable of mediating ART resistance; typically, drug resistance is generated by one or a small number of SNPs focused on either altering the target binding site for the drug or preventing drug access to a binding site buried in the target structure. KELCH proteins are propeller proteins with an iterated structural motif that serves as a platform for the assembly of multi-protein complexes. In addition, KELCH13 has a BTB/POZ domain that might be involved in homodimerization, E3 ubiquitin ligase binding, and transcriptional repression (reviewed in [123]). It has been suggested that ART-resistance-associated kelch13 SNPs might cause a degree of reduced binding of Pf phosphatidylinositol-3-kinase (PI3K), which in turn results in its reduced ubiquitination and, consequently, reduced degradation of PI3K (Fig. 3d). Elevated levels of PI3K generate increased amounts of its lipid product phosphatidylinositol-3-phosphate (PI3P), which then changes the physiological state of the parasite cell through signaling in as yet unknown pathways [124] but through a mechanism predicated on the proposed abundance of PI3P in the lumen of the endoplasmic reticulum and its proposed role in protein export beyond the parasite vacuole within the host cell [125]. However, aspects of this view have been challenged [126] and further studies are clearly required to resolve the possible role of PI3K signaling in ART resistance. It will be of interest to see if PI3K signaling impacts upon the unfolded protein response implicated in ART resistance using population transcriptomics [127]. The WGS data and two proteomics studies [128, 129] that demonstrate the wide variety of proteins from different cellular compartments of the target parasite that interact with activated ART together suggest that ART resistance is a pleiotropic phenomenon [123]. Therefore, other interrogations, such as metabolomics (see next section), might also be needed to gain functional insights into ART’s mode of action.

Translational implications for malaria control

Antimalarials

WGS has been instrumental in identifying the cellular target of novel Pf antimalarials as part of the drug discovery pipeline and in following the in vitro selection of resistant parasite lines and validation of observed genomic changes by reverse genetics as described for ART above. This approach has proved highly successful for spiroindolones [130], resulting in the identification of the target of NITD609 (also known as KAE609 or cipargamin) as the P-type ATPase PfATPase4. Furthermore, the translation elongation factor eEF2 has been identified as the target of the 2,6-disubstituted quinoline-4-carboxamide scaffold derivative DDD107498 [131]. WGS is not the only post-genome approach that is useful for determining modes of drug action; metabolomics has a similar potential for analyzing the metabolic changes produced in response to drug exposure and has been utilized in antibiotic [132] and anti-protozoan drug [133] investigations. A metabolomics-based approach also has the advantages that parasite lines resistant to the drug need not be generated and that the activity of pleiotropically acting drugs (such as ART) is directly observed rather than imputed from genomes of resistant parasites.
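
The comparative step in this pipeline (WGS of the sensitive parent and of independently selected resistant lines, followed by a search for shared changes) can be sketched as a simple set operation. This is an illustrative simplification under stated assumptions: variants are represented as (gene, change) pairs that have already been called, and the function name, gene names, and variants are all hypothetical.

```python
def candidate_genes(parent_variants, resistant_clones):
    """Return genes that gained a variant in every independent resistant clone.

    parent_variants: iterable of (gene, change) pairs found in the sensitive parent.
    resistant_clones: list of iterables of (gene, change) pairs, one per clone.
    """
    parent = set(parent_variants)
    genes_per_clone = []
    for variants in resistant_clones:
        # Keep only variants that arose during drug selection.
        acquired = set(variants) - parent
        genes_per_clone.append({gene for gene, _change in acquired})
    if not genes_per_clone:
        return set()
    # A gene mutated in all independent clones is a strong target candidate,
    # even when the exact change differs between clones.
    return set.intersection(*genes_per_clone)


# Hypothetical variant calls, for illustration only.
parent = [("geneA", "S10N")]
clones = [
    [("geneA", "S10N"), ("geneB", "G50R"), ("geneC", "T9I")],
    [("geneB", "P80L"), ("geneD", "A1V")],
]
print(candidate_genes(parent, clones))
```

Comparing at the gene level rather than the variant level is a deliberate choice here, because independent selections often hit the same gene with different mutations. Any candidate found this way would still need validation by reverse genetics.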

Vaccines

Post-genomic approaches have also identified promising new Pf vaccine candidates. For example, Pf reticulocyte-binding protein homologue 5 (RH5) binds to the human red cell surface receptor protein basigin, an interaction that is essential for erythrocyte invasion by Pf [134]. Recent WGS studies have shown that both the host and parasite proteins are highly conserved, that antibodies to RH5 block merozoite invasion of erythrocytes [135, 136], and that basigin itself is druggable by recombinant antibodies [137]. Although the RH5–basigin interaction offers great promise, the challenges for vaccine development remain considerable and many promising candidates have fallen or will fall by the wayside due to an inability to formulate them to deliver effective vaccination, massive candidate gene sequence variability, or functional non-essentiality of the candidate. WGS will help identify non- or minimally variant candidates and should prove useful in monitoring the effect of vaccination and the analysis of “breakthrough” parasites (those developing in vaccinated individuals), as described in the next section. Effective subunit vaccines will be an invaluable additional approach to vaccination, supplementing other approaches such as the use of the promising but technologically challenging attenuated whole-parasite (for example, sporozoite) vaccine [138].

Surveillance

The identification of genome signatures of resistance through WGS in the laboratory and increasingly through large-scale genomic epidemiology provides a powerful tool to monitor the emergence of resistance in Plasmodium populations under selective pressure due to administration of both drugs and vaccines. In the case of drugs whose targets have been identified in the laboratory, specific, simple PCR-based assays can be devised. WGS of field parasites under drug pressure is still desirable, however, as alternative resistance mechanisms might emerge that would be missed by targeted assays and, with sufficient depth of sampling, new signatures of resistance would be identifiable from the sequence data. Similar surveillance of parasites that emerge post-vaccination might also be informative. An important analysis of the clinical trial of the RTS,S/AS01 malaria vaccine compared the strain-specific sequence of the gene encoding the circumsporozoite (CS) protein that comprised the vaccine with the CS gene sequences of strains in the infections actually encountered by immunized individuals (between 5 and 17 months of age) [139]. This study demonstrated that homologous protection was greater than protection against heterologous strains and that a cause of failure to protect was simply that the CS protein carried by the infecting parasites did not match that of the vaccine and so perhaps a protective effect was less likely [139]. Therefore, WGS has the power to guide vaccine design based upon the outcomes of trials.
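
The comparison of homologous and heterologous protection can be made concrete with a minimal sketch. It assumes that infections in each trial arm have already been classified by sequencing as matching or not matching the vaccine CS sequence, and it uses a simple risk-ratio definition of efficacy, which is a simplification of a full trial analysis. The function name and all counts are hypothetical and are not taken from the cited trial.

```python
def vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_control, n_control):
    """Vaccine efficacy as 1 minus the risk ratio (vaccinated / control)."""
    risk_vaccinated = cases_vaccinated / n_vaccinated
    risk_control = cases_control / n_control
    return 1.0 - risk_vaccinated / risk_control


# Hypothetical case counts per 1000 participants in each arm,
# split by whether the infecting parasite matched the vaccine CS sequence.
matched = vaccine_efficacy(cases_vaccinated=20, n_vaccinated=1000,
                           cases_control=40, n_control=1000)
mismatched = vaccine_efficacy(cases_vaccinated=300, n_vaccinated=1000,
                              cases_control=450, n_control=1000)
print(f"efficacy against matched strains:    {matched:.2f}")
print(f"efficacy against mismatched strains: {mismatched:.2f}")
```

A higher value for matched than for mismatched infections would be the pattern expected if protection is partly strain specific.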

Gene editing

A new era of genetic engineering has dawned with the discovery and development of the bacterial guide RNA template-targeted clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 nucleases as tools for the accurate editing of genomes. The technology has been successfully adapted to many species, including Plasmodium [140], Anopheles [141, 142], and humans (discussed in [143]). Currently, applications of CRISPR-Cas9 to Plasmodium manipulation are restricted to reverse genetic investigations of gene function. However, given the concept of whole (pre-erythrocytic) parasite vaccines [144, 145], CRISPR-Cas9 offers an obvious route towards the generation of an immunogenic, non-pathogenic parasite that might be suitably safe to administer to humans as a vaccination strategy. Clearly, the engineering of human genomes at any stage of gestation is fraught with ethical considerations [146] and it is inconceivable that this will be applied to the improvement of human resistance to malaria in the foreseeable future. Conversely, although subject to similar ethical and ecological debate, significant conceptual advances towards the generation of CRISPR-Cas9-engineered Anopheles mosquitoes have been quickly achieved. Through harnessing the concept of gene drive, two independent teams have reported the generation of either engineered Anopheles stephensi (a major Indian vector of malaria) that is resistant to malaria [141] or sterile female Ag [142]. Again, owing to ecological considerations, it is unlikely that such engineered mosquitoes, although clearly feasible, will be released into the wild any time soon [147].

Conclusions and future directions

Despite the progress summarized here, the fundamental requirements of malaria research in any era remain the same; namely, new drugs to replace those that become ineffective, vaccines that work, and the means to administer them effectively. Genomics, post-genomic technologies, and associated computing developments have revolutionized investigations into the biology of the malaria parasite and the search for therapeutics or intervention measures. Significant progress has been made on many fronts, including candidate drug and vaccine discovery, parasite drug resistance mechanisms, host–parasite–vector interactions and parasite biology, and mechanisms of human resistance to malaria. Also, new concepts of combatting malaria via engineered mosquito populations have been introduced through the advent of novel genome-editing approaches such as CRISPR-Cas9.

We can anticipate that WGS will continue to improve in terms of both cost and quality, making sequencing of every desirable Pf isolate feasible. This would enable more detailed studies of population structure and dynamics, allowing the tracking of gene flow and genotype success that might even resolve at the village level and, further, potentially almost in real time. However, this will only happen if data storage, access, and computing technologies keep pace. Where Pf WGS studies have gone, P. vivax research will follow, and recent studies have revealed signatures of drug selection superimposed upon a far more complex (global, regional, and even within a single infection) population structure than that of Pf [148, 149]. Single-cell RNA sequencing will significantly improve our understanding of antigenic variation and variant and sex-specific gene expression.

More immediately, an important need is for surveillance, particularly in Africa, to look for kelch13 mutations and genotypes associated with ART resistance and a pan-African network is in place to monitor for this and collect samples [150]. Genomics will continue to be used in novel ways as well, for example, in studies of the outcomes of human interventions such as drug treatment and vaccination.

New fields of endeavor are also emerging that will certainly prove fruitful in the years to come. Lipidomics is a nascent discipline that will no doubt reveal insights into membrane composition and organization [151] and might also open avenues to therapy. PTMs such as palmitoylation give proteins the means to conditionally interact with membranes and Plasmodium makes extensive use of protein palmitoylation that should influence a range of important parasite biological activities, such as cytoadherence and drug resistance [152].

Although the power of genomics approaches is quite clear, direct biological investigations are frequently required to confirm or refute the findings that genomics might imply. The numerous examples given here indicate that although genomic analyses often generate associations and degrees of confidence about their conclusions, unequivocal confirmation is provided by genetic engineering (of parasites and their vectors at least). Genetic screens are powerful, often unbiased approaches to discover gene function. The recent development of the PlasmoGEM resource coupled with high-efficiency transfection and barcoded vectors permits genome-scale reverse genetics screens to be deployed that will undoubtedly reveal information about parasite-specific genes and Plasmodium biology [153]. Finally, many of the genes encoded by parasite, host, and vector genomes have unknown functions, the details of which are slowly emerging as technologies and assays improve. The staggering complexity of organismal biology and the interactions between parasite, host, and vector will continue to amaze but equally will offer hope for new and improved therapies.