Introduction

The obligate intracellular bacterium Coxiella burnetii is the causative agent of Q fever in humans [1,2,3]. The Centers for Disease Control and Prevention classified this bacterium as a category B agent because of its low infectious dose, environmental stability, and aerosolized spread [2, 4, 5]. Humans infected with C. burnetii may present with a range of outcomes, from asymptomatic infection to acute disease and, further, to chronic disease [3, 6]. Acute disease is typically characterized by flu-like symptoms, consisting of fever, fatigue, and chills [6]. Individuals who progress to chronic disease most commonly have endocarditis with culture-negative blood, although hepatitis and chronic fatigue syndrome have also been described. C. burnetii is endemic worldwide, except for New Zealand, and most human outbreaks are attributed to domestic agricultural animals acting as reservoirs of the bacterium [3, 6, 7]. Cows, sheep, and goats are the main animals of interest, and these animals also contract disease when exposed to C. burnetii [1, 5, 6, 8]. Coxiellosis in the small ruminant species, goats and sheep, tends to present with late-term abortions [8, 9]. While cattle may present with late-term abortions, they are more frequently affected by a decrease in calf birthweight or subclinical mastitis [8]. C. burnetii is found in large numbers within the placenta of aborted neonates, but the bacterium has also been detected in the urine, milk, uterine fluid, vaginal mucus, and feces of parent animals [7, 8, 10, 11].

The most widely accepted vaccines against Q fever, or coxiellosis, are Q-vax and Coxevac, which contain either the Henzerling or Nine Mile Phase I (RSA 493) isolate of C. burnetii fixed with formalin [1, 7, 10, 12,13,14]. These vaccines are not available within the United States [1, 13]. Q-vax is used for human vaccination in Australia and is known to cause adverse side effects in individuals who have had previous exposure to the bacterium [12, 13]. In contrast, Coxevac is used in Europe for vaccination of agricultural species, and it was deployed in an attempt to contain the 2007–2010 Netherlands outbreak [7, 10]. Both vaccines require the producer to culture large amounts of a category B bacterium, a process that is both costly and hazardous [10, 12]. Therefore, investigation into new vaccines has been initiated through isolation of surface antigens or identification of seroreactive proteins [15, 16]. While surface-isolated proteins can confer protection, this approach does not eliminate the cost or safety concerns during product generation.

A clear need exists for low cost, broadly applicable vaccines and especially those that can be produced in safer biosafety level 2 conditions. Subunit vaccines can meet this need, and a new generation of work on C. burnetii vaccines has begun based on specific epitope definition. Multiple studies have identified small numbers of epitopes used in human or mouse immune responses, and a few studies have produced subunit vaccines [13, 14, 17,18,19]. The general conclusion of such work has been that multiple epitopes will be needed to achieve protective immunity [13, 19]. The next challenge is to achieve comprehensive, genome-wide evaluation of potential key epitopes coupled with optimization to achieve broad protection across the multiple host species of this zoonotic pathogen.

Bioinformatic tools have been developed to more quickly and cost effectively assess proteins as host antigens [20,21,22,23]. This strategy is known as reverse vaccination development, wherein in silico methods cut down the number of initial screening experiments required to identify putative stimulants of the adaptive immune response [20, 24, 25]. In silico techniques assess the antigenic ability of peptides by modeling their potential immune system interactions as T- or B-cell epitopes [20, 22]. Identification of T-cell epitopes typically evaluates the ability of peptides to be loaded into major histocompatibility complexes, either MHCI or MHCII, wherein both play an important role in the adaptive immune response [21, 22]. MHCI molecules are present on all nucleated host cells and define whether a host cell has been compromised by an invading pathogen [26]. On the other hand, MHCII molecules decorate antigen presenting cells, which function to aid in the initiation of an organized adaptive immune response [21, 22, 27].

T-cell epitope predictors have been used successfully for rapidly mutating viruses, like HIV and influenza, and for fastidious bacteria [1, 20]. More specifically, the Brucella melitensis protein Omp31 has been studied extensively during multi-subunit vaccine development against this bacterial agent [28,29,30]. Research into peptide recognition by human monoclonal antibodies isolated peptide fragments similar to those returned by B-cell epitope bioinformatic predictors [28, 29]. Additionally, random peptide generation from the Omp31 amino acid sequence allowed for IFN-γ production by T-cells in sheep, and the major epitope of interest was later determined bioinformatically to be a T-cell epitope in humans [29, 30].

For C. burnetii, addition of either CD4+ or CD8+ T lymphocytes alone to infected SCID mice was sufficient to achieve immune control of the bacterium [31]. C. burnetii clearance by macrophages has been shown to rely on IFN-γ production by T-cells during the adaptive immune response, which requires accurate loading of antigenic peptides into MHCII molecules for T-cell presentation [13, 15, 21, 32]. Accompanying these data are knockout mouse models that support the importance of CD8+ T-cells in controlling bacterial replication and host tissue pathology, suggesting that MHCI peptide loading also plays an important role during C. burnetii infection [27, 31]. Furthermore, it is presumed that cytotoxic T-cells acting on infected host cells degrade availability of the intracellular niche required by this bacterium [27]. While B-cell depletion suggests a role in tissue pathology during C. burnetii infection, the inability to link humoral immune responses to restricted bacterial replication suggests that B-cells are not a major player in the control of disease [31, 33]. Thus, this work will focus on identification of T-cell epitopes supporting these beneficial immune responses. Many previous works investigating C. burnetii epitopes have focused on known type IV secretion system (T4SS) effectors or proteins eliciting an antibody response [14, 17, 19]. The following work will provide the first comprehensive analysis of C. burnetii T-cell epitopes on a proteome-wide scale. This will also be one of the few applications to investigate a bacterial proteome, since most prior work has focused on smaller viral proteomes [34]. Furthermore, we will incorporate data from a range of C. burnetii isolates to identify conserved epitopes with broad utility and leverage predictions from human, mouse, and ruminant hosts to facilitate development of optimally useful vaccines for this zoonotic pathogen.

Results

Conserved Coxiella burnetii proteome

C. burnetii isolates are genetically diverse: they secrete different type IV secretion system effectors, contain antigenic variation, and form a plethora of genomic groups based on multiple-locus variable-number tandem repeat analysis (MLVA) [6, 16, 35,36,37]. For this reason, a proteome-wide comparison between Coxiella isolates was completed to ensure pursuit of epitopes within conserved proteins. Nine Coxiella burnetii isolates were referenced against Nine Mile Phase I (RSA 493) during proteome-wide comparison. Each strain is listed in Table 1 with its genomic grouping, tissue of isolation, characteristic of interest, and human virulence, if known. Two genomic group IV isolates were chosen based on the observation that this genomic group contains the highest amount of genomic variance between contained isolates [37].

Table 1 C. burnetii isolates chosen for proteome-wide comparison

The tested isolate with the highest percent identity to Nine Mile Phase I (RSA 493) is Ohio 314 (RSA 270) (Fig. 1). This is expected as both isolates belong to genomic group I, indicated by Hemsley et al. [37]. The isolates demonstrating the lowest percent identity compared to Nine Mile Phase I (RSA 493) are Dugway 5J108-111, MSU Goat Q177, Schperling, and CbuG_Q212. The prior strains come from genomic groups IV to VI and represent more divergent isolates as compared to Ohio 314 (RSA 270). Analysis of the overall number of absent or low conservation proteins compared to Nine Mile Phase I (RSA 493) revealed variation between C. burnetii isolates (Table 2). In agreement with the pictorial representation of the proteome-wide comparison, less related genomic groups trended towards an increase in the number of absent and unconserved proteins. One exception to this trend was genomic group II-b isolate Z3055, which was missing 201 proteins when compared to Nine Mile Phase I (RSA 493), similar to genomic groups IV-VI. Previous examination of Z3055 has demonstrated that this isolate has an increase in the number of non-synonymous mutations, insertions, and deletions [38, 41].

Fig. 1

Proteome-wide comparison using Nine Mile Phase I (RSA 493) as a reference strain. The outermost strain is Nine Mile Phase I (RSA 493) and the remainder of the strains moving inward are as follows: CbuG_Q212, Z3055, 701CbB1, Henzerling, Q545, Ohio 314 (RSA 270), Dugway 5J108-111, Schperling, and MSU Goat Q177. The percent identity is indicated by color, where purple-blue is ~ 100–99% identity, green-yellow is ~ 98–70% identity, orange is ~ 69–30% identity, and red is ~ 29–0% identity. Image provided as output from PATRIC database [57]

Table 2 Numbers of poorly-conserved proteins between Nine Mile Phase I (RSA 493) and isolates of interest

A total of 352 proteins were removed upon the basis that the Nine Mile Phase I (RSA 493) proteome lacked a homolog in one of the nine isolates aligned. These predominantly consisted of hypothetical proteins and transposases as opposed to better studied proteins. Overall, proteome-wide comparison between C. burnetii isolates and Nine Mile Phase I (RSA 493) resulted in the identification of 1,413 conserved proteins.

Determination of host homologs in Coxiella burnetii

During epitope identification and future vaccine generation, it is necessary to avoid sensitizing the host's immune system against itself. Therefore, the resultant protein list was queried using BLASTp analysis against the host species of interest (cow, sheep, goat, and human) and the murine disease model for C. burnetii. BlastGrabber analysis determined that 391 of 1,413 C. burnetii conserved proteins shared homology with species of interest [45]. Thus, the final list of C. burnetii proteins for further analysis consisted of 1,022 proteins; an overview of the protein selection process can be seen in Fig. 2 (Additional File 1).

Fig. 2

Data Generation and Analysis Flowchart. Steps of data generation are highlighted in larger text. Programs used and data refinement measures are defined in smaller text below

Human and Murine MHCII Epitopes Present in C. burnetii

Once a list of conserved C. burnetii proteins lacking host homology had been generated, NetMHCIIpan 4.0 was used to define MHCII epitopes. While every murine allele was tested, there was an abundance of known human alleles. To reduce the number of human alleles, allelic frequency, geographical abundance, and phylogenetic distance were considered (Methods and Additional file 2A/B). In the end, 206 human allelic pairings were chosen to represent common alleles within major clades for MHCII epitope inquiry. Proteome-wide analysis of program-derived 15mer peptides returned a total of 293,520 peptides tested. Of these, 67,528 peptides did not bind any of the human alleles and 184,615 peptides did not bind any of the murine alleles. After screening previously identified epitopes to harmonize quality control metrics (Additional files 3 and 4), we found an average of 186 (90%) bound allelic pairings, or strong interaction with 93 (45%) allelic pairings, during human analysis. On the other hand, the comparison between the datasets for murine analysis delineated an average of 8 (100%) bound alleles or 5 (65%) alleles with strong peptide interaction. Use of these values to filter the output data returned 1,217 and 4,072 MHCII epitopes for human and mouse, respectively (Additional file 5). A composite list highlighting MHCII epitopes recognized by both species may be found in Additional file 6, and Fig. 2 summarizes the generation of the composite list. Epitopes that were less than seven amino acids apart were treated as one epitope, and the position with the highest human peptide:allele interaction value was retained.
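The filtering and consolidation steps described above can be sketched in Python as follows. This is a minimal illustration rather than the pipeline used in this work: the `Epitope` record, the function names, and the example values are hypothetical, and the merging rule assumes that "less than seven amino acids apart" refers to the distance between the start positions of consecutive peptides within the same protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epitope:
    protein: str   # GenBank ID of the source protein (hypothetical values below)
    start: int     # 1-based start position of the peptide in the protein
    n_bound: int   # number of alleles bound (weak or strong)
    n_strong: int  # number of alleles bound strongly


def passes_filter(ep, n_alleles, min_bound_frac=0.90, min_strong_frac=0.45):
    """Keep a peptide that binds >= 90% of alleles or strongly binds >= 45%."""
    return (ep.n_bound >= min_bound_frac * n_alleles
            or ep.n_strong >= min_strong_frac * n_alleles)


def merge_nearby(epitopes, max_gap=7):
    """Collapse peptides whose starts are < max_gap apart; keep the best binder.

    Assumption: clusters are built by chaining consecutive start positions
    within one protein, and "best" means the highest n_bound.
    """
    kept, cluster = [], []
    for ep in sorted(epitopes, key=lambda e: (e.protein, e.start)):
        new_cluster = cluster and (
            ep.protein != cluster[-1].protein
            or ep.start - cluster[-1].start >= max_gap
        )
        if new_cluster:
            kept.append(max(cluster, key=lambda e: e.n_bound))
            cluster = []
        cluster.append(ep)
    if cluster:
        kept.append(max(cluster, key=lambda e: e.n_bound))
    return kept


# Hypothetical peptides scored against 206 human allelic pairings.
peptides = [
    Epitope("P1", 9, 206, 100),
    Epitope("P1", 12, 190, 95),
    Epitope("P1", 40, 100, 95),
    Epitope("P2", 5, 50, 10),
]
filtered = [ep for ep in peptides if passes_filter(ep, n_alleles=206)]
merged = merge_nearby(filtered)
```

With these example values, the two overlapping P1 peptides collapse into the one starting at position 9, and the P2 peptide is dropped by the filter.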

Table 3 Human MHCII epitopes with presentation by an exceptional range of host alleles

Overall, there were 453 peptides, corresponding to 338 total proteins, determined to bind a high number of human and murine alleles or interact with many of the tested alleles strongly. Peptides within this data set that bound to 100% of the tested alleles or proteins that contained greater than or equal to 3 epitopes were isolated to further consolidate the data. Ten peptides bound all 206 human alleles (Table 3). A total of 347 peptides bound all 8 murine alleles (Additional file 7). This is not surprising considering the initial data examination filtered the murine output by focusing on peptides that bound 100% of the alleles analyzed. Marked epitopes within Additional file 7 represent peptides that were one to seven amino acids removed from the epitope observed in Additional file 6; where human peptides with higher binding events were kept during discrepancy in Additional file 6, Additional file 7 retained epitopes that had higher numbers of peptide:allele binding events when considering murine alleles. Of the ten peptides that bound every human allelic pair tested, only one, 9-DKEIRAISDYVVNHK-23 of AAO90441.1 (prpD), did not bind all eight murine alleles analyzed.

Table 4 MHCII epitope-dense proteins

Evaluation for epitope-dense proteins consisted of data consolidation through isolation of proteins containing a high number of epitopes [24, 46]. Analysis of the 338 proteins with high-scoring MHCII epitopes determined that 85 proteins had more than one epitope present. Restricting the list to proteins with three or more epitopes shortened it to 20 proteins (Table 4). Notably, three epitope-dense proteins also had epitopes that bound every human and murine allele tested; these were AAO89704.2 (ftsA), AAO90965.2, and AAO91357.1 (parC). Furthermore, AAO90965.2 and AAO91357.1 (parC) contained the highest number of epitopes per protein, with 5 epitopes each.
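Identifying epitope-dense proteins amounts to counting epitopes per protein and applying a cut-off. A minimal Python sketch is given below; the function name and the protein labels are hypothetical placeholders, and the cut-off of three epitopes mirrors the MHCII threshold stated above.

```python
from collections import Counter


def epitope_dense(epitope_proteins, min_epitopes=3):
    """Return {protein: epitope count} for proteins meeting the cut-off.

    epitope_proteins holds one entry per epitope, naming its source protein.
    """
    counts = Counter(epitope_proteins)
    return {p: n for p, n in counts.items() if n >= min_epitopes}


# Hypothetical input: one protein label per retained epitope.
labels = ["protA"] * 5 + ["protB"] * 3 + ["protC"] * 2
dense = epitope_dense(labels, min_epitopes=3)
```

The same function can be reused with a different `min_epitopes` value when a stricter or looser definition of epitope density is applied.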

Human, murine, and bovine MHCI epitopes

It has become increasingly evident that CD8+ T-cells play as important a role during resolution of C. burnetii infection as CD4+ T-cells [27, 31]. While MHCII epitope prediction allows determination of antigenic peptides for CD4+ T-cells, there are also MHCI epitope prediction programs available that can help identify antigenic peptides specific for CD8+ T-cell recognition [20, 21, 23]. One such program is NetMHCpan 4.1, which has recently been re-trained in its ability to recognize bovine MHCI epitopes, thereby allowing study of another host species of interest [47]. The same list of conserved C. burnetii proteins without host similarity was tested against human, mouse, and bovine MHCI alleles. Similar to NetMHCIIpan 4.0, NetMHCpan 4.1 has a large number of human alleles available for testing. Therefore, phylogenetic trees and geographical frequency of alleles were used to reduce the total number of human alleles run (Methods and Additional file 2C/D), and a total of 82 human alleles were examined during NetMHCpan 4.1 analysis. In addition, we tested all 8 murine alleles and all 105 bovine alleles present on the server.

NetMHCpan 4.1 generates 8-, 9-, 10-, and 11-mer peptides during allele binding assessment; in total, 1,196,564 peptides were generated and tested for their ability to interact with human, murine, and bovine alleles. The number of peptides that did not bind any alleles varied per species: 783,576; 1,033,923; and 842,516 for human, murine, and bovine, respectively. MHCI epitopes have been less widely studied and are therefore less represented in Additional file 4. Accordingly, there were fewer epitopes to aid in determining where the output cut-off values would reside for data filtration. Comparison of these previous epitopes with the present data output determined an average of 51 (62%) bound alleles or a strong interaction with 18 (22%) alleles. While this allowed for a relatively stringent cut-off for the number of peptides binding alleles, the output list increased two- to four-fold when peptides that interacted strongly with twenty percent of alleles were included. For this reason, the quantity of alleles strongly bound was restricted to the lower value, 45% of alleles, from MHCII analysis. In examining peptides that bound either 60% of alleles tested or 45% of alleles strongly, there were 1,367 human peptides, 5,355 murine peptides, and 4,438 bovine peptides returned (Additional file 8). As before, the output was searched for duplicate GenBank IDs and positions. A number of returned peptides were only present in murine and bovine analyses; manual annotation therefore allowed for identification of plausible epitopes in all three species tested (Additional file 9).
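The peptide generation step can be pictured as a sliding window over each protein sequence. The Python sketch below only illustrates the idea and is not the server's implementation; the function name is hypothetical, and the 15-residue prpD peptide reported earlier is used purely as a short stand-in for a full protein sequence.

```python
def kmers(sequence, lengths=(8, 9, 10, 11)):
    """Yield (1-based start, peptide) for every window of each length."""
    for k in lengths:
        # A sequence of length L contains L - k + 1 windows of length k.
        for i in range(len(sequence) - k + 1):
            yield i + 1, sequence[i:i + k]


# Stand-in sequence: the 15-residue peptide reported for AAO90441.1 (prpD).
seq = "DKEIRAISDYVVNHK"
peptides = list(kmers(seq))
n_peptides = len(peptides)  # 8 + 7 + 6 + 5 windows for a 15-residue input
```

Because the window count grows with both protein length and the number of peptide lengths considered, a full proteome readily yields more than a million candidate peptides.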

Table 5 Human and bovine MHCI epitopes with presentation by an exceptional range of host alleles
Table 6 Epitope dense proteins during MHCI epitope analysis

Data annotation to isolate epitopes represented in human, murine, and bovine species returned 777 MHCI epitopes within 489 different proteins. The data were further evaluated by looking for peptides binding a high number of alleles or for epitope-dense proteins. In contrast to the MHCII epitope data, no peptides bound all the bovine or human alleles tested. In order to analyze peptides that bound a high number of alleles tested, the cut-off value was lowered to 98% of alleles bound. This returned 17 peptides binding 103 alleles in cattle and 171 peptides binding 8 alleles in the mouse (Table 5 and Additional file 10). This new definition of high allelic binding still returned no peptides within the human analysis. The stringency was therefore further lowered to look at peptides that interacted with 90% of the human alleles tested, which led to the identification of 3 human peptides (Table 5). Table 5 shows that highly bound peptides with the most extreme scores do not overlap between the human and bovine species. In comparing human peptides that show exceptional binding to those peptides binding many alleles in the murine species, there is only one coinciding protein, AAO91456. Within this shared murine and human protein, the peptide is located at amino acid 54 for human and 261 for the mouse. In contrast, the bovine highly bound peptides are predominantly identical to those found within the murine data; only three proteins, AAO89868.2, AAO89977.1, and AAO90780.1, do not coincide. Of these, AAO89868.2 and AAO90780.1 are not represented within the murine data, and AAO89977.1 has an epitope present in an alternate position.

Table 7 MHCI epitopes that overlapped or are partially contained within MHCII epitopes

In studying MHCI epitopes for epitope-dense proteins, we found a higher maximum number of epitopes per protein (7 in AAO91182.1) than the maximum of 5 observed for MHCII epitopes (Table 6). There were 28 proteins classified as epitope dense when assessing the MHCI epitope data for proteins with four or more epitopes. When these epitope-dense proteins were compared with the proteins containing epitopes with high allelic coverage (Table 5 and Additional file 10), one was present in the human analysis, twenty-one were present in the mouse data, and two were present in the bovine analysis. Human analysis identified CBU_1967, whereas cattle analysis contained proteins CBU_0425 and CBU_1686. The epitope-dense proteins that were missing in the murine high allelic output were CBU_0685, CBU_1226, CBU_1228 (qseC), CBU_1242, CBU_1489 (lpxH), CBU_1928, and CBU_1978 (ostA).

Consolidation of epitopes or proteins from MHCI and MHCII data

Assessment of the C. burnetii proteome for both MHCI and MHCII epitopes enables identification of multi-use epitopes and proteins. There were 31 epitopes that had overlapping use by MHCI and MHCII (Table 7). Of these epitopes, only one has been previously studied and is present in Additional file 4; this is Com1 (CBU_1910) [9, 13, 14, 17,18,19]. Other notable aspects were that some of the epitopes constituted a complete overlap whereas others were mildly overlapped. In total, eleven of the thirty-one epitopes completely overlapped between identified MHCI and MHCII epitopes. Furthermore, Inmembrane predicted that approximately fifty percent of the epitopes were cytoplasmic and that the remaining fifty percent were in some way associated with the bacterial membrane.
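The distinction between complete and partial overlap can be expressed as a comparison of peptide coordinates within the same protein. The sketch below is an illustrative assumption about how such a classification could be made, not the procedure used to build Table 7; the function name and the example coordinates are hypothetical, and coordinates are treated as 1-based and inclusive.

```python
def classify_overlap(a_start, a_end, b_start, b_end):
    """Classify two peptides on one protein as 'complete', 'partial', or 'none'.

    'complete' means the shorter peptide lies entirely within the longer one.
    """
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    if shared <= 0:
        return "none"
    shorter = min(a_end - a_start, b_end - b_start) + 1
    return "complete" if shared == shorter else "partial"


# Hypothetical example: a 15-mer MHCII epitope spanning residues 9-23
# compared with three 9-mer MHCI epitopes on the same protein.
inside = classify_overlap(9, 23, 12, 20)
edge = classify_overlap(9, 23, 20, 28)
apart = classify_overlap(9, 23, 30, 38)
```

Here the first 9-mer is fully contained in the 15-mer, the second shares only four residues with it, and the third does not overlap at all.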

GenBank IDs from the MHCI and MHCII output summary tables, Additional files 6 and 9, were combined to determine whether additional epitope-dense proteins would be observed. The resultant proteins can be seen in Table 8, where 33 epitope-dense proteins were identified with at least 5 epitopes. Seven of these proteins were not previously identified when looking at either MHCI or MHCII epitope-dense proteins alone (GenBank IDs are AAO89890.1 (thiDE), AAO90155.1 (yaeT), AAO90323.2, AAO90990.2, AAO91128.1 (icmO), AAO91393.1, and AAO91455.1 (hemA)). Additionally, 19 proteins that were previously included in either the MHCI or MHCII data were absent from the combined epitope-dense protein list. Many of the proteins that were lost in the combined epitope-dense protein table contained a number of epitopes near the bottom of the previous cut-off values. None of the previously studied proteins in Additional file 4 were present as an epitope-dense protein in the unified MHCI and MHCII Table 8. Nine of the epitope-dense proteins also contained overlapping epitopes; however, these epitopes were counted separately because they bind different classes of major histocompatibility complex. In comparing MHCI and MHCII epitope results, it was possible to elucidate epitopes or proteins that could stimulate both cytotoxic T-cells and T-helper cells.

Table 8 Proteins with ≥ 5 epitopes present overall for MHCI and MHCII

Discussion

We sought to leverage both C. burnetii and host genomic diversity to predict widely useful T-cell epitopes across a range of hosts for this zoonotic pathogen. Epitopes were identified by leveraging an array of MHCII and MHCI alleles for antigen presentation, thereby capturing epitopes incorporated in both MHC systems across multiple host species. The results highlight broadly useful epitopes, including many with minimal prior study, that can be used for future work and vaccine development.

Foundational data aimed to capture broad representation of C. burnetii and focus on proteins that would avoid self-reactive antigens. In particular, we selected at least one sequence from each genomic group (Table 1), including the relatively minimal genome of virulent Nine Mile Phase I (RSA 493) as a reference. This resulted in a refined list of 1413 conserved proteins for further analysis. This list was further screened for homology within human, mouse, and ruminant host proteins to avoid stimulating potential autoimmune responses. 391 such proteins were identified, suggesting large-scale use of host protein domain structures by C. burnetii. During assembly of the protein query list, it became apparent that a substantial number of annotated genes within the Nine Mile Phase I (RSA 493) genome lack discovery work and that many underlying functions are suggested by homology to alternate bacterial proteins. This promotes analyzing the bacterial proteome in its entirety, as the importance of many C. burnetii proteins has yet to be determined.

Relatively few Gram-negative bacteria have been examined for T-cell epitopes on a proteome-wide basis [34], and most previous epitope studies have examined effector proteins or proteins residing at the cellular surface [24, 48,49,50]. Studies examining C. burnetii proteins for host cell epitopes are no exception; previous work has focused on proteins injected into the host cytoplasm by the type IV secretion system (T4SS) or proteins which elicit an antibody response [13, 14, 17]. Resolution of C. burnetii infection is known to rely on a Th1 type immune response that results in the production of IFN-γ [15, 32, 33]. This immune response is accomplished by coordination of T-helper cells through interaction with MHC class II peptide-loaded molecules and a harmonized cytokine environment [22]. Therefore, proteome-wide analysis for epitopes contained in C. burnetii began with identifying MHC class II interacting peptides (See Repository). The MHC class II analysis herein identified numerous epitopes with relatively high allelic interactions (Additional file 6), many with cross-species presentation (Additional file 7). Some had presentation by an exceptional range of host alleles (Table 3), and many were clustered in epitope-dense proteins of special interest (Table 4). Studies looking at the importance of different immune cellular subsets during C. burnetii infection have led to increased interest in CD8+ T-cell stimulation, which requires MHC class I presentation of peptides [27, 31]. As such, similar methodology was implemented to identify epitopes binding an exceptional number of host MHC class I alleles (Table 5 and Additional file 8) and epitope-dense proteins characterized by MHC class I binding (Table 6).

The Dugway 5J108-111 isolate of C. burnetii represents the only known avirulent strain included in the following analysis and was included to exemplify the high degree of genomic variability contained between bacterial isolates [37, 39, 41]. Discarding the Dugway 5J108-111 isolate would result in the addition of thirteen proteins to the analysis, where two would be removed upon identification of host homologs (Additional file 12A). Examination of the remaining eleven proteins determined that their inclusion would minimally alter the data included herein, as only three new MHCI T-cell epitopes with cross-species representation were discerned (Additional file 12B). Notably, none of these additional epitopes bound an exceptional number of alleles tested nor did they encompass epitope dense proteins.

Examination of either the MHC class I or II dataset returns proteins that have not previously been studied for T-cell epitopes. As mentioned before, much of the earlier work identifying T-cell epitopes has focused on certain protein subsets [9, 13, 14, 16, 19]. Therefore, return of novel epitope-containing proteins does not preclude epitopes defined within this work; instead, these epitopes may represent more immunogenic peptides that exemplify a range of host species. For example, a group of novel epitope-containing proteins responsible for bacterial cell division can be seen within the MHC class II and I datasets, encompassing AAO89704.2 (ftsA), AAO89682.2 (ftsI), and AAO90095.2 (rodA) [51]. The MHC class I analysis for bacterial epitopes supports the addition of a ruminant species to the dataset. It is believed that many human outbreaks arise from domestic ruminants, consisting of sheep, goats, and cattle; therefore, vaccination efforts in ruminants may help in the prevention of zoonotic spread [3, 6, 7]. Furthermore, coxiellosis in animals does not come without consequence: sheep and goats present most frequently with late-term abortions, and cattle have decreased birth weights and possible mastitis [8]. Consequently, Coxiella burnetii infection in these species causes clear economic losses and requires intervention.

A potential pitfall of bioinformatic analysis of T-cell epitopes is the possibility of false positives [14, 21, 52]. This hindrance has been largely combated through the inclusion of more MHC ligand elution data during server training [21, 23, 47]. During this research, alleviation of false positives was attempted by assessing a plethora of different MHCI and MHCII alleles and investigating the peptides which had high allelic coverage. It is presumed that false positives arise due to a lack of training data between alleles and that analysis of a myriad of alleles would promote dilution of false positives [21, 47, 52]. When considering the 8 murine alleles tested during use of either NetMHCpan 4.1 or NetMHCIIpan 4.0, as compared to either 82 to 206 human alleles or 105 bovine alleles, it is noticeable that a larger number of murine peptides fell within the filtered data sets (Additional files 6 and 8). These data are suspected to contain a number of false positives, but comparison with high binding peptides of human and cattle alleles is believed to lessen this burden. Previous research on C. burnetii defined T-cell epitopes has used methodologies that measure the ability to achieve host T-cell activation in response to epitopes of interest, including ELISpot, ELISA, flow cytometry, and peptide loading into MHCs [13, 14, 18, 19]. It remains imperative to test returned T-cell epitopes for their ability to interact with the host immune system before production of vaccine candidates may begin.

Once data had been acquired for both MHC class I and II alleles, it became possible to cross-analyze outputs. Investigation into overlapping MHC class II and I epitopes defined 31 peptides of interest (Table 7). Com1, a well-studied C. burnetii protein of interest, was represented within this output. Importantly, former analysis of Com1 as a vaccine candidate against C. burnetii has shown promise [13, 18, 19]. Specifically, mice exposed to Com1 were afforded better protection during challenge assays and produced IFN-γ during immune system stimulation. Unfortunately, Com1 was categorized as a secreted protein by Inmembrane, whereas it is a well-studied surface-associated protein [16, 18, 36]. It is likely that there is a secondary processing step that is not recognized by Inmembrane. This does not disqualify the overall purpose for such notation, as many vaccination efforts have focused on surface proteins, which are believed to interact most readily with the immune system during infection [1, 25, 53]. While care should be taken regarding protein location, proteins residing at the level of the membrane or that are secreted would suggest improved immune recognition.

Com1 did not remain in the MHC class I and II cross-analysis when assessing for epitope-dense proteins (Table 8). Likewise, none of the previously studied proteins present in Additional file 4 are represented in the 33 epitope-dense proteins compiled from MHC class I and II data. Of these novel epitope-containing proteins, seven were not returned when assessing MHC class I or II epitope-dense proteins alone. These are AAO89890.1 (thiDE), AAO90155.1 (yaeT), AAO90323.2, AAO90990.2, AAO91128.1 (icmO), AAO91393.1, and AAO91455.1 (hemA), which represent epitope-rich proteins that have balanced MHC class I and II coverage. Three of these proteins are designated as secreted or membrane-exposed proteins by Inmembrane: AAO90155.1 (yaeT), AAO91128.1 (icmO), and AAO91393.1. Therefore, these proteins are suggested to more readily interact with the immune system upon arrival of the bacterium within host tissues. IcmO and YaeT are significant proteins with regard to host:pathogen interaction, as IcmO is part of the multi-subunit T4SS and YaeT is responsible for assembly of beta-barrel surface proteins [54,55,56].

Cross-analysis between MHC class I and II data allows future vaccination efforts to cover both classes of T-cell epitopes. Furthermore, the investigation herein also aids epitope selection for alternate vaccine types. For instance, the identified epitope dense proteins provide a source of epitopes that can be incorporated into a vectored vaccine [20, 34]. Alternatively, proteins that contain overlapping MHC class I and II epitopes offer the possibility of using those epitopes in a heterologous recombinant subunit vaccine. As a result, the provided data allow vaccination efforts against Coxiella burnetii to move forward without restricting the approach to be used.

Conclusions

These data represent the first comprehensive, proteome-wide examination of T-cell epitopes for C. burnetii. The use of multiple divergent C. burnetii isolates enabled the identification of widely conserved proteins and epitopes to empower future work. Furthermore, the use of multiple host species for antigen presentation analyses supports the existence of widely conserved epitopes that can be broadly useful across many host species for this zoonotic pathogen. The specific results highlight many proteins and epitopes not previously described with regard to host immune recognition, and in so doing provide useful direction for future work in developing epitope-rich vaccines.

Methods

Proteome-wide comparison between Coxiella burnetii isolates

The PATRIC database (Pathosystems Resource Integration Center) was used to run proteome-wide comparisons between C. burnetii isolates (https://www.patricbrc.org/) [57, 58]. The bacterial isolates selected and their corresponding assembly numbers are as follows: Nine Mile Phase I (RSA 493) (ASM776v2), Dugway 5J108-111 (ASM1710v1), MSU Goat Q177 Priscilla (ASM16887v3), CbuG_Q212 (ASM1986v1), Z3055 (Z3055), 701CbB1 (ASM263396v1), Henzerling (ASM263402v1), Schperling (ASM263406), Q545 (ASM289675v1), and Ohio 314 (RSA 270) (ASM224728v1) [37, 38, 59,60,61,62]. Of these, the updated assemblies for Nine Mile Phase I (RSA 493), MSU Goat Q177, and Schperling had not been loaded into the PATRIC database. These three proteomes were instead downloaded from the National Center for Biotechnology Information (NCBI) database as multi-FASTA files. Nine Mile Phase I (RSA 493) was chosen as the reference strain during analysis because of its short genome length and well-documented virulence [38, 39]. An E-value cut-off of 1e−8 was used, and proteins were considered homologs if the percent identity was 90% or above [37, 63].
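The homolog criteria above can be illustrated with a short Python sketch. This is not the PATRIC implementation: the tuple layout, function names, and example hits are hypothetical, the E-value is assumed to act as an upper bound, and a reference protein is assumed to count as conserved only when a qualifying hit exists in every compared isolate.

```python
from collections import defaultdict

E_VALUE_MAX = 1e-8    # E-value cut-off from the text (assumed to be an upper bound)
IDENTITY_MIN = 90.0   # percent identity of 90% or above


def is_homolog(percent_identity, e_value):
    """Return True when a hit meets both homolog criteria."""
    return e_value <= E_VALUE_MAX and percent_identity >= IDENTITY_MIN


def conserved_proteins(hits, isolates):
    """Return reference proteins with a qualifying hit in every isolate.

    hits: iterable of (reference_protein, isolate, percent_identity, e_value).
    The "every isolate" rule is an assumption made for this sketch.
    """
    found = defaultdict(set)
    for protein, isolate, identity, e_value in hits:
        if is_homolog(identity, e_value):
            found[protein].add(isolate)
    required = set(isolates)
    return sorted(p for p, seen in found.items() if required <= seen)


# Hypothetical example hits (not results from the study)
example_hits = [
    ("protA", "Dugway", 99.1, 1e-120),
    ("protA", "Henzerling", 100.0, 0.0),
    ("protB", "Dugway", 85.0, 1e-60),      # fails the identity criterion
    ("protB", "Henzerling", 98.0, 1e-90),
]
print(conserved_proteins(example_hits, ["Dugway", "Henzerling"]))
```

With these made-up hits only protA is retained, because protB lacks a qualifying hit in one of the two isolates.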

Homolog identification in the host species

Nine Mile Phase I (RSA 493) proteins found to be conserved between C. burnetii isolates were submitted as a multi-FASTA file to the BLASTp server and analyzed for homologs present in host species. The host species tested and their taxonomic IDs are as follows: human (txid 9606), mouse (txid 10088), cow (txid 9913), goat (txid 9925), and sheep (txid 9940). BlastGrabber was used to analyze the results obtained from NCBI’s basic local alignment search tool (BLASTp) [45]. An E-value cut-off of 0.01 (1e−2) and a percent identity greater than 35% were set based on previous experimental methods used to remove host homologs from analysis [24, 63, 64].
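As a minimal sketch of this filtering step, the Python below removes any bacterial protein with a host hit that passes both cut-offs. It does not reproduce BlastGrabber. It assumes the BLAST results are available in the standard 12-column tabular format (query ID first, percent identity third, E-value eleventh), that both criteria must be met for a protein to be treated as a host homolog, and the protein and host identifiers are hypothetical.

```python
E_VALUE_MAX = 0.01    # E-value cut-off from the text
IDENTITY_MIN = 35.0   # percent identity must be greater than this value


def host_homolog_ids(tabular_lines):
    """Collect query IDs with at least one hit passing both cut-offs.

    Assumes BLAST tabular columns: qseqid, sseqid, pident, ..., evalue, bitscore.
    """
    flagged = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        percent_identity = float(fields[2])
        e_value = float(fields[10])
        if e_value <= E_VALUE_MAX and percent_identity > IDENTITY_MIN:
            flagged.add(query_id)
    return flagged


def remove_host_homologs(protein_ids, tabular_lines):
    """Return the proteins that have no qualifying host hit, in input order."""
    flagged = host_homolog_ids(tabular_lines)
    return [pid for pid in protein_ids if pid not in flagged]


# Hypothetical tabular hits (not results from the study)
example = [
    "protA\thost1\t62.5\t200\t75\t0\t1\t200\t5\t204\t1e-50\t180",
    "protB\thost2\t30.0\t150\t105\t0\t1\t150\t1\t150\t1e-05\t45",
    "protC\thost3\t40.0\t60\t36\t0\t1\t60\t1\t60\t0.5\t20",
]
print(remove_host_homologs(["protA", "protB", "protC", "protD"], example))
```

Here protA is removed, protB is kept because its identity is too low, protC is kept because its E-value is too high, and protD is kept because it has no hits at all.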

Phylogenetic analysis for human MHC alleles

The top ten most common MHCI alleles for eleven global regions were determined using the Allele Frequency Net Database (AFND) (http://www.allelefrequencies.net/default.asp) [65, 66]. Duplicate alleles were removed from the resultant list, and protein FASTA sequences were obtained from the International Immunogenetics Information System/Human Leukocyte Antigen (IMGT/HLA) database (https://www.ebi.ac.uk/ipd/imgt/hla/) [67]. Three of the remaining MHCI alleles (A*29:25, A*29:50, and A*02:264) no longer had FASTA sequences available within the database and were therefore excluded going forward. Phylogenetic trees were built using MEGA X, with 1,000 bootstrap replicates run during the construction of both a neighbor-joining and a maximum likelihood tree [68]. Afterwards, the trees were condensed so that only bootstrap values above 80 were involved in branch generation (Additional file 2C/D). If MHCI alleles were closely related, a representative allele was chosen based upon its representation within the geographic regions annotated by the AFND. In total, 83 human MHCI alleles were chosen for epitope analysis with NetMHCpan 4.1. For MHCII, the DRB1 locus has annotated AFND data for the top ten alleles in each of the eleven geographic regions. In contrast, the DPA1, DPB1, DQA1, and DQB1 loci did not have region-associated data. Alleles at these loci were chosen if their allelic frequency was greater than or equal to 0.05 in any one geographic region, with the database filtered for gold and silver data obtained from available literature [65]. Protein FASTA sequences were again obtained from the IMGT/HLA database. Notably, the MHCII alleles DRB1*04:140, DRB1*04:155, DRB1*12:09, DPB1*26:01:01, DPB1*101:01, DQA1*05:02, and DQB1*02:03:01 were available only as partial sequences and were removed from further analysis.
MEGA X was used to make a neighbor-joining and a maximum likelihood tree with the remaining MHCII alleles using a minimum of 999 bootstraps per analysis (Additional file 2A/B) [68]. The remainder of the MHCII analysis was completed as described above for the MHCI analysis. There were 28 DRB1, 4 DPA1, 27 DPB1, 10 DQA1, and 7 DQB1 alleles chosen for epitope inquiry, giving a total of 206 allelic pairings.
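The two selection rules described above (merging per-region top-ten lists without duplicates, and keeping alleles with a frequency of at least 0.05 in any one region) can be sketched in Python as follows. The data structures, function names, region labels, allele labels, and frequencies are all hypothetical placeholders; the AFND queries themselves are not reproduced here.

```python
FREQ_MIN = 0.05  # allele frequency threshold stated in the text


def merge_top_alleles(top_by_region):
    """Map each allele to the regions that list it, so duplicates collapse to one entry."""
    regions_by_allele = {}
    for region, alleles in top_by_region.items():
        for allele in alleles:
            regions_by_allele.setdefault(allele, []).append(region)
    return regions_by_allele


def alleles_meeting_threshold(freq_by_region, threshold=FREQ_MIN):
    """Return alleles whose frequency is >= threshold in at least one region."""
    selected = set()
    for freqs in freq_by_region.values():
        for allele, freq in freqs.items():
            if freq >= threshold:
                selected.add(allele)
    return sorted(selected)


# Hypothetical placeholder data (not values from AFND)
top_lists = {
    "Region 1": ["allele_1", "allele_2"],
    "Region 2": ["allele_2", "allele_3"],
}
frequencies = {
    "Region 1": {"allele_4": 0.02, "allele_5": 0.05},
    "Region 2": {"allele_4": 0.049, "allele_6": 0.30},
}
print(list(merge_top_alleles(top_lists)))
print(alleles_meeting_threshold(frequencies))
```

Keeping the list of regions for each allele is a design choice for this sketch: it makes it easy to prefer a representative allele that appears in more than one region.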
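The stated total of 206 can be reproduced with simple arithmetic under one plausible reading of how the pairings were formed. The assumptions here are that each DRB1 allele is counted once, that every DPA1 allele is paired with every DPB1 allele, and that every DQA1 allele is paired with every DQB1 allele. The text does not spell out this derivation, so the sketch below is a consistency check only.

```python
# Allele counts per locus, as reported in the text
counts = {"DRB1": 28, "DPA1": 4, "DPB1": 27, "DQA1": 10, "DQB1": 7}


def count_pairings(c):
    """Count MHCII molecules under the pairing assumptions of this sketch."""
    dr = c["DRB1"]                 # each DRB1 allele counted once (assumption)
    dp = c["DPA1"] * c["DPB1"]     # all alpha/beta combinations at DP (assumption)
    dq = c["DQA1"] * c["DQB1"]     # all alpha/beta combinations at DQ (assumption)
    return {"DR": dr, "DP": dp, "DQ": dq, "total": dr + dp + dq}


print(count_pairings(counts))
# {'DR': 28, 'DP': 108, 'DQ': 70, 'total': 206}
```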

Identification of human, murine, and bovine MHC epitopes

Conserved Nine Mile Phase I (RSA 493) proteins lacking homology to host species were loaded onto the NetMHCpan 4.1 server for analysis across multiple host species (https://services.healthtech.dtu.dk/service.php?NetMHCpan-4.1) and (http://www.cbs.dtu.dk/services/NetMHCpan/) [23, 47, 69]. Of the approximately 3,000 human MHCI alleles, 83 were chosen based upon locus frequency within defined populations, representation of alleles in more than one region, and greater evolutionary distance as discerned by phylogenetic tree analysis. During this investigation it was determined that allele B*13:07N was not available for assessment on NetMHCpan 4.1, decreasing the number of human alleles assessed to 82. There were 8 murine MHCI alleles present, which were used to represent the available inbred strains of laboratory mice. Lastly, 105 BoLA (bovine leukocyte antigen) MHCI alleles were recently trained for server inclusion and allowed for representation of a host ruminant species. Each of these MHCI allelic groupings was evaluated over the course of multiple program runs. A complete list of tested MHCI alleles can be found in Additional file 11. The threshold values were set at 0.5 for the %Rank of a strong binder and 2 for the %Rank of a weak binder during the assessment. Peptide length was kept at the baseline parameters, which gave 8-, 9-, 10-, and 11-mer peptides in the output.
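The binder thresholds and peptide lengths described above can be illustrated with a short Python sketch. It does not call NetMHCpan. The function names and the example sequence are hypothetical, and the handling of values exactly at a threshold (strict less-than) is an assumption, since the text does not state it.

```python
STRONG_RANK = 0.5                 # %Rank threshold for a strong binder
WEAK_RANK = 2.0                   # %Rank threshold for a weak binder
PEPTIDE_LENGTHS = (8, 9, 10, 11)  # peptide lengths reported in the output


def classify_mhci(percent_rank):
    """Label a peptide by %Rank; boundary handling is an assumption of this sketch."""
    if percent_rank < STRONG_RANK:
        return "strong"
    if percent_rank < WEAK_RANK:
        return "weak"
    return "non-binder"


def enumerate_peptides(sequence, lengths=PEPTIDE_LENGTHS):
    """Yield every contiguous peptide of the requested lengths from a protein sequence."""
    for k in lengths:
        for start in range(len(sequence) - k + 1):
            yield sequence[start:start + k]


# Hypothetical 12-residue sequence, used only to show the sliding window
peptides = list(enumerate_peptides("ACDEFGHIKLMN"))
print(len(peptides))  # 5 + 4 + 3 + 2 = 14 peptides
print(classify_mhci(0.3), classify_mhci(1.0), classify_mhci(2.0))
```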

NetMHCIIpan 4.0 was used to study peptides that can bind human or murine MHCII alleles (https://services.healthtech.dtu.dk/service.php?NetMHCIIpan-4.0) [21, 23, 70]. There were 8 murine MHCII alleles and 936 human MHCII alleles present on the given server, which together generate thousands of human MHCII complexes. Human MHCII alleles to be tested were chosen based on the previously mentioned phylogenetic analysis. Threshold values identified a strong binder as a %Rank less than 2.0 and a weak binder as a %Rank greater than or equal to 2.0 and less than or equal to 10.0. The standard peptide length of 15 amino acids was kept during this investigation. A complete list of tested MHCII alleles can be found in Additional file 11. Positional output differed by one amino acid position between NetMHCIIpan 4.0 and NetMHCpan 4.1 (starting positions designated as 0 versus 1); therefore, all output data were standardized to achieve consistent positional designation.
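A minimal Python sketch of the MHCII thresholds and the positional standardization is given below. The function names are hypothetical. The choice of a 1-based convention as the common standard is an assumption, since the text states only that the two outputs were made consistent, and the indexing base is passed in explicitly rather than tied to either tool.

```python
STRONG_RANK_II = 2.0   # strong binder: %Rank less than 2.0
WEAK_RANK_II = 10.0    # weak binder: %Rank from 2.0 up to and including 10.0


def classify_mhcii(percent_rank):
    """Label a 15-mer by %Rank using the thresholds given in the text."""
    if percent_rank < STRONG_RANK_II:
        return "strong"
    if percent_rank <= WEAK_RANK_II:
        return "weak"
    return "non-binder"


def to_one_based(position, source_base):
    """Convert a start position reported with base 0 or 1 to a 1-based position.

    Using 1-based positions as the common standard is an assumption of this sketch.
    """
    if source_base not in (0, 1):
        raise ValueError("source_base must be 0 or 1")
    return position + (1 - source_base)


# The same first residue reported under the two conventions maps to one value
print(to_one_based(0, source_base=0), to_one_based(1, source_base=1))
print(classify_mhcii(1.99), classify_mhcii(2.0), classify_mhcii(10.0), classify_mhcii(10.1))
```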

C. burnetii proteome localization

The multi-FASTA file containing the conserved bacterial proteins that lacked host homologs was run through Inmembrane to determine each protein’s localization within the bacterium [71]. The program coordinates runs of a combination of bioinformatic tools consisting of TMHMM, SignalP, LipoP, and HMMER [72,73,74,75].