Introduction

Cancer immunotherapy is a major modality of cancer treatment. It is often used in combination with other cancer treatments such as surgery, radiotherapy, and chemotherapy. Immunotherapy is an intervention that modulates immune responses for effective targeting and elimination of tumor cells. The term immunotherapy is quite broad and encompasses a variety of treatments; some enhance immune response in a very general way, while others direct the immune system to specifically target cancer cells. Although immunotherapy has been well documented for more than a century [1], the mechanisms of immune responses and tumor evasion remain poorly understood. Understanding the genetic background of the patient and the molecular characteristics of individual cancers, both at the time of diagnosis and during therapy, is essential for stratifying patients, assessing their responses, and selecting optimal therapies [2]. Immunotherapies focus on tumor-specific antigens as well as on regulatory mechanisms that drive effective immune responses. Inducing immune responses to cancer is a delicate balancing act in which effective immunity against the cancer cells should be enhanced, while autoimmune responses against normal cells should be minimized [3]. Even when successful in this respect, immunotherapies often lose their efficacy over time due to immune evasion by tumor cells. Immune evasion involves several mechanisms, such as downregulation of HLA expression [4], infiltration of immune suppressive cells (e.g., Tregs [5] and MDSCs [6]), expression of immune suppressive molecules (e.g., IDO [7] and Arg-1 [8]), lack of chemokine-mediated trafficking [9], poor innate immune cell activation [10], and expression of immune checkpoint ligands such as PD-L1 and PD-L2 [11].
Passive immunotherapies involve administration of antibodies and other immune system products that provide immunity without activating the host’s immune responses, while active immunotherapies stimulate host immune responses against tumor cells by activating them to respond against tumor antigens. In this review, we focus on active immunotherapies that use tumor antigens to target cancer cells.

Identification and detailed characterization of vaccine targets is an essential step in tumor vaccine development. Technical advances in instrumentation, sample processing, immunological assays, and bioinformatics techniques have generated large amounts of immunological data, including experimentally identified tumor antigens and T cell epitopes, novel tumor biomarkers, and differentially expressed genes or proteins identified through genomics, proteomics, or other high-throughput methods. Collection, analysis and management of these data require extensive use of bioinformatics applications. Here we describe bioinformatics tools and data resources used for the study of cancer immunotherapies, focusing primarily on T cell-based therapies.

Tumor antigens for T cell-based therapies

T cell epitopes are short peptide fragments, 8–12 amino acids in length for HLA class I and 13–25 amino acids for HLA class II [12, 13]. In the classic mode of CD8+ T cell activation, intracellular proteins are processed in the cytoplasm and cleaved by the proteasome into small peptides [14], which are then delivered into the endoplasmic reticulum by transporter associated with antigen processing (TAP) proteins [15], where they bind HLA class I molecules and are subsequently presented on the cell surface as T cell epitopes. The T cell epitopes on the surface of target cells are screened and recognized by CD8+ cytotoxic T lymphocytes (CTLs). Target cells that are recognized as foreign due to malignant transformation are killed by the cognate CD8+ cells [16]. Initially, the CD8+ T cells are in their naïve state. After recognition of a peptide-HLA complex, they are activated to their effector CTL state and proliferate to clear cells presenting the peptide-HLA complex on their surface.

The cellular immune system responds to tumor cells by recognition of either tumor-associated antigens (TAAs, antigens that are overexpressed in cancer cells) [17, 18] or tumor-specific antigens (TSAs, antigens that are not expressed in most normal tissues) [19]. The therapeutic potential of a tumor antigen depends on an array of factors. In an effort to define the characteristics of the ideal cancer antigen, researchers at the American National Cancer Institute proposed to rank tumor antigens by therapeutic function, immunogenicity, oncogenicity, specificity, expression level and percentage of positive cells, stem cell expression, the number of patients with antigen-positive cancers, the number of validated epitopes in antigen, and cellular location of expression [20].

TAAs are not exclusive to tumor cells. They can, however, in certain instances, still elicit a tumor-specific response. TAAs can be divided into two subgroups: differentiation antigens and overexpressed antigens. Common to both types is the inherent risk that eliciting a sufficiently strong T cell response may induce systemic autoimmunity against healthy cells carrying the antigen. A number of overexpressed antigens have been characterized; many are products of genes involved in regulating tumor growth, replication, and apoptosis, which can severely affect the health of the cell if dysregulated [21].

Mutations in tumor cell genes can lead to changes in the primary or secondary protein structure that may affect immunogenicity of antigens [22]. Specifically, sequence changes in short peptides can change peptide binding affinity for HLA, and thus change subsequent responses by T cells [23]. Similarly, mutations are likely to induce changes in secondary structure that can affect T cell recognition to some extent, but primarily change the affinity and avidity of circulating antibodies for the target [24]. The main drawback of TSAs is that they are mostly patient specific and, therefore, not suitable for broadly applicable therapies. However, they are likely to be truly tumor specific and are often found in driver genes, thus making them less susceptible to immune escape by immunoediting [25].

Bioinformatics for T cell-based cancer immunotherapy

Identification and selection of antigens is a multifaceted task that depends both on the type and on the application of antigens. In 2000, Rino Rappuoli formalized the role of computational analyses in vaccinology in a conceptual framework termed “Reverse Vaccinology” [26]. Originally formulated to facilitate vaccine target discovery in pathogens, the concepts of reverse vaccinology can be expanded for applications in cancer immunology. Reverse vaccinology revolves around sequence analysis, whereby the genomic sequence is used to catalog all potential molecular antigens. In simple viral and prokaryotic pathogens, essentially all protein products are potential antigens, whereas the majority of tumor tissue proteins are not aberrant, thereby rendering them poor therapy targets. Therefore, cataloging antigens in tumor cells requires additional pre-screening.

Once the catalog of potential antigens has been established, the reverse vaccinology pipeline calls for in silico prediction of vaccine targets. This process is characterized as completely naïve, in that targets predicted from sequence, such as predicted HLA binders, are not characterized in terms of intracellular preprocessing, conservation, or in vivo expression. Additional computational and experimental pre-screening of epitope candidates must, therefore, be performed before they can be included as therapy targets. Potential T cell epitopes must be examined in terms of preprocessing by the proteasome, transport by the TAP, HLA binding, stability of the peptide-HLA complex [27], peptide-HLA binding to the T cell receptor (TCR) [28], and in vivo expression.

Potential T cell epitopes should then be examined for stability of their expression in tumor cells to ensure lasting immunological control or clearance of targeted tumor cells. However, the heterogeneous nature of cancer means that no given cancer type has a uniform molecular profile, and several cancers are subclassified into a number of characterized or uncharacterized classes with varying prognostic and therapeutic outcome [29–32]. Adding to this complexity, evidence of intra-tumor heterogeneity is beginning to surface for certain tumors [33, 34]. Additionally, given that immunoediting of tumor antigens is based on somatic mutations, it is extremely difficult to predict the antigenic phenotype after clonal selection. It is, therefore, often observed that tumors develop tolerance to immunotherapy after a limited period of successful treatment [35, 36].

Cataloging, predicting, and selecting immunotherapy targets can be extensively addressed using existing bioinformatics tools and biological databases. Figure 1 presents a schematic overview of the general process. In the following sections, we present examples of bioinformatics analyses for antigen cataloging and immunotherapy target discovery with examples of application for each task. A number of tools and databases were used for the example applications and many more exist. It is beyond the scope of this review to assess and catalog all existing tools, but in the following section we present an overview of some of the most commonly used tools and data resources for each of the tasks.

Fig. 1

General overview of bioinformatics supported discovery of potential T cell-based immunotherapy targets in cancer tumors. Green operations are preliminary and final laboratory analyses; red operations are bioinformatics analyses performed using computational tools (described in further detail in the following sections); blue operations are cross-references with biological databases (likewise described in further detail in the following sections), and yellow operations denote intermediary outputs from the analyses and cross-referencing

Cataloging potential antigens

Identification of potential antigens de novo from genomic sequence using bioinformatics tools is highly challenging, as expression of proteins is regulated by an array of complex regulatory mechanisms, many of which are poorly understood. Traditionally, tumor antigens are identified in vitro from serum by screening cDNA phage libraries using immunoassays [37] or proteomics-based screening [38], but bioinformatics tools are perfectly suited to aid this process, either by actively identifying novel tumor antigens or by organizing and accessing information about known tumor antigens in accessible databases.

In silico screening for novel tumor antigens

Large-scale screening of mRNA from public databases is a proposed method for active identification of novel tumor antigens [39, 40]. Specifically, comparing expression profiles of tumors and healthy tissue can reveal genes that are overexpressed or expressed exclusively in malignant tumors. High-throughput genomics methods have enabled large-scale screening of gene expression. These include nucleotide microarray technologies [41] and next-generation RNA sequencing [42]. Preprocessing of raw outputs from expression experiments depends entirely on the platforms used. Data from most, if not all, platforms can be preprocessed and analyzed in the statistical environment R, with packages from the Bioconductor software project [43]. Once data are preprocessed, the most widely used computational tools for analysis of gene expression data are the packages RMA (Robust Multichip Average) [44] and Limma [45] for R or one of several graphical user interface tools, most notably the Gitools software [46]. However, for an antigen to be suitable for immunotherapy, it must be expressed at protein level as well. Large-scale proteome technologies include protein microarrays [47], antibody microarrays [48], mass spectrometry-based proteomics [49], and mass cytometry screening [50]. Recent studies of the proteome of breast cancer have revealed molecular features of tumorigenesis [51], and proteomics studies are gradually approaching a scale where whole proteome screening is feasible [52].
Databases such as the Gene Expression Omnibus [53] contain MIAME standard compliant expression data [54]. A number of specialized cancer genomics data repositories also exist, such as The Cancer Genome Atlas (TCGA) consortium data portal (https://tcga-data.nci.nih.gov/tcga/), which contains standardized data (including protein expression) for a large number of cancer types, Oncomine [55] (mRNA expression data for multiple cancer types), and the HemaExplorer [56, 57] (mRNA expression data for healthy and malignant hematopoietic cells). Although tumor antigens must be expressed at protein level to be useful for antigen-based cancer immunotherapies, mRNA expression profiles can serve as useful pre-screenings, subject to additional experimental validation, including proteomics verification, as the correlation between RNA expression and protein expression varies between protein families [58]. Lastly, expression profile databases such as UniGene (http://www.ncbi.nlm.nih.gov/unigene) and The Human Protein Atlas [59] can be consulted to ensure that potential target tumor antigens are not similarly expressed in healthy tissues at the gene and protein levels. Nucleic Acids Research offers a comprehensive catalog of maintained biological databases [60].
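
The tumor-versus-normal comparison described above can be illustrated with a minimal Python sketch that ranks genes by log2 fold change. The expression values, the pseudocount, and the fold-change cutoff are illustrative assumptions and not values from any cited study. A real analysis would rely on proper normalization and statistical testing, for example with Limma, rather than a fixed cutoff.

```python
import math


def overexpressed_genes(tumor, normal, min_log2_fc=2.0, pseudocount=1.0):
    """Return (gene, log2 fold change) pairs for genes overexpressed in tumor.

    tumor, normal: dicts mapping gene symbol -> mean normalized expression.
    The pseudocount avoids division by zero for genes silent in normal tissue.
    """
    hits = []
    for gene, tumor_value in tumor.items():
        normal_value = normal.get(gene, 0.0)
        log2_fc = math.log2((tumor_value + pseudocount) / (normal_value + pseudocount))
        if log2_fc >= min_log2_fc:
            hits.append((gene, log2_fc))
    # Strongest overexpression first
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical mean expression values (arbitrary units), for illustration only
tumor_expr = {"ERBB2": 799.0, "GAPDH": 1000.0, "EGFR": 63.0}
normal_expr = {"ERBB2": 7.0, "GAPDH": 1000.0, "EGFR": 15.0}

candidates = overexpressed_genes(tumor_expr, normal_expr)
for gene, log2_fc in candidates:
    print(f"{gene}\tlog2FC={log2_fc:.2f}")
```

Genes passing such a pre-screen remain candidates only; as noted above, protein-level expression still has to be verified.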

Cross-referencing expression data with known tumor antigens

A large number of studies presenting potential tumor antigens are published each year. Cross-referencing tumor gene expression and protein expression profiles with previous experimental efforts enables fast cataloging of potential targets. Data resources for tumor antigens include the Cancer Immunity peptide database of T cell-defined tumor antigens [61], a static listing of tumor T cell antigens provided by Parmiani and colleagues [21], CTdatabase of cancer-testis antigens [62], and the TANTIGEN database of T cell tumor antigens (http://cvc.dfci.harvard.edu/tadb/index.html). Genes or proteins previously identified as TSAs (and expressed in a target sample), or identified as TAAs (and overexpressed compared to normal tissue from the same patient), are subject to further investigation as potential immunotherapy targets if they are expressed at appropriate levels in a given tumor sample.
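
As a minimal sketch of this cross-referencing step, the Python snippet below intersects sample-level expression calls with curated antigen lists. The function name and all gene lists are hypothetical placeholders; in practice the known-antigen sets would be exported from a resource such as TANTIGEN, and the expression calls would come from the sample under study.

```python
def shortlist_antigens(expressed, overexpressed, known_tsas, known_taas):
    """Shortlist known antigens that meet the expression criterion for their class.

    TSAs only need to be expressed in the sample; TAAs must be overexpressed
    relative to matched normal tissue.
    """
    tsa_hits = sorted(set(expressed) & set(known_tsas))
    taa_hits = sorted(set(overexpressed) & set(known_taas))
    return {"TSA": tsa_hits, "TAA": taa_hits}


# Hypothetical inputs, for illustration only
expressed = {"ERBB2", "GENE_A", "GENE_B", "GENE_C"}
overexpressed = {"ERBB2", "GENE_B"}
known_tsas = {"GENE_A", "GENE_D"}
known_taas = {"ERBB2", "GENE_C"}

shortlist = shortlist_antigens(expressed, overexpressed, known_tsas, known_taas)
print(shortlist)
```

Here GENE_C is a listed TAA that is expressed but not overexpressed, so it is left off the shortlist.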

Assessing potential tumor antigens

Once one or several potential tumor antigens have been identified, further information can aid in the pre-experimental assessment of the antigens. This process is best illustrated with an example of information collection and antigen assessment of the well-characterized tumor antigen, HER2.

Information about HER2 that is relevant to assessing its role as a tumor antigen can be located in and extracted from a number of different biological databases. Table 1 lists information relevant to assessing the suitability of HER2 as an antigen in a number of different cancers. HER2 (human epidermal growth factor receptor 2) is amplified in about 20–40 % of invasive breast cancers [63]. Whereas normal tissue generally has low expression of HER2, breast cancer cells can have up to 50 copies of the encoding ERBB2 gene and up to 100-fold increased protein expression [64], which correlates strongly with a poor clinical outcome. These properties make HER2 a good marker for tumor tissue. HER2 has four isoforms produced by alternative splicing and alternative initiation. The isoforms overlap in identity by slightly less than half of the protein sequences, and a number of somatic mutations detectable at the protein level have been characterized in HER2. Since HER2 is present on the cell surface in large numbers, it is suitable for targeting by both cellular and humoral immunity, and a number of both T cell and B cell HER2 epitopes have been identified [65, 66]. A selection of tools and databases useful for cataloging potential tumor antigens can be found in Table 2.

Table 1 Annotation of HER2 for assessing potential as tumor antigen
Table 2 Sample of analytical tools for cataloging tumor antigens

Prediction of potential T cell epitopes

Each of the cellular processes responsible for T cell epitope preprocessing is rate limiting in the classical T cell-mediated immunity. The prediction of immunogenicity is, therefore, a non-trivial task, which is divided into predictions of each epitope processing step.

HLA binding prediction

Prediction of peptide processing events, such as proteasomal cleavage [74–76] and TAP transport [77–79], has been explored, but evaluations suggest that these methods are still not optimal [80]. Algorithms for predicting HLA binding affinity are more accurate, and highly so for a number of HLA alleles. Most currently maintained algorithms have been thoroughly reviewed and benchmarked in [81].

Prediction of peptide binding affinity to HLA class I and class II can be performed with a host of prediction algorithms (extensively reviewed in [82, 83]). The overall best performing predictor for HLA class I binding is the artificial neural network (ANN) and weight matrix-based prediction tool, netMHC 3.2 [84], and the best for class II is the ANN predictor netMHCII 2.2 [85]. Other highly accurate classification algorithms include BIMAS [86], SYFPEITHI [87], novel ensemble methods PM and AvgTanh [88], and various averaging methods [89]. Pan-specific prediction algorithms such as NetMHCpan [90] include HLA allele sequence in the binding prediction, which has been shown to increase accuracy for certain alleles [81]. Additionally, these methods enable prediction for a large number of specific HLA alleles as well as HLA supertypes. A number of prediction algorithms combine prediction of HLA binding with prediction of proteasomal cleavage and TAP transport, for example, NetCTL [91] (for an extensive review of CTL prediction algorithms, please refer to [91]), but as peptide preprocessing predictions are not nearly as accurate as HLA binding predictions, including these additional algorithms does not improve the prediction.

A number of known HLA binders have been shown to be unable to elicit immune response, a phenomenon referred to as holes in the T cell repertoire [92]. Recently, it has been shown that stability of the peptide-HLA (pHLA) complex is likely a better predictor for immunogenicity of a peptide than the affinity of the binding, as immunogenic peptides are generally more stably bound to HLA [27]. Similarly, assessment of pHLA complex binding to the TCR has been explored as a predictor of immunogenicity [93]. However, at present, only 21 crystal structures of pHLA-TCR complexes have been completed, which is not a sufficient basis to train a generally applicable classifier. Other approaches to evaluating immunogenicity include prediction of T cell reactivity based on an array of physicochemical properties [94, 95].

To predict HLA binders of tumor cell samples, the tumor tissue exome should ideally be sequenced and potential epitopes should be predicted from the translated sequence. If tumor tissue sequence is not available, canonical sequences can be extracted from protein sequence databases such as UniProt [72] or from primary repositories for cancer sequences such as CGHub (https://cghub.ucsc.edu/). Predicted binders can be cross-referenced with a tumor antigen database such as TANTIGEN to check whether experimental validation has been previously performed.

Predicting HLA binders for 9-meric peptides in HER2 using netMHC 3.4 yields potential binders to a number of HLA alleles. A closer look at HLA A*02:01 reveals 52 predicted binders, of which one is an experimentally validated binder, found by cross-referencing with TANTIGEN. Some candidate binders are conserved across all isoforms and mutated forms of HER2, while others are found only in some isoforms. Table 3 shows peptides binding HLA A*02:01 that are either conserved in all isoforms or located at positions where all variant peptides bind HLA A*02:01, as discovered by block conservation analysis [96].
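
The scanning step behind such a prediction can be sketched in Python as follows. The code enumerates overlapping 9-mers and keeps those whose predicted affinity falls below a cutoff. The predictor is passed in as a plain function because no real prediction tool is called here; the toy sequence, the toy affinity table, and the 500 nM cutoff are assumptions for illustration and do not reproduce netMHC output.

```python
def sliding_peptides(sequence, length=9):
    """Yield (1-based start position, peptide) for every window of the given length."""
    for start in range(len(sequence) - length + 1):
        yield start + 1, sequence[start:start + length]


def select_binders(sequence, predict_ic50, length=9, max_ic50_nm=500.0):
    """Return (position, peptide, IC50) for peptides at or below the affinity cutoff.

    predict_ic50 is any callable mapping a peptide to a predicted IC50 in nM;
    lower values mean stronger predicted binding.
    """
    binders = []
    for position, peptide in sliding_peptides(sequence, length):
        ic50 = predict_ic50(peptide)
        if ic50 <= max_ic50_nm:
            binders.append((position, peptide, ic50))
    return binders


# Made-up sequence and affinities standing in for a real predictor's output
toy_sequence = "ACDEFGHIKLMN"
toy_ic50 = {"CDEFGHIKL": 35.0, "EFGHIKLMN": 480.0}


def toy_predictor(peptide):
    # Peptides missing from the toy table are treated as non-binders
    return toy_ic50.get(peptide, 50000.0)


binders = select_binders(toy_sequence, toy_predictor)
print(binders)
```

The resulting list can then be compared with experimentally validated epitopes, for instance by set intersection on the peptide strings.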

Table 3 Peptides from HER2 predicted to bind HLA A*02:01

Prior to preclinical testing of the predicted epitopes, experimental validation of the candidates is important. Since peptide preprocessing is still unknown for predicted HLA binders, appropriate peptide processing and in vivo binding should be confirmed experimentally before epitopes are included in vaccine constructs. Large-scale T cell epitope validation is enabled by mass spectrometry [98] and flow cytometry-based methods [99].

Human immune system diversity

HLAs are among the most polymorphic molecules in the human genome and represent the most variable factor of human immune recognition. The HLA region comprises more than 200 genes located on chromosome 6, and three different HLA classes are defined [100]. Only class I and class II are involved in adaptive immunity and thus are a main focus in this review. For each class, several major and minor proteins are defined, which are in turn classified into supertypes [101, 102], and 9,310 individual alleles are reported in Release 3.12.0 (April 17, 2013) of the IMGT/HLA database [103]. Additional data resources for HLA allele sequences, clinical data, and population frequencies can be found in dbMHC [104] and The Allele Frequency Net Database [105], among others. Further bioinformatics resources for HLA research have been thoroughly reviewed in [106].

Specificity of HLA molecules is instrumental in determining resistance and susceptibility to invading pathogens and cancers. Owing to heritability of HLA loci, specific alleles are often geographically clustered, meaning that some populations are more susceptible to, for example, EBV-related cancers [107]. T cell-mediated immunosurveillance of cancerous cells involves HLA restriction, which further complicates formulation of T cell-based therapies. Even if we ignore the variability of tumor antigens, the diversity of human immune response to T cell epitopes renders the identification of broadly applicable T cell-based immunotherapy targets highly challenging, and increases the search space of useful T cell targets in personalized therapies immensely.

The host HLA profile must be identified before predicting personalized targets, a task that can be done by DNA sequencing [108], RNA sequencing [109], or microarray-based approaches [110] (thoroughly reviewed in [100]). Analysis of sequencing data can be performed by a large number of tools reviewed in [111]. If the aim is to produce more general therapies, population coverage tools such as the PopCover algorithm [112], the Block Conservation analysis web server (http://met-hilab.bu.edu/blockcons), or the IEDB Population Coverage Calculation tool [113] can be applied to ensure broad coverage by a combination of immune targets. A selection of useful bioinformatics resources for prediction of potential T cell epitopes is listed in Table 4.
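
To give a feel for what a population coverage estimate involves, the Python sketch below computes coverage for a single HLA locus. It assumes Hardy-Weinberg proportions, so the probability that an individual carries at least one covered allele is 1 - (1 - p)^2, where p is the summed frequency of the covered alleles. This is a deliberate simplification and not the method implemented by PopCover or the IEDB tool; the allele frequencies are hypothetical.

```python
def single_locus_coverage(allele_frequencies, covered_alleles):
    """Fraction of a population carrying at least one covered allele at one locus.

    Assumes Hardy-Weinberg proportions: an individual draws two alleles
    independently, so coverage = 1 - (1 - p) ** 2 with p the summed
    frequency of the covered alleles.
    """
    covered = sum(allele_frequencies.get(allele, 0.0) for allele in covered_alleles)
    return 1.0 - (1.0 - covered) ** 2


# Hypothetical allele frequencies in one population, for illustration only
frequencies = {"A*02:01": 0.25, "A*01:01": 0.15, "A*03:01": 0.10}

coverage = single_locus_coverage(frequencies, {"A*02:01", "A*03:01"})
print(f"Estimated coverage: {coverage:.4f}")
```

Extending the estimate to several loci or to linked alleles requires haplotype-level frequencies, which is where the dedicated tools listed above become necessary.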

Table 4 Sample of analytical tools for discovery of T cell epitopes for cancer immunotherapy

Selection of potential epitopes for immunotherapy

Conservation and variability analysis for immunotherapy target selection is a multidimensional problem. If one aims to define targets for general immunotherapies applicable to a broad cohort of patients, antigen diversity must be studied across the patient population. Due to high variability, even personalized vaccine targets are likely to be unstable over time, and the somatic process driving the selection is difficult to predict. There are, however, strategies for inducing lasting immune response against tumors, including epitope selection with increased emphasis on sequence stability, the combination of multiple therapy targets, and multi-epitope vaccination strategies [122].

Selection of stable epitopes

Comparative analysis of gene and protein sequences can reveal de novo SNPs and other somatic mutations. Peptides found in mutated protein regions unique to the tumor are candidates for epitope prediction. However, peptides found in highly variable regions are likely subject to frequent mutations, which can potentially lead to loss of antigen immunogenicity. Owing to the heterogeneous nature of cancer and complex processes such as immunoediting, the landscape of tumor mutation is far from fully understood. However, potential epitopes can be analyzed for stability in the context of known mutations cataloged in databases such as COSMIC [123]. Similarly, splice variations of proteins and structural variations in the genome can influence the stability of a given epitope. This can be examined using databases such as DECIPHER [124] for chromosomal variations and UniProt [125] for protein isoforms. Identifying regions of limited variability and high stability, and choosing potential epitopes in these regions, may increase the likelihood of a sustained immune response. Variability also depends on the selection pressure exerted by the immune system during therapy, but regions of known high variability can be excluded.
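
Excluding candidates that overlap known variable positions can be sketched in Python as a simple interval check. The peptide positions and the mutated positions below are hypothetical; in practice the latter would be taken from a mutation catalog such as COSMIC and mapped onto the same protein coordinates.

```python
def stable_epitopes(candidates, mutated_positions):
    """Keep candidate epitopes that do not overlap any known mutated position.

    candidates: list of (1-based start position, peptide) tuples.
    mutated_positions: set of 1-based residue positions with known mutations.
    """
    stable = []
    for start, peptide in candidates:
        covered = range(start, start + len(peptide))
        if not any(position in covered for position in mutated_positions):
            stable.append((start, peptide))
    return stable


# Hypothetical candidates and mutation positions, for illustration only
candidates = [
    (10, "ACDEFGHIK"),
    (40, "LMNPQRSTV"),
    (75, "WYACDEFGH"),
    (100, "IKLMNPQRS"),
]
known_mutations = {44, 80, 120}

stable = stable_epitopes(candidates, known_mutations)
print(stable)
```

A filter of this kind only reflects mutations that have already been cataloged; it cannot anticipate variation that arises under immune selection during therapy.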

Multi-epitope strategies

Epitopes in mutating regions of target proteins are not necessarily excluded as a valuable target for immunotherapy. Treatments can be composed of multiple epitopes from one or more proteins [122], such that if immunogenicity of one epitope is lost, the remaining set of epitopes may continue to confer protection. In a multi-epitope setting, mutating regions of target proteins can be of value if they contain epitopes—even if just for a limited time or a limited fraction of the tumor cells. Analysis of metabolic pathways of tumor cells may reveal potential epitopes in multiple proteins complementary to each other, which collectively can provide protection. Theoretically, co-targeting multiple epitopes in multiple proteins in pathways essential for tumor fitness should increase probability of sustained response [126]. The network analysis of signaling pathways and the perturbations by oncogenes was recently shown to successfully identify oncogenic targets. A sequential application of anticancer drugs increased the collective efficiency of the drugs targeting oncogenic signaling pathways [127]. The sequential administration of immunotherapies targeting different epitopes may also be advantageous. Multi-epitope approaches carry the inherent risk of raising a dominant response against one, or a few, of the administered epitopes. However, this can, in theory, be avoided by multiple site vaccinations [128]. Additionally, it has been shown that immunotherapies sometimes facilitate immune responses against additional antigens, not included as targets in the therapy, by a process referred to as “epitope spreading” [129] or “provoked immunity” [130].

The predicted HLA binders of HER2 shown in Table 3 are filtered by conservation in known isoforms and mutated forms. As can be seen, four of the predicted binders are found in all known forms of HER2 (positions 689, 949, 953, and 954), whereas six are found only in some (positions 767 and 823). The latter six peptides are located in potentially unstable regions, but as observed at position 767, each of the four variants is predicted to bind HLA A*02:01, making them potentially useful in a multi-epitope setting. Note, however, that only 244 uniquely mutated samples of HER2 have been identified (UniProt, May 28, 2013), giving an estimate of the variability of HER2. Additionally, the frequency of each mutation is unknown, so some mutations may impact a peptide’s suitability as an immunotherapy target more than others.
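
The filtering logic described here can be approximated in Python as shown below. For each position in a set of gap-free, equal-length variant sequences, the code collects the distinct 9-mers and reports positions where the single conserved peptide, or every variant peptide, is a predicted binder. This is a simplified stand-in written for illustration and not the published block conservation algorithm [96]; the sequences and the binder set are made up.

```python
def classify_positions(aligned_variants, binders, length=9):
    """Label positions whose peptides are predicted binders in every variant.

    aligned_variants: equal-length, gap-free sequences of the same protein.
    binders: set of peptides predicted to bind the HLA allele of interest.
    Returns {1-based position: "conserved" or "all variants bind"}.
    """
    if len({len(sequence) for sequence in aligned_variants}) != 1:
        raise ValueError("variant sequences must have equal length")
    results = {}
    sequence_length = len(aligned_variants[0])
    for start in range(sequence_length - length + 1):
        peptides = {sequence[start:start + length] for sequence in aligned_variants}
        if not peptides <= binders:
            continue  # at least one variant peptide is not a predicted binder
        results[start + 1] = "conserved" if len(peptides) == 1 else "all variants bind"
    return results


# Made-up variants and predicted binders, for illustration only
variants = ["ACDEFGHIKLMN", "ACDEFGHIKLMQ", "ACDEFGHIKLMN"]
predicted_binders = {"CDEFGHIKL", "EFGHIKLMN", "EFGHIKLMQ"}

labels = classify_positions(variants, predicted_binders)
print(labels)
```

Positions labeled "all variants bind" correspond to the situation described for position 767, where the sequence varies but each variant peptide is still a predicted binder.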

Targeting multiple antigenic proteins

Targeting multiple epitopes within the same protein can be valuable to avoid loss of immunogenicity caused by mutations or splice variation. However, targeting a single protein does not address loss of immunogenicity by downregulation or complete loss of protein expression. It can, therefore, be valuable to target multiple epitopes in different proteins. Combined targeting of several antigens increases therapy flexibility and may increase the magnitude of the response. This approach is especially valuable when targeting proteins of similar or compensatory function in redundant pathways [131]. An examination of HER2 interactions recorded in STRING (a database of direct (physical) and indirect (functional) associations [132]) with known tumor antigens (from TANTIGEN) reveals a large number of proteins homologous to, co-expressed with, or interacting with HER2, based on recorded co-expression, mining of the primary literature, or recorded interactions in specialized databases. The ten tumor antigens with the highest-confidence relationships to HER2 are shown in Fig. 2. One of these is EGFR, which has previously been examined as a co-target with HER2 [131, 133]. Cross-referencing TANTIGEN shows that EGFR harbors T cell epitopes for potential immunotherapy targeting. In a similar fashion, functional homologues can be examined as novel targets.
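
Ranking candidate co-targets from an exported interaction list can be sketched in Python as follows. The input is assumed to be a list of (partner, confidence score) pairs already retrieved for the protein of interest; no database interface is called, and the placeholder gene names and scores are hypothetical.

```python
def top_cotargets(interactions, known_antigens, limit=10):
    """Return the highest-confidence interaction partners that are known antigens.

    interactions: list of (partner symbol, confidence score) pairs.
    known_antigens: set of symbols listed in a tumor antigen database.
    """
    hits = [(partner, score) for partner, score in interactions if partner in known_antigens]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:limit]


# Hypothetical interaction scores for one query protein, for illustration only
interactions = [
    ("GENE_Y", 0.70),
    ("EGFR", 0.99),
    ("GENE_X", 0.95),
    ("GENE_Z", 0.40),
]
known_antigens = {"EGFR", "GENE_Y", "GENE_Z"}

cotargets = top_cotargets(interactions, known_antigens, limit=2)
print(cotargets)
```

GENE_X scores highly in this toy input but is dropped because it is not in the antigen set, which mirrors the restriction to known tumor antigens described above.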

Fig. 2

Protein–protein interaction (PPI) network for HER2 and interacting tumor antigens (from TANTIGEN). PPI networks may indicate possible compensatory activity such as that for HER2 and EGFR, making interaction networks useful for elucidating additional potential targets. Nodes represent proteins and edges correspond to functional interactions. Thicker edges signify higher confidence in the interaction. Image was generated using the STRING database

Another strategy is to target multiple epitopes in proteins from different interacting pathways. The HER2 and the estrogen receptor (ER) signaling pathways are the dominant drivers of cell proliferation in 85 % of breast cancers, which makes antigens of these pathways desirable therapy targets [134]. Another multi-epitope strategy could, therefore, involve targeting several antigens in both pathways to avoid therapy resistance if a single epitope is lost. Examining the HER2 and ER pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the molecular signatures database MSigDB [135] reveals multiple potential antigens in each pathway. Lastly, exploring the ClinicalTrials.gov database (http://www.clinicaltrials.gov) may help direct the selection of useful targets. Table 5 lists a sample of data resources useful for co-target discovery.

Table 5 Sample of data resources for interaction and pathway analysis useful for co-target discovery

Future perspectives

At present, accurate epitope predictions are limited to cellular responses, although prediction of antibody response is highly studied [139]. Computational methods for identification of T cell epitopes also have limitations, in that peptide preprocessing predictions are not yet as accurate as peptide binding prediction algorithms. Additionally, availability of tumor sequences represents a bottleneck in conservation and variability analyses, but this is likely to be remedied in the near future, as high-throughput sequencing becomes cheaper and more efficient. Another issue currently being addressed is that intra-tumor diversity may not be adequately captured by current methods, and may, therefore, impact the efficacy of immunotherapies and other therapies alike. Lastly, immunotherapy is largely a field of research that must be addressed using proteomics analyses rather than genomics analyses, the latter having been far more prolific in the past decade. Common to all these limitations is that they are currently being addressed in the wet laboratory to different extents, and all aspects of this progress will increase the need for bioinformatics tools and experts.

Bioinformatics for antibody-based therapies

Immunotherapies utilizing cellular surface proteins, such as antibody-based therapies [140] or the adoptive transfer of chimeric antigen receptor (CAR)-modified T cells [141], also stand to benefit from bioinformatics-supported target selection. There are, however, limitations to the current computational tools for B cell epitope prediction as well as a lack of a dedicated data resource for cellular expression of surface proteins. Prediction of B cell epitopes has long been explored using a variety of approaches, but the accuracy of these algorithms, particularly for discontinuous epitopes, remains suboptimal, and their practical utility is limited [139, 142]. Computational prediction of membrane spanning protein regions has been explored for several decades [143], but before prediction of surface epitopes or selection of novel CAR targets can take place, a comprehensive data resource of validated surface protein expression should be assembled to provide the foundation for such exploratory analyses.

In silico assessment of susceptibility to immunotherapy

Successfully induced T cells, raised against a number of tumor T cell antigens, have been observed in peripheral blood, and yet clinical responses to immunotherapies have been limited. This indicates that barriers to immune response exist within the tumor environment and that these barriers must be taken into account when planning appropriate treatment modalities [144]. Personalized immunotherapy treatments are, therefore, likely to benefit from thorough analysis of the genetic and proteomic host factors related to tumor immune escape mechanisms, and thereby of a patient’s susceptibility to a given immunotherapy.

In response to the immune system’s role in curbing cancer cell growth and in immunoediting, it has recently been proposed that the immune system should be included in the traditional histopathological classification of tumors. The classical TNM staging system describes the extent of tumorigenesis based on tumor burden (T), the presence of cancer cells in draining lymph nodes (N), and status of metastases (M). In addition to these scores, a so-called immunoscore (I) can be determined on the basis of two leukocyte populations, namely cytotoxic CD8+ T cells and memory CD8+ T cells [145, 146]. Comparison of the infiltration rate of these two cell types in the center of the tumor and in the invasive margin of the tumor determines the “I” score for the tumor. In two independent cohorts, patients with a high I score had significantly fewer relapses and improved overall survival compared with patients with low I scores [147].
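The scoring logic can be sketched in a few lines. The example below assumes one commonly described formulation, in which each of the four measurements (two cell populations, two tumor regions) is called high or low against a cutoff and the I score is the number of high calls, from 0 to 4. The class name, field names, and cutoff values are hypothetical placeholders and are not taken from [145–147].

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Densities:
    """Cell densities for one tumor (e.g., cells per mm^2); names are hypothetical."""
    cytotoxic_center: float
    cytotoxic_margin: float
    memory_center: float
    memory_margin: float


def immunoscore(observed: Densities, cutoffs: Densities) -> int:
    """Count how many of the four measurements exceed their cutoff (0 to 4)."""
    return sum(
        1
        for value, cutoff in zip(astuple(observed), astuple(cutoffs))
        if value > cutoff
    )


if __name__ == "__main__":
    # Placeholder cutoffs; real cutoffs would have to be derived from cohort data.
    cutoffs = Densities(100.0, 200.0, 50.0, 80.0)
    patient = Densities(150.0, 20.0, 60.0, 8.0)
    print(f"I{immunoscore(patient, cutoffs)}")
```

In this formulation a tumor with high infiltration of both populations in both regions scores I4, and one with low infiltration throughout scores I0. How the cutoffs are chosen is the critical step and is deliberately left open here.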

The TNM-I classification scheme, in conjunction with predictive biomarkers for immune response, could provide a reasonable estimate of the suitability of immunotherapy as part of a treatment modality in any given patient. However, the applicability of this approach is limited in two ways. Firstly, not all cancers are resected, and even fewer have significant material left after normal histological assessment has been completed. Markers and leukocyte profiles that can classify patients on the basis of immunological markers in peripheral blood are, therefore, desirable, although no such markers have been successfully correlated with clinical response to antigen-based immunotherapies [144]. Secondly, the complex interplay between genetic and proteomic elements makes it hard to identify single biomarkers that are accurately predictive, and functional studies on the topic are lacking. As a result, no tools or data resources for immunotherapy susceptibility biomarkers exist as of yet.

Conclusions

Traditionally, mass experimental screening has been the primary tool to elucidate cancer immunotherapy targets, a process that could be streamlined by systematic application of bioinformatics to patient antigen expression and sequence profiles in conjunction with publicly available biological data. The conceptual framework put forth in this review was applied to HER2 as an example of the proposed computational analyses. These methods will increase in accuracy as the genomic and proteomic landscape of tumor antigens is uncovered in greater breadth and depth in the wet laboratory. As the response of tumor cells to immunotherapies is gradually uncovered and the body of biological tumor data grows, so will the need for bioinformatics to organize, store, and analyze these data.