Background

A B-cell epitope is defined as the part of an antigen recognized by either a particular antibody molecule or a particular B-cell receptor of the immune system. It may be linear (continuous), i.e. a short contiguous stretch of amino acids, or conformational (discontinuous), consisting of sequence segments that are scattered along the protein sequence and are brought into spatial proximity when the protein is folded [1]. It has been estimated that more than ninety percent of B-cell epitopes are conformational [2, 3]. The main purpose of B-cell epitope prediction is to facilitate rational vaccine design [4]. Furthermore, synthetic peptides mimicking epitopes, as well as anti-peptide antibodies, have many applications in the diagnosis of human diseases [5, 6]. B-cell epitope prediction is therefore very important in medical research.

Though B-cell epitopes can be directly identified using many biochemical or physical experiments, such as X-ray crystallography of antibody-antigen (Ab-Ag) complexes, these experiments are usually costly and time-consuming, and are not always successful [7]. Computational methods to predict B-cell epitopes are much more efficient and cost-effective. However, they mainly focus on the prediction of linear epitopes [8–14], because only a few antigens are completely annotated with respect to their conformational epitopes, which makes it difficult to develop a conformational epitope prediction method. To the best of our knowledge, DiscoTope [15] and CEP [16] are the only two methods for conformational epitope prediction that are based on antigen structure information. Recently, researchers tested and evaluated existing epitope prediction methods on benchmark datasets, and concluded that the accuracies of these methods are not high enough to significantly reduce the experimental workload [17–19]. Combining experiments with computational methods can substantially improve the accuracy of epitope prediction at a modest cost in biological experiments. This combination has therefore attracted the attention of many researchers, especially the integration of computational methods with random peptide libraries. Several researchers have reported encouraging preliminary results using phage-display peptide libraries [20–29]. Mimotopes can be selected from phage-displayed random peptide libraries by affinity selection with monoclonal antibodies (mAb), so-called biopanning. The mimotopes are selected for their capacity to bind the Ab directed against a given antigen (Ag). The mimotopes and the Ag are both recognized by the same Ab paratope, and thus mimotopes are expected to mimic natural epitopes. The purpose of the computational approach is to analyze the set of mimotopes and then to localize the mimicked region, which is regarded as the epitope candidate. Thereafter, biological experiments, such as site-directed mutagenesis and deletion analysis, may be implemented for further validation.

Generally, a computational method approaches this goal in three steps: (i) representation of the surface residues of the antigen; (ii) search (or alignment) of the mimotopes (or motifs derived from the mimotopes) on the antigen surface; (iii) output of the epitope candidates based on screening and clustering. Pizzi et al [20] were the first to combine computational methods with experimental results to assign epitopes. Recently, they published an improved method named MEPS [27]. In MEPS, the surface of the antigen is represented by a collection of peptides below a certain length. The motifs derived from the mimotopes are searched against this surface, and alignment tools like BLAST can be used directly in the method. However, finding all simple paths of a given length (i.e. sequences of neighboring residues) on a surface graph representing the exposed residues of the antigen is an NP-hard (Non-deterministic Polynomial-time hard) problem [29]. Subsequently, several computational algorithms were proposed in which new strategies were adopted [21–26, 28, 29]. For example, SiteLight [23] divides the antigen surface into overlapping patches and then aligns each mimotope with each patch based on the maximal bipartite matching algorithm. Mapitope [22, 28] converts a set of mimotopes into overlapping residue pairs, ranks the pairs by their occurrences to obtain a set of major statistically significant pairs (SSP), and finally searches the 3D structure of the antigen for these pairs and links the SSP into clusters on the antigen surface. More recently, PepSurf [29], an epitope prediction program based on a color-coding algorithm [30], was proposed to search all possible simple paths in the surface graph of an antigen and adopted a clustering strategy for epitope prediction. However, the running time of PepSurf depends exponentially on the length of a mimotope. Therefore, on its online server, each mimotope must be no longer than 14 amino acids. Although epitopes and mimotopes are functionally equivalent, they seldom share a similar sequence. The mimicry is thought to rely on similarities in physicochemical properties and spatial organization. Moreover, the binding site of an antibody is a surface, not just a continuous sequence, so the epitope prediction problem is outside the scope of classical string alignment algorithms. Searching all the surface residues of an antigen of interest for the mimotopes is problematic. Therefore, although numerous phage display library based algorithms have been proposed to characterize B-cell epitopes, the precise localization of the interaction site mimicked by the mimotopes on the antigen surface is still an open challenge [25, 29].

In this research, we presented a method, Pep-3D-Search, based on mimotope analysis for B-cell epitope prediction. In Pep-3D-Search, a promising ACO (Ant Colony Optimization) algorithm was proposed to search matching paths on an antigen surface with respect to the query mimotopes or a motif. The ACO algorithm adopted a novel heuristic strategy that makes it powerful in dealing with longer mimotopes or motifs. Moreover, the P-value calculation algorithm and the DFS (Depth-First Search) algorithm, a graph search algorithm, were used to screen and cluster the result paths at the output stage. A group of test cases, which were all taken from published data, were applied to Pep-3D-Search for validation of its performance. The experimental results showed that the predictive performance of Pep-3D-Search was comparable to other epitope prediction algorithms, and some novel, rational results were provided.

Implementation

Algorithm flow

The Pep-3D-Search algorithm flow is shown in Figure 1. Its input consists of a 3D structure of an antigen (a Protein Data Bank (PDB) [31] file) and a set of mimotopes or a motif. Pep-3D-Search identifies all exposed residues of the given antigen and creates a surface graph from them. The algorithm can be employed in two modes. The first is the mimotope mode, which uses the ACO algorithm to search for paths on the antigen surface that match each query mimotope. All paths are scored against the corresponding mimotope according to an amino-acid substitution matrix. Putative candidate epitopes are then picked out by the P-value calculation algorithm and the DFS algorithm. The second is the motif mode, which maps the motif directly onto the antigen surface using the ACO algorithm and takes the top-scoring paths as epitope candidates.

Figure 1

An algorithmic flowchart of Pep-3D-Search. Given the 3D structure of an antigen, Pep-3D-Search identifies all the surface residues and creates a surface graph. After that, it can be used in two modes: mimotope or motif. In mimotope mode, every mimotope received as an input is aligned to the antigen surface and the epitope candidates are obtained through screening and clustering of the matched paths. In motif mode, a motif received as an input is mapped on to the antigen surface. Subsequently, the top scoring paths are output directly as the epitope candidates.

Graphical representation of the antigen surface

A B-cell epitope is typically a solvent-accessible surface consisting of some 15–20 exposed residues derived from 2 to 3 discontinuous segments of the antigen [32]. Whether or not a residue is exposed can be determined from its solvent accessible surface area (SASA). In this study, the exposed residues of the antigen were determined in three steps: (i) the total SASA of a residue composed of N atoms was calculated as $\mathrm{SASA} = \sum_{i=1}^{N} A_i$, where $A_i$ is the SASA of the i-th atom, determined by the Surface Racer program 4.0 [33] with a probe sphere of radius 1.4 Å, corresponding to a water molecule; (ii) the relative solvent accessibility (RSA) of a residue was calculated as the ratio of the SASA of the residue to the maximum exposed surface of the same residue type in an extended ALA-X-ALA tripeptide, where the maximum exposed surface of residue X in the ALA-X-ALA tripeptide is that calculated by Ahmad et al. [34]; (iii) a residue was considered exposed if its RSA was greater than a predefined threshold (default = 5%). A surface graph representing the exposed residues, G = (V, E), was defined, where V is the vertex set consisting of all exposed residues and E is the edge set, in which two vertices are connected by an edge if the Euclidean distance between them is not greater than a predefined threshold. In Pep-3D-Search, three methods were provided to calculate the distance between residue pairs on the antigen surface. First, the distance between two residues was taken as the distance between the Cα atoms of the two amino acids, which may better reflect the backbone positions. Second, the distance between the Cβ atoms was used, which may better reflect the side chain positions (the Cα atom was used for glycine, which has no Cβ atom). Third, the minimum distance over all pairs of heavy atoms of the two residues was used. In Pep-3D-Search, we used CA, CB and AHA to denote the three methods, respectively, and took CA as the default with a distance threshold of 7 Å.
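The construction of the surface graph can be illustrated with a minimal Python sketch. It assumes that the RSA value and the Cα coordinates of every residue have already been computed elsewhere; the function name `build_surface_graph`, the input layout and the toy residues are hypothetical and are not part of Pep-3D-Search.

```python
import math

def build_surface_graph(residues, rsa_threshold=0.05, dist_threshold=7.0):
    """Build an adjacency map of exposed residues.

    residues: dict mapping a residue id to (rsa, (x, y, z)), where rsa is a
    fraction (0.05 = 5%) and the coordinates are CA positions in angstroms.
    This input layout is an assumption made for the sketch.
    """
    # Step (iii): keep residues whose RSA exceeds the threshold.
    exposed = {rid: xyz for rid, (rsa, xyz) in residues.items()
               if rsa > rsa_threshold}
    graph = {rid: set() for rid in exposed}
    ids = sorted(exposed)
    # Connect two exposed residues if their distance is within the threshold.
    for index, a in enumerate(ids):
        for b in ids[index + 1:]:
            if math.dist(exposed[a], exposed[b]) <= dist_threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph

# Toy input (made-up values): A3 is buried, A4 is exposed but isolated.
toy = {
    "A1": (0.30, (0.0, 0.0, 0.0)),
    "A2": (0.20, (5.0, 0.0, 0.0)),
    "A3": (0.01, (6.0, 0.0, 0.0)),
    "A4": (0.50, (20.0, 0.0, 0.0)),
}
surface = build_surface_graph(toy)
```

The pairwise loop is quadratic in the number of exposed residues; a spatial index would scale better, but the simple form keeps the correspondence with the definition of G = (V, E) visible.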

The ACO algorithm

ACO is a multi-agent heuristic algorithm used for combinatorial optimization. It was inspired by the capability of real ants to find the shortest path between their nest and a food source. The original ACO algorithm was introduced by Dorigo et al [35] for solving the traveling salesman problem (TSP). Since then, many researchers have extended the original algorithm and have successfully applied their new algorithms to large-scale TSP and to other problems such as vehicle routing, scheduling and routing in Internet-like networks [36]. The successful application of ACO algorithms to the TSP inspired us to develop a new heuristic algorithm for solving the mimotope-based epitope prediction problem. Our aim was to find a simple path on a surface graph that yielded the alignment to a mimotope or a motif with a maximal score. As in the TSP, our problem was an ordering problem, i.e. the aim of the algorithm was to put the different vertices in a certain order. However, several different aspects had to be considered: (i) our problem is a partial vertex permutation of a graph, in which the number of vertices in the permutation equals the number of residues in the mimotope (or the motif); (ii) all edges between neighboring vertices are treated as having the same length, so the score of a resulting path depends only on the vertex permutation and not on the path length; (iii) in a resulting path, some insertions/deletions may be permitted. Therefore, some new strategies were needed for solving our problem. The details of these strategies are described below.

Definition of the pheromone trail and the heuristic information

The pheromone trail and the heuristic information are two important parameters in the ACO algorithm. Theoretically, the pheromone trail can give the artificial ants a global guide in their decision-making, whereas the heuristic information can guide these ants to explore better paths locally. The quality of an ACO application depends greatly on the definition of the meaning of the pheromone trail and the heuristic information [35]. According to the features of our problem, pheromone and the heuristic information for each edge on surface graph were defined as follows:

Let τ(k)(i, j) be the pheromone from vertex i to vertex j at the k-th searching step in a solution, which encodes the favorability of visiting vertex j after vertex i, where 1 ≤ k ≤ L and L is the number of vertices in a resulting path (i.e. the number of residues in the mimotope or motif). In our approach, τ(k)(i, j) was assigned an initial value at the start and was updated after each iteration.

Let η(k)(i, j) be the heuristic information from vertex i to vertex j at the k-th searching step in a solution, which encodes the preference for visiting vertex j after vertex i, where 1 ≤ k ≤ L and L is the number of vertices in a resulting path. The value of η(k)(i, j) was assigned according to the input mimotope (or motif) and the amino-acid substitution matrix used (see Scoring amino acid similarities). For example, let the mimotope be "ANYNATRGTVSA", and suppose that one row of the amino-acid substitution matrix is: "A←A(2.14), K(0.44), I(0.39), G(0.25), V(0.07), D(-0.15), S(-0.22), N(-0.36), Q(-0.36), T(-0.4), F(-0.61), C(-0.61), E(-0.7), L(-0.73), M(-0.91), Y(-0.91), H(-1.15), P(-1.15), R(-1.67), W(-2.61)", which gives the score of each amino-acid substitution for alanine (A). The first, fifth and twelfth amino acids of the mimotope are all alanine (A). In order to make the ants tend towards the maximal alignment score at each step, for k = 1, 5 and 12 we set η(k)(i, j) = 2.14 if vertex j is an alanine (A) and i is any neighbor of j; in the same way, η(k)(i, j) = 0.44 if vertex j is a lysine (K) and i is any neighbor of j, and so on, down to η(k)(i, j) = -2.61 if vertex j is a tryptophan (W) and i is any neighbor of j. In this way, η(k)(i, j) can be defined for all 1 ≤ k ≤ 12 and for each edge of the surface graph, and it naturally represents the preference of an ant at vertex i for vertex j at each searching step.

In the case of a motif, let Q = (q_1, q_2, ..., q_L) be the motif; then q_k (1 ≤ k ≤ L) may be a set of amino acids (e.g. [STDE], see Epitope prediction based on motif mapping), a gap (-) or the character "X", which stands for any amino acid. When q_k is a set of amino acids (denoted S), η(k)(i, j) is set to the maximal value among all the scoring values of vertex i substitution for vertex j, where vertex j belongs to the set S and i is any neighbor of j. When q_k is a gap or the character "X", η(k)(i, j) is set to the average value of the substitution matrix, provided that j and i are neighbors.
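The assignment of the heuristic information can be sketched in Python using the alanine row quoted above. The function `eta_for_position` and the dictionary layout `matrix[query][target]` are hypothetical. For a motif position that is a set of amino acids, the sketch takes the maximum over the members of the set of the score against the residue at vertex j, which is one possible reading of the description and should be treated as an assumption.

```python
# Row of the substitution matrix for alanine, as quoted in the text.
ALA_ROW = {
    "A": 2.14, "K": 0.44, "I": 0.39, "G": 0.25, "V": 0.07,
    "D": -0.15, "S": -0.22, "N": -0.36, "Q": -0.36, "T": -0.4,
    "F": -0.61, "C": -0.61, "E": -0.7, "L": -0.73, "M": -0.91,
    "Y": -0.91, "H": -1.15, "P": -1.15, "R": -1.67, "W": -2.61,
}

def eta_for_position(q_k, target, matrix, matrix_average):
    """Heuristic value for stepping onto a vertex of residue type `target`.

    q_k is the k-th query symbol: one amino acid ("A"), a set written as a
    string ("STDE"), a gap ("-") or the wildcard "X".
    """
    if q_k in ("X", "-"):
        # Wildcards and gaps use the average of the substitution matrix.
        return matrix_average
    # Single residue or set: best score over the allowed query residues
    # (assumed interpretation for sets).
    return max(matrix[q][target] for q in q_k)

MIMOTOPE = "ANYNATRGTVSA"
matrix = {"A": ALA_ROW}  # only the alanine row is known from the text
# Steps at which the alanine row applies (1-based positions).
alanine_steps = [k for k, q in enumerate(MIMOTOPE, start=1) if q == "A"]
eta_lysine = eta_for_position("A", "K", matrix, matrix_average=0.0)
```

Because η(k)(i, j) depends only on the step k and on the residue type at vertex j, it can be precomputed once per query position rather than once per edge.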

Scoring amino acid similarities

Algorithms for the alignment of protein sequences typically measure similarity using a substitution matrix with scores for all possible exchanges of one amino acid with another. The choice of substitution matrix directly influences the performance of the algorithms. However, the optimal substitution matrices used by the existing epitope prediction algorithms are generally not compatible with each other. Following comparison experiments, we chose the substitution matrix M_Blosum62 of Mayrose et al [29] as the default for the similar match mode. Moreover, we defined the substitution matrix STRICT as the default for the exact match mode, in which the score for substituting an amino acid with itself is 1, whereas the score for substituting any two different amino acids is 0. A simple path on the surface graph is a path in which all vertices are distinct. When an ant has no unvisited edge connecting it to other vertices, it is allowed to jump to a vertex that is not connected to it by an edge, provided that the distance between the two vertices is less than twice the predefined distance threshold. In this situation, a gap can be left on its path. For each unmatched residue, a penalty was added.

According to the above analysis, two methods for scoring the similarity of amino acids are proposed. For mimotope analysis, the similarity score h(q_i, p_i) of amino acids q_i and p_i is calculated by Equation (1):

\[
h(q_i, p_i) =
\begin{cases}
\mathit{minimum} + \mathit{penalty} & \text{if } q_i \text{ or } p_i \text{ is a gap} \\
s(q_i, p_i) & \text{otherwise}
\end{cases}
\tag{1}
\]

where minimum is the minimum value in the substitution matrix used; the value of penalty is set between 0 and -0.5 (default = -0.5); and s(q_i, p_i) is the observed substitution score in the substitution matrix used.

In the case of motif analysis, let Q = (q_1, q_2, ..., q_L) be the motif and P = (p_1, p_2, ..., p_L) be the resulting path on the surface graph; then the similarity score h(q_i, p_i) (1 ≤ i ≤ L) is calculated by Equation (2):

\[
h(q_i, p_i) =
\begin{cases}
\mathit{average} & \text{if } q_i \text{ is X or - (a gap)} \\
\mathit{minimum} + \mathit{penalty} & \text{if } q_i \text{ is an amino acid and } p_i \text{ is a gap} \\
s(q_i, p_i) & \text{if both } q_i \text{ and } p_i \text{ are amino acids}
\end{cases}
\tag{2}
\]

where average is the average value in the substitution matrix used; minimum is the minimum value in the substitution matrix used; the value of penalty is set between 0 and -0.5 (default = -0.5); and s(q_i, p_i) is the observed substitution score in the substitution matrix used.
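The two scoring rules can be written out as a short Python sketch. It uses the STRICT matrix defined earlier (1 for identical amino acids, 0 otherwise) because its entries are fully specified in the text. The function names, the gap symbol `-` and the dictionary layout are assumptions, and motif positions that are sets of amino acids are not handled here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"

# STRICT matrix: 1 for an exact match, 0 for any other pair.
STRICT = {a: {b: 1.0 if a == b else 0.0 for b in AMINO_ACIDS}
          for a in AMINO_ACIDS}

def matrix_values(matrix):
    """Flatten all entries of a nested-dict substitution matrix."""
    return [value for row in matrix.values() for value in row.values()]

def h_mimotope(q, p, matrix, penalty=-0.5):
    """Equation (1): score of query residue q against path residue p."""
    if q == GAP or p == GAP:
        return min(matrix_values(matrix)) + penalty
    return matrix[q][p]

def h_motif(q, p, matrix, penalty=-0.5):
    """Equation (2), for motif symbols that are single residues, X or a gap."""
    values = matrix_values(matrix)
    if q in ("X", GAP):
        return sum(values) / len(values)
    if p == GAP:
        return min(values) + penalty
    return matrix[q][p]

match_score = h_mimotope("A", "A", STRICT)      # identical residues
gap_score = h_mimotope("A", GAP, STRICT)        # minimum + penalty
wildcard_score = h_motif("X", "A", STRICT)      # matrix average
```

With STRICT the minimum is 0, so a gap costs exactly the penalty, and the average over the 400 entries is 20/400 = 0.05.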

Building a solution

The pheromone trail and the heuristic information defined above will now be used by the ants to find the best solutions. Suppose the number of residues in the mimotope is L. Every ant starts from a virtual origin named "O", which is permitted to connect to any vertex on the graph. An ant then randomly chooses a vertex as its first vertex and builds a solution by moving from one vertex to another connected vertex. The process does not stop until the ant has visited L vertices on the graph. At the k-th searching step (1 ≤ k ≤ L), the probability that an ant A at vertex i will choose vertex j as its next vertex is given by equation (3):

\[
P_A(i, j) =
\begin{cases}
\dfrac{[\tau^{(k)}(i, j)]^{\alpha}\,[\eta^{(k)}(i, j)]^{\beta}}{\sum_{g \in J_A(i)} [\tau^{(k)}(i, g)]^{\alpha}\,[\eta^{(k)}(i, g)]^{\beta}} & \text{if } j \in J_A(i) \\
0 & \text{otherwise}
\end{cases}
\tag{3}
\]

where τ(k)(i, j) and η(k)(i, j) are the pheromone and the heuristic information between i and j at the k-th searching step, respectively. The preference of an ant A at vertex i for vertex j is thus defined partly by the pheromone between i and j, and partly by the heuristic favorability of j after i. Parameters α and β define the relative importance of the pheromone information and the heuristic information (default α = β = 2). J_A(i) is the set of vertices that are connected to i and have not yet been visited by ant A.
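The transition rule of equation (3) can be sketched in Python as follows. The function name and the dictionary inputs keyed by edge (i, j) are hypothetical. The text does not state how negative heuristic values are handled before exponentiation, so the sketch simply assumes non-negative τ and η for the step in question; the uniform fallback for an all-zero denominator is likewise an assumption.

```python
def transition_probabilities(i, candidates, tau_k, eta_k, alpha=2.0, beta=2.0):
    """Probability of moving from vertex i to each candidate vertex.

    candidates: unvisited neighbors of i, i.e. the set J_A(i).
    tau_k, eta_k: dicts keyed by (i, j) holding the step-k pheromone and
    heuristic values; assumed non-negative in this sketch.
    """
    weights = {j: (tau_k[(i, j)] ** alpha) * (eta_k[(i, j)] ** beta)
               for j in candidates}
    total = sum(weights.values())
    if total == 0.0:
        # Assumption: fall back to a uniform choice when every weight is zero.
        return {j: 1.0 / len(weights) for j in weights}
    return {j: weight / total for j, weight in weights.items()}

# Toy step: equal pheromone, heuristic values 2.0 and 1.0 (made-up numbers).
tau_k = {(0, 1): 1.0, (0, 2): 1.0}
eta_k = {(0, 1): 2.0, (0, 2): 1.0}
probs = transition_probabilities(0, [1, 2], tau_k, eta_k)
```

With α = β = 2 the weights are 4 and 1, so the better-matching vertex is chosen four times as often. A next vertex could then be drawn with `random.choices(list(probs), weights=list(probs.values()))`.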

The fitness function

In order to guide the algorithm towards good solutions, a fitness function was defined to assess the quality of the solutions. Let Q = (q_1, q_2, ..., q_L) be a mimotope (or a motif) of length L and P = (p_1, p_2, ..., p_L) be a simple path on the surface graph obtained by an ant. Then, the alignment score between Q and P is defined as $S(Q, P) = \sum_{i=1}^{L} h(q_i, p_i)$, where h(q_i, p_i) denotes the amino acid similarity score between q_i and p_i. Here, the average of the alignment score between Q and P is chosen to define the fitness of the solution P:

\[
F(P) = \frac{S(Q, P)}{L}
\tag{4}
\]
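As a small worked example of the alignment score and the fitness, the Python sketch below scores a three-residue query against a three-residue path. The exact-match scoring function passed in is a stand-in chosen for brevity, and the names `alignment_score` and `fitness` are hypothetical.

```python
def alignment_score(query, path, h):
    """S(Q, P): sum of per-position similarity scores h(q_i, p_i)."""
    if len(query) != len(path):
        raise ValueError("query and path must have the same length L")
    return sum(h(q, p) for q, p in zip(query, path))

def fitness(query, path, h):
    """F(P): the alignment score averaged over the query length L."""
    return alignment_score(query, path, h) / len(query)

def exact_match(q, p):
    """Stand-in scoring function: 1 for identical residues, 0 otherwise."""
    return 1.0 if q == p else 0.0

score = alignment_score("ANY", "ANW", exact_match)  # two of three match
value = fitness("ANY", "ANW", exact_match)
```

Dividing by L makes fitness values of paths obtained for queries of different lengths directly comparable, which is useful when a single threshold is applied to select elite ants.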

Updating the pheromone trail

After all the ants had completed one iteration, the pheromones were updated. First, we defined the elite ants as follows: an ant was appointed as an elite ant only if the fitness value of the path it obtained was greater than a threshold. Only the elite ants were permitted to leave pheromones on their own paths. The pheromones were updated according to equations (5) and (6):

\[
\tau^{(k)}(i, j) = (1 - \rho)\,\tau^{(k)}(i, j) + \Delta\tau(i, j)
\tag{5}
\]

\[
\Delta\tau(i, j) =
\begin{cases}
F(P) & \text{if } (i, j) \in \text{path } P \text{ of the elite ant} \\
0 & \text{otherwise}
\end{cases}
\tag{6}
\]

Equation (5) consists of two terms, and k denotes the k-th searching step. The first term makes the pheromone on all edges decay. The speed of this decay is defined by the evaporation parameter ρ (0 < ρ < 1; default ρ = 0.05). The second term increases the pheromone on all the edges visited by the elite ants. The amount of pheromone that an elite ant deposits on an edge is defined by the fitness value of the path created by that ant, as in equation (6). In this way, the increase of pheromone on an edge depends on the number of elite ants that use this edge and on the quality of the solutions found by those ants.

In order to enhance exploration of ants and overcome the premature convergence of the ACO algorithm, an adaptive strategy was employed to determine the threshold (which was used to select the elite ants): (i) initially, the threshold was set to 1; (ii) within 300 iterations, if the total number of the elite ants determined in each iteration was less than 5, then the new threshold was set to equal the original threshold minus 0.1; within 20 iterations, if the total number of the elite ants determined in each iteration was greater than 10, then the new threshold was set to equal the original threshold plus 0.1. In addition, according to Stützle and Hoos [37], we defined an upper and lower limit (τmax and τmin) for the pheromone values. Stützle and Hoos defined τmax and τmin algebraically based on the probability of constructing the best solution found when all the pheromone values have converged to either τmax or τmin. In our approach, the aim of the ACO algorithm was mainly to provide a set of good quality solutions, rather than a best solution. Therefore we defined τmax as being equal to the maximum value minus the minimum value in the amino-acid substitution matrix used, and τmin as zero.
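The evaporation, deposit and bounding of the pheromone can be sketched in Python. The sketch covers equations (5) and (6) and the clamping to [τmin, τmax], but not the adaptive elite threshold. The function name, the key layout `(k, i, j)`, the treatment of the virtual origin "O" as the predecessor of the first vertex, and the starting value of 0 for edges that have no pheromone entry yet are all assumptions.

```python
def update_pheromone(tau, elite_paths, rho=0.05,
                     tau_min=0.0, tau_max=float("inf")):
    """Return the pheromone table after one iteration.

    tau: dict mapping (k, i, j) to the pheromone of edge i -> j at step k.
    elite_paths: list of (path, fitness) pairs produced by elite ants.
    """
    # First term of equation (5): evaporation on every edge.
    new_tau = {key: (1.0 - rho) * value for key, value in tau.items()}
    # Second term: each elite ant deposits its fitness along its own path.
    for path, fitness in elite_paths:
        full = ["O"] + list(path)  # step 1 leaves the virtual origin
        for k in range(1, len(full)):
            key = (k, full[k - 1], full[k])
            new_tau[key] = new_tau.get(key, 0.0) + fitness
    # Keep every value inside [tau_min, tau_max].
    return {key: min(tau_max, max(tau_min, value))
            for key, value in new_tau.items()}

# Toy table with made-up values; one elite ant walked a -> b with fitness 0.5.
tau = {(1, "O", "a"): 1.0, (2, "a", "b"): 1.0, (2, "a", "c"): 1.0}
updated = update_pheromone(tau, [(["a", "b"], 0.5)])
bounded = update_pheromone(tau, [(["a", "b"], 0.5)], tau_max=1.2)
```

In the toy run the edges used by the elite ant rise from 1.0 to 0.95 + 0.5 = 1.45, the unused edge decays to 0.95, and with τmax = 1.2 the reinforced edges are capped at 1.2.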

Output of epitope candidates

While running the ACO algorithm, all paths obtained by the elite ants were stored in a local database. How were putative epitope candidates produced from this set of paths? Depending on the kind of input, i.e. a set of mimotopes or a motif, two different strategies were adopted. For a set of mimotopes, a clustering strategy was employed (described in the following sections); for a motif, the n highest scoring paths were chosen directly as the epitope candidates.

P-value calculation for a path

Typically, a set of input mimotopes contains a number of amino-acid sequences with different lengths. In order to rationally assess the paths obtained with different mimotopes, we calculated the probability of randomly obtaining a path with a specific score, i.e. the P-value of the path. According to the work by Mayrose et al [29], the distribution of the scores of random paths can be approximated using an extreme value distribution, whose parameters are fitted from the empirical distribution using the method of moments. To obtain a rational empirical distribution of alignment scores, we generated a set of m (default m = 10^6) random simple paths on the surface graph for every mimotope, and each random simple path was then aligned to the mimotope.
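The P-value calculation can be sketched in Python. The text specifies only "an extreme value distribution" fitted by the method of moments; the sketch assumes the type I (Gumbel) form for maxima, for which the moment estimates are β = s·√6/π and μ = mean − γ·β, with γ the Euler-Mascheroni constant. The function names are hypothetical, and the use of the sample standard deviation is also an assumption.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(scores):
    """Method-of-moments estimates (mu, beta) of a Gumbel distribution."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    beta = sd * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta

def p_value(score, mu, beta):
    """P(X >= score) under the fitted Gumbel distribution."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 for accuracy at small P-values.
    return -math.expm1(-math.exp(-z))

# Made-up scores standing in for alignments of random simple paths.
random_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
mu, beta = fit_gumbel_moments(random_scores)
p_high = p_value(10.0, mu, beta)
p_low = p_value(2.0, mu, beta)
```

In practice the list of random scores would hold the m alignment scores generated for one mimotope, and the fitted parameters would be reused for every candidate path of that mimotope.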

Creating a weighted graph of the result paths

We then selected those paths whose P-values were less than or equal to 10^-3 as the result paths and created a weighted graph of the result paths G = (V, E), where V is the vertex set consisting of all the result paths, and E is the edge set, in which two vertices are connected by an edge if they share at least one residue. In addition, the weight of each vertex in G was defined as the P-value of the path.

Clustering the result paths based on DFS algorithm

The weighted graph defined above was generally not connected. Each connected component of the graph, which may consist of several connected paths, can be regarded as a potential epitope candidate. Here, the DFS algorithm [30] was employed to compute all the connected components of the weighted graph. According to Mayrose et al [29], the surface accessible areas of 95% of all available epitopes in the PDB are not greater than 2000 Å². Moreover, a native epitope generally contains fewer than 40 residues. Therefore, if the surface accessible area of a connected component was greater than 2000 Å² or the number of residues in the connected component was greater than 40, the component was reduced in size by iteratively removing paths until the remaining part met the conditions. In each such iteration, the algorithm chose a path for removal such that the remaining connected component kept the maximum score. The score of a connected component was defined as the sum of -log(P-value) of the paths within it. Finally, the n connected components with the highest scores were output as the n epitope candidates (default n = 3).
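The clustering step can be sketched in Python as a depth-first search over the graph of result paths. The sketch links two paths when they share at least one residue, collects the connected components and scores each one by the sum of -log(P-value). The size-reduction step (2000 Å² or 40 residues) is omitted, the natural logarithm is an assumption because the base is not stated, and the function name and input layout are hypothetical.

```python
import math

def cluster_paths(paths):
    """Group result paths into connected components, best score first.

    paths: list of (residue_ids, p_value) pairs.
    Returns a list of (score, sorted path indices).
    """
    count = len(paths)
    residue_sets = [set(residues) for residues, _ in paths]
    # Two paths are adjacent if they share at least one residue.
    adjacency = {a: [b for b in range(count)
                     if b != a and residue_sets[a] & residue_sets[b]]
                 for a in range(count)}
    seen = set()
    clusters = []
    for start in range(count):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited path.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        score = sum(-math.log(paths[index][1]) for index in component)
        clusters.append((score, sorted(component)))
    clusters.sort(key=lambda cluster: cluster[0], reverse=True)
    return clusters

# Made-up result paths: the first two overlap at residue A2.
result_paths = [(["A1", "A2"], 1e-4), (["A2", "A3"], 1e-3), (["B7", "B8"], 1e-5)]
clusters = cluster_paths(result_paths)
```

An explicit stack is used instead of recursion so that large components do not run into Python's recursion limit.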

Results

Epitope prediction based on mimotope analysis

In order to assess the predictive performance of Pep-3D-Search, we applied it to ten test cases (see Table 1), which were all taken from previously published studies of similar methods. These test cases fulfilled the following requirements: (i) a set of mimotopes was derived by screening with an antibody in a biopanning experiment; (ii) a 3D structure of the antibody-antigen complex was available; (iii) the native epitope of each test case had been crystallographically defined. Because PepSurf [29] and Mapitope follow a similar policy of fully scanning the mimotopes (or, in Mapitope [22, 28], neighboring amino acid pairs (AAP) derived from the mimotopes) against the 3D structure of the antigen, we mainly compared the results of Pep-3D-Search with those of these two methods.

Table 1 The test cases used for assessment of Pep-3D-Search's performance in mimotope analysis.

Epitope prediction using antibody-antigen test cases

The first test group (antibody-antigen test cases in Table 1) contained eight test cases from Mapitope, PepSurf and Mimox [26]. The first test case (labeled 1jrh in Table 1) contains 59 mimotopes of 5 residues in length. Lang et al [38] further analyzed the detailed interactions between the mAb A6 and the interferon gamma receptor (IFNgR) by selecting 59 fragments of the IFNgR mutants with high affinity for the mAb A6 by phage display. These fragments can thus be regarded as mimotopes of the IFNgR and the crystal structure of the mAb A6-IFNgR complex has been resolved (PDB id: 1jrh). In the second test case (labeled 1bj1 in Table 1), mimotopes were obtained by a similar experiment to the first case, but here the Fab fragment of a humanized neutralizing antibody (also known as rhuMAb VEGF) was mutated and selected for binding to the vascular endothelial growth factor (VEGF) by phage display [39]. The structure of the rhuMAb-VEGF complex has been deposited in the PDB (PDB id: 1bj1). In test cases three to eight, the six sets of mimotopes were obtained by screening phage display libraries with the 17b [22], 13b5 [22], Herceptin [40], Bo2C11 [41], Cetuximab Fab [42] and 82D6A3 IgG [43] antibodies respectively (see Table 1), and their corresponding Ab-Ag complex structures have been resolved (PDB id: 1g9m, 1e6j, 1n8z, 1iqd, 1yy9 and 2adf). In addition, the native epitope for each test case (1–8) is present in the CED database [44]. We analyzed the mimotopes in the test cases with our Pep-3D-Search, PepSurf and Mapitope, respectively. The results predicted by the three algorithms and evaluation in terms of the Matthews correlation coefficient (MCC) [45], sensitivity and precision are shown in Table 2. The results in Table 2 show that our Pep-3D-Search successfully predicted all the mimotopes in all eight test cases. 
In particular, for the test cases 1bj1, 1n8z and 1yy9, the MCC, sensitivity and precision values of Pep-3D-Search were considerably superior to those of PepSurf and Mapitope. For the test case 1iqd, PepSurf yielded the best performance (MCC: 0.1272; sensitivity: 0.2581; precision: 0.5); Mapitope achieved the highest precision (0.9375) but gave the lowest MCC (-0.3502) and a low sensitivity (0.1415); Pep-3D-Search yielded an inferior prediction (MCC: 0.0356; sensitivity: 0.1277; precision: 0.375) with default parameters, but obtained a better prediction when the distance parameter CB with threshold 6.5 was used (MCC: 0.1604; sensitivity: 0.2326; precision: 0.625, see Table 3). Furthermore, for the test cases 1jrh, 1g9m, 1e6j and 2adf, Pep-3D-Search and PepSurf gave better predictions, while Mapitope failed in the test cases 1e6j and 2adf.
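The three evaluation measures used above can be illustrated with a minimal sketch. It assumes the standard per-residue definitions of sensitivity, precision and MCC computed from a confusion matrix over the antigen residues; the exact counting conventions behind Table 2 are not restated in this section and may differ. The function name `evaluate` and the toy residue labels are hypothetical.

```python
import math

def evaluate(predicted, true_epitope, all_residues):
    """Return (mcc, sensitivity, precision) for one test case.

    Standard confusion-matrix definitions are assumed here; they are
    not taken from the Pep-3D-Search implementation.
    """
    predicted = set(predicted)
    true_epitope = set(true_epitope)
    all_residues = set(all_residues)

    tp = len(predicted & true_epitope)                 # predicted and in epitope
    fp = len(predicted - true_epitope)                 # predicted, not in epitope
    fn = len(true_epitope - predicted)                 # epitope residues missed
    tn = len(all_residues - predicted - true_epitope)  # correctly left out

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return mcc, sensitivity, precision

# Toy antigen with 20 residues (hypothetical labels R1..R20).
surface = [f"R{i}" for i in range(1, 21)]
true_epitope = surface[:5]   # R1..R5
predicted = surface[2:8]     # R3..R8
mcc, sens, prec = evaluate(predicted, true_epitope, surface)
```

In this toy case three of the six predicted residues lie in the five-residue epitope, so precision is 0.5 and sensitivity is 0.6.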

Table 2 Evaluation and comparison of the performances of Pep-3D-Search.
Table 3 Comparison of the predictive performance of Pep-3D-Search with different distance parameters (CB).

Using Pep-3D-Search for the prediction of protein-protein interacting sites

In order to compare Pep-3D-Search with previously published algorithms, we applied it to detect the interface residues of the interacting proteins for the two test cases, 1avz and 1hx1 (protein-protein test cases in Table 1), which were taken from PepSurf. Rickles et al [46] used the Fyn-SH3 domain to select a semi-combinatorial random peptide library and obtained 18 affinity-selected peptides. The co-crystal of Fyn-SH3 domain with its interacting protein Nef and Fyn-SH2 domain is now available (PDB id: 1avz). The second test case was taken from the work by Takenaka et al. [47]. They screened a random phage library against the 70 kDa heat shock cognate (Hsc70) protein and obtained a set of peptides that bind Hsc70. The structure of Hsc70 with its interacting protein Bag chaperone regulator has been deposited in the PDB (PDB id: 1hx1). For each of the above test cases, the prediction was compared to the 'true' protein-protein interacting site that was inferred using the 'Contact Map Analysis' server [48].

From Table 2, it can be seen that both Pep-3D-Search and PepSurf obtained better results than Mapitope. In particular, for the test case 1hx1, the results showed a complementarity between Pep-3D-Search and PepSurf. The 24 contacting residues of protein Hsc70 and Bag chaperone regulator inferred by the Contact Map Analysis server were R205 KA (208–209) IE (211–212) MK (215–216) LE (218–219) IDTLIL (221–226) R234 RK (237–238) VK (241–242) Q245 L248 D252 E255; the 39 contacting residues predicted by Pep-3D-Search were GNS (150–152) E155 V157 K161 H164 K167 K171 AD (173–174) L200 K202 D204 R205 R206 KA (208–209) I211 M215 L218 FKD (230–232) R234 LK (235–236) RK (237–238) G239 VK (241–242) K243 Q245 AF (246–247) L248 AE (249–250); the 25 contacting residues suggested by PepSurf were K161 KHL (163–165) KS (167–168) E182 GI (185–186) D204 R205 R206 KA (208–209) I211 MK (215–216) I217 LE (218–219) E220 DT (222–223) L248 E255. Among the true contacting residues, six (R234, R237, K238, V241, K242 and Q245) were predicted by Pep-3D-Search but missed by PepSurf, while five (K216, E219, D222, T223 and E255) were predicted by PepSurf but missed by Pep-3D-Search.
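The complementarity described for 1hx1 can be checked with simple set operations on the residue lists quoted above. The sketch below only re-encodes those lists; the helper `expand` and the variable names are introduced here for illustration.

```python
def expand(runs):
    """Expand (letters, first_position) runs, e.g. ("KA", 208) -> K208, A209."""
    return {f"{aa}{start + i}" for letters, start in runs
            for i, aa in enumerate(letters)}

# Contacting residues inferred by the Contact Map Analysis server (24).
true_site = expand([("R", 205), ("KA", 208), ("IE", 211), ("MK", 215),
                    ("LE", 218), ("IDTLIL", 221), ("R", 234), ("RK", 237),
                    ("VK", 241), ("Q", 245), ("L", 248), ("D", 252), ("E", 255)])

# Residues predicted by Pep-3D-Search (39).
pep3d = expand([("GNS", 150), ("E", 155), ("V", 157), ("K", 161), ("H", 164),
                ("K", 167), ("K", 171), ("AD", 173), ("L", 200), ("K", 202),
                ("D", 204), ("R", 205), ("R", 206), ("KA", 208), ("I", 211),
                ("M", 215), ("L", 218), ("FKD", 230), ("R", 234), ("LK", 235),
                ("RK", 237), ("G", 239), ("VK", 241), ("K", 243), ("Q", 245),
                ("AF", 246), ("L", 248), ("AE", 249)])

# Residues suggested by PepSurf (25).
pepsurf = expand([("K", 161), ("KHL", 163), ("KS", 167), ("E", 182),
                  ("GI", 185), ("D", 204), ("R", 205), ("R", 206), ("KA", 208),
                  ("I", 211), ("MK", 215), ("I", 217), ("LE", 218), ("E", 220),
                  ("DT", 222), ("L", 248), ("E", 255)])

# True contacts found by one method but not the other.
only_pep3d = (pep3d & true_site) - pepsurf
only_pepsurf = (pepsurf & true_site) - pep3d
# True contacts recovered when both predictions are pooled.
combined = (pep3d | pepsurf) & true_site

print(sorted(only_pep3d), sorted(only_pepsurf), len(combined))
```

Pooling the two predictions recovers 18 of the 24 contacting residues, compared with 13 for Pep-3D-Search alone and 12 for PepSurf alone.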

The overall performance of each method was measured by the average MCC, sensitivity and precision values. Compared with PepSurf and Mapitope, Pep-3D-Search achieved the best average MCC and precision values and the second-best average sensitivity value (the average MCC, sensitivity and precision values were 0.1758, 0.3642 and 0.6948 for Pep-3D-Search; 0.1589, 0.3944 and 0.5409 for PepSurf; and 0.1053, 0.3404 and 0.4081 for Mapitope, see Figure 2). In addition, Pep-3D-Search provides three parameters for calculating neighbor residue pairs on the antigen surface: CB, CA and AHA. The experimental results that examined Pep-3D-Search's performance with different parameters are listed in Tables 3 to 5. The overall performance analyses in terms of average MCC, sensitivity and precision values are shown in Figure 3. Generally, Pep-3D-Search obtained better results with the parameter CA (distance threshold = 7) than with the other parameters. The parameter CA with distance threshold 7 was therefore set as the default.
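How a distance threshold turns residue coordinates into neighbor residue pairs can be sketched as follows. This is an illustration only: it assumes that each residue is represented by a single point (for example its C-alpha atom when the parameter is CA), that distances are in Å, and that two residues are neighbors when the points lie within the threshold. How Pep-3D-Search handles the AHA parameter is not sketched here. The function name `neighbor_pairs` and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations

def neighbor_pairs(coords, threshold):
    """Return residue pairs whose representative atoms lie within `threshold`.

    coords: dict mapping a residue label to an (x, y, z) tuple.
    """
    pairs = set()
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(coords.items()), 2):
        if math.dist(xyz_a, xyz_b) <= threshold:
            pairs.add((res_a, res_b))
    return pairs

# Toy coordinates (hypothetical, not taken from any PDB entry).
toy_coords = {
    "A1": (0.0, 0.0, 0.0),
    "B2": (6.0, 0.0, 0.0),
    "C3": (14.0, 0.0, 0.0),
}
pairs_at_7 = neighbor_pairs(toy_coords, threshold=7.0)
pairs_at_8 = neighbor_pairs(toy_coords, threshold=8.0)
```

Raising the threshold from 7 to 8 adds the B2-C3 pair in this toy example, which illustrates why the choice of threshold changes the surface graph that is searched.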

Figure 2

Overall performance evaluation of Pep-3D-Search using average MCC, sensitivity and precision values. From Figure 2, it can be seen that Pep-3D-Search obtained the best average MCC, precision values and second-best average sensitivity value; PepSurf obtained the best average sensitivity value and second-best average MCC and precision values; Mapitope gave inferior results in comparison with the above two methods.

Figure 3

Overall performance analysis of Pep-3D-Search with different distance parameters CB, CA and AHA. From Figure 3, it can be seen that with parameter CA (DT (distance threshold) = 7), Pep-3D-Search obtained the best average MCC value (0.1758), precision value (0.6948), and the better average sensitivity (0.3642). In Pep-3D-Search the parameter CA with distance threshold 7 is set as the default.

Table 4 Comparison of the predictive performance of Pep-3D-Search with different distance parameters (CA).
Table 5 Comparison of the predictive performance of Pep-3D-Search with different distance parameters (AHA).

Epitope prediction based on motif mapping

Pep-3D-Search also offers the option of predicting epitopes by motif mapping. The motif sequence can be derived from the set of mimotopes by using multiple sequence alignment tools such as ClustalW [49] or directly using the Mimox web service, and it is thus supposed to contain residues important for the interaction of the Ab and the Ag. After mapping the motif sequence onto the antigen surface, Pep-3D-Search obtains a set of matched paths, and the top-scoring paths are selected as the epitope candidates. In order to assess the performance of Pep-3D-Search in this mode, six test cases were applied and the results are listed in Table 6 and Supplementary Tables S1 to S5 [see Additional file 1]. Here, we describe one experiment, the test case 1e6j (Table 6), in detail. The test case 1e6j is taken from Mapitope and Mimox. Enshell-Seijffers et al [22] used the mAb 13B5 (recognizing HIV-1 capsid protein p24) to screen a phage-displayed random peptide library and obtained a set of 16 mimotopes. The structure of p24 with 13B5 has been resolved [PDB: 1e6j], and the 13B5 epitope, which is composed of ALGPAATEE (204–210, 212, 213) TA (216–217), has been recorded in the CED database as CE0170. Using Mapitope, Enshell-Seijffers et al suggested that the 13B5 epitope residues might consist of E187 D197 A204 GPAA (206–209) EE (212–213) A217, of which all but E187 and D197 are epitope residues. It should be noted that when all parameters were set to default, Mapitope predicted the candidate residues A194 N195 P196 D197 C198 A217 (i.e. among the six predicted residues, only one was an epitope residue). Furthermore, Huang J et al [26] derived a motif sequence, [DE]V [FM]GPL [STDE]TX-X [DE], from the 16 mimotopes using Mimox. Mimox cannot directly analyze a motif sequence of this type, so they derived three fragments, GPL, ET and EE, from the motif by manual parsing. Using each of the three fragments in turn as the motif sequence, they predicted the 13B5 epitope using Mimox.
For the fragment GPL, the top two candidates given by Mimox were G206 P207 L205 and G106 P49 L52; for the fragment ET, the top three candidates were E212 T216, E213 T216 and E212 T210; for the fragment EE, the top three candidates were E28 E29, E29 E28 and E212 E213. Using Pep-3D-Search, we directly mapped the motif sequence, [DE]V [FM]GPL [STDE]TX-X [DE], onto the antigen surface of p24 to predict the 13B5 epitope. In similarity match mode (i.e. using the substitution matrix M_Blosum62, see Scoring amino acid similarities) and with parameter AHA (distance threshold = 4), the top ten candidates predicted by Pep-3D-Search are listed in Table 6. From Table 6, we can see that all ten candidates were successfully localized in the epitope region. The eighth-ranked candidate gave the best result: D197 I201 L205 G206 P207 A209 E213 T210 M214 A217 T216 E212. Taking the top ten candidates together, we obtained a total of 25 residues suggested by Pep-3D-Search, which overlap 10 of the 11 residues of the 13B5 epitope. The other five experiments for assessing the performance of Pep-3D-Search followed the same procedure, and their results are listed in Supplementary Tables S1 to S5 [see Additional file 1]. These experiments show that Pep-3D-Search is effective and efficient in predicting epitopes in motif mode.
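To make the motif notation concrete, the sketch below parses a motif such as [DE]V [FM]GPL [STDE]TX-X [DE] into a list of positions and counts how many positions a candidate path satisfies. Several points are assumptions made for illustration rather than the behaviour of Pep-3D-Search or Mimox: bracketed groups are read as alternative residues, X as a wildcard, and spaces and hyphens as separators that are skipped. The scoring is a plain match count, not the M_Blosum62-based similarity score mentioned above. The names `parse_motif` and `count_matches` are hypothetical.

```python
def parse_motif(motif):
    """Parse a motif string into a list of allowed-residue sets.

    None marks a wildcard position (X). Spaces and hyphens are skipped,
    which is an assumption about the notation.
    """
    positions = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "[":
            end = motif.index("]", i)
            positions.append(frozenset(motif[i + 1:end]))
            i = end + 1
        elif ch in " -":
            i += 1
        elif ch == "X":
            positions.append(None)
            i += 1
        else:
            positions.append(frozenset(ch))
            i += 1
    return positions

def count_matches(residues, positions):
    """Count motif positions satisfied by a path given as one-letter codes."""
    return sum(1 for aa, allowed in zip(residues, positions)
               if allowed is None or aa in allowed)

motif_positions = parse_motif("[DE]V [FM]GPL [STDE]TX-X [DE]")
full_match = count_matches("DVFGPLSTAAE", motif_positions)
partial_match = count_matches("AVFGPLSTAAA", motif_positions)
```

Under these assumptions the motif has 11 positions; the second toy sequence fails only at the first and last positions.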

Table 6 Epitope prediction of the test case 1e6j (chain: P) based on motif mapping : motif sequence taken from Mimox is [DE]V [FM]GPL [STDE]TX-X [DE]; native epitope recorded in CED (id: CE0170) is ALGPAATEE (204–210, 212, 213) TA (216–217); parameters of Pep-3D-Search are similarity mode and AHA (distance threshold = 4).

The searching capability of Pep-3D-Search

In general, the searching algorithm has a great impact on the effectiveness and efficiency of an epitope prediction program. Therefore it is the most important part of the whole design process. In Pep-3D-Search, the ACO algorithm, a kind of heuristic algorithm, is employed for searching mimotopes or motifs on an antigen surface. In order to evaluate the capability of the ACO algorithm for searching target paths of various lengths on the antigen surface, we took gp120 (the envelope protein of HIV; chain G; PDB id: 1g9m; the antigen has 304 residues, see Table 2) as the target antigen and randomly selected paths with lengths from 9 to 25 (odd numbers) residues on the antigen surface as the search goals. As shown in Figure 4, a path of 25 residues on the gp120 surface was first localized, E351 S347 K343 Q344 K348 I272 N234 G237 N94 K97 D99 M100 K487 V489 L226 V488 A224 A219 Y217 C218 Q246 V84 L86 N88 T240, in which the Euclidean distance between any two neighboring residues is less than or equal to 7.5 Å. From this path, 9 sub-paths with lengths from 9 to 25 (odd numbers) residues were randomly selected as the test cases (see Table 7 and Supplementary Table S6 in Additional file 1). Here, we describe one experiment in detail to explain the search process for the target path with 21 residues on the gp120 surface. The target path is E351 S347 K343 Q344 K348 I272 N234 G237 N94 K97 D99 M100 K487 V489 L226 V488 A224 A219 Y217 C218 Q246 (see Table 7). We used the target path itself and mutations of it as input sequences for Pep-3D-Search to localize the target path on the gp120 surface. For the mutated inputs, some residues of the original sequence were randomly changed (with mutation rates varying from 10% to 30%). From Table 7, it can be seen that Pep-3D-Search quickly localized the target path within 5000 iterations.
When the input sequence was the target path itself (ESKQKINGNKDMKVLVAAYCQ), the path localized by Pep-3D-Search with an iteration number of 5000 was E351 S347 K343 Q344 K348 I272 N234 G237 N94 K97 D99 V488 K487 V489 L226 V245 A224 A219 Y217 C247 Q246, which overlaps 19 of the 21 residues in the target path; when the iteration number was set to 25000, Pep-3D-Search precisely localized the target path. When the iteration number was 30000, the path localized by Pep-3D-Search was E351 S347 K343 Q344 K348 I272 N234 G237 N94 K97 D99 M100 K487 V489 L226 V488 A224 A219 Y217 C247 Q246. Though the twentieth residue (C247) on the localized path is not identical to the corresponding one (C218) on the target path, both are cysteine residues. When a mutated sequence was used as the input sequence, Pep-3D-Search still localized the region of the target path. For example, using ESKDR INGNC DMKVH VAAYA Q (the mutation rate is 25%) as input, Pep-3D-Search gave the top-ranked output E267 T232 K231 N229 K485 F233 N234 G237 N94 ___ D99 M100 K487 V488 ___ I491 G222 A219 F223 A224 Q246 with 10000 iterations. As shown in Table 7, although this was the worst result obtained for this test case, the output still overlaps 10 of the 21 residues in the target path.
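The overlap figures quoted for this test case can be reproduced directly from the residue lists given above. The sketch below assumes that "overlap" means the number of residues shared by the localized path and the target path regardless of position; the function names are introduced here for illustration.

```python
target = ("E351 S347 K343 Q344 K348 I272 N234 G237 N94 K97 D99 M100 "
          "K487 V489 L226 V488 A224 A219 Y217 C218 Q246").split()

# Path localized with 5000 iterations when the input is the target itself.
found_5000 = ("E351 S347 K343 Q344 K348 I272 N234 G237 N94 K97 D99 V488 "
              "K487 V489 L226 V245 A224 A219 Y217 C247 Q246").split()

# Top-ranked output for the mutated input ("___" marks a gap).
found_mutated = ("E267 T232 K231 N229 K485 F233 N234 G237 N94 ___ D99 M100 "
                 "K487 V488 ___ I491 G222 A219 F223 A224 Q246").split()

def overlap(found, target_path):
    """Number of residues shared by two paths, ignoring their order."""
    return len(set(found) & set(target_path))

def fraction_changed(original, mutated):
    """Fraction of positions that differ between two equal-length sequences."""
    return sum(a != b for a, b in zip(original, mutated)) / len(original)

target_seq = "".join(res[0] for res in target)    # one-letter input sequence
mutated_seq = "ESKDRINGNCDMKVHVAAYAQ"              # spaces removed

print(overlap(found_5000, target), overlap(found_mutated, target))
print(fraction_changed(target_seq, mutated_seq))
```

The mutated input differs from the target sequence at 5 of 21 positions, which is the closest achievable value to the stated 25% rate for a sequence of this length.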

Figure 4

A path on the gp120 (the envelope protein of HIV) surface. The path on the gp120 surface, which is used to evaluate the searching capability of Pep-3D-Search, is composed of 25 residues, E351 S347 K343 Q344 K348 I272 N234 G237 N94 K97 D99 M100 K487 V489 L226 V488 A224 A219 Y217 C218 Q246 V84 L86 N88 T240, in which the Euclidean distance between any two neighboring residues is less than or equal to 7.5 Å.

Table 7 Evaluation of the Pep-3D-Search's searching capability.

The experiments on the other eight test cases for assessing Pep-3D-Search's searching capability followed the same procedure as the one described above. Their results are listed in Supplementary Table S6 [see Additional file 1]. The experiments demonstrate the strong search capability of Pep-3D-Search: as the length of the query sequence increased, the number of iterations that Pep-3D-Search needed to localize the target paths on the protein surface did not change significantly. Thus, Pep-3D-Search can be used for quickly localizing the epitope regions mimicked by longer mimotopes (more than 20 residues), and the proposed ACO algorithm has further potential in other applications involving sequence-structure alignment.

Discussion

In this study we developed a method, Pep-3D-Search, for epitope prediction based on mimotope and motif analysis. An ACO algorithm was proposed for aligning a 1D mimotope sequence (or a motif sequence) to the 3D structure of an antigen, and a screening strategy based on P-value calculation and a clustering strategy based on the DFS algorithm were employed in localizing epitope candidate regions. Compared with competing methods, Pep-3D-Search adopts a simple and natural strategy to deal with matches, gaps and deletions in aligning a sequence to an antigen surface, which makes it more efficient and effective, not only for sequence search but also for motif discovery.

We conducted different sets of experiments to assess our method's performance. The results show that our method is comparable to other similar methods. In some test cases, our method is superior to the others or can provide complementary information to them. In addition, in order to examine the searching capability of our method, a set of test cases with sequences of different lengths was constructed. The experiment showed that our method has a strong capability for searching sequences on a structure: as the length of the query sequence increased (up to 25 residues), the number of iterations that Pep-3D-Search needed to precisely localize the sequence did not change significantly. Thus the method has further potential for localizing the epitope regions mimicked by longer mimotopes. For example, using an mRNA display technique, one can obtain affinity-selected peptides of more than 20 residues against an antibody [50]. Moreover, the method also has potential for other applications, such as querying pathways in protein-protein interaction networks [51]. The Pep-3D-Search algorithm depends on several parameters that may influence its prediction accuracy, such as the iteration number, the gap penalty and the distance threshold defining two neighbor residues. However, because of the limited availability of benchmark datasets, we only examined a limited set of values for each parameter and were constrained in properly learning these parameters. In our experiments, varying these parameters within a reasonable range did not significantly influence the prediction results (see Tables 3 to 5).

The Pep-3D-Search algorithm is basically divided into three steps: generating random paths on the surface graph of an antigen for P-value calculation (which is not needed for motif analysis), searching the optimal paths for each mimotope (or a motif), and clustering these paths into several epitope candidates. The running time of the algorithm mainly depends on the number of graph edges, the number of mimotopes, the length of each mimotope (or the motif), and the number of random paths generated for P-value calculation. For a mimotope with 14 or 15 amino acids, generating 10^6 random paths to obtain the empirical distribution of alignment scores for P-value calculation may take about 10 minutes (using a PC with an Intel Core 2 processor at 1.86 GHz); searching the optimal paths may take a few minutes (the iteration number is 20000 by default); clustering the paths can be completed in a few seconds. So the main computational burden of the algorithm comes from the P-value calculation.
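The third step, clustering paths into epitope candidates, can be illustrated with a depth-first traversal over a graph of paths. This is a sketch under an assumed linkage rule, namely that two paths belong to the same cluster when they share at least `min_shared` residues; the criterion actually used by Pep-3D-Search is described in its Methods and may differ. The function name `cluster_paths` and the toy paths are hypothetical.

```python
def cluster_paths(paths, min_shared=1):
    """Group paths into clusters using an iterative depth-first search.

    paths: list of residue-label lists. Two paths are linked when they
    share at least `min_shared` residues (an assumed criterion).
    Returns a list of clusters, each a sorted list of path indices.
    """
    n = len(paths)
    residue_sets = [set(p) for p in paths]
    linked = [[j for j in range(n)
               if j != i and len(residue_sets[i] & residue_sets[j]) >= min_shared]
              for i in range(n)]

    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:                      # depth-first traversal
            i = stack.pop()
            component.append(i)
            for j in linked[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters

# Toy paths with hypothetical residue labels.
toy_paths = [["A1", "B2"], ["B2", "C3"], ["D4", "E5"], ["C3", "F6"]]
toy_clusters = cluster_paths(toy_paths)
```

Here paths 0, 1 and 3 are chained together through shared residues and form one candidate region, while path 2 stays on its own.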

Theoretically, the estimation of the statistical parameters for an alignment score distribution function requires a large number of random paths on the surface graph of the antigen for aligning to the mimotopes. In practice, the number of paths generated at random is determined according to a given time limit, so that the algorithm can make a trade-off between the computational time consumed and the accuracy of the final results. We set the number to 10^6 by default. In general, when a set of mimotopes is to be analyzed, the running time of the algorithm will increase linearly with the number of mimotopes. However, because a collection of paths generated at random for P-value calculation can be reused by all mimotopes of the same length in the set, the actual running time of the algorithm is much shorter in practice.
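The reuse of random paths across mimotopes of the same length can be sketched as a cache keyed by path length. The code below is illustrative and makes several assumptions: random paths are drawn as self-avoiding random walks on the surface graph, and the P-value is taken as the empirical tail fraction of random scores (with a pseudocount), which is a simpler stand-in for fitting the parameters of a score distribution function as described above. All names are hypothetical, and the cache assumes a single antigen graph and a fixed pool size.

```python
import random

def random_path(graph, length, rng):
    """Self-avoiding random walk of `length` nodes; None if it gets stuck."""
    path = [rng.choice(sorted(graph))]
    while len(path) < length:
        options = [n for n in graph[path[-1]] if n not in path]
        if not options:
            return None
        path.append(rng.choice(options))
    return path

_pool_cache = {}  # path length -> list of random paths (one antigen assumed)

def random_pool(graph, length, n_paths, seed=0):
    """Return the shared pool of random paths for this length, building it once."""
    if length not in _pool_cache:
        rng = random.Random(seed)
        pool = []
        while len(pool) < n_paths:  # assumes paths of this length exist
            path = random_path(graph, length, rng)
            if path is not None:
                pool.append(path)
        _pool_cache[length] = pool
    return _pool_cache[length]

def empirical_p_value(observed, random_scores):
    """Fraction of random scores at least as high as the observed score."""
    hits = sum(1 for s in random_scores if s >= observed)
    return (hits + 1) / (len(random_scores) + 1)

# Toy surface graph: a ring of six nodes (hypothetical).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
pool = random_pool(ring, length=4, n_paths=100)
same_pool = random_pool(ring, length=4, n_paths=100)  # served from the cache
```

With such a cache, a set of mimotopes that all have the same length pays the path-generation cost only once, which is the saving described above.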

We plan to improve our method by further research in at least four areas: 1) improving the method used to identify surface-exposed residues in an antigen; 2) exploring more effective strategies for searching a path and for dealing with matches, gaps and deletions when aligning a sequence to the antigen surface in the ACO algorithm; 3) choosing a better amino-acid substitution matrix in the scoring procedure for specialized applications; and 4) studying more efficient methods for P-value calculation.

Conclusion

This research makes two contributions to the field of epitope prediction. Firstly, a promising ACO algorithm was proposed to align a sequence or a motif to an antigen surface. Secondly, an application program, Pep-3D-Search, was developed for epitope prediction based on mimotope or motif analysis. As a stand-alone program in this area, Pep-3D-Search is publicly accessible [see Additional file 2]. The program was tested and evaluated on several datasets [see Additional file 1, 3, 4 and 5]. The results indicate that Pep-3D-Search is comparable to other similar tools.

Availability and requirements

Project name: Pep-3D-Search

Project's homepage: http://kyc.nenu.edu.cn/Pep3DSearch/

Operating system: Windows XP Professional with Service Pack 2 (or later) with Microsoft .NET Framework 1.1 (or later) installed

Programming language: Visual Basic.Net

License: GNU GPL

Any restrictions to use by non-academics: license needed for commercial use