Epitope Mapping Using Randomly Generated Peptide Libraries
Characterizing the immune response towards a pathogen is of high interest for vaccine development and diagnosis. However, characterizing disease-related antigen–antibody interactions is enormously complex. Here, we present a method comprising binding studies of serum antibody pools to synthetic random peptide libraries and data analysis of the resulting binding patterns. The analysis can be applied to classify and predict different groups of individuals and to detect the peptides that best discriminate between the investigated groups. As an example, we show the analysis of antibody repertoire binding patterns of different mouse strains and of mice infected with helminth parasites. Owing to the design of the library and the analysis, the method classifies and predicts the different mouse strains and the infection with very high accuracy using a very small number of peptides, illustrating the potential of random library screenings for determining molecular markers for diagnosis.
Key words: Serum antibody repertoire, Random peptide library, Microarray, Binding pattern analysis, Machine learning, Feature selection, Diagnosis
Diseases are often diagnosed by testing serological antibody reactivity. This is the case for several infections, allergies, and autoimmune diseases. Well-known examples are HIV infection and Hashimoto’s disease, both of which are diagnosed with commercially available serum antibody tests.
Antibody reactivity tests are widely used in cases where the antigen eliciting the immune response is known. If the antigenic epitope is linear, a peptide representing the linear epitope might suffice as a diagnostic molecular marker. Several strategies for protein epitope mapping have been developed (1, 2, 3, 4, 5, 6). In this context, peptide microarrays have become a class of widely used tools for analyzing antibody binding (7, 8, 9, 10). For instance, Quintana et al. (11) were able to discriminate between mice resistant and susceptible to diabetes by analyzing the IgG autoantibody repertoire using protein and peptide microarrays.
Although these methods have been used successfully in the diagnosis and even prognosis of many diseases, they fail if no antigen is known. However, the lack of a known antigen is not an impediment to diagnosis via serum antibody profiling. The problem can be circumvented by searching for differentially recognized epitopes that do not stem from a pathogenic antigen. Instead, the epitopes can stem from synthetic random-sequence peptide libraries. Here, peptides or groups of peptides that are recognized differently by a diseased group in comparison with a control group are chosen as candidate molecular markers.
The rationale behind the assertion that differentially recognized epitopes might be present in random peptide libraries derives from the fact that antibodies are strongly cross-reactive (2, 12). Indeed, studies conducted by Nóbrega et al. on serum binding reactivities to protein mixtures from cell extracts using quantitative immunoblot revealed significant reactivity differences between healthy and infected individuals (13, 14).
To illustrate this method, BALB/c mice were infected with the intestinal helminthic parasite Heligmosomoides polygyrus (H. polygyrus). Serum samples were collected before and 14 days after infection, and antibody reactivities were measured. Moreover, the antibody reactivities of the samples from healthy BALB/c mice were compared with those of a second healthy mouse strain, C57BL/6. The investigated groups and the number of samples are summarized in Fig. 1a.
The binding reactivities were analyzed using a synthetic peptide library consisting of 255 different 14-mer peptides. The sequences of the library’s peptides were determined randomly, based on amino acid frequencies corresponding to each amino acid’s occurrence on solvent-accessible protein surfaces. No repeat of three consecutive amino acids was allowed. The synthesized peptides were printed on glass slides. A TAMRA-derived peptide was attached to the glass as an internal fluorescence control. Full-length mouse IgM and mouse IgG were included as secondary antibody controls. The peptide library was displayed in five identical sub-arrays on each slide. Incubation of serum with the peptide library (see Notes 1–8) and subsequent detection with fluorescence-labelled anti-mouse IgM and IgG antibodies resulted in characteristic binding patterns (see Fig. 1b). The resulting signal intensities were read out with a microarray scanner for subsequent data analysis (see Notes 9 and 10). Normalization of the data was not necessary (see Note 11). False-positive “blank” signals that derive from unspecific binding of the secondary antibody were eliminated from all data sets; this amounted to six peptides excluded for the anti-mouse IgM antibody.
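The library design described above can be illustrated with a minimal sketch. It makes several assumptions: the surface-derived amino acid frequencies are not listed in the text, so uniform placeholder weights are used; the rule against repeats is interpreted as "no residue three times in a row"; and the names `SURFACE_WEIGHTS`, `random_peptide`, and `random_library` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder weights (uniform). The chapter uses frequencies observed on
# solvent-accessible protein surfaces; those values are not given in the
# text, so substitute real ones here.
SURFACE_WEIGHTS = {aa: 1.0 for aa in AMINO_ACIDS}


def random_peptide(length, weights, rng):
    """Draw one peptide, rejecting residues that would create a triple run."""
    residues = list(weights)
    probs = [weights[aa] for aa in residues]
    peptide = []
    while len(peptide) < length:
        aa = rng.choices(residues, weights=probs, k=1)[0]
        # Assumed reading of the design rule: no residue three times in a row.
        if len(peptide) >= 2 and peptide[-1] == aa and peptide[-2] == aa:
            continue
        peptide.append(aa)
    return "".join(peptide)


def random_library(size, length, weights, seed=0):
    """Collect `size` distinct random peptides of the given length."""
    rng = random.Random(seed)
    library = set()
    while len(library) < size:
        library.add(random_peptide(length, weights, rng))
    return sorted(library)


# 255 distinct 14-mers, matching the library dimensions in the text.
library = random_library(size=255, length=14, weights=SURFACE_WEIGHTS)
```

Rejection sampling keeps the per-position draw simple; the triple-run rule slightly shifts the effective residue frequencies, which is negligible for a sketch but worth checking against the target frequencies in a real design.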
Table 1. Sequences of peptides for best classification of murine IgM binding patterns, selected by P-SVM from a randomly generated library of 255 peptides. Column groups: healthy BALB/c vs. healthy C57BL/6; healthy BALB/c vs. infected BALB/c.
Table 2. P-SVM classification and leave-one-out prediction accuracies of murine IgM binding patterns for healthy mice of different strains and after infection with the nematode H. polygyrus. Column groups: healthy BALB/c vs. healthy C57BL/6; healthy BALB/c vs. infected BALB/c, each with a “No. of features” column.
Given the small number of samples in each group, the accuracy of prediction was calculated by taking one data point out of the training set (“leave-one-out”), calculating the best classifying features from the remaining data, and using this classifier to predict the held-out data point. This procedure was repeated for all data points. As shown in Table 2, an additional peptide is needed in the case of the healthy vs. infected mice to ensure the best prediction accuracy. Here also, the accuracy of prediction reached 100% in both cases, again with a significance below p = 0.002.
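The leave-one-out procedure can be sketched as follows. This is not P-SVM: as a stand-in, features are ranked by a simple signal-to-noise score and classified with a nearest-centroid rule, and all function names and the toy data are invented for illustration. The point of the sketch is that feature selection is repeated inside every fold, so the held-out sample never influences which peptides are chosen.

```python
from statistics import mean, pstdev


def select_features(X, y, k):
    """Rank features by a signal-to-noise score; return the top-k indices."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pstdev(a) + pstdev(b) + 1e-9  # guard against division by zero
        scores.append((abs(mean(a) - mean(b)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def nearest_centroid_predict(X, y, features, sample):
    """Assign `sample` to the class whose centroid on `features` is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroid = [mean(row[j] for row in rows) for j in features]
        dist = sum((sample[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(X, y, k):
    """Hold out each sample once; select features on the rest, then predict."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        features = select_features(X_train, y_train, k)
        hits += nearest_centroid_predict(X_train, y_train, features, X[i]) == y[i]
    return hits / len(X)


# Toy intensities (made up): feature 0 separates the groups, the rest is noise.
X = [[1.0, 5.0, 3.0], [1.2, 4.0, 3.5], [0.9, 6.0, 2.5], [1.1, 5.0, 3.1],
     [3.0, 5.0, 3.2], [3.2, 4.5, 2.9], [2.9, 5.5, 3.3], [3.1, 4.8, 3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
accuracy = leave_one_out_accuracy(X, y, k=1)
```

Selecting features once on the full data set and only then cross-validating would leak information from the held-out sample and tend to overstate the prediction accuracy.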
RepliTope™ Microarrays (JPT Peptide Technologies GmbH, Berlin, Germany), ready-to-use microarrays.
Double distilled water.
Working buffer (T-PBS): 9.2 mM Na2HPO4·12H2O, 1.6 mM NaH2PO4·H2O, 150 mM NaCl, 10% Tween 20, add double distilled water up to 1,000 mL (pH 7.4).
Multiwell GeneFrame (Abgene, Epsom, United Kingdom).
Serum concentration: microarrays are incubated with a 1:10 dilution of serum in working buffer.
Secondary antibodies: goat anti-mouse IgG-Alexa Fluor 647 conjugate (20 µg/mL) (Invitrogen, Carlsbad, CA) and goat anti-mouse IgM-Alexa Fluor 546 conjugate (20 µg/mL) (Invitrogen, Carlsbad, CA).
Centrifuge (Centrifuge 5403, Eppendorf, Hamburg, Germany).
Microarray scanner (Genepix 4200AL, Molecular Devices GmbH, Ismaning, Germany) and the associated GenePix Pro software.
Genespotter software (MicroDiscovery GmbH, Berlin, Germany).
17.3.1 Microarray Incubation and Signal Read-Out
Briefly wash the microarrays with 100% ethanol.
Wash the microarrays three times for 2 min with T-PBS and three times for 2 min with deionized water, rinse with flowing deionized water, and then dry by centrifugation (see Notes 1–3).
RepliTope Microarrays are pre-treated to minimize unspecific binding of the target antibodies. Therefore, no blocking step is required prior to incubation.
All incubations are performed using a five-well incubation chamber with a total assay volume of 45 µL per well. Each well is incubated with 45 µL of a 1:10 dilution of serum in T-PBS for 4 h at room temperature (see Notes 4–8).
Remove the incubation chamber and wash the microarrays three times for 2 min with T-PBS and three times for 2 min with deionized water.
Incubate the microarrays with 300 µL of anti-mouse IgG-Alexa Fluor 647 (20 µg/mL) and IgM-Alexa Fluor 546 (20 µg/mL) in T-PBS for 1 h at room temperature.
Wash the microarrays three times for 2 min with T-PBS and three times for 2 min with deionized water, rinse with flowing deionized water, and dry by centrifugation.
Fluorescence signals are measured on the GenePix 4200AL microarray scanner (see Note 9). Both lasers (535 nm and 635 nm) can be used simultaneously with red (~650–690 nm) and green (~550–600 nm) emission filters. An image file is generated at a resolution of 10 µm using the associated GenePix Pro software.
17.3.2 Data Processing and Statistical Analyses
Signal intensities are quantified with Genespotter™ software.
Statistical analyses are performed using MATLAB 7.0 (The MathWorks Inc.) and R 2.3.1 (16).
Principal component analysis (PCA) is a technique used to reduce multidimensional data sets to lower dimensions for analysis. PCA transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. The function princomp from MATLAB 7.0 is used to calculate the principal components (17).
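As a dependency-free illustration of the idea, the first principal component can be found by power iteration on the covariance matrix. This is only a sketch under stated assumptions: the chapter itself uses MATLAB's princomp, the function names below are hypothetical, and the toy data are invented.

```python
import math


def center(X):
    """Subtract the per-feature mean from every sample."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvector(C, iterations=500):
    """Power iteration: converges to the direction of greatest variance.

    Assumes the start vector is not orthogonal to that direction and that
    the data are not constant (otherwise the norm below would be zero).
    """
    v = [1.0] * len(C)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def first_principal_component(X):
    """Return the first PC direction and each sample's score along it."""
    Xc = center(X)
    pc1 = leading_eigenvector(covariance(Xc))
    scores = [sum(a * b for a, b in zip(row, pc1)) for row in Xc]
    return pc1, scores


# Toy data lying exactly on the line y = 2x: all variance is along (1, 2).
pc1, scores = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
```

Further components could be obtained by removing the found direction from the covariance matrix and repeating, although a library routine based on singular value decomposition is the more usual choice for real data.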
Linear discriminant analysis (LDA) is a method used in statistics and machine learning to find the linear combination of features that best separates two or more classes of objects. LDA is performed using the algorithm lda of R 2.3.1 (17, 18, 20) (see Note 12).
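For orientation, the textbook form of the LDA decision rule is given below. It is not taken from the chapter, and the symbols are notation introduced here: \mu_k is the mean of class k, \Sigma the pooled within-class covariance matrix, and \pi_k the prior probability of class k.

```latex
\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k
            - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k
            + \log \pi_k,
\qquad
\hat{y}(x) = \arg\max_k \, \delta_k(x)
```

A sample x is assigned to the class with the largest score. If the priors are estimated from unequal group sizes, the \log \pi_k term favours the larger group; setting equal priors makes that term identical for all classes, which is the kind of group bias addressed in Note 12.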
The potential support vector machine (P-SVM) (19) is a new supervised learning method which, like standard support vector machine (SVM) methods, can be used with kernels. SVMs transform the data space into a high-dimensional feature space in order to find a hyperplane that separates the data points with maximum distance to the closest data points from both classes. P-SVM is able to simultaneously select appropriate peptides (= features) for classification and prediction.
The results reveal that both classification and prediction via serum antibody reactivity can be performed with an unexpectedly small number of peptides carefully selected from a rather small synthetic random peptide library. The resulting high prediction accuracies emphasize the potential of nonantigenic, differentially recognized epitopes as molecular markers for diagnosis.
Water, ethanol, and PBS should be filtered in order to remove small particles that might scratch the microarray surface.
Handle the microarrays with care: always wear gloves and never touch the microarray surface.
Never add buffers or water directly onto the microarray surface during the washing steps.
Before drying, the microarrays should be washed thoroughly with double distilled water in order to remove all residual proteins and salts. Salts may corrode the microarray surface, resulting in scratches that disturb signal read-out.
Before incubation with the serum samples, the microarrays need to be dry, otherwise the multiwell frames will not stick to the microarray surface. When using single well incubation chambers this step is not required. We use multiwell incubation chambers because the sample volume is very small, which allows us to repeat an analysis without using large quantities of rare biological material.
Carefully drop the serum samples into the incubation chambers without touching the microarray surface with the pipette tip.
When using a multiwell incubation chamber, carefully attach the cover slip. Make sure that different samples in nearby chambers do not mix. Apply slight pressure to the cover slip in order to avoid air bubbles.
Make sure that the microarrays never dry out during incubation. Use a humidity chamber.
Ensure that, prior to signal read-out, the microarrays are dry and clean and that no residual adhesive from the incubation chamber remains on the microarray surface. Residual adhesive can cause problems with the mechanics of the microarray scanner.
Store incubated microarrays at 4 °C under inert gas. This prevents the fluorescence signals from bleaching for at least 3 months.
Always use microarrays from the same batch, printed at the same time, since different batches show shifts in signal read-out that are hard to correct.
In case of different numbers of individuals per group, make sure to eliminate the group bias in the statistical methods (use ‘prior’ in lda of R 2.3.1 and ‘-b’ in P-SVM).
- 5. Wenschuh, H., Gausepohl, H., Germeroth, L., Ulbricht, M., Matuschewski, H., Kramer, A., Volkmer-Engert, R., Heine, N., Ast, T., Scharn, D., and Schneider-Mergener, J. (2000) in Combinatorial Chemistry: A Practical Approach (Fenniri, H., ed.), Oxford University Press, Oxford, UK, pp. 95–116.
- 11. Quintana, F. J., Hagedorn, P. H., Elizur, G., Merbl, Y., Domany, E., and Cohen, I. R. (2004) Functional immunomics: microarray analysis of IgG autoantibody repertoires predicts the future response of mice to induced diabetes. Proc. Natl. Acad. Sci. USA 101(Suppl 2), 14615–14621.
- 12. Frank, S. A. (ed.) (2002) Immunology and Evolution of Infectious Disease. Princeton University Press, Princeton, NJ.
- 13. Reineke, U., Ivascu, C., Schlief, M., Landgraf, C., Gericke, S., Zahn, G., Herzel, H., Volkmer-Engert, R., and Schneider-Mergener, J. (2002) Identification of distinct antibody epitopes and mimotopes from a peptide array of 5520 randomly generated sequences. J. Immunol. Methods 267, 37–51.
- 16. R Development Core Team (2007) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
- 17. Jackson, J. E. (ed.) (1991) A User’s Guide to Principal Components. Wiley, Hoboken, NJ.
- 18. Ripley, B. D. (ed.) (1996) Pattern Recognition and Neural Networks. Cambridge University Press, New York, NY.
- 20. Venables, W. N. and Ripley, B. D. (eds.) (2002) Modern Applied Statistics with S. Springer, New York, NY.