Epitope Mapping Using Randomly Generated Peptide Libraries

Protocol
Part of the Methods in Molecular Biology™ book series (MIMB, volume 524)

Summary

Characterizing the immune response towards a pathogen is of high interest for vaccine development and diagnosis. However, the characterization of disease-related antigen–antibody interactions is of enormous complexity. Here, we present a method comprising binding studies of serum antibody pools to synthetic random peptide libraries, and data analysis of the resulting binding patterns. The analysis can be applied to classify and predict different groups of individuals and to detect the peptides which best discriminate the investigated groups. As an example, the analysis of antibody repertoire binding patterns of different mouse strains and of mice infected with helminth parasites is shown. Due to the design of the library and the sophisticated analysis, the method is able to classify and predict the different mouse strains and the infection with very high accuracy and with a very small number of peptides, illustrating the potential of random library screenings in determining molecular markers for diagnosis.

Key words

Serum antibody repertoire, Random peptide library, Microarray, Binding pattern analysis, Machine learning, Feature selection, Diagnosis

17.1 Introduction

Diseases are often diagnosed by testing serological antibody reactivity. This is the case for several infections, allergies, and autoimmune diseases. Well-known examples are HIV infection and Hashimoto’s disease, both of which are diagnosed with commercially available serum antibody tests.

Antibody reactivity tests are widely used in cases where the antigen eliciting the immune response is known. If the antigenic epitope is linear, a peptide representing the linear epitope might suffice as a diagnostic molecular marker. Several strategies for protein epitope mapping have been developed (1, 2, 3, 4, 5, 6). In this context, peptide microarrays have become a class of widely used tools for analyzing antibody binding (7, 8, 9, 10). For instance, Quintana et al. (11) were able to discriminate between mice resistant and susceptible to diabetes by analyzing the IgG autoantibody repertoire using protein and peptide microarrays.

Although these methods are used successfully in the diagnosis and even prognosis of many diseases, they obviously fail if no antigen is known. However, the lack of a known antigen is not an impediment to diagnosis via serum antibody profiling. The problem can be circumvented by searching for differentially recognized epitopes that do not stem from a pathogenic antigen but from synthetic random-sequence peptide libraries. Peptides or groups of peptides that are recognized differently by a diseased group than by a control group are then chosen as candidate molecular markers.

The rationale for expecting differentially recognized epitopes to be present in random peptide libraries is that antibodies are strongly cross-reactive (2, 12). Indeed, studies by Nóbrega et al. on serum binding reactivities to protein mixtures from cell extracts, measured by quantitative immunoblot, revealed significant reactivity differences between healthy and infected individuals (14, 15).

Screening for differentially recognized epitopes is best performed using libraries of peptides printed on glass slides. The binding of serum antibodies to each peptide is detected via fluorescently labelled secondary antibodies, and the resulting signal intensities of the individual peptides are subjected to data analysis. The proposed analysis pathways are sketched in Fig. 1c. The analysis aims at classifying a diseased group against a control group. To gain a first impression of whether the data quality allows for easy classification, linear techniques for reducing data dimensionality, such as principal component analysis (PCA), can be used. Linear discriminant analysis (LDA) on the first few principal components then allows the classifier and its prediction accuracy to be determined. However, further analysis is needed to improve the classification accuracy and to determine the set of peptides most suitable for prediction. The most suitable peptides can be selected with a supervised learning, feature selection, and classification tool such as potential support vector machines (P-SVM) (19).
Fig. 1

(a) Serum samples of the mouse strains BALB/c and C57BL/6 are incubated, and the inter-strain differences are used for classification and prediction. The intra-strain differences are analyzed by comparing healthy BALB/c mice with H. polygyrus-infected BALB/c mice. The numbers in brackets represent the number of different sera tested in each group. (b) Binding pattern of serum antibodies to the random peptide library. TAMRA, as internal fluorescence control, and murine IgM and IgG, as secondary antibody controls, are displayed four times on each array (upper left and right and lower right corner). Binding is detected by fluorescently labelled secondary antibodies (anti-IgM-Alexa Fluor 546, anti-IgG-Alexa Fluor 647). (c) Statistical analysis pathway. Read-out signal intensities do not need to be normalized. False-positive blank signals that derive from secondary antibody binding are eliminated in all data sets analyzed (six out of 255 peptides for IgM). PCA is applied to reduce the dimensionality of the data set by extracting the directions of highest variance. P-SVM and LDA are applied to classify the different groups. The best classification, however, is achieved not with the highest-variance components from PCA but with very few peptides selected by the classification and feature selection tool P-SVM. LDA allows the classification to be visualized by again reducing the dimensionality of the data set.

To illustrate this method, BALB/c mice were infected with the intestinal helminthic parasite Heligmosomoides polygyrus (H. polygyrus). Serum samples were collected before and 14 days after infection, and antibody reactivities were measured. Moreover, the antibody reactivities of the healthy BALB/c mice samples were compared with those of a second healthy mouse strain, C57BL/6. The investigated groups and number of samples are summarized in Fig. 1a.

The binding reactivities were analyzed using a synthetic peptide library of 255 different 14-mer peptides. The peptide sequences were generated randomly, with amino acid frequencies matching the frequency with which each amino acid appears on solvent-accessible protein surfaces. No repeat of three consecutive amino acids was allowed. The synthesized peptides were printed on glass slides. A TAMRA-derived peptide was attached to the glass as an internal fluorescence control, and full-length mouse IgM and mouse IgG were included as secondary antibody controls. The peptide library was displayed in five identical sub-arrays on each slide. Incubation of serum with the peptide library (see Notes 1–8) and subsequent detection with fluorescence-labelled anti-mouse IgM and IgG antibodies resulted in characteristic binding patterns (see Fig. 1b). The resulting signal intensities were read out with a microarray scanner for subsequent data analysis (see Notes 9 and 10). Normalization of the data was not necessary (see Note 11). False-positive “blank” signals that derive from unspecific binding of the secondary antibody were eliminated from all data sets; for the anti-mouse IgM antibody, this amounted to six excluded peptides.
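The library design rules above can be illustrated with a short Python sketch. This is not the procedure actually used to build the library. The surface-accessibility amino acid frequencies are not listed in this chapter, so uniform placeholder weights stand in for them. The rule "no repeat of three consecutive amino acids" is read here, as an assumption, to mean that no tripeptide occurs more than once in the whole library. A different reading would need a different check. The names `generate_library`, `tripeptides`, and `PLACEHOLDER_WEIGHTS` are hypothetical.

```python
import random

# Hypothetical placeholder: the chapter derives residue frequencies from
# solvent-accessible protein surfaces but does not list them, so uniform
# weights over the 20 standard amino acids stand in for the real values.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PLACEHOLDER_WEIGHTS = {aa: 1.0 for aa in AMINO_ACIDS}


def tripeptides(seq):
    """Return all overlapping windows of three consecutive residues."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def generate_library(n_peptides=255, length=14, weights=None, seed=0,
                     max_attempts=1_000_000):
    """Draw random peptides by rejection sampling.

    Assumption: "no repeat of three consecutive amino acids" is read as
    "no tripeptide occurs more than once in the whole library".
    """
    weights = weights or PLACEHOLDER_WEIGHTS
    residues = list(weights)
    probs = [weights[aa] for aa in residues]
    rng = random.Random(seed)
    used = set()      # tripeptides already present in accepted peptides
    library = []
    for _ in range(max_attempts):
        if len(library) == n_peptides:
            break
        seq = "".join(rng.choices(residues, weights=probs, k=length))
        windows = tripeptides(seq)
        # Reject repeats within the peptide or against earlier peptides.
        if len(set(windows)) < len(windows) or used.intersection(windows):
            continue
        used.update(windows)
        library.append(seq)
    if len(library) < n_peptides:
        raise RuntimeError("constraint too strict for these settings")
    return library
```

Calling `generate_library()` returns 255 random 14-mers. Passing a dictionary of real surface frequencies as `weights` would bias residue usage accordingly. Rejection sampling keeps the sketch short. It slows down only when the library becomes so large that few unused tripeptides remain.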

The minimum set of differentially recognized peptides necessary for classification was selected using the classification and feature selection tool P-SVM (see Note 12). The selected peptides are listed in Table 1. Surprisingly, a single peptide suffices for the best classification of the mouse strains, while three peptides are sufficient to discriminate samples of healthy and infected individuals. In both cases, classification was 100% correct. The significance of each classification result is assessed by shuffling the group labels of the data and repeating the classification with the same number of peptides as used with the correct labels. The p-value is then given by the fraction of shuffles for which classification with the shuffled labels achieves the same or a better accuracy. As shown in Table 2, the significance is p ≤ 0.002 in both cases.
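The label-shuffling significance test can be sketched as follows. P-SVM is not reimplemented here. As a hypothetical stand-in, the sketch selects the k peptides with the largest t-like score (absolute difference of group means divided by the pooled standard deviation) and then classifies by nearest group centroid. Samples are assumed to be rows of `X` (peptide signal intensities), with group labels coded 0 and 1. As in the text, peptide selection is redone on every shuffle with the same number of peptides. The p-value is the fraction of shuffles whose accuracy is at least as good as the accuracy obtained with the true labels.

```python
import random
import statistics


def feature_scores(X, y):
    """t-like score per peptide: |difference of group means| / pooled SD."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        sd = (statistics.pvariance(a) + statistics.pvariance(b)) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / (sd or 1e-12))
    return scores


def fit(X, y, k):
    """Stand-in for P-SVM: keep the k best-scoring peptides, store centroids."""
    scores = feature_scores(X, y)
    chosen = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
    centroids = {
        lab: [statistics.mean(row[j] for row, l in zip(X, y) if l == lab)
              for j in chosen]
        for lab in (0, 1)
    }
    return chosen, centroids


def predict(model, row):
    """Assign the group whose centroid is closest (squared Euclidean)."""
    chosen, centroids = model
    dist = {lab: sum((row[j] - c) ** 2 for j, c in zip(chosen, cent))
            for lab, cent in centroids.items()}
    return min(dist, key=dist.get)


def accuracy(X, y, k):
    """Classification (resubstitution) accuracy on the training samples."""
    model = fit(X, y, k)
    return sum(predict(model, row) == lab for row, lab in zip(X, y)) / len(y)


def permutation_p(X, y, k, n_shuffles=500, seed=0):
    """Fraction of label shuffles with accuracy >= the observed accuracy.

    Peptide selection is rerun on every shuffle with the same k.
    """
    rng = random.Random(seed)
    observed = accuracy(X, y, k)
    labels = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(labels)
        if accuracy(X, labels, k) >= observed:
            hits += 1
    return observed, hits / n_shuffles
```

With N shuffles, the smallest nonzero p that can be observed is 1/N (for example, 0.002 for N = 500). A run in which no shuffle matches the observed accuracy is therefore usually reported as p < 1/N.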
Table 1

Sequences of peptides for best classification of murine IgM binding patterns selected by P-SVM from a randomly generated library of 255 peptides

 

Comparison                           Sequences of peptides for best classification
Healthy BALB/c vs. healthy C57BL/6   SGFPDKIEFPTQDC
Healthy BALB/c vs. infected BALB/c   THEDFRYDDVFEGN
                                     FFDEIIHSCRSQNG
                                     VRQVQRSKKMHKKG

Table 2

P-SVM classification and leave-one-out prediction accuracies of murine IgM binding patterns for healthy mice of different strains and after infection with the nematode H. polygyrus

 

                                      Healthy BALB/c vs.    Healthy BALB/c vs.
                                      healthy C57BL/6       infected BALB/c
Classification accuracy               100%                  100%
  Significance p                      <0.002                0.002
  No. of features                     1                     3
Prediction accuracy (leave-one-out)   100%                  100%
  Significance p                      <0.002                <0.001
  No. of features                     1                     4

Given the small number of samples in each group, prediction accuracy was estimated by leave-one-out cross-validation. One data point was removed from the training set, the best classifying features were selected on the remaining data, and the resulting classifier was used to predict the held-out data point. This procedure was repeated for every data point. As shown in Table 2, one additional peptide is needed for healthy vs. infected mice to achieve the best prediction accuracy. Here, too, prediction accuracy reached 100% in both cases, again with a significance below p = 0.002.
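The leave-one-out procedure can be sketched as below. It again uses a hypothetical t-score and nearest-centroid stand-in instead of P-SVM. The essential point is that peptide selection is repeated inside each fold, using the training samples only. If peptides were selected once on all samples before running leave-one-out, the held-out sample would leak into the feature choice and prediction accuracy would be overstated. The two-group assumption and all function names are illustrative.

```python
import statistics


def select_and_fit(X, y, k):
    """Hypothetical stand-in for P-SVM: keep the k peptides with the largest
    |mean difference| / pooled SD, then store per-group centroids."""
    groups = sorted(set(y))  # assumes exactly two groups

    def score(j):
        a = [r[j] for r, lab in zip(X, y) if lab == groups[0]]
        b = [r[j] for r, lab in zip(X, y) if lab == groups[1]]
        sd = (statistics.pvariance(a) + statistics.pvariance(b)) ** 0.5
        return abs(statistics.mean(a) - statistics.mean(b)) / (sd or 1e-12)

    chosen = sorted(range(len(X[0])), key=score, reverse=True)[:k]
    centroids = {g: [statistics.mean(r[j] for r, lab in zip(X, y) if lab == g)
                     for j in chosen] for g in groups}
    return chosen, centroids


def predict(model, row):
    """Nearest-centroid assignment on the selected peptides."""
    chosen, centroids = model

    def dist(g):
        return sum((row[j] - c) ** 2 for j, c in zip(chosen, centroids[g]))

    return min(centroids, key=dist)


def leave_one_out_accuracy(X, y, k):
    """Hold out each sample in turn. Peptide selection and fitting see only
    the remaining samples, so the held-out sample cannot influence which
    peptides are chosen."""
    correct = 0
    for i in range(len(y)):
        model = select_and_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], k)
        correct += predict(model, X[i]) == y[i]
    return correct / len(y)
```

The number of peptides `k` is fixed per call here. Choosing the best `k` inside each fold, as the Table 2 prediction results imply, would add an inner search over `k` to the same loop.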

Taken together, the above classification results show that four peptides are sufficient to separate the three investigated groups of mice. For better visualization of the classification, the four-dimensional space defined by these four peptides is reduced to two dimensions using LDA (with three groups, LDA yields at most two discriminant axes). The resulting representation of all data points is depicted in Fig. 2.
Fig. 2

Linear discriminant analysis of IgM binding patterns for two mouse strains and a group infected with a parasite using four peptides previously selected by P-SVM.

17.2 Materials

  1. RepliTope™ Microarrays (JPT Peptide Technologies GmbH, Berlin, Germany), ready-to-use microarrays.
  2. Ethanol.
  3. Double distilled water.
  4. Working buffer (T-PBS): 9.2 mM Na2HPO4·12H2O, 1.6 mM NaH2PO4·H2O, 150 mM NaCl, 10% Tween 20; make up to 1,000 mL with double distilled water (pH 7.4).
  5. Multiwell GeneFrame (Abgene, Epsom, United Kingdom).
  6. Serum concentration: microarrays are incubated with a 1:10 dilution of serum in working buffer.
  7. Secondary antibodies: goat anti-mouse IgG-Alexa Fluor 647 conjugate (20 µg/mL) and goat anti-mouse IgM-Alexa Fluor 546 conjugate (20 µg/mL) (both Invitrogen, Carlsbad, CA).
  8. Centrifuge (Centrifuge 5403, Eppendorf, Hamburg, Germany).
  9. Microarray scanner (GenePix 4200AL, Molecular Devices GmbH, Ismaning, Germany) and the associated GenePix Pro software.
  10. Genespotter software (MicroDiscovery GmbH, Berlin, Germany).
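As a worked example of the amounts implied by items 4 and 6, the sketch below converts the T-PBS molarities into grams of salt. It also computes serum and buffer volumes for the 45 µL wells used in the Methods. The molar masses are standard values for the hydrates named in the recipe. Two points are assumptions: "1:10" is read as 1 part serum in 10 parts total volume, and a 10% pipetting overage is added. Tween 20 is left out of the sketch. The function names are hypothetical.

```python
# Molar masses (g/mol) of the salts in the T-PBS recipe, hydrates as named.
MOLAR_MASS = {"Na2HPO4_12H2O": 358.14, "NaH2PO4_H2O": 137.99, "NaCl": 58.44}

# Target concentrations (mol/L) taken from the T-PBS recipe (item 4).
TPBS_MOLARITY = {"Na2HPO4_12H2O": 9.2e-3, "NaH2PO4_H2O": 1.6e-3, "NaCl": 150e-3}


def salt_masses_g(volume_l=1.0):
    """Grams of each salt for volume_l litres of T-PBS (Tween 20 not included)."""
    return {name: c * volume_l * MOLAR_MASS[name]
            for name, c in TPBS_MOLARITY.items()}


def serum_mix_ul(n_wells, well_volume_ul=45.0, dilution=10, overage=0.1):
    """Serum and T-PBS volumes (uL) for a 1:dilution serum dilution.

    Assumptions: "1:10" means 1 part serum in 10 parts total volume, and a
    hypothetical 10% overage is added to cover pipetting losses.
    """
    total = n_wells * well_volume_ul * (1 + overage)
    serum = total / dilution
    return serum, total - serum
```

For 1 L, the sketch gives about 3.29 g Na2HPO4·12H2O, 0.22 g NaH2PO4·H2O, and 8.77 g NaCl. A single 45 µL well without overage needs 4.5 µL serum and 40.5 µL buffer.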

     

17.3 Methods

17.3.1 Microarray Incubation and Signal Read-Out

  1. Briefly wash the microarrays with 100% ethanol.
  2. Wash the microarrays three times for 2 min with T-PBS and three times for 2 min with deionized water, rinse with flowing deionized water, and then dry by centrifugation (see Notes 1–3).
  3. RepliTope Microarrays are pre-treated to minimize unspecific binding of the target antibodies; therefore, no blocking step is required prior to incubation.
  4. Serum incubations are performed using a five-well incubation chamber with a total assay volume of 45 µL per well. Each well is incubated with 45 µL of a 1:10 dilution of serum in T-PBS for 4 h at room temperature (see Notes 4–8).
  5. Remove the incubation chamber and wash the microarrays three times for 2 min with T-PBS and three times for 2 min with deionized water.
  6. Incubate the microarrays with 300 µL of anti-mouse IgG-Alexa Fluor 647 (20 µg/mL) and anti-mouse IgM-Alexa Fluor 546 (20 µg/mL) in T-PBS for 1 h at room temperature.
  7. Wash the microarrays three times for 2 min with T-PBS and three times for 2 min with deionized water, rinse with flowing deionized water, and dry by centrifugation.
  8. Measure fluorescence signals on the GenePix 4200AL microarray scanner (see Note 9). Both lasers (535 nm and 635 nm) can be used simultaneously, with a red (~650–690 nm) and a green (~550–600 nm) emission filter. An image file is generated at a resolution of 10 µm using the associated GenePix Pro software.

     

17.3.2 Data Processing and Statistical Analyses

  1. Signal intensities are quantified with Genespotter™ software.
  2. Statistical analyses are performed using MATLAB 7.0 (The MathWorks Inc.) and R 2.3.1 (16).
  3. Principal component analysis (PCA) is a technique for reducing multidimensional data sets to lower dimensions for analysis. PCA transforms the data to a new coordinate system such that the greatest variance of any projection of the data lies on the first coordinate (the first principal component), the second greatest variance on the second coordinate, and so on. The MATLAB 7.0 function princomp is used to calculate the principal components (17).
  4. Linear discriminant analysis (LDA) is a method used in statistics and machine learning to find the linear combination of features that best separates two or more classes of objects. LDA was performed using the function lda of the R package MASS (R 2.3.1) (17, 18, 20) (see Note 12).
  5. Potential support vector machines (P-SVM) (19) are a recent supervised learning method that, like standard support vector machines (SVMs), can be used with kernels. SVMs map the data into a high-dimensional feature space in order to find a hyperplane that separates the data points of the two classes with maximum distance to the closest data points of both classes. In addition, P-SVM can simultaneously select the appropriate peptides (features) for classification and prediction.
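The PCA step in item 3 can be illustrated without MATLAB. The sketch below is a stdlib-only Python stand-in, not a reimplementation of princomp. It centres the data, forms the sample covariance matrix, and finds the first principal component by power iteration. It returns each sample's score on that component and the fraction of total variance the component explains. Further components could be obtained by deflation, that is, by subtracting λvvᵀ from the covariance matrix and iterating again. Deflation is not shown. All function names are hypothetical.

```python
import random


def center(X):
    """Subtract each column (peptide) mean; PCA operates on centred data."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def covariance(Xc):
    """Sample covariance matrix (denominator n - 1) of centred data."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
            for a in range(p)]


def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def first_component(C, n_iter=500, seed=0):
    """Leading eigenvector and eigenvalue of C by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(C))]  # positive start vector
    for _ in range(n_iter):
        w = matvec(C, v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # all variance is zero; nothing to extract
            break
        v = [x / norm for x in w]
    eigval = sum(vi * wi for vi, wi in zip(v, matvec(C, v)))  # Rayleigh quotient
    return v, eigval


def pca_first_scores(X):
    """Scores of every sample on PC1 and PC1's share of the total variance."""
    Xc = center(X)
    C = covariance(Xc)
    v, lam = first_component(C)
    total = sum(C[i][i] for i in range(len(C)))
    scores = [sum(r[j] * v[j] for j in range(len(v))) for r in Xc]
    return scores, (lam / total if total else 0.0)
```

In practice, a library routine (princomp in MATLAB 7.0, or prcomp in R) computes all components at once. Power iteration is used here only because it shows the idea in a few lines.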

     

17.3.3 Closing

The results show that both classification and prediction via serum antibody reactivity can be performed with an unexpectedly small number of peptides, carefully selected from a rather small synthetic random peptide library. The high prediction accuracies emphasize the potential of non-antigenic, differentially recognized epitopes as molecular markers for diagnosis.

17.4 Notes

  1. Water, ethanol, and PBS should be filtered to remove small particles that might scratch the microarray surface.
  2. Handle the microarrays with care: always wear gloves and never touch the microarray surface.
  3. Never apply buffers or water directly onto the microarray surface during the washing steps.
  4. Before drying, wash the microarrays thoroughly with double distilled water to remove all residual proteins and salts. Salts may corrode the microarray surface, resulting in scratches that disturb signal read-out.
  5. Before incubation with the serum samples, the microarrays need to be dry; otherwise, the multiwell frames will not stick to the microarray surface. This step is not required when single-well incubation chambers are used. We use multiwell incubation chambers because the sample volume is very small, which allows us to repeat an analysis without using large quantities of rare biological material.
  6. Carefully drop the serum samples into the incubation chambers without touching the microarray surface with the pipette tip.
  7. When using a multiwell incubation chamber, carefully attach the cover slip. Make sure that samples in adjacent chambers do not mix. Apply slight pressure to the cover slip to avoid air bubbles.
  8. Make sure that the microarrays never dry out during incubation; use a humidity chamber.
  9. Before signal read-out, make sure that the microarrays are dry and clean and that no residual adhesive from the incubation chamber remains on the microarray surface, since residual adhesive can interfere with the mechanics of the microarray scanner.
  10. Store incubated microarrays at 4 °C under inert gas. This will prevent the fluorescence signals from bleaching for at least 3 months.
  11. Always use microarrays from the same batch, printed at the same time, since different batches show shifts in signal read-out that are hard to correct.
  12. If the groups contain different numbers of individuals, make sure to eliminate the group bias in the statistical methods (use ‘prior’ in lda of R 2.3.1 and ‘-b’ in P-SVM).

     

References

  1. Wenschuh, H., Volkmer-Engert, R., Schmidt, M., Schulz, M., Schneider-Mergener, J., and Reineke, U. (2000) Coherent membrane supports for parallel microsynthesis and screening of bioactive peptides. Biopolymers 55, 188–206.
  2. Frank, R. (2002) The SPOT-synthesis technique. Synthetic peptide arrays on membrane supports–principles and applications. J. Immunol. Methods 267, 13–26.
  3. Geysen, H. M., Meloen, R. H., and Barteling, S. J. (1984) Use of peptide synthesis to probe viral antigens for epitopes to a resolution of a single amino acid. Proc. Natl. Acad. Sci. U S A 81, 3998–4002.
  4. Weiser, A. A., Or-Guil, M., Tapia, V., Leichsenring, A., Schuchhardt, J., Frommel, C., and Volkmer-Engert, R. (2005) SPOT synthesis: reliability of array-based measurement of peptide binding affinity. Anal. Biochem. 342, 300–311.
  5. Wenschuh, H., Gausepohl, H., Germeroth, L., Ulbricht, M., Matuschewski, H., Kramer, A., Volkmer-Engert, R., Heine, N., Ast, T., Scharn, D., and Schneider-Mergener, J. (2000) in Combinatorial Chemistry: A Practical Approach (Fenniri, H., ed.), Oxford University Press, Oxford, UK, pp. 95–116.
  6. Reineke, U., Volkmer-Engert, R., and Schneider-Mergener, J. (2001) Applications of peptide arrays prepared by the SPOT-technology. Curr. Opin. Biotech. 12, 59–64.
  7. Tapia, V., Bongartz, J., Schutkowski, M., Bruni, N., Weiser, A., Ay, B., Volkmer, R., and Or-Guil, M. (2007) Affinity profiling using the peptide microarray technology: a case study. Anal. Biochem. 363, 108–118.
  8. Reimer, U., Reineke, U., and Schneider-Mergener, J. (2002) Peptide arrays: from macro to micro. Curr. Opin. Biotech. 13, 315–320.
  9. Schutkowski, M., Reimer, U., Panse, S., Dong, L., Lizcano, J. M., Alessi, D. R., and Schneider-Mergener, J. (2004) High-Content Peptide Microarrays for Deciphering Kinase Specificity and Biology. Angew. Chem. 116, 2725–2728.
  10. Jones, R. B., Gordus, A., Krall, J. A., and MacBeath, G. (2006) A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature 439, 168–174.
  11. Quintana, F. J., Hagedorn, P. H., Elizur, G., Merbl, Y., Domany, E., and Cohen, I. R. (2004) Functional immunomics: microarray analysis of IgG autoantibody repertoires predicts the future response of mice to induced diabetes. Proc. Natl. Acad. Sci. U S A 101(Suppl 2), 14615–14621.
  12. Frank, S. A. (2002) Immunology and Evolution of Infectious Disease. Princeton University Press, Princeton, NJ.
  13. Reineke, U., Ivascu, C., Schlief, M., Landgraf, C., Gericke, S., Zahn, G., Herzel, H., Volkmer-Engert, R., and Schneider-Mergener, J. (2002) Identification of distinct antibody epitopes and mimotopes from a peptide array of 5520 randomly generated sequences. J. Immunol. Methods 267, 37–51.
  14. Nobrega, A., Grandien, A., Haury, M., Hecker, L., Malanchere, E., and Coutinho, A. (1998) Functional diversity and clonal frequencies of reactivity in the available antibody repertoire. Eur. J. Immunol. 28, 1204–1215.
  15. Haury, M., Grandien, A., Sundblad, A., Coutinho, A., and Nobrega, A. (1994) Global analysis of antibody repertoires. 1. An immunoblot method for the quantitative screening of a large number of reactivities. Scand. J. Immunol. 39, 79–87.
  16. R Development Core Team (2007) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.
  17. Jackson, J. E. (1991) A User’s Guide to Principal Components. Wiley, Hoboken, NJ.
  18. Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press, New York, NY.
  19. Hochreiter, S. and Obermayer, K. (2006) Support vector machines for dyadic data. Neural Comput. 18, 1472–1510.
  20. Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer, New York, NY.

Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

Juliane Bongartz, Nicole Bruni, and Michal Or-Guil
Systems Immunology Group, Institute for Theoretical Biology, Humboldt University Berlin, Berlin, Germany
