Deciphering evolution of immune recognition in antibodies
An antibody, the primary effector molecule of the immune system, evolves after its initial encounter with an antigen from a precursor form to a mature one that deals with the antigen effectively. Antibodies of a lineage diverge through antigen-directed, isolated pathways of maturation to exhibit distinct recognition potential. In the context of evolution in immune recognition, the diversity of antigens cannot be ignored. While there are reports on antibody lineages, the structural basis of diverse recognition potential within a lineage has not been studied. Hence, it is crucial to evaluate how maturation leads to topological tailoring within a lineage, enabling its members to interact with significantly distinct antigens.
A data-driven approach was undertaken for the study. Global experimental mouse and human antibody-antigen complex structures from PDB were compiled into a coherent database of germline-linked antibodies bound with distinct antigens. Structural analysis of all lineages showed variations in CDRs of both H and L chains. Observations of conformational adaptation made from analysis of static structures were further evaluated by characterizing dynamics of interaction in two lineages, mouse VH1–84 and human VH5–51. Sequence and structure analysis of the lineages explained that somatic mutations altered the geometries of individual antibodies with common structural constraints in some CDRs. Additionally, conformational landscape obtained from molecular dynamics simulations revealed that incoming pathogen led to further conformational divergence in the paratope (as observed across datasets) even while maintaining similar overall backbone topology. MM-GB/SA analysis showed binding energies to be in physiological range. Results of the study are coherent with experimental observations.
The findings of this study highlight basic structural principles shaping the molecular evolution of a lineage for significantly diverse antigens. Antibodies of a lineage follow different developmental pathways while preserving the imprint of the germline. From the study, it can be generalized that structural diversification of the paratope is an outcome of natural selection of a conformation from an available ensemble, which is further optimized for antigen interaction. The study establishes that starting from a common lineage, antibodies can mature to recognize a wide range of antigens. This hypothesis can be further tested and validated experimentally.
Keywords: Germline, Mature, Somatic hypermutation, Data science, Antigens, Paratope, Antibody, Simulation, Cluster, Conformation
Abbreviations
CDR: Complementarity Determining Region
MPER: Membrane Proximal External Region
PDB: Protein Data Bank
PISA: Proteins, Interfaces, Structures and Assemblies
RMSD: Root Mean Square Deviation
VDJ: Variable Diversity Junction
VH: Variable region of heavy chain
VL: Variable region of light chain
The antibody-antigen (Ab-Ag) system is a miniature model for understanding the process of evolution. Development of a B-cell from a progenitor lymphoid cell to an immature B-cell is marked by VDJ recombination, which leads to the formation of the naïve germline antibody (Ab). Exposure to an antigen (Ag) fosters affinity maturation, an iterative process of cell proliferation, extensive mutagenesis of the immunoglobulin gene and stringent Darwinian selection of B-cells producing higher affinity antibodies. Various studies have also reported insertions and deletions during somatic hypermutation [3, 4]. Alterations in the genetic machinery of immunoglobulin are tailored to mediate antigen recognition. Such genetic drifts are strictly dependent on the time of exposure and antigen persistence. Hence, at any given time, the incoming antigen acts as a stimulus that allows somatic mutations to be incorporated in the genome; these may be favorable or, rarely, deleterious. The infrequent deleterious mutations result in the formation of self-reactive receptors or in B-cell transformation. These outcomes are generally eliminated by regulated cell-cycle checkpoints and negative selection processes in the germinal center. Nonetheless, these processes are often challenged, which may result in a breach of self-tolerance leading to autoimmune disorders. Favorable mutations, on the other hand, undergo affinity-based positive selection and facilitate recognition of the bona fide antigen. Therefore, depending on the incoming antigen, different pathways of maturation may ensue in the germline B-cells, forming different advanced versions or siblings of a lineage. Detailed characterization of the dynamics of antigen recognition by mature antibodies should provide unprecedented insight into the immune response. Thus, an Ab-Ag system serves as a perfect miniature model to study the evolution of molecular recognition, i.e. how antibodies of an ontology diffuse into the isolated molecular environment of the stimulating antigen, enabling differentiated fixation of paratope topology. One way of dissecting this is by looking for structural changes in the lineage. While sequence-based phylogeny has been used to analyze different facets of the evolution of immunological recognition, the role of structural adaptation in a lineage has not been systematically explored.
Studies on antibodies inheriting the same set of germline genes suggest adoption of fundamentally different binding modes. In the present study, we have examined conformational features associated with recognition of a wide range of antigens by sibling antibodies. Additionally, it is hypothesized that any similarity that persists despite divergence in the structural landscape obtained from simulation will reflect their common origin.
Towards this end, structural data of antibodies of different lineages bound with significantly distinct antigens were collated and analyzed. Topological alteration of the CDRs to accommodate distinct antigens was evident within individual lineages. To validate observations from the analysis of static structures, antibodies of two germline lineages, i.e. carrying the same VH gene set and bound to significantly different antigens, were analyzed separately using explicit solvent molecular dynamics (MD) simulations. While sequence analysis suggested that somatic mutations have altered the geometry of the respective antibodies, dynamics showed that further structural divergence, specifically in the paratope, was brought about for preferential antigen binding. Despite somatic diversification, the similar overall architecture of the descendants and the conformers can be viewed as retention of the germline imprint.
Germline-linked mature antibodies reveal structural heterogeneity
In the collated database, the chemistry of antigens bound by antibodies evolved from a common IGHV was analyzed. The data revealed a high level of diversity and individual uniqueness of the antigens (Fig. 1; Additional file 1: Table S1 and Additional file 2: Table S2). Antigens of individual sets could not be aligned because of distinctness in their chemical nature and topology. Even for protein or peptide antigens, sequence alignment revealed no immuno-dominant epitopes as seen for anti-peptide antibodies of VH1S127, VH1–53, VH1–39, VH1–80 etc. lineages from mouse and VH3–30, VH1–2, VH3–33, VH5–51 etc. lineages from human (Additional file 1: Table S1 and Additional file 2: Table S2). Chemically, the spectrum of epitopes was wide for mouse VH3–2, VH7–3, VH1–84, VH1–5 and VH5–17 lineages and for human VH1–69, VH3–23, VH4–59 and VH4–39 lineages ranging from proteins, peptides, nucleic acid to sugar and haptens.
In order to further assess the contributions of the CDR loops, the interaction profile of the complexes was investigated in terms of H-bond formation. Engagement of the H-chain was higher than that of the L-chain in 71% of mouse and 78% of human data, with contributions from all three loops (Additional file 3: Figures S3 and S4). 14% (mouse) and 24.5% (human) of the data show no involvement of the L-chain. In 12% of mouse (PDB 1TPX of VH9–3, 1E6J of VH1S18, 1BAF of VH3–2 etc.) and 1% of human (PDB 4DGV of VH3–33, 2JB6 of VH1–69, 4JFZ of VH3–23 etc.) data, H3 does not form any H-bond, implying significant contributions of CDRs H1 and H2. However, H3, in conjunction with other loops, is crucial for antigen recognition and binding, as seen across the rest of the data. In general, H3 plays a predominant role in defining the topography of the binding site. Shorter H3 loops can create a cavity to accommodate peptides, while long H3 loops can generate a definite finger-like topography [14, 15]. Thus, structural re-arrangement in all CDRs of the collated dataset, and especially in the H-chain that bears common genetic elements, was surprising.
Thus, sibling antibodies exhibit conformational heterogeneity of the paratope to accommodate distinct antigens and interaction is mediated by different CDR loops with contribution of varying degrees across dataset. Even though H3 is a crucial player in antigen recognition, it may not necessarily be involved in direct contact with the antigen (as observed from H-bond analysis), signifying the involvement of other loops. The appropriately screened database thus served as an ideal resource to understand divergent evolution. However, mechanistic details would be better revealed by investigating dynamics of a lineage as it would shed light on time-dependent structural changes.
Ab-Ag complexes as test systems for study
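Antibody (PDB ID): Antigen
anti-uPAR (3BT2): Urokinase plasminogen activator surface receptor, Vitronectin
Fab 5.11A1 (1YJD): T-cell-specific surface glycoprotein CD28
Fab ED10 (2OK0): DNA
m66 (4NRX): Gp41 MPER peptide
Fab 10G5H6 (4HWB): Ectodomain D3 of IL-13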
Mapping somatic mutations in mature antibodies
Somatic mutation and interacting residues in antibodies of VH5–51 and VH1–84 lineages
Interacting residues on H-chain
VH5–51
Q1, V24, A30, E32, K58, I68, N76, I89
S31, W33, D54, D56, Y100C, R100F, T100G
S32, D56, S57, R60, N102, W34
VH1–84
V2, E5, E6, M33, L36, R37, K39, V51, I56, D65, A71, I77, V78, H81, D85, N87, T92
V2, T16, R19, E23, S31, H35, C50, N55, V56, N59, D65, I71, R85, M86, T97
S31, Y101, G102, D104
V2, K3, V11, S28, N31, F32, H35, F52, H52A, D55, E58, D65, A70,
Y33, W50, D55, N56, T57, E58
Analysis of the multiple sequence alignment of the mature and respective germline forms suggests that mutations are random within a lineage and that mutational frequency is higher in the CDRs than in the framework regions. Variations in the framework region contribute to the orientation of VH-VL pairing and support the antigen-binding site [19, 20]. Despite sequence variation, the canonical structures of CDRs L2 and H1 of VH1–84 antibodies were conserved, as they belonged to the class 1 category. CDR L2 of VH5–51 antibodies likewise assumed a common canonical class 1 category. This indicates that, despite the sequence variation arising from somatic mutations, some CDR loops of the antibodies had a conserved structural framework. Further examination of the conformational ensemble using molecular dynamics simulation will advance our understanding of the structural principles that govern maturation-associated binding to unrelated antigens while retaining some degree of structural connectivity.
Conformational selection leads to structural divergence during maturation
Affinity maturation-associated changes in the VH1–84 and VH5–51 lineages were examined by carrying out 0.5 μs all-atom molecular dynamics simulations of the bound and free mature counterparts. Conformational ensembles were analyzed by subjecting the trajectories of the bound and respective free forms together to a k-means clustering protocol. Frames with similar conformations were clustered using a Cα radius of 1.5 Å from the centroid. Apart from overall structural variations and connectivity of structural regions, changes in the core of the paratope, in particular, were traced.
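The joint clustering of bound and free frames described above can be sketched as follows. This is a minimal illustration of Lloyd's k-means in NumPy, not the protocol actually used in the study. It assumes each frame has already been reduced to a feature vector (for example, flattened Cα coordinates after alignment); the function name `kmeans`, the two-dimensional toy data and the deterministic choice of initial centroids are all hypothetical. The 1.5 Å radius is treated here as a plain Euclidean distance to the assigned centroid.

```python
import numpy as np

def kmeans(frames, init_centroids, n_iter=100):
    """Lloyd's k-means on per-frame feature vectors (illustrative helper)."""
    X = np.asarray(frames, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Distance of every frame to every centroid, shape (n_frames, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                updated[j] = members.mean(axis=0)
        converged = np.allclose(updated, centroids)
        centroids = updated
        if converged:
            break
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids, dists.min(axis=1)

# Toy frames (hypothetical): the bound form stays in one basin,
# the free form visits two.
bound = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
free = [[5.0, 5.0], [5.1, 5.0], [0.0, 0.2], [5.0, 5.1]]
joint = bound + free  # bound and free frames are clustered together

labels, centroids, dist_to_centroid = kmeans(
    joint, init_centroids=[joint[0], joint[3]]
)
within_radius = dist_to_centroid <= 1.5  # radius criterion from the text
```

Clustering the two trajectories together is what makes cluster labels comparable between the bound and free forms.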
Further analysis of the sampled conformers showed that, in all cases except 5.11A1, the bound form of the antibody selected one conformer from the spectrum sampled by the free form (the same colour code for conformers of the bound and free forms in Figs. 5 and 6 represents a common cluster). The number of frames in a cluster was expressed as a percentage of the population adopting that conformation (lower panels in Figs. 5 and 6). The conformation with the highest percent population is the dominant conformation of the molecule. In the case of ED10, the population of the dominant conformation for the DNA-bound antibody was 99.6%, while the population of the same conformer was only 11.6% in its free form. For bound anti-uPAR it was 92.4%, as against 0.4% in its free form (lower panel in Fig. 5). Similarly, for VH5–51 antibodies, the dominant conformation for bound 10G5H6 was 80.2%, as opposed to 29.2% in its free form, and the population for bound m66 was 100%, as against 1.4% in its free form (lower panel in Fig. 6). Thus, the free form of the antibody samples a range of conformers, of which one is naturally selected and best optimized for preferential antigen binding. In the case of 5.11A1, the free form did not sample the dominant conformation of the bound form. This indicates that the antibody presumably undergoes induced fit and assumes a different conformation of the paratope to accommodate the antigen CD28 (lower panel in Fig. 5a).
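The population comparison above amounts to counting frames per cluster for each form. The sketch below assumes cluster labels are already available from a joint clustering of bound and free frames; the helper `cluster_populations` and the toy labels are hypothetical and do not reproduce the reported percentages.

```python
from collections import Counter

def cluster_populations(labels):
    """Percent of frames falling in each cluster (illustrative helper)."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: 100.0 * n / total for cluster, n in counts.items()}

# Toy labels from a joint clustering; values are illustrative only.
bound_labels = [0] * 9 + [1] * 1
free_labels = [0] * 2 + [1] * 5 + [2] * 3

bound_pop = cluster_populations(bound_labels)
free_pop = cluster_populations(free_labels)

# Dominant conformer of the bound form.
dominant = max(bound_pop, key=bound_pop.get)

# Population of that same conformer in the free ensemble. A value of 0.0
# would mean the free form never sampled it, the situation the text
# interprets as induced fit.
free_share = free_pop.get(dominant, 0.0)
```

A large gap between `bound_pop[dominant]` and `free_share` corresponds to the population shift the text attributes to conformational selection.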
Structural dynamics in light of V H mutation
In order to check whether the structural divergence between the mature antibodies is due to somatic mutation alone or whether the degree of variability increases upon binding of the incoming antigen, the last structure of each free antibody obtained from simulation was compared with the crystal structure of the bound antibody by structure superposition. Superposition of the free states of all antibodies obtained from simulation suggested structural divergence (Figs. 3c and 4c). Pairwise structural alignment of the free and respective bound states showed significant variation in the paratope region, particularly at mutated residues (Figs. 3d-f and 4d-e). Thus, while mutations tailor the overall geometry of individual antibodies of a lineage, it is the interaction with the respective antigen that leads to additional topological alteration in their paratope.
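Structure superposition of this kind is commonly quantified as an RMSD after optimal rigid-body fitting. The following is a minimal sketch of the Kabsch algorithm in NumPy, assuming two equal-length sets of matched Cα coordinates; the function name and the toy coordinates are hypothetical, and the study itself reports using Chimera for structure matching.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T
    return float(np.sqrt(((P_rot - Qc) ** 2).sum() / len(P)))

# Toy coordinates (hypothetical): Q is P rotated by 90 degrees about z
# and translated, so the fitted RMSD should be numerically zero.
P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
Q = P @ Rz.T + np.array([10.0, -4.0, 2.5])
rmsd_same = kabsch_rmsd(P, Q)

# Displacing a single atom by 1 Å leaves a non-zero residual after fitting.
Q_shifted = Q.copy()
Q_shifted[4] += np.array([0.0, 0.0, 1.0])
rmsd_diff = kabsch_rmsd(P, Q_shifted)
```

Restricting the coordinate sets to paratope residues would give a local RMSD, the kind of measure used when comparing CDR loops.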
Analyses of bonding pattern and binding energy of the complexes
The trajectories were further examined to check the interface bonding pattern. H-bonds between antibody and antigen were calculated with a distance cutoff of 3.5 Å using the CPPTRAJ module of AMBER14. H-bonds with 30% or higher occupancy across the trajectories are presented in Additional file 3: Tables S3 and S4. In the VH1–84 lineage, the anti-uPAR complex had 15 H-bonds (ASN_190 and TRP_325, GLY_217 and TYR_308, and GLN_189 and TYR_483 being the most stable ones, with 93, 93 and 88% occupancy respectively), the Fab 5.11A1 complex had 4 (the stable ones being GLU_316 and TYR_205, and GLY_206 and TYR_280, each with 49% occupancy) and the Fab ED10 complex had 6 H-bonds (the highest occupancy being 66%, between different atoms of DT5_1 and ASN_147 and TYR_145) (Additional file 3: Table S3). The Fab m66 and Fab 10G5H6 complexes of the VH5–51 lineage had 10 H-bonds (the stable ones being between TYR_107 and SER_245 with 74% occupancy and between LEU_240 and TRP_33 with 73% occupancy) and 15 H-bonds (the most stable being between TYR_318 and LYS_93 with 94% occupancy) respectively (Additional file 3: Table S4). The different sets of H-bonds in the trajectories of the mature variants were a manifestation of mutations during maturation, suggesting independent developmental pathways. Binding surfaces for individual antigens followed the standard principle. A flattened surface was assumed by the anti-protein antibodies, viz. anti-uPAR, Fab 5.11A1 and Fab 10G5H6, as opposed to the anti-peptide antibody (Fab m66). Due to the small size of the DNA fragment, Fab ED10 showed a small groove-like binding site buried deeper in the VH-VL interface. Together these findings suggest how physicochemical factors govern reorganization to foster shape complementarity.
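The occupancy figures above are the fraction of frames in which a given donor-acceptor pair satisfies the H-bond criterion. The sketch below illustrates that bookkeeping only. It assumes per-frame distances for each pair have already been extracted, applies the 3.5 Å distance cutoff alone (any angle criterion that a full H-bond analysis would typically use is omitted), and uses hypothetical pair names and toy distances rather than CPPTRAJ output.

```python
import numpy as np

def hbond_occupancy(distances, cutoff=3.5):
    """Percent of frames with donor-acceptor distance <= cutoff (in Å).

    `distances` has shape (n_frames, n_pairs).
    """
    d = np.asarray(distances, dtype=float)
    return 100.0 * (d <= cutoff).mean(axis=0)

# Toy data (hypothetical): 10 frames, 2 candidate H-bond pairs.
pair_names = ["pair_A", "pair_B"]
distances = np.array([
    [2.9, 3.0],
    [3.1, 4.2],
    [3.0, 4.5],
    [3.4, 3.3],
    [3.2, 5.0],
    [3.6, 4.8],
    [2.8, 4.1],
    [3.3, 3.9],
    [3.5, 4.4],
    [3.0, 4.0],
])

occupancy = hbond_occupancy(distances)

# Keep only H-bonds at or above the 30% occupancy threshold used in the text.
stable = [name for name, occ in zip(pair_names, occupancy) if occ >= 30.0]
```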
The study was aimed at understanding what factors shape antibodies from a common germline lineage so that they are able to bind significantly distinct antigens. While past reports have analyzed maturation-associated evolution, their contexts were different from ours. Some reports are based on sequence analyses alone, while others emphasize the dynamics of antibody evolution towards a common epitope. The prime focus of our study was to examine structural diversification of mature antibodies for a wide range of chemically distinct antigens. The approach of the study is data-driven: available crystallographic structures have been analyzed, and the inferences drawn have been validated by characterizing dynamics within lineages.
Analyses of static structural data from 35 (mouse) and 13 (human) IGHV families coupled with different light chain genes revealed that antibodies of a certain descent undergo topological tailoring of their binding pockets, mediated by all the CDR loops, to accommodate distinct antigens. The contribution of individual loops, however, varies across lineages. The dataset spans a wide range of antigens covering haptens (24% in mouse and 5% in human), peptides (27% in mouse and 21% in human), sugars (5% in mouse and 5% in human), proteins (41% in mouse and 69% in human) and nucleic acids (3% in mouse) (Fig. 1) that are typical of pathogens. Hence, the study is conceptually sound, as the diversity of epitopes facilitates a realistic understanding of host-pathogen interaction.
Sequence examination of the two lineages indicates that different numbers and kinds of mutations differentiate the antibodies from the common germline precursor. During affinity maturation, repeated rounds of mutation result in various intermediate stages of sequence diversification; hence the antibodies presumably represent different stages of evolution, each following an isolated maturation pathway. If the extent of maturation of the reported structures could be investigated, additional insights could be drawn. Sequence variations within the hypervariable regions shift the canonical structure framework relative to individual CDR loops to accommodate distinct antigens and maintain complementarity of the interacting surfaces by reducing the entropic cost [11, 31]. Further, the conformational landscape reveals that, despite the structural re-arrangement brought about by sequence variation, the overall backbone geometry is similar. It can be envisaged that a common structural imprint of the germline is inherited. This suggests that while affinity maturation is highly stochastic, the evolution of the antibody repertoire is shaped by structural constraints. Examination of the paratope sheds light on how one conformational state is naturally selected from the available spectrum and optimized to favor antigen binding. It is this optimization that brings about divergence in the paratope of an individual antibody, as is evident from the non-overlapping paratope topologies. Studies that reported maturation-associated pre-ordering of the antibody combining site to favor antigen binding corroborate these findings [24, 32, 33, 34, 35]. Present concepts of protein structure and function postulate the coexistence of several functional states even for a highly specific enzyme. It is the population of a distinct conformation that determines the specificity of the molecule. The observed homogeneity in the bound antibody indicates selection of a functional state, thereby demonstrating a narrowing of the specificity window that leads to molecular divergence during affinity maturation [35, 37, 38, 39]. An analogy can be drawn in Darwin’s words from The Origin of Species, “from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved”.
It may be noted that the observations made from simulation studies provide an interesting perspective on the structural changes in sibling antibodies that target different antigens. Whether the antigen is self or foreign has no bearing on our observations, as the study primarily dwells on identifying the structural principles of interaction governing evolution. Due to practical limitations, conclusions have been drawn from simulations of 5 complexes from 2 IGHV lineages. Subjecting more complexes to simulation would increase confidence in the interpretation. Additionally, the hypothesis derived from the study, as is the case with all computational analyses, has to be validated experimentally. The best experimental follow-up of this study would be to make recombinant germline antibodies and determine their three-dimensional structures by X-ray diffraction in antigen-bound and unbound states. Subsequently, simulation studies could be conducted to sample the conformational space of the germline complexes and compare them with their mature counterparts to decipher structural diversification in the lineage.
The computational analysis of global data followed by dynamics illustrates the molecular mechanism by which modulation of structure is paramount to biomolecular recognition. Data from the various computational approaches used in this study support the existence of different routes of maturation. Structural adaptation of the paratope despite conservation of the overall backbone architecture is a pivotal finding of the study. The results present an interesting implication of molecular evolution leading to the generation of antibody diversity. Additionally, our analysis displays characteristics that can perpetuate the antibody diversity model, whereby the 4th level of diversity is attained after somatic diversification of a lineage, leading to recognition of diverse antigens.
Data retrieval and compilation
Coordinate files of Ab-Ag complexes were retrieved from the RCSB PDB (www.rcsb.org/pdb). The data so obtained were filtered and subjected to mining as described in the Results section. CDRs were identified using the Kabat numbering system. The coordinate files were segregated into two groups, human and mouse, based on the source of the antibody. While more than 90% of the retrieved mouse antibodies were obtained from immunized house mice, the mouse strain was not considered as a criterion for selecting the antibodies, since no two individuals have an identical genetic make-up and hence the immune repertoire would also vary.
Identification of germline origin of the antibodies and their clustering
Candidate sequences were queried for germline genes in the IMGT database using the IgBLAST tool 1.3.0 at NCBI with default settings [43, 44]. The antibodies were then clustered based on common germline VH origin. Data sharing no common origin were discarded, while the rest were grouped.
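The grouping step above can be sketched as a simple dictionary-based clustering by assigned VH gene. The helper name `group_by_germline` and the input format are hypothetical; the five real assignments are those stated in this article for the simulated complexes, and the singleton entry is an invented placeholder included only to show the discard rule.

```python
from collections import defaultdict

def group_by_germline(assignments):
    """Group PDB IDs by assigned germline VH gene (illustrative helper).

    `assignments` is an iterable of (pdb_id, vh_gene) pairs. Genes
    represented by a single antibody share no common origin with any
    other entry and are dropped.
    """
    lineages = defaultdict(list)
    for pdb_id, vh_gene in assignments:
        lineages[vh_gene].append(pdb_id)
    return {gene: ids for gene, ids in lineages.items() if len(ids) >= 2}

assignments = [
    ("3BT2", "VH1-84"),
    ("1YJD", "VH1-84"),
    ("2OK0", "VH1-84"),
    ("4HWB", "VH5-51"),
    ("4NRX", "VH5-51"),
    ("XXXX", "VH-placeholder"),  # hypothetical singleton, discarded
]

lineages = group_by_germline(assignments)
```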
Structure-based sequence alignment of all the lineages was performed in Chimera 1.11.2. One antibody was randomly chosen as the reference against which the other structures were matched. RMSDs of the CDRs were noted. Contacts within the complexes were obtained from PDBsum. For multi-subunit antigens, the bonds between the antibody and each chain of the antigen were summed and reported. In complexes for which PDBsum did not return any interaction information, PISA 1.48 was used to identify the contacts.
Selection of system
Three mature antibodies of the mouse VH1–84 lineage, anti-uPAR, Fab 5.11A1 and Fab ED10, bound to uPAR, CD28 and DNA respectively, formed one system (PDB ID: 3BT2, 1YJD and 2OK0 respectively). Two of the four antibodies of the human VH5–51 lineage, Fab 10G5H6 bound with ectodomain D3 of IL-13 and antibody m66 bound with gp41 MPER (Membrane Proximal External Region) peptide (PDB ID: 4HWB and 4NRX respectively), comprised the other system and were chosen for simulation. Since the other two human antibodies, Fab 2558 and Fab CH58, also bound to peptides, they were excluded from the study to maintain distinctness of epitopes. Only molecules directly interacting with the antibody were retained; the rest were deleted. For each antibody, only the Fv (fragment variable) region consisting of the CDRs and framework regions was retained. Antigens were removed from each of the complexes to generate the free forms of the antibodies.
Multiple sequence alignment of the antibodies and their corresponding germline VH gene was performed online using Clustal Omega (default settings) because of its alignment accuracy. Mutations were identified from the alignment. A percent identity matrix was generated from the alignment to obtain the identity between each sequence and its germline counterpart. Canonical classes of the CDRs were assigned using strict Chothia SDR templates (http://www.bioinf.org.uk/abs/chothia.html) [12, 31].
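Reading mutations and percent identity off an alignment can be sketched as below. This is an illustration under stated assumptions, not the Clustal Omega calculation: identity is computed over columns where neither sequence has a gap, positions are numbered sequentially along the germline sequence (not by Kabat numbering), and the function names and the two short toy strings are hypothetical.

```python
def percent_identity(germline, mature):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(germline) != len(mature):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(g, m) for g, m in zip(germline, mature) if g != "-" and m != "-"]
    matches = sum(g == m for g, m in pairs)
    return 100.0 * matches / len(pairs)

def list_mutations(germline, mature):
    """Substitutions written as germline residue, position, mature residue."""
    mutations = []
    position = 0
    for g, m in zip(germline, mature):
        if g == "-":
            continue  # insertion relative to germline: no germline position
        position += 1
        if m != "-" and g != m:
            mutations.append(f"{g}{position}{m}")
    return mutations

# Toy aligned strings (hypothetical), 10 columns, 2 substitutions.
germline_aln = "EVQLVQSGAE"
mature_aln = "EVQLVESGAD"

identity = percent_identity(germline_aln, mature_aln)
mutations = list_mutations(germline_aln, mature_aln)
```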
Molecular dynamics simulation
All the starting heteromeric structures were provided as input to the tLEaP module of the AMBER14 package to generate topology and coordinate files. Molecular mechanics parameters were assigned using the ff12SB force field. The molecules were explicitly solvated in a TIP3P water box with box edges lying 10 Å from the outermost atoms of the proteins in all directions. The charge of each system was neutralized with monovalent counter ions, Na+ or Cl-. Prior to simulation, energy minimization was performed for 5000 steps, with steepest descent for the first 2500 steps followed by conjugate gradient for the rest. If steric clashes persisted, the number of minimization cycles was increased. Systems were heated to 300 K during a 14 ps dynamics simulation using the NVT ensemble. The temperature of the system was controlled using Langevin dynamics temperature coupling with a time step of 2 fs. Pressure was equilibrated to 1 atm over a period of 10 ps using isotropic position scaling, keeping the temperature constant at 300 K. A third equilibration was run for 100 ps to stabilize the system. The production MD run was conducted using the NPT ensemble for 0.5 μs at 300 K and 1 atm for each system. Snapshots were saved at an interval of 10 ps. All the MD simulations were performed using Sander and a parallel CUDA version of PMEMD from AMBER14 [51, 52]. All simulations were performed in-house using a High Performance Computing (HPC) facility with NVIDIA K20X GPUs.
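The production settings above fix the size of each trajectory. The arithmetic can be checked with a short calculation, done in integer femtoseconds to avoid rounding. It assumes the 2 fs time step quoted for the temperature coupling also applies to the production run; the variable names are illustrative.

```python
FS_PER_PS = 1_000
FS_PER_US = 1_000_000_000

timestep_fs = 2                      # integration time step (assumed for production)
production_fs = FS_PER_US // 2       # 0.5 μs production run
save_interval_fs = 10 * FS_PER_PS    # snapshots saved every 10 ps

n_steps = production_fs // timestep_fs                # integration steps per run
steps_per_snapshot = save_interval_fs // timestep_fs  # steps between saved frames
n_snapshots = production_fs // save_interval_fs       # frames per trajectory

print(n_steps, steps_per_snapshot, n_snapshots)
# prints: 250000000 5000 50000
```

Each production trajectory therefore contributes 50,000 frames, and clustering a bound trajectory jointly with its free counterpart operates on 100,000 frames per antibody.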
Analysis of MD trajectories
We acknowledge financial support from Department of Science & Technology and Department of Biotechnology, Government of India, to Dinakar M. Salunke and Council of Scientific and Industrial Research for fellowship to Harmeet Kaur.
Availability of data and materials
The structural data used in the study are available in Protein Data Bank (PDB). The IDs associated with the structural data are 1TPX, 2ADF, 1TET, 1NCC, 1WEJ, 2BDN, 2Q8A, 2VDO, 25C8, 1A3R, 2NR6, 4K2U, 1FNS, 1A2Y, 1E6J, 1NMB, 3LEY, 3RVV, 3G5Y, 1KCS, 1F90, 1BAF, 1C12, 1CF8, 2AJV, 3CFD, 1QLE, 1TZH, 2QHR, 2OTW, 1CLZ, 2AEP, 3OKD, 3PHO, 2G5B, 4HLZ, 1EJO, 1IND, 3LS4, 3CFB, 1KFA, 4HZL, 1YQV, 1Q0Y, 1YNK, 2BJM, 1CT8, 1NAK, 1DQJ, 4GAG, 1OB1, 2ZPK, 2OK0, 3BT2, 1YJD, 1IGJ, 1EGJ, 2VXT, 1F3D, 4DW2, 1FL3, 3IFP, 1GGI, 2ADJ, 2H1P, 3FFD, 3LIZ, 2OR9, 3HNS, 3O0R, 1MFE, 3VW3, 1MPA, 1MH5, 4ALA, 4FFV, 2YPV, 1V7M, 1WZ1, 3FO9, 3IET, 1SM3, 1Q72, 1M7D, 1UM5, 2DDQ, 2FR4, 2F58, 1OSP, 2J5L, 1QKZ, 2HKF, 3SGE, 1CBV, 3I50, 3V0W, 1JRH, 2R0W, 3RKD, 3IFO, 3O41, 3IGA, 4OII, 2I9L, 4AG4, 4BKL, 4DGI, 4ETQ, 3QUM, 4F2M, 1KNO, 2HRP, 4C83, 2GSI, 1CU4, 4DGV, 3U2S, 3ZTN, 2NYY, 2QQK, 4HHA, 4G7V, 3GBN, 2NXZ, 2DD8, 4DN4, 2CMR, 2JB6, 4JZO, 4HJ0, 4MWF, 2FX7, 3W9E, 3NPS, 4LST, 4HKX, 3D85, 4AL8, 3MXW, 3HAE, 3KDM, 3HI6, 3SO3, 2VXS, 1H0D, 3BN9, 4FP8, 4JFZ, 3MLX, 3HI1, 2YK1, 4JY4, 1IKF, 3BDY, 3DVG, 3IDX, 3SOB, 2H9G, 3K2U, 3NH7, 1OP5, 3H42, 3UJI, 4HPO, 4NRX, 4HWB, 3L5X, 2F5B, 3THM, 3TWC, 4G6F, 1Q1J. All data generated or analyzed during this study are included in this published article and its Additional file 1, Additional file 2 and Additional file 3.
Conceived and designed the experiments: DMS, HK. Performed the experiments: HK. Analysed the data: HK, NS, DM, DMS. Wrote the manuscript: HK, DMS. The manuscript was reviewed, discussed and approved by all authors.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
18. Collins AM, Wang Y, Roskin KM, Marquis CP, Jackson KJ. The mouse antibody heavy chain repertoire is germline-focused and highly variable between inbred strains. Philos Trans R Soc Lond B Biol Sci. 2015;370(1676):20140236.
40. Darwin CR. The Origin of Species. The Harvard Classics. New York: P.F. Collier & Son; 1858.
42. Kabat EA, Wu TT, Bilofsky H, Reid-Miller M, Perry H. Sequence of proteins of immunological interest. Bethesda: National Institutes of Health; 1983.
56. Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley Symposium on Mathematical Statistics and Probability. University of California Press; 1967.
57. Team RC. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2013.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.