Abstract
Variable (V) domains of immunoglobulins (Ig) and T cell receptors (TCR) are generated from genomic V gene segments (V-genes). At present, such V-genes have been annotated in the genomes of only a few species. We have developed a bioinformatics tool that accelerates the task of identifying functional V-genes from genome datasets. Automated detection is accomplished by recognizing key V-gene signatures, such as recombination signal sequences, the size of the exon region, and the position of amino acid motifs within the translated exon. The algorithm also classifies extracted V-genes as belonging to either TCR or Ig loci. We describe the implementation of the algorithm and validate its accuracy by comparing V-genes identified from the human and mouse genomes with known V-gene annotations available in public repositories. The utility of the algorithm is illustrated by using it to identify functional V-genes in the rat genome, where V-gene annotation is still incomplete. This allowed us to perform a comparative human–rodent phylogenetic analysis based on V-genes, which supports the hypothesis that distinct evolutionary pressures shape the TCR and Ig V-gene repertoires. Our program, together with a graphical user interface, is available as open-source software, downloadable at http://code.google.com/p/vgenextract/.
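The three signatures named in the abstract (a recombination signal sequence, an exon of characteristic size, and conserved amino acid motifs in the translated exon) can be illustrated with a minimal Python sketch. This is not the VgenExtractor implementation. The consensus heptamer/nonamer pattern, the 280 to 330 bp exon window, the Cys...Trp...Cys spacing, and the names RSS, MOTIF, translate, and find_v_exon_candidates are all assumptions chosen for illustration.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Assumed consensus RSS downstream of a V exon: heptamer, 12 or 23 bp spacer,
# nonamer. Real RSSs are degenerate; an exact match is a simplification.
RSS = re.compile(r"CACAGTG(?:[ACGT]{12}|[ACGT]{23})ACAAAAACC")

# Assumed motif: two cysteines with a tryptophan between them. The spacing
# ranges are illustrative, not taken from the paper.
MOTIF = re.compile(r"C.{8,20}W.{40,75}C")


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_v_exon_candidates(seq, min_len=280, max_len=330):
    """Return (start, end, protein) for stop-free, motif-bearing windows
    that end immediately upstream of an RSS match (forward strand only)."""
    seq = seq.upper()
    candidates = []
    for rss in RSS.finditer(seq):
        exon_end = rss.start()  # the exon is assumed to end where the heptamer begins
        # Try the longest window first and keep the first one that passes.
        for length in range(max_len, min_len - 1, -1):
            exon_start = exon_end - length
            if exon_start < 0:
                continue
            protein = translate(seq[exon_start:exon_end])
            if "*" not in protein and MOTIF.search(protein):
                candidates.append((exon_start, exon_end, protein))
                break
    return candidates
```

Calling find_v_exon_candidates on a chromosome-sized string returns coordinates and translations that could then be compared against existing annotations. A fuller scan would also search the reverse complement, tolerate mismatches in the RSS, and use splice-site evidence to fix the exon start, none of which is attempted here.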
Acknowledgments
This work was partially supported by the European Union 7th Framework Programme [FP7/REGPOT-2012-2013.1] under grant agreement no. 316265, BIOCAPS. JF acknowledges the support of PIRSES-GA-2008-230665 (7th FP, EC).
Electronic supplementary material
Below is the link to the electronic supplementary material.
Online Resource Fig. 1
Results for human IGHV. The genomic segment of chromosome 14 that contains the IGHV locus is shown. The figure consists of the Ensembl representation of this chromosomal section with the HAVANA IGHV annotation (obtained from their web application), overlaid with the V-gene positions obtained from our algorithm, VgenExtractor. A numerical summary of the results is provided in Table 1 (GIF 99 kb)
Online Resource Fig. 2
Results for human IGKV. The final genomic segment of chromosome 2 containing IGKV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard IGKV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 (GIF 95 kb)
Online Resource Fig. 3
Results for human IGLV. The final genomic segment of chromosome 22 containing IGLV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard IGLV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 (GIF 80 kb)
Online Resource Fig. 4
Results for human TRBV. The final genomic segment of chromosome 7 containing TRBV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard TRBV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 in the text (GIF 103 kb)
Online Resource Fig. 5
Results for human TRGV. The final genomic segment of chromosome 7 containing TRGV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard TRGV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 in the text (GIF 50 kb)
Online Resource Fig. 6
Study of gene segment duplications in the TRAV locus of the rat genome. The upper subfigure represents the human IGKV locus, while the lower subfigure represents the rat TRAV locus. In each subfigure, duplicated gene segments are depicted. The green tracks indicate homology between 85 and 99.9 %. In the IGKV locus, a quasi-duplication process can be seen throughout the entire locus. In contrast, no large duplications can be discerned in the rat TRAV locus (GIF 359 kb)
Online Resource Table 1
Number of sequences found by our program, VgenExtractor, for each locus of the Rattus norvegicus genome (Rnor5.0), compared with those annotated by IMGT. Results are displayed as IMGT/VgenExtractor for functional genes (F), pseudogenes (P), and open reading frames (ORF). Empty entries indicate that annotations are not available (NA) in IMGT (DOCX 11 kb)
Online Resource Table 2
Number of sequences found by VgenExtractor in different species for which partial annotations are available in IMGT. Results are displayed as IMGT/VgenExtractor for functional genes (F), pseudogenes (P), and open reading frames (ORF). Empty entries indicate that no annotations are available in IMGT (DOCX 11 kb)
Cite this article
Olivieri, D., Faro, J., von Haeften, B. et al. An automated algorithm for extracting functional immunologic V-genes from genomes in jawed vertebrates. Immunogenetics 65, 691–702 (2013). https://doi.org/10.1007/s00251-013-0715-8
DOI: https://doi.org/10.1007/s00251-013-0715-8