Immunogenetics, Volume 65, Issue 9, pp 691–702

An automated algorithm for extracting functional immunologic V-genes from genomes in jawed vertebrates

  • David Olivieri
  • Jose Faro
  • Bernardo von Haeften
  • Christian Sánchez-Espinel
  • Francisco Gambón-Deza
Original Paper

Abstract

Variable (V) domains of immunoglobulins (Ig) and T cell receptors (TCR) are generated from genomic V gene segments (V-genes). At present, such V-genes have been annotated in the genomes of only a few species. We have developed a bioinformatics tool that accelerates the task of identifying functional V-genes from genome datasets. Automated identification relies on recognizing key V-gene signatures, such as recombination signal sequences, the size of the exon region, and the position of amino acid motifs within the translated exon. The algorithm also classifies extracted V-genes into either TCR or Ig loci. We describe the implementation of the algorithm and validate its accuracy by comparing V-genes identified from the human and mouse genomes with known V-gene annotations available in public repositories. The advantages and utility of the algorithm are illustrated by using it to identify functional V-genes in the rat genome, where V-gene annotation is still incomplete. This allowed us to perform a comparative human–rodent phylogenetic analysis based on V-genes, which supports the hypothesis that distinct evolutionary pressures shape the TCR and Ig V-gene repertoires. Our program, together with a graphical user interface, is available as open-source software, downloadable at http://code.google.com/p/vgenextract/.
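The three signatures named in the abstract (a recombination signal sequence, exon size, and amino acid motifs in the translated exon) can be illustrated with a minimal Python sketch. It is not the VgenExtractor implementation. The consensus heptamer CACAGTG, the fixed 300 bp exon window, the Cys-Trp-Cys spacing rule, and all function names are assumptions chosen for illustration, and only the forward strand is scanned.

```python
import re
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Assumption: consensus RSS heptamer; real heptamers tolerate mismatches.
HEPTAMER = "CACAGTG"


def translate(dna):
    """Translate complete codons of an upper-case DNA string ('X' if unknown)."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def looks_like_v_exon(protein, min_gap=60, max_gap=90):
    """Hypothetical motif test: no stop codon, and two Cys residues
    min_gap..max_gap positions apart with a Trp somewhere between them."""
    if "*" in protein:
        return False
    cys = [i for i, aa in enumerate(protein) if aa == "C"]
    for a in cys:
        for b in cys:
            if min_gap <= b - a <= max_gap and "W" in protein[a + 1:b]:
                return True
    return False


def find_candidates(genome, exon_len=300):
    """Return (start, end, protein) for windows of up to exon_len bp that
    end directly at a heptamer and pass the motif test in some frame."""
    hits = []
    for match in re.finditer(HEPTAMER, genome):
        end = match.start()          # exon is assumed to end where the RSS begins
        start = end - exon_len
        if start < 0:
            continue
        for frame in range(3):       # try each reading frame of the window
            protein = translate(genome[start + frame:end])
            if looks_like_v_exon(protein):
                hits.append((start + frame, end, protein))
                break
    return hits
```

Calling `find_candidates(sequence)` on an upper-case chromosome string returns candidate windows. A realistic tool would also scan the reverse complement, allow variable exon lengths, and score the nonamer and spacer of the recombination signal.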

Keywords

Immunoglobulins TCR Genes Immunoinformatics 

Notes

Acknowledgments

This work was partially supported by the European Union 7th Framework Programme [FP7/REGPOT-2012-2013.1] under grant agreement no. 316265, BIOCAPS. JF acknowledges the support of PIRSES-GA-2008-230665 (7th FP, EC).

Supplementary material

251_2013_715_Fig4_ESM.gif (99 kb)
Online Resource Fig. 1

Results for human IGHV. The genomic segment of chromosome 14 that contains the IGHV locus is shown. The figure consists of the Ensembl representation of this chromosomal section with the HAVANA IGHV annotation (obtained from their web application), superimposed with the V-gene positions obtained from our algorithm, VgenExtractor. A numerical summary of the results is provided in Table 1 (GIF 99 kb)

251_2013_715_MOESM1_ESM.eps (70 kb)
High resolution image (EPS 69 kb)
251_2013_715_Fig5_ESM.gif (95 kb)
Online Resource Fig. 2

Results for human IGKV. The final genomic segment of chromosome 2 containing IGKV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard IGKV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 (GIF 95 kb)

251_2013_715_MOESM2_ESM.eps (60 kb)
High resolution image (EPS 60 kb)
251_2013_715_Fig6_ESM.gif (80 kb)
Online Resource Fig. 3

Results for human IGLV. The final genomic segment of chromosome 22 containing IGLV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard IGLV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 (GIF 80 kb)

251_2013_715_MOESM3_ESM.eps (58 kb)
High resolution image (EPS 57 kb)
251_2013_715_Fig7_ESM.gif (104 kb)
Online Resource Fig. 4

Results for human TRBV. The final genomic segment of chromosome 7 containing TRBV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard TRBV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 in the text (GIF 103 kb)

251_2013_715_MOESM4_ESM.eps (63 kb)
High resolution image (EPS 63 kb)
251_2013_715_Fig8_ESM.gif (51 kb)
Online Resource Fig. 5

Results for human TRGV. The final genomic segment of chromosome 7 containing TRGV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard TRGV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 in the text (GIF 50 kb)

251_2013_715_MOESM5_ESM.eps (46 kb)
High resolution image (EPS 45 kb)
251_2013_715_Fig9_ESM.gif (360 kb)
Online Resource Fig. 6

Study of gene segment duplications in the TRAV locus of the rat genome. The upper subfigure represents the human IGKV locus, while the lower subfigure represents the rat TRAV locus. In each subfigure, duplicated gene segments are depicted. The green tracks indicate homology between 85 and 99.9 %. In the IGKV locus, a quasi-duplication process can be seen throughout the entire locus. In contrast, no large duplications can be discerned in the rat TRAV locus (GIF 359 kb)

251_2013_715_MOESM6_ESM.eps (542 kb)
High resolution image (EPS 542 kb)
251_2013_715_MOESM7_ESM.docx (12 kb)
Online Resource Table 1. Sequences found by our program VgenExtractor for each locus of the Rattus norvegicus genome (Rnor5.0), compared with those found by the IMGT. Results are displayed as IMGT/VgenExtractor for functional genes (F), pseudogenes (P), and open reading frames (ORF). Empty entries indicate that annotations are not available (NA) in the IMGT (DOCX 11 kb)
251_2013_715_MOESM8_ESM.docx (12 kb)
Online Resource Table 2. Sequences found by VgenExtractor in different species for which partial annotations are available at the IMGT. Results are displayed as IMGT/VgenExtractor for functional genes (F), pseudogenes (P), and open reading frames (ORF). Empty entries indicate that no annotations are available in the IMGT (DOCX 11 kb)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • David Olivieri (1)
  • Jose Faro (2, 3, 4)
  • Bernardo von Haeften (5)
  • Christian Sánchez-Espinel (6)
  • Francisco Gambón-Deza (7, 3)

  1. School of Computer Engineering, University of Vigo, Ourense, Spain
  2. Immunology, Faculty of Biology, and Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain
  3. Instituto Biomédico de Vigo, Vigo, Spain
  4. Instituto Gulbenkian de Ciência, Oeiras, Portugal
  5. Area of Immunology, Faculty of Biology, University of Vigo, Vigo, Spain
  6. Nanoimmunotech SL, Pza. Fernando Conde, Vigo, Spain
  7. Servicio Gallego de Salud (SERGAS), Unidad de Inmunología, Hospital do Meixoeiro, Vigo, Spain
