An automated algorithm for extracting functional immunologic V-genes from genomes in jawed vertebrates

  • Original Paper
  • Immunogenetics

Abstract

Variable (V) domains of immunoglobulins (Ig) and T cell receptors (TCR) are generated from genomic V gene segments (V-genes). At present, such V-genes have been annotated in the genomes of only a few species. We have developed a bioinformatics tool that accelerates the task of identifying functional V-genes in genome datasets. Automated identification relies on recognizing key V-gene signatures, such as recombination signal sequences, the size of the exon region, and the position of amino acid motifs within the translated exon. The algorithm also classifies extracted V-genes into either TCR or Ig loci. We describe the implementation of the algorithm and validate its accuracy by comparing V-genes identified in the human and mouse genomes with known V-gene annotations available in public repositories. The advantages and utility of the algorithm are illustrated by using it to identify functional V-genes in the rat genome, where V-gene annotation is still incomplete. This allowed us to perform a comparative human–rodent phylogenetic analysis based on V-genes, which supports the hypothesis that distinct evolutionary pressures shape the TCR and Ig V-gene repertoires. Our program, together with a graphical user interface, is available as open-source software, downloadable at http://code.google.com/p/vgenextract/.
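The three signatures named in the abstract (a recombination signal sequence, an exon of plausible size, and conserved amino acid motifs in the translated exon) can be illustrated with a minimal sketch. This is not the VgenExtractor implementation: the function names, the exact-match consensus heptamer `CACAGTG`, the 270–330 bp size window, and the loose Cys…Trp…Cys spacing pattern are all assumptions chosen for illustration, and only the forward strand is scanned.

```python
import re

# Consensus RSS heptamer. Real signals tolerate mismatches and are followed by a
# 12- or 23-bp spacer and a nonamer; this sketch checks the heptamer only.
HEPTAMER = "CACAGTG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical motif: the conserved Cys, Trp, Cys of a V domain, with
# spacings picked loosely for this example (not taken from the paper).
V_MOTIF = re.compile(r"C.{8,20}W.{40,70}C")


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def orf_start(genome, end, max_len):
    """Walk upstream from `end`, codon by codon, until a stop codon or max_len."""
    start = end
    while start >= 3 and end - (start - 3) <= max_len:
        if genome[start - 3:start] in STOP_CODONS:
            break
        start -= 3
    return start


def find_candidate_v_exons(genome, min_len=270, max_len=330):
    """Return (start, end, protein) for open frames that abut an RSS heptamer."""
    genome = genome.upper()
    hits = []
    for match in re.finditer(HEPTAMER, genome):
        # The reading frame relative to the heptamer is unknown, so try all three.
        for offset in range(3):
            end = match.start() - offset
            start = orf_start(genome, end, max_len)
            if end - start < min_len:
                continue  # exon size filter (assumed window)
            protein = translate(genome[start:end])
            if V_MOTIF.search(protein):  # motif position filter (assumed pattern)
                hits.append((start, end, protein))
    return hits
```

Calling `find_candidate_v_exons(sequence)` on a chromosome string returns candidate coordinates and translations that would still need the downstream steps described in the abstract, such as distinguishing TCR from Ig V-genes.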

Acknowledgments

This work was partially supported by the European Union 7th Framework Programme [FP7/REGPOT-2012-2013.1] under grant agreement no. 316265, BIOCAPS. JF acknowledges the support of PIRSES-GA-2008-230665 (7th FP, EC).

Author information

Corresponding author

Correspondence to David Olivieri.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Online Resource Fig. 1

Results for human IGHV. The genomic segment of chromosome 14 that contains the IGHV locus is shown. The figure consists of the Ensembl representation of this chromosomal section with the HAVANA IGHV annotation (obtained from their web application), with the V-gene positions obtained from our algorithm, VgenExtractor, superimposed. A numerical summary of the results is provided in Table 1 (GIF 99 kb)

Online Resource Fig. 2

Results for human IGKV. The final genomic segment of chromosome 2 containing IGKV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard IGKV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 (GIF 95 kb)

Online Resource Fig. 3

Results for human IGLV. The final genomic segment of chromosome 22 containing IGLV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard IGLV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 (GIF 80 kb)

Online Resource Fig. 4

Results for human TRBV. The final genomic segment of chromosome 7 containing TRBV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard TRBV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 in the text (GIF 103 kb)

Online Resource Fig. 5

Results for human TRGV. The final genomic segment of chromosome 7 containing TRGV regions is represented. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard TRGV gene annotation defined by the HAVANA project. Superimposed on this Ensembl figure is the output from VgenExtractor, showing the location of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 in the text (GIF 50 kb)

Online Resource Fig. 6

Study of gene segment duplications in the TRAV locus of the rat genome. The upper subfigure represents the human IGKV locus, while the lower subfigure represents the rat TRAV locus. In each subfigure, duplicated gene segments are depicted. The green tracks indicate homology between 85 and 99.9%. In the IGKV locus, a quasi-duplication process can be seen throughout the entire locus. In contrast, no large duplications can be discerned in the rat TRAV locus (GIF 359 kb)

Online Resource Table 1

Sequences found by our program VgenExtractor for each locus of the Rattus norvegicus genome (Rnor5.0), compared with those found by the IMGT. Results are displayed as IMGT/VgenExtractor for functional genes (F), pseudogenes (P), and open reading frames (ORF). Empty entries indicate that annotations are not available (NA) in the IMGT (DOCX 11 kb)

Online Resource Table 2

Sequences found using VgenExtractor in different species for which partial annotations are available at the IMGT. Results are displayed as IMGT/VgenExtractor for functional genes (F), pseudogenes (P), and open reading frames (ORF). Empty entries indicate that no annotations are available in the IMGT (DOCX 11 kb)

About this article

Cite this article

Olivieri, D., Faro, J., von Haeften, B. et al. An automated algorithm for extracting functional immunologic V-genes from genomes in jawed vertebrates. Immunogenetics 65, 691–702 (2013). https://doi.org/10.1007/s00251-013-0715-8
