Immunogenetics, Volume 65, Issue 9, pp 691–702

An automated algorithm for extracting functional immunologic V-genes from genomes in jawed vertebrates

  • David Olivieri
  • Jose Faro
  • Bernardo von Haeften
  • Christian Sánchez-Espinel
  • Francisco Gambón-Deza
Original Paper

DOI: 10.1007/s00251-013-0715-8

Cite this article as:
Olivieri, D., Faro, J., von Haeften, B. et al. Immunogenetics (2013) 65: 691. doi:10.1007/s00251-013-0715-8

Abstract

Variable (V) domains of immunoglobulins (Ig) and T cell receptors (TCR) are generated from genomic V gene segments (V-genes). At present, such V-genes have been annotated in the genomes of only a few species. We have developed a bioinformatics tool that accelerates the task of identifying functional V-genes from genome datasets. Automated recognition relies on detecting key V-gene signatures, such as recombination signal sequences, the size of the exon region, and the position of amino acid motifs within the translated exon. The algorithm also classifies extracted V-genes into either TCR or Ig loci. We describe the implementation of the algorithm and validate its accuracy by comparing V-genes identified from the human and mouse genomes with known V-gene annotations available in public repositories. The advantages and utility of the algorithm are illustrated by using it to identify functional V-genes in the rat genome, where V-gene annotation is still incomplete. This allowed us to perform a comparative human–rodent phylogenetic analysis based on V-genes, which supports the hypothesis that distinct evolutionary pressures shape the TCR and Ig V-gene repertoires. Our program, together with a graphical user interface, is available as open-source software, downloadable at http://code.google.com/p/vgenextract/.
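To make the three signatures named in the abstract concrete, the following Python sketch shows one way they could be combined into a single filter. It is not the VgenExtractor implementation: the heptamer sequence used as the recombination signal, the exon length window, the cysteine and tryptophan spacings, and the function names are all illustrative assumptions chosen for this example.

```python
import re
from itertools import product

# All values below are illustrative assumptions, not VgenExtractor's parameters.
HEPTAMER = "CACAGTG"            # canonical RSS heptamer, assumed to abut the exon 3' end
EXON_MIN, EXON_MAX = 280, 330   # assumed exon length window, in nucleotides

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def has_v_motifs(protein):
    """Check for a Cys followed by a Trp 10-19 residues later and a second
    Cys 60-79 residues later (assumed spacings for a V domain)."""
    for m in re.finditer("C", protein):
        c1 = m.start()
        if "W" in protein[c1 + 10:c1 + 20] and "C" in protein[c1 + 60:c1 + 80]:
            return True
    return False


def find_candidate_v_exons(seq):
    """Return (start, end) coordinates of candidate exons ending at a heptamer."""
    seq = seq.upper()
    hits = []
    for m in re.finditer(f"(?={HEPTAMER})", seq):
        h = m.start()
        for off in range(3):                       # try the three reading frames
            start = max(0, h - EXON_MAX) + off
            region = seq[start:h]
            protein = translate(region)
            orf = protein.rsplit("*", 1)[-1]       # stop-free stretch at the 3' end
            orf_end = start + 3 * (len(region) // 3)
            if 3 * len(orf) >= EXON_MIN and has_v_motifs(orf):
                hits.append((orf_end - 3 * len(orf), orf_end))
    return hits
```

A call such as `find_candidate_v_exons(chromosome_sequence)` would return half-open coordinate pairs on the forward strand only; a fuller scan would also examine the reverse complement and the rest of the recombination signal, which this sketch omits.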

Keywords

Immunoglobulins · TCR · Genes · Immunoinformatics

Supplementary material

251_2013_715_Fig4_ESM.gif (99 kb)
Online Resource Fig. 1

Results for human IGHV. The genomic segment of chromosome 14 that contains the IGHV regions is shown. The figure consists of the Ensembl representation of this chromosomal section with the HAVANA IGHV annotation (obtained from their web application), overlaid with the V-gene positions obtained from our algorithm, VgenExtractor. A numerical summary of the results is provided in Table 1 (GIF 99 kb)

251_2013_715_MOESM1_ESM.eps (70 kb)
High resolution image (EPS 69 kb)
251_2013_715_Fig5_ESM.gif (95 kb)
Online Resource Fig. 2

Results for human IGKV. The final genomic segment of chromosome 2 containing the IGKV regions is shown. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard IGKV gene annotation defined by the HAVANA project. The output of VgenExtractor is superimposed on this Ensembl figure, showing the locations of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 (GIF 95 kb)

251_2013_715_MOESM2_ESM.eps (60 kb)
High resolution image (EPS 60 kb)
251_2013_715_Fig6_ESM.gif (80 kb)
Online Resource Fig. 3

Results for human IGLV. The final genomic segment of chromosome 22 containing the IGLV regions is shown. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard IGLV gene annotation defined by the HAVANA project. The output of VgenExtractor is superimposed on this Ensembl figure, showing the locations of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 (GIF 80 kb)

251_2013_715_MOESM3_ESM.eps (58 kb)
High resolution image (EPS 57 kb)
251_2013_715_Fig7_ESM.gif (104 kb)
Online Resource Fig. 4

Results for human TRBV. The final genomic segment of chromosome 7 containing the TRBV regions is shown. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard TRBV gene annotation defined by the HAVANA project. The output of VgenExtractor is superimposed on this Ensembl figure, showing the locations of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 in the text (GIF 103 kb)

251_2013_715_MOESM4_ESM.eps (63 kb)
High resolution image (EPS 63 kb)
251_2013_715_Fig8_ESM.gif (51 kb)
Online Resource Fig. 5

Results for human TRGV. The final genomic segment of chromosome 7 containing the TRGV regions is shown. The image was obtained from the Ensembl genomic viewer, where we have maintained the standard TRGV gene annotation defined by the HAVANA project. The output of VgenExtractor is superimposed on this Ensembl figure, showing the locations of the exons found by the algorithm and thereby providing a direct comparison. A numerical summary of the results is provided in Table 1 in the text (GIF 50 kb)

251_2013_715_MOESM5_ESM.eps (46 kb)
High resolution image (EPS 45 kb)
251_2013_715_Fig9_ESM.gif (360 kb)
Online Resource Fig. 6

Study of gene segment duplications in the TRAV locus of the rat genome. The upper subfigure represents the human IGKV locus, while the lower subfigure represents the rat TRAV locus. In each subfigure, duplicate gene segments are depicted. The green tracks indicate homology between 85 and 99.9 %. In the IGKV locus, a quasi-duplication process can be seen throughout the entire locus. In contrast, no large duplications can be discerned in the rat TRAV locus (GIF 359 kb)

251_2013_715_MOESM6_ESM.eps (542 kb)
High resolution image (EPS 542 kb)
251_2013_715_MOESM7_ESM.docx (12 kb)
Online Resource Table 1

Sequences found at each locus of the Rattus norvegicus genome (Rnor5.0) by our program VgenExtractor, compared with those found by IMGT. Results are displayed as IMGT/VgenExtractor for functional genes (F), pseudogenes (P), and open reading frames (ORF). Empty entries indicate that annotations are not available (NA) in IMGT (DOCX 11 kb)
251_2013_715_MOESM8_ESM.docx (12 kb)
Online Resource Table 2

Sequences found using VgenExtractor in different species for which partial annotations are available at IMGT. Results are displayed as IMGT/VgenExtractor for functional genes (F), pseudogenes (P), and open reading frames (ORF). Empty entries indicate that no annotations are available in IMGT (DOCX 11 kb)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • David Olivieri (1)
  • Jose Faro (2, 3, 4)
  • Bernardo von Haeften (5)
  • Christian Sánchez-Espinel (6)
  • Francisco Gambón-Deza (7, 3)

  1. School of Computer Engineering, University of Vigo, Ourense, Spain
  2. Immunology, Faculty of Biology, and Biomedical Research Center (CINBIO), University of Vigo, Vigo, Spain
  3. Instituto Biomédico de Vigo, Vigo, Spain
  4. Instituto Gulbenkian de Ciência, Oeiras, Portugal
  5. Area of Immunology, Faculty of Biology, University of Vigo, Vigo, Spain
  6. Nanoimmunotech SL, Pza. Fernando Conde, Vigo, Spain
  7. Servicio Gallego de Salud (SERGAS), Unidad de Inmunología, Hospital do Meixoeiro, Vigo, Spain
