An automated algorithm for extracting functional immunologic V-genes from genomes in jawed vertebrates
Variable (V) domains of immunoglobulins (Ig) and T cell receptors (TCR) are generated from genomic V gene segments (V-genes). At present, such V-genes have been annotated in the genomes of only a few species. We have developed a bioinformatics tool that accelerates the identification of functional V-genes in genome datasets. Automated recognition relies on detecting key V-gene signatures, such as recombination signal sequences, the size of the exon region, and the position of amino acid motifs within the translated exon. The algorithm also classifies extracted V-genes as belonging to either TCR or Ig loci. We describe the implementation of the algorithm and validate its accuracy by comparing V-genes identified in the human and mouse genomes with known V-gene annotations available in public repositories. The utility of the algorithm is illustrated by using it to identify functional V-genes in the rat genome, where V-gene annotation is still incomplete. This allowed us to perform a comparative human–rodent phylogenetic analysis based on V-genes, which supports the hypothesis that distinct evolutionary pressures shape the TCR and Ig V-gene repertoires. Our program, together with a graphical user interface, is available as open-source software, downloadable at http://code.google.com/p/vgenextract/.
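To make the signature-based recognition described in the abstract more concrete, the following is a minimal sketch and not the published implementation. It assumes the canonical recombination signal sequence heptamer (CACAGTG) lies immediately 3' of the exon, and it tests the upstream region for an open reading frame of plausible size containing a Cys ... Trp ... Cys pattern. The length and spacing thresholds, as well as the function names `translate`, `has_v_motifs`, and `find_candidates`, are illustrative assumptions. The spacer and nonamer of the signal sequence, the reverse strand, and the TCR/Ig classification step are not modelled.

```python
import re
from itertools import product

# Illustrative thresholds only: assumptions, not the values used by the authors.
HEPTAMER = "CACAGTG"               # canonical RSS heptamer consensus
MIN_EXON, MAX_EXON = 270, 360      # assumed plausible V-exon length range (bp)
MIN_CYS_GAP, MAX_CYS_GAP = 60, 90  # assumed spacing (residues) between conserved Cys

# Standard genetic code, built in TCAG order.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product("TCAG", repeat=3), _AMINO)}


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X', a partial last codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def has_v_motifs(protein):
    """Return True if two Cys at the assumed spacing enclose at least one Trp."""
    for match in re.finditer("C", protein):
        first = match.start()
        last_allowed = min(first + MAX_CYS_GAP, len(protein) - 1)
        for second in range(first + MIN_CYS_GAP, last_allowed + 1):
            if protein[second] == "C" and "W" in protein[first + 1:second]:
                return True
    return False


def find_candidates(genome):
    """Return (heptamer_position, translated_exon) for each forward-strand candidate."""
    genome = genome.upper()
    hits = []
    for match in re.finditer(HEPTAMER, genome):
        rss = match.start()
        window = genome[max(0, rss - MAX_EXON):rss]
        for frame in range(3):
            protein = translate(window[frame:])
            orf = protein.rsplit("*", 1)[-1]  # stop-free stretch nearest the heptamer
            if len(orf) * 3 >= MIN_EXON and has_v_motifs(orf):
                hits.append((rss, orf))
                break
    return hits


# Synthetic toy input (not real genomic data): a 100-codon stop-free exon
# made of Ala filler with Cys, Trp, Cys, followed by the heptamer.
toy_protein = "A" * 20 + "C" + "A" * 10 + "W" + "A" * 59 + "C" + "A" * 8
toy_dna = "".join({"A": "GCT", "C": "TGT", "W": "TGG"}[aa] for aa in toy_protein)
hits = find_candidates(toy_dna + HEPTAMER + "N" * 30)
print(len(hits), hits[0][0])  # 1 300
```

In this sketch a candidate is reported only when all three signatures agree, which is the general idea the abstract describes. A practical tool would additionally need to tolerate degenerate signal sequences and scan both strands.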
Keywords: Immunoglobulins, TCR, Genes, Immunoinformatics
This work was partially supported by the European Union 7th Framework Programme [FP7/REGPOT-2012-2013.1] under grant agreement no. 316265, BIOCAPS. JF acknowledges the support of PIRSES-GA-2008-230665 (7th FP, EC).