Gemina: A Web-Based Epidemiology and Genomic Metadata System Designed to Identify Infectious Agents
The Gemina system (http://gemina.tigr.org), developed at TIGR, is a tool for identifying microbial and viral pathogens and their associated genomic sequences based on the associated epidemiological data. Gemina has been designed to identify epidemiological factors of disease incidence and to support the design of DNA-based diagnostics, such as the development of DNA signature-based assays. The Gemina database contains the full complement of microbial and viral pathogens enumerated in the Microbial Rosetta Stone database (MRS). Initial curation efforts in Gemina have focused on the NIAID category A, B, and C priority pathogens, identified to the level of strains. For the bacterial NIAID category A-C pathogens, for example, we have included 38 species and 769 strains in Gemina. Representative genomic sequences are selected for each pathogen from NCBI’s GenBank by a three-tiered filtering system and incorporated into TIGR’s Panda DNA sequence database. A single representative sequence is selected for each pathogen, preferring complete genome sequences (Tier 1), then whole genome shotgun (WGS) data from genome projects (Tier 2), and finally genomic nucleotide sequences from genome projects (Tier 3). The list of selected accessions is transferred to Insignia when new pathogens are added to Gemina, allowing Insignia’s Signature Pipeline to be run for each pathogen identified in a Gemina query.
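The three-tiered selection described above can be illustrated with a minimal Python sketch. It is not Gemina's implementation: the `Candidate` record, the `kind` labels, and the `select_representative` function are hypothetical names introduced here, and the rule that the first candidate seen wins within a tier is an assumption, since the abstract does not say how ties are broken.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical labels for the three sequence categories named in the abstract.
# Lower tier number means higher preference.
TIER_BY_KIND = {
    "complete_genome": 1,      # Tier 1: complete genome sequences
    "wgs": 2,                  # Tier 2: whole genome shotgun data
    "genomic_nucleotide": 3,   # Tier 3: genomic nucleotide sequences
}


@dataclass(frozen=True)
class Candidate:
    """A candidate sequence record for one pathogen (illustrative only)."""
    accession: str
    kind: str


def select_representative(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the candidate in the best available tier, or None if none qualify.

    Assumption: within a tier, the first candidate encountered is kept.
    """
    best: Optional[Candidate] = None
    best_tier: Optional[int] = None
    for candidate in candidates:
        tier = TIER_BY_KIND.get(candidate.kind)
        if tier is None:
            continue  # not one of the three recognised categories
        if best_tier is None or tier < best_tier:
            best, best_tier = candidate, tier
    return best
```

Given one WGS record and one complete genome record for the same pathogen, the function returns the complete genome record. If only WGS and genomic nucleotide records exist, it returns the WGS record.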
Keywords: Infection System; Viral Pathogen; Control Vocabulary; Whole Genome Shotgun; Transmission Method
- 3. Insignia, http://insignia.cbcb.umd.edu/
- 4. Smith, B., et al.: Relations in biomedical ontologies. Genome Biol., R46 (2005)
- 5. National Center for Biotechnology Information (NCBI) Taxonomy, http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/