Protocol

Data Mining for Systems Biology

Volume 939 of the series Methods in Molecular Biology pp 253-261

Viral Genome Analysis and Knowledge Management

  • Carla Kuiken (corresponding author), Los Alamos National Laboratory, Theoretical Biology and Biophysics (MS K710)
  • Hyejin Yoon, Los Alamos National Laboratory, Theoretical Biology and Biophysics (MS K710)
  • Werner Abfalterer, Los Alamos National Laboratory, Theoretical Biology and Biophysics (MS K710)
  • Brian Gaschen, Los Alamos National Laboratory, Theoretical Biology and Biophysics (MS K710)
  • Chienchi Lo, Los Alamos National Laboratory, Theoretical Biology and Biophysics (MS K710)
  • Bette Korber, Los Alamos National Laboratory, Theoretical Biology and Biophysics (MS K710)

Abstract

One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of methods and interfaces. The HIV database and its offshoots, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that yields progressively better alignments. Epidemiological and clinical annotation is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based on user requests. Vital to the databases' success is that the staff are themselves heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that keeps these heavily used analysis platforms alive and vital after nearly 25 years of use.
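The abstract mentions that gene and protein sequences are distilled from GenBank records using submitter annotation, with quality control along the way. As a rough illustration only, the Python sketch below extracts a coding sequence from a genome using a simple GenBank-style feature location (1-based, inclusive coordinates), translates it with the standard genetic code (NCBI table 1), and flags a few common annotation problems. The function names (`extract_cds`, `translate`, `qc_issues`) and the particular checks are hypothetical and are not the pipeline used by the LANL databases; the parser handles only plain ranges, `join(...)`, and an outer `complement(...)`.

```python
import re
from typing import List, Optional

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Simplification: only A/C/G/T/N are complemented; other IUPAC codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_cds(genome: str, location: str) -> str:
    """Return the nucleotides described by a simple GenBank-style location string."""
    location = location.replace(" ", "")
    minus_strand = location.startswith("complement(") and location.endswith(")")
    if minus_strand:
        location = location[len("complement("):-1]
    if location.startswith("join(") and location.endswith(")"):
        location = location[len("join("):-1]
    pieces = []
    for part in location.split(","):
        # Accept "start..end", allowing partial-feature markers "<" and ">".
        m = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", part)
        if m is None:
            raise ValueError(f"unsupported location segment: {part!r}")
        start, end = int(m.group(1)), int(m.group(2))
        pieces.append(genome[start - 1:end])  # convert 1-based inclusive to a Python slice
    seq = "".join(pieces).upper()
    return reverse_complement(seq) if minus_strand else seq


def translate(cds: str) -> str:
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def qc_issues(cds: str, submitted: Optional[str] = None) -> List[str]:
    """List simple problems with a coding sequence and, optionally, its submitted translation."""
    issues = []
    if len(cds) % 3 != 0:
        issues.append("length not a multiple of 3")
    protein = translate(cds)
    if "*" in protein.rstrip("*"):  # a stop codon before the final position
        issues.append("internal stop codon")
    if submitted is not None and protein.rstrip("*") != submitted:
        issues.append("translation differs from submitter annotation")
    return issues


if __name__ == "__main__":
    cds = extract_cds("ATGAACCCATTTTGA", "join(1..5,9..15)")
    print(cds, translate(cds), qc_issues(cds, "MKF"))
```

Run as a script, this prints the joined sequence `ATGAAATTTTGA`, its translation `MKF*`, and an empty issue list. Comparing a script-generated translation against the submitter's is one plausible way to combine the two sources the abstract mentions; a production pipeline would also need full location grammar (e.g. `order(...)`, remote accessions) and alternative genetic codes.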

The database/analysis platforms described in this chapter can be accessed at

http://hiv.lanl.gov

http://hcv.lanl.gov

http://hfv.lanl.gov

Key words

RNA virus, Alignment, Database, Ontology, Taxonomy, Annotation