Chapter

Databases in Networked Information Systems

Volume 7108 of the series Lecture Notes in Computer Science, pp. 78-88

Personal Genomes: A New Frontier in Database Research

  • Taro L. Saito, Department of Computational Biology, The University of Tokyo

Abstract

Owing to recent technological improvements in next-generation sequencers, reading the genome sequences of individuals has become common in biological and medical research. The amount of data produced by next-generation sequencers is enormous: today, the DNA of more than 10,000 people has been sequenced worldwide, and terabytes of data are produced on a daily basis. The types of genome information also vary according to the biological experiments used to prepare the DNA samples. Biologists and medical scientists now face the task of managing these huge volumes of data of various types. Existing DBMSs, which mainly target business applications, are not well suited to managing such biological data, because loading such large data into a DBMS is time-consuming and current database queries cannot accommodate the variety of bioinformatics tools written in various programming languages. Processing bioinformatics workflows in a parallel and distributed manner is also a challenging problem. In this paper, in the hope of recruiting database researchers into this rapidly progressing area of biological and medical research, we introduce several challenges in genome informatics from the viewpoint of using existing DBMSs to process next-generation sequencer data.

Keywords

Personal genomes, bioinformatics, parallel computing, workflow management