Personal Genomes: A New Frontier in Database Research

Saito, Taro L.

doi:10.1007/978-3-642-25731-5_8

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7108)

Abstract

Owing to recent technological improvements in next-generation sequencers, reading the genome sequences of individuals has become common in biological and medical research. The amount of data these sequencers produce is enormous: today, the DNA of more than 10,000 people has been sequenced worldwide, and terabytes of data are produced daily. The types of genome information also vary with the biological experiments used to prepare the DNA samples. Biologists and medical scientists must now manage these huge volumes of data of many different types. Existing DBMSs, which mainly target business applications, are poorly suited to managing such biological data, because loading data of this size into a DBMS is time-consuming and because current database queries cannot accommodate the variety of bioinformatics tools written in various programming languages. Processing bioinformatics workflows in a parallel and distributed manner is also a challenging problem. In this paper, in the hope of recruiting database researchers into this rapidly progressing area of biological and medical research, we introduce several challenges in genome informatics from the viewpoint of using existing DBMSs to process next-generation sequencer data.

Author information

Authors and Affiliations

Department of Computational Biology, The University of Tokyo, Japan
Taro L. Saito

Editor information

Editors and Affiliations

University of Aizu, Japan
Shinji Kikuchi, Aastha Madaan, Shelly Sachdeva & Subhash Bhalla

About this paper

Cite this paper

Saito, T.L. (2011). Personal Genomes: A New Frontier in Database Research. In: Kikuchi, S., Madaan, A., Sachdeva, S., Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2011. Lecture Notes in Computer Science, vol 7108. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25731-5_8

DOI: https://doi.org/10.1007/978-3-642-25731-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25730-8
Online ISBN: 978-3-642-25731-5
eBook Packages: Computer Science, Computer Science (R0)
