
Personal Genomes: A New Frontier in Database Research

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7108)

Abstract

Owing to recent technological improvements in next-generation sequencers, sequencing the genomes of individuals has become common in biological and medical research. The amount of data produced by next-generation sequencers is enormous: the DNA of more than 10,000 people has been sequenced worldwide, and terabytes of data are produced daily. The types of genome information also vary according to the biological experiments used to prepare the DNA samples. Biologists and medical scientists now face the task of managing these huge volumes of data of many types. Existing DBMSs, which mainly target business applications, are not suited to managing such biological data, because loading such large data into a DBMS is time-consuming and because current database queries cannot accommodate the various bioinformatics tools written in various programming languages. Processing bioinformatics workflows in a parallel and distributed manner is also a challenging problem. In this paper, in the hope of recruiting database researchers into this rapidly progressing area of biological and medical research, we introduce several challenges in genome informatics from the viewpoint of using existing DBMSs to process next-generation sequencer data.




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Saito, T.L. (2011). Personal Genomes: A New Frontier in Database Research. In: Kikuchi, S., Madaan, A., Sachdeva, S., Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2011. Lecture Notes in Computer Science, vol 7108. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25731-5_8


  • DOI: https://doi.org/10.1007/978-3-642-25731-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25730-8

  • Online ISBN: 978-3-642-25731-5

  • eBook Packages: Computer Science, Computer Science (R0)
