Application of Biomolecular Computing to Medical Science: A Biomolecular Database System for Storage, Processing, and Retrieval of Genetic Information and Material

  • John H. Reif
  • Michael Hauser
  • Michael Pirrung
  • Thomas LaBean
Part of the Topics in Biomedical Engineering International Book Series (ITBE)


A key problem in medical science and genomics is the efficient storage, processing, and retrieval of genetic information and material. This chapter presents an architecture for a Biomolecular Database system that would provide a unique capability in genomics. It completely bypasses the usual transformation from biological material (genomic DNA and transcribed RNA) to digital media, as done in conventional bioinformatics. Instead, biotechnology techniques provide the needed capabilities of a Biomolecular Database system without ever transferring the biological information into digital media. The inputs to the system are DNA obtained from tissues: either genomic DNA or reverse-transcribed cDNA. The input DNA is then tagged with artificially synthesized DNA strands. These “information tags” encode essential information about the individual or cell type from which the DNA was obtained (e.g., identification of the DNA donor, as well as the date of the sample, gender, and date of birth). The resulting Biomolecular Database can hold a vast store of genomic DNA obtained from many individuals (multiple army divisions, etc.). For example, the DNA of a million individuals requires about 6 petabits (6 × 10¹⁵ bits), but owing to the compactness of DNA, a volume the size of a conventional test tube with a few milliliters of solution could contain that entire Biomolecular Database. Known procedures for amplification and reproduction of the resulting Biomolecular Database are discussed. The Biomolecular Database system can retrieve subsets of the stored genetic material specified by associative queries on the tags and/or the attached genomic DNA strands, as well as by logical selection queries on the tags of the database. We describe how these queries can be executed by applying recombinant DNA operations to the Biomolecular Database, which have the effect of selecting the subsets of the database specified by the queries.
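The capacity estimate and the tagging step above can be made concrete with a small in-silico sketch. It assumes roughly 3 × 10⁹ base pairs per (haploid) genome and 2 bits per base, which reproduces the 6 × 10¹⁵-bit figure for a million individuals. The tag layout (`TagFields`, `FIELD_WIDTHS`) and the naive base-4 mapping `BASES` are hypothetical choices for illustration only; they ignore the hybridization constraints (e.g., minimum mismatch distance between words) that real tag sequences would have to satisfy.

```python
from dataclasses import dataclass

BASES = "ACGT"            # naive 2-bit-per-base alphabet (hypothetical mapping)
GENOME_BP = 3 * 10**9     # approximate haploid human genome length in base pairs
BITS_PER_BP = 2           # four possible bases -> log2(4) = 2 bits


def database_bits(individuals: int) -> int:
    """Raw information content of one genome per individual."""
    return individuals * GENOME_BP * BITS_PER_BP


def int_to_dna(value: int, length: int) -> str:
    """Encode a non-negative integer as a fixed-length base-4 DNA word."""
    if not 0 <= value < 4 ** length:
        raise ValueError("value does not fit in the requested word length")
    digits = []
    for _ in range(length):
        value, rem = divmod(value, 4)
        digits.append(BASES[rem])
    return "".join(reversed(digits))


def dna_to_int(word: str) -> int:
    """Inverse of int_to_dna."""
    value = 0
    for base in word:
        value = value * 4 + BASES.index(base)
    return value


@dataclass(frozen=True)
class TagFields:
    donor_id: int      # identifies the DNA donor
    sample_date: int   # e.g. days since an agreed epoch (hypothetical convention)
    gender: int        # 0 or 1 (hypothetical coding)
    birth_year: int


# Hypothetical field widths in nucleotides; each nucleotide carries 2 bits.
FIELD_WIDTHS = {"donor_id": 12, "sample_date": 8, "gender": 1, "birth_year": 6}


def encode_tag(fields: TagFields) -> str:
    """Concatenate the fixed-width DNA words for each field."""
    return "".join(
        int_to_dna(getattr(fields, name), width) for name, width in FIELD_WIDTHS.items()
    )


def decode_tag(tag: str) -> TagFields:
    """Split a tag back into its fields."""
    values, pos = {}, 0
    for name, width in FIELD_WIDTHS.items():
        values[name] = dna_to_int(tag[pos:pos + width])
        pos += width
    return TagFields(**values)


print(database_bits(10**6))  # prints 6000000000000000, i.e. 6 petabits
tag = encode_tag(TagFields(donor_id=123456, sample_date=12000, gender=1, birth_year=1975))
print(tag, len(tag))
```

With these widths a tag is 27 nucleotides long, and a 12-nucleotide donor field can distinguish 4¹² (about 16.7 million) donors, enough for the million-individual example.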
In particular, we describe how to execute these queries on this Biomolecular Database using biomolecular computing (also known as DNA computing) techniques, including the execution of parallel associative search queries on DNA databases and the execution of logical operations via recombinant DNA operations. We also exploit recent biotechnology developments (recombinant DNA technology, DNA hybridization arrays, DNA tagging methods, etc.), which are rapidly increasing in scale (e.g., output via DNA hybridization array technology). The chapter also discusses applications of such a Biomolecular Database system to various medical sciences and genomic processing capabilities, including: (a) rapid identification of subpopulations possessing a specific known genotype, (b) large-scale gene expression profiling using DNA databases, and (c) streamlining the identification of susceptibility genes (high-throughput screening of candidate genes to optimize genetic association analysis for complex diseases). Such a Biomolecular Database system may fundamentally change the way these genomic problems are solved.
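The query execution described above can be modeled abstractly as operations on tubes of strands. The sketch below assumes an idealized extraction step (a strand is captured by an immobilized probe exactly when it contains the probe's reverse complement, with no mismatches or losses) plus the merging of tubes: AND is sequential extraction, OR merges the positives of two extractions, and NOT keeps the rejected tube. This is the generic extract-and-merge model from the DNA computing literature, not necessarily the protocol used in the chapter; the query syntax and the marker sequences `FEMALE`, `MALE`, and `ALLELE` are hypothetical.

```python
from typing import List, Tuple

Tube = List[str]  # a tube modeled as a list of single-stranded sequences
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Watson-Crick reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(tube: Tube, probe: str) -> Tuple[Tube, Tube]:
    """Idealized affinity separation: a strand is captured when it contains
    the reverse complement of the immobilized probe (perfect match only)."""
    target = revcomp(probe)
    hits = [s for s in tube if target in s]
    misses = [s for s in tube if target not in s]
    return hits, misses


def evaluate(query: tuple, tube: Tube) -> Tuple[Tube, Tube]:
    """Partition `tube` into (selected, rejected) using only extract and merge.

    Query syntax (hypothetical): ("has", motif), ("not", q),
    ("and", q1, q2), ("or", q1, q2).
    """
    op = query[0]
    if op == "has":
        # Use a capture probe complementary to the motif being searched for.
        return extract(tube, revcomp(query[1]))
    if op == "not":
        selected, rejected = evaluate(query[1], tube)
        return rejected, selected
    if op == "and":
        sel_a, rej_a = evaluate(query[1], tube)
        sel_b, rej_b = evaluate(query[2], sel_a)  # second extraction on the positives
        return sel_b, rej_a + rej_b               # merge both rejected tubes
    if op == "or":
        sel_a, rej_a = evaluate(query[1], tube)
        sel_b, rej_b = evaluate(query[2], rej_a)  # test only strands not yet selected
        return sel_a + sel_b, rej_b               # merge both selected tubes
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical tag markers and allele motif (illustrative sequences only).
FEMALE, MALE, ALLELE = "AAAAC", "AAAAG", "GTGTGT"
database = [
    FEMALE + "CCGTGTGTCC",  # female donor, payload carries the allele motif
    FEMALE + "CCTTTTTTCC",  # female donor, no allele motif
    MALE + "CCGTGTGTCC",    # male donor, payload carries the allele motif
    MALE + "CCTTTTTTCC",    # male donor, no allele motif
]

females_with_allele, _ = evaluate(("and", ("has", FEMALE), ("has", ALLELE)), database)
print(females_with_allele)  # prints ['AAAACCCGTGTGTCC']
```

Because every operator returns a (selected, rejected) partition, no strand is lost or duplicated in this idealized model; in the laboratory, imperfect extraction efficiency and mismatched hybridization would make each step error-prone.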






Copyright information

© Springer Inc. 2006

Authors and Affiliations

  • John H. Reif (1)
  • Michael Hauser (1)
  • Michael Pirrung (1)
  • Thomas LaBean (1)
  1. Departments of Computer Science, Ophthalmology, and Chemistry, Duke University, Durham
