Application of Biomolecular Computing to Medical Science: A Biomolecular Database System for Storage, Processing, and Retrieval of Genetic Information and Material

Reif, John H.; Hauser, Michael; Pirrung, Michael; LaBean, Thomas

doi:10.1007/978-0-387-33532-2_31

Part of the book series: Topics in Biomedical Engineering International Book Series (ITBE)

Abstract

A key problem in medical science and genomics is the efficient storage, processing, and retrieval of genetic information and material. This chapter presents an architecture for a Biomolecular Database system that would provide a unique capability in genomics. It completely bypasses the usual transformation from biological material (genomic DNA and transcribed RNA) to digital media, as done in conventional bioinformatics. Instead, biotechnology techniques provide the needed capability of a Biomolecular Database system without ever transferring the biological information into digital media. The inputs to the system are DNA obtained from tissues: either genomic DNA or reverse-transcribed cDNA. The input DNA is then tagged with artificially synthesized DNA strands. These “information tags” encode essential information (e.g., identification of the DNA donor, as well as the date of the sample, gender, and date of birth) about the individual or cell type from which the DNA was obtained. The resulting Biomolecular Database is capable of containing a vast store of genomic DNA obtained from many individuals (for example, multiple army divisions). For example, the DNA of a million individuals requires about 6 petabits (6 × 10¹⁵ bits), but, owing to the compactness of DNA, a volume the size of a conventional test tube holding a few milliliters of solution could contain that entire Biomolecular Database. Known procedures for amplification and reproduction of the resulting Biomolecular Database are discussed. The Biomolecular Database system is capable of retrieving subsets of the stored genetic material, which are specified by associative queries on the tags and/or the attached genomic DNA strands, as well as by logical selection queries on the tags of the database. We describe how these queries can be executed by applying recombinant DNA operations to the Biomolecular Database, which have the effect of selecting the subsets of the database specified by the queries.
In particular, we describe how to execute these queries on this Biomolecular Database by the use of biomolecular computing (also known as DNA computing) techniques, including execution of parallel associative search queries on DNA databases, and the execution of logical operations using recombinant DNA operations. We also utilize recent biotechnology developments (recombinant DNA technology, DNA hybridization arrays, DNA tagging methods, etc.), which are quickly being enhanced in scale (e.g., output via DNA hybridization array technology). The chapter also discusses applications of such a Biomolecular Database system to various medical sciences and genomic processing capabilities, including: (a) rapid identification of subpopulations possessing a specific known genotype, (b) large-scale gene expression profiling using DNA databases, and (c) streamlining identification of susceptibility genes (high-throughput screening of candidate genes to optimize genetic association analysis for complex diseases). Such a Biomolecular Database system may provide a revolutionary change in the way that these genomic problems are solved.
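The storage estimate and the query semantics described in the abstract can be illustrated with a small in-silico sketch. This is a toy software model under stated assumptions, not the chapter's wet-lab protocol: the class `TaggedStrand`, the functions `database_bits`, `select_by_tag`, and `select_by_sequence`, the tag fields, and the example sequences are all hypothetical, and plain substring matching stands in for hybridization to a complementary probe. The figure of about 3 × 10⁹ bases per individual is an assumed round value that reproduces the abstract's estimate of 6 × 10¹⁵ bits at 2 bits per base.

```python
from dataclasses import dataclass

BITS_PER_BASE = 2                 # four possible nucleotides (A, C, G, T) -> 2 bits per base
BASES_PER_INDIVIDUAL = 3 * 10**9  # assumption: round figure that reproduces the abstract's estimate


def database_bits(individuals: int) -> int:
    """Information content, in bits, of the genomic DNA of `individuals` donors."""
    return individuals * BASES_PER_INDIVIDUAL * BITS_PER_BASE


@dataclass
class TaggedStrand:
    """One database element: an information tag attached to a DNA fragment."""
    tag: dict       # hypothetical fields, e.g. {"donor": "D1", "gender": "F"}
    sequence: str   # genomic DNA or cDNA fragment


def select_by_tag(db, **criteria):
    """Logical selection: keep strands whose tag matches every given field (AND)."""
    return [s for s in db if all(s.tag.get(k) == v for k, v in criteria.items())]


def select_by_sequence(db, probe):
    """Associative query: keep strands whose attached DNA contains the probe.

    Substring matching is a simplification of hybridization-based capture.
    """
    return [s for s in db if probe in s.sequence]


# Hypothetical three-element database.
db = [
    TaggedStrand({"donor": "D1", "gender": "F"}, "ACGTTGCAAC"),
    TaggedStrand({"donor": "D2", "gender": "M"}, "TTGACCGTAA"),
    TaggedStrand({"donor": "D3", "gender": "F"}, "GGCATTGACC"),
]

# AND of two conditions: chain the selections, one separation step after another.
female_with_motif = select_by_sequence(select_by_tag(db, gender="F"), "TTGACC")

print(database_bits(1_000_000))                     # 6000000000000000, i.e. 6 x 10^15 bits
print([s.tag["donor"] for s in female_with_motif])  # ['D3']
```

In this model a conjunction is a chain of selections, which mirrors the idea of applying successive selection steps to the same pool; a disjunction would correspond to pooling the results of separate selections.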

Author information

Authors and Affiliations

Departments of Computer Science, Ophthalmology, and Chemistry, Duke University, Durham, North Carolina
John H. Reif, Michael Hauser, Michael Pirrung & Thomas LaBean

Corresponding author

Correspondence to John H. Reif.

Editor information

Editors and Affiliations

Harvard-MIT (HST) Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, East Bldg. 149, 13th Street, Charlestown, MA, 02129
Thomas S. Deisboeck M.D. (Assistant Professor of Radiology (HMS, MGH, HST) and Director, Complex Biosystems Modeling Laboratory)
Dept. of Cardiothoracic Surgery, Drexel Univ. College of Medicine, 215 N. 15th Street, MS# 111, Philadelphia, PA, 19102-1192
J. Yasha Kresh Ph.D., F.A.C.C. (Professor, Research Director, Professor of Medicine and Director)
Cardiovascular Biophysics, Drexel Univ. College of Medicine, 215 N. 15th Street, MS# 111, Philadelphia, PA, 19102-1192
J. Yasha Kresh Ph.D., F.A.C.C. (Professor, Research Director, Professor of Medicine and Director)

About this chapter

Cite this chapter

Reif, J.H., Hauser, M., Pirrung, M., LaBean, T. (2006). Application of Biomolecular Computing to Medical Science: A Biomolecular Database System for Storage, Processing, and Retrieval of Genetic Information and Material. In: Deisboeck, T.S., Kresh, J.Y. (eds) Complex Systems Science in Biomedicine. Topics in Biomedical Engineering International Book Series. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-33532-2_31

DOI: https://doi.org/10.1007/978-0-387-33532-2_31
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30241-6
Online ISBN: 978-0-387-33532-2
eBook Packages: Biomedical and Life Sciences (R0)
