Accurate Restoration of DNA Sequences

  • Gary A. Churchill
Part of the Lecture Notes in Statistics book series (LNS, volume 105)

Keywords

Error Rate, Markov Chain, Hidden Markov Model, Posterior Distribution, Gibbs Sampler
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag New York, Inc. 1995

Authors and Affiliations

  • Gary A. Churchill, Cornell University, USA
