DEXTER: A System that Experiments with Choices of Training Data Using Expert Knowledge in the Domain of DNA Hydration

Cohen, Dawn M.; Kulikowski, Casimir; Berman, Helen

doi:10.1023/A:1022669731459

Published: October 1995

Machine Learning, Volume 21, pages 81–101 (1995)

Dawn M. Cohen¹,
Casimir Kulikowski² &
Helen Berman³

Abstract

In this paper, we describe DEXTER, a system that uses knowledge to suggest inductive learning experiments in the domain of DNA hydration pattern prediction. These experiments vary the training data presented to a classifier learner. Such experiments are necessary in this domain because, as in many other scientific domains, the data are noisy, the relevance of particular attributes is not well established, and the number of training cases is limited. In each experiment, DEXTER chooses a set of training cases, attributes, and classes to learn. To generate an experiment, it examines the results of previous experiments and uses domain knowledge and domain-independent heuristics to select and modify a previous experiment. For the domain expert interested in using the induced rules to understand data, DEXTER's explicit use of knowledge provides several advantages that other data selection techniques do not. In particular, the variation among classifiers induced in different experiments yields insights into the roles and interactions of particular attributes in determining hydration. In addition, many of the classifiers induced from DEXTER's choices of data are at least as accurate as those induced using the entire set of available data or data chosen by several other techniques. This work is of theoretical and pragmatic importance to molecular biophysicists. The learned hydration predictors provide insights about factors influencing DNA hydration. The hydration predictors could also lead to a tool for automatically predicting water positions around DNA molecules for which crystallographic data are not available.
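
The experiment loop outlined in the abstract can be illustrated with a minimal Python sketch. This is not DEXTER's actual implementation: the names `Experiment`, `propose_next`, and `run_experiments` are hypothetical, the "select the best previous experiment and drop one attribute" rule is a stand-in for the paper's heuristics, and the `protected` set is only a placeholder for expert domain knowledge. The `evaluate` function is assumed to train a classifier on the chosen data and return an accuracy estimate.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Experiment:
    """One choice of training data: which cases, attributes, and classes to learn."""
    cases: FrozenSet[int]
    attributes: FrozenSet[str]
    classes: FrozenSet[str]


History = List[Tuple[Experiment, float]]


def propose_next(history: History, protected: FrozenSet[str]) -> Optional[Experiment]:
    """Select a previous experiment and modify it (illustrative heuristic only).

    Earlier experiments are revisited from highest to lowest score, and one
    unprotected attribute is dropped. `protected` stands in for domain
    knowledge: attributes an expert wants kept. Returns None when every
    such modification has already been tried.
    """
    tried = {exp for exp, _ in history}
    for base, _ in sorted(history, key=lambda h: h[1], reverse=True):
        for attr in sorted(base.attributes - protected):
            candidate = Experiment(base.cases, base.attributes - {attr}, base.classes)
            if candidate.attributes and candidate not in tried:
                return candidate
    return None


def run_experiments(
    initial: Experiment,
    evaluate: Callable[[Experiment], float],
    protected: FrozenSet[str],
    budget: int,
) -> History:
    """Run up to `budget` follow-up experiments, recording each result."""
    history: History = [(initial, evaluate(initial))]
    for _ in range(budget):
        nxt = propose_next(history, protected)
        if nxt is None:
            break
        history.append((nxt, evaluate(nxt)))
    return history
```

Keeping the full history, rather than only the best experiment, mirrors the abstract's point that the variation among classifiers across experiments is itself informative to the domain expert.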

Author information

Authors and Affiliations

Keck Center for Computational Biology, University of Pittsburgh, Pittsburgh, PA, 15260
Dawn M. Cohen
Department of Computer Science, Rutgers University, Piscataway, NJ, 08855
Casimir Kulikowski
Department of Chemistry, Rutgers University, Piscataway, NJ, 08855
Helen Berman

About this article

Cite this article

Cohen, D.M., Kulikowski, C. & Berman, H. DEXTER: A System that Experiments with Choices of Training Data Using Expert Knowledge in the Domain of DNA Hydration. Machine Learning 21, 81–101 (1995). https://doi.org/10.1023/A:1022669731459

Issue Date: October 1995
DOI: https://doi.org/10.1023/A:1022669731459
