Abstract
Hypothesis-formation problems occur when the outcome of an experiment as predicted by a scientific theory does not match the outcome observed by a scientist. The problem is to modify the theory, and/or the scientist's conception of the initial conditions of the experiment, such that the prediction agrees with the observation. I treat hypothesis formation as a design problem. A program called Hypgene designs hypotheses by reasoning backward from its goal of eliminating the difference between prediction and observation. This prediction error is eliminated by design operators that are applied by a planning system. The synthetic, goal-directed application of these operators should prove more efficient than past generate-and-test approaches to hypothesis generation. Hypgene uses heuristic search to guide a generator that is focused on the errors in a prediction. The advantages of the design approach to hypothesis formation over the generate-and-test approach are analogous to the advantages of dependency-directed backtracking over chronological backtracking. These hypothesis-formation methods were developed in the context of a historical study of a scientific research program in molecular biology. This article describes in detail the results of applying the Hypgene program to several hypothesis-formation problems identified in this historical study. Hypgene found most of the same solutions as did the biologists, which demonstrates that it is capable of solving complex, real-world hypothesis-formation problems.
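The contrast between generate-and-test and error-focused design can be illustrated with a toy sketch. It is not Hypgene's representation or its design operators: the rule format, the two modification kinds (`drop-condition`, `drop-rule`), and all function names are assumptions made for illustration, and the sketch handles only errors in which an object is predicted to be present but observed to be absent. The blind generator tests every single modification, while the design-style generator reasons backward from the erroneous object and proposes only modifications that lie on its derivation.

```python
# Toy theory: each rule maps a set of reactants to one product.
# All names and data below are hypothetical illustrations.
RULES = [
    (frozenset({"a", "b"}), "c"),
    (frozenset({"c", "d"}), "e"),
    (frozenset({"f", "g"}), "h"),
]
INITIAL = frozenset({"a", "b", "d", "f", "g"})   # assumed initial conditions
OBSERVED = {"e": False, "h": True}               # e was predicted but not seen


def predict(rules, initial):
    """Forward-chain the rules from the initial conditions to a fixed point."""
    present = set(initial)
    changed = True
    while changed:
        changed = False
        for reactants, product in rules:
            if reactants <= present and product not in present:
                present.add(product)
                changed = True
    return present


def apply_mod(mod, rules, initial):
    """Apply one modification to the theory or to the initial conditions."""
    kind, target = mod
    if kind == "drop-condition":
        return rules, initial - {target}
    return [r for i, r in enumerate(rules) if i != target], initial


def consistent(rules, initial, observed):
    """True if the prediction agrees with every observed presence/absence."""
    present = predict(rules, initial)
    return all((obj in present) == seen for obj, seen in observed.items())


def verify(candidates, rules, initial, observed):
    """Return the candidates that remove the error, and how many were tested."""
    solutions = [m for m in candidates
                 if consistent(*apply_mod(m, rules, initial), observed)]
    return sorted(solutions), len(candidates)


def generate_and_test(rules, initial, observed):
    """Blind generator: try every single modification."""
    candidates = [("drop-condition", o) for o in sorted(initial)]
    candidates += [("drop-rule", i) for i in range(len(rules))]
    return verify(candidates, rules, initial, observed)


def design(rules, initial, observed):
    """Error-focused generator: reason backward from each wrongly predicted object."""
    present = predict(rules, initial)
    proposals, visited = set(), set()

    def eliminate(obj):
        if obj in visited:
            return
        visited.add(obj)
        if obj in initial:
            proposals.add(("drop-condition", obj))
        for i, (reactants, product) in enumerate(rules):
            if product == obj and reactants <= present:
                proposals.add(("drop-rule", i))
                for reactant in reactants:
                    eliminate(reactant)

    for obj, seen in observed.items():
        if not seen and obj in present:
            eliminate(obj)
    return verify(sorted(proposals), rules, initial, observed)


if __name__ == "__main__":
    print(generate_and_test(RULES, INITIAL, OBSERVED))
    print(design(RULES, INITIAL, OBSERVED))
```

In this toy case both generators return the same set of consistent hypotheses, but the design-style generator never proposes modifications to the unrelated rule producing `h` or to its reactants, so it tests fewer candidates. The sketch says nothing about how Hypgene's heuristic search ranks or selects operators.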
Cite this article
Karp, P.D. Design methods for scientific hypothesis formation and their application to molecular biology. Mach Learn 12, 89–116 (1993). https://doi.org/10.1007/BF00993062
DOI: https://doi.org/10.1007/BF00993062