Database exploration in search of regularities

Żytkow, Jan M.; Zembowicz, Robert

doi:10.1007/BF01066546

Published: March 1993

Volume 2, pages 39–81, (1993)

Journal of Intelligent Information Systems

Abstract

Large databases can be a source of useful knowledge. Yet this knowledge is implicit in the data. It must be mined and expressed in a concise, useful form of statistical patterns, equations, rules, conceptual hierarchies, and the like. Automation of knowledge discovery is important because databases are growing in size and number, and standard data analysis techniques are not designed for exploration of huge hypothesis spaces. We concentrate on discovery of regularities, defining a regularity by a pattern and the range in which that pattern holds. We argue that two types of patterns are particularly important: contingency tables and equations. We present Forty-Niner (49er), a general-purpose database mining system which conducts a large-scale search for those patterns in many subsets of data, conducting a more costly search for equations only when the data indicate a functional relationship. 49er can refine the initial regularities to yield stronger and more general regularities and more useful concepts. 49er combines several searches, each contributing to a different aspect of a regularity. Correspondence between the components of search and the structure of regularities makes the system easy to understand, use, and expand. Finally, we discuss 49er's performance in four categories of tests: (1) open exploration of new databases; (2) reproduction of human findings (limited because databases that have been extensively explored are very rare); (3) hide-and-seek testing on artificially created data, to evaluate 49er on a large scale against known results; (4) exploration of randomly generated databases.
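
The two-stage idea in the abstract (a cheap contingency-table search first, a costlier equation search only when the table suggests a functional relationship) can be illustrated with a minimal Python sketch. This is not 49er's actual implementation: the function names, the Pearson chi-square statistic, and the 0.9 dominance threshold are all assumptions made for illustration only.

```python
from collections import Counter


def contingency_table(rows, x, y):
    """Count co-occurrences of the values of attributes x and y.

    `rows` is a list of dicts (a hypothetical record format).
    Returns the sorted x values, the sorted y values, and the count matrix.
    """
    counts = Counter((r[x], r[y]) for r in rows)
    xs = sorted({k[0] for k in counts})
    ys = sorted({k[1] for k in counts})
    table = [[counts.get((a, b), 0) for b in ys] for a in xs]
    return xs, ys, table


def chi_square(table):
    """Pearson chi-square statistic against the independence hypothesis.

    Assumes every row and column total is positive, which holds for
    tables built by contingency_table().
    """
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def looks_functional(table, dominance=0.9):
    """Assumed heuristic: y is approximately a function of x when, for
    each x value (row), a single y value holds most of the counts."""
    return all(max(row) >= dominance * sum(row) for row in table if sum(row) > 0)


def worth_equation_search(rows, x, y, dominance=0.9):
    """Gate the costly equation search on the cheap table check."""
    _, _, table = contingency_table(rows, x, y)
    return looks_functional(table, dominance)
```

In a sketch like this, the table pass costs one scan of the data per attribute pair, so it can be run over many data subsets, while equation fitting is reserved for the pairs that pass the gate.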



Author information

Authors and Affiliations

Department of Computer Science, Wichita State University, Wichita, KS 67208
Jan M. Żytkow & Robert Zembowicz


About this article

Cite this article

Żytkow, J.M., Zembowicz, R. Database exploration in search of regularities. J Intell Inf Syst 2, 39–81 (1993). https://doi.org/10.1007/BF01066546
