
Database exploration in search of regularities

Journal of Intelligent Information Systems

Abstract

Large databases can be a source of useful knowledge, yet this knowledge is implicit in the data. It must be mined and expressed in a concise, useful form: statistical patterns, equations, rules, conceptual hierarchies, and the like. Automation of knowledge discovery is important because databases are growing in size and number, and standard data analysis techniques are not designed for exploring huge hypothesis spaces. We concentrate on the discovery of regularities, defining a regularity by a pattern and the range in which that pattern holds. We argue that two types of patterns are particularly important: contingency tables and equations. We present Forty-Niner (49er), a general-purpose database mining system that conducts a large-scale search for those patterns in many subsets of the data, undertaking the more costly search for equations only when the data indicate a functional relationship. 49er can refine the initial regularities to yield stronger and more general regularities and more useful concepts. 49er combines several searches, each contributing to a different aspect of a regularity. The correspondence between the components of search and the structure of regularities makes the system easy to understand, use, and expand. Finally, we discuss 49er's performance in four categories of tests: (1) open exploration of new databases; (2) reproduction of human findings (limited because extensively explored databases are very rare); (3) hide-and-seek testing on artificially created data, to evaluate 49er on a large scale against known results; and (4) exploration of randomly generated databases.
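
The two-stage idea in the abstract (a cheap contingency-table pass first, an equation search only when the table suggests a functional relationship) can be illustrated with a minimal Python sketch. This is not 49er's implementation: the function names, the Pearson chi-square statistic as the strength measure, and the `tolerance` threshold used to decide that a relationship "looks functional" are all assumptions made for illustration.

```python
from collections import Counter


def contingency_table(xs, ys):
    """Count how often each (x, y) value pair occurs in the records."""
    return Counter(zip(xs, ys))


def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    total = sum(table.values())
    row, col = Counter(), Counter()
    for (x, y), n in table.items():
        row[x] += n
        col[y] += n
    stat = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / total
            observed = table.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def looks_functional(table, tolerance=0.9):
    """Hypothetical gate: for every x, does a single y value account for
    at least `tolerance` of the records with that x?"""
    counts_by_x = {}
    for (x, _y), n in table.items():
        counts_by_x.setdefault(x, []).append(n)
    return all(max(ns) / sum(ns) >= tolerance for ns in counts_by_x.values())


def explore(xs, ys):
    """Cheap table pass first; flag the pair for the costlier equation
    search only if the table suggests y is a function of x."""
    table = contingency_table(xs, ys)
    return {
        "chi_square": chi_square(table),
        "try_equations": looks_functional(table),
    }
```

For example, `explore([1, 1, 2, 2, 3, 3], [2, 2, 4, 4, 6, 6])` flags the pair for an equation search, while `explore([0, 0, 1, 1], [0, 1, 0, 1])` does not. Running the same pass on many subsets of the records would correspond to searching for the range in which a pattern holds.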




Cite this article

Żytkow, J.M., Zembowicz, R. Database exploration in search of regularities. J Intell Inf Syst 2, 39–81 (1993). https://doi.org/10.1007/BF01066546


  • DOI: https://doi.org/10.1007/BF01066546
