Detecting Complex Dependencies in Categorical Data

  • Tim Oates
  • Matthew D. Schmill
  • Dawn E. Gregory
  • Paul R. Cohen
Part of the Lecture Notes in Statistics book series (LNS, volume 112)

Abstract

Locating and evaluating relationships among values in multiple streams of data is a difficult and important task. Consider the data flowing from monitors in an intensive care unit: readings from various subsets of the monitors are indicative and predictive of certain aspects of the patient’s state. We present Multi-stream Dependency Detection (MSDD), an algorithm that facilitates the discovery and assessment of the strength of such predictive relationships. We use heuristic search to guide our exploration of the space of potentially interesting dependencies and to uncover those that are significant. We begin by reviewing the dependency detection technique described in [3], then extend it to the multiple-stream case, describing in detail our heuristic search over the space of possible dependencies. Quantitative evidence for the utility of our approach is provided through a series of experiments with artificially generated data. In addition, we present results from the application of our algorithm to two real problem domains: feature-based classification and the prediction of pathologies in a simulated shipping network.
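The dependencies the abstract describes relate a precursor pattern observed at time t to a successor pattern at time t + δ, with the strength of the relationship judged statistically. The sketch below is not the authors’ MSDD search; it is a minimal illustration, under simplifying assumptions, that exhaustively scores one-token precursor/successor pairs across named categorical streams with the G (log-likelihood ratio) statistic, whereas MSDD heuristically searches a far larger space of multi-token patterns. All names here (`find_dependencies`, `g_statistic`, the `lag` and `min_g` parameters) are illustrative, not from the paper.

```python
from math import log

def g_statistic(n1, n2, n3, n4):
    """G statistic for a 2x2 contingency table:
    n1 = precursor & successor,     n2 = precursor & no successor,
    n3 = no precursor & successor,  n4 = neither."""
    total = n1 + n2 + n3 + n4
    rows = [n1 + n2, n3 + n4]
    cols = [n1 + n3, n2 + n4]
    obs = [[n1, n2], [n3, n4]]
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if obs[i][j] > 0 and expected > 0:
                g += 2 * obs[i][j] * log(obs[i][j] / expected)
    return g

def find_dependencies(streams, lag=1, min_g=3.84):
    """Score every one-token rule "x in stream a at t => y in stream b
    at t+lag" and return those whose G exceeds min_g (3.84 is the 5%
    critical value of chi-square with 1 degree of freedom).
    streams: dict mapping stream name -> list of categorical tokens."""
    length = min(len(s) for s in streams.values())
    results = []
    for a in streams:
        for b in streams:
            for x in set(streams[a]):
                for y in set(streams[b]):
                    n1 = n2 = n3 = n4 = 0
                    for t in range(length - lag):
                        pre = streams[a][t] == x
                        suc = streams[b][t + lag] == y
                        if pre and suc:   n1 += 1
                        elif pre:         n2 += 1
                        elif suc:         n3 += 1
                        else:             n4 += 1
                    g = g_statistic(n1, n2, n3, n4)
                    if g >= min_g:
                        results.append((a, x, b, y, g))
    # Strongest dependencies first.
    return sorted(results, key=lambda r: -r[-1])
```

For example, if stream B deterministically echoes stream A one step later, the rule (A, "x") ⇒ (B, "p") surfaces near the top of the returned list with a large G value. The exhaustive enumeration above is exponential in pattern size, which is precisely why a heuristic search over the rule space, as in the paper, becomes necessary.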

Keywords

Stream Length, Structure Rule, Multiple Stream, Predictive Rule, Dependency Rule

References

  [1] Bennett, K. P. and Mangasarian, O. L. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23–34, 1992.
  [2] Holte, Robert C. Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11:63–91, 1993.
  [3] Howe, Adele E. and Cohen, Paul R. Understanding planner behavior. To appear in AI Journal, Winter 1995.
  [4] Murphy, P. M. and Aha, D. W. UCI Repository of machine learning databases [machine-readable data repository]. University of California, Department of Information and Computer Science, Irvine, CA, 1994.
  [5] Oates, Tim. MSDD as a tool for classification. Memo 94–29, Experimental Knowledge Systems Laboratory, Department of Computer Science, University of Massachusetts, Amherst, 1994.
  [6] Oates, Tim and Cohen, Paul R. Toward a plan steering agent: Experiments with schedule maintenance. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems, pp. 134–139, 1994.
  [7] Thrun, S. B. The MONK’s problems: A performance comparison of different learning algorithms. Technical report CMU-CS-91-197, Carnegie Mellon University, 1991.
  [8] Wirth, J. and Catlett, J. Experiments on the costs and benefits of windowing in ID3. In Proceedings of the Fifth International Conference on Machine Learning, pp. 87–99, 1988.
  [9] Zheng, Zijian. A benchmark for classifier learning. Basser Department of Computer Science, University of Sydney, NSW.

Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • Tim Oates (1)
  • Matthew D. Schmill (1)
  • Dawn E. Gregory (1)
  • Paul R. Cohen (1)
  1. Computer Science Department, LGRC, University of Massachusetts, Amherst, USA