Resampling in an indefinite database to approximate functional dependencies

  • Ethan Collopy
  • Mark Levene
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1510)


Functional dependency (FD) satisfaction, where the value of one attribute set uniquely determines the value of another, may be approximated by Numerical Dependencies (NDs), wherein each value of an attribute set determines at most k values of another attribute set. We therefore use NDs to “mine” a relation to see how well a given FD set is approximated. We motivate NDs by examining their use with indefinite information in relations. The family of all possible ND sets which may approximate an FD set forms a complete lattice. Using this lattice, we present a proximity metric and use it to assess the distance of each resulting ND set from a given FD set.
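
To make the ND definition above concrete, the following Python sketch computes, for a definite relation, the smallest k such that X →k Y holds, that is, the largest number of distinct Y-values associated with any single X-value; an FD X → Y is exactly the case k = 1. The encoding of a relation as a list of dicts, the function names `min_nd_bound` and `satisfies_nd`, and the sample data are hypothetical illustrations, not the authors' implementation, and the sketch does not cover the lattice or the proximity metric.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]  # hypothetical encoding of one tuple: attribute -> value


def min_nd_bound(relation: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> int:
    """Return the smallest k such that the ND lhs ->k rhs holds in `relation`.

    k is the largest number of distinct rhs-projections sharing one lhs-projection,
    so k == 1 means the FD lhs -> rhs is satisfied. An empty relation returns 1.
    """
    groups = defaultdict(set)  # lhs-projection -> set of rhs-projections seen with it
    for row in relation:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        groups[x].add(y)
    return max((len(ys) for ys in groups.values()), default=1)


def satisfies_nd(relation: List[Row], lhs: Sequence[str], rhs: Sequence[str], k: int) -> bool:
    """True if every lhs-value is associated with at most k distinct rhs-values."""
    return min_nd_bound(relation, lhs, rhs) <= k


r = [
    {"emp": "ann", "dept": "sales", "proj": "p1"},
    {"emp": "ann", "dept": "sales", "proj": "p2"},
    {"emp": "bob", "dept": "hr", "proj": "p1"},
]
print(min_nd_bound(r, ["emp"], ["dept"]))  # 1: emp -> dept holds as an FD
print(min_nd_bound(r, ["emp"], ["proj"]))  # 2: only emp ->2 proj holds
```

Grouping by the left-hand projection makes the check a single pass over the relation.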

The consistency problem, namely searching an indefinite relation for an extracted definite relation that satisfies a given set of FDs, has been shown to be NP-complete. We propose a novel application of the bootstrap, a computer-intensive resampling technique, to determine a suitable number of definite relations upon which to apply a heuristic-based hill-climbing algorithm that attempts to minimise the distance between the best ND set and the given FD set. The novelty is that we repeatedly apply the bootstrap to an indefinite relation with an increasing sample size until an approximate fixpoint is reached, at which point we assume that the sample size is representative of the indefinite relation. We compare the bootstrap with its predecessor, the jackknife, and conclude that both are applicable, with the bootstrap providing additional flexibility. This work highlights the utility of computer-intensive resampling within a dependency data mining context.
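
The resampling loop described above can be sketched in Python, but only under assumptions that the abstract does not fix. Here an indefinite relation is modelled as tuples whose attribute values are or-sets of possible values (Python tuples), a definite relation is drawn by picking one value from each or-set uniformly at random, the statistic recorded for each definite relation is the smallest ND bound k for a chosen X → Y, and the "approximate fixpoint" is taken to be the point where the bootstrap estimate of the mean k moves by less than a tolerance when the sample size is doubled. A jackknife standard error is included for comparison. All names (`random_completion`, `bootstrap_fixpoint`, `jackknife_se`, `start`, `max_n`, `b`, `tol`) are hypothetical, and the hill-climbing step is not shown.

```python
import math
import random
import statistics
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical encoding: each attribute maps to a tuple of possible values (an or-set);
# a one-element tuple is a definite value.
IndefRow = Dict[str, Tuple[Hashable, ...]]
Row = Dict[str, Hashable]


def random_completion(rel: List[IndefRow], rng: random.Random) -> List[Row]:
    """Draw one definite relation by picking a possible value uniformly from each or-set."""
    return [{a: rng.choice(vals) for a, vals in row.items()} for row in rel]


def min_nd_bound(relation: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> int:
    """Smallest k such that lhs ->k rhs holds; k == 1 means the FD lhs -> rhs holds."""
    groups = defaultdict(set)
    for row in relation:
        groups[tuple(row[a] for a in lhs)].add(tuple(row[a] for a in rhs))
    return max((len(ys) for ys in groups.values()), default=1)


def bootstrap_mean(sample: List[int], b: int, rng: random.Random) -> Tuple[float, float]:
    """Bootstrap estimate of the mean and its standard error from b resamples (b >= 2)."""
    means = [statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(b)]
    return statistics.fmean(means), statistics.stdev(means)


def jackknife_se(sample: List[int]) -> float:
    """Jackknife standard error of the mean, built from the n leave-one-out means (n >= 2)."""
    n = len(sample)
    total = sum(sample)
    loo = [(total - x) / (n - 1) for x in sample]
    centre = statistics.fmean(loo)
    return math.sqrt((n - 1) / n * sum((v - centre) ** 2 for v in loo))


def bootstrap_fixpoint(rel, lhs, rhs, start=8, max_n=1024, b=200, tol=0.05, seed=0):
    """Double the number n of sampled definite relations until the bootstrap estimate
    of the mean ND bound moves by less than tol; return (n, estimate, std_error)."""
    rng = random.Random(seed)
    stats = [min_nd_bound(random_completion(rel, rng), lhs, rhs) for _ in range(start)]
    prev, se = bootstrap_mean(stats, b, rng)
    n = start
    while n < max_n:
        # Extend the current sample rather than redrawing it, so successive samples are nested.
        stats += [min_nd_bound(random_completion(rel, rng), lhs, rhs) for _ in range(n)]
        n *= 2
        est, se = bootstrap_mean(stats, b, rng)
        if abs(est - prev) < tol:
            return n, est, se  # approximate fixpoint reached
        prev = est
    return n, prev, se  # tolerance not met before reaching max_n


indef = [
    {"emp": ("ann",), "dept": ("sales", "hr")},  # ann's department is uncertain
    {"emp": ("ann",), "dept": ("sales",)},
    {"emp": ("bob",), "dept": ("hr",)},
]
n, est, se = bootstrap_fixpoint(indef, ["emp"], ["dept"])
print(n, round(est, 2), round(se, 3))
```

Doubling n and nesting the samples is one simple schedule. A different statistic, growth rule, or stopping criterion would slot into `bootstrap_fixpoint` without changing the rest of the sketch.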

Key words

Functional Dependency, Numerical Dependency, Data Mining, Indefinite Relation, Resampling, Bootstrap



Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Ethan Collopy (1)
  • Mark Levene (1)
  1. Department of Computer Science, University College London, London, U.K.
