Mining functional dependencies from data

Data Mining and Knowledge Discovery

Abstract

In this paper, we propose an efficient rule discovery algorithm, called FD_Mine, for mining functional dependencies from data. By exploiting Armstrong’s Axioms for functional dependencies, we identify equivalences among attributes, which can be used to reduce both the size of the dataset and the number of functional dependencies to be checked. We first describe four effective pruning rules that reduce the size of the search space. In particular, the number of functional dependencies to be checked is reduced by skipping the search for FDs that are logically implied by already discovered FDs. Then, we present the FD_Mine algorithm, which incorporates the four pruning rules into the mining process. We prove the correctness of FD_Mine, that is, we show that the pruning does not lead to the loss of useful information. We report the results of a series of experiments. These experiments show that the proposed algorithm is effective on 15 UCI datasets and synthetic data.
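
To make the idea of attribute equivalences concrete, the following is a minimal Python sketch of how a functional dependency can be checked against a relation and how two attributes can be recognized as equivalent. It is not the FD_Mine algorithm and does not reproduce its four pruning rules; the function names `holds` and `equivalent` and the toy relation `rows` are illustrative assumptions that do not come from the paper.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    The dependency holds when no two rows agree on every attribute in lhs
    while disagreeing on some attribute in rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value and
        # report a violation if a later row contradicts it.
        if seen.setdefault(key, val) != val:
            return False
    return True


def equivalent(rows, x, y):
    """Return True if x -> y and y -> x both hold in rows."""
    return holds(rows, x, y) and holds(rows, y, x)


# Hypothetical toy relation used only for illustration.
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "y", "C": 20},
    {"A": 3, "B": "x", "C": 10},
]
attrs = ["A", "B", "C"]

# Pairs of single attributes that determine each other.
equivalences = [
    (x, y) for x, y in combinations(attrs, 2) if equivalent(rows, [x], [y])
]

# Keep one representative per equivalent pair. Dependencies that mention
# a dropped attribute can be recovered by substituting its representative.
redundant = {y for _, y in equivalences}
candidates = [a for a in attrs if a not in redundant]
```

In this toy relation `B` and `C` determine each other, so a search over candidate dependencies only needs to consider `A` and `B`; that is the kind of reduction in attributes and checks the abstract describes, shown here in its simplest form.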

Author information

Corresponding author

Correspondence to Howard J. Hamilton.

Additional information

Responsible editor: M. J. Zaki.

About this article

Cite this article

Yao, H., Hamilton, H.J. Mining functional dependencies from data. Data Min Knowl Disc 16, 197–219 (2008). https://doi.org/10.1007/s10618-007-0083-9

  • DOI: https://doi.org/10.1007/s10618-007-0083-9
