Detecting Maximum Inclusion Dependencies without Candidate Generation

  • Nuhad Shaabani
  • Christoph Meinel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9828)

Abstract

Inclusion dependencies (INDs) within and across databases are an important relationship for many applications in data integration, schema (re-)design, integrity checking, and query optimization. Existing techniques for detecting all INDs need to generate IND candidates and test their validity against the given data instance. The major disadvantage of this approach is that the number of data accesses, measured in SQL queries as well as I/O operations, grows exponentially. We introduce Mind\(_2\), a new approach for detecting n-ary INDs (\(n > 1\)) without any candidate generation. Mind\(_2\) implements a new characterization of the maximum INDs that we develop in this paper. This characterization is based on set operations defined on certain metadata, which Mind\(_2\) generates with a number of database accesses equal to only \(2 \times\) the number of valid unary INDs. Thus, Mind\(_2\) eliminates the exponential number of data accesses needed by existing approaches. Furthermore, our experiments show that Mind\(_2\) is significantly more scalable than hypergraph-based approaches.
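To make the notion of an n-ary IND concrete, the following minimal sketch checks whether one IND holds by direct set containment of projected tuples. It is not the Mind\(_2\) algorithm and does not reproduce its metadata or set operations; it only illustrates the validity test that candidate-based approaches apply to each candidate. The tables `orders` and `customers`, their columns, and the helper names `project` and `ind_holds` are hypothetical, and NULL handling is ignored.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]


def project(rows: List[Row], attrs: Sequence[str]) -> Set[Tuple[object, ...]]:
    """Return the set of value tuples of `rows` projected onto `attrs`."""
    return {tuple(row[a] for a in attrs) for row in rows}


def ind_holds(dep_rows: List[Row], dep_attrs: Sequence[str],
              ref_rows: List[Row], ref_attrs: Sequence[str]) -> bool:
    """Test the IND dep[dep_attrs] ⊆ ref[ref_attrs] by set containment."""
    if len(dep_attrs) != len(ref_attrs):
        raise ValueError("both sides of an IND must have the same arity")
    return project(dep_rows, dep_attrs) <= project(ref_rows, ref_attrs)


# Hypothetical toy data, chosen so that both unary INDs hold
# while the binary IND built from them does not.
orders: List[Row] = [
    {"cust_id": 1, "country": "DE"},
    {"cust_id": 2, "country": "FR"},
    {"cust_id": 3, "country": "FR"},
]
customers: List[Row] = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "FR"},
    {"id": 3, "country": "DE"},
]

unary_ids = ind_holds(orders, ["cust_id"], customers, ["id"])
unary_country = ind_holds(orders, ["country"], customers, ["country"])
binary = ind_holds(orders, ["cust_id", "country"],
                   customers, ["id", "country"])
```

In this toy instance both unary INDs are valid, yet the binary IND fails because the pair (3, "FR") appears in `orders` but not in `customers`. Valid unary INDs are therefore necessary but not sufficient for an n-ary IND, which is why candidate-based approaches must test each higher-arity combination against the data.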

Keywords

Mind2, Inclusion dependency, Data integration, Data profiling

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Hasso-Plattner-Institut, University of Potsdam, Potsdam, Germany
