The VLDB Journal, Volume 25, Issue 2, pp 223–241

Efficient order dependency detection

Regular Paper

Abstract

Order dependencies (ODs) describe a relationship of order between lists of attributes in a relational table. ODs can help to understand the semantics of datasets and the applications producing them. They are useful in query optimization, where they suggest query rewrites. The existence of an OD in a table can also hint at which integrity constraints are valid for the domain of the data at hand. This work is the first to describe the discovery problem for order dependencies in a principled manner by characterizing the search space, developing and proving pruning rules, and presenting the algorithm Order, which finds all order dependencies in a given table. Order traverses the lattice of permutations of attributes in a level-wise, bottom-up manner. In a comprehensive evaluation, we show that it is efficient even on a variety of large datasets.
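To make the notion of an order dependency concrete, the following minimal sketch checks whether one attribute list orders another in a small table. It assumes the common lexicographic reading of an OD (whenever tuple s precedes or equals tuple t on the left-hand list, it also does so on the right-hand list), which the abstract itself does not spell out. The function name `holds_od`, the sample table, and its column names are hypothetical, and the brute-force pairwise check only illustrates the definition; it is not the Order algorithm.

```python
from itertools import combinations


def holds_od(rows, lhs, rhs):
    """Return True if attribute list `lhs` orders attribute list `rhs`.

    Assumed definition: for all tuples s, t, if s <= t lexicographically
    on `lhs`, then s <= t lexicographically on `rhs`.
    """
    def project(row, attrs):
        # Python compares tuples lexicographically, matching list semantics.
        return tuple(row[a] for a in attrs)

    for s, t in combinations(rows, 2):
        sx, tx = project(s, lhs), project(t, lhs)
        sy, ty = project(s, rhs), project(t, rhs)
        # Check the implication in both directions for the unordered pair.
        if sx <= tx and not sy <= ty:
            return False
        if tx <= sx and not ty <= sy:
            return False
    return True


# Hypothetical sample data, for illustration only.
table = [
    {"salary": 1000, "tax": 100, "level": 1},
    {"salary": 2000, "tax": 250, "level": 1},
    {"salary": 3000, "tax": 450, "level": 2},
]

salary_orders_tax = holds_od(table, ["salary"], ["tax"])
salary_orders_level = holds_od(table, ["salary"], ["level"])
level_orders_salary = holds_od(table, ["level"], ["salary"])
```

In this sample, sorting by `salary` also sorts the rows by `tax` and by `level`, so both checks succeed. The reverse check fails because two rows share a `level` but differ in `salary`. The pairwise loop is quadratic in the number of rows; a discovery algorithm would additionally have to enumerate candidate attribute lists, which is where the lattice traversal and pruning rules mentioned in the abstract come in.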

Keywords

Data profiling, Functional dependencies, Metadata


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Hasso Plattner Institute, Potsdam, Germany
