The VLDB Journal, Volume 27, Issue 4, pp 573–591

Effective and complete discovery of bidirectional order dependencies via set-based axioms

  • Jaroslaw Szlichta
  • Parke Godfrey
  • Lukasz Golab
  • Mehdi Kargar
  • Divesh Srivastava
Regular Paper

Abstract

Integrity constraints (ICs) are useful for expressing and enforcing application semantics. Formulating ICs manually, however, requires domain expertise, is prone to human error, and can be exceedingly time-consuming. Thus, methods for automatic discovery have been developed for some classes of ICs, such as functional dependencies (FDs), and recently, order dependencies (ODs). ODs properly subsume FDs and can express business rules involving order; e.g., an employee who pays higher taxes has a higher salary than another employee. Bidirectional ODs further allow different ordering directions, ascending and descending, as in SQL’s order-by; e.g., a student with an alphabetically lower letter grade has a higher percentage grade than another student. We address the limitations of prior work on automatic OD discovery, which has factorial complexity, is incomplete, and is not concise. We present an efficient bidirectional OD discovery algorithm enabled by a novel polynomial mapping to a canonical form, and a sound and complete set of axioms for canonical bidirectional ODs to prune the search space. Our algorithm has exponential worst-case time complexity in the number of attributes and linear complexity in the number of tuples. We prove that it produces a complete and minimal set of bidirectional ODs, and we experimentally show orders of magnitude performance improvements over the prior state-of-the-art methodologies.
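To make the notion of a bidirectional OD concrete, the following is a minimal sketch that only validates a given OD on a small table by comparing every pair of tuples. It is not the discovery algorithm described in the abstract. It assumes the usual list-based reading, in which an OD from X to Y holds when every pair of tuples ordered by X is also ordered by Y. The function names, the `students` table, and its values are hypothetical and chosen for illustration.

```python
def precedes_or_equal(s, t, spec):
    """True if tuple s comes before, or ties with, tuple t under the
    lexicographic order given by spec, a list of (attribute, direction)
    pairs with direction "asc" or "desc" (as in SQL's order-by)."""
    for attr, direction in spec:
        if s[attr] == t[attr]:
            continue  # tie on this attribute, look at the next one
        less = s[attr] < t[attr]
        return less if direction == "asc" else not less
    return True  # s and t tie on every attribute in spec


def od_holds(rows, lhs, rhs):
    """Naive quadratic check (illustrative only): the OD lhs -> rhs holds
    if every pair ordered by lhs is also ordered by rhs."""
    return all(
        precedes_or_equal(s, t, rhs)
        for s in rows
        for t in rows
        if precedes_or_equal(s, t, lhs)
    )


# Hypothetical data: percentage grades with their letter grades.
students = [
    {"letter": "A", "percent": 92},
    {"letter": "B", "percent": 78},
    {"letter": "A", "percent": 88},
    {"letter": "C", "percent": 65},
]

# Descending percentage orders ascending letter grade.
print(od_holds(students, [("percent", "desc")], [("letter", "asc")]))  # True
# With both directions ascending, the OD is violated.
print(od_holds(students, [("percent", "asc")], [("letter", "asc")]))   # False
# Reversed sides fail too: two students share "A" but differ in percent.
print(od_holds(students, [("letter", "asc")], [("percent", "desc")]))  # False
```

The last call illustrates why ODs subsume FDs: under this reading, an OD can only hold if its left-hand side functionally determines its right-hand side, and here the letter grade does not determine the percentage.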

Keywords

Data profiling, Data integration, Metadata

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. University of Ontario Institute of Technology, Oshawa, Canada
  2. York University, Toronto, Canada
  3. University of Waterloo, Waterloo, Canada
  4. Ryerson University, Toronto, Canada
  5. AT&T Labs-Research, Florham Park, USA
