BFASTDC: A Bitwise Algorithm for Mining Denial Constraints

  • Eduardo H. M. Pena
  • Eduardo Cunha de Almeida
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11029)

Abstract

Integrity constraints (ICs) support many data management tasks. However, some types of ICs can express semantic rules that other ICs cannot, and vice versa. Denial constraints (DCs) address this expressiveness issue because they generalize important types of ICs, such as functional dependencies (FDs), conditional FDs, and check constraints. Automatic DC discovery is therefore essential to avoid the expensive and error-prone task of designing DCs manually. FASTDC is an algorithm that serves this purpose, but it is highly sensitive to the number of records in the dataset. This paper presents BFASTDC, a bitwise version of FASTDC that uses logical operations to form the auxiliary data structures from which DCs are mined. Our experimental study shows that BFASTDC can be more than one order of magnitude faster than FASTDC.
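To make the bitwise idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each predicate over a pair of tuples can be stored as a bit vector (here a Python integer) with one bit per ordered tuple pair, so that a DC of the form ¬(p1 ∧ ... ∧ pn) holds exactly when the bitwise AND of its predicate vectors is zero. The toy relation, the attribute names, and the helpers `predicate_masks` and `dc_holds` are hypothetical and chosen only for illustration.

```python
from itertools import permutations
import operator


def predicate_masks(rows, predicates):
    """Build one bitmask per predicate.

    Bit k of a mask is set when the k-th ordered tuple pair (i, j), i != j,
    satisfies the predicate rows[i][attr] <op> rows[j][attr].
    """
    pairs = list(permutations(range(len(rows)), 2))
    masks = {}
    for attr, op_name, op in predicates:
        mask = 0
        for k, (i, j) in enumerate(pairs):
            if op(rows[i][attr], rows[j][attr]):
                mask |= 1 << k
        masks[(attr, op_name)] = mask
    return masks


def dc_holds(masks, dc):
    """A DC not(p1 and ... and pn) holds when no tuple pair satisfies all predicates."""
    combined = -1  # in Python, -1 behaves as an integer with every bit set
    for predicate in dc:
        combined &= masks[predicate]
    return combined == 0


# Hypothetical toy relation (illustrative data only).
rows = [
    {"zip": "80000", "city": "Curitiba", "salary": 10},
    {"zip": "80000", "city": "Curitiba", "salary": 20},
    {"zip": "85900", "city": "Toledo", "salary": 15},
]

predicates = [
    ("zip", "=", operator.eq),
    ("city", "!=", operator.ne),
    ("salary", ">", operator.gt),
]

masks = predicate_masks(rows, predicates)

# The FD zip -> city written as a DC: not(t.zip = t'.zip and t.city != t'.city).
fd_as_dc = [("zip", "="), ("city", "!=")]
# A DC that the toy data violates: not(t.zip = t'.zip and t.salary > t'.salary).
violated_dc = [("zip", "="), ("salary", ">")]

print(dc_holds(masks, fd_as_dc))
print(dc_holds(masks, violated_dc))
```

The first candidate shows how an FD is a special case of a DC. In this sketch, checking any candidate costs one AND per predicate once the masks exist, which is the kind of saving a bitwise formulation aims for. Building the masks still visits every tuple pair.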

Keywords

Data profiling, Denial constraints, Integrity constraints

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Eduardo H. M. Pena (1)
  • Eduardo Cunha de Almeida (2)
  1. Federal University of Technology, Toledo, Brazil
  2. Federal University of Paraná, Curitiba, Brazil
