
An Evaluation of TANE Algorithm for Functional Dependency Detection

  • Nikita Bobrov
  • George Chernishev
  • Dmitry Grigoriev
  • Boris Novikov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10563)

Abstract

Exploiting logical schema information can yield better physical designs for a database. To exploit this information, one has to extract it from the data stored in the database. Extraction should be performed by an algorithm that provides an acceptable level of result quality, measured, for example, in terms of precision.

In this paper we consider a particular type of such information: functional dependencies. One of the well-known algorithms for extracting functional dependencies is TANE. We propose to study its precision-related properties that are relevant to its use in our automatic physical design tool. TANE, being an approximate algorithm, returns only a fraction of the existing dependencies and is also prone to false positives. In contrast to previous research, which measured run times and memory consumption, we aim to evaluate the quality of this algorithm.
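To make the notions of a functional dependency and of precision concrete, here is a minimal sketch in Python. It is not TANE and not the evaluation procedure of the paper: the helper names `holds` and `precision_recall`, the toy table, and the "reported" and "true" dependency sets are all hypothetical, and the measures are the standard set-based definitions of precision and recall.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds exactly in rows.

    rows is a list of dicts (one per tuple), lhs is a tuple of attribute
    names, rhs is a single attribute name.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # The first rhs value seen for a given lhs value must repeat every time.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def precision_recall(reported, true):
    """Standard precision and recall of a reported set against a reference set."""
    reported, true = set(reported), set(true)
    hits = len(reported & true)
    precision = hits / len(reported) if reported else 1.0
    recall = hits / len(true) if true else 1.0
    return precision, recall


# Hypothetical toy relation, for illustration only.
rows = [
    {"zip": "10115", "city": "Berlin", "country": "DE"},
    {"zip": "10117", "city": "Berlin", "country": "DE"},
    {"zip": "75001", "city": "Paris", "country": "FR"},
]

# Hypothetical detector output versus hypothetical ground truth.
reported = {(("zip",), "city"), (("city",), "zip")}
true = {(("zip",), "city"), (("city",), "country")}

print(holds(rows, ("zip",), "city"))     # zip -> city
print(holds(rows, ("city",), "zip"))     # city -> zip
print(precision_recall(reported, true))
```

In this toy case one reported dependency is a false positive and one true dependency is missed, which are exactly the two failure modes the abstract attributes to an approximate detector.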

Finally, we briefly describe the context of this study: constructing an alternative physical design tuning system that would use the output of the TANE algorithm. The system is an ordinary vertical partitioning tool, except that it operates without workload knowledge and relies on data characteristics instead. We plan to employ TANE inside the functional dependency detection component. Thus, the purpose of the evaluation is to study to what extent the properties of the algorithm affect our goals.

Keywords

TANE, Physical design tuning, Vertical partitioning, Logical schema information, Functional dependency, Functional dependency detection, Experimentation

Notes

Acknowledgments

We would like to thank Felix Naumann for his valuable comments on the previous version of this paper. We would also like to thank the anonymous reviewers for their valuable comments on this work. This work is partially supported by the Russian Foundation for Basic Research, grant 16-57-48001.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Nikita Bobrov (1)
  • George Chernishev (1, 2)
  • Dmitry Grigoriev (1)
  • Boris Novikov (1, 2)
  1. Saint Petersburg State University, St. Petersburg, Russia
  2. JetBrains Research, Prague, Czech Republic
