An Evaluation of TANE Algorithm for Functional Dependency Detection

Bobrov, Nikita; Chernishev, George; Grigoriev, Dmitry; Novikov, Boris

doi:10.1007/978-3-319-66854-3_16

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10563)

Included in the following conference series:

International Conference on Model and Data Engineering

Abstract

Exploiting logical schema information can lead to better physical designs for a database. To exploit this information, one first has to extract it from the data stored in the database. The extraction algorithm must deliver results of acceptable quality, and this quality has to be ensured, for example, in terms of precision.

In this paper we consider a particular type of such information: functional dependencies. One of the well-known algorithms for extracting functional dependencies is TANE. We propose to study its precision-related properties, which are relevant to its use in our automatic physical design tool. TANE, being an approximate algorithm, returns only a fraction of the existing dependencies and is also prone to false positives. In contrast to previous research, which measured run times and memory consumption, we aim to evaluate the quality of the algorithm's output.

Finally, we briefly describe the context of this study: the construction of an alternative physical design tuning system that would use the output of the TANE algorithm. The system is an ordinary vertical partitioning tool, except that it operates without workload knowledge and relies on data characteristics instead. Our plan is to employ TANE inside the functional dependency detection component. Thus, the purpose of the evaluation is to study to what extent the properties of the algorithm affect our goals.
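
As an illustration of the notions used in the abstract, the following minimal Python sketch checks whether an exact functional dependency holds in a small table and scores a set of reported dependencies against a known ground truth. It is not the TANE algorithm and not the evaluation procedure of the paper. The function names `fd_holds` and `precision_recall`, the toy table, and the dependency sets are hypothetical. The abstract names only precision; recall is included here as its standard companion measure.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the exact dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs is a sequence of column names and
    rhs is a single column name. The dependency holds when rows that
    agree on all lhs columns also agree on rhs.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = row[rhs]
        # Remember the first rhs value seen for this lhs combination;
        # any later disagreement violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


def precision_recall(reported, actual):
    """Score reported dependencies against the ground-truth set.

    Empty sets are scored as 1.0, which is a convention chosen for
    this sketch only.
    """
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Hypothetical toy relation, used only for illustration.
rows = [
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "10002", "city": "New York", "state": "NY"},
    {"zip": "94105", "city": "San Francisco", "state": "CA"},
]

# Dependencies are encoded as (tuple of lhs columns, rhs column).
actual = {(("zip",), "city"), (("zip",), "state"), (("city",), "state")}
# A hypothetical detector output: one true dependency, one false positive.
reported = {(("zip",), "city"), (("city",), "zip")}

precision, recall = precision_recall(reported, actual)
print(fd_holds(rows, ["zip"], "city"), fd_holds(rows, ["city"], "zip"))
print(precision, recall)
```

In this toy setting the reported set contains one false positive and misses two true dependencies, which mirrors the two failure modes the abstract attributes to an approximate detector.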

This work is partially supported by the Russian Foundation for Basic Research, grant 16-57-48001.

Acknowledgments

We would like to thank Felix Naumann for his valuable comments on the previous version of this paper. We would also like to thank the anonymous reviewers for their valuable comments on this work. This work is partially supported by the Russian Foundation for Basic Research, grant 16-57-48001.

Author information

Authors and Affiliations

Saint Petersburg State University, St. Petersburg, Russia
Nikita Bobrov, George Chernishev, Dmitry Grigoriev & Boris Novikov
JetBrains Research, Prague, Czech Republic
George Chernishev & Boris Novikov

Corresponding author

Correspondence to George Chernishev.

Editor information

Editors and Affiliations

ISAE-ENSMA, Chasseneuil, France
Yassine Ouhammou
University of Novi Sad, Novi Sad, Serbia
Mirjana Ivanovic
UPC-Barcelona Tech, Barcelona, Spain
Alberto Abelló
ISAE-ENSMA, Chasseneuil, France
Ladjel Bellatreche

Cite this paper

Bobrov, N., Chernishev, G., Grigoriev, D., Novikov, B. (2017). An Evaluation of TANE Algorithm for Functional Dependency Detection. In: Ouhammou, Y., Ivanovic, M., Abelló, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2017. Lecture Notes in Computer Science, vol 10563. Springer, Cham. https://doi.org/10.1007/978-3-319-66854-3_16

DOI: https://doi.org/10.1007/978-3-319-66854-3_16
Published: 06 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66853-6
Online ISBN: 978-3-319-66854-3
eBook Packages: Computer Science, Computer Science (R0)
