Abstract
Exploitation of logical schema information can allow producing better physical designs for a database. In order to exploit this information, one has to extract it from the data stored in the database. Extraction should be performed using some kind of an algorithm that provides an acceptable level of result quality. This quality has to be ensured, for example, in terms of precision.
In this paper we consider a particular type of such information: functional dependencies. One of the well-known algorithms for extraction of functional dependencies is the TANE algorithm. We propose to study its precision-related properties which are relevant for its use in our automatic physical design tool. TANE, being an approximate algorithm, returns only a fraction of existing dependencies. It is also prone to false positives. In contrast with the previous research, which measured run times and memory consumption, we aim to evaluate the quality of this algorithm.
Finally, we briefly describe the context of this study—constructing an alternative physical design tuning system that would use the output of the TANE algorithm. The system is an ordinary vertical partitioning tool, but which operates without workload knowledge, relying on data characteristics. Our plan is to employ TANE inside the functional dependency detection component. Thus, the purpose of evaluation is to study to what extent the properties of the algorithm affect our goals.
This work is partially supported by Russian Foundation for Basic Research grant 16-57-48001.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bellatreche, L.: Optimization and tuning in data warehouses. In: Liu, L., Özsu, M. (eds.) Encyclopedia of Database Systems, pp. 1995–2003. Springer, New York (2009). doi:10.1007/978-0-387-39940-9_259
Lightstone, S.: Physical database design for relational databases. In: Liu, L., Özsu, M. (eds.) Encyclopedia of Database Systems, pp. 2108–2114. Springer, New York (2009). doi:10.1007/978-0-387-39940-9_644
Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: TANE: an efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42(2), 100–111 (1999)
Chaudhuri, S., Weikum, G.: Self-management technology in databases. In: Liu, L., Öszu, M. (eds.) Encyclopedia of Database Systems, pp. 2550–2555. Springer, New York (2009). doi:10.1007/978-0-387-39940-9_334
Agrawal, S., Narasayya, V., Yang, B.: Integrating vertical and horizontal partitioning into automated physical database design. In: SIGMOD 2004, pp. 359–370 (2004)
Rao, J., Zhang, C., Megiddo, N., Lohman, G.: Automating physical database design in a parallel database. In: SIGMOD 2002, pp. 558–569 (2002)
Nehme, R., Bruno, N.: Automated partitioning design in parallel database systems. In: SIGMOD 2011, pp. 1137–1148 (2011)
Agrawal, S., Chu, E., Narasayya, V.: Automatic physical design tuning: workload as a sequence. In: SIGMOD 2006, pp. 683–694 (2006)
Alagiannis, I., Dash, D., Schnaitter, K., Ailamaki, A., Polyzotis, N.: An automated, yet interactive and portable DB designer. In: SIGMOD 2010, pp. 1183–1186 (2010)
Schnaitter, K., Abiteboul, S., Milo, T., Polyzotis, N.: Colt: continuous on-line tuning. In: SIGMOD 2006, pp. 793–795 (2006)
Hose, K., Klan, D., Marx, M., Sattler, K.U.: When is it time to rethink the aggregate configuration of your OLAP server? Proc. VLDB Endow. 1(2), 1492–1495 (2008)
Bellatreche, L., Benkrid, S.: A joint design approach of partitioning and allocation in parallel data warehouses. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds.) DaWaK 2009. LNCS, vol. 5691, pp. 99–110. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03730-6_9
Bellatreche, L., Boukhalfa, K., Abdalla, H.I.: SAGA: a combination of genetic and simulated annealing algorithms for physical data warehouse design. In: Bell, D.A., Hong, J. (eds.) BNCOD 2006. LNCS, vol. 4042, pp. 212–219. Springer, Heidelberg (2006). doi:10.1007/11788911_18
Bellatreche, L., Cuzzocrea, A., Benkrid, S.: \(\cal{F}\)&\(\cal{A}\): a methodology for effectively and efficiently designing parallel relational data warehouses on heterogenous database clusters. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds.) DaWaK 2010. LNCS, vol. 6263, pp. 89–104. Springer, Berlin (2010). doi:10.1007/978-3-642-15105-7_8
Gebaly, K.E., Aboulnaga, A.: Robustness in automatic physical database design. In: EDBT 2008, pp. 145–156 (2008)
Zilio, D., Zuzarte, C., Lightstone, S., Ma, W., Lohman, G., Cochrane, R., Pirahesh, H., Colby, L., Gryz, J., Alton, E., Valentin, G.: Recommending materialized views and indexes with the IBM DB2 design advisor. In: ICAC 2004, pp. 180–187, May 2004
Chaudhuri, S., Narasayya, V.: Self-tuning database systems: a decade of progress. In: VLDB 2007, pp. 3–14. VLDB Endowment (2007)
Chernishev, G.: A survey of DBMS physical design approaches. SPIIRAS Proc. 24, 222–276 (2013)
Quix, C., Li, X., Kensche, D., Geisler, S.: View management techniques and their application to data stream management. In: Evolving Application Domains of Data Warehousing and Mining: Trends and Solutions, pp. 83–112 (2010)
Mami, I., Bellahsene, Z.: A survey of view selection methods. SIGMOD Rec. 41(1), 20–29 (2012)
Wah, B.: File placement on distributed computer systems. Computer 17(1), 23–32 (1984)
Chernishev, G.: Towards self-management in a distributed column-store system. In: Morzy, T., Valduriez, P., Bellatreche, L. (eds.) ADBIS 2015. CCIS, vol. 539, pp. 97–107. Springer, Cham (2015). doi:10.1007/978-3-319-23201-0_12
Novelli, N., Cicchetti, R.: FUN: an efficient algorithm for mining functional and embedded dependencies. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 189–203. Springer, Heidelberg (2001). doi:10.1007/3-540-44503-X_13
Yao, H., Hamilton, H.J., Butz, C.J.: FD_Mine: discovering functional dependencies in a database using equivalences. In: ICDM 2002, pp. 729–732 (2002)
Abedjan, Z., Schulze, P., Naumann, F.: DFD: efficient functional dependency discovery. In: CIKM 2014, pp. 949–958 (2014)
Lopes, S., Petit, J.-M., Lakhal, L.: Efficient discovery of functional dependencies and Armstrong relations. In: Zaniolo, C., Lockemann, P.C., Scholl, M.H., Grust, T. (eds.) EDBT 2000. LNCS, vol. 1777, pp. 350–364. Springer, Heidelberg (2000). doi:10.1007/3-540-46439-5_24
Flach, P.A., Savnik, I.: Database dependency discovery: a machine learning approach. AI Commun. 12(3), 139–160 (1999)
Bobrov, N., Chernishev, G., Novikov, B.: Workload-independent data-driven vertical partitioning. In: Kirikova, M., Nørvåg, K., Papadopoulos, G.A., Gamper, J., Wrembel, J., Darmont, J., Rizzi, S. (eds.) ADBIS 2017. CCIS, vol. 767. Springer, Cham (2017)
Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J.P., Schönberg, M., Zwiener, J., Naumann, F.: Functional dependency discovery: an experimental evaluation of seven algorithms. Proc. VLDB Endow. 8(10), 1082–1093 (2015)
Abedjan, Z., Golab, L., Naumann, F.: Profiling relational data: a survey. VLDB J. 24(4), 557–581 (2015)
Liu, J., Li, J., Liu, C., Chen, Y.: Discover dependencies from data—a review. IEEE Trans. Knowl. Data Eng. 24(2), 251–264 (2012)
Song, S., Chen, L.: Differential dependencies: reasoning and discovery. ACM Trans. Database Syst. 36(3), 16:1–16:41 (2011)
TPC: TPC Benchmark H. Decision Support. http://www.tpc.org/tpch
Federal Railroad Administration Office of Safety Analysis: FRA Highway-Rail Crossing Inventory Database. http://safetydata.fra.dot.gov/OfficeofSafety/default.aspx
Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: TANE implementation. http://www.cs.helsinki.fi/research/fdk/datamining/tane/
Manning, C.D., Raghavan, P., SchĂĽtze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
Papadomanolakis, S., Ailamaki, A.: Autopart: automating schema design for large scientific databases using data partitioning. In: SSDBM 2004, pp. 383–392 (2004)
Boehm, A.M., Seipel, D., Sickmann, A., Wetzka, M.: Squash: a tool for analyzing, tuning and refactoring relational database applications. In: Seipel, D., Hanus, M., Wolf, A. (eds.) INAP/WLP -2007. LNCS (LNAI), vol. 5437, pp. 82–98. Springer, Heidelberg (2009). doi:10.1007/978-3-642-00675-3_6
Qian, L., LeFevre, K., Jagadish, H.V.: CRIUS: user-friendly database design. Proc. VLDB Endow. 4(2), 81–92 (2010)
Wiese, D., Rabinovitch, G., Reichert, M., Arenswald, S.: Autonomic tuning expert: a framework for best-practice oriented autonomic database tuning. In: CASCON 2008, pp. 327–341 (2008)
De Marchi, F., Lopes, S., Petit, J.M., Toumani, F.: Analysis of existing databases at the logical level: the DBA companion project. SIGMOD Rec. 32(1), 47–52 (2003)
Acknowledgments
We would like to thank Felix Naumann for his valuable comments on the previous version of this paper. We would also like to thank anonymous reviewers for their valuable comments on this work. This work is partially supported by Russian Foundation for Basic Research grant 16-57-48001.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Bobrov, N., Chernishev, G., Grigoriev, D., Novikov, B. (2017). An Evaluation of TANE Algorithm for Functional Dependency Detection. In: Ouhammou, Y., Ivanovic, M., AbellĂł, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2017. Lecture Notes in Computer Science(), vol 10563. Springer, Cham. https://doi.org/10.1007/978-3-319-66854-3_16
Download citation
DOI: https://doi.org/10.1007/978-3-319-66854-3_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66853-6
Online ISBN: 978-3-319-66854-3
eBook Packages: Computer ScienceComputer Science (R0)