An Evaluation of TANE Algorithm for Functional Dependency Detection

  • Conference paper
Model and Data Engineering (MEDI 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10563)

Abstract

Exploiting logical schema information can lead to better physical designs for a database. To exploit this information, one first has to extract it from the data stored in the database. The extraction algorithm must deliver results of acceptable quality, measured, for example, in terms of precision.

In this paper we consider one particular type of such information: functional dependencies. TANE is one of the well-known algorithms for extracting functional dependencies. We study its precision-related properties that are relevant to its use in our automatic physical design tool. Being an approximate algorithm, TANE returns only a fraction of the existing dependencies and is also prone to false positives. In contrast to previous research, which measured run time and memory consumption, we aim to evaluate the quality of the results this algorithm produces.

Finally, we briefly describe the context of this study: the construction of an alternative physical design tuning system that would use the output of the TANE algorithm. The system is an ordinary vertical partitioning tool, except that it operates without workload knowledge and relies on data characteristics instead. We plan to employ TANE inside its functional dependency detection component. The purpose of the evaluation is therefore to study to what extent the properties of the algorithm affect our goals.
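To make the notions of a functional dependency, a false positive, and precision concrete, the following is a minimal Python sketch. It is not the TANE algorithm and not the evaluation procedure of the paper. The helper names `fd_holds` and `precision_recall`, the toy relation, and the convention of returning 1.0 for an empty set are all assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the exact dependency lhs -> rhs holds in rows.

    rows is a list of dicts (one per tuple), lhs is a tuple of attribute
    names, rhs is a single attribute name. All names are hypothetical.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def precision_recall(reported, true_fds):
    """Compare a reported set of dependencies against a reference set.

    Assumption: an empty set yields 1.0 by convention.
    """
    reported, true_fds = set(reported), set(true_fds)
    hits = len(reported & true_fds)
    precision = hits / len(reported) if reported else 1.0
    recall = hits / len(true_fds) if true_fds else 1.0
    return precision, recall


# Toy relation, invented for this example.
rows = [
    {"id": 1, "city": "Oslo", "country": "NO"},
    {"id": 2, "city": "Oslo", "country": "NO"},
    {"id": 3, "city": "Bergen", "country": "NO"},
]

city_to_country = fd_holds(rows, ("city",), "country")  # holds
country_to_city = fd_holds(rows, ("country",), "city")  # violated

# One reported dependency is a false positive, one true dependency is missed.
reported = {(("city",), "country"), (("country",), "city")}
reference = {(("city",), "country"), (("id",), "city")}
precision, recall = precision_recall(reported, reference)
```

In this toy setup the reported dependency country -> city is a false positive and id -> city is missed, so both precision and recall equal 0.5.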

This work is partially supported by the Russian Foundation for Basic Research, grant 16-57-48001.

Acknowledgments

We would like to thank Felix Naumann for his valuable comments on the previous version of this paper. We would also like to thank the anonymous reviewers for their valuable comments on this work. This work is partially supported by the Russian Foundation for Basic Research, grant 16-57-48001.

Corresponding author

Correspondence to George Chernishev.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bobrov, N., Chernishev, G., Grigoriev, D., Novikov, B. (2017). An Evaluation of TANE Algorithm for Functional Dependency Detection. In: Ouhammou, Y., Ivanovic, M., Abelló, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2017. Lecture Notes in Computer Science, vol 10563. Springer, Cham. https://doi.org/10.1007/978-3-319-66854-3_16

  • DOI: https://doi.org/10.1007/978-3-319-66854-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66853-6

  • Online ISBN: 978-3-319-66854-3

  • eBook Packages: Computer Science, Computer Science (R0)