
Repairing Data Violations with Order Dependencies

  • Yu Qiu
  • Zijing Tan (corresponding author)
  • Kejia Yang
  • Weidong Yang
  • Xiangdong Zhou
  • Naiwang Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10828)

Abstract

Lexicographical order dependencies (ODs) describe the relationship between two lexicographical orderings specified on lists of attributes, and have been shown to be useful in optimizing queries that involve ordered attributes. To take full advantage of ODs, the data instance is expected to satisfy the OD specifications. In practice, however, data are often found to violate the given ODs, as recent studies on OD discovery have demonstrated. This calls for data repairing techniques that restore the consistency of the data with respect to ODs. Repairing for ODs raises new challenges, since ODs convey order semantics beyond those of functional dependencies and are specified on lists of attributes. In this paper, we make a first effort to develop techniques for repairing data violations of ODs. (1) We formalize the data repairing problem for ODs and prove that it is NP-hard in the size of the data. (2) Despite this intractability, we develop effective heuristic algorithms for the problem. (3) We experimentally evaluate the effectiveness and efficiency of our algorithms on both real-life and synthetic data.
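To make the order semantics concrete, here is a minimal Python sketch of what "violating an OD" means, assuming the standard definition from the OD literature: an OD X ↦ Y holds on a relation if, for every pair of tuples s and t, s ⪯X t implies s ⪯Y t, where ⪯X compares the values of the attribute list X lexicographically. Under that definition a violating pair is either a split (s and t agree on X but differ on Y) or a swap (s and t are ordered one way by X and the opposite way by Y). The function names `od_violations` and `satisfies`, and the income/bracket data, are hypothetical illustrations. This is a naive quadratic check, not the repairing algorithm developed in the paper.

```python
from itertools import combinations
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Pair = Tuple[int, int]


def lex_key(row: Row, attrs: Sequence[str]) -> Tuple[Any, ...]:
    """Project a row onto an attribute list; Python tuples compare lexicographically."""
    return tuple(row[a] for a in attrs)


def od_violations(rows: List[Row], lhs: Sequence[str],
                  rhs: Sequence[str]) -> Tuple[List[Pair], List[Pair]]:
    """Return (splits, swaps) for the OD lhs |-> rhs as lists of row-index pairs.

    Naive O(n^2) pairwise check, for illustration only.
    """
    splits: List[Pair] = []
    swaps: List[Pair] = []
    for i, j in combinations(range(len(rows)), 2):
        x_i, x_j = lex_key(rows[i], lhs), lex_key(rows[j], lhs)
        y_i, y_j = lex_key(rows[i], rhs), lex_key(rows[j], rhs)
        if x_i == x_j and y_i != y_j:
            # Equal on the left-hand list, so the right-hand lists must be equal too.
            splits.append((i, j))
        elif (x_i < x_j and y_i > y_j) or (x_i > x_j and y_i < y_j):
            # The two attribute lists order the pair in opposite directions.
            swaps.append((i, j))
    return splits, swaps


def satisfies(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True iff the OD lhs |-> rhs has neither splits nor swaps on rows."""
    splits, swaps = od_violations(rows, lhs, rhs)
    return not splits and not swaps


# Hypothetical data for the OD [income] |-> [bracket].
rows = [
    {"income": 30, "bracket": 1},
    {"income": 50, "bracket": 2},
    {"income": 40, "bracket": 3},
    {"income": 30, "bracket": 2},
]
print(od_violations(rows, ["income"], ["bracket"]))  # ([(0, 3)], [(1, 2)])
```

Because a tuple projection already realizes the lexicographic order ⪯X, multi-attribute lists such as [year, month] need no special handling. Under this definition, any repair of the data must modify attribute values until both lists come back empty.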

Acknowledgements

This work is supported by NSFC 61572135, NSFC 61370157, the National High Technology Research and Development Program (863 Program) of China (2015AA050203), State Grid Research Project No. 52094016000A, Shanghai Science and Technology Project (No. 16DZ1100200, 16DZ1110102), the Aircraft Risk Management Database Project, and the National Nonprofit Ocean Research Project (No. 201405031-04).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Yu Qiu (1, 2)
  • Zijing Tan (1, 2; corresponding author)
  • Kejia Yang (3)
  • Weidong Yang (1, 2)
  • Xiangdong Zhou (1, 2)
  • Naiwang Guo (4)
  1. School of Computer Science, Fudan University, Shanghai, China
  2. Shanghai Key Laboratory of Data Science, Shanghai, China
  3. Computer Science and Mathematical Science, University of Michigan, Ann Arbor, USA
  4. State Grid Shanghai Municipal Electric Power Company, Shanghai, China
