AUDIT: approving and tracking updates with dependencies in collaborative databases

Abstract

Collaborative databases, such as genome databases, often involve extensive curation activities in which collaborators must interact to converge and agree on the content of the data. In a typical scenario, a member of the collaboration makes some updates, which become visible to all collaborators for possible comments and modifications. At the same time, these updates usually await approval or rejection by the data custodian, who decides based on the related discussion and the content of the data. Unfortunately, the approval and authorization of updates in current databases are based solely on the identity of the user, e.g., via the SQL GRANT and REVOKE commands. In this paper, we present a scalable cloud-based collaborative database system that supports collaboration and data curation scenarios. Our system is based on an Update Pending Approval model. In a nutshell, when a collaborator updates a data item, the update is marked as pending approval until the data custodian approves or rejects it. Until then, any other collaborator can view and comment on the pending data. We fully realized our system inside HBase, a cloud-based platform. We also conducted extensive experiments showing that the system scales well under different workloads.
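
The Update Pending Approval idea described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' HBase implementation: the class and method names (PendingApprovalStore, update, comment, decide, approved_value) are hypothetical, there is a single custodian, and a decision always applies to the most recent version of an item. Dependencies between updates are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Version:
    """One version of a data item, with its approval state and discussion."""
    value: str
    author: str
    status: Status = Status.PENDING
    comments: List[str] = field(default_factory=list)


class PendingApprovalStore:
    """Toy multiversion store (hypothetical names, not the paper's API)."""

    def __init__(self, custodian: str) -> None:
        self.custodian = custodian
        self._versions: Dict[str, List[Version]] = {}

    def update(self, key: str, value: str, author: str) -> Version:
        # Any collaborator may write; the new version starts as PENDING.
        version = Version(value=value, author=author)
        self._versions.setdefault(key, []).append(version)
        return version

    def latest(self, key: str) -> Version:
        # Pending versions are visible to every collaborator.
        return self._versions[key][-1]

    def comment(self, key: str, user: str, text: str) -> None:
        # Discussion is attached to the version that is awaiting a decision.
        version = self.latest(key)
        if version.status is not Status.PENDING:
            raise ValueError("only pending updates can be commented on")
        version.comments.append(f"{user}: {text}")

    def decide(self, key: str, user: str, approve: bool) -> None:
        # Only the custodian may approve or reject the latest pending version.
        if user != self.custodian:
            raise PermissionError("only the data custodian can decide")
        version = self.latest(key)
        if version.status is not Status.PENDING:
            raise ValueError("latest version has already been decided")
        version.status = Status.APPROVED if approve else Status.REJECTED

    def approved_value(self, key: str) -> Optional[str]:
        # Most recent approved value, or None if nothing was approved yet.
        for version in reversed(self._versions.get(key, [])):
            if version.status is Status.APPROVED:
                return version.value
        return None
```

In this sketch, a collaborator's write never overwrites the approved value; it adds a pending version that others can read and discuss, and a rejected version simply leaves the previously approved value in place. Contrast this with GRANT and REVOKE, which decide only who may write, not whether a particular write is accepted.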

Acknowledgements

This publication was made possible by the support of an NPRP Grant 4-1534-1-247 from the Qatar National Research Fund (a member of Qatar Foundation) and the National Science Foundation under Grants IIS-1117766 and IIS-0964639. The statements made herein are solely the responsibility of the authors.

Author information

Corresponding author

Correspondence to Khaleel Mershad.

Cite this article

Mershad, K., Malluhi, Q.M., Ouzzani, M. et al. AUDIT: approving and tracking updates with dependencies in collaborative databases. Distrib Parallel Databases 36, 81–119 (2018). https://doi.org/10.1007/s10619-017-7208-y

Keywords

  • Collaborative databases
  • Cloud computing
  • Data dependency
  • Multiversion data
  • Update authorization
  • Big data