Abstract
Collaborative databases, such as genome databases, often involve extensive curation activities in which collaborators must interact to converge and agree on the content of the data. In a typical scenario, a member of the collaboration makes some updates, which become visible to all collaborators for possible comments and modifications. At the same time, these updates usually await approval or rejection by the data custodian, based on the related discussion and the content of the data. Unfortunately, the approval and authorization of updates in current databases are based solely on the identity of the user, e.g., via the SQL GRANT and REVOKE commands. In this paper, we present a scalable cloud-based collaborative database system to support collaboration and data curation scenarios. Our system is based on an Update Pending Approval model. In a nutshell, when a collaborator updates a given data item, the item is marked as pending approval until the data custodian approves or rejects the update. Until then, any other collaborator can view and comment on the pending data. We fully realized our system inside HBase, a cloud-based platform. We also conducted extensive experiments showing that the system scales well under different workloads.
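The Update Pending Approval life cycle described above can be illustrated with a minimal in-memory sketch. This is not the HBase-based implementation presented in the paper. The names `PendingApprovalStore`, `Cell`, `Status`, and the methods `update`, `comment`, `read`, and `decide` are hypothetical and chosen only for illustration. The sketch assumes a single custodian and at most one pending update per data item.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Cell:
    """One data item (hypothetical layout, not the paper's schema)."""
    approved_value: Optional[str] = None  # last value the custodian approved
    pending_value: Optional[str] = None   # proposed value awaiting a decision
    status: Status = Status.APPROVED
    comments: List[str] = field(default_factory=list)


class PendingApprovalStore:
    """Toy store: updates stay pending until the custodian decides."""

    def __init__(self, custodian: str) -> None:
        self.custodian = custodian
        self.cells: Dict[str, Cell] = {}

    def update(self, user: str, key: str, value: str) -> None:
        # Any collaborator may propose a value; it is marked as pending.
        cell = self.cells.setdefault(key, Cell())
        cell.pending_value = value
        cell.status = Status.PENDING

    def comment(self, user: str, key: str, text: str) -> None:
        # Any collaborator may comment on the item while it is discussed.
        self.cells[key].comments.append(f"{user}: {text}")

    def read(self, key: str) -> Tuple[Optional[str], Status]:
        # Pending values are visible to everyone, together with their status.
        cell = self.cells[key]
        if cell.status is Status.PENDING:
            return cell.pending_value, Status.PENDING
        return cell.approved_value, cell.status

    def decide(self, user: str, key: str, approve: bool) -> None:
        # Only the custodian may approve or reject a pending update.
        if user != self.custodian:
            raise PermissionError("only the data custodian can decide")
        cell = self.cells[key]
        if cell.status is not Status.PENDING:
            raise ValueError("no pending update for this item")
        if approve:
            cell.approved_value = cell.pending_value
            cell.status = Status.APPROVED
        else:
            # Rejection keeps the previously approved value.
            cell.status = Status.REJECTED
        cell.pending_value = None
```

In this sketch, a rejected update falls back to the last approved value. That is one possible design choice for illustration, and the abstract does not specify how the actual system handles rejection.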
Acknowledgements
This publication was made possible by the support of NPRP Grant 4-1534-1-247 from the Qatar National Research Fund (a member of Qatar Foundation) and by the National Science Foundation under Grants IIS-1117766 and IIS-0964639. The statements made herein are solely the responsibility of the authors.
Cite this article
Mershad, K., Malluhi, Q.M., Ouzzani, M. et al. AUDIT: approving and tracking updates with dependencies in collaborative databases. Distrib Parallel Databases 36, 81–119 (2018). https://doi.org/10.1007/s10619-017-7208-y
DOI: https://doi.org/10.1007/s10619-017-7208-y