Business Process Engineering for Data Storing and Processing in a Collaborative Distributed Environment Based on Provenance Metadata, Smart Contracts and Blockchain Technology

Journal of Grid Computing

Abstract

We propose a novel approach to designing fully decentralized data management systems for distributed environments in which user groups are administratively unrelated or only loosely related and trust one another partially or not at all. The approach integrates blockchain technology, smart contracts and provenance-metadata-driven data management. Provenance metadata (PMD) contain the key information needed to determine the origin, authorship and quality of the data, to check the consistency of their storage and usage, and to interpret and confirm the results of data processing. We have developed the architecture, operating principles and algorithms of a system called ProvHL (Provenance HyperLedger), which provides fault-tolerant, safe and reliable management of provenance metadata, control of operations on data files, and resource access management in collaborative distributed computing systems (CDCSs). A CDCS is a distributed system formed by pooling the computer resources of several organizations (institutions) so that they can work together within a project. The paper also proposes a new blockchain-based method for delegating rights within distributed computing systems that is free from the shortcomings inherent in other solutions. The main goal of the work is to demonstrate how the proposed approach and the above technologies can improve the functional properties of CDCSs.
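
To make the idea of tamper-evident provenance metadata more concrete, the following minimal Python sketch keeps a hash-chained, append-only log of file operations. It is an illustration only and not the ProvHL implementation: the class `ProvenanceLedger`, its methods and the record fields (`file_id`, `operation`, `actor`) are hypothetical names chosen for this example, and the sketch runs in a single process with no consensus, no smart contracts and no Hyperledger Fabric API.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


def _digest(payload: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of payload."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class ProvenanceLedger:
    """Toy append-only ledger (hypothetical, for illustration only).

    Each record stores the hash of its predecessor, so changing any
    earlier record invalidates the chain.
    """
    records: List[dict] = field(default_factory=list)

    def append(self, file_id: str, operation: str, actor: str) -> dict:
        # Link the new record to the previous one (all zeros for the first record).
        prev_hash = self.records[-1]["hash"] if self.records else "0" * 64
        body = {
            "index": len(self.records),
            "file_id": file_id,
            "operation": operation,
            "actor": actor,
            "prev_hash": prev_hash,
        }
        record = dict(body, hash=_digest(body))
        self.records.append(record)
        return record

    def verify(self) -> bool:
        # Recompute every hash and check every link from the start of the chain.
        prev_hash = "0" * 64
        for record in self.records:
            body = {k: v for k, v in record.items() if k != "hash"}
            if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
                return False
            prev_hash = record["hash"]
        return True

    def history(self, file_id: str) -> List[str]:
        # Operations recorded for one file, in the order they were appended.
        return [r["operation"] for r in self.records if r["file_id"] == file_id]
```

A caller would append one record per file operation (for example `ledger.append("f1", "upload", "alice")`), read a file's provenance with `history`, and call `verify` to detect later modification of any stored record. In a real permissioned blockchain the chain would be replicated across organizations and extended only through an agreed consensus procedure, which this sketch deliberately leaves out.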

Data Availability

All data generated or analysed during this study are included in this published article.

References

  1. Ali, S., Wang, G., White, B., Cottrell, R.: A blockchain-based decentralized data storage and access framework for PingER. In: 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), pp 1303–1308. https://doi.org/10.1109/trustcom/bigdatase.2018.00179 (2018)

  2. Allen, D.W., Berg, A., Markey-Towler, B.: Blockchain and supply chains: V-form organisations, value redistributions, de-commoditisation and quality proxies. J. Br. Blockchain Assoc. 2(1), 1–8 (2019). https://doi.org/10.31585/jbba-2-1-(3)2019

  3. Androulaki, E., Barger, A., Bortnikov, V., Cachin, C., Christidis, K., De Caro, A., Enyeart, D., Ferris, C., Laventman, G., Manevich, Y., et al: Hyperledger fabric: a distributed operating system for permissioned blockchains. In: Proceedings of the Thirteenth EuroSys Conference, pp 1–15. https://doi.org/10.1145/3190508.3190538 (2018)

  4. Apache Kafka: A distributed streaming platform. https://kafka.apache.org

  5. Azaria, A., Ekblaw, A., Vieira, T., Lippman, A.: MedRec: Using blockchain for medical data access and permission management. In: Proceedings of the 2nd International Conference on Open and Big Data, pp 25–30. https://doi.org/10.1109/obd.2016.11 (2016)

  6. Baird, L.: The swirlds hashgraph consensus algorithm: Fair, fast, byzantine fault tolerance. Report SWIRLDS-TR-2016, Swirlds, Inc. http://pages.cpsc.ucalgary.ca/joel.reardon/blockchain/readings/hashgraph.pdf (2016)

  7. Baliga, A.: Understanding blockchain consensus models. Report, Persistent Systems, pp 1–14 (2017)

  8. Baliga, A., Solanki, N., Verekar, S., Pednekar, A., Kamat, P., Chatterjee, S.: Blockchain for enterprise: Overview, opportunities and challenges. In: Crypto Valley Conference on Blockchain Technology (CVCBT), pp 65–74. https://doi.org/10.1109/cvcbt.2018.00013 (2018)

  9. Bartoletti, M., Bellomy, B., Pompianu, L.: A journey into bitcoin metadata. J. Grid Comput. 17(1), 3–22 (2019)

  10. Berghöfer, T., Agrafioti, I., Allen, B., Beckmann, V., Chiarusi, T., Delfino, M., Hesping, S., Chudoba, J., Dell’Agnello, L., Katsanevas, S., et al: Towards a model for computing in european astroparticle physics. arXiv:1512.00988, pp 1–23 (2015)

  11. Bistarelli, S., Mercanti, I., Santancini, P., Santini, F.: End-to-end voting with non-permissioned and permissioned ledgers. J. Grid Comput. 17(1), 97–118 (2019)

  12. Bose, R., Frew, J.: Lineage retrieval for scientific data processing: A survey. ACM Comput. Surv. (CSUR) 37(1), 1–28 (2005). https://doi.org/10.1145/1057977.1057978

  13. Buchmann, J.A., Karatsiolis, E., Wiesmaier, A.: Introduction to public key infrastructures. Springer Science & Business Media. https://doi.org/10.1007/978-3-642-40657-7 (2013)

  14. Buneman, P., Tan, W.C.: Data provenance: What next? ACM SIGMOD Rec. 47(3), 5–16 (2019). https://doi.org/10.1145/3316416.3316418

  15. Buterin, V.: On public and private blockchains. https://blog.ethereum.org/2015/08/07/on-public-and-private-blockchains (2015)

  16. Buterin, V.: Ethereum white paper. https://github.com/ethereum/wiki/wiki/White-Paper (2016)

  17. Bychkov, I., Demichev, A., Dubenskaya, J., Fedorov, O., Haungs, A., Heiss, A., Kang, D., Kazarina, Y., Korosteleva, E., Kostunin, D., et al: Russian–german astroparticle data life cycle initiative. Data 3(4), 56 (2018). https://doi.org/10.3390/data3040056

  18. Cachin, C., Schubert, S., Vukolić, M.: Non-determinism in byzantine fault-tolerant replication. arXiv:1603.07351, pp 1–20 (2016)

  19. Cachin, C., Vukolić, M.: Blockchain consensus protocols in the wild. arXiv:1707.01873, pp 1–11 (2017)

  20. Casino, F., Dasaklis, T.K., Patsakis, C.: A systematic literature review of blockchain-based applications: current status, classification and open issues. Telematics Inform. 36, 55–81 (2019). https://doi.org/10.1016/j.tele.2018.11.006

  21. Castro, M., Liskov, B.: Practical byzantine fault tolerance. In: Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, pp 173–186. https://www.usenix.org/publications/library/proceedings/osdi99/full_papers/castro/castro.ps (1999)

  22. Christidis, K.: A Kafka-based ordering service for fabric. https://docs.google.com/document/d/19JihmW-8blTzN99lAubOfseLUZqdrB6sBR0HsRgCAnY/edit (2016)

  23. Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., Polk, W.T., et al: Internet X.509 public key infrastructure certificate and certificate revocation list (CRL) profile. RFC 5280, 1–151 (2008). https://doi.org/10.17487/rfc5280

  24. da Cruz, S.M.S., Campos, M.L.M., Mattoso, M.: Towards a taxonomy of provenance in scientific workflow management systems. In: 2009 Congress on Services-I. IEEE, pp 259–266. https://doi.org/10.1109/services-i.2009.18 (2009)

  25. Daneshgar, F., Sianaki, O., Guruwacharya, P.: Blockchain: A research framework for data security and privacy. In: Workshops of the International Conference on Advanced Information Networking and Applications, pp 966–974. https://doi.org/10.1007/978-3-030-15035-8_95 (2019)

  26. Demichev, A., Kryukov, A.: Management of provenance metadata for large scientific experiments based on the distributed consensus algorithm. In: Russian Supercomputing Days, pp. 287–295. https://2018.russianscdays.org/files/pdf18/287.pdf (2018)

  27. Demichev, A., Kryukov, A., Prikhod’ko, N.: Blockchain-based delegation of rights in distributed computing environment. In: International Conference on Parallel Computing Technologies. Springer, pp 408–418 (2019)

  28. Demichev, A., Kryukov, A., Prikhod’ko, N.: Metadata driven data management in distributed computing environments with partial or complete lack of trust between user groups. In: 2019 Ivannikov Ispras Open Conference (ISPRAS). IEEE, pp 35–41 (2019)

  29. Dinh, T.T.A., Wang, J., Chen, G., Liu, R., Ooi, B.C., Tan, K.L.: Blockbench: A framework for analyzing private blockchains. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp 1085–1100. https://doi.org/10.1145/3035918.3064033 (2017)

  30. Dubenskaya, J., Kryukov, A., Demichev, A., Prikhodko, N.: New security infrastructure model for distributed computing systems. J. Phys. Conf. Ser. 681(1), 012051 (2016). https://doi.org/10.1088/1742-6596/681/1/012051

  31. Egberts, A.: The Oracle problem – An analysis of how blockchain oracles undermine the advantages of decentralized ledger systems. SSRN Electron. J., pp 3382343. https://doi.org/10.2139/ssrn.3382343 (2017)

  32. Foster, I., Kesselman, C.: The Grid 2: Blueprint for a new computing infrastructure. Elsevier, New York (2003)

  33. Freire, J., Koop, D., Santos, E., Silva, C.T.: Provenance for computational tasks: A survey. Comput. Sci. Eng. 10(3), 11–21 (2008). https://doi.org/10.1109/mcse.2008.79

  34. Gorenflo, C., Lee, S., Golab, L., Keshav, S.: Fastfabric: Scaling hyperledger fabric to 20,000 transactions per second. In: 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC). IEEE, pp 455–463. https://doi.org/10.1002/nem.2099 (2019)

  35. Greenberg, J.: Big metadata, smart metadata, and metadata capital: Toward greater synergy between data science and metadata. J. Data Inf. Sci. 2(3), 19–36 (2017). https://doi.org/10.1515/jdis-2017-0012

  36. Hamida, E., Brousmiche, K., Levard, H., Thea, E.: Blockchain for enterprise: Overview, opportunities and challenges. In: The Thirteenth International Conference on Wireless and Mobile Communications (ICWMC2017), no. hal-01591859 in HAL, pp 1–7. https://hal.archives-ouvertes.fr/hal-01591859/document (2017)

  37. Han, R., Gramoli, V., Xu, X.: Evaluating blockchains for IoT. In: 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS). IEEE, pp 1–5. https://doi.org/10.1109/ntms.2018.8328736 (2018)

  38. Hang, L., Choi, E., Kim, D.: A novel EMR integrity management based on a medical blockchain platform in hospital. Electronics 8(4), 467 (2019). https://doi.org/10.3390/electronics8040467

  39. Hölbl, M., Kompara, M., Kamišalić, A., Nemec Zlatolas, L.: A systematic review of the use of blockchain in healthcare. Symmetry 10(10), 470 (2018). https://doi.org/10.3390/sym10100470

  40. Hyperledger Architecture Working Group: Hyperledger architecture volume 1: Introduction to hyperledger business blockchain design philosophy and consensus. https://www.hyperledger.org/wp-content/uploads/2017/08/Hyperledger_Arch_WG_Paper_1_Consensus.pdf

  41. Hyperledger Fabric: A blockchain platform for the enterprise. https://hyperledger-fabric.readthedocs.io/en/latest

  42. Kochovski, P., Gec, S., Stankovski, V., Bajec, M., Drobintsev, P.D.: Trust management in a blockchain based fog computing platform with trustless smart oracles. Futur. Gener. Comput. Syst. 101, 747–759 (2019)

  43. Kochovski, P., Stankovski, V., Gec, S., Faticanti, F., Savi, M., Siracusa, D., Kum, S.: Smart contracts for service-level agreements in edge-to-cloud computing. J. Grid Comput. 18(4), 673–690 (2020)

  44. Kouřil, D., Basney, J.: A credential renewal service for long-running jobs. In: The 6th IEEE/ACM International Workshop on Grid Computing, 2005. IEEE, pp 6-pp. https://doi.org/10.1109/grid.2005.1542725 (2005)

  45. Kryukov, A., Demichev, A.: Architecture of distributed data storage for astroparticle physics. Lobachevskii J. Math. 39(9), 1199–1206 (2018). https://doi.org/10.1134/s1995080218090123

  46. Kryukov, A., Demichev, A.: Decentralized data storages: Technologies of construction. Program. Comput. Softw. 44(5), 303–315 (2018). https://doi.org/10.1134/s0361768818050067

  47. Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998). https://doi.org/10.1145/279227.279229

  48. Liang, X., Shetty, S., Tosh, D., Kamhoua, C., Kwiat, K., Njilla, L.: Provchain: A blockchain-based data provenance architecture in cloud environment with enhanced privacy and availability. In: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp 468–477. https://doi.org/10.1109/ccgrid.2017.8 (2017)

  49. Liu, S., Viotti, P., Cachin, C., Quéma, V., Vukolić, M.: XFT: Practical fault tolerance beyond crashes. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), vol 1, pp 485–500. https://www.usenix.org/system/files/conference/osdi16/osdi16-liu.pdf (2016)

  50. Norman, A.T.: Blockchain technology explained: The ultimate beginners guide About blockchain wallet, mining, bitcoin, ethereum, litecoin, zcash, monero, ripple, dash, iota and smart contracts. Createspace Independent Publishing Platform, North Charleston (2017)

  51. OGF: Open grid forum. https://www.ogf.org/ogf/doku.php/documents/documents

  52. Ongaro, D., Ousterhout, J.: In search of an understandable consensus algorithm. In: 2014 USENIX Annual Technical Conference (USENIX ATC 14), pp 305–319. https://www.usenix.org/system/files/conference/atc14/atc14-paper-ongaro.pdf (2014)

  53. Prinz, W., Martínez-Carreras, M. A., Pallot, M.: From collaborative tools to collaborative working environments. In: Advancing Collaborative Knowledge Environments: New Trends in E-Collaboration. IGI Global, pp 1–10. https://doi.org/10.4018/978-1-61350-459-8.ch001 (2012)

  54. Ramachandran, A., Kantarcioglu, M.: Using blockchain and smart contracts for secure data provenance management. arXiv:1709.10000, pp 1–11 (2017)

  55. Ramachandran, A., Kantarcioglu, M.: SmartProvenance: A distributed, blockchain based data provenance system. In: CODASPY’18: The 8th ACM Conference on Data and Application Security and Privacy, pp 35–42. https://doi.org/10.1145/3176258.3176333 (2018)

  56. Rouhani, S., Pourheidari, V., Deters, R.: Physical access control management system based on permissioned blockchain. In: IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp 1078–1083. https://doi.org/10.1109/cybermatics_2018.2018.00198 (2018)

  57. Saini, V.: ConsensusPedia: An encyclopedia of 30+ consensus algorithms. https://hackernoon.com/consensuspedia-an-encyclopedia-of-29-consensus-algorithms-e9c4b4b7d08f (2018)

  58. Salman, T., Zolanvari, M., Erbad, A., Jain, R., Samaka, M.: Security services using blockchains: A state of the art survey. IEEE Commun. Surv. Tutorials 21(1), 858–880 (2018). https://doi.org/10.1109/comst.2018.2863956

  59. Simmhan, Y.L., Plale, B., Gannon, D.: A survey of data provenance in e-science. ACM Sigmod Rec. 34(3), 31–36 (2005). https://doi.org/10.1145/1084805.1084812

  60. Sukhwani, H., Wang, N., Trivedi, K.S., Rindos, A.: Performance modeling of hyperledger fabric (permissioned blockchain network). In: 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA). IEEE, pp 1–8. https://doi.org/10.1109/nca.2018.8548070 (2018)

  61. Szabo, N.: The idea of smart contracts. http://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/szabo.best.vwh.net/smart_contracts_idea.html (1997)

  62. Tasnim, M., Al Omar, A., Rahman, M., Bhuiyan, M.: Crab: Blockchain based criminal record management system. In: International Conference on Security, Privacy and Anonymity in Computation, Communication and Storage, pp 294–303. https://doi.org/10.1007/978-3-030-05345-1_25 (2018)

  63. Thanh, T., Mohan, S., Choi, E., Kim, S., Kim, P.: A taxonomy and survey on distributed file systems. In: Fourth International Conference on Networked Computing and Advanced Information Management, vol 1, pp 144–149. https://doi.org/10.1109/ncm.2008.162 (2008)

  64. Tuecke, S., Welch, V., Engert, D., Pearlman, L., Thompson, M.: Internet X.509 public key infrastructure proxy certificate profile, 2003. RFC 3820, 1–37 (2004). https://doi.org/10.17487/rfc3820

  65. Uchibeke, U., Schneider, K., Kassani, S., Deters, R.: Blockchain access control ecosystem for big data security. In: IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp 1373–1378. https://doi.org/10.1109/cybermatics_2018.2018.00236 (2018)

  66. Valenta, M., Sandner, P.: Comparison of ethereum, hyperledger fabric and corda. FSBC working paper, pp 1–8. https://pdfs.semanticscholar.org/00c7/5699db7c5f2196ab0ae92be0430be4b291b4.pdf (2017)

  67. Vassiliadis, P.: A survey of Extract-Transform-Load technology. Int. J. Datawarehouse Min. 5(3), 1–27 (2009). https://doi.org/10.4018/978-1-60960-537-7.ch008

  68. W3C Provenance Incubator Group: What is provenance. https://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance (2010)

  69. W3C Recommendation: PROV-DM: The PROV data model. https://www.w3.org/TR/prov-dm (2013)

  70. WLCG: Worldwide LHC computing grid. https://wlcg.web.cern.ch

  71. Wood, G., et al: Ethereum: A secure decentralised generalised transaction ledger. http://gavwood.com/paper.pdf (2015)

  72. Zafar, F., Khan, A., Suhail, S., Ahmed, I., Hameed, K., Khan, H.M., Jabeen, F., Anjum, A.: Trustworthy data: A survey, taxonomy and future trends of secure provenance schemes. J. Netw. Comput. Appl. 94, 50–68 (2017). https://doi.org/10.1016/j.jnca.2017.06.003

Acknowledgments

This work was funded by the Russian Science Foundation (grant No. 18-11-00075).

Preliminary versions of some principles of building decentralized CDCSs (Section 4.2), as well as of the rights delegation method (Section 4.6) based on the blockchain technology, were presented at the International conferences ISPRAS’2019 [28] and PaCT’2019 [27] by the authors of this work.

Author information

Corresponding author

Correspondence to Andrey Demichev.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Demichev, A., Kryukov, A. & Prikhod’ko, N. Business Process Engineering for Data Storing and Processing in a Collaborative Distributed Environment Based on Provenance Metadata, Smart Contracts and Blockchain Technology. J Grid Computing 19, 3 (2021). https://doi.org/10.1007/s10723-021-09544-4

  • DOI: https://doi.org/10.1007/s10723-021-09544-4
