Abstract
In this work, we present a blockchain-based, secure, and flexible distributed privacy-preserving online model that helps organizations share key features of their datasets without violating data privacy. In our model, all members are encouraged to participate and discouraged from writing fake data. Learning is carried out without sharing raw data, and the shared data are immutable; this improves prediction results on the data held by each member of an industry. We also propose a new consensus algorithm, Proof of Share, for adding a valid transaction to the blockchain; it prevents non-participating members from reading any data shared by peers and discourages fake writes. We evaluated our model in setups of 3, 5, and 10 members by applying decision tree, logistic regression, Gaussian naive Bayes, and support vector machine classifiers. Taking the results on a member's own data as the baseline, the maximum observed increase in accuracy was \(26.9231\%\); the \(F_{\beta }(\beta =0.5)\) score increased by 0.4533 and the \(F_{1}\) score by 0.0800. To the best of our knowledge, the proposed model is the only one that encourages all members to participate rather than remain passive listeners and discourages members from forging results, which makes it suitable for domains such as health care, finance, and education, where data are unevenly split and the secrecy of data and peers is required.
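For readers less familiar with the reported metrics, the following is a minimal sketch of how an \(F_{\beta}\) score is computed from confusion-matrix counts. It uses the standard definition and is not the authors' evaluation code; the function name `f_beta` and the example counts are hypothetical and chosen only for illustration. With \(\beta = 0.5\), precision is weighted more heavily than recall, and \(\beta = 1\) gives the \(F_{1}\) score.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from true positives, false positives, false negatives.

    Standard definition: (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP).
    Returns 0.0 when the score is undefined (no positives predicted or present).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts, for illustration only (not taken from the paper).
tp, fp, fn = 8, 2, 4
print(f"F1   = {f_beta(tp, fp, fn, beta=1.0):.4f}")  # F1   = 0.7273
print(f"F0.5 = {f_beta(tp, fp, fn, beta=0.5):.4f}")  # F0.5 = 0.7692
```

In this example precision (0.8) exceeds recall (about 0.667), so the precision-weighted \(F_{0.5}\) is higher than \(F_{1}\).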
Data availability
The data are publicly available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State [37].
References
Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. Technical Report Manubot (2019)
Zheng, Z., Xie, S., Dai, H.-N., Chen, X., Wang, H.: Blockchain challenges and opportunities: a survey. Int. J. Web Grid Serv. 14, 352–375 (2018)
Kuo, T.-T., Ohno-Machado, L.: Modelchain: decentralized privacy-preserving healthcare predictive modeling framework on private blockchain networks. arXiv:1802.01746 (2018)
Omar, I.A., Jayaraman, R., Salah, K., Yaqoob, I., Ellahham, S.: Applications of blockchain technology in clinical trials: review and open challenges. Arabian J. Sci. Eng. 46, 3001–3015 (2020)
Ølnes, S., Ubacht, J., Janssen, M.: Blockchain in government: benefits and implications of distributed ledger technology for information sharing. Gov. Inf. Q. 34, 355–364 (2017)
Vacca, A., Di Sorbo, A., Visaggio, C.A., Canfora, G.: A systematic literature review of blockchain and smart contract development: techniques, tools, and open challenges. J. Syst. Softw. 174, 110891 (2021). https://doi.org/10.1016/j.jss.2020.110891
Liu, M., Wu, K., Xu, J.J.: How will blockchain technology impact auditing and accounting: permissionless versus permissioned blockchain. Current Issues Audit. 13, A19–A29 (2019)
Mingxiao, D., Xiaofeng, M., Zhe, Z., Xiangwei, W., Qijun, C.: A review on consensus algorithm of blockchain. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2567–2572. IEEE (2017)
Woo, T.Y., Lam, S.S.: Authentication for distributed systems. Computer 25, 39–52 (1992)
Swain, P.H., Hauska, H.: The decision tree classifier: design and potential. IEEE Trans. Geosci. Electron. 15, 142–147 (1977)
Song, Y.-Y., Ying, L.: Decision tree methods: applications for classification and prediction. Shanghai Arch. Psychiatry 27, 130 (2015)
Wright, R.E.: Logistic regression (1995)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Langley, P., Iba, W., Thompson, K., et al.: An analysis of Bayesian classifiers. In: AAAI, vol. 90, pp. 223–228. Citeseer (1992)
Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the 5th Annual Workshop on Computational Learning Theory, pp. 144–152 (1992)
Wu, Y., Jiang, X., Kim, J., Ohno-Machado, L.: Grid Binary LOgistic REgression (GLORE): building shared models without sharing data. J. Am. Med. Inf. Assoc. 19, 758–764 (2012)
Jiang, W., Li, P., Wang, S., Wu, Y., Xue, M., Ohno-Machado, L., Jiang, X.: Webglore: a web service for grid logistic regression. Bioinformatics 29, 3238–3240 (2013)
Shi, H., Jiang, C., Dai, W., Jiang, X., Tang, Y., Ohno-Machado, L., Wang, S.: Secure multi-pArty computation grid LOgistic REgression (SMAC-GLORE). BMC Med. Inform. Decis. Mak. 16, 175–187 (2016)
Wang, S., Jiang, X., Wu, Y., Cui, L., Cheng, S., Ohno-Machado, L.: Expectation propagation logistic regression (explorer): distributed privacy-preserving online model learning. J. Biomed. Inf. 46, 480–496 (2013)
Li, Y., Jiang, X., Wang, S., Xiong, H., Ohno-Machado, L.: Vertical grid logistic regression (vertigo). J. Am. Med. Inform. Assoc. 23, 570–579 (2016)
Huang, L., Shea, A.L., Qian, H., Masurkar, A., Deng, H., Liu, D.: Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 99, 103291 (2019)
Wang, S., Chang, T.-H.: Federated clustering via matrix factorization models: from model averaging to gradient sharing. arXiv:2002.04930 (2020)
Mohassel, P., Zhang, Y.: Secureml: A system for scalable privacy-preserving machine learning. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38. IEEE (2017)
Shokri, R., Shmatikov, V.: Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321 (2015)
Phuong, T.T., et al.: Privacy-preserving deep learning via weight transmission. IEEE Trans. Inf. Forensics Secur. 14, 3003–3015 (2019)
Aono, Y., Hayashi, T., Wang, L., Moriai, S., et al.: Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 13, 1333–1345 (2017)
Brisimi, T.S., Chen, R., Mela, T., Olshevsky, A., Paschalidis, I.C., Shi, W.: Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 112, 59–67 (2018)
Duan, M., Liu, D., Chen, X., Tan, Y., Ren, J., Qiao, L., Liang, L.: Astraea: Self-balancing federated learning for improving classification accuracy of mobile deep learning applications. In: 2019 IEEE 37th International Conference on Computer Design (ICCD), pp. 246–254. IEEE (2019)
Xie, M., Long, G., Shen, T., Zhou, T., Wang, X., Jiang, J.: Multi-center federated learning. arXiv:2005.01026 (2020)
Kim, Y., Hakim, E.A., Haraldson, J., Eriksson, H., Silva Jr., J.M.B.D., Fischione, C.: Dynamic clustering in federated learning. arXiv:2012.03788 (2020)
Choudhury, O., Gkoulalas-Divanis, A., Salonidis, T., Sylla, I., Park, Y., Hsu, G., Das, A.: Differential privacy-enabled federated learning for sensitive health data. arXiv:1910.02578 (2019)
Bouacida, N., Mohapatra, P.: Vulnerabilities in federated learning. IEEE Access 23(9), 63229–49 (2021)
Kuo, T.-T., Kim, J., Gabriel, R.A.: Privacy-preserving model learning on a blockchain network-of-networks. J. Am. Med. Inform. Assoc. 27, 343–354 (2020)
Kuo, T.-T., Gabriel, R.A., Ohno-Machado, L.: Fair compute loads enabled by blockchain: sharing models by alternating client and server roles. J. Am. Med. Inform. Assoc. 26, 392–403 (2019)
Kuo, T.-T., Gabriel, R.A., Cidambi, K.R., Ohno-Machado, L.: Expectation propagation logistic regression on permissioned blockchain (explorerchain): decentralized online healthcare/genomics predictive model learning. J. Am. Med. Inform. Assoc. 27, 747–756 (2020)
Kennedy, R.L., Fraser, H.S., McStay, L.N., Harrison, R.F.: Early diagnosis of acute myocardial infarction using clinical and electrocardiographic data at presentation: derivation and evaluation of logistic regression models. Eur. Heart J. 17(8), 1181–91 (1996)
Dua, D., Graff, C.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2017)
Jere, M.S., Farnan, T., Koushanfar, F.: A taxonomy of attacks on federated learning. IEEE Secur. Privacy 19(2), 20–8 (2020)
Issa, W., Moustafa, N., Turnbull, B., Sohrabi, N., Tari, Z.: Blockchain-based federated learning for securing internet of things: a comprehensive survey. ACM Comput. Surv. 55(9), 1–43 (2023)
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Routledge, Milton Park (2017)
Daemen, J., Rijmen, V.: AES proposal: Rijndael (1999)
Standard, D.E., et al.: Data encryption standard. Federal Information Processing Standards Publication, 112 (1999)
Kim, H., Park, J., Bennis, M., Kim, S.L.: Blockchained on-device federated learning. IEEE Commun. Lett. 24(6), 1279–1283 (2019)
Short, A.R., Leligou, H.C., Papoutsidakis, M., Theocharis, E.: Using blockchain technologies to improve security in federated learning systems. In: 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 1183–1188. IEEE (2020)
Yin, X., Zhu, Y., Hu, J.: A comprehensive survey of privacy-preserving federated learning: a taxonomy, review, and future directions. ACM Comput. Surv. (CSUR) 54(6), 1–36 (2021)
Wei, K., Li, J., Ding, M., Ma, C., Yang, H.H., Farokhi, F., Jin, S., Quek, T.Q., Poor, H.V.: Federated learning with differential privacy: algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur. 17(15), 3454–69 (2020)
Li, N., Li, T., Venkatasubramanian, S.: t-closeness: privacy beyond k-anonymity and l-diversity. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 106–115. IEEE (2007)
Funding
The study has not been funded by any institute or agency.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bansal, V., Baliyan, N. & Ghosh, M. MLChain: a privacy-preserving model learning framework using blockchain. Int. J. Inf. Secur. 23, 649–677 (2024). https://doi.org/10.1007/s10207-023-00754-3
DOI: https://doi.org/10.1007/s10207-023-00754-3