Privacy preserving distributed training of neural networks


Learnae is a system that aims at fully distributed neural network training. It follows a “Vires in Numeris” approach, combining the resources of commodity personal computers. It operates on a pure peer-to-peer model: all participating nodes share exactly the same privileges and obligations. Another significant feature of Learnae is its high degree of fault tolerance. All training data and metadata are propagated through the network using resilient gossip protocols. This robust approach is essential in environments with unreliable connections and a frequently changing set of nodes. Learnae is based on a versatile working scheme and supports different roles, depending on each peer's processing power and training data availability. In this way, it allows an expanded application scope, ranging from powerful workstations to online sensors. To maintain a decentralized architecture, all underlying technologies must be fully distributed as well. Learnae's coordinating algorithm is platform agnostic, but for the purpose of this research two novel projects have been used: (1) IPFS, a decentralized filesystem, as a means to distribute data in a permissionless environment, and (2) IOTA, a decentralized network targeting the world of low-energy “Internet of Things” devices. In our previous work, we made a first attempt at assessing the feasibility of using distributed ledger technology to collaboratively train a neural network. Here, that research is extended by applying Learnae to a fully deployed computer network and drawing the first experimental results. This article focuses on use cases that require data privacy; therefore, peers exchange only model weights, never raw training data.
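The privacy-preserving exchange described above can be sketched in a few lines: each peer trains on its own private data, publishes only the resulting weight vector (e.g. as a content-addressed object on IPFS), and merges the weight vectors it receives from other peers by averaging. The function names, the uniform averaging rule, and the placeholder gradient below are illustrative assumptions, not Learnae's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_training_step(weights: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Stand-in for one epoch of local SGD on the peer's private data.
    A real peer would compute the gradient from its own samples; here a
    random placeholder gradient keeps the sketch self-contained."""
    fake_gradient = rng.normal(size=weights.shape)
    return weights - lr * fake_gradient

def merge_with_peers(own: np.ndarray, received: list) -> np.ndarray:
    """Uniform weight averaging over the peer's own model and the models
    it obtained through the gossip layer. Only weights cross the network;
    training samples never leave the node."""
    return np.mean([own, *received], axis=0)

# One round on a single peer:
w = np.zeros(4)                                        # current local model
w = local_training_step(w)                             # train on private data
peer_models = [rng.normal(size=4) for _ in range(2)]   # weights gossiped by peers
w = merge_with_peers(w, peer_models)                   # publish w afterwards
```

Only `w` (and its successors) would ever be published, which is what confines the system to weight exchange and keeps the training data private to each node.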


Figs. 1–13


  1. Bitswap homepage
  2. IOTA Foundation homepage
  3. Serguei Popov, “The Tangle”
  4. HEPMASS Dataset homepage


  1. Nikolaidis S, Refanidis I (2019) Learnae: distributed and resilient deep neural network training for heterogeneous peer-to-peer topologies. In: International conference on engineering applications of neural networks, pp 286–298
  2. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray DG, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y, Zheng X (2016) TensorFlow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX symposium on operating systems design and implementation, pp 265–283
  3. Dean J, Corrado GS, Monga R, Chen K, Devin M, Le QV, Mao M, Ranzato M, Senior A, Tucker P, Yang K, Ng AY (2012) Large scale distributed deep networks. In: Advances in neural information processing systems, pp 1223–1231
  4. Dekel O, Gilad-Bachrach R, Shamir O, Xiao L (2012) Optimal distributed online prediction using mini-batches. J Mach Learn Res 13:165–202
  5. Li M, Andersen DG, Smola AJ, Yu K (2014) Communication efficient distributed machine learning with the parameter server. In: Advances in neural information processing systems (NIPS)
  6. Zhang S, Choromanska A, LeCun Y (2015) Deep learning with elastic averaging SGD. In: Advances in neural information processing systems, pp 685–693
  7. Niu F, Recht B, Re C, Wright SJ (2011) HOGWILD!: a lock-free approach to parallelizing stochastic gradient descent. arXiv:1106.5730
  8. Shokri R, Shmatikov V (2015) Privacy-preserving deep learning. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp 1310–1321
  9. Zinkevich M, Weimer M, Li L, Smola AJ (2010) Parallelized stochastic gradient descent. In: Advances in neural information processing systems (NIPS)
  10. McDonald R, Hall K, Mann G (2010) Distributed training strategies for the structured perceptron. In: Proceedings of the North American chapter of the association for computational linguistics (NAACL)
  11. Chen T, Li M, Li Y, Lin M, Wang N, Wang M, Xiao T, Xu B, Zhang C, Zhang Z (2015) MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. In: Proceedings of LearningSys
  12. Iandola FN, Ashraf K, Moskewicz MW, Keutzer K (2015) FireCaffe: near-linear acceleration of deep neural network training on compute clusters. In: Proceedings of the IEEE conference on computer vision and pattern recognition
  13. Lian X, Zhang C, Zhang H, Hsieh CJ, Zhang W, Liu J (2017) Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. In: Advances in neural information processing systems (NIPS)
  14. Jiang Z, Balu A, Hegde C, Sarkar S (2017) Collaborative deep learning in fixed topology networks. In: Advances in neural information processing systems (NIPS)
  15. Lian X, Zhang W, Zhang C, Liu J (2018) Asynchronous decentralized parallel stochastic gradient descent. In: Proceedings of the international conference on machine learning (ICML)
  16. Yu H, Yang S, Zhu S (2018) Parallel restarted SGD with faster convergence and less communication: demystifying why model averaging works for deep learning. arXiv:1807.06629
  17. Agarwal A, Duchi JC (2011) Distributed delayed stochastic optimization. In: Advances in neural information processing systems (NIPS)
  18. Feyzmahdavian HR, Aytekin A, Johansson M (2016) An asynchronous mini-batch algorithm for regularized stochastic optimization. IEEE Trans Autom Control 61:3740
  19. Paine T, Jin H, Yang J, Lin Z, Huang T (2013) GPU asynchronous stochastic gradient descent to speed up neural network training. arXiv:1312.6186
  20. Recht B, Re C, Wright S, Niu F (2011) Hogwild!: a lock-free approach to parallelizing stochastic gradient descent. In: Advances in neural information processing systems
  21. Seide F, Fu H, Droppo J, Li G, Yu D (2014) 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs. In: Annual conference of the international speech communication association (INTERSPEECH)
  22. Alistarh D, Grubic D, Li J, Tomioka R, Vojnovic M (2017) QSGD: communication-efficient SGD via gradient quantization and encoding. In: Advances in neural information processing systems (NIPS)
  23. Wen W, Xu C, Yan F, Wu C, Wang Y, Chen Y, Li H (2017) TernGrad: ternary gradients to reduce communication in distributed deep learning. In: Advances in neural information processing systems (NIPS)
  24. Strom N (2015) Scalable distributed DNN training using commodity GPU cloud computing. In: Annual conference of the international speech communication association (INTERSPEECH)
  25. Dryden N, Moon T, Jacobs SA, Van Essen B (2016) Communication quantization for data-parallel training of deep neural networks. In: Workshop on machine learning in HPC environments (MLHPC)
  26. Aji AF, Heafield K (2017) Sparse communication for distributed gradient descent. In: Conference on empirical methods in natural language processing (EMNLP)
  27. McMahan HB, Moore E, Ramage D, Hampson S, et al (2017) Communication-efficient learning of deep networks from decentralized data. In: International conference on artificial intelligence and statistics (AISTATS)
  28. Benet J (2014) IPFS: content addressed, versioned, P2P file system. arXiv:1407.3561
  29. Mashtizadeh AJ, Bittau A, Huang YF, Mazieres D (2013) Replication, history, and grafting in the Ori file system. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, pp 151–166
  30. Cohen B (2003) Incentives build robustness in BitTorrent. In: Workshop on economics of peer-to-peer systems, vol 6, pp 68–72
  31. Baumgart I, Mies S (2007) S/Kademlia: a practicable approach towards secure key-based routing. In: International conference on parallel and distributed systems
  32. Freedman MJ, Freudenthal E, Mazieres D (2004) Democratizing content publication with Coral. NSDI 4:18
  33. Wang L, Kangasharju J (2013) Measuring large-scale distributed systems: case of BitTorrent Mainline DHT. In: 2013 IEEE thirteenth international conference on peer-to-peer computing, pp 1–10
  34. Levin D, LaCurts K, Spring N, Bhattacharjee B (2008) BitTorrent is an auction: analyzing and improving BitTorrent's incentives. In: ACM SIGCOMM computer communication review, vol 38. ACM, pp 243–254
  35. Dean J, Ghemawat S (2011) LevelDB: a fast and lightweight key/value database library by Google
  36. Popov S, Saa O, Finardi P (2018) Equilibria in the tangle. arXiv:1712.05385
  37. Coates A, Huval B, Wang T, Wu D, Catanzaro B, Andrew N (2013) Deep learning with COTS HPC systems. In: Proceedings of the 30th international conference on machine learning, pp 1337–1345
  38. Zhang X, Trmal J, Povey D, Khudanpur S (2014) Improving deep neural network acoustic models using generalized maxout networks. In: IEEE international conference on acoustics, speech and signal processing (ICASSP)
  39. Miao Y, Zhang H, Metze F (2014) Distributed learning of multilingual DNN feature extractors using GPUs. In: Fifteenth annual conference of the international speech communication association
  40. Povey D, Zhang X, Khudanpur S (2014) Parallel training of deep neural networks with natural gradient and parameter averaging. arXiv:1410.7455
  41. Seide F, Fu H, Droppo J, Li G, Yu D (2014) 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs. In: Fifteenth annual conference of the international speech communication association



Funding

This research is funded by the University of Macedonia Research Committee as part of the “Principal Research 2019” funding program.

Author information



Corresponding author

Correspondence to Spyridon Nikolaidis.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Nikolaidis, S., Refanidis, I. Privacy preserving distributed training of neural networks. Neural Comput & Applic 32, 17333–17350 (2020).



Keywords

  • Decentralized neural network training
  • Data privacy
  • Weight averaging
  • Distributed ledger technology
  • IPFS
  • IOTA