
The cascading neural network: building the Internet of Smart Things

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Most research on deep neural networks so far has focused on obtaining higher accuracy by building increasingly large and deep architectures. Training and evaluating these models is only feasible when large amounts of resources, such as processing power and memory, are available. Typical applications that could benefit from these models are, however, executed on resource-constrained devices. Mobile devices such as smartphones already use deep learning techniques, but they often have to offload all processing to a remote cloud. We propose a new architecture, called a cascading network, that is capable of distributing a deep neural network between a local device and the cloud while keeping the required communication traffic to a minimum. The network begins processing on the constrained device and only relies on the remote part when the local part does not provide an accurate enough result. The cascading network thus allows for an early-stopping mechanism during the recall phase. We evaluated our approach in an Internet of Things context, where a deep neural network adds intelligence to a large number of heterogeneous connected devices. This technique enables a whole variety of autonomous systems in which sensors, actuators and computing nodes can work together. We show that the cascading architecture allows for a substantial improvement in evaluation speed on constrained devices while the loss in accuracy is kept to a minimum.
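As an illustration of the early-stopping recall phase described above, the following is a minimal Python sketch, not the authors' implementation: the local part of the cascade runs on the constrained device, and the remote part is only queried when the local softmax output falls below a confidence threshold. The names `local_model`, `remote_model` and `CONFIDENCE_THRESHOLD` are assumptions introduced for illustration.

```python
import numpy as np

# Assumed confidence threshold; in practice it is tuned to trade off
# on-device latency against classification accuracy.
CONFIDENCE_THRESHOLD = 0.9


def cascade_predict(x, local_model, remote_model):
    """Early-stopping recall phase of a cascading network (illustrative sketch).

    `local_model` is the cheap sub-network running on the constrained device;
    it returns its softmax output together with the intermediate representation.
    `remote_model` is the remaining part of the network running in the cloud.
    """
    local_probs, hidden = local_model(x)
    if np.max(local_probs) >= CONFIDENCE_THRESHOLD:
        # Confident enough: stop early, no communication with the cloud.
        return int(np.argmax(local_probs)), "local"

    # Not confident: send only the intermediate representation to the cloud
    # and let the remote part finish the evaluation.
    remote_probs = remote_model(hidden)
    return int(np.argmax(remote_probs)), "remote"
```

The threshold controls the trade-off: a higher value forwards more inputs to the larger remote model, recovering accuracy at the cost of extra network traffic and latency.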



Notes

  1. http://www.raspberrypi.org/.

  2. http://www.intel.com/content/www/us/en/do-it-yourself/edison.html.

  3. http://www.nvidia.com/object/jetson-tk1-embedded-dev-kit.html.

  4. http://leon.bottou.org/projects/infimnist.


Acknowledgements

Part of this work was supported by the iMinds IoT Research Program. Steven Bohez is funded by a Ph.D. grant of the Agency for Innovation by Science and Technology in Flanders (IWT). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU and the Jetson TK1 used for this research.

Author information

Corresponding author: Sam Leroux.

Appendix: The OverFeat cascading architecture

Figure: The adapted OverFeat network with two extra output layers. Max pooling is used to reduce the dimensionality before applying the additional softmax layers. This is a larger version of Fig. 11.
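The sketch below illustrates, in PyTorch-style Python, how such an extra output layer can be attached to an intermediate feature map: max pooling first reduces the spatial dimensionality, then a fully connected softmax layer produces the auxiliary prediction. The class name, pooling size and use of `nn.LazyLinear` are illustrative assumptions rather than the exact configuration used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxiliaryHead(nn.Module):
    """Extra output layer attached to an intermediate feature map.

    Max pooling reduces the spatial dimensionality before the softmax
    layer, as in the adapted network above (sizes here are illustrative).
    """

    def __init__(self, num_classes, pool_size=4):
        super().__init__()
        self.pool = nn.MaxPool2d(pool_size)           # reduce dimensionality
        self.classifier = nn.LazyLinear(num_classes)  # fully connected output layer

    def forward(self, feature_map):
        x = self.pool(feature_map)                    # (N, C, H/p, W/p)
        x = torch.flatten(x, start_dim=1)             # flatten to (N, features)
        return F.softmax(self.classifier(x), dim=1)   # class probabilities
```

A head like this can be attached after any convolutional block; during the recall phase its softmax output provides the confidence estimate that decides whether to stop early or to send the intermediate representation on to the rest of the network.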


Cite this article

Leroux, S., Bohez, S., De Coninck, E. et al. The cascading neural network: building the Internet of Smart Things. Knowl Inf Syst 52, 791–814 (2017). https://doi.org/10.1007/s10115-017-1029-1

