Abstract
Most of the research on deep neural networks so far has focused on obtaining higher accuracy levels by building increasingly large and deep architectures. Training and evaluating these models is only feasible when large amounts of resources such as processing power and memory are available. Typical applications that could benefit from these models are, however, executed on resource-constrained devices. Mobile devices such as smartphones already use deep learning techniques, but they often have to offload all processing to a remote cloud. We propose a new architecture called a cascading network that is capable of distributing a deep neural network between a local device and the cloud while keeping the required network traffic to a minimum. The network begins processing on the constrained device and only relies on the remote part when the local part does not provide an accurate enough result. The cascading network allows for an early-stopping mechanism during the recall phase of the network. We evaluated our approach in an Internet of Things context where a deep neural network adds intelligence to a large number of heterogeneous connected devices. This technique enables a wide variety of autonomous systems where sensors, actuators and computing nodes can work together. We show that the cascading architecture allows for a substantial improvement in evaluation speed on constrained devices while the loss in accuracy is kept to a minimum.
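The early-stopping recall phase described above can be sketched as follows. This is a minimal illustration under assumed names and values, not the paper's implementation: `local_model`, `remote_model`, and the 0.9 confidence threshold are all hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cascade_predict(x, local_model, remote_model, threshold=0.9):
    """Run the local (on-device) part of the cascade first; only invoke
    the remote (cloud) part when the local confidence is too low."""
    probs = softmax(local_model(x))
    if probs.max() >= threshold:
        # Early stop: confident enough, no network traffic needed.
        return int(probs.argmax()), "local"
    # Fall through to the remote part of the network.
    probs = softmax(remote_model(x))
    return int(probs.argmax()), "remote"
```

Only inputs whose local softmax output is insufficiently peaked incur the communication cost of the remote evaluation, which is what keeps the average network traffic low.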
Acknowledgements
Part of this work was supported by the iMinds IoT Research Program. Steven Bohez is funded by a Ph.D. grant of the Agency for Innovation by Science and Technology in Flanders (IWT). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU and the Jetson TK1 used for this research.
Appendix: The OverFeat cascading architecture
The adapted OverFeat network with two extra output layers. Max pooling reduces the dimensionality of the intermediate feature maps before the additional softmax layers are applied. This is a larger version of Fig. 11.
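An early-exit branch of this kind can be sketched in NumPy as follows. This is a toy illustration of the pool-then-softmax pattern in the figure, not the actual OverFeat layers; the pooling size, `weights`, and `bias` are assumed for the example.

```python
import numpy as np

def max_pool(feature_map, k=2):
    """Non-overlapping k x k max pooling over a (C, H, W) feature map."""
    c, h, w = feature_map.shape
    fm = feature_map[:, :h - h % k, :w - w % k]
    return fm.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def early_exit(feature_map, weights, bias):
    """Pool the intermediate features to reduce dimensionality, flatten,
    and apply a softmax classifier (one extra output layer)."""
    pooled = max_pool(feature_map).ravel()
    logits = pooled @ weights + bias
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The pooling step keeps the extra classifier small: the softmax layer sees a vector that is a factor k*k shorter than the flattened feature map, which matters on a constrained device.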
Cite this article
Leroux, S., Bohez, S., De Coninck, E. et al. The cascading neural network: building the Internet of Smart Things. Knowl Inf Syst 52, 791–814 (2017). https://doi.org/10.1007/s10115-017-1029-1