Abstract
Capsule Networks are powerful at modeling the positional relationships among features in deep neural networks for visual recognition tasks, but they are computationally expensive and unsuitable for running on mobile devices. The bottleneck lies in the computational complexity of the Dynamic Routing mechanism used between capsules. In contrast, XNOR-Net is fast and computationally efficient, though it suffers from low accuracy due to information loss during binarization. To address the computational burden of the Dynamic Routing mechanism, this paper proposes new Fully Connected (FC) layers that xnorize the linear projection outside or inside the Dynamic Routing within the CapsFC layer. Specifically, our proposed FC layers come in two versions: XnODR (Xnorize the Linear Projection Outside Dynamic Routing) and XnIDR (Xnorize the Linear Projection Inside Dynamic Routing). To test the generalizability of XnODR and XnIDR, we insert them into two different backbone networks, MobileNetV2 and ResNet-50. Experiments on three datasets, MNIST, CIFAR-10, and MultiMNIST, validate their effectiveness. The results demonstrate that both XnODR and XnIDR help networks achieve high accuracy with lower FLOPs and fewer parameters (e.g., 96.14% accuracy with 2.99M parameters and 311.74M FLOPs on CIFAR-10).
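The core idea of "xnorizing" a linear projection follows the XNOR-Net weight approximation W ≈ α·sign(W), where α is a per-output-unit scaling factor, so that the expensive full-precision matrix product can be replaced by cheap binary operations. The sketch below illustrates that approximation in NumPy; it is a minimal didactic example, not the authors' XnODR/XnIDR implementation, and the function name `xnorize_linear` is our own.

```python
import numpy as np

def xnorize_linear(x, W):
    """Approximate the projection x @ W.T with binarized weights.

    Following the XNOR-Net weight approximation W ~ alpha * sign(W):
    each output unit's weights are binarized to {-1, +1} and rescaled
    by the mean absolute value of that unit's full-precision weights.
    """
    alpha = np.mean(np.abs(W), axis=1, keepdims=True)  # per-row scaling factor
    B = np.sign(W)                                     # binary weights in {-1, +1}
    return x @ (alpha * B).T                           # binarized projection

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # 4 input capsule vectors of length 8
W = rng.standard_normal((16, 8))   # full-precision projection weights
y_full = x @ W.T                   # full-precision projection
y_bin = xnorize_linear(x, W)       # XNOR-style approximation, same shape
```

In a real binary network the product with `B` is implemented with XNOR and popcount operations rather than floating-point multiplies; placing this binarization outside versus inside the routing loop is the distinction between the XnODR and XnIDR variants described above.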
Acknowledgements
We would like to thank Ms. Druselle May, who helped us proofread the manuscript.
Funding
Not applicable
Author information
Contributions
Jian Sun: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization. Ali Pourramezan Fard: Writing - Review & Editing. Mohammad H. Mahoor: Resources, Writing - Review & Editing, Supervision, Project administration.
Ethics declarations
Conflicts of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethics approval
Not applicable
Consent to participate
Not applicable
Consent for publication
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, J., Fard, A.P. & Mahoor, M.H. XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers for Convolutional Neural Networks. J Intell Robot Syst 109, 17 (2023). https://doi.org/10.1007/s10846-023-01952-w