Abstract
Over the last few years, deep learning on irregular 3D data has become one of the most active topics in the field, given its wide range of applications. While field-programmable gate array (FPGA)-based acceleration of deep learning models has been shown to produce power-efficient designs compared with other platforms such as CPUs and GPUs, only a few studies have addressed models that consume point clouds as input. Although tailoring a hardware design to a specific network can open better optimization opportunities, it is also important to keep the design reusable, especially for a new and evolving topic like learning on point clouds. In this work, we aim to achieve reusability by keeping the hardware isolated from the computational graph. Given the variety of layer types used in the dynamic graph convolutional neural network (DGCNN) and its popularity, our proposed design targets the thorough acceleration of DGCNN. The challenges, including supporting 18 types of tensor operations, achieving burst transfers, dealing with kernel complexities, using external memory banks, supporting in-order and out-of-order execution modes, and employing multiple processing elements, are explained in detail throughout the paper. Our experiments on a single FPGA with a single bitstream, a DDR4 memory subsystem, and the Float32 data type demonstrate speedups of \(2.73\times \) to \(8.4\times \) compared to a sequential single-threaded implementation on an Intel Core i7-6700HQ.
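The reusability idea above, keeping the FPGA hardware isolated from the computational graph so that a new network changes only the graph and not the bitstream, can be illustrated with a minimal host-side sketch. This is not the authors' code; the names (`Node`, `run_graph`, `KERNELS`) and the three sample operations are hypothetical stand-ins for the generic tensor-operation kernels the design implements in hardware.

```python
from dataclasses import dataclass

# Generic "kernels" standing in for the fixed tensor operations the hardware
# exposes; the real design implements 18 such operation types behind one
# interface, so only this dispatch table would be backed by the FPGA.
KERNELS = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "scale": lambda a, s: [x * s for x in a],
    "relu": lambda a: [max(0.0, x) for x in a],
}

@dataclass
class Node:
    op: str      # kernel name to dispatch to
    args: tuple  # mix of graph/feed identifiers and literal operands

def run_graph(graph, feeds):
    """Execute nodes in declaration order (a simple in-order schedule)."""
    results = dict(feeds)
    for node_id, node in graph.items():
        # Resolve each argument: a known identifier yields a prior result,
        # anything else is passed through as a literal operand.
        args = [results[a] if a in results else a for a in node.args]
        results[node_id] = KERNELS[node.op](*args)
    return results

# A tiny graph: y2 = relu(0.5 * (x + b)); swapping in a different network
# means building a different graph, not regenerating the hardware.
graph = {
    "y0": Node("add", ("x", "b")),
    "y1": Node("scale", ("y0", 0.5)),
    "y2": Node("relu", ("y1",)),
}
out = run_graph(graph, {"x": [-2.0, 4.0], "b": [0.0, 2.0]})
# out["y2"] == [0.0, 3.0]
```

Under this separation, out-of-order execution (mentioned above) would amount to reordering independent nodes in the schedule while the kernel set stays fixed.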
Data Availability
All data included in this study are available in DeepPoint-V2-FPGA’s repository, https://doi.org/10.5281/zenodo.6397222.
Ethics declarations
Competing interests
The authors declare that they have no financial or nonfinancial competing interests.
Cite this article
Jamali Golzar, S., Karimian, G., Shoaran, M. et al. DGCNN on FPGA: Acceleration of the Point Cloud Classifier Using FPGAs. Circuits Syst Signal Process 42, 748–779 (2023). https://doi.org/10.1007/s00034-022-02179-0