Abstract
Recognizing target objects with an event-based camera has drawn increasing attention in recent years. Existing works usually convert event streams into representations such as point clouds, voxels, or images, and learn feature representations with various deep neural networks. Their final results may be limited by two factors: reliance on a single modal representation and the design of the network architecture. To address these challenges, this paper proposes a novel dual-stream framework for event representation, extraction, and fusion. The framework simultaneously models two common representations: event images and event voxels. Using Transformer and structured Graph Neural Network (GNN) architectures, spatial information and three-dimensional stereo information are learned separately. A bottleneck Transformer is then introduced to fuse the two streams. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art performance on two widely used event-based classification datasets. The source code of this work is available at: https://github.com/Event-AHU/EFV_event_classification.
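To make the described architecture concrete, below is a minimal PyTorch sketch of the dual-stream idea: a Transformer encoder over event-image tokens, a toy message-passing layer standing in for the structured GNN over event voxels, and a small set of shared bottleneck tokens through which the two streams exchange information. All module names, dimensions, and the simplified graph layer are illustrative assumptions rather than the paper's exact design; the authors' actual implementation is in the repository linked above.

```python
# Illustrative sketch only: names, dimensions, and the simplified graph
# layer are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn


class SimpleGraphLayer(nn.Module):
    """Toy message passing: mixes each voxel node with its neighbors
    via a dense adjacency matrix (stand-in for the structured GNN branch)."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (B, N, D) node features; adj: (B, N, N) row-normalized adjacency
        return torch.relu(self.linear(adj @ x))


class DualStreamBottleneckFusion(nn.Module):
    def __init__(self, dim=128, num_bottleneck=4, num_classes=10):
        super().__init__()
        # Spatial stream: Transformer encoder over event-image patch tokens.
        self.image_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Stereo stream: graph layer over event-voxel nodes.
        self.voxel_encoder = SimpleGraphLayer(dim)
        # Learnable bottleneck tokens shared by both streams.
        self.bottleneck = nn.Parameter(torch.randn(1, num_bottleneck, dim))
        self.fusion = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                                 batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, img_tokens, voxel_feats, adj):
        B = img_tokens.size(0)
        img = self.image_encoder(img_tokens)        # (B, Ni, D)
        vox = self.voxel_encoder(voxel_feats, adj)  # (B, Nv, D)
        btl = self.bottleneck.expand(B, -1, -1)     # (B, K, D)
        # In the spirit of attention-bottleneck fusion: a faithful version
        # would mask direct image<->voxel attention so all cross-modal
        # exchange flows through the few bottleneck tokens; the mask is
        # omitted here for brevity.
        fused = self.fusion(torch.cat([img, btl, vox], dim=1))
        return self.head(fused.mean(dim=1))         # (B, num_classes)


# Usage with random tensors standing in for real event data.
model = DualStreamBottleneckFusion()
img_tokens = torch.randn(2, 196, 128)   # event-image patch tokens
voxel_feats = torch.randn(2, 64, 128)   # event-voxel node features
adj = torch.softmax(torch.randn(2, 64, 64), dim=-1)  # toy adjacency
logits = model(img_tokens, voxel_feats, adj)
print(logits.shape)  # torch.Size([2, 10])
```

The bottleneck tokens act as a narrow channel between the two modalities, which is what regularizes the fusion compared with full cross-attention; the classifier head here simply mean-pools all tokens for brevity.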
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 62102205).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yuan, C., et al. (2024). Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion Based Classification. In: Liu, Q., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol. 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_1
DOI: https://doi.org/10.1007/978-981-99-8429-9_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9