
PIDray: A Large-Scale X-ray Benchmark for Real-World Prohibited Item Detection

  • Published in: International Journal of Computer Vision

A Correction to this article was published on 17 August 2023


Abstract

Automatic security inspection based on computer vision is a challenging task in real-world scenarios due to many factors, such as intra-class variance, class imbalance, and occlusion. Most previous methods rarely touch the cases where prohibited items are deliberately hidden in messy objects, because large-scale datasets are scarce, which hinders their application. To address this issue and facilitate related research, we present a large-scale dataset, named PIDray, which covers various real-world cases of prohibited item detection, especially deliberately hidden items. Specifically, PIDray collects 124,486 X-ray images covering 12 categories of prohibited items, each manually annotated with careful inspection, making it, to the best of our knowledge, the largest and most diverse collection of annotated X-ray images with prohibited items to date. Meanwhile, we propose a general divide-and-conquer pipeline to develop baseline algorithms on PIDray. Specifically, we adopt a tree-like structure to suppress the influence of the long-tailed distribution in the PIDray dataset: the first, coarse-grained node performs binary classification to alleviate the influence of the head category, while the subsequent fine-grained node is dedicated to the specific tasks of the tail categories. Based on this simple yet effective scheme, we offer strong task-specific baselines across object detection, instance segmentation, and multi-label classification, and verify their generalization ability on common datasets (e.g., COCO and PASCAL VOC). Extensive experiments on PIDray demonstrate that the proposed method performs favorably against current state-of-the-art methods, especially for deliberately hidden items. Our benchmark and codes are available at https://github.com/lutao2021/PIDray.
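The tree-like divide-and-conquer idea described above can be sketched at inference time as follows. This is a minimal illustration, not the paper's implementation (which builds the scheme into detection, segmentation, and multi-label classification heads); the category ordering, the `tree_predict` helper, and the 0.5 threshold are assumptions made for the sketch.

```python
import numpy as np

# The 12 PIDray prohibited-item categories; the ordering here is illustrative.
CATEGORIES = [
    "baton", "pliers", "hammer", "powerbank", "scissors", "wrench",
    "gun", "bullet", "sprayer", "handcuffs", "knife", "lighter",
]

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def tree_predict(coarse_logit, fine_logits, threshold=0.5):
    """Hypothetical two-node inference sketch.

    The coarse-grained node answers the binary question "does this image
    contain any prohibited item?", absorbing the dominant negative (head)
    class. Only when it fires is the fine-grained node consulted, which
    scores the 12 tail categories.
    """
    p_prohibited = 1.0 / (1.0 + np.exp(-coarse_logit))  # sigmoid
    if p_prohibited < threshold:
        return None, p_prohibited  # predicted safe; skip the fine node
    probs = softmax(fine_logits)
    return CATEGORIES[int(np.argmax(probs))], p_prohibited
```

For example, a strongly positive coarse logit combined with a peak on the seventh fine logit yields `("gun", 0.88...)`, while a negative coarse logit short-circuits to `(None, ...)` without touching the tail classifier. Decoupling the head-vs-rest decision from the tail-category decision is what lets each node train on a less imbalanced problem.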



Data Availability

Our benchmark and codes are released at https://github.com/lutao2021/PIDray.


Notes

  1. http://labelme.csail.mit.edu/Release3.0/.

  2. https://github.com/open-mmlab/mmdetection.



Author information


Corresponding author

Correspondence to Ruyi Ji.

Additional information

Communicated by D. Scharstein

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Libo Zhang and Lutao Jiang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, L., Jiang, L., Ji, R. et al. PIDray: A Large-Scale X-ray Benchmark for Real-World Prohibited Item Detection. Int J Comput Vis 131, 3170–3192 (2023). https://doi.org/10.1007/s11263-023-01855-1

