
PIDray: A Large-Scale X-ray Benchmark for Real-World Prohibited Item Detection

  • Published in: International Journal of Computer Vision

A Correction to this article was published on 17 August 2023


Abstract

Automatic security inspection based on computer vision is a challenging task in real-world scenarios due to many factors, such as intra-class variance, class imbalance, and occlusion. Most previous methods rarely touch the cases where prohibited items are deliberately hidden in messy objects, because large-scale datasets are scarce, which hinders their application. To address this issue and facilitate related research, we present a large-scale dataset, named PIDray, which covers various real-world cases of prohibited item detection, especially deliberately hidden items. Specifically, PIDray collects 124,486 X-ray images covering 12 categories of prohibited items, each manually annotated with careful inspection, making it, to the best of our knowledge, the largest and most diverse collection of annotated X-ray images with prohibited items to date. Meanwhile, we propose a general divide-and-conquer pipeline to develop baseline algorithms on PIDray. Specifically, we adopt a tree-like structure to suppress the influence of the long-tailed distribution in the PIDray dataset: the first, coarse-grained node performs binary classification to alleviate the influence of the head category, while the subsequent fine-grained node is dedicated to the specific tasks of the tail categories. Based on this simple yet effective scheme, we offer strong task-specific baselines across object detection, instance segmentation, and multi-label classification, and verify their generalization ability on common datasets (e.g., COCO and PASCAL VOC). Extensive experiments on PIDray demonstrate that the proposed method performs favorably against current state-of-the-art methods, especially for deliberately hidden items. Our benchmark and codes are available at https://github.com/lutao2021/PIDray.
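The tree-like divide-and-conquer idea described above can be sketched at inference time as follows. This is a minimal illustration, not the paper's implementation (which builds the scheme into detection, segmentation, and multi-label classification heads); the category ordering, the `tree_predict` helper, and the 0.5 threshold are assumptions made for the sketch.

```python
import numpy as np

# The 12 PIDray prohibited-item categories; the ordering here is illustrative.
CATEGORIES = [
    "baton", "pliers", "hammer", "powerbank", "scissors", "wrench",
    "gun", "bullet", "sprayer", "handcuffs", "knife", "lighter",
]

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def tree_predict(coarse_logit, fine_logits, threshold=0.5):
    """Hypothetical two-node inference sketch.

    The coarse-grained node answers the binary question "does this image
    contain any prohibited item?", absorbing the dominant negative (head)
    class. Only when it fires is the fine-grained node consulted, which
    scores the 12 tail categories.
    """
    p_prohibited = 1.0 / (1.0 + np.exp(-coarse_logit))  # sigmoid
    if p_prohibited < threshold:
        return None, p_prohibited  # predicted safe; skip the fine node
    probs = softmax(fine_logits)
    return CATEGORIES[int(np.argmax(probs))], p_prohibited
```

For example, a strongly positive coarse logit combined with a peak on the seventh fine logit yields `("gun", 0.88...)`, while a negative coarse logit short-circuits to `(None, ...)` without touching the tail classifier. Decoupling the head-vs-rest decision from the tail-category decision is what lets each node train on a less imbalanced problem.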



Data Availability

Our benchmark and codes are released at https://github.com/lutao2021/PIDray.


Notes

  1. http://labelme.csail.mit.edu/Release3.0/.

  2. https://github.com/open-mmlab/mmdetection.



Author information


Corresponding author

Correspondence to Ruyi Ji.

Additional information

Communicated by D. Scharstein

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Libo Zhang and Lutao Jiang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, L., Jiang, L., Ji, R. et al. PIDray: A Large-Scale X-ray Benchmark for Real-World Prohibited Item Detection. Int J Comput Vis 131, 3170–3192 (2023). https://doi.org/10.1007/s11263-023-01855-1

