
Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13669)

Abstract

Image restoration algorithms such as super-resolution (SR) are indispensable pre-processing modules for object detection in low-quality images. Most of these algorithms assume that the degradation is fixed and known a priori. In practice, however, the real degradation or the optimal up-sampling ratio is often unknown or differs from the assumption, which degrades the performance of both the pre-processing module and the subsequent high-level task such as object detection. Here, we propose a novel self-supervised framework to detect objects in degraded low-resolution images. We use downsampling degradation as a transformation for self-supervised signals to explore representations that are equivariant to various resolutions and other degradation conditions. The Auto Encoding Resolution in Self-supervision (AERIS) framework can further take advantage of advanced SR architectures with an arbitrary-resolution restoring decoder to reconstruct the original correspondence from the degraded input image. Both the representation learning and object detection are optimized jointly in an end-to-end fashion. The generic AERIS framework can be implemented on various mainstream object detection architectures with different backbones. Extensive experiments show that our method achieves superior performance compared with existing methods under varied degradation conditions. Code is available at https://github.com/cuiziteng/ECCV_AERIS.
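The recipe sketched in the abstract can be read as: apply a random downsampling degradation to the clean image as a self-supervised transformation, attach a restoration decoder that reconstructs the original-resolution image from the degraded input, and optimize that reconstruction objective jointly with the detection loss. The minimal PyTorch sketch below illustrates one training step under these assumptions; it is not the released AERIS code, and names such as ToyBackbone, RestorationDecoder, and the toy detection head are hypothetical stand-ins.

```python
# Minimal sketch (not the authors' AERIS implementation) of the idea in the abstract:
# random downsampling as a self-supervised transformation, a restoration decoder that
# recovers the original-resolution image, and joint optimization with a detection loss.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyBackbone(nn.Module):
    """Stand-in for a detector backbone (a real system would use e.g. ResNet)."""
    def __init__(self, channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)


class RestorationDecoder(nn.Module):
    """Restoring branch: maps backbone features back to RGB at an arbitrary target size
    (here simply bilinear upsampling followed by a convolution)."""
    def __init__(self, channels=64):
        super().__init__()
        self.to_rgb = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, feats, out_size):
        feats = F.interpolate(feats, size=out_size, mode="bilinear", align_corners=False)
        return self.to_rgb(feats)


def random_downsample(img, scales=(2, 3, 4)):
    """Self-supervised transformation: degrade the image by a random down-sampling ratio."""
    s = random.choice(scales)
    h, w = img.shape[-2:]
    return F.interpolate(img, size=(h // s, w // s), mode="bicubic", align_corners=False)


# One joint training step: detection loss on the degraded input plus a reconstruction
# loss against the original high-resolution image.
backbone = ToyBackbone()
decoder = RestorationDecoder()
det_head = nn.Conv2d(64, 5, 1)          # toy "detection head" (class + box maps)
params = list(backbone.parameters()) + list(decoder.parameters()) + list(det_head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

hr_image = torch.rand(1, 3, 256, 256)   # original, clean image
lr_image = random_downsample(hr_image)  # degraded low-resolution view

feats = backbone(lr_image)
restored = decoder(feats, out_size=hr_image.shape[-2:])

rec_loss = F.l1_loss(restored, hr_image)  # self-supervised restoration signal
det_loss = det_head(feats).abs().mean()   # placeholder for a real detection loss
loss = det_loss + 0.1 * rec_loss          # joint end-to-end objective

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the sketch the reconstruction term plays the role of the self-supervised signal: the detector's features must carry enough information to undo the (randomly chosen) downsampling, which is the intuition behind learning representations robust to varied degradations.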

Notes

  1. Chapter 5, Alice in Wonderland.

Acknowledgement

This work was supported by JST Moonshot R&D Grant Number JPMJMS2011 and JST ACT-X Grant Number JPMJAX190D, Japan.

Author information

Correspondence to Lin Gu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 172 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cui, Z. et al. (2022). Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13669. Springer, Cham. https://doi.org/10.1007/978-3-031-20077-9_28

  • DOI: https://doi.org/10.1007/978-3-031-20077-9_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20076-2

  • Online ISBN: 978-3-031-20077-9

  • eBook Packages: Computer Science (R0)
