Abstract
Existing instance segmentation techniques are primarily tailored for high-visibility inputs, and their performance degrades drastically in extremely low-light environments. In this work, we take a deep look at instance segmentation in the dark and introduce several techniques that substantially boost low-light inference accuracy. The proposed method is motivated by the observation that noise in low-light images introduces high-frequency disturbances into the feature maps of neural networks, thereby significantly degrading performance. To suppress this "feature noise", we propose a novel learning method that relies on an adaptive weighted downsampling layer, a smooth-oriented convolutional block, and disturbance suppression learning. These components effectively reduce feature noise during downsampling and convolution operations, enabling the model to learn disturbance-invariant features. Furthermore, we find that high-bit-depth RAW images preserve richer scene information in low-light conditions than typical camera sRGB outputs, which supports the use of RAW-input algorithms. Our analysis indicates that high bit depth can be critical for low-light instance segmentation. To mitigate the scarcity of annotated RAW datasets, we leverage a low-light RAW synthetic pipeline to generate realistic low-light data. In addition, to facilitate further research in this direction, we capture a real-world low-light instance segmentation dataset comprising over two thousand paired low/normal-light images with instance-level pixel-wise annotations. Remarkably, without any image preprocessing, we achieve satisfactory instance segmentation performance in very low light (4% AP higher than state-of-the-art competitors), while opening new opportunities for future research. Our code and dataset are publicly available to the community (https://github.com/Linwei-Chen/LIS).
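The low-light RAW synthetic pipeline is only named in the abstract, not specified. A common physics-based formulation (in the spirit of the Poissonian–Gaussian model of Foi et al., 2008, cited below) darkens a clean linear RAW image and adds signal-dependent shot noise plus signal-independent read noise. The sketch below is an illustrative assumption, not the paper's actual pipeline, and all parameter values (`ratio`, `gain`, `read_std`) are hypothetical:

```python
import numpy as np

def synthesize_low_light_raw(clean_raw, ratio=100.0, gain=0.01,
                             read_std=0.002, rng=None):
    """Darken a normalized clean RAW image and add Poissonian-Gaussian noise.

    clean_raw : float array in [0, 1], linear RAW intensities.
    ratio     : exposure reduction factor (e.g., 100x shorter exposure).
    gain      : signal-per-photon scale; controls shot-noise strength.
    read_std  : std of signal-independent Gaussian read noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    dark = clean_raw / ratio                         # simulate underexposure
    photons = rng.poisson(dark / gain)               # shot (Poisson) noise
    noisy = photons * gain                           # back to signal domain
    noisy += rng.normal(0.0, read_std, noisy.shape)  # read (Gaussian) noise
    return np.clip(noisy, 0.0, 1.0)

# Example: a flat mid-gray patch becomes a dark, noisy patch.
patch = np.full((64, 64), 0.5)
low_light = synthesize_low_light_raw(patch, ratio=100.0)
```

At low photon counts the Poisson term dominates, which is why such synthetic data exhibits the signal-dependent noise characteristic of real short-exposure RAW captures.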
Notes
To make the detector compatible with sRGB inputs, we follow Chen et al. (2019a) and use demosaicked 3-channel RAW-RGB images, rather than Bayer RAW images, as inputs; the green channel is obtained by averaging the two green pixels in each 2×2 Bayer block. In the following, we use "RAW" and "RAW-RGB" interchangeably.
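The green-averaging step described in this note can be sketched as follows, assuming an RGGB Bayer layout (the actual sensor pattern is not stated here). Each 2×2 mosaic block yields one RAW-RGB pixel, so spatial resolution is halved:

```python
import numpy as np

def bayer_to_rawrgb(bayer):
    """Convert an RGGB Bayer mosaic (H, W) into a half-resolution
    3-channel RAW-RGB image (H/2, W/2, 3), averaging the two greens."""
    r  = bayer[0::2, 0::2]              # top-left of each 2x2 block: red
    g1 = bayer[0::2, 1::2]              # top-right: first green sample
    g2 = bayer[1::2, 0::2]              # bottom-left: second green sample
    b  = bayer[1::2, 1::2]              # bottom-right: blue
    g = (g1 + g2) / 2.0                 # average the two green samples
    return np.stack([r, g, b], axis=-1)

mosaic = np.arange(16.0).reshape(4, 4)  # toy 4x4 Bayer frame
rgb = bayer_to_rawrgb(mosaic)           # shape (2, 2, 3)
```

Unlike full demosaicking with interpolation, this simple packing introduces no interpolated values, which keeps the noise statistics of the RAW samples intact.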
We use COCO samples belonging to the same 8 object classes in the LIS dataset.
References
Anaya, J., & Barbu, A. (2018). Renoir: A dataset for real low-light image noise reduction. Journal of Visual Communication and Image Representation, 51(1), 144–154.
Bolya, D., Zhou, C., Xiao, F., & Lee, Y. J. (2019). Yolact: Real-time instance segmentation. In Proceedings of IEEE international conference on computer vision (pp. 9157–9166).
Brooks, T., Mildenhall, B., Xue, T., Chen, J., Sharlet, D., & Barron, J. T. (2019). Unprocessing images for learned raw denoising. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 11036–11045).
Chen, C., Chen, Q., Do, M. N., & Koltun, V. (2019a). Seeing motion in the dark. In Proceedings of IEEE international conference on computer vision (pp. 3185–3194).
Chen, C., Chen, Q., Xu, J., & Koltun, V. (2018). Learning to see in the dark. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3291–3300).
Chen, H., Sun, K., Tian, Z., Shen, C., Huang, Y., & Yan, Y. (2020). Blendmask: Top-down meets bottom-up for instance segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 8573–8581).
Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., & Ouyang, W., et al. (2019b). Hybrid task cascade for instance segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 4974–4983).
Chen, L., Fu, Y., You, S., & Liu, H. (2021). Efficient hybrid supervision for instance segmentation in aerial images. Remote Sensing, 13(2), 252.
Chen, L., Fu, Y., You, S., & Liu, H. (2022). Hybrid supervised instance segmentation by learning label noise suppression. Neurocomputing, 496, 131–146.
Cheng, B., Misra, I., Schwing, A. G., Kirillov, A., & Girdhar, R. (2022). Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1290–1299).
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3213–3223).
Cui, Z., Qi, G. J., Gu, L., You, S., Zhang, Z., & Harada, T. (2021). Multitask aet with orthogonal tangent regularity for dark object detection. In Proceedings of IEEE international conference on computer vision (pp. 2553–2562).
Dai, D., Sakaridis, C., Hecker, S., & Van Gool, L. (2020). Curriculum model adaptation with synthetic and real data for semantic foggy scene understanding. International Journal of Computer Vision, 128(5), 1182–1204.
Dai, D., & Van Gool, L. (2018). Dark model adaptation: Semantic image segmentation from daytime to nighttime. In Proceedings of international conference on intelligent transportation systems (pp. 3819–3824).
Dang-Nguyen, D. T., Pasquini, C., Conotter, V., & Boato, G. (2015). Raise: A raw images dataset for digital image forensics. In Proceedings of the 6th ACM multimedia systems conference (pp. 219–224).
De Brabandere, B., Neven, D., & Van Gool, L. (2017). Semantic instance segmentation for autonomous driving. In Proceedings of IEEE conference on computer vision and pattern recognition workshops (pp. 7–9).
Diamond, S., Sitzmann, V., Julca-Aguilar, F., Boyd, S., Wetzstein, G., & Heide, F. (2021). Dirty pixels: Towards end-to-end image processing and perception. ACM Transactions on Graphics, 40(3), 1–15.
Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., & Sun, J. (2021). Repvgg: Making vgg-style convnets great again. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 13733–13742).
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The Pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2), 303–338.
Fang, K., Bai, Y., Hinterstoisser, S., Savarese, S., & Kalakrishnan, M. (2018). Multi-task domain adaptation for deep learning of instance grasping from simulation. In Proceedings of IEEE international conference on robotics and automation (pp. 3516–3523).
Foi, A., Trimeche, M., Katkovnik, V., & Egiazarian, K. (2008). Practical Poissonian–Gaussian noise modeling and fitting for single-image raw-data. IEEE Transactions on Image Processing, 17(10), 1737–1754.
Fu, Y., Hong, Y., Chen, L., & You, S. (2022). Le-gan: Unsupervised low-light image enhancement network using attention module and identity invariant loss. Knowledge-Based Systems, 240, 108010.
Fu, Y., Zhang, T., Wang, L., & Huang, H. (2021). Coded hyperspectral image reconstruction using deep external and internal learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7), 3404–3420.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2414–2423).
Gnanasambandam, A., & Chan, S. H. (2020). Image classification in the dark using quanta image sensors. In Proceedings of European conference on computer vision (pp. 484–501).
Gonzalez, R. C., & Woods, R. E. (2002). Digital image processing. Prentice Hall.
Gu, S., Li, Y., Gool, L. V., & Timofte, R. (2019). Self-guided network for fast image denoising. In Proceedings of IEEE international conference on computer vision (pp. 2511–2520).
Guo, C., Li, C., Guo, J., Loy, C. C., Hou, J., Kwong, S., & Cong, R. (2020). Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 1780–1789).
Hahn, J., Tai, X. C., Borok, S., & Bruckstein, A. M. (2011). Orientation-matching minimization for image denoising and inpainting. International Journal of Computer Vision, 92(3), 308–324.
Hajiaboli, M. R. (2011). An anisotropic fourth-order diffusion filter for image noise removal. International Journal of Computer Vision, 92(2), 177–191.
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of IEEE international conference on computer vision (pp. 2961–2969).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 770–778).
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 7132–7141).
Huang, Z., Huang, L., Gong, Y., Huang, C., & Wang, X. (2019). Mask scoring r-cnn. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 6409–6418).
Jiang, Y., Gong, X., Liu, D., Cheng, Y., Fang, C., Shen, X., Yang, J., Zhou, P., & Wang, Z. (2021). Enlightengan: Deep light enhancement without paired supervision. IEEE Transactions on Image Processing, 30(1), 2340–2349.
Julca-Aguilar, F., Taylor, J., Bijelic, M., Mannan, F., Tseng, E., & Heide, F. (2021). Gated3d: Monocular 3d object detection from temporal illumination cues. In Proceedings of IEEE international conference on computer vision (pp. 2938–2948).
Kirillov, A., Wu, Y., He, K., & Girshick, R. (2020). Pointrend: Image segmentation as rendering. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 9799–9808).
Lamba, M., & Mitra, K. (2021). Restoring extremely dark images in real time. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3487–3497).
Lee, Y., & Park, J. (2020). Centermask: Real-time anchor-free instance segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 13906–13915).
Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017a). Feature pyramid networks for object detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2117–2125).
Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017b). Focal loss for dense object detection. In Proceedings of IEEE international conference on computer vision (pp. 2980–2988).
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014a). Microsoft coco: Common objects in context. In Proceedings of European conference on computer vision (pp. 740–755).
Liu, D., Wen, B., Jiao, J., Liu, X., Wang, Z., & Huang, T. S. (2020). Connecting image denoising and high-level vision tasks via deep learning. IEEE Transactions on Image Processing, 29(1), 3695–3706.
Liu, J., Xu, D., Yang, W., Fan, M., & Huang, H. (2021). Benchmarking low-light image enhancement and beyond. International Journal of Computer Vision, 129(4), 1153–1184.
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2), 261–318.
Liu, Y., Qin, Z., Anwar, S., Ji, P., Kim, D., Caldwell, S., & Gedeon, T. (2021b). Invertible denoising network: A light solution for real noise removal. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 13365–13374).
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021c). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of IEEE international conference on computer vision (pp. 10012–10022).
Liu, Z., Mao, H., Wu, C. Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A convnet for the 2020s. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 11976–11986).
Loh, Y. P., & Chan, C. S. (2019). Getting to know low-light images with the exclusively dark dataset. Computer Vision and Image Understanding, 178(1), 30–42.
Lore, K. G., Akintayo, A., & Sarkar, S. (2017). Llnet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition, 61, 650–662.
Lv, F., Li, Y., & Lu, F. (2021). Attention guided low-light image enhancement with a large scale low-light simulation dataset. International Journal of Computer Vision, 129(7), 2175–2193.
Mohan, R., & Valada, A. (2021). Efficientps: Efficient panoptic segmentation. International Journal of Computer Vision, 129(5), 1551–1579.
Morawski, I., Chen, Y. A., Lin, Y. S., & Hsu, W. H. (2021). Nod: Taking a closer look at detection under extreme low-light conditions with night object detection dataset. In Proceedings of the British machine vision conference (pp. 1–13).
Plotz, T., & Roth, S. (2017). Benchmarking denoising algorithms with real photographs. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 1586–1595).
Punnappurath, A., Abuolaim, A., Abdelhamed, A., Levinshtein, A., & Brown, M. S. (2022). Day-to-night image synthesis for training nighttime neural isps. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 10769–10778).
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 779–788).
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of advances in neural information processing systems (pp. 91–99).
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention (pp. 234–241).
Sakaridis, C., Dai, D., & Van Gool, L. (2018). Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126(9), 973–992.
Sakaridis, C., Dai, D., & Van Gool, L. (2019). Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation. In Proceedings of IEEE international conference on computer vision (pp. 7374–7383).
Sasagawa, Y., & Nagahara, H. (2020). Yolo in the dark-domain adaptation method for merging multiple models. In Proceedings of European conference on computer vision (pp. 345–359).
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. In Proceedings of international conference on learning representations (pp. 1–10).
Tan, S., & Jiao, L. (2007). Multivariate statistical models for image denoising in the wavelet domain. International Journal of Computer Vision, 75(2), 209–230.
Tan, X., Xu, K., Cao, Y., Zhang, Y., Ma, L., & Lau, R. W. (2021). Night-time scene parsing with a large real dataset. IEEE Transactions on Image Processing, 30(1), 9085–9098.
Tian, Z., Shen, C., Chen, H., & He, T. (2019). Fcos: Fully convolutional one-stage object detection. In Proceedings of IEEE international conference on computer vision (pp. 9627–9636).
Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2020). Deep image prior. International Journal of Computer Vision, 128(7), 1867–1889.
Wang, W., Wei, C., Yang, W., & Liu, J. (2018a). Gladnet: Low-light enhancement network with global awareness. In Proceedings of IEEE international conference on automatic face & gesture recognition (pp. 751–755).
Wang, W., Yang, W., & Liu, J. (2021). Hla-face: Joint high-low adaptation for low light face detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 16195–16204).
Wang, X., Girshick, R., Gupta, A., & He, K. (2018b). Non-local neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7794–7803).
Wei, C., Wang, W., Yang, W., & Liu, J. (2018). Deep retinex decomposition for low-light enhancement. In Proceedings of the British machine vision conference (pp. 1–12).
Wei, K., Fu, Y., Yang, J., & Huang, H. (2020). A physics-based noise formation model for extreme low-light raw denoising. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2758–2767).
Wei, K., Fu, Y., Zheng, Y., & Yang, J. (2021). Physics-based noise modeling for extreme low-light photography. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(1), 1–17.
Woo, S., Park, J., Lee, J. Y., & Kweon, I. S. (2018). Cbam: Convolutional block attention module. In Proceedings of European conference on computer vision (pp. 3–19).
Xiang, Y., Fu, Y., Zhang, L., & Huang, H. (2019). An effective network with convlstm for low-light image enhancement. In Pattern recognition and computer vision (pp. 221–233).
Xie, C., Wu, Y., Maaten, L. V. D., Yuille, A. L., & He, K. (2019). Feature denoising for improving adversarial robustness. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 501–509).
Xu, K., Yang, X., Yin, B., & Lau, R. W. (2020). Learning to restore low-light images via decomposition-and-enhancement. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2281–2290).
Hong, Y., Wei, K., Chen, L., & Fu, Y. (2021). Crafting object detection in very low light. In Proceedings of the British machine vision conference (pp. 1–15).
Yang, W., Yuan, Y., Ren, W., Liu, J., Scheirer, W. J., Wang, Z., Zhang, T., Zhong, Q., Xie, D., Pu, S., et al. (2020). Advancing image understanding in poor visibility environments: A collective benchmark study. IEEE Transactions on Image Processing, 29(1), 5737–5752.
Zhang, F., Li, Y., You, S., & Fu, Y. (2021a). Learning temporal consistency for low light video enhancement from single images. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 4967–4976).
Zhang, T., Fu, Y., & Zhang, J. (2022). Guided hyperspectral image denoising with realistic data. International Journal of Computer Vision, 130(11), 2885–2901.
Zhang, Y., Guo, X., Ma, J., Liu, W., & Zhang, J. (2021). Beyond brightening low-light images. International Journal of Computer Vision, 129(4), 1013–1037.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants No. 62171038, No. 61936011, No. 62088101, and No. 62006023. Felix Heide was supported by an NSF CAREER Award (2047359), a Packard Foundation Fellowship, a Sloan Research Fellowship, a Sony Young Faculty Award, a Project X Innovation Award, and an Amazon Science Research Award.
Additional information
Communicated by Dengxin Dai.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, L., Fu, Y., Wei, K. et al. Instance Segmentation in the Dark. Int J Comput Vis 131, 2198–2218 (2023). https://doi.org/10.1007/s11263-023-01808-8