Instance Segmentation in the Dark

Abstract

Existing instance segmentation techniques are primarily tailored for high-visibility inputs, but their performance significantly deteriorates in extremely low-light environments. In this work, we take a deep look at instance segmentation in the dark and introduce several techniques that substantially boost the low-light inference accuracy. The proposed method is motivated by the observation that noise in low-light images introduces high-frequency disturbances to the feature maps of neural networks, thereby significantly degrading performance. To suppress this “feature noise”, we propose a novel learning method that relies on an adaptive weighted downsampling layer, a smooth-oriented convolutional block, and disturbance suppression learning. These components effectively reduce feature noise during downsampling and convolution operations, enabling the model to learn disturbance-invariant features. Furthermore, we find that high-bit-depth RAW images preserve richer scene information in low-light conditions than typical camera sRGB outputs, which supports the use of RAW-input algorithms. Our analysis indicates that high bit depth can be critical for low-light instance segmentation. To mitigate the scarcity of annotated RAW datasets, we leverage a low-light RAW synthetic pipeline to generate realistic low-light data. In addition, to facilitate further research in this direction, we capture a real-world low-light instance segmentation dataset comprising over two thousand paired low/normal-light images with instance-level pixel-wise annotations. Remarkably, without any image preprocessing, we achieve satisfactory performance on instance segmentation in very low light (4% AP higher than state-of-the-art competitors), while opening new opportunities for future research. Our code and dataset are publicly available to the community (https://github.com/Linwei-Chen/LIS).
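
The abstract names the key components without implementation detail. As a rough, hypothetical sketch of the idea, the PyTorch snippet below implements one plausible form of adaptive weighted downsampling (softmax-weighted 2×2 pooling, so smooth content can outweigh high-frequency noise) together with a simple feature-consistency loss in the spirit of disturbance suppression learning. The module names, the weight predictor, and the MSE consistency term are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only -- NOT the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightedDownsample(nn.Module):
    """Downsample 2x with input-dependent softmax weights per window,
    instead of fixed strided pooling (one plausible reading of the
    'adaptive weighted downsampling layer' named in the abstract)."""
    def __init__(self, channels: int, window: int = 2):
        super().__init__()
        self.window = window
        # Lightweight predictor: one weight logit per spatial location.
        self.weight_pred = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        k = self.window
        logits = F.unfold(self.weight_pred(x), k, stride=k)  # (N, k*k, L)
        weights = logits.softmax(dim=1)                      # normalize per window
        patches = F.unfold(x, k, stride=k).view(n, c, k * k, -1)
        out = (patches * weights.unsqueeze(1)).sum(dim=2)    # weighted average
        return out.view(n, c, h // k, w // k)

def disturbance_suppression_loss(feat_noisy: torch.Tensor,
                                 feat_clean: torch.Tensor) -> torch.Tensor:
    """Pull features of the noisy input toward those of the clean input,
    a simple stand-in for learning disturbance-invariant features."""
    return F.mse_loss(feat_noisy, feat_clean.detach())
```

In use, such a layer would replace strided pooling in the backbone, and the consistency loss would be computed between backbone features of a clean image and its synthetically degraded low-light counterpart, pushing the network toward disturbance-invariant features.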

Notes

  1. To make the detector compatible with sRGB inputs, instead of the Bayer RAW images, we follow Chen et al. (2019a) and use demosaicked 3-channel RAW-RGB images as inputs, where the green channel is obtained by averaging the two green pixels in each two-by-two Bayer block (see the first sketch after these notes). In the following, we use "RAW" and "RAW-RGB" interchangeably.

  2. We use the COCO samples belonging to the same 8 object classes as the LIS dataset (see the second sketch after these notes).
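
As a concrete reading of note 1, the following minimal sketch converts a Bayer mosaic into a half-resolution 3-channel RAW-RGB image. The RGGB layout is an assumption; the actual channel order depends on the camera.

```python
import numpy as np

def bayer_to_raw_rgb(raw: np.ndarray) -> np.ndarray:
    """raw: (H, W) Bayer mosaic, assumed RGGB. Returns (H//2, W//2, 3),
    with the two green samples of each 2x2 block averaged."""
    raw = raw.astype(np.float32)
    r = raw[0::2, 0::2]   # top-left of each 2x2 block (red)
    g1 = raw[0::2, 1::2]  # top-right green
    g2 = raw[1::2, 0::2]  # bottom-left green
    b = raw[1::2, 1::2]   # bottom-right blue
    return np.stack([r, (g1 + g2) / 2.0, b], axis=-1)
```

For note 2, a hypothetical helper that gathers the COCO image ids containing any of the chosen classes; the 8-class list is deliberately left as a placeholder. Note that pycocotools' getImgIds intersects multiple category ids, so the union must be accumulated one class at a time.

```python
from pycocotools.coco import COCO

def image_ids_for_classes(ann_file, class_names):
    coco = COCO(ann_file)
    img_ids = set()
    for cat_id in coco.getCatIds(catNms=class_names):
        # getImgIds with several catIds returns their intersection,
        # so take the union one category at a time.
        img_ids.update(coco.getImgIds(catIds=[cat_id]))
    return sorted(img_ids)

# lis_classes = [...]  # fill in the 8 LIS class names, then:
# subset = image_ids_for_classes("instances_train2017.json", lis_classes)
```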

References

  • Anaya, J., & Barbu, A. (2018). Renoir: A dataset for real low-light image noise reduction. Journal of Visual Communication and Image Representation, 51(1), 144–154.

  • Bolya, D., Zhou, C., Xiao, F., & Lee, Y. J. (2019). Yolact: Real-time instance segmentation. In Proceedings of IEEE international conference on computer vision (pp. 9157–9166).

  • Brooks, T., Mildenhall, B., Xue, T., Chen, J., Sharlet, D., & Barron, J. T. (2019). Unprocessing images for learned raw denoising. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 11036–11045).

  • Chen, C., Chen, Q., Do, M. N., & Koltun, V. (2019a). Seeing motion in the dark. In Proceedings of IEEE international conference on computer vision (pp. 3185–3194).

  • Chen, C., Chen, Q., Xu, J., & Koltun, V. (2018). Learning to see in the dark. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3291–3300).

  • Chen, H., Sun, K., Tian, Z., Shen, C., Huang, Y., & Yan, Y. (2020). Blendmask: Top-down meets bottom-up for instance segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 8573–8581).

  • Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., & Ouyang, W., et al. (2019b). Hybrid task cascade for instance segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 4974–4983).

  • Chen, L., Fu, Y., You, S., & Liu, H. (2021). Efficient hybrid supervision for instance segmentation in aerial images. Remote Sensing, 13(2), 252.

  • Chen, L., Fu, Y., You, S., & Liu, H. (2022). Hybrid supervised instance segmentation by learning label noise suppression. Neurocomputing, 496, 131–146.

  • Cheng, B., Misra, I., Schwing, A. G., Kirillov, A., & Girdhar, R. (2022). Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1290–1299).

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3213–3223).

  • Cui, Z., Qi, G. J., Gu, L., You, S., Zhang, Z., & Harada, T. (2021). Multitask aet with orthogonal tangent regularity for dark object detection. In Proceedings of IEEE international conference on computer vision (pp. 2553–2562).

  • Dai, D., Sakaridis, C., Hecker, S., & Van Gool, L. (2020). Curriculum model adaptation with synthetic and real data for semantic foggy scene understanding. International Journal of Computer Vision, 128(5), 1182–1204.

  • Dai, D., & Van Gool, L. (2018). Dark model adaptation: Semantic image segmentation from daytime to nighttime. In Proceedings of international conference on intelligent transportation systems (pp. 3819–3824).

  • Dang-Nguyen, D. T., Pasquini, C., Conotter, V., & Boato, G. (2015). Raise: A raw images dataset for digital image forensics. In Proceedings of the 6th ACM multimedia systems conference (pp. 219–224).

  • De Brabandere, B., Neven, D., & Van Gool, L. (2017). Semantic instance segmentation for autonomous driving. In Proceedings of IEEE conference on computer vision and pattern recognition workshops (pp. 7–9).

  • Diamond, S., Sitzmann, V., Julca-Aguilar, F., Boyd, S., Wetzstein, G., & Heide, F. (2021). Dirty pixels: Towards end-to-end image processing and perception. ACM Transactions on Graphics, 40(3), 1–15.

  • Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., & Sun, J. (2021). Repvgg: Making vgg-style convnets great again. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 13733–13742).

  • Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The Pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2), 303–338.

  • Fang, K., Bai, Y., Hinterstoisser, S., Savarese, S., & Kalakrishnan, M. (2018). Multi-task domain adaptation for deep learning of instance grasping from simulation. In Proceedings of IEEE international conference on robotics and automation (pp. 3516–3523).

  • Foi, A., Trimeche, M., Katkovnik, V., & Egiazarian, K. (2008). Practical Poissonian–Gaussian noise modeling and fitting for single-image raw-data. IEEE Transactions on Image Processing, 17(10), 1737–1754.

  • Fu, Y., Hong, Y., Chen, L., & You, S. (2022). Le-gan: Unsupervised low-light image enhancement network using attention module and identity invariant loss. Knowledge-Based Systems, 240, 108010.

  • Fu, Y., Zhang, T., Wang, L., & Huang, H. (2021). Coded hyperspectral image reconstruction using deep external and internal learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7), 3404–3420.

  • Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2414–2423).

  • Gnanasambandam, A., & Chan, S. H. (2020). Image classification in the dark using quanta image sensors. In Proceedings of European conference on computer vision (pp. 484–501).

  • Gonzalez, R. C., & Woods, R. E. (2002). Digital image processing (2nd ed.). Prentice Hall.

  • Gu, S., Li, Y., Gool, L. V., & Timofte, R. (2019). Self-guided network for fast image denoising. In Proceedings of IEEE international conference on computer vision (pp. 2511–2520).

  • Guo, C., Li, C., Guo, J., Loy, C. C., Hou, J., Kwong, S., & Cong, R. (2020). Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 1780–1789).

  • Hahn, J., Tai, X. C., Borok, S., & Bruckstein, A. M. (2011). Orientation-matching minimization for image denoising and inpainting. International Journal of Computer Vision, 92(3), 308–324.

  • Hajiaboli, M. R. (2011). An anisotropic fourth-order diffusion filter for image noise removal. International Journal of Computer Vision, 92(2), 177–191.

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of IEEE international conference on computer vision (pp. 2961–2969).

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 770–778).

  • Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.

  • Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 7132–7141).

  • Huang, Z., Huang, L., Gong, Y., Huang, C., & Wang, X. (2019). Mask scoring r-cnn. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 6409–6418).

  • Jiang, Y., Gong, X., Liu, D., Cheng, Y., Fang, C., Shen, X., Yang, J., Zhou, P., & Wang, Z. (2021). Enlightengan: Deep light enhancement without paired supervision. IEEE Transactions on Image Processing, 30(1), 2340–2349.

  • Julca-Aguilar, F., Taylor, J., Bijelic, M., Mannan, F., Tseng, E., & Heide, F. (2021). Gated3d: Monocular 3d object detection from temporal illumination cues. In Proceedings of IEEE international conference on computer vision (pp. 2938–2948).

  • Kirillov, A., Wu, Y., He, K., & Girshick, R. (2020). Pointrend: Image segmentation as rendering. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 9799–9808).

  • Lamba, M., & Mitra, K. (2021). Restoring extremely dark images in real time. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3487–3497).

  • Lee, Y., & Park, J. (2019). Centermask: Real-time anchor-free instance segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 13906–13915).

  • Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017a). Feature pyramid networks for object detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2117–2125).

  • Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017b). Focal loss for dense object detection. In Proceedings of IEEE international conference on computer vision (pp. 2980–2988).

  • Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In Proceedings of European conference on computer vision (pp. 740–755).

  • Liu, D., Wen, B., Jiao, J., Liu, X., Wang, Z., & Huang, T. S. (2020). Connecting image denoising and high-level vision tasks via deep learning. IEEE Transactions on Image Processing, 29(1), 3695–3706.

  • Liu, J., Xu, D., Yang, W., Fan, M., & Huang, H. (2021). Benchmarking low-light image enhancement and beyond. International Journal of Computer Vision, 129(4), 1153–1184.

  • Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2), 261–318.

  • Liu, Y., Qin, Z., Anwar, S., Ji, P., Kim, D., Caldwell, S., & Gedeon, T. (2021b). Invertible denoising network: A light solution for real noise removal. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 13365–13374).

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021c). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of IEEE international conference on computer vision (pp. 10012–10022).

  • Liu, Z., Mao, H., Wu, C. Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A convnet for the 2020s. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 11976–11986).

  • Loh, Y. P., & Chan, C. S. (2019). Getting to know low-light images with the exclusively dark dataset. Computer Vision and Image Understanding, 178(1), 30–42.

  • Lore, K. G., Akintayo, A., & Sarkar, S. (2017). Llnet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition, 61, 650–662.

  • Lv, F., Li, Y., & Lu, F. (2021). Attention guided low-light image enhancement with a large scale low-light simulation dataset. International Journal of Computer Vision, 129(7), 2175–2193.

  • Mohan, R., & Valada, A. (2021). Efficientps: Efficient panoptic segmentation. International Journal of Computer Vision, 129(5), 1551–1579.

  • Morawski, I., Chen, Y. A., Lin, Y. S., & Hsu, W. H. (2021). Nod: Taking a closer look at detection under extreme low-light conditions with night object detection dataset. In Proceedings of the British machine vision conference (pp. 1–13).

  • Plötz, T., & Roth, S. (2017). Benchmarking denoising algorithms with real photographs. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 1586–1595).

  • Punnappurath, A., Abuolaim, A., Abdelhamed, A., Levinshtein, A., & Brown, M. S. (2022). Day-to-night image synthesis for training nighttime neural isps. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 10769–10778).

  • Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 779–788).

  • Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of advances in neural information processing systems (pp. 91–99).

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Proceedings of international conference on medical image computing and computer-assisted intervention (pp. 234–241).

  • Sakaridis, C., Dai, D., & Van Gool, L. (2018). Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126(9), 973–992.

  • Sakaridis, C., Dai, D., & Van Gool, L. (2019). Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation. In Proceedings of IEEE international conference on computer vision (pp. 7374–7383).

  • Sasagawa, Y., & Nagahara, H. (2020). Yolo in the dark-domain adaptation method for merging multiple models. In Proceedings of European conference on computer vision (pp. 345–359).

  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. In Proceedings of international conference on learning representations (pp. 1–10).

  • Tan, S., & Jiao, L. (2007). Multivariate statistical models for image denoising in the wavelet domain. International Journal of Computer Vision, 75(2), 209–230.

  • Tan, X., Xu, K., Cao, Y., Zhang, Y., Ma, L., & Lau, R. W. (2021). Night-time scene parsing with a large real dataset. IEEE Transactions on Image Processing, 30(1), 9085–9098.

  • Tian, Z., Shen, C., Chen, H., & He, T. (2019). Fcos: Fully convolutional one-stage object detection. In Proceedings of IEEE international conference on computer vision (pp. 9627–9636).

  • Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2020). Deep image prior. International Journal of Computer Vision, 128(7), 1867–1889.

  • Wang, W., Wei, C., Yang, W., & Liu, J. (2018a). Gladnet: Low-light enhancement network with global awareness. In Proceedings of IEEE international conference on automatic face & gesture recognition (pp. 751–755).

  • Wang, W., Yang, W., & Liu, J. (2021). Hla-face: Joint high-low adaptation for low light face detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 16195–16204).

  • Wang, X., Girshick, R., Gupta, A., & He, K. (2018b). Non-local neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7794–7803).

  • Wei, C., Wang, W., Yang, W., & Liu, J. (2018). Deep retinex decomposition for low-light enhancement. In Proceedings of the British machine vision conference (pp. 1–12).

  • Wei, K., Fu, Y., Yang, J., & Huang, H. (2020). A physics-based noise formation model for extreme low-light raw denoising. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2758–2767).

  • Wei, K., Fu, Y., Zheng, Y., & Yang, J. (2021). Physics-based noise modeling for extreme low-light photography. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(1), 1–17.

  • Woo, S., Park, J., Lee, J. Y., & Kweon, I. S. (2018). Cbam: Convolutional block attention module. In Proceedings of European conference on computer vision (pp. 3–19).

  • Xiang, Y., Fu, Y., Zhang, L., & Huang, H. (2019). An effective network with convlstm for low-light image enhancement. In Pattern recognition and computer vision (pp. 221–233).

  • Xie, C., Wu, Y., Maaten, L. V. D., Yuille, A. L., & He, K. (2019). Feature denoising for improving adversarial robustness. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 501–509).

  • Xu, K., Yang, X., Yin, B., & Lau, R. W. (2020). Learning to restore low-light images via decomposition-and-enhancement. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2281–2290).

  • Hong, Y., Wei, K., Chen, L., & Fu, Y. (2021). Crafting object detection in very low light. In Proceedings of the British machine vision conference (pp. 1–15).

  • Yang, W., Yuan, Y., Ren, W., Liu, J., Scheirer, W. J., Wang, Z., Zhang, T., Zhong, Q., Xie, D., Pu, S., et al. (2020). Advancing image understanding in poor visibility environments: A collective benchmark study. IEEE Transactions on Image Processing, 29(1), 5737–5752.

  • Zhang, F., Li, Y., You, S., & Fu, Y. (2021a). Learning temporal consistency for low light video enhancement from single images. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 4967–4976).

  • Zhang, T., Fu, Y., & Zhang, J. (2022). Guided hyperspectral image denoising with realistic data. International Journal of Computer Vision, 130(11), 2885–2901.

  • Zhang, Y., Guo, X., Ma, J., Liu, W., & Zhang, J. (2021). Beyond brightening low-light images. International Journal of Computer Vision, 129(4), 1013–1037.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants No. 62171038, No. 61936011, No. 62088101, and No. 62006023. Felix Heide was supported by an NSF CAREER Award (2047359), a Packard Foundation Fellowship, a Sloan Research Fellowship, a Sony Young Faculty Award, a Project X Innovation Award, and an Amazon Science Research Award.

Author information

Corresponding author

Correspondence to Ying Fu.

Additional information

Communicated by Dengxin Dai.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, L., Fu, Y., Wei, K. et al. Instance Segmentation in the Dark. Int J Comput Vis 131, 2198–2218 (2023). https://doi.org/10.1007/s11263-023-01808-8
