Abstract
Existing instance segmentation techniques are primarily tailored for high-visibility inputs, and their performance degrades drastically in extremely low-light environments. In this work, we take a deep look at instance segmentation in the dark and introduce several techniques that substantially boost low-light inference accuracy. The proposed method is motivated by the observation that noise in low-light images introduces high-frequency disturbances into the feature maps of neural networks, thereby significantly degrading performance. To suppress this "feature noise", we propose a novel learning method that relies on an adaptive weighted downsampling layer, a smooth-oriented convolutional block, and disturbance suppression learning. These components effectively reduce feature noise during downsampling and convolution operations, enabling the model to learn disturbance-invariant features. Furthermore, we find that high-bit-depth RAW images preserve richer scene information in low-light conditions than typical camera sRGB outputs, which supports the use of RAW-input algorithms. Our analysis indicates that high bit depth can be critical for low-light instance segmentation. To mitigate the scarcity of annotated RAW datasets, we leverage a low-light RAW synthetic pipeline to generate realistic low-light data. In addition, to facilitate further research in this direction, we capture a real-world low-light instance segmentation dataset comprising over two thousand paired low/normal-light images with instance-level pixel-wise annotations. Remarkably, without any image preprocessing, we achieve satisfactory instance segmentation performance in very low light (4% AP higher than state-of-the-art competitors), while opening new opportunities for future research. Our code and dataset are publicly available to the community (https://github.com/Linwei-Chen/LIS).
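The low-light RAW synthetic pipeline is only named in the abstract, not specified. A common physics-based formulation (in the spirit of the Poissonian–Gaussian model of Foi et al., 2008, cited below) darkens a clean linear RAW image and adds signal-dependent shot noise plus signal-independent read noise. The sketch below is an illustrative assumption, not the paper's actual pipeline, and all parameter values (`ratio`, `gain`, `read_std`) are hypothetical:

```python
import numpy as np

def synthesize_low_light_raw(clean_raw, ratio=100.0, gain=0.01,
                             read_std=0.002, rng=None):
    """Darken a normalized clean RAW image and add Poissonian-Gaussian noise.

    clean_raw : float array in [0, 1], linear RAW intensities.
    ratio     : exposure reduction factor (e.g., 100x shorter exposure).
    gain      : signal-per-photon scale; controls shot-noise strength.
    read_std  : std of signal-independent Gaussian read noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    dark = clean_raw / ratio                         # simulate underexposure
    photons = rng.poisson(dark / gain)               # shot (Poisson) noise
    noisy = photons * gain                           # back to signal domain
    noisy += rng.normal(0.0, read_std, noisy.shape)  # read (Gaussian) noise
    return np.clip(noisy, 0.0, 1.0)

# Example: a flat mid-gray patch becomes a dark, noisy patch.
patch = np.full((64, 64), 0.5)
low_light = synthesize_low_light_raw(patch, ratio=100.0)
```

At low photon counts the Poisson term dominates, which is why such synthetic data exhibits the signal-dependent noise characteristic of real short-exposure RAW captures.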
Notes
To make the detector compatible with sRGB inputs, we follow Chen et al. (2019a) and use demosaicked 3-channel RAW-RGB images, rather than Bayer RAW images, as inputs; the green channel is obtained by averaging the two green pixels in each 2×2 Bayer block. In the following, we use "RAW" and "RAW-RGB" interchangeably.
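The green-averaging step described in this note can be sketched as follows, assuming an RGGB Bayer layout (the actual sensor pattern is not stated here). Each 2×2 mosaic block yields one RAW-RGB pixel, so spatial resolution is halved:

```python
import numpy as np

def bayer_to_rawrgb(bayer):
    """Convert an RGGB Bayer mosaic (H, W) into a half-resolution
    3-channel RAW-RGB image (H/2, W/2, 3), averaging the two greens."""
    r  = bayer[0::2, 0::2]              # top-left of each 2x2 block: red
    g1 = bayer[0::2, 1::2]              # top-right: first green sample
    g2 = bayer[1::2, 0::2]              # bottom-left: second green sample
    b  = bayer[1::2, 1::2]              # bottom-right: blue
    g = (g1 + g2) / 2.0                 # average the two green samples
    return np.stack([r, g, b], axis=-1)

mosaic = np.arange(16.0).reshape(4, 4)  # toy 4x4 Bayer frame
rgb = bayer_to_rawrgb(mosaic)           # shape (2, 2, 3)
```

Unlike full demosaicking with interpolation, this simple packing introduces no interpolated values, which keeps the noise statistics of the RAW samples intact.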
We use COCO samples belonging to the same 8 object classes in the LIS dataset.
References
Anaya, J., & Barbu, A. (2018). Renoir: A dataset for real low-light image noise reduction. Journal of Visual Communication and Image Representation, 51(1), 144–154.
Bolya, D., Zhou, C., Xiao, F., & Lee, Y. J. (2019). Yolact: Real-time instance segmentation. In Proceedings of IEEE international conference on computer vision (pp. 9157–9166).
Brooks, T., Mildenhall, B., Xue, T., Chen, J., Sharlet, D., & Barron, J. T. (2019). Unprocessing images for learned raw denoising. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 11036–11045).
Chen, C., Chen, Q., Do, M. N., & Koltun, V. (2019a). Seeing motion in the dark. In Proceedings of IEEE international conference on computer vision (pp. 3185–3194).
Chen, C., Chen, Q., Xu, J., & Koltun, V. (2018). Learning to see in the dark. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3291–3300).
Chen, H., Sun, K., Tian, Z., Shen, C., Huang, Y., & Yan, Y. (2020). Blendmask: Top-down meets bottom-up for instance segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 8573–8581).
Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., & Ouyang, W., et al. (2019b). Hybrid task cascade for instance segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 4974–4983).
Chen, L., Fu, Y., You, S., & Liu, H. (2021). Efficient hybrid supervision for instance segmentation in aerial images. Remote Sensing, 13(2), 252.
Chen, L., Fu, Y., You, S., & Liu, H. (2022). Hybrid supervised instance segmentation by learning label noise suppression. Neurocomputing, 496, 131–146.
Cheng, B., Misra, I., Schwing, A. G., Kirillov, A., & Girdhar, R. (2022). Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1290–1299).
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3213–3223).
Cui, Z., Qi, G. J., Gu, L., You, S., Zhang, Z., & Harada, T. (2021). Multitask aet with orthogonal tangent regularity for dark object detection. In Proceedings of IEEE international conference on computer vision (pp. 2553–2562).
Dai, D., Sakaridis, C., Hecker, S., & Van Gool, L. (2020). Curriculum model adaptation with synthetic and real data for semantic foggy scene understanding. International Journal of Computer Vision, 128(5), 1182–1204.
Dai, D., & Van Gool, L. (2018). Dark model adaptation: Semantic image segmentation from daytime to nighttime. In Proceedings of international conference on intelligent transportation systems (pp. 3819–3824).
Dang-Nguyen, D. T., Pasquini, C., Conotter, V., & Boato, G. (2015). Raise: A raw images dataset for digital image forensics. In Proceedings of the 6th ACM multimedia systems conference (pp. 219–224).
De Brabandere, B., Neven, D., & Van Gool, L. (2017). Semantic instance segmentation for autonomous driving. In Proceedings of IEEE conference on computer vision and pattern recognition workshops (pp. 7–9).
Diamond, S., Sitzmann, V., Julca-Aguilar, F., Boyd, S., Wetzstein, G., & Heide, F. (2021). Dirty pixels: Towards end-to-end image processing and perception. ACM Transactions on Graphics, 40(3), 1–15.
Ding, X., Zhang, X., Ma, N., Han, J., Ding, G., & Sun, J. (2021). Repvgg: Making vgg-style convnets great again. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 13733–13742).
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The Pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2), 303–338.
Fang, K., Bai, Y., Hinterstoisser, S., Savarese, S., & Kalakrishnan, M. (2018). Multi-task domain adaptation for deep learning of instance grasping from simulation. In Proceedings of IEEE international conference on robotics and automation (pp. 3516–3523).
Foi, A., Trimeche, M., Katkovnik, V., & Egiazarian, K. (2008). Practical Poissonian–Gaussian noise modeling and fitting for single-image raw-data. IEEE Transactions on Image Processing, 17(10), 1737–1754.
Fu, Y., Hong, Y., Chen, L., & You, S. (2022). Le-gan: Unsupervised low-light image enhancement network using attention module and identity invariant loss. Knowledge-Based Systems, 240, 108010.
Fu, Y., Zhang, T., Wang, L., & Huang, H. (2021). Coded hyperspectral image reconstruction using deep external and internal learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7), 3404–3420.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2414–2423).
Gnanasambandam, A., & Chan, S. H. (2020). Image classification in the dark using quanta image sensors. In Proceedings of European conference on computer vision (pp. 484–501).
Gonzalez, R. C., & Woods, R. E. (2002). Digital image processing. Prentice Hall.
Gu, S., Li, Y., Gool, L. V., & Timofte, R. (2019). Self-guided network for fast image denoising. In Proceedings of IEEE international conference on computer vision (pp. 2511–2520).
Guo, C., Li, C., Guo, J., Loy, C. C., Hou, J., Kwong, S., & Cong, R. (2020). Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 1780–1789).
Hahn, J., Tai, X. C., Borok, S., & Bruckstein, A. M. (2011). Orientation-matching minimization for image denoising and inpainting. International Journal of Computer Vision, 92(3), 308–324.
Hajiaboli, M. R. (2011). An anisotropic fourth-order diffusion filter for image noise removal. International Journal of Computer Vision, 92(2), 177–191.
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of IEEE international conference on computer vision (pp. 2961–2969).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 770–778).
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 7132–7141).
Huang, Z., Huang, L., Gong, Y., Huang, C., & Wang, X. (2019). Mask scoring r-cnn. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 6409–6418).
Jiang, Y., Gong, X., Liu, D., Cheng, Y., Fang, C., Shen, X., Yang, J., Zhou, P., & Wang, Z. (2021). Enlightengan: Deep light enhancement without paired supervision. IEEE Transactions on Image Processing, 30(1), 2340–2349.
Julca-Aguilar, F., Taylor, J., Bijelic, M., Mannan, F., Tseng, E., & Heide, F. (2021). Gated3d: Monocular 3d object detection from temporal illumination cues. In Proceedings of IEEE international conference on computer vision (pp. 2938–2948).
Kirillov, A., Wu, Y., He, K., & Girshick, R. (2020). Pointrend: Image segmentation as rendering. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 9799–9808).
Lamba, M., & Mitra, K. (2021). Restoring extremely dark images in real time. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 3487–3497).
Lee, Y., & Park, J. (2020). Centermask: Real-time anchor-free instance segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 13906–13915).
Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017a). Feature pyramid networks for object detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2117–2125).
Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017b). Focal loss for dense object detection. In Proceedings of IEEE international conference on computer vision (pp. 2980–2988).
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014a). Microsoft coco: Common objects in context. In Proceedings of European conference on computer vision (pp. 740–755).
Liu, D., Wen, B., Jiao, J., Liu, X., Wang, Z., & Huang, T. S. (2020). Connecting image denoising and high-level vision tasks via deep learning. IEEE Transactions on Image Processing, 29(1), 3695–3706.
Liu, J., Xu, D., Yang, W., Fan, M., & Huang, H. (2021). Benchmarking low-light image enhancement and beyond. International Journal of Computer Vision, 129(4), 1153–1184.
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2), 261–318.
Liu, Y., Qin, Z., Anwar, S., Ji, P., Kim, D., Caldwell, S., & Gedeon, T. (2021b). Invertible denoising network: A light solution for real noise removal. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 13365–13374).
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021c). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of IEEE international conference on computer vision (pp. 10012–10022).
Liu, Z., Mao, H., Wu, C. Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A convnet for the 2020s. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 11976–11986).
Loh, Y. P., & Chan, C. S. (2019). Getting to know low-light images with the exclusively dark dataset. Computer Vision and Image Understanding, 178(1), 30–42.
Lore, K. G., Akintayo, A., & Sarkar, S. (2017). Llnet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition, 61, 650–662.
Lv, F., Li, Y., & Lu, F. (2021). Attention guided low-light image enhancement with a large scale low-light simulation dataset. International Journal of Computer Vision, 129(7), 2175–2193.
Mohan, R., & Valada, A. (2021). Efficientps: Efficient panoptic segmentation. International Journal of Computer Vision, 129(5), 1551–1579.
Morawski, I., Chen, Y. A., Lin, Y. S., & Hsu, W. H. (2021). Nod: Taking a closer look at detection under extreme low-light conditions with night object detection dataset. In Proceedings of the British machine vision conference (pp. 1–13).
Plotz, T., & Roth, S. (2017). Benchmarking denoising algorithms with real photographs. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 1586–1595).
Punnappurath, A., Abuolaim, A., Abdelhamed, A., Levinshtein, A., & Brown, M. S. (2022). Day-to-night image synthesis for training nighttime neural isps. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 10769–10778).
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 779–788).
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of advances in neural information processing systems (pp. 91–99).
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention (pp. 234–241).
Sakaridis, C., Dai, D., & Van Gool, L. (2018). Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision, 126(9), 973–992.
Sakaridis, C., Dai, D., & Van Gool, L. (2019). Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation. In Proceedings of IEEE international conference on computer vision (pp. 7374–7383).
Sasagawa, Y., & Nagahara, H. (2020). Yolo in the dark-domain adaptation method for merging multiple models. In Proceedings of European conference on computer vision (pp. 345–359).
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. In Proceedings of international conference on learning representations (pp. 1–10).
Tan, S., & Jiao, L. (2007). Multivariate statistical models for image denoising in the wavelet domain. International Journal of Computer Vision, 75(2), 209–230.
Tan, X., Xu, K., Cao, Y., Zhang, Y., Ma, L., & Lau, R. W. (2021). Night-time scene parsing with a large real dataset. IEEE Transactions on Image Processing, 30(1), 9085–9098.
Tian, Z., Shen, C., Chen, H., & He, T. (2019). Fcos: Fully convolutional one-stage object detection. In Proceedings of IEEE international conference on computer vision (pp. 9627–9636).
Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2020). Deep image prior. International Journal of Computer Vision, 128(7), 1867–1889.
Wang, W., Wei, C., Yang, W., & Liu, J. (2018a). Gladnet: Low-light enhancement network with global awareness. In Proceedings of IEEE international conference on automatic face & gesture recognition (pp. 751–755).
Wang, W., Yang, W., & Liu, J. (2021). Hla-face: Joint high-low adaptation for low light face detection. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 16195–16204).
Wang, X., Girshick, R., Gupta, A., & He, K. (2018b). Non-local neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7794–7803).
Wei, C., Wang, W., Yang, W., & Liu, J. (2018). Deep retinex decomposition for low-light enhancement. In Proceedings of the British machine vision conference (pp. 1–12).
Wei, K., Fu, Y., Yang, J., & Huang, H. (2020). A physics-based noise formation model for extreme low-light raw denoising. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2758–2767).
Wei, K., Fu, Y., Zheng, Y., & Yang, J. (2021). Physics-based noise modeling for extreme low-light photography. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(1), 1–17.
Woo, S., Park, J., Lee, J. Y., & Kweon, I. S. (2018). Cbam: Convolutional block attention module. In Proceedings of European conference on computer vision (pp. 3–19).
Xiang, Y., Fu, Y., Zhang, L., & Huang, H. (2019). An effective network with convlstm for low-light image enhancement. In Pattern recognition and computer vision (pp. 221–233).
Xie, C., Wu, Y., Maaten, L. V. D., Yuille, A. L., & He, K. (2019). Feature denoising for improving adversarial robustness. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 501–509).
Xu, K., Yang, X., Yin, B., & Lau, R. W. (2020). Learning to restore low-light images via decomposition-and-enhancement. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 2281–2290).
Hong, Y., Wei, K., Chen, L., & Fu, Y. (2021). Crafting object detection in very low light. In Proceedings of the British machine vision conference (pp. 1–15).
Yang, W., Yuan, Y., Ren, W., Liu, J., Scheirer, W. J., Wang, Z., Zhang, T., Zhong, Q., Xie, D., Pu, S., et al. (2020). Advancing image understanding in poor visibility environments: A collective benchmark study. IEEE Transactions on Image Processing, 29(1), 5737–5752.
Zhang, F., Li, Y., You, S., & Fu, Y. (2021a). Learning temporal consistency for low light video enhancement from single images. In Proceedings of IEEE conference on computer vision and pattern recognition (pp. 4967–4976).
Zhang, T., Fu, Y., & Zhang, J. (2022). Guided hyperspectral image denoising with realistic data. International Journal of Computer Vision, 130(11), 2885–2901.
Zhang, Y., Guo, X., Ma, J., Liu, W., & Zhang, J. (2021). Beyond brightening low-light images. International Journal of Computer Vision, 129(4), 1013–1037.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants No. 62171038, No. 61936011, No. 62088101, and No. 62006023. Felix Heide was supported by an NSF CAREER Award (2047359), a Packard Foundation Fellowship, a Sloan Research Fellowship, a Sony Young Faculty Award, a Project X Innovation Award, and an Amazon Science Research Award.
Additional information
Communicated by Dengxin Dai.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, L., Fu, Y., Wei, K. et al. Instance Segmentation in the Dark. Int J Comput Vis 131, 2198–2218 (2023). https://doi.org/10.1007/s11263-023-01808-8