Delving Deeper into Anti-Aliasing in ConvNets

  • Published in: International Journal of Computer Vision

Abstract

Aliasing refers to the phenomenon in which high-frequency signals degenerate into completely different ones after sampling. It becomes a problem in deep learning because downsampling layers are widely adopted in deep architectures to reduce parameters and computation. The standard solution is to apply a low-pass filter (e.g., Gaussian blur) before downsampling (Zhang, ICML 2020). However, applying the same filter across the entire content can be suboptimal, as the frequency content of feature maps can vary across both spatial locations and feature channels. To tackle this, we propose an adaptive content-aware low-pass filtering layer, which predicts separate filter weights for each spatial location and channel group of the input feature maps. We investigate the effectiveness and generalization of the proposed method across multiple tasks, including image classification, semantic segmentation, instance segmentation, video instance segmentation, and image-to-image translation. Both qualitative and quantitative results demonstrate that our approach effectively adapts to the varying feature frequencies to avoid aliasing while preserving useful information for recognition. Code is available at https://maureenzou.github.io/ddac/.
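To make the idea concrete, below is a minimal PyTorch sketch of such a content-aware low-pass filtering layer, reconstructed from the abstract's description alone: a lightweight convolution predicts a k × k filter for every output location and channel group, a softmax over the k² taps keeps each predicted filter non-negative and summing to one (so it can only smooth, i.e., act as a low-pass filter), and the blurred map is sampled at stride 2 in the same step. The class name, the single-conv predictor, and all hyperparameters (kernel size, group count, stride) are illustrative assumptions, not the authors' exact design; see the linked code release for the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveLowPassFilter(nn.Module):
    """Sketch of a content-aware low-pass filter + downsample layer.

    Predicts a separate k x k blur kernel for every spatial location and
    every channel group, instead of applying one fixed Gaussian everywhere.
    """

    def __init__(self, channels: int, kernel_size: int = 3,
                 groups: int = 8, stride: int = 2):
        super().__init__()
        assert channels % groups == 0, "channels must divide into groups"
        self.k, self.groups, self.stride = kernel_size, groups, stride
        # Lightweight predictor: one logit per filter tap, per group,
        # evaluated directly at the strided output locations.
        self.filter_pred = nn.Conv2d(
            channels, groups * kernel_size ** 2,
            kernel_size=kernel_size, stride=stride,
            padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        k, g, s = self.k, self.groups, self.stride
        # (n, g, k*k, h', w'): softmax over the k*k taps makes each
        # predicted filter a convex (smoothing) combination.
        logits = self.filter_pred(x)
        oh, ow = logits.shape[-2:]
        weights = logits.view(n, g, k * k, oh, ow).softmax(dim=2)
        # Gather the k x k neighborhood of every strided output location.
        patches = F.unfold(x, k, padding=k // 2, stride=s)
        patches = patches.view(n, g, c // g, k * k, oh, ow)
        # Per-location, per-group weighted average = adaptive blur + subsample.
        out = (patches * weights.unsqueeze(2)).sum(dim=3)
        return out.view(n, c, oh, ow)


if __name__ == "__main__":
    layer = AdaptiveLowPassFilter(channels=64)
    y = layer(torch.randn(2, 64, 32, 32))
    print(y.shape)  # torch.Size([2, 64, 16, 16])
```

Used this way, the layer would be a drop-in replacement for the fixed blur-then-subsample of Zhang (2020): where a feature map is smooth the predicted filter can stay wide, and where high-frequency detail matters for recognition it can sharpen toward an identity-like kernel.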


References

  • Azulay, A. & Weiss, Y. (2018). Why do deep convolutional networks generalize so poorly to small image transformations? JMLR.

  • Beltagy, I., Peters, M.E. & Cohan, A. (2020). Longformer: The long-document transformer. arXiv:2004.05150.

  • Bietti, A. & Mairal, J. (2017). Invariance and stability of deep convolutional representations. In NeurIPS.

  • Bloem-Reddy, B. & Teh, Y. W. (2020). Probabilistic symmetries and invariant neural networks. JMLR.

  • Bolya, D., Zhou, C., Xiao, F., & Lee, Y. J. (2019). YOLACT: real-time instance segmentation. In ICCV.

  • Caelli, T. M. & Liu, Z. Q. (1988). On the minimum number of templates required for shift, rotation and size invariant pattern recognition. Pattern Recognition.

  • Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-end object detection with transformers. In ECCV.

  • Chaman, A. & Dokmanic, I. (2021). Truly shift-invariant convolutional neural networks. In CVPR.

  • Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587.

  • Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV.

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016) The cityscapes dataset for semantic urban scene understanding. In CVPR.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.

  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., & Uszkoreit, J. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR.

  • Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2015). The PASCAL visual object classes challenge: A retrospective. IJCV.

  • Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In ICLR.

  • Gonzalez, R. C. & Woods, R. E. (2002). Digital image processing. Prentice Hall.

  • Gu, Z. (2021). Spatiotemporal inconsistency learning for deepfake video detection. arXiv:2109.01860.

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In ICCV.

  • He, K., Sun, J., & Tang, X. (2010). Guided image filtering. In ECCV.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.

  • Hu, P., Heilbron, F. C., Wang, O., Lin, Z., Sclaroff, S., & Perazzi, F. (2020). Temporally distributed networks for fast video semantic segmentation. In CVPR.

  • Hu, P., Perazzi, F., Heilbron, F. C., Wang, O., Lin, Z., Saenko, K., & Sclaroff, S. (2020). Real-time semantic segmentation with fast attention. In ECCV workshop.

  • Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In CVPR.

  • Huang, Z., Wang, H., Xing, E. P., & Huang, D. (2020). Self-challenging improves cross-domain generalization. In ECCV.

  • Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In CVPR.

  • Jia, X., De Brabandere, B., Tuytelaars, T., & Gool, L. V. (2016). Dynamic filter networks. In NeurIPS.

  • Kannan, H., Kurakin, A., & Goodfellow, I. (2018). Adversarial logit pairing. arXiv:1803.06373.

  • Karras, T., Aittala, M., Laine, S., Härkönen, E., Hellsten, J., Lehtinen, J., & Aila, T. (2021). Alias-free generative adversarial networks. arXiv:2106.12423.

  • Krizhevsky, A. & Hinton, G. (2009). Learning multiple layers of features from tiny images. Citeseer.

  • Kurakin, A., Goodfellow, I., & Bengio, S. (2017). Adversarial examples in the physical world. In ICLR Workshop.

  • Lee, J., Lee, Y., Kim, J., Kosiorek, A., Choi, S., & Teh, Y. W. (2019). Set transformer: A framework for attention-based permutation-invariant neural networks. In ICML.

  • Lee, K., Lee, H., Lee, K., & Shin, J. (2017). Training confidence-calibrated classifiers for detecting out-of-distribution samples. In ICLR.

  • Lee, K., Lee, K., Lee, H., & Shin, J. (2018). A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In NeurIPS.

  • Li, D., Yang, Y., Song, Y. Z., & Hospedales, T. M. (2017). Deeper, broader and artier domain generalization. In ICCV.

  • Li, S., Ma, L., Zhang, F., & Ngan, K. N. (2010). Temporal inconsistency measure for video quality assessment. In 28th picture coding symposium.

  • Li, Y. (1992). Reforming the theory of invariant moments for pattern recognition. Pattern Recognition.

  • Liao, F., Liang, M., Dong, Y., Pang, T., Hu, X., & Zhu, J. (2018). Defense against adversarial attacks using high-level representation guided denoiser. In CVPR.

  • Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In ECCV.

  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In ICCV.

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR.

  • Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In ICLR.

  • Mairal, J., Koniusz, P., Harchaoui, Z., & Schmid, C. (2014). Convolutional kernel networks. In NeurIPS.

  • Massa, F. & Girshick, R. (2018). maskrcnn-benchmark: Fast, modular reference implementation of instance segmentation and object detection algorithms in PyTorch. https://github.com/facebookresearch/maskrcnn-benchmark. Accessed: Oct. 10, 2019.

  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature.

  • Muandet, K., Balduzzi, D., & Schölkopf, B. (2013). Domain generalization via invariant feature representation. In ICML.

  • Paris, S., Kornprobst, P., Tumblin, J., & Durand, F. (2009). Bilateral filtering: Theory and applications. Foundations and Trends in Computer Graphics and Vision, 4(1), 1–73.

  • Park, T., Liu, M. Y., Wang, T. C., & Zhu, J. Y. (2019). Semantic image synthesis with spatially-adaptive normalization. In CVPR.

  • Proakis, J. G. & Manolakis, D. G. (1992). Digital signal processing. Macmillan.

  • Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. In EMNLP.

  • Richardson, E., Alaluf, Y., Patashnik, O., Nitzan, Y., Azar, Y., Shapiro, S., & Cohen-Or, D. (2021). Encoding in style: a stylegan encoder for image-to-image translation. In CVPR.

  • Rosenberg, D. (1974). Box filter. US Patent 3,815,754.

  • Rowley, H. A., Baluja, S., & Kanade, T. (1998). Rotation invariant neural network-based face detection. In CVPR.

  • Shankar, V., Dave, A., Roelofs, R., Ramanan, D., Recht, B., & Schmidt, L. (2019). A systematic framework for natural perturbations from videos. arXiv:1906.02168.

  • Shannon, C. E. (1949). Communication in the presence of noise. Proceedings of the IRE.

  • Simonyan, K. & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.

  • Su, H., Jampani, V., Sun, D., Gallo, O., Learned-Miller, E., & Kautz, J. (2019). Pixel-adaptive convolutional neural networks. In CVPR.

  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv:1312.6199.

  • Tan, M. & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML.

  • Tyleček, R. & Šára, R. (2013). Spatial pattern templates for recognition of objects with regular structure. In German conference on pattern recognition, pp. 364–374. Springer.

  • VainF (2020). DeepLabV3Plus-Pytorch. https://github.com/VainF/DeepLabV3Plus-Pytorch.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In NeurIPS.

  • Wang, H., Ge, S., Xing, E. P., & Lipton, Z. C. (2019). Learning robust global representations by penalizing local predictive power. In NeurIPS.

  • Wang, H., He, Z., Lipton, Z. C., & Xing, E. P. (2019). Learning robust representations by projecting superficial statistics out. In ICLR.

  • Wang, J., Chen, K., Xu, R., Liu, Z., Loy, C. C., & Lin, D. (2019). CARAFE: Content-aware reassembly of features. In ICCV.

  • Wang, T. C., Liu, M. Y., Zhu, J. Y., Tao, A., Kautz, J., & Catanzaro, B. (2018). High-resolution image synthesis and semantic manipulation with conditional GANs. In CVPR.

  • Webber, C. J. (1994). Self-organisation of transformation-invariant detectors for constituents of perceptual patterns. Network: Computation in Neural Systems, 5(4), 471–496. https://doi.org/10.1088/0954-898X_5_4_004

  • Wood, J. (1996). Invariant pattern recognition: A review. Pattern Recognition.

  • Wu, Y. & He, K. (2018). Group normalization. In ECCV.

  • Xie, C., Wu, Y., van der Maaten, L., Yuille, A. L., & He, K. (2019). Feature denoising for improving adversarial robustness. In CVPR.

  • Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. arXiv:2105.15203.

  • Yang, L., Fan, Y., & Xu, N. (2019). Video instance segmentation. In ICCV.

  • Ye, M., Zhang, X., Yuen, P. C., & Chang, S. F. (2019). Unsupervised embedding learning via invariant and spreading instance feature. In CVPR.

  • Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., & Yoo, Y. (2019). CutMix: Regularization strategy to train strong classifiers with localizable features. In ICCV.

  • Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. In ICLR.

  • Zhang, R. (2020). Making convolutional networks shift-invariant again. In ICML.

  • Zhang, Z., Hua, B. S., Rosen, D. W., & Yeung, S. K. (2019). Rotation invariant convolutions for 3d point clouds deep learning. In 3DV.

  • Zou, X., Xiao, F., Yu, Z., & Lee, Y. J. (2020). Delving deeper into anti-aliasing in convnets. In BMVC.

Acknowledgements

This work was supported in part by ARO YIP W911NF17-1-0410, NSF CAREER IIS-2150012, NSF IIS-2204808, NSF CCF-1934568, GCP research credit program, and AWS ML research award.

Author information

Corresponding author

Correspondence to Xueyan Zou.

Additional information

Communicated by William Smith.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zou, X., Xiao, F., Yu, Z. et al. Delving Deeper into Anti-Aliasing in ConvNets. Int J Comput Vis 131, 67–81 (2023). https://doi.org/10.1007/s11263-022-01672-y

