Unified Image Harmonization with Region Augmented Attention Normalization

Hou, Junjie; Zhang, Yuqi; Su, Duo

doi:10.1007/s40745-024-00531-6

Published: 11 May 2024

Annals of Data Science

Junjie Hou (ORCID: orcid.org/0009-0007-4846-4081)^1,4,5, Yuqi Zhang^2,4,5 & Duo Su^3,4,5

Abstract

Image harmonization adjusts the foreground of a composite image, using information from the background, so that the two regions look visually consistent. In research settings, the task conventionally takes a synthesized composite image together with its matching foreground mask as input. In practical applications, however, precise masks are hard to obtain, which creates a notable gap between research findings and real-world applicability. To narrow this gap, we redefine the task as “Unified Image Harmonization,” in which the input is only a single image, making the task more applicable to real-world scenarios. To address this setting, we develop a novel framework that first applies inharmonious region localization to predict the mask and then uses the predicted mask for harmonization. The key component of the harmonization process is normalization, which is responsible for transferring information from the background to the foreground. However, existing background-to-foreground transfer and guidance mechanisms are limited by single-layer guidance, which constrains their effectiveness. To overcome this limitation, we introduce Region Augmented Attention Normalization (RA2N), which strengthens the attention mechanism used to align foreground features and thereby improves alignment and transfer. Qualitative and quantitative comparisons on the iHarmony4 dataset show that our model achieves excellent performance not only in unified image harmonization but also in conventional image harmonization.
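The abstract does not give RA2N's equations, so the following is not the authors' method. It is a minimal NumPy sketch, under our own assumptions, of the contrast the abstract draws. `region_instance_norm` transfers one global background statistic to the foreground (in the spirit of region-aware adaptive instance normalization), while `attention_guided_norm` lets each foreground pixel attend over background pixels to obtain its own target mean and standard deviation. `unified_harmonize` mimics the unified setting, in which a localizer (here a hypothetical callable `localize`) predicts the mask from the input before harmonization. All function names, the (C, H, W) feature layout, the 0.5 threshold and the temperature `tau` are illustrative assumptions.

```python
import numpy as np


def region_stats(feat, mask, eps=1e-5):
    """Per-channel mean and std of feat (C, H, W) over pixels where mask (H, W) is True."""
    region = feat[:, mask]                          # (C, N): pixels inside the region
    mean = region.mean(axis=1)
    std = np.sqrt(region.var(axis=1) + eps)
    return mean[:, None, None], std[:, None, None]


def region_instance_norm(feat, fg_mask, eps=1e-5):
    """Single-statistic transfer: whiten the foreground with its own statistics,
    then re-color it with the global background mean and std."""
    fg_mu, fg_sigma = region_stats(feat, fg_mask, eps)
    bg_mu, bg_sigma = region_stats(feat, ~fg_mask, eps)
    normed = (feat - fg_mu) / fg_sigma * bg_sigma + bg_mu
    out = feat.copy()
    out[:, fg_mask] = normed[:, fg_mask]            # background pixels are left untouched
    return out


def attention_guided_norm(feat, fg_mask, tau=None, eps=1e-5):
    """Hypothetical attention variant (not RA2N): each foreground pixel gets its own
    target mean/std as an attention-weighted average over background pixels."""
    c = feat.shape[0]
    tau = np.sqrt(c) if tau is None else tau
    fg = feat[:, fg_mask].T                         # (Nf, C) queries
    bg = feat[:, ~fg_mask].T                        # (Nb, C) keys and values
    logits = fg @ bg.T / tau                        # (Nf, Nb) scaled dot-product scores
    logits -= logits.max(axis=1, keepdims=True)     # stabilize the softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)         # each row sums to 1
    mu = attn @ bg                                  # (Nf, C) attended background mean
    var = attn @ (bg ** 2) - mu ** 2                # attended background variance
    sigma = np.sqrt(np.maximum(var, 0.0) + eps)
    fg_mu = fg.mean(axis=0)
    fg_sigma = np.sqrt(fg.var(axis=0) + eps)
    normed = (fg - fg_mu) / fg_sigma * sigma + mu
    out = feat.copy()
    out[:, fg_mask] = normed.T                      # write back in the same pixel order
    return out


def unified_harmonize(feat, localize, norm=attention_guided_norm, thresh=0.5):
    """Unified setting: no mask is given; a localizer predicts it from the input."""
    fg_mask = localize(feat) > thresh               # hypothetical `localize`: (H, W) scores in [0, 1]
    if not fg_mask.any() or fg_mask.all():          # no foreground/background split to work with
        return feat.copy(), fg_mask
    return norm(feat, fg_mask), fg_mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(3, 8, 8))               # toy (C, H, W) feature map
    mask = np.zeros((8, 8), dtype=bool)
    mask[2:5, 2:5] = True
    feat[:, mask] = feat[:, mask] * 2.0 + 3.0       # simulate an inharmonious pasted region
    out, pred = unified_harmonize(feat, lambda f: mask.astype(float))  # oracle stand-in for a localizer
    print("fg mean before:", feat[:, mask].mean(axis=1))
    print("fg mean after: ", out[:, pred].mean(axis=1))
```

One design point the sketch makes concrete: as `tau` grows, the attention weights become uniform and `attention_guided_norm` reduces to `region_instance_norm`, so global statistic transfer is the degenerate case of attention-weighted transfer. A real model would apply such a layer inside a network to learned features rather than to toy arrays.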

Data Availability

The data used in this study come from publicly available datasets. iHarmony4: one of the most commonly used datasets in the field of image harmonization, accessible at https://github.com/bcmi/Image-Harmonization-Dataset-iHarmony4. Real Composite Images dataset: a dataset based on real-world scenes and one of the first to take a comprehensive approach to image harmonization, accessible at https://github.com/wasidennis/DeepHarmonization/tree/master/data. Both datasets are freely available to the public and have been instrumental in advancing research on image harmonization; we used them to conduct the experiments and draw the conclusions reported in this study.

Code Availability

The complete code used in this study will be made publicly available on GitHub after the publication of the paper. This will include all scripts, libraries, and dependencies necessary to reproduce the results and analyses presented in the study.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 12071458).

Funding

This work was partially supported by the National Natural Science Foundation of China (No. 12071458). This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Sino-Danish College, University of Chinese Academy of Sciences, Zhongguancun East Rd, Beijing, 100190, China
Junjie Hou
School of Mathematical Sciences, University of Chinese Academy of Sciences, Zhongguancun East Rd, Beijing, 100190, China
Yuqi Zhang
School of Computer Science and Technology, University of Chinese Academy of Sciences, Zhongguancun East Rd, Beijing, 100190, China
Duo Su
Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Zhongguancun East Rd, Beijing, 100190, China
Junjie Hou, Yuqi Zhang & Duo Su
Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Zhongguancun East Rd, Beijing, 100190, China
Junjie Hou, Yuqi Zhang & Duo Su

Contributions

Junjie Hou: Conceptualization, Methodology, Software, Investigation, Formal Analysis, Funding Acquisition, Resources, Writing - Original Draft, Writing - Review & Editing. Yuqi Zhang: Data Curation, Writing - Original Draft, Supervision. Duo Su: Software, Resources, Supervision.

Corresponding author

Correspondence to Junjie Hou.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Informed consent

None.

Ethical statements

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hou, J., Zhang, Y. & Su, D. Unified Image Harmonization with Region Augmented Attention Normalization. Ann. Data. Sci. (2024). https://doi.org/10.1007/s40745-024-00531-6

Received: 28 December 2023
Revised: 21 March 2024
Accepted: 28 March 2024
Published: 11 May 2024
DOI: https://doi.org/10.1007/s40745-024-00531-6
