Abstract
Image harmonization adjusts the foreground of a composite image so that it is visually consistent with the background. In academic research, the task conventionally takes a simple synthesized image and a matching foreground mask as inputs. In practical applications, however, precise masks are difficult to obtain, which creates a notable gap between research findings and real-world applicability. To narrow this gap, we redefine the task as “Unified Image Harmonization,” in which the input is a single image only. We develop a novel framework for this setting: it first applies inharmonious region localization to detect the mask and then uses that mask for harmonization. The key component of harmonization is normalization, which is responsible for transferring information from background to foreground. Existing background-to-foreground transfer and guidance mechanisms, however, rely on single-layer guidance, which limits their effectiveness. To overcome this limitation, we introduce Region Augmented Attention Normalization (RA2N), which enhances the attention mechanism for foreground feature alignment and thereby improves alignment and transfer. Qualitative and quantitative comparisons on the iHarmony4 dataset show that our model performs strongly not only on unified image harmonization but also on conventional image harmonization.
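The two-stage idea described in the abstract (localize a mask, then transfer background information to the foreground through normalization) can be illustrated with a minimal NumPy sketch. This is not RA2N and not the authors' code: it shows only the generic region-wise statistic transfer that normalization-based harmonization builds on, without any attention. The function names `masked_mean_std`, `region_normalize`, and `harmonize_single_image`, and the `localize` callable that stands in for an inharmonious region localization network, are hypothetical.

```python
import numpy as np


def masked_mean_std(feat, mask, eps=1e-5):
    """Per-channel mean and std of feat (C, H, W) over pixels where mask (H, W) is 1."""
    m = mask[None].astype(feat.dtype)            # broadcast mask to (1, H, W)
    n = m.sum() + eps                            # number of selected pixels
    mean = (feat * m).sum(axis=(1, 2), keepdims=True) / n
    var = (((feat - mean) ** 2) * m).sum(axis=(1, 2), keepdims=True) / n
    return mean, np.sqrt(var + eps)


def region_normalize(feat, mask):
    """Whiten foreground features, then re-color them with background statistics.

    Assumption: a plain statistic transfer, used here only to illustrate
    background-to-foreground information transfer.
    """
    fg_mean, fg_std = masked_mean_std(feat, mask)
    bg_mean, bg_std = masked_mean_std(feat, 1 - mask)
    transferred = (feat - fg_mean) / fg_std * bg_std + bg_mean
    m = mask[None].astype(feat.dtype)
    # Background pixels are left untouched; only the foreground is replaced.
    return m * transferred + (1 - m) * feat


def harmonize_single_image(feat, localize):
    """Single-input pipeline: predict the mask first, then harmonize with it.

    `localize` is a hypothetical callable returning a binary (H, W) mask.
    """
    mask = localize(feat)
    return region_normalize(feat, mask), mask
```

In this sketch the only input is the feature map itself, which mirrors the single-image setting; the quality of the result then depends entirely on the mask that `localize` predicts.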
Data Availability
The data used in this study are sourced from publicly available datasets. iHarmony4: This dataset is one of the most commonly used datasets in the field of image harmonization. It is accessible at this link (https://github.com/bcmi/Image-Harmonization-Dataset-iHarmony4). Real Composite Images dataset: This dataset is based on real-world scenes and is one of the first datasets to propose a comprehensive approach to image harmonization. It is accessible at this link (https://github.com/wasidennis/DeepHarmonization/tree/master/data). These datasets have been instrumental in advancing research in the field of image harmonization and have been utilized in this study to conduct experiments and draw conclusions. Both datasets are freely available to the public and can be accessed through the provided links.
Code Availability
The complete code used in this study will be made publicly available on GitHub after the publication of the paper. This will include all scripts, libraries, and dependencies necessary to reproduce the results and analyses presented in the study.
Acknowledgements
This work was partially supported by a grant from the National Natural Science Foundation of China (No. 12071458).
Funding
This work was partially supported by a grant from the National Natural Science Foundation of China (No. 12071458). This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Contributions
Junjie Hou: Conceptualization, Methodology, Software, Investigation, Formal Analysis, Funding Acquisition, Resources, Writing - Original Draft, Writing - Review & Editing. Yuqi Zhang: Data Curation, Writing - Original Draft, Supervision. Duo Su: Software, Resources, Supervision.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Informed consent
None.
Ethical statements
None.
About this article
Cite this article
Hou, J., Zhang, Y. & Su, D. Unified Image Harmonization with Region Augmented Attention Normalization. Ann. Data. Sci. (2024). https://doi.org/10.1007/s40745-024-00531-6
DOI: https://doi.org/10.1007/s40745-024-00531-6