Unified Image Harmonization with Region Augmented Attention Normalization

Annals of Data Science

Abstract

Image harmonization adjusts the foreground of a composite image so that it is visually consistent with the background. In academic research, the task conventionally takes a simple synthesized image and a matching mask as inputs. In practical applications, however, precise masks are difficult to obtain, which creates a notable gap between research findings and real-world applicability. To narrow this gap, we redefine the task as “Unified Image Harmonization,” in which the input is a single image only. To address this setting, we develop a novel framework that first employs inharmonious region localization to detect the mask and then uses that mask for harmonization. The pivotal component of harmonization is normalization, which is responsible for information transfer. Current background-to-foreground transfer and guidance mechanisms, however, rely on single-layer guidance, which constrains their effectiveness. To overcome this limitation, we introduce Region Augmented Attention Normalization (RA2N), which enhances the attention mechanism for foreground feature alignment and thereby improves alignment and transfer. Qualitative and quantitative comparisons on the iHarmony4 dataset show that our model performs strongly not only in unified image harmonization but also in conventional image harmonization.
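
To make the two-stage idea concrete, the following is a minimal NumPy sketch of mask-driven statistics transfer: a predicted soft mask is binarized, and foreground features are re-normalized with background statistics. It is not RA2N itself, whose attention-based design is not described in this preview; it only illustrates plain region-aware normalization under stated assumptions. The function names, the `(C, H, W)` feature layout, and the `threshold=0.5` default are all hypothetical choices made for the example.

```python
import numpy as np


def region_stats(feat, mask, eps=1e-5):
    """Per-channel mean/std of feat (C, H, W) over pixels where mask (H, W) == 1."""
    m = mask[None, :, :].astype(feat.dtype)
    count = m.sum() + eps  # eps guards against an empty region
    mean = (feat * m).sum(axis=(1, 2), keepdims=True) / count
    var = (((feat - mean) ** 2) * m).sum(axis=(1, 2), keepdims=True) / count
    return mean, np.sqrt(var + eps)


def region_aware_normalize(feat, fg_mask):
    """Whiten foreground features, then re-color them with background statistics."""
    fg_mean, fg_std = region_stats(feat, fg_mask)
    bg_mean, bg_std = region_stats(feat, 1 - fg_mask)
    aligned = (feat - fg_mean) / fg_std * bg_std + bg_mean
    m = fg_mask[None, :, :].astype(feat.dtype)
    # Background pixels pass through unchanged; only the foreground is replaced.
    return m * aligned + (1 - m) * feat


def harmonize_from_soft_mask(feat, soft_mask, threshold=0.5):
    """Hypothetical glue step: binarize a localization output, then normalize."""
    fg_mask = (soft_mask >= threshold).astype(feat.dtype)
    return region_aware_normalize(feat, fg_mask)


# Toy usage: a 3-channel feature map whose top-left patch is shifted by +5.
rng = np.random.default_rng(0)
feat = rng.normal(size=(3, 8, 8))
feat[:, :4, :4] += 5.0
soft_mask = np.zeros((8, 8))
soft_mask[:4, :4] = 0.9  # stand-in for an inharmonious-region localizer's output
out = harmonize_from_soft_mask(feat, soft_mask)
```

In this toy case the foreground patch ends up with roughly the same per-channel mean and standard deviation as the background, while background values are left untouched. A learned model would apply such a step to intermediate network features rather than to raw inputs.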

Data Availability

The data used in this study come from publicly available datasets. iHarmony4 is one of the most commonly used datasets in image harmonization and is available at https://github.com/bcmi/Image-Harmonization-Dataset-iHarmony4. The Real Composite Images dataset is based on real-world scenes, is one of the first datasets to propose a comprehensive approach to image harmonization, and is available at https://github.com/wasidennis/DeepHarmonization/tree/master/data. Both datasets are freely available to the public through these links and were used to conduct the experiments and draw the conclusions in this study.

Code Availability

The complete code used in this study will be made publicly available on GitHub after the publication of the paper. This will include all scripts, libraries, and dependencies necessary to reproduce the results and analyses presented in the study.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 12071458).

Funding

This work was partially supported by the National Natural Science Foundation of China (No. 12071458). It received no other specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Contributions

Junjie Hou: Conceptualization, Methodology, Software, Investigation, Formal Analysis, Funding Acquisition, Resources, Writing - Original Draft, Writing - Review & Editing. Yuqi Zhang: Data Curation, Writing - Original Draft, Supervision. Duo Su: Software, Resources, Supervision.

Corresponding author

Correspondence to Junjie Hou.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Informed consent

None.

Ethical statements

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hou, J., Zhang, Y. & Su, D. Unified Image Harmonization with Region Augmented Attention Normalization. Ann. Data. Sci. (2024). https://doi.org/10.1007/s40745-024-00531-6

  • DOI: https://doi.org/10.1007/s40745-024-00531-6
