A robust PDF watermarking scheme with versatility and compatibility

Multimedia Tools and Applications

Abstract

Portable Document Format (PDF) files have become a ubiquitous, multi-faceted medium for disseminating and exchanging information. Watermarking algorithms for PDF documents have attracted considerable academic research because of their practical applications in copyright protection, trace tracking and digital forensics. In this work, we emphasize what distinguishes PDF documents from other multimedia content and propose a robust PDF watermarking scheme with versatility and compatibility that resists multi-level attacks, including text editing, format modification, page extraction and textbox republication. We traverse the decompressed PDF file for objects related to important elements such as text, images and forms, and embed the encrypted watermark as a dictionary entry inside those objects. In addition, we develop a tamper detection algorithm that verifies content integrity and helps locate tampered areas within the watermarked file. Finally, we compare the proposed algorithm with existing ones; it offers high invisibility, scalable embedding capacity and strong robustness. It is compatible with files of different media, content and languages, and it works with diverse file generators and readers. The method can be applied in scenarios such as copyright protection, multi-level distribution trace tracking and tamper detection.
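The embedding idea described in the abstract (walk the decompressed file, pick objects tied to text, images and forms, and add the encrypted watermark as a dictionary entry) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the private key name `/WMark`, the marker list used to select target objects, the regex-based object scan (adequate only for a toy, uncompressed file), and the SHA-256 keystream XOR, which merely stands in for whatever cipher the authors use.

```python
import hashlib
import re

WM_KEY = b"/WMark"  # hypothetical private dictionary key, not from the paper
TARGET_MARKERS = (b"/Font", b"/Image", b"/Form")  # assumed text/image/form markers
OBJ = re.compile(rb"(\d+) (\d+) obj(.*?)endobj", re.DOTALL)
ENTRY = re.compile(re.escape(WM_KEY) + rb" <([0-9a-f]*)>")


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Stand-in cipher (illustrative only): XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def embed(pdf: bytes, watermark: bytes, key: bytes) -> bytes:
    """Add the encrypted watermark as a hex-string entry to each target dictionary."""
    payload = keystream_xor(watermark, key).hex().encode()
    entry = b" " + WM_KEY + b" <" + payload + b">"

    def mark(match: re.Match) -> bytes:
        obj, body = match.group(0), match.group(3)
        if b"<<" not in body or not any(m in body for m in TARGET_MARKERS):
            return obj  # leave non-target objects untouched
        # Insert right after the opening "<<" of the object's dictionary.
        return obj.replace(b"<<", b"<<" + entry, 1)

    # Note: byte offsets change, so a real tool would also rebuild the xref table.
    return OBJ.sub(mark, pdf)


def extract(pdf: bytes, key: bytes) -> dict[int, bytes]:
    """Map object number -> decrypted watermark for every object carrying the entry."""
    found = {}
    for match in OBJ.finditer(pdf):
        entry = ENTRY.search(match.group(3))
        if entry:
            cipher = bytes.fromhex(entry.group(1).decode())
            found[int(match.group(1))] = keystream_xor(cipher, key)
    return found


# Toy "decompressed" object stream: a catalog, a font and an image object.
TOY_PDF = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\nendobj\n"
    b"3 0 obj\n<< /Type /XObject /Subtype /Image /Width 1 /Height 1 >>\nendobj\n"
)

marked = embed(TOY_PDF, b"owner:alice", b"secret")
print(extract(marked, b"secret"))
```

In this toy run the watermark is recovered from objects 2 and 3, while the catalog object is left unchanged. Because each target object carries its own copy, the entry can still be read from a fragment that retains only one of them, which loosely mirrors the page-extraction robustness the abstract claims. A production tool would use a real PDF parser and an authenticated cipher rather than this regex scan and XOR stand-in.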

Data availability

We declare that all data associated with this manuscript are included in the manuscript.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC) under Grants 62272331 and 61972269, and by the Sichuan Science and Technology Program under Grant 2022YFG0320.

Author information

Corresponding author

Correspondence to Hongxia Wang.

Ethics declarations

Conflict of interests

The authors declare that they have no competing financial interests or personal relationships that could have appeared to influence the work in this paper.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiang, Z., Wang, H. & Han, S. A robust PDF watermarking scheme with versatility and compatibility. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18151-w

  • DOI: https://doi.org/10.1007/s11042-024-18151-w
