Abstract
Portable Document Format (PDF) files have become a ubiquitous and versatile medium for disseminating and exchanging information. Watermarking algorithms for PDF documents have attracted considerable academic research because of their practical applications in copyright protection, trace tracking and digital forensics. In this work, we emphasize the unique nature of PDF documents among multimedia content and propose a robust, versatile and compatible PDF watermarking scheme that resists multi-level attacks, including text editing, format modification, page extraction and textbox republication. We traverse the decompressed PDF file for objects related to important elements such as text, images and forms, and embed the encrypted watermark as a dictionary entry inside those objects. In addition, we develop a tamper detection algorithm that verifies content integrity and helps identify tampered areas within the watermarked file. In a comparison with existing algorithms, the proposed algorithm offers high invisibility, scalable embedding capacity and strong robustness. It is compatible with files of different media, content and languages, and it works with diverse file generators and readers. The method can also be applied in scenarios such as copyright protection, multi-level distribution trace tracking and tamper detection.
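The embedding idea in the abstract (walk the decompressed objects, pick those tied to text, images or forms, and add the watermark as a dictionary entry) can be illustrated with a minimal sketch. It is not the authors' implementation. The key name `/WMark`, the choice of target types, the toy three-object file and the regular-expression parsing, which only handles flat, non-nested dictionaries, are all assumptions made for illustration. The payload is assumed to have been encrypted beforehand, so no cipher is shown.

```python
import re

# Hypothetical key name for the watermark entry (not taken from the paper).
WM_KEY = b"/WMark"

# Assumed target objects: fonts (text) and XObjects (images, forms).
TARGET_TYPES = (b"/Type /Font", b"/Type /XObject")

# Matches "N G obj << ... >>" for flat (non-nested) dictionaries only.
OBJ_RE = re.compile(rb"(\d+) (\d+) obj\s*<<(.*?)>>", re.DOTALL)

# Toy stand-in for a decompressed PDF body; a real file also has a header,
# cross-reference table and trailer, which this sketch ignores.
TOY_PDF = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>\nendobj\n"
    b"3 0 obj\n<< /Type /XObject /Subtype /Image /Width 8 /Height 8 >>\nendobj\n"
)


def embed(pdf: bytes, payload: bytes) -> bytes:
    """Add the (already encrypted) payload as a hex string entry in each target object."""
    entry = WM_KEY + b" <" + payload.hex().encode("ascii") + b">"

    def add_entry(match: re.Match) -> bytes:
        body = match.group(3)
        is_target = any(t in body for t in TARGET_TYPES)
        if not is_target or WM_KEY in body:
            return match.group(0)  # leave non-targets and marked objects untouched
        # Drop the closing ">>", append the new entry, then close the dictionary.
        return match.group(0)[:-2] + entry + b" >>"

    return OBJ_RE.sub(add_entry, pdf)


def extract(pdf: bytes) -> dict[int, bytes]:
    """Return {object number: payload} for every object carrying the entry."""
    pattern = re.compile(re.escape(WM_KEY) + rb" <([0-9a-f]*)>")
    found = {}
    for match in OBJ_RE.finditer(pdf):
        hit = pattern.search(match.group(3))
        if hit:
            found[int(match.group(1))] = bytes.fromhex(hit.group(1).decode("ascii"))
    return found


marked = embed(TOY_PDF, b"\x01\x02owner")
print(extract(marked))
```

In this sketch the same payload is written into every matching object, so each marked object can be read back independently of the others. A practical tool would use a real PDF parser and rebuild the cross-reference table after changing object lengths.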
Data availability
We declare that all the data associated with the manuscript is mentioned in the manuscript.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (NSFC) under Grants 62272331 and 61972269, and Sichuan Science and Technology Program under Grant 2022YFG0320.
Ethics declarations
Conflict of interests
The authors declare that they have no competing financial interests or personal relationships that could have appeared to influence the work in this paper.
About this article
Cite this article
Jiang, Z., Wang, H. & Han, S. A robust PDF watermarking scheme with versatility and compatibility. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18151-w
DOI: https://doi.org/10.1007/s11042-024-18151-w