Abstract
Document forgery has increased enormously with the progress of information technology and image processing software. Critical documents are protected using watermarks or signatures, i.e., an active approach. Other documents require a passive approach to document forensics. Most passive techniques aim to detect and identify the source of a printed document; others look for irregularities present in the document. This paper aims to identify a document's source printer using a passive approach. Hand-crafted features based on key printer noise features (KPNF), speeded up robust features (SURF) and oriented FAST and rotated BRIEF (ORB) are used. Feature-based classifiers are then implemented using K-NN, decision tree, random forest and majority voting. The proposed document classifier model can efficiently assign questioned documents to their respective printer classes. Further, adaptive boosting and bootstrap aggregating methodologies are used to improve classification accuracy. The proposed model achieves its best accuracy of 95.1% using a combination of KPNF + ORB + SURF with a random forest classifier and adaptive boosting.
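As a rough illustration of the classification stage described above, the following sketch combines a K-NN base classifier with bootstrap aggregating and majority voting. It is not the authors' implementation: the two-dimensional feature vectors, the labels printer_A and printer_B, and the function names are all hypothetical stand-ins for real KPNF, SURF or ORB descriptors, and plain Python is used in place of any particular library.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def majority_vote(predictions):
    """Return the label predicted most often."""
    return Counter(predictions).most_common(1)[0][0]


def bagged_knn_predict(train, query, n_estimators=15, k=3, seed=0):
    """Bootstrap aggregating: train each K-NN on a resampled set, then vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw len(train) items with replacement.
        sample = [rng.choice(train) for _ in train]
        votes.append(knn_predict(sample, query, k))
    return majority_vote(votes)


# Hypothetical toy data: made-up feature vectors, not measured printer features.
train = [
    ((0.10, 0.20), "printer_A"), ((0.15, 0.25), "printer_A"),
    ((0.20, 0.10), "printer_A"), ((0.05, 0.15), "printer_A"),
    ((0.12, 0.08), "printer_A"),
    ((0.90, 0.80), "printer_B"), ((0.85, 0.95), "printer_B"),
    ((0.80, 0.85), "printer_B"), ((0.95, 0.90), "printer_B"),
    ((0.88, 0.78), "printer_B"),
]

questioned_document = (0.12, 0.18)
predicted_printer = bagged_knn_predict(train, questioned_document)
```

Each bootstrap round sees a slightly different training set, so the final vote smooths out the sensitivity of a single K-NN model to individual samples. Adaptive boosting differs in that it reweights misclassified samples between rounds rather than resampling uniformly.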
Ethics declarations
Conflict of interest
The authors have no conflicts of interest in this work.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Gupta, S., Kumar, M. Forensic document examination system using boosting and bagging methodologies. Soft Comput 24, 5409–5426 (2020). https://doi.org/10.1007/s00500-019-04297-5
DOI: https://doi.org/10.1007/s00500-019-04297-5