Forensic document examination system using boosting and bagging methodologies

Gupta, Surbhi; Kumar, Munish

doi:10.1007/s00500-019-04297-5

Forensic document examination system using boosting and bagging methodologies

Methodologies and Application
Published: 14 August 2019

Volume 24, pages 5409–5426, (2020)
Cite this article

Soft Computing Aims and scope Submit manuscript

Surbhi Gupta¹ &
Munish Kumar²

815 Accesses
23 Citations
1 Altmetric
Explore all metrics

Abstract

Document forgery has increased enormously due to the progression of information technology and image processing software. Critical documents are protected using watermarks or signatures, i.e., active approach. Other documents need passive approach for document forensics. Most of the passive techniques aim to detect and fix the source of the printed document. Other techniques look for the irregularities present in the document. This paper aims to fix the document source printer using passive approach. Hand-crafted features based on key printer noise features (KPNF), speeded up robust features (SURF) and oriented FAST rotated and BRIEF (ORB) are used. Then, feature-based classifiers are implemented using K-NN, decision tree, random forest and majority voting. The document classifier proposed model can efficiently classify the questioned documents to their respective printer class. Further, adaptive boosting and bootstrap aggregating methodologies are used for the improvement in classification accuracy. The proposed model has achieved the best accuracy of 95.1% using a combination of KPNF + ORB + SURF with random forest classifier and adaptive boosting methodology.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A computational approach for printed document forensics using SURF and ORB features

Article 31 January 2020

Bagging- and Boosting-Based Latent Fingerprint Image Classification and Segmentation

Fingerprint liveness detection using local quality features

Article 15 December 2018

References

Ali GN, Mikkilineni AK, Delp EJ, Allebach JP, Chiang PJ, Chiu GT (2004) Application of principal components analysis and gaussian mixture models to printer identification. In: Proceedings of non-impact printing and digital fabrication conference, Salt Lake City, Utah, vol 1, pp 301–305
Amer M, Goldstein M (2012) Nearest-neighbor and clustering based anomaly detection algorithms for Rapidminer. In: Proceedings of 3rd Rapidminer community meeting and conference, Aachen, Germany, pp 1–12
Bayram S, Sencar H, Memon N, Avcibas I (2005) Source camera identification based on CFA interpolation. In: Proceedings of international conference on image processing, Genova, Italy, vol 3, pp 69–78
Bayram S, Sencar HT, Memon N (2008) Classification of digital camera-models based on demosaicing artifacts. Digit Investig 5(1):49–59
Article Google Scholar
Bertrand R, Gomez-Kramer P, Terrades OR, Franco P, Ogier JM (2013) A system based on intrinsic features for fraudulent document detection. In: Proceedings of 12th international conference on document analysis and recognition, Washington, DC, pp 6–110
Bianchi T, Piva A (2013) Secure watermarking for multimedia content protection: a review of its benefits and open issues. IEEE Signal Process Mag 30(2):87–96
Article Google Scholar
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
MATH Google Scholar
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Article Google Scholar
Bulan O, Mao J, Sharma G (2009) Geometric distortion signatures for printer identification. In: Proceedings of IEEE international conference on acoustics, speech and signal processing, Taipei, Taiwan, pp 1401–1404
Cedillo-Hernandez M, Garcia-Ugalde F, Nakano-Miyatake M, Perez-Meana H (2013) Robust object-based watermarking using SURF feature matching and DFT domain. Radio Eng 22(4):1057–1071
Google Scholar
Cestnik B, Kononenko I, Bratko I (1987) Assistant 86: a knowledge elicitation tool for sophisticated users. In: Proceedings of 2nd European working session on learning, Bled, Yugoslavia, pp 31–45
Chen E (2015) Choosing a machine learning classifier. http://blog.echen.me/2011/04/27/choosing-a-machine-learningclassifier/. Accessed 13 March 2016
Choi JH, Im DH, Lee HY, Oh JT, Ryu JH, Lee HK, (2009) Color laser printer identification by analyzing statistical features on discrete wavelet transform. In: Proceedings of 16th IEEE international conference on image processing, Cairo, Egypt, pp 1505–1508
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
Article Google Scholar
Cox IJ, Miller ML, Bloom JA (2000) Watermarking applications and their properties. In: Proceedings of international conference on information technology: coding and computing, Las Vegas, Nevada, pp 6–10
Dudoit S, Fridlyand J (2003) Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19(9):1090–1099
Article Google Scholar
Elkasrawi S, Shafait F (2014) Printer identification using supervised learning for document forgery detection. In: Proceedings of 11th IAPR international workshop on document analysis systems, France, pp 146–150
Ferreira A, Navarro LC, Pinheiro G, dos Santos JA, Rocha A (2015) Laser printer attribution: exploring new features and beyond. Forensic Sci Int 247:105–125
Article Google Scholar
Foody GM, McCulloch MB, Yates WB (1995) The effect of training set size and composition on artificial neural network classification. Int J Remote Sens 16(9):1707–1723
Article Google Scholar
Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Proceedings of international conference on machine learning, vol 96, pp 148–156
Fu YR, Yang SY (2012) CCS-LTP for Printer Identification based on Texture Analysis. Int J Digit Content Technol Appl 6(13):250–264
Google Scholar
Gebhardt J, Goldstein M, Shafait F, Dengel A (2013) Document authentication using printing technique features and unsupervised anomaly detection. In: Proceedings of 12th international conference on document analysis and recognition, Washington, DC, pp 479–483
Jensen FV (1996) An introduction to bayesian networks, vol 210. UCL Press, London, pp 22–25
Google Scholar
Jiang W, Ho AT, Treharne H, Shi YQ (2010) A novel multi-size block Benford’s law scheme for printer identification. In: Proceedings of Pacific-Rim conference on multimedia, Shanghai, China, pp 643–652
Joshi S, Khanna N (2017) Single classifier-based passive system for source printer classification using local texture features. IEEE Trans Inf Forensics Secur 13(7):1603–1614
Article Google Scholar
Kee E, Farid H (2008) Printer profiling for forensics and ballistics. In: Proceedings of 10th ACM workshop on multimedia and security, Oxford, pp 3–10
Khanna N, Mikkilineni AK, Chiu GTC, Allebach JP, Delp EJ (2007) Scanner identification using sensor pattern noise. In: Proceedings of security, steganography, and watermarking of multimedia contents, electronic imaging, San Jose, CA
Kotsiantis SB, Zaharakis I, Pintelas P (2007) Supervised machine learning: a review of classification techniques. In: Proceedings of conference on emerging artificial intelligence applications in computer engineering: real word AI systems with applications in eHealth, HCI, information retrieval and pervasive technologies, pp 3–24
Kumar M, Jindal SR, Jindal MK, Lehal GS (2018) Improved recognition results of medieval handwritten Gurmukhi manuscripts using boosting and bagging methodologies. Neural Process Lett 1:1–14. https://doi.org/10.1007/s11063-018-9913-6
Article Google Scholar
Lampert CH, Mei L, Breuel TM (2006) Printing technique classification for document counterfeit detection. In: Proceedings of international conference on computational intelligence and security, Guangzhou, China, vol 1, pp 639–644
Li Z, Jiang W, Kenzhebalin D, Gokan A, Allebach J (2018) Intrinsic signatures for forensic identification of SOHO inkjet printers. NIP Digit Fabr Confer 1:231–236
Article Google Scholar
Mikkilineni AK, Chiang PJ, Ali GN, Chiu GTC, Allebach JP, Delp EJ (2004) Printer identification based on texture features. In: Proceedings of non-impact printing and digital fabrication conference, society for imaging science and technology, Salt Lake City, Utah, vol 1, pp 306–311
Mikkilineni AK, Chiang PJ, Ali GN, Chiu GTC, Allebach JP, Delp EJ (2005a) Printer identification based on graylevel co-occurrence features for security and forensic applications. In: Proceedings of security, steganography, and watermarking of multimedia contents, electronic imaging, California, pp 430–440
Mikkilineni AK, Khanna N, Delp EJ (2011) Forensic printer detection using intrinsic signatures. Media Forensics Secur 7880:78800–78805
Google Scholar
Peng CYJ, Lee KL, Ingersoll GM (2002) An introduction to logistic regression analysis and reporting. J Educ Res 96(1):3–14
Article Google Scholar
Pereira F, Mitchell T, Botvinick M (2009) Machine learning classifiers and MRI: a tutorial overview. Neuroimage 45(1):S199–S209
Article Google Scholar
Phillips IT (1996) User’s reference manual for the UW English/technical document image database III. UW-III English/technical document image database manual
Rojas R (2009) AdaBoost and the super bowl of classifiers a tutorial introduction to adaptive boosting. Freie University, Berlin, Technical Report
Ryu SJ, Lee HY, Cho IW, Lee HK (2008) Document forgery detection with SVM classifier and image quality measures. Adv Multimed Inf Process 2008:486–495
Google Scholar
Schreyer M, Schulze C, Stahl A, Effelsberg W (2009) Intelligent printing technique recognition and photocopy detection for forensic document examination. Informatiktage 8:39–42
Google Scholar
Schulze C, Schreyer M, Stahl A, Breuel T (2008) Evaluation of graylevel-features for printing technique classification in high-throughput document management systems. Comput Forensics 28:35–46
Article Google Scholar
Smith R (2007) An overview of the Tesseract OCR engine. In: Proceedings of 9th international conference on document analysis and recognition, Beijing, China, vol 2, pp 629–633
Subramanya SR, Yi BK (2006) Digital Signatures. IEEE Potentials 25(2):5–8
Article Google Scholar
Swain PH, Hauska H (1977) The decision tree classifier: design and potential. IEEE Trans Geosci Electron 15(3):142–147
Article Google Scholar
Tao H, Zain JM, Ahmed MM, Abdalla AN, Jing W (2012) A wavelet-based particle swarm optimization algorithm for digital image watermarking. Integr Comput Aided Eng 19(1):81–91
Article Google Scholar
Tao H, Chongmin L, Zain JM, Abdalla AN (2014) Robust image watermarking theories and techniques: a review. J Appl Res Technol 12(1):122–138
Article Google Scholar
Tayan O, Kabir MN, Alginahi YM (2014) A hybrid digital-signature and zero-watermarking approach for authentication and protection of sensitive electronic documents. Sci World J 8:1–15
Article Google Scholar
Tsai MJ, Liu J (2013) Digital forensics for printed source identification. In: Proceedings of IEEE international symposium on circuits and systems, Melbourne, Australia, pp 2347–2350
Tsai MJ, Yuadi I (2018) Digital forensics of microscopic images for printed source identification. Multimed Tools Appl 77(7):8729–8758
Article Google Scholar
Tsai MJ, Liu J, Wang CS, Chuang CH (2011) Source color laser printer identification using discrete wavelet transform and feature selection algorithms. In: Proceedings of IEEE international symposium on circuits and systems, Rio de Janeiro, Brazil, pp 2633–2636
Van BJ, Shafait F, Breuel TM (2009) Resolution independent skew and orientation detection for document images. In: Proceedings of SPIE-IS&T document recognition and retrieval, electronic imaging, San Jose, CA, pp 1–8
Van BJ, Shafait F, Breuel TM (2013a) Text-line examination for document forgery detection. Int J Doc Anal Recognit 16(2):189–207
Article Google Scholar
Van BJ, Shafait F, Breuel TM (2013b) Automatic authentication of color laser print-outs using machine identification codes. Pattern Anal Appl 16(4):663–678
Article MathSciNet Google Scholar
Vapnik V (1995) The nature of statistical learning theory. Springer, New York. Google Scholar. Accessed on 15 July 2015
Vinay A, Kumar CA, Shenoy GR, Murthy KB, Natarajan S (2015) ORB-PCA based feature extraction technique for face recognition. Procedia Comput Sci 58:614–621
Article Google Scholar
Wu Y, Kong X, You XG, Guo Y (2009) Printer forensics based on page document’s geometric distortion. In: Proceedings of 16th IEEE international conference on image processing, Cairo, Egypt, pp 2909–2912

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, Telangana, India
Surbhi Gupta
Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India
Munish Kumar

Authors

Surbhi Gupta
View author publications
You can also search for this author in PubMed Google Scholar
Munish Kumar
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Munish Kumar.

Ethics declarations

Conflict of interest

Authors have no conflicts of interest in this work.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gupta, S., Kumar, M. Forensic document examination system using boosting and bagging methodologies. Soft Comput 24, 5409–5426 (2020). https://doi.org/10.1007/s00500-019-04297-5

Download citation

Published: 14 August 2019
Issue Date: April 2020
DOI: https://doi.org/10.1007/s00500-019-04297-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Forensic document examination system using boosting and bagging methodologies

Abstract

Access this article

Similar content being viewed by others

A computational approach for printed document forensics using SURF and ORB features

Bagging- and Boosting-Based Latent Fingerprint Image Classification and Segmentation

Fingerprint liveness detection using local quality features

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Forensic document examination system using boosting and bagging methodologies

Abstract

Access this article

Similar content being viewed by others

A computational approach for printed document forensics using SURF and ORB features

Bagging- and Boosting-Based Latent Fingerprint Image Classification and Segmentation

Fingerprint liveness detection using local quality features

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation