Forensic document examination system using boosting and bagging methodologies

  • Surbhi Gupta
  • Munish KumarEmail author
Methodologies and Application


Document forgery has increased enormously due to the progression of information technology and image processing software. Critical documents are protected using watermarks or signatures, i.e., active approach. Other documents need passive approach for document forensics. Most of the passive techniques aim to detect and fix the source of the printed document. Other techniques look for the irregularities present in the document. This paper aims to fix the document source printer using passive approach. Hand-crafted features based on key printer noise features (KPNF), speeded up robust features (SURF) and oriented FAST rotated and BRIEF (ORB) are used. Then, feature-based classifiers are implemented using K-NN, decision tree, random forest and majority voting. The document classifier proposed model can efficiently classify the questioned documents to their respective printer class. Further, adaptive boosting and bootstrap aggregating methodologies are used for the improvement in classification accuracy. The proposed model has achieved the best accuracy of 95.1% using a combination of KPNF + ORB + SURF with random forest classifier and adaptive boosting methodology.


Document forensics Printer forensics KPNF SURF ORB AdaBoost Bagging 


Compliance with ethical standards

Conflict of interest

Authors have no conflicts of interest in this work.


  1. Ali GN, Mikkilineni AK, Delp EJ, Allebach JP, Chiang PJ, Chiu GT (2004) Application of principal components analysis and gaussian mixture models to printer identification. In: Proceedings of non-impact printing and digital fabrication conference, Salt Lake City, Utah, vol 1, pp 301–305Google Scholar
  2. Amer M, Goldstein M (2012) Nearest-neighbor and clustering based anomaly detection algorithms for Rapidminer. In: Proceedings of 3rd Rapidminer community meeting and conference, Aachen, Germany, pp 1–12Google Scholar
  3. Bayram S, Sencar H, Memon N, Avcibas I (2005) Source camera identification based on CFA interpolation. In: Proceedings of international conference on image processing, Genova, Italy, vol 3, pp 69–78Google Scholar
  4. Bayram S, Sencar HT, Memon N (2008) Classification of digital camera-models based on demosaicing artifacts. Digit Investig 5(1):49–59Google Scholar
  5. Bertrand R, Gomez-Kramer P, Terrades OR, Franco P, Ogier JM (2013) A system based on intrinsic features for fraudulent document detection. In: Proceedings of 12th international conference on document analysis and recognition, Washington, DC, pp 6–110Google Scholar
  6. Bianchi T, Piva A (2013) Secure watermarking for multimedia content protection: a review of its benefits and open issues. IEEE Signal Process Mag 30(2):87–96Google Scholar
  7. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140zbMATHGoogle Scholar
  8. Breiman L (2001) Random forests. Mach Learn 45(1):5–32zbMATHGoogle Scholar
  9. Bulan O, Mao J, Sharma G (2009) Geometric distortion signatures for printer identification. In: Proceedings of IEEE international conference on acoustics, speech and signal processing, Taipei, Taiwan, pp 1401–1404Google Scholar
  10. Cedillo-Hernandez M, Garcia-Ugalde F, Nakano-Miyatake M, Perez-Meana H (2013) Robust object-based watermarking using SURF feature matching and DFT domain. Radio Eng 22(4):1057–1071Google Scholar
  11. Cestnik B, Kononenko I, Bratko I (1987) Assistant 86: a knowledge elicitation tool for sophisticated users. In: Proceedings of 2nd European working session on learning, Bled, Yugoslavia, pp 31–45Google Scholar
  12. Chen E (2015) Choosing a machine learning classifier. Accessed 13 March 2016
  13. Choi JH, Im DH, Lee HY, Oh JT, Ryu JH, Lee HK, (2009) Color laser printer identification by analyzing statistical features on discrete wavelet transform. In: Proceedings of 16th IEEE international conference on image processing, Cairo, Egypt, pp 1505–1508Google Scholar
  14. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27zbMATHGoogle Scholar
  15. Cox IJ, Miller ML, Bloom JA (2000) Watermarking applications and their properties. In: Proceedings of international conference on information technology: coding and computing, Las Vegas, Nevada, pp 6–10Google Scholar
  16. Dudoit S, Fridlyand J (2003) Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19(9):1090–1099Google Scholar
  17. Elkasrawi S, Shafait F (2014) Printer identification using supervised learning for document forgery detection. In: Proceedings of 11th IAPR international workshop on document analysis systems, France, pp 146–150Google Scholar
  18. Ferreira A, Navarro LC, Pinheiro G, dos Santos JA, Rocha A (2015) Laser printer attribution: exploring new features and beyond. Forensic Sci Int 247:105–125Google Scholar
  19. Foody GM, McCulloch MB, Yates WB (1995) The effect of training set size and composition on artificial neural network classification. Int J Remote Sens 16(9):1707–1723Google Scholar
  20. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Proceedings of international conference on machine learning, vol 96, pp 148–156Google Scholar
  21. Fu YR, Yang SY (2012) CCS-LTP for Printer Identification based on Texture Analysis. Int J Digit Content Technol Appl 6(13):250–264Google Scholar
  22. Gebhardt J, Goldstein M, Shafait F, Dengel A (2013) Document authentication using printing technique features and unsupervised anomaly detection. In: Proceedings of 12th international conference on document analysis and recognition, Washington, DC, pp 479–483Google Scholar
  23. Jensen FV (1996) An introduction to bayesian networks, vol 210. UCL Press, London, pp 22–25Google Scholar
  24. Jiang W, Ho AT, Treharne H, Shi YQ (2010) A novel multi-size block Benford’s law scheme for printer identification. In: Proceedings of Pacific-Rim conference on multimedia, Shanghai, China, pp 643–652Google Scholar
  25. Joshi S, Khanna N (2017) Single classifier-based passive system for source printer classification using local texture features. IEEE Trans Inf Forensics Secur 13(7):1603–1614Google Scholar
  26. Kee E, Farid H (2008) Printer profiling for forensics and ballistics. In: Proceedings of 10th ACM workshop on multimedia and security, Oxford, pp 3–10Google Scholar
  27. Khanna N, Mikkilineni AK, Chiu GTC, Allebach JP, Delp EJ (2007) Scanner identification using sensor pattern noise. In: Proceedings of security, steganography, and watermarking of multimedia contents, electronic imaging, San Jose, CAGoogle Scholar
  28. Kotsiantis SB, Zaharakis I, Pintelas P (2007) Supervised machine learning: a review of classification techniques. In: Proceedings of conference on emerging artificial intelligence applications in computer engineering: real word AI systems with applications in eHealth, HCI, information retrieval and pervasive technologies, pp 3–24Google Scholar
  29. Kumar M, Jindal SR, Jindal MK, Lehal GS (2018) Improved recognition results of medieval handwritten Gurmukhi manuscripts using boosting and bagging methodologies. Neural Process Lett 1:1–14. Google Scholar
  30. Lampert CH, Mei L, Breuel TM (2006) Printing technique classification for document counterfeit detection. In: Proceedings of international conference on computational intelligence and security, Guangzhou, China, vol 1, pp 639–644Google Scholar
  31. Li Z, Jiang W, Kenzhebalin D, Gokan A, Allebach J (2018) Intrinsic signatures for forensic identification of SOHO inkjet printers. NIP Digit Fabr Confer 1:231–236Google Scholar
  32. Mikkilineni AK, Chiang PJ, Ali GN, Chiu GTC, Allebach JP, Delp EJ (2004) Printer identification based on texture features. In: Proceedings of non-impact printing and digital fabrication conference, society for imaging science and technology, Salt Lake City, Utah, vol 1, pp 306–311Google Scholar
  33. Mikkilineni AK, Chiang PJ, Ali GN, Chiu GTC, Allebach JP, Delp EJ (2005a) Printer identification based on graylevel co-occurrence features for security and forensic applications. In: Proceedings of security, steganography, and watermarking of multimedia contents, electronic imaging, California, pp 430–440Google Scholar
  34. Mikkilineni AK, Khanna N, Delp EJ (2011) Forensic printer detection using intrinsic signatures. Media Forensics Secur 7880:78800–78805Google Scholar
  35. Peng CYJ, Lee KL, Ingersoll GM (2002) An introduction to logistic regression analysis and reporting. J Educ Res 96(1):3–14Google Scholar
  36. Pereira F, Mitchell T, Botvinick M (2009) Machine learning classifiers and MRI: a tutorial overview. Neuroimage 45(1):S199–S209Google Scholar
  37. Phillips IT (1996) User’s reference manual for the UW English/technical document image database III. UW-III English/technical document image database manualGoogle Scholar
  38. Rojas R (2009) AdaBoost and the super bowl of classifiers a tutorial introduction to adaptive boosting. Freie University, Berlin, Technical ReportGoogle Scholar
  39. Ryu SJ, Lee HY, Cho IW, Lee HK (2008) Document forgery detection with SVM classifier and image quality measures. Adv Multimed Inf Process 2008:486–495Google Scholar
  40. Schreyer M, Schulze C, Stahl A, Effelsberg W (2009) Intelligent printing technique recognition and photocopy detection for forensic document examination. Informatiktage 8:39–42Google Scholar
  41. Schulze C, Schreyer M, Stahl A, Breuel T (2008) Evaluation of graylevel-features for printing technique classification in high-throughput document management systems. Comput Forensics 28:35–46Google Scholar
  42. Smith R (2007) An overview of the Tesseract OCR engine. In: Proceedings of 9th international conference on document analysis and recognition, Beijing, China, vol 2, pp 629–633Google Scholar
  43. Subramanya SR, Yi BK (2006) Digital Signatures. IEEE Potentials 25(2):5–8Google Scholar
  44. Swain PH, Hauska H (1977) The decision tree classifier: design and potential. IEEE Trans Geosci Electron 15(3):142–147Google Scholar
  45. Tao H, Zain JM, Ahmed MM, Abdalla AN, Jing W (2012) A wavelet-based particle swarm optimization algorithm for digital image watermarking. Integr Comput Aided Eng 19(1):81–91Google Scholar
  46. Tao H, Chongmin L, Zain JM, Abdalla AN (2014) Robust image watermarking theories and techniques: a review. J Appl Res Technol 12(1):122–138Google Scholar
  47. Tayan O, Kabir MN, Alginahi YM (2014) A hybrid digital-signature and zero-watermarking approach for authentication and protection of sensitive electronic documents. Sci World J 8:1–15Google Scholar
  48. Tsai MJ, Liu J (2013) Digital forensics for printed source identification. In: Proceedings of IEEE international symposium on circuits and systems, Melbourne, Australia, pp 2347–2350Google Scholar
  49. Tsai MJ, Yuadi I (2018) Digital forensics of microscopic images for printed source identification. Multimed Tools Appl 77(7):8729–8758Google Scholar
  50. Tsai MJ, Liu J, Wang CS, Chuang CH (2011) Source color laser printer identification using discrete wavelet transform and feature selection algorithms. In: Proceedings of IEEE international symposium on circuits and systems, Rio de Janeiro, Brazil, pp 2633–2636Google Scholar
  51. Van BJ, Shafait F, Breuel TM (2009) Resolution independent skew and orientation detection for document images. In: Proceedings of SPIE-IS&T document recognition and retrieval, electronic imaging, San Jose, CA, pp 1–8Google Scholar
  52. Van BJ, Shafait F, Breuel TM (2013a) Text-line examination for document forgery detection. Int J Doc Anal Recognit 16(2):189–207Google Scholar
  53. Van BJ, Shafait F, Breuel TM (2013b) Automatic authentication of color laser print-outs using machine identification codes. Pattern Anal Appl 16(4):663–678MathSciNetGoogle Scholar
  54. Vapnik V (1995) The nature of statistical learning theory. Springer, New York. Google Scholar. Accessed on 15 July 2015Google Scholar
  55. Vinay A, Kumar CA, Shenoy GR, Murthy KB, Natarajan S (2015) ORB-PCA based feature extraction technique for face recognition. Procedia Comput Sci 58:614–621Google Scholar
  56. Wu Y, Kong X, You XG, Guo Y (2009) Printer forensics based on page document’s geometric distortion. In: Proceedings of 16th IEEE international conference on image processing, Cairo, Egypt, pp 2909–2912Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Department of Computer Science and EngineeringGokaraju Rangaraju Institute of Engineering and TechnologyHyderabadIndia
  2. 2.Department of Computational SciencesMaharaja Ranjit Singh Punjab Technical UniversityBathindaIndia

Personalised recommendations