Abstract
Digital forensics is the branch of science that deals with the investigation of evidence recovered from digital devices, to safeguard against the rapidly increasing number of cyber crimes in today’s digital world. The Source Camera Identification (SCI) problem is to map a questioned image correctly to its source device. Following a digital forensic approach, the source of an image is detected by a posteriori investigation of the traces that the camera leaves behind in the image. Such traces are generated by the post-processing operations an image undergoes inside a digital camera after being captured. In this paper, we model the SCI problem as a machine learning classification problem and focus on the most crucial component of a learning model, i.e. feature selection. We propose three different techniques for feature selection: a filter-based approach, a wrapper-based approach using a Genetic Algorithm (GA), and a hybrid approach that combines the filter and wrapper methods. We investigate the source detection accuracy that each technique achieves. Our experimental results suggest that the proposed methods produce a much more compact feature set, which considerably improves the source detection accuracy and reduces the training time of the learning model compared to the state of the art.
References
1. Celiktutan, O., Sankur, B., Avcibas, I.: Blind identification of source cell-phone model. IEEE Trans. Inf. Forensics Secur. 3(3), 553–566 (2008)
2. Bayram, S., Sencar, H.T., Memon, N.: Improvements on source camera-model identification based on CFA interpolation. In: Proceedings of WG (2006)
3. Kharrazi, M., Sencar, H.T., Memon, N.: Blind source camera identification. In: International Conference on Image Processing (ICIP) (2004)
4. Tsai, M.-J.: Adaptive feature selection for digital camera source identification. In: IEEE International Symposium on Circuits and Systems, pp. 412–415 (2008)
5. Tsai, M.-J.: A hybrid model for digital camera source identification. In: IEEE International Conference on Image Processing (ICIP), pp. 2901–2904 (2009)
6. Lukas, J.: Digital camera identification from sensor pattern noise. IEEE Trans. Inf. Forensics Secur. 1(2), 205–214 (2006)
7. Li, C.-T.: Digital camera identification from sensor pattern noise. IEEE Trans. Inf. Forensics Secur. 5(2), 280–287 (2010)
8. Lin, X., Li, C.-T.: Preprocessing reference sensor pattern noise via spectrum equalization. IEEE Trans. Inf. Forensics Secur. 11(1), 126–140 (2016)
9. Biney, A.G., Sellahewa, H.: Analysis of smartphone model identification using digital images. In: International Conference on Image Processing (ICIP) (2013)
10. Bayram, S., Avcibas, I., Sankur, B., Memon, N.: Image manipulation detection. J. Electronic Imaging 15(4), 041102 (2006)
11. Avcibas, I., Sankur, B., Memon, N.: Image steganalysis with binary similarity measures. In: International Conference on Image Processing (ICIP), vol. 3 (2002)
12. Avcibas, I., Memon, N., Sankur, B.: Steganalysis using image quality metrics. IEEE Trans. Image Process. 12(2), 221–229 (2003)
13. Lyu, S., Farid, H.: Steganalysis using higher-order image statistics. IEEE Trans. Inf. Forensics Secur. 1(1), 111–119 (2006)
14. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
15. Schaffernicht, E., Gross, H.M.: Weighted mutual information for feature selection. In: International Conference on Artificial Neural Networks (2011)
16. Van Hulse, J., Khoshgoftaar, T.M., Napolitano, A., Wald, R.: Threshold-based feature selection techniques for high-dimensional bioinformatics data. Network Modeling Anal. Health Inform. Bioinform. 1(1), 47–61 (2012)
17. Liu, D., Cho, S.Y., Sun, D.M., Qiu, Z.D.: A Spearman correlation coefficient ranking for matching-score fusion on speaker recognition. In: TENCON (2010)
18. Yuan, C., Sun, D., Liu, D., Cho, S.Y., Zhang, Y.: A research on feature selection and fusion in palmprint recognition. In: International Workshop on Emerging Techniques and Challenges for Hand-Based Biometrics (ETCHB) (2010)
19. Onpans, J., Rasmequan, S., Jantarakongkul, B., Chinnasarn, K., Rodtook, A.: Intrusion feature selection using modified heuristic greedy algorithm of itemset. In: International Symposium on Communications and Information Technologies (ISCIT) (2013)
20. Rachburee, N., Punlumjeak, W.: A comparison of feature selection approach between Greedy, IG-ratio, Chi-square, and mRMR in educational mining. In: International Conference on Information Technology and Electrical Engineering (ICITEE) (2015)
21. Bhasin, V., Bedi, P., Singhal, A.: Feature selection for steganalysis based on modified stochastic diffusion search using Fisher score. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI), September 2014
22. Singh, B., Sankhwar, J.S., Vyas, O.P.: Optimization of feature selection method for high dimensional data using Fisher score and minimum spanning tree. In: INDICON, December 2014
23. Xu, J., Yin, Y., Man, H., He, H.: Feature selection based on sparse imputation. In: International Joint Conference on Neural Networks (IJCNN), June 2012
24. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97(1), 273–324 (1997)
25. Chen, Y.-H., Lin, T.-C.: Dimension reduction techniques for accessing Chinese readability. In: International Conference on Machine Learning and Cybernetics, July 2014
26. Packianather, M.S., Kapoor, B.: A wrapper-based feature selection approach using bees algorithm for a wood defect classification system. In: System of Systems Engineering Conference (2015)
27. Yu, E., Cho, S.: GA-SVM wrapper approach for feature subset selection in keystroke dynamics identity verification. In: Proceedings of the International Joint Conference on Neural Networks (2003)
28. Talukder, K.H., Harada, K.: Haar wavelet based approach for image compression and quality assessment of compressed image. Int. J. Appl. Math. 36(1) (2007)
29. Gunawan, I.P., Halim, A.: Haar wavelet decomposition based blockiness detector and picture quality assessment method for JPEG images. In: International Conference on Advanced Computer Science and Information System (2011)
30. Gloe, T., Böhme, R.: ‘Dresden image database’ for benchmarking digital image forensics. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2007)
31. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
32. Ng, A.: CS229 lecture notes. Stanford (2000)
Appendix A: Statistical Measures Used as Feature Filters
The chi-squared (\(\chi^2\)) test is a statistical method that measures the independence of two variables. In feature selection, it is used to check whether the class variable is independent of a feature: the larger the statistic, the stronger the evidence that the feature and the class are dependent. Let \(O_{ij}\) be the observed frequency and \(E_{ij}\) the expected frequency of samples that take the \(i\)th value of the feature and belong to class \(j\); then the chi-squared statistic [19, 20] is defined as

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \tag{5}$$

$$E_{ij} = \frac{R_{T_i}\, C_{T_j}}{N} \tag{6}$$

where \(R_{T_i}\) is the number of samples taking the \(i\)th value of the feature, \(C_{T_j}\) is the number of samples in class \(j\), and \(N\) is the total number of samples.
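As a minimal sketch (not the authors' implementation), Eqs. (5) and (6) can be computed in Python for one feature against the class labels. It assumes the feature has already been discretized; how continuous features would be binned is not specified here, so that step is left out. The name `chi_squared` is a hypothetical helper.

```python
from collections import Counter


def chi_squared(feature, labels):
    """Chi-squared statistic (Eqs. 5-6) between one discrete feature and the class labels."""
    n = len(feature)                           # N: total number of samples
    observed = Counter(zip(feature, labels))   # O_ij: samples with feature value i in class j
    row_totals = Counter(feature)              # R_{T_i}: samples taking feature value i
    col_totals = Counter(labels)               # C_{T_j}: samples in class j
    chi2 = 0.0
    for i, r in row_totals.items():
        for j, c in col_totals.items():
            expected = r * c / n               # E_ij = R_{T_i} * C_{T_j} / N
            # Counter returns 0 for (i, j) pairs that never occur.
            chi2 += (observed[(i, j)] - expected) ** 2 / expected
    return chi2
```

For example, `chi_squared([0, 0, 1, 1], ['a', 'a', 'b', 'b'])` returns `4.0` for a feature that tracks the class perfectly, whereas a feature independent of the class, such as `[0, 1, 0, 1]` with the same labels, gives `0.0`.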
Mutual information [15] measures the dependency between a variable and the target (class) variable, i.e. how much knowing the variable reduces the uncertainty about the class. For datasets with many features, the method selects features so as to maximize the mutual information between their joint distribution and the target class variable.
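The appendix gives no formula for mutual information, so the sketch below assumes the standard plug-in estimate \(I(X;Y)=\sum_{x,y}p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)}\) for a single discrete feature. It is a simplification: it scores one feature at a time rather than the joint distribution of a feature subset, and it computes plain (unweighted) mutual information rather than the weighted variant of [15]. The function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Plug-in estimate of I(X;Y) in bits for a discrete feature X and class labels Y."""
    n = len(feature)
    joint = Counter(zip(feature, labels))   # counts of observed (x, y) pairs
    count_x = Counter(feature)              # marginal counts of X
    count_y = Counter(labels)               # marginal counts of Y
    mi = 0.0
    # Only observed pairs are visited, so 0 * log 0 terms are implicitly zero.
    for (x, y), c in joint.items():
        p_xy = c / n
        p_x, p_y = count_x[x] / n, count_y[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi
```

A feature that determines a balanced two-class label carries 1 bit of information about it, while a feature independent of the class scores 0.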
The Fisher score measures how well a feature separates the classes: it is the ratio of the between-class variance to the within-class variance of the feature, so a feature scores high when the class means are far apart while the samples within each class are tightly clustered. Consider a dataset with \(c\) classes, where class \(j\) has \(n_j\) samples, mean \(\mu_j\) and variance \(\sigma_j^2\), and \(\mu\) is the mean over the whole dataset. Then the Fisher score [21, 22, 23] \(S_k\) of feature \(F_k\) is defined as

$$S_k = \frac{\sum_{j=1}^{c} n_j (\mu_j - \mu)^2}{\sum_{j=1}^{c} n_j \sigma_j^2} \tag{7}$$
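A sketch of Eq. (7) for a single feature, under two assumptions the appendix does not state: the per-class variances \(\sigma_j^2\) are population variances, and the within-class scatter is non-zero. Both sums run over the \(c\) classes. The name `fisher_score` is hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_score(feature, labels):
    """Fisher score S_k (Eq. 7): between-class scatter over within-class scatter."""
    by_class = defaultdict(list)
    for value, label in zip(feature, labels):
        by_class[label].append(value)
    mu = fmean(feature)  # mean of the feature over all samples
    # Numerator: sum_j n_j * (mu_j - mu)^2
    between = sum(len(v) * (fmean(v) - mu) ** 2 for v in by_class.values())
    # Denominator: sum_j n_j * sigma_j^2 (population variance; assumed non-zero)
    within = sum(len(v) * pvariance(v) for v in by_class.values())
    return between / within
```

With `[1.0, 2.0, 3.0, 7.0, 8.0, 9.0]` split into two classes of three, the class means are 2 and 8 around an overall mean of 5, giving \(54 / 4 = 13.5\); a feature whose class means coincide scores 0.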
The Pearson correlation coefficient is a statistical measure of the strength of the linear correlation between two variables. It is computed as the covariance of the two variables divided by the product of their standard deviations. The Pearson correlation coefficient [14] is defined as

$$R = \frac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}} \tag{8}$$

where cov denotes the covariance and var the variance. For \(m\) paired samples \((x_k, y_k)\) with means \(\bar{x}\) and \(\bar{y}\), this gives

$$R = \frac{\sum_{k=1}^{m}(x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum_{k=1}^{m}(x_k-\bar{x})^{2}\,\sum_{k=1}^{m}(y_k-\bar{y})^{2}}} \tag{9}$$
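Eq. (9) transcribes directly into Python, as sketched below. To correlate a feature with the class, the class labels would first have to be encoded as numbers; that encoding is an assumption, since the appendix does not state it. The name `pearson` is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient R (Eq. 9) of two equal-length numeric sequences."""
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    # Numerator of Eq. (9): sum of products of deviations from the means.
    cov = sum((xk - x_bar) * (yk - y_bar) for xk, yk in zip(x, y))
    # Denominator terms; assumes neither sequence is constant (else division by zero).
    var_x = sum((xk - x_bar) ** 2 for xk in x)
    var_y = sum((yk - y_bar) ** 2 for yk in y)
    return cov / math.sqrt(var_x * var_y)
```

For instance, `pearson([1, 2, 3, 4], [2, 4, 6, 8])` returns `1.0`, and reversing the second sequence gives `-1.0`.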
Kendall’s tau rank correlation [16] measures the degree of similarity between the orderings of two variables. Given \(n\) samples, a pair of samples is concordant if both variables order it the same way and discordant if they order it differently; let \(n_c\) and \(n_d\) be the numbers of concordant and discordant pairs. Kendall’s tau is then defined as

$$\tau = \frac{n_c - n_d}{n(n-1)/2} \tag{10}$$

where the denominator \(n(n-1)/2\) is the total number of sample pairs.
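A direct \(O(n^2)\) sketch of Eq. (10), enumerating every pair of samples. It assumes that a pair tied in either variable counts as neither concordant nor discordant, which keeps it in the denominator only (the form of Eq. (10) above); the name `kendall_tau` is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau (Eq. 10): (n_c - n_d) / (n(n-1)/2) over all sample pairs."""
    n = len(x)
    n_c = n_d = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            n_c += 1   # concordant: both variables order the pair the same way
        elif s < 0:
            n_d += 1   # discordant: the variables order the pair differently
        # s == 0 means a tie in x or y; the pair is counted in neither n_c nor n_d
    return (n_c - n_d) / (n * (n - 1) / 2)
```

For `kendall_tau([1, 2, 3], [1, 3, 2])` two of the three pairs are concordant and one is discordant, so the result is \(1/3\).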
Spearman’s rank correlation is a statistical measure that expresses the degree to which two variables are monotonically related. Consider \(n\) samples, where \(x_i\) are the values of \(X\) with ranks \(r(x_i)\), and \(y_i\) are the values of \(Y\) (the class) with ranks \(r(y_i)\). The Spearman coefficient [17, 18] is calculated as

$$s(X,Y) = 1-\frac{6\sum_{i=1}^{n}\left(r(x_i)-r(y_i)\right)^2}{n(n^2-1)} \tag{11}$$

The above filters are applied in this paper to a feature set of 598 features, as discussed in Sect. 3.2. Tables 1 and 2 show the performance of these filters in terms of accuracy and F-score.
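The sketch below implements Eq. (11) and adds a hypothetical `select_top_k` helper showing how a filter score could rank the columns of a feature matrix (for example, the 598 features mentioned above) and keep the k best. Ranking by the absolute value of the score and the choice of k are assumptions; the selection rule used in the paper is not given in this appendix. Eq. (11) is exact only when there are no ties; because class labels are heavily tied, tied values here receive their average rank, which makes the result an approximation in that case.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient s(X, Y) from Eq. (11); exact only when neither variable has ties."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def select_top_k(columns, labels, k, score=spearman):
    """Hypothetical filter step: indices of the k columns with the largest |score| vs. labels."""
    scored = sorted(((abs(score(col, labels)), idx) for idx, col in enumerate(columns)),
                    reverse=True)
    return [idx for _, idx in scored[:k]]
```

Any function with the same `(feature, labels)` signature can be passed as `score`, so the same ranking step could be reused with the other filters in this appendix.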
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Sameer, V.U., Sugumaran, S., Naskar, R. (2016). Digital Forensic Source Camera Identification with Efficient Feature Selection Using Filter, Wrapper and Hybrid Approaches. In: Ray, I., Gaur, M., Conti, M., Sanghi, D., Kamakoti, V. (eds) Information Systems Security. ICISS 2016. Lecture Notes in Computer Science, vol 10063. Springer, Cham. https://doi.org/10.1007/978-3-319-49806-5_22
DOI: https://doi.org/10.1007/978-3-319-49806-5_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49805-8
Online ISBN: 978-3-319-49806-5
eBook Packages: Computer Science; Computer Science (R0)