Digital Forensic Source Camera Identification with Efficient Feature Selection Using Filter, Wrapper and Hybrid Approaches

Sameer, Venkata Udaya; Sugumaran, S.; Naskar, Ruchira

doi:10.1007/978-3-319-49806-5_22

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10063)

Included in the following conference series:

International Conference on Information Systems Security

Abstract

Digital forensics is the branch of science that deals with the investigation of evidence recovered from digital devices, in order to safeguard against the rapidly increasing cyber crimes of today’s digital world. The Source Camera Identification (SCI) problem is to map an image under question correctly to its source device. Following a digital forensic approach, the source of an image is detected by a posteriori investigation of the traces left behind in the image by the camera. Such traces are generated by the post-processing operations that an image undergoes inside a digital camera after being captured. In this paper, we model the SCI problem as a machine learning classification problem and focus on the most crucial component of a learning model, i.e. feature selection. We propose three different techniques for feature selection: a Filter based approach, a Wrapper based approach using a Genetic Algorithm (GA), and a hybrid approach that combines the Filter and Wrapper methods. We investigate the source detection accuracy that each technique achieves. Our experimental results suggest that the proposed methods produce a much more compact feature set, and hence considerably improve the source detection accuracy and reduce the training time of the learning model, as compared to the state of the art.

References

1. Celiktutan, O., Sankur, B., Avcibas, I.: Blind identification of source cell-phone model. IEEE Trans. Inf. Forensics Secur. 3(3), 553–566 (2008)
2. Bayram, S., Sencar, H.T., Memon, N.: Improvements on source camera-model identification based on CFA interpolation. In: Proceeding of WG (2006)
3. Kharrazi, M., Sencar, H.T., Memon, N.: Blind source camera identification. In: International Conference on Image Processing (ICIP) (2004)
4. Tsai, M.-J.: Adaptive feature selection for digital camera source identification. In: IEEE International Symposium on Circuits, Systems, pp. 412–415 (2008)
5. Tsai, M.-J.: A hybrid model for digital camera source identification. In: IEEE International Conference on Image Processing (ICIP), pp. 2901–2904 (2009)
6. Lukas, J.: Digital camera identification from sensor pattern noise. IEEE Trans. Inf. Forensics Secur. 1(2), 205–214 (2006)
7. Li, C.-T.: Digital camera identification from sensor pattern noise. IEEE Trans. Inf. Forensics Secur. 5(2), 280–287 (2010)
8. Lin, X., Li, C.-T.: Preprocessing reference sensor pattern noise via spectrum equalization. IEEE Trans. Inf. Forensics Secur. 11(1), 126–140 (2016)
9. Biney, A.G., Sellahewa, H.: Analysis of smartphone model identification using digital images. In: International Conference on Image Processing (ICIP) (2013)
10. Bayram, S., Avcibas, I., Sankur, B., Memon, N.: Image manipulation detection. J. Electronic Imaging 15(4), 041102 (2006). International Society for Optics and Photonics
11. Avcibas, I., Sankur, B., Memon, N.: Image steganalysis with binary similarity measures. In: International Conference on Image Processing (ICIP), vol. 3 (2002)
12. Avcibas, I., Memon, N., Sankur, B.: Steganalysis using image quality metrics. IEEE Trans. Image Process. 12(2), 221–229 (2003)
13. Lyu, S., Farid, H.: Steganalysis using higher-order image statistics. IEEE Trans. Inf. Forensics Secur. 1(1), 111–119 (2006)
14. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
15. Schaffernicht, E., Gross, H.M.: Weighted mutual information for feature selection. In: International Conference on Artificial Neural Networks (2011)
16. Van Hulse, J., Khoshgoftaar, T.M., Napolitano, A., Wald, R.: Threshold-based feature selection techniques for high-dimensional bioinformatics data. Network Modeling Anal. Health Inform. Bioinform. 1(1), 47–61 (2012)
17. Liu, D., Cho, S.Y., Sun, D.M., Qiu, Z.D.: A Spearman correlation coefficient ranking for matching-score fusion on speaker recognition. In: TENCON (2010)
18. Yuan, C., Sun, D., Liu, D., Cho, S.Y., Zhang, Y.: A research on feature selection and fusion in palmprint recognition. In: International Workshop on Emerging Techniques and Challenges for Hand-Based Biometrics (ETCHB) (2010)
19. Onpans, J., Rasmequan, S., Jantarakongkul, B., Chinnasarn, K., Rodtook, A.: Intrusion feature selection using modified heuristic greedy algorithm of itemset. In: International Symposium on Communications and Information Technologies (ISCIT) (2013)
20. Rachburee, N., Punlumjeak, W.: A comparison of feature selection approach between Greedy, IG-ratio, Chi-square, and mRMR in educational mining. In: International Conference on Information Technology and Electrical Engineering (ICITEE) (2015)
21. Bhasin, V., Bedi, P., Singhal, A.: Feature selection for steganalysis based on modified stochastic diffusion search using Fisher score. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI), September 2014
22. Singh, B., Sankhwar, J.S., Vyas, O.P.: Optimization of feature selection method for high dimensional data using Fisher score and minimum spanning tree. In: INDICON, December 2014
23. Xu, J., Yin, Y., Man, H., He, H.: Feature selection based on sparse imputation. In: International Joint Conference on Neural Networks (IJCNN), June 2012
24. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97(1), 273–324 (1997)
25. Chen, Y.-H., Lin, T.-C.: Dimension reduction techniques for accessing Chinese readability. In: International Conference on Machine Learning and Cybernetics, July 2014
26. Packianather, M.S., Kapoor, B.: A wrapper-based feature selection approach using bees algorithm for a wood defect classification system. In: System of Systems Engineering Conference (2015)
27. Yu, E., Cho, S.: GA-SVM wrapper approach for feature subset selection in keystroke dynamics identity verification. In: Proceedings of the International Joint Conference on Neural Networks (2003)
28. Talukder, K.H., Harada, K.: Haar wavelet based approach for image compression and quality assessment of compressed image. Int. J. Appl. Math. 36(1) (2007)
29. Gunawan, I.P., Halim, A.: Haar wavelet decomposition based blockiness detector and picture quality assessment method for JPEG images. In: International Conference on Advanced Computer Science and Information System (2011)
30. Gloe, T., Böhme, R.: ‘Dresden image database’ for benchmarking digital image forensics. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2007)
31. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
32. Ng, A.: CS229 Lecture Notes. Stanford (2000)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Rourkela, 769008, Orissa, India
Venkata Udaya Sameer, S. Sugumaran & Ruchira Naskar

Corresponding author

Correspondence to Venkata Udaya Sameer.

Editor information

Editors and Affiliations

Colorado State University, Fort Collins, Colorado, USA
Indrajit Ray
Malaviya National Institute of Technology, Jaipur, India
Manoj Singh Gaur
University of Padua, Padua, Italy
Mauro Conti
IIIT Delhi, Delhi, India
Dheeraj Sanghi
IIT Madras, Madras, India
V. Kamakoti

A Appendix: Statistical Measures Used as Feature Filters

The Chi-Squared test is a statistical method that measures the independence of two variables. In feature selection, chi-squared is used to check whether the class variable is independent of a feature. If $O_{ij}$ is the observed frequency and $E_{ij}$ is the expected frequency of feature value i occurring together with class j, then chi-squared [19, 20] is defined as
$$\begin{aligned} \chi ^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \end{aligned}$$
(5)

$$\begin{aligned} E_{ij} = \frac{(R_{T_i})(C_{T_j})}{N} \end{aligned}$$
(6)
where $R_{T_i}$ is the number of samples with the i-th feature value, $C_{T_j}$ is the number of samples in class j, and N is the total number of samples.
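As a rough illustration of Eqs. (5) and (6), the sketch below computes the chi-squared statistic for a single feature against the class labels. It assumes the feature is discrete (or has already been binned); the function name `chi_squared` and the toy data are hypothetical and not taken from the paper.

```python
from collections import Counter


def chi_squared(feature_values, class_labels):
    """Chi-squared statistic of Eqs. (5)-(6) for one discrete feature."""
    n = len(feature_values)                                 # N
    row_totals = Counter(feature_values)                    # R_{T_i}
    col_totals = Counter(class_labels)                      # C_{T_j}
    observed = Counter(zip(feature_values, class_labels))   # O_ij

    chi2 = 0.0
    for i, r_total in row_totals.items():
        for j, c_total in col_totals.items():
            expected = r_total * c_total / n                # E_ij, Eq. (6)
            # Counter returns 0 for (i, j) pairs that never occur.
            chi2 += (observed[(i, j)] - expected) ** 2 / expected
    return chi2


# A feature that determines the class scores high; an independent one scores 0.
dependent = chi_squared([0, 0, 1, 1], ["a", "a", "b", "b"])
independent = chi_squared([0, 1, 0, 1], ["a", "a", "b", "b"])
```

A larger value suggests that the feature and the class are less likely to be independent, so such a feature would be ranked higher by the filter.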
The Mutual Information [15] method measures how much knowing a variable (feature) reduces the uncertainty about the target variable (class). Features are selected so as to maximize the mutual information between their joint distribution and the target class variable, which is useful in datasets with many features.
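The appendix gives no formula for this filter, so the following sketch uses the standard textbook definition $I(X;Y) = \sum_{x,y} p(x,y)\log_2 \frac{p(x,y)}{p(x)p(y)}$ as an assumption, estimated from empirical frequencies of a discrete feature. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_values, class_labels):
    """Empirical mutual information (in bits) between a discrete feature and the class."""
    n = len(feature_values)
    count_x = Counter(feature_values)
    count_y = Counter(class_labels)
    count_xy = Counter(zip(feature_values, class_labels))

    mi = 0.0
    for (x, y), c_xy in count_xy.items():
        # p(x,y) / (p(x) p(y)) simplifies to c_xy * n / (c_x * c_y).
        ratio = c_xy * n / (count_x[x] * count_y[y])
        mi += (c_xy / n) * math.log2(ratio)
    return mi


# A feature that fully determines a balanced binary class carries 1 bit.
informative = mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])
uninformative = mutual_information([0, 1, 0, 1], ["a", "a", "b", "b"])
```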
The Fisher Score measures the variance between the expected value of the information and the observed value. The information is maximized when the variance is minimized. Consider a dataset with c classes where, for feature $F_k$, class j has $n_j$ samples, mean value $\mu _j$ and variance $\sigma _j^2$, and $\mu $ is the mean value over the whole dataset. Then the Fisher score [21,22,23] $S_k$ for feature $F_k$ is defined as
$$\begin{aligned} S_k = \frac{\sum _{j=1}^{c}n_j(\mu _j-\mu )^2}{\sum _{j=1}^{c}n_j\sigma _j^2} \end{aligned}$$
(7)
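A minimal sketch of Eq. (7) for one feature follows, reading both sums as running over the c classes. It assumes the per-class variance is the population variance (divide by $n_j$), which the appendix does not specify; the function name and data are hypothetical.

```python
def fisher_score(feature_values, class_labels):
    """Fisher score of Eq. (7): between-class scatter over within-class scatter."""
    n = len(feature_values)
    overall_mean = sum(feature_values) / n          # mu

    # Group the feature values by class label.
    groups = {}
    for value, label in zip(feature_values, class_labels):
        groups.setdefault(label, []).append(value)

    between = 0.0
    within = 0.0
    for values in groups.values():
        n_j = len(values)
        mu_j = sum(values) / n_j
        # Assumption: population variance (divide by n_j, not n_j - 1).
        var_j = sum((v - mu_j) ** 2 for v in values) / n_j
        between += n_j * (mu_j - overall_mean) ** 2
        within += n_j * var_j

    # Zero within-class variance: return infinity rather than divide by zero.
    return float("inf") if within == 0 else between / within


score = fisher_score([1.0, 3.0, 5.0, 7.0], ["a", "a", "b", "b"])
```

In this toy case the class means are 2 and 6 around an overall mean of 4, and each class has variance 1, so the score is 16 / 4 = 4.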
The Pearson Correlation Coefficient is a statistical measure of the strength of the linear correlation between two variables. It is computed by dividing the covariance of the two variables by the product of their standard deviations. The Pearson correlation coefficient [14] is defined as
$$\begin{aligned} R = \frac{cov(X,Y)}{ \sqrt{var(X) var(Y)}} \end{aligned}$$
(8)
where cov denotes the covariance and var the variance. Therefore, for m samples $(x_k, y_k)$ with sample means $\bar{x}$ and $\bar{y}$,
$$\begin{aligned} R = \frac{\sum _{k=1}^{m}(x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum _{k=1}^{m}(x_k-\bar{x})^{2} \sum _{k=1}^{m}(y_k-\bar{y})^{2}}} \end{aligned}$$
(9)
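Eq. (9) translates almost directly into code. The sketch below is an illustration only; it assumes the class variable has been encoded numerically, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of Eq. (9) for two equal-length sequences."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return numerator / denominator


# Perfectly linear relationships give +1 or -1.
r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])
```

When used as a filter, features would typically be ranked by the absolute value of R, since a strong negative correlation is as informative as a strong positive one.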
Kendall’s Tau rank correlation [16] is a statistical measure of the degree of similarity between the rankings of two variables. Consider n samples, with $n_c$ concordant pairs (ordered in the same way by both variables) and $n_d$ discordant pairs (ordered differently). Kendall’s Tau is defined as
$$\begin{aligned} \tau = \frac{n_c-n_d}{\frac{n(n-1)}{2}} \end{aligned}$$
(10)
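Eq. (10) can be sketched by counting concordant and discordant pairs directly. This is a quadratic-time illustration under the assumption that tied pairs count as neither concordant nor discordant; the function name is hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's Tau of Eq. (10), counting pairs in O(n^2) time."""
    n = len(xs)
    concordant = 0   # n_c
    discordant = 0   # n_d
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 is a tie and is counted in neither group (assumption).
    return (concordant - discordant) / (n * (n - 1) / 2)


# Of the 6 pairs, 5 are concordant and 1 is discordant: (5 - 1) / 6.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```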
The Spearman Correlation is a statistical measure that expresses the degree to which two variables are monotonically related. Consider n samples, where $x_i$ are the sample values of X with ranks $r(x_i)$, and $y_i$ are the values of Y (class) with ranks $r(y_i)$. The Spearman coefficient [17, 18] is calculated as
$$\begin{aligned} s(X,Y) = 1-\frac{6\sum _{i=1}^{n}(r(x_i)-r(y_i))^2}{n(n^2-1)} \end{aligned}$$
(11)
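The following sketch evaluates Eq. (11) from simple ordinal ranks. It assumes there are no tied values, which is the case in which this closed form is exact; the helper `ordinal_ranks` and the toy data are hypothetical.

```python
def ordinal_ranks(values):
    """Rank values from 1..n; assumes no ties (ties would be broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman(xs, ys):
    """Spearman coefficient of Eq. (11) from squared rank differences."""
    n = len(xs)
    rx = ordinal_ranks(xs)
    ry = ordinal_ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Ranks differ by 1 in two positions: 1 - 6*2 / (4*15) = 0.8.
rho = spearman([10, 20, 30, 40], [1, 3, 2, 4])
```

Class labels usually contain many ties, so a practical implementation would likely need average ranks or a tie-corrected formula instead.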
The above filters are applied in this paper to a feature set of 598 features, as discussed in Sect. 3.2. Tables 1 and 2 show the performance of the above filters with respect to accuracy and F-Score.
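One plausible way to turn any of these measures into a filter is to score each feature column against the class labels and keep the k best. The sketch below shows that idea only; the cut-off `k`, the helper names, and the toy scoring function `mean_gap` (a stand-in for a real filter statistic) are all hypothetical and are not the selection procedure reported in the paper.

```python
def mean_gap(values, labels):
    """Toy stand-in score: absolute gap between the means of classes 0 and 1."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def select_top_k(feature_columns, class_labels, score_fn, k):
    """Return the indices of the k highest-scoring feature columns."""
    scores = [score_fn(column, class_labels) for column in feature_columns]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Three feature columns over four samples with binary class labels.
columns = [
    [1, 1, 1, 1],   # constant: gap 0
    [0, 0, 5, 5],   # separates the classes well: gap 5
    [0, 1, 2, 3],   # weaker separation: gap 2
]
labels = [0, 0, 1, 1]
selected = select_top_k(columns, labels, mean_gap, k=2)
```

Any scoring function with the same `(values, labels)` signature could be passed in place of `mean_gap`.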

Cite this paper

Sameer, V.U., Sugumaran, S., Naskar, R. (2016). Digital Forensic Source Camera Identification with Efficient Feature Selection Using Filter, Wrapper and Hybrid Approaches. In: Ray, I., Gaur, M., Conti, M., Sanghi, D., Kamakoti, V. (eds) Information Systems Security. ICISS 2016. Lecture Notes in Computer Science, vol 10063. Springer, Cham. https://doi.org/10.1007/978-3-319-49806-5_22

DOI: https://doi.org/10.1007/978-3-319-49806-5_22
Published: 24 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49805-8
Online ISBN: 978-3-319-49806-5
eBook Packages: Computer Science, Computer Science (R0)
