Abstract
Since the emergence of deep learning and its adoption in steganalysis, most reference articles have kept using small- to medium-sized CNNs and have trained them on relatively small databases. Benchmarks and comparisons between deep-learning-based steganalysis algorithms, more precisely CNNs, are thus made on small to medium databases. This is performed without knowing:
1. whether the ranking, under a criterion such as accuracy, remains the same when the database is larger,
2. whether the efficiency of CNNs collapses or not when the training database is orders of magnitude larger,
3. the minimum size required for a database or a CNN in order to obtain a better result than a random guesser.
In this paper, after a thorough discussion of the observed behaviour of CNNs as a function of their size and the database size, we confirm that the power law of the error also holds in steganalysis, and this in a borderline case, i.e. with a medium-sized network on a big, constrained and very diverse database.
Notes
1. See the paper [17], and the discussions here: https://openreview.net/forum?id=ryenvpEKDr.
2. The ResNet18 with a width of 10 is made of 300,000 parameters and lies in the interpolation-threshold region for experiments run on CIFAR-10 and CIFAR-100 in [16].
3. Too small a database could bias the analysis, since there is a region where the error increases as the dataset grows (see [16]). For example, in [23], we report that the number of images needed for the medium-sized Yedroudj-Net [25] to reach a region of good performance for spatial steganalysis (that is, the performance of a Rich Model with an Ensemble Classifier) is about 10,000 images (5,000 covers and 5,000 stegos) for the learning phase, in the case where there is no cover-source mismatch and the image size is 256 × 256 pixels.
4. The LSSD database is available at: http://www.lirmm.fr/~chaumont/LSSD.html.
5. Experiments with 10M images were disrupted by a maintenance of the platform and took 22 days. Nevertheless, we reran the experiment on a similar platform, without any disruption in learning, and the duration was only 10 days. We report both values to be more precise.
6. The initial point for the non-linear regression is set to \(a'=0.5\), \(\alpha '=0.001\) and \(c'_\infty = 0.01\), with \(c'_\infty\) forced to be positive. The Matlab function is fmincon and the stopping criterion is that the mean of the sum of squared errors is under \(10^{-6}\).
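The regression described in this note can be sketched as follows. This is a re-implementation with SciPy's `curve_fit` rather than the authors' Matlab `fmincon` setup; the model form \(e(n) = a\,n^{-\alpha} + c_\infty\) follows the power-law discussion in the paper, but the data points below are illustrative assumptions, not the paper's measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c_inf):
    """Test error as a function of the training-set size n:
    e(n) = a * n**(-alpha) + c_inf, with c_inf the irreducible error."""
    return a * n ** (-alpha) + c_inf

# Hypothetical (database size, error-probability) points, for illustration only.
n_train = np.array([1e4, 1e5, 1e6, 1e7])
error = np.array([0.45, 0.40, 0.33, 0.28])

# Initial point from Note 6: a'=0.5, alpha'=0.001, c'_inf=0.01,
# with c_inf constrained to be positive.
p0 = [0.5, 0.001, 0.01]
bounds = ([-np.inf, -np.inf, 0.0], [np.inf, np.inf, np.inf])
params, _ = curve_fit(power_law, n_train, error, p0=p0, bounds=bounds)
a, alpha, c_inf = params
print(f"a={a:.3f}, alpha={alpha:.4f}, c_inf={c_inf:.4f}")
```

Once fitted, `c_inf` estimates the asymptotic error floor that no amount of additional training data would remove, while `alpha` governs how fast the error decays with the database size.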
References
Abdulrahman, H., Chaumont, M., Montesinos, P., Magnier, B.: Color images steganalysis using RGB channel geometric transformation measures. Secur. Commun. Netw. 15, 2945–2956 (2016)
Advani, M.S., Saxe, A.M., Sompolinsky, H.: High-dimensional dynamics of generalization error in neural networks. Neural Netw. 132, 428–446 (2020)
Bas, P., Filler, T., Pevný, T.: “Break our steganographic system”: the ins and outs of organizing BOSS. In: Filler, T., Pevný, T., Craver, S., Ker, A. (eds.) IH 2011. LNCS, vol. 6958, pp. 59–70. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24178-9_5
Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Nat. Acad. Sci. 32, 15849–15854 (2019)
Boroumand, M., Chen, M., Fridrich, J.: Deep residual network for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 5, 1181–1193 (2019)
Chaumont, M.: Deep Learning in steganography and steganalysis. In: Hassaballah, M. (ed.) Digital Media Steganography: Principles, Algorithms, Advances, chap. 14, pp. 321–349. Elsevier (July 2020)
Chubachi, K.: An ensemble model using CNNs on different domains for ALASKA2 image steganalysis. In: Proceedings of the IEEE International Workshop on Information Forensics and Security, WIFS 2020. Virtual Conference due to Covid, New-York, NY, USA, (December 2020)
Cogranne, R., Giboulot, Q., Bas, P.: The ALASKA Steganalysis Challenge: a first step towards steganalysis. In: Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, IH&MMSec 2019, pp. 125–137. Paris, France (July 2019)
Cogranne, R., Giboulot, Q., Bas, P.: Steganography by minimizing statistical detectability: the cases of JPEG and color images. In: Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, IH&MMSec 2020, pp. 161–167 (June 2020)
Cogranne, R., Giboulot, Q., Bas, P.: Challenge academic research on steganalysis with realistic images. In: Proceedings of the IEEE International Workshop on Information Forensics and Security, WIFS 2020. Virtual Conference due to Covid (Formerly New-York, NY, USA) (December 2020)
Fridrich, J.: Steganography in Digital Media. Cambridge University Press, New York (2009)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp. 770–778. Las Vegas, Nevada (June 2016)
Hestness, J., et al.: Deep learning scaling is predictable, empirically. arXiv preprint arXiv:1712.00409 (2017)
Holub, V., Fridrich, J., Denemark, T.: Universal distortion function for steganography in an arbitrary domain. EURASIP J. Inf. Secur. 2014(1), 1–13 (2014). https://doi.org/10.1186/1687-417X-2014-1
Huang, J., Ni, J., Wan, L., Yan, J.: A customized convolutional neural network with low model complexity for JPEG steganalysis. In: Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, IH&MMSec 2019, pp. 198–203. Paris, France (July 2019)
Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: where bigger models and more data hurt. In: Proceedings of the Eighth International Conference on Learning Representations, ICLR 2020. Virtual Conference due to Covid (Formerly Addis Ababa, Ethiopia) (April 2020)
Rosenfeld, J.S., Rosenfeld, A., Belinkov, Y., Shavit, N.: A constructive prediction of the generalization error across scales. In: Proceedings of the Eighth International Conference on Learning Representations, ICLR 2020. Virtual Conference due to Covid (Formerly Addis Ababa, Ethiopia) (April 2020)
Ruiz, H., Yedroudj, M., Chaumont, M., Comby, F., Subsol, G.: LSSD: a controlled large JPEG image database for deep-learning-based steganalysis "into the wild". In: Proceedings of the 25th International Conference on Pattern Recognition, ICPR 2021, Workshop on MultiMedia FORensics in the WILD, MMForWILD 2021, Lecture Notes in Computer Science, LNCS, Springer. Virtual Conference due to Covid (Formerly Milan, Italy) (January 2021). http://www.lirmm.fr/~chaumont/LSSD.html
Sala, V.: Power law scaling of test error versus number of training images for deep convolutional neural networks. In: Proceedings of the multimodal sensing: technologies and applications. vol. 11059, pp. 296–300. International Society for Optics and Photonics, SPIE, Munich (2019)
Spigler, S., Geiger, M., d’Ascoli, S., Sagun, L., Biroli, G., Wyart, M.: A jamming transition from under-to over-parametrization affects generalization in deep learning. J. Phys. Math. Theor. 52(47), 474001 (2019)
Tan, M., Le, Q.: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: Proceedings of the 36th International Conference on Machine Learning, PMLR 2019, vol. 97, pp. 6105–6114. Long Beach, California, USA (June 2019)
Ye, J., Ni, J., Yi, Y.: Deep learning hierarchical representations for image steganalysis. IEEE Trans. Inf. Forensics Secur. 11, 2545–2557 (2017)
Yedroudj, M., Chaumont, M., Comby, F.: How to augment a small learning set for improving the performances of a CNN-based Steganalyzer? In: Proceedings of Media Watermarking, Security, and Forensics, MWSF 2018, Part of IS&T International Symposium on Electronic Imaging, EI 2018. p. 7. Burlingame, California, USA (28 January–2 February 2018)
Yedroudj, M., Chaumont, M., Comby, F., Oulad Amara, A., Bas, P.: Pixels-off: data-augmentation complementary solution for deep-learning steganalysis. In: Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, IH&MMSec 2020, pp. 39–48. Virtual Conference due to Covid (Formerly Denver, CO, USA) (June 2020)
Yedroudj, M., Comby, F., Chaumont, M.: Yedroudj-Net: an efficient CNN for spatial steganalysis. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, pp. 2092–2096. Calgary, Alberta, Canada (April 2018)
Yousfi, Y., Butora, J., Fridrich, J., Giboulot, Q.: Breaking ALASKA: color separation for steganalysis in JPEG domain. In: Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, IH&MMSec 2019, pp. 138–149. Paris, France (July 2019)
Yousfi, Y., Butora, J., Khvedchenya, E., Fridrich, J.: ImageNet pre-trained CNNs for JPEG steganalysis. In: Proceedings of the IEEE International Workshop on Information Forensics and Security, WIFS 2020. Virtual Conference due to Covid (Formerly New-York, NY, USA) (December 2020)
Yousfi, Y., Fridrich, J.: JPEG steganalysis detectors scalable with respect to compression quality. In: Proceedings of Media Watermarking, Security, and Forensics, MWSF 2020, Part of IS&T International Symposium on Electronic Imaging, EI 2020, p. 10. Burlingame, California, USA (January 2020)
Zeng, J., Tan, S., Liu, G., Li, B., Huang, J.: WISERNet: wider separate-then-reunion network for steganalysis of color images. IEEE Trans. Inf. Forensics Secur. 10, 2735–2748 (2019)
Acknowledgment
The authors would like to thank the French Defense Procurement Agency (DGA) for its support through the ANR Alaska project (ANR-18-ASTR-0009). We also thank IBM Montpellier and the Institute for Development and Resources in Intensive Scientific Computing (IDRISS/CNRS) for providing us access to High-Performance Computing resources.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ruiz, H., Chaumont, M., Yedroudj, M., Amara, A.O., Comby, F., Subsol, G. (2021). Analysis of the Scalability of a Deep-Learning Network for Steganography “Into the Wild”. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science(), vol 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_36
DOI: https://doi.org/10.1007/978-3-030-68780-9_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68779-3
Online ISBN: 978-3-030-68780-9
eBook Packages: Computer Science (R0)