LSSD: A Controlled Large JPEG Image Database for Deep-Learning-Based Steganalysis “Into the Wild”

  • Conference paper
Pattern Recognition. ICPR International Workshops and Challenges (ICPR 2021)

Abstract

For many years, the image databases used in steganalysis have been relatively small, i.e. about ten thousand images. This limits the diversity of images and thus prevents large-scale analysis of steganalysis algorithms.

In this paper, we describe a large JPEG database composed of 2 million colour and grey-scale images. This database, named LSSD for Large Scale Steganalysis Database, was obtained thanks to the intensive use of “controlled” development procedures. LSSD has been made publicly available, and we hope that the steganalysis community will use it for large-scale experiments.

We introduce the pipeline used to build the various versions of the image database. We detail the general methodology, which can be used to redevelop the entire database and further increase its diversity. We also discuss the computational and storage costs of developing the images.

Notes

  1. Website of the ALASKA challenge#2: https://alaska.utt.fr/ Download page: http://alaska.utt.fr/ALASKA_v2_RAWs_scripts.zip.

  2. Challenge BOSS: http://agents.fel.cvut.cz/boss/index.php?mode=VIEW&tmpl=about Download page: ftp://mas22.felk.cvut.cz/RAWs .

  3. Obsolete download link: http://mmlab.science.unitn.it/RAISE/.

  4. http://forensics.inf.tu-dresden.de/ddimgdb Download page: http://forensics.inf.tu-dresden.de/ddimgdb/selections .

  5. Site closed on February 17, 2020: http://wesaturate.com/.

  6. https://data.csafe.iastate.edu/StegoDatabase/.

  7. Documentation: https://pillow.readthedocs.io/en/stable/.

  8. Software available at: http://rawtherapee.com More information can be found at: http://rawpedia.rawtherapee.com.

  9. Documentation about the different demosaicing methods of RawTherapee can be found at: https://rawpedia.rawtherapee.com/Demosaicing.

Acknowledgment

The authors would like to thank the French Defense Procurement Agency (DGA) for its support through the ANR Alaska project (ANR-18-ASTR-0009). We also thank IBM Montpellier and the Institute for Development and Resources in Intensive Scientific Computing (IDRIS/CNRS) for giving us access to High-Performance Computing resources.

Author information

Corresponding author

Correspondence to Marc Chaumont.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ruiz, H., Yedroudj, M., Chaumont, M., Comby, F., Subsol, G. (2021). LSSD: A Controlled Large JPEG Image Database for Deep-Learning-Based Steganalysis “Into the Wild”. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_38

  • DOI: https://doi.org/10.1007/978-3-030-68780-9_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68779-3

  • Online ISBN: 978-3-030-68780-9

  • eBook Packages: Computer Science, Computer Science (R0)
