An Efficient Implementation of the Algorithm by Lukáš et al. on Hadoop

  • Giuseppe Cattaneo
  • Umberto Ferraro Petrillo
  • Michele Nappi
  • Fabio Narducci
  • Gianluca Roscigno
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10232)


Apache Hadoop makes it possible to code full-fledged distributed applications with very little programming effort. However, the resulting implementations may suffer from performance bottlenecks that nullify the potential of a distributed system. As shown in this paper, an engineering methodology based on targeted optimizations driven by careful profiling may lead to much better experimental performance.

In particular, we take as a case study the algorithm by Lukáš et al. for the Source Camera Identification problem (i.e., recognizing the camera used to acquire a given digital image). A first implementation was obtained, with little effort, using the default facilities available in Hadoop. In-depth profiling allowed us to pinpoint some serious performance issues that affect the initial steps of the algorithm and stem from poor use of the cluster resources. Optimizations were then developed and their effects measured through careful experimentation. The improved implementation makes better use of both the underlying cluster resources and the Hadoop framework, resulting in much better performance than the original naive implementation.
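
For readers unfamiliar with the case study, the following is a minimal sketch of the general idea behind sensor-pattern-noise camera identification as described by Lukáš et al.: a reference pattern is estimated for each camera by averaging the noise residuals of images known to come from it, and a query image is attributed to the camera whose pattern correlates best with the query's own residual. This is not the implementation evaluated in the paper and it does not involve Hadoop. All function names are hypothetical, images are assumed to be equally sized grayscale matrices (lists of rows of floats), the 3x3 mean filter is a simple stand-in for the wavelet-based denoiser of the original algorithm, and choosing the highest correlation is a simplification of the original decision step.

```python
from math import sqrt


def box_blur(img):
    """3x3 mean filter averaging over in-bounds neighbours only.
    Assumption: a crude stand-in for the original wavelet-based denoiser."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def residual(img):
    """Noise residual: the image minus its denoised version."""
    den = box_blur(img)
    return [[p - d for p, d in zip(row, drow)] for row, drow in zip(img, den)]


def reference_pattern(images):
    """Average the residuals of several images taken with the same camera."""
    if not images:
        raise ValueError("at least one image is required")
    residuals = [residual(img) for img in images]
    n, h, w = len(residuals), len(images[0]), len(images[0][0])
    return [[sum(r[y][x] for r in residuals) / n for x in range(w)]
            for y in range(h)]


def correlation(a, b):
    """Pearson correlation between two equally sized matrices."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0


def identify(query, patterns):
    """Return the best-matching camera id and all correlation scores.
    Assumption: 'highest score wins' replaces the original decision rule."""
    res = residual(query)
    scores = {cam: correlation(res, pat) for cam, pat in patterns.items()}
    return max(scores, key=scores.get), scores
```

Here `patterns` is a dictionary mapping a camera identifier to the output of `reference_pattern`. Computing one residual per image is independent of all other images, which is the kind of per-record work that lends itself to a distributed setting.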


Keywords: Distributed computing, Hadoop, Source Camera Identification



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Giuseppe Cattaneo (1)
  • Umberto Ferraro Petrillo (2)
  • Michele Nappi (1)
  • Fabio Narducci (1)
  • Gianluca Roscigno (1)

  1. Dipartimento di Informatica, Università degli Studi di Salerno, Fisciano, Italy
  2. Dipartimento di Scienze Statistiche, Università di Roma “La Sapienza”, Roma, Italy
