Multimedia Tools and Applications, Volume 78, Issue 23, pp 32999–33021

Achieving efficient source camera identification on Hadoop

  • Giuseppe Cattaneo
  • Umberto Ferraro Petrillo
  • Andrea F. Abate
  • Fabio Narducci
  • Silvio Barra


Abstract

Hadoop is a software framework that makes it possible to code distributed applications, starting from a MapReduce algorithm, with very little programming effort. However, the performance of implementations obtained through such a straightforward approach is often disappointing. This happens because a vanilla implementation of a distributed MapReduce algorithm often suffers from performance bottlenecks that may compromise the potential of a distributed system. As a consequence, the execution times of the algorithm fall short of expectations. In this paper, we present the work we have done to efficiently engineer, on Apache Hadoop, a reference algorithm for the Source Camera Identification problem (i.e., determining the particular digital camera used to take a given image). The algorithm we have chosen is the one by Lukáš et al. A first implementation was obtained in a short time using the default facilities available with Hadoop. However, its performance, analyzed on a cluster of 33 PCs, was very unsatisfactory. A careful profiling of this code revealed serious performance issues affecting the initial steps of the algorithm and resulting in poor usage of the cluster resources. Several theoretical and practical optimizations were then tried, and their effects were measured through accurate experiments. This led to the development of alternative implementations that, while leaving the original algorithm unaltered, make better use of the underlying cluster resources and of the Hadoop framework, thus achieving much better performance and lower energy requirements than the original vanilla implementation.
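As a rough illustration of how source camera identification can be phrased in MapReduce terms, the following single-process Python sketch may help. It is not the implementation described in the paper. It assumes only the general outline of the approach by Lukáš et al.: noise residuals extracted from images of a known camera are averaged into a reference pattern, and a query image is compared against each pattern by correlation. The neighbourhood-mean residual is a stand-in for the actual denoising filter, images are modelled as flat lists of pixel values, picking the highest correlation is a simplification of the decision step, and all function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def residual(image):
    """Stand-in noise residual: each pixel minus the mean of its neighbourhood.

    Assumption: a real implementation would use a proper denoising filter.
    """
    n = len(image)
    out = []
    for i, pixel in enumerate(image):
        window = image[max(0, i - 1):min(n, i + 2)]
        out.append(pixel - sum(window) / len(window))
    return out


def map_phase(records):
    """Map: emit one (camera_id, residual) pair per enrolment image."""
    for camera_id, image in records:
        yield camera_id, residual(image)


def reduce_phase(pairs):
    """Reduce: average the residuals of each camera into a reference pattern."""
    groups = defaultdict(list)
    for camera_id, res in pairs:
        groups[camera_id].append(res)
    return {
        camera_id: [sum(column) / len(column) for column in zip(*residuals)]
        for camera_id, residuals in groups.items()
    }


def correlation(a, b):
    """Pearson correlation between two equally sized sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def identify(image, reference_patterns):
    """Return the camera whose reference pattern best matches the image residual."""
    res = residual(image)
    return max(reference_patterns, key=lambda cam: correlation(res, reference_patterns[cam]))
```

In this sketch, `reduce_phase(map_phase(records))` takes an iterable of `(camera_id, image)` pairs and returns one reference pattern per camera, which `identify` then uses for a query image. On a real cluster the map and reduce steps would run as distributed tasks, which is where the bottlenecks discussed in the abstract would arise.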


Keywords

Digital image forensics, Source camera identification, Distributed computing, Hadoop, Commodity hardware



References

  1. Amorosi L, Chiaraviglio L, D’Andreagiovanni F, Blefari-Melazzi N (2018) Energy-efficient mission planning of UAVs for 5G coverage in rural zones. In: 2018 IEEE international conference on environmental engineering (EE), pp 1–9
  2. Barra S, Casanova A, Fraschini M, Nappi M (2017) Fusion of physiological measures for multimodal biometric systems. Multimed Tools Appl 76(4):4835–4847
  3. Barra S, Fenu G, De Marsico M, Castiglione A, Nappi M (2018) Have you permission to answer this phone? In: 2018 international workshop on biometrics and forensics (IWBF), pp 1–7. Institute of Electrical and Electronics Engineers Inc
  4. Cattaneo G, Ferraro Petrillo U, Giancarlo R, Roscigno G (2015) Alignment-free sequence comparison over Hadoop for computational biology. In: 44th international conference on parallel processing workshops (ICCPW 2015), pp 184–192. IEEE
  5. Cattaneo G, Roscigno G, Ferraro Petrillo U (2014) Experimental evaluation of an algorithm for the detection of tampered JPEG images. In: Information and communication technology, pp 643–652. Springer
  6. Cattaneo G, Roscigno G, Ferraro Petrillo U (2014) A scalable approach to source camera identification over Hadoop. In: IEEE 28th international conference on advanced information networking and applications (AINA), pp 366–373. IEEE
  7. Cattaneo G, Roscigno G, Ferraro Petrillo U, Nappi M, Narducci F (2017) An efficient implementation of the algorithm by Lukáš et al. on Hadoop. In: The 12th international conference on green, pervasive, and cloud computing (GPC2017), pp 475–489. Springer
  8. Chiaraviglio L, Amorosi L, Blefari-Melazzi N, Dell’Olmo P, Shojafar M, Salsano S (2019) Optimal management of reusable functional blocks in 5G superfluid networks. Int J Netw Manag 29(1):e2045
  9. Choi J, Choi C, Ko B, Choi D, Kim P (2013) Detecting web based DDoS attack using MapReduce operations in cloud computing environment. Journal of Internet Services and Information Security 3(3/4):28–37
  10. Dean J, Ghemawat S (2008) MapReduce: Simplified data processing on large clusters. Commun ACM 51(1):107–113
  11. Ferraro Petrillo U, Roscigno G, Cattaneo G, Giancarlo R (2017) FASTdoop: a versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications. Bioinformatics
  12. Ferraro Petrillo U, Roscigno G, Cattaneo G, Giancarlo R (2018) Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms. Bioinformatics 34(11):1826–1833
  13. Freire-Obregon D, Narducci F, Barra S, Castrillon-Santana M (2018) Deep learning for source camera identification on mobile devices. Pattern Recognition Letters
  14. Goljan M, Fridrich J, Filler T (2009) Large scale test of sensor fingerprint camera identification. In: IS&T/SPIE, electronic imaging, security and forensics of multimedia contents XI, vol 7254, pp 1–12. International Society for Optics and Photonics
  15. Goljan M, Fridrich J, Filler T (2010) Managing a large database of camera fingerprints. In: SPIE conference on media forensics and security, vol 7541, pp 1–12. International Society for Optics and Photonics
  16. Golpayegani N, Halem M (2009) Cloud computing for satellite data processing on high end compute clusters. In: IEEE international conference on cloud computing, pp 88–92. IEEE
  17. Kurosawa K, Kuroki K, Saitoh N (1999) CCD fingerprint method-identification of a video camera from videotaped images. In: International conference on image processing (ICIP), vol 3, pp 537–540
  18. Lukáš J, Fridrich J, Goljan M (2006) Digital camera identification from sensor pattern noise. IEEE Trans Inf Forensics Secur 1:205–214
  19. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M et al (2010) The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20(9):1297–1303
  20. Neves J, Moreno J, Barra S, Proença H (2015) A calibration algorithm for multi-camera visual surveillance systems based on single-view metrology. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9117:552–559
  21. Neves J, Narducci F, Barra S, Proença H (2016) Biometric recognition in surveillance scenarios: a survey. Artif Intell Rev 46(4):515–541
  22. Neves J, Santos G, Filipe S, Grancho E, Barra S, Narducci F, Proença H (2015) Quis-campi: Extending in the wild biometric recognition to surveillance environments. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9281:59–68
  23. Precision Optical Imaging (2011) ISO noise chart 15739
  24. Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: IEEE 26th symposium on mass storage systems and technologies (MSST), pp 1–10. IEEE
  25. The Apache Software Foundation (2016) Apache Hadoop
  26. White T (2009) The small files problem. Cloudera

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Sciences, University of Salerno, Fisciano, Italy
  2. Department of Statistical Sciences, University of Rome “La Sapienza”, Rome, Italy
  3. Department of Sciences and Technologies, University of Naples “Parthenope”, Naples, Italy
  4. Department of Computer Sciences, University of Cagliari, Cagliari, Italy
