Abstract
Apache Hadoop makes it possible to code full-fledged distributed applications with very little programming effort. However, the resulting implementations may suffer from performance bottlenecks that nullify the potential of a distributed system. As shown in this paper, an engineering methodology based on targeted optimizations guided by careful profiling can lead to much better experimental performance.
In particular, we take as a case study the algorithm by Lukáš et al. for the Source Camera Identification problem (i.e., recognizing the camera used to acquire a given digital image). A first implementation was obtained with little effort, using the default facilities provided by Hadoop. In-depth profiling allowed us to pinpoint serious performance issues that affect the initial steps of the algorithm and stem from poor use of the cluster resources. We then developed optimizations and measured their effects through careful experimentation. The improved implementation makes better use of both the underlying cluster resources and the Hadoop framework, resulting in much better performance than the original naive implementation.
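For readers unfamiliar with the case study, the core of the algorithm by Lukáš et al. can be sketched as follows. The noise residual of an image (the image minus a denoised version of it) is averaged over several images taken by the same camera to form that camera's reference pattern. A query image is then attributed by computing the normalized correlation between its residual and each reference pattern. The Python sketch below covers only these two steps, under stated assumptions: residuals are already extracted (the original work uses a wavelet-based denoising filter, omitted here), images are flattened lists of floats of equal length, and the threshold-based decision of the original is replaced by a simple argmax. All function names are hypothetical and are not taken from the authors' Hadoop implementation.

```python
import math
from typing import Sequence


def reference_pattern(residuals: Sequence[Sequence[float]]) -> list[float]:
    """Average the noise residuals of several images from the same camera.

    Assumption: each residual is a flattened image of the same length,
    already computed as (image - denoised image).
    """
    n = len(residuals)
    length = len(residuals[0])
    return [sum(r[i] for r in residuals) / n for i in range(length)]


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Normalized correlation between two equally sized flattened signals."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    dx = [a - mx for a in x]  # zero-mean version of x
    dy = [b - my for b in y]  # zero-mean version of y
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    # A constant signal has no variation, so define its correlation as 0.
    return num / den if den else 0.0


def identify(residual: Sequence[float], patterns: dict[str, Sequence[float]]) -> str:
    """Simplification: pick the camera whose pattern correlates best (no threshold)."""
    return max(patterns, key=lambda cam: correlation(residual, patterns[cam]))


if __name__ == "__main__":
    # Toy 4-pixel residuals for two hypothetical cameras "A" and "B".
    cams = {
        "A": reference_pattern([[1.0, -1.0, 2.0, 0.0], [1.0, -1.0, 1.0, 0.0]]),
        "B": reference_pattern([[-1.0, 1.0, 0.0, 2.0], [-1.0, 1.0, 0.0, 1.0]]),
    }
    print(identify([1.0, -1.0, 2.0, 0.0], cams))  # prints "A"
```

Building a reference pattern requires the residuals of every training image from a camera, and attribution requires one correlation per candidate camera; both are independent per-image computations, which is why the workload lends itself to a distributed framework.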
Notes
1. A copy of the source code of our implementation is available upon request.
References
Bayram, S., Sencar, H.T., Memon, N., Avcibas, I.: Source camera identification based on CFA interpolation. In: IEEE International Conference on Image Processing (ICIP), vol. 3, pp. 69–72. IEEE (2005)
Cattaneo, G., Ferraro Petrillo, U., Giancarlo, R., Roscigno, G.: An effective extension of the applicability of alignment-free biological sequence comparison algorithms with Hadoop. J. Supercomput., 1–17 (2016). doi:10.1007/s11227-016-1835-3
Cattaneo, G., Ferraro Petrillo, U., Roscigno, G., Fusco, C.: A PNU-based technique to detect forged regions in digital images. In: Battiato, S., Blanc-Talon, J., Gallo, G., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2015. LNCS, vol. 9386, pp. 486–498. Springer, Cham (2015). doi:10.1007/978-3-319-25903-1_42
Cattaneo, G., Roscigno, G.: A possible pitfall in the experimental analysis of tampering detection algorithms. In: 17th International Conference on Network-Based Information Systems (NBiS), pp. 279–286, September 2014
Cattaneo, G., Roscigno, G., Bruno, A.: Using PNU-based techniques to detect alien frames in videos. In: Blanc-Talon, J., Distante, C., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2016. LNCS, vol. 10016, pp. 735–746. Springer, Cham (2016). doi:10.1007/978-3-319-48680-2_64
Cattaneo, G., Roscigno, G., Ferraro Petrillo, U.: Experimental evaluation of an algorithm for the detection of tampered JPEG images. In: Linawati, Mahendra, M.S., Neuhold, E.J., Tjoa, A.M., You, I. (eds.) ICT-EurAsia 2014. LNCS, vol. 8407, pp. 643–652. Springer, Heidelberg (2014). doi:10.1007/978-3-642-55032-4_66
Cattaneo, G., Roscigno, G., Ferraro Petrillo, U.: A scalable approach to source camera identification over Hadoop. In: IEEE 28th International Conference on Advanced Information Networking and Applications (AINA), pp. 366–373. IEEE (2014)
Choi, J., Choi, C., Ko, B., Choi, D., Kim, P.: Detecting web based DDoS attack using MapReduce operations in cloud computing environment. J. Internet Serv. Inf. Secur. (JISIS) 3(3/4), 28–37 (2013)
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Ferraro Petrillo, U., Roscigno, G., Cattaneo, G., Giancarlo, R.: FASTdoop: a versatile and efficient library for the input of FASTA and FASTQ files for MapReduce Hadoop bioinformatics applications. Bioinformatics (2017). doi:10.1093/bioinformatics/btx010
Fridrich, J., Lukáš, J., Goljan, M.: Detecting digital image forgeries using sensor pattern noise. In: SPIE, Electronic Imaging, Security, Steganography, and Watermarking of Multimedia Contents VIII, vol. 6072, pp. 1–11 (2006)
Gloe, T.: Feature-based forensic camera model identification. In: Shi, Y.Q., Katzenbeisser, S. (eds.) Transactions on Data Hiding and Multimedia Security VIII. LNCS, vol. 7228, pp. 42–62. Springer, Heidelberg (2012). doi:10.1007/978-3-642-31971-6_3
Goljan, M., Fridrich, J., Filler, T.: Large scale test of sensor fingerprint camera identification. In: IS&T/SPIE, Electronic Imaging, Security and Forensics of Multimedia Contents XI, vol. 7254, pp. 1–12. International Society for Optics and Photonics (2009)
Goljan, M., Fridrich, J., Filler, T.: Managing a large database of camera fingerprints. In: SPIE Conference on Media Forensics and Security, vol. 7541, pp. 1–12. International Society for Optics and Photonics (2010)
Golpayegani, N., Halem, M.: Cloud computing for satellite data processing on high end compute clusters. In: IEEE International Conference on Cloud Computing, pp. 88–92. IEEE (2009)
Lukáš, J., Fridrich, J., Goljan, M.: Digital camera identification from sensor pattern noise. IEEE Trans. Inf. Forensics Secur. 1, 205–214 (2006)
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al.: The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20(9), 1297–1303 (2010)
Precision Optical Imaging: ISO Noise Chart 15739 (2011). http://www.precisionopticalimaging.com/products/products.asp?type=15739
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010)
The Apache Software Foundation: Apache Hadoop (2016). http://hadoop.apache.org/
White, T.: The small files problem. Cloudera (2009). http://www.cloudera.com/blog/2009/02/the-small-files-problem/
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Cattaneo, G., Ferraro Petrillo, U., Nappi, M., Narducci, F., Roscigno, G. (2017). An Efficient Implementation of the Algorithm by Lukáš et al. on Hadoop. In: Au, M., Castiglione, A., Choo, K.K., Palmieri, F., Li, K.C. (eds.) Green, Pervasive, and Cloud Computing. GPC 2017. Lecture Notes in Computer Science, vol. 10232. Springer, Cham. https://doi.org/10.1007/978-3-319-57186-7_35
DOI: https://doi.org/10.1007/978-3-319-57186-7_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57185-0
Online ISBN: 978-3-319-57186-7
eBook Packages: Computer Science, Computer Science (R0)