Abstract
Advances in satellite imaging and sensor technologies have led to the capture of large amounts of spatial data. Over the past two decades, many parallel processing techniques based on data or control parallelism have been tried to improve the performance of image processing applications such as urban sprawl analysis, weather prediction and crop estimation. These techniques have been implemented with block-based distributed file processing or the more recent MapReduce programming model, and they still fall short of optimal processing in terms of resource scheduling, data distribution and ease of programming. In this paper, we present a layered framework for parallel data processing that improves the storage, retrieval and processing performance of spatial data on an underlying distributed file system. The paper presents a strategy for placing data across a distributed HDFS cluster so as to optimize spatial data retrieval and processing. Keeping neighborhood pixels local to the processing node in a distributed environment reduces network latency and improves the efficiency of applications such as object recognition, change detection and site selection. We evaluate the data placement strategy on a four-node HDFS cluster and show that it reads blocks of data at almost 10–12 times the default rate, which in turn improves the efficiency of applications that use region growing methods.
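The locality argument can be illustrated with a small sketch. It is not the placement strategy of the paper, which the abstract does not specify; it only compares two generic ways of cutting a raster into equally sized blocks (row-by-row strips versus square tiles) and measures how many 8-connected neighbor pairs end up in the same block. All function names, the raster size and the block size are assumptions chosen for illustration.

```python
def row_major_block(row, col, width, block_pixels):
    """Block index when pixels are serialized row by row and cut into fixed-size blocks."""
    return (row * width + col) // block_pixels


def tile_block(row, col, width, tile):
    """Block index when the raster is cut into square tile x tile blocks."""
    tiles_per_row = (width + tile - 1) // tile
    return (row // tile) * tiles_per_row + (col // tile)


def local_neighbor_fraction(height, width, block_of):
    """Fraction of 8-connected neighbor pairs whose two pixels share a block."""
    local = total = 0
    for r in range(height):
        for c in range(width):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the pixel itself
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < height and 0 <= nc < width:
                        total += 1
                        if block_of(r, c) == block_of(nr, nc):
                            local += 1
    return local / total


# Hypothetical 64 x 64 raster; both layouts use 256 pixels per block.
HEIGHT, WIDTH, BLOCK_PIXELS, TILE = 64, 64, 256, 16

strips = local_neighbor_fraction(
    HEIGHT, WIDTH, lambda r, c: row_major_block(r, c, WIDTH, BLOCK_PIXELS)
)
tiled = local_neighbor_fraction(
    HEIGHT, WIDTH, lambda r, c: tile_block(r, c, WIDTH, TILE)
)

print(f"row-major strips: {strips:.3f} of neighbor pairs are block-local")
print(f"square tiles:     {tiled:.3f} of neighbor pairs are block-local")
```

With equal block sizes, the tiled layout keeps a larger share of neighbor pairs inside one block, which is the property that lets neighborhood operations such as region growing avoid remote reads.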
Cite this article
Phani Bhushan, R., Somayajulu, D.V.L.N., Venkatraman, S. et al. A Raster Data Framework Based on Distributed Heterogeneous Cluster. J Indian Soc Remote Sens 47, 715–723 (2019). https://doi.org/10.1007/s12524-018-0897-5
DOI: https://doi.org/10.1007/s12524-018-0897-5