A Spark-Based Big Data Platform for Massive Remote Sensing Data Processing

  • Zhongyi Sun
  • Fengke Chen
  • Mingmin Chi
  • Yangyong Zhu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9208)


With the rapid development of remote sensing techniques, the volume of acquired data is growing exponentially, which makes processing massive remote sensing data a major challenge. In this paper, an in-memory computing framework based on Spark is proposed to address this problem. Spark is an open-source distributed computing platform, used here with Hadoop YARN as the resource scheduler and HDFS as the cloud storage system. On the Spark-based platform, data loaded into memory in the first iteration can be reused in subsequent iterations. This mechanism makes Spark much better suited to multi-iteration algorithms than MapReduce, which has to reload the data in every iteration. Experiments are carried out on massive remote sensing data using a multi-iteration singular value decomposition (SVD) algorithm. The results show that Spark-based SVD achieves significantly shorter computation times than the MapReduce implementation, usually by one order of magnitude.
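The reuse mechanism described above can be illustrated with a small sketch. It is plain Python, not Spark or MapReduce code, and it does not reproduce the paper's SVD algorithm: power iteration for the largest singular value is used only as a stand-in for a multi-iteration computation. The names `CountingSource`, `top_singular_value`, `run_reloading` and `run_cached` are hypothetical. In Spark itself, the role of the cached copy would roughly correspond to keeping a dataset in memory, for example with `RDD.cache()`.

```python
import math


class CountingSource:
    """Hypothetical stand-in for slow storage (e.g. HDFS); counts every read."""

    def __init__(self, rows):
        self._rows = rows
        self.loads = 0

    def load(self):
        self.loads += 1
        return [list(row) for row in self._rows]


def top_singular_value(get_matrix, iterations=50):
    """Power iteration on A^T A; get_matrix() is called once per iteration."""
    v = None
    sigma = 0.0
    for _ in range(iterations):
        a = get_matrix()
        if v is None:
            n = len(a[0])
            v = [1.0 / math.sqrt(n)] * n  # unit-length start vector
        # w = A v, and ||w|| estimates the largest singular value
        w = [sum(x * y for x, y in zip(row, v)) for row in a]
        sigma = math.sqrt(sum(x * x for x in w))
        # u = A^T w, then normalise to get the next v
        u = [sum(row[j] * w[i] for i, row in enumerate(a)) for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            break
        v = [x / norm for x in u]
    return sigma


def run_reloading(source, iterations=50):
    """MapReduce-like pattern: the data is read from storage in every iteration."""
    return top_singular_value(source.load, iterations)


def run_cached(source, iterations=50):
    """Spark-like pattern: the data is read once and reused from memory."""
    cached = source.load()
    return top_singular_value(lambda: cached, iterations)
```

Both functions compute the same estimate; they differ only in how often `load` is called: once per iteration in `run_reloading`, and once in total in `run_cached`. When reading from storage dominates the cost of an iteration, this difference is where the speed-up of the in-memory approach would come from.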


Big data, Remote sensing, Spark, Hadoop



This work was supported in part by Natural Science Foundation of China under contract 71331005, in part by Shanghai Science and Technology Development Funds (13dz2260200, 13511504300), and in part by the Open Foundation of Second Institute of Oceanography (SOA).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Zhongyi Sun (1)
  • Fengke Chen (1)
  • Mingmin Chi (1, 2)
  • Yangyong Zhu (1)
  1. School of Computer Science, Shanghai Key Laboratory of Data Science, Key Laboratory for Information Science of Electromagnetic Waves (MoE), Fudan University, Shanghai, China
  2. State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography (SOA), Hangzhou, China
