Abstract
With the number of satellite sensors and data centers increasing continuously, managing and processing massive remote sensing (RS) data from multiple distributed sources is becoming a trend. However, combining multiple satellite data centers for collaborative processing of massive RS data still faces many challenges. To reduce the huge amount of data migration and improve the efficiency of multi-datacenter collaborative processing, this paper presents the infrastructures and services for data management and workflow management in massive remote sensing data production. A dynamic data scheduling strategy was employed to reduce duplicate data requests and duplicate data processing. By combining remote sensing spatial metadata repositories with the Gfarm grid file system, unified management of the raw data, intermediate products, and final products was achieved in the co-processing. In addition, multi-level task order repositories and workflow templates were used to construct the production workflow automatically. With the help of specific heuristic scheduling rules, the production tasks were executed quickly. Ultimately, the Multi-datacenter Collaborative Process System (MDCPS) was implemented for large-scale remote sensing data production based on the effective management of data and workflow. The performance of MDCPS in an experimental environment showed that these strategies could significantly enhance the efficiency of co-processing across multiple data centers.
Acknowledgments
Dr. Yan Ma’s work is supported by the National High Technology Research and Development Program of China (“863” Program) (No. 2013AA12A301). The authors would also like to acknowledge the editors and anonymous reviewers for their valuable comments on the manuscript.
Cite this article
Zhang, J., Yan, J., Ma, Y. et al. Infrastructures and services for remote sensing data production management across multiple satellite data centers. Cluster Comput 19, 1243–1260 (2016). https://doi.org/10.1007/s10586-016-0577-6
DOI: https://doi.org/10.1007/s10586-016-0577-6