Abstract
Data pre-deployment in HDFS (the Hadoop Distributed File System) is more complicated than in traditional file systems. Several key issues must be addressed: determining the target location for prefetched data, deciding how much data to prefetch, and balancing data prefetching services against normal data accesses. To solve these problems, we exploit the characteristics of digital ocean information service flows and propose a deployment scheme that combines input data prefetching with output-data-oriented storage strategies. The method lets data preparation and data processing proceed in parallel, thereby substantially reducing the I/O time cost of digital ocean cloud computing platforms when they process multi-source information synergistic tasks. Experimental results show that the scheme achieves a higher degree of parallelism than the traditional Hadoop mechanism, shortens the waiting time of running service nodes, and significantly reduces data access conflicts.
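To illustrate why overlapping data preparation with data processing shortens a node's waiting time, the following is a minimal sketch and not the authors' scheme. It models a service flow as a sequence of steps with hypothetical per-step fetch times (`fetch`) and compute times (`proc`). It assumes that each step's input is independent of the previous step's output and that a single fetcher can run arbitrarily far ahead of the compute node. The hypothetical function `simulate` returns the flow's completion time and the total time the compute node sits idle waiting for input, with and without prefetching.

```python
from typing import Sequence, Tuple


def simulate(fetch: Sequence[float], proc: Sequence[float],
             prefetch: bool) -> Tuple[float, float]:
    """Return (completion_time, total_wait) for a chain of service steps.

    Hypothetical model: step i needs fetch[i] time units of I/O to stage its
    input and proc[i] time units of computation. Inputs are assumed to be
    independent, so with prefetching the fetcher can stage step i+1's input
    while step i is still computing.
    """
    if len(fetch) != len(proc):
        raise ValueError("fetch and proc must have the same length")
    fetch_done = 0.0  # time at which the most recent input is fully staged
    proc_done = 0.0   # time at which the compute node finished the last step
    waited = 0.0      # total idle time of the compute node waiting for input
    for f, p in zip(fetch, proc):
        if prefetch:
            # The fetcher works independently of the compute node.
            fetch_done += f
        else:
            # On-demand I/O: fetching starts only after the previous step ends.
            fetch_done = proc_done + f
        start = max(proc_done, fetch_done)  # need both the node and the input
        waited += start - proc_done
        proc_done = start + p
    return proc_done, waited


if __name__ == "__main__":
    fetch_times = [4, 4, 4]
    proc_times = [5, 5, 5]
    print(simulate(fetch_times, proc_times, prefetch=False))  # (27.0, 12.0)
    print(simulate(fetch_times, proc_times, prefetch=True))   # (19.0, 4.0)
```

In this toy model, prefetching hides the I/O of every step except the first behind the preceding computation. When fetching is slower than computing, the node still waits, but less than under on-demand access. The model ignores contention between prefetch traffic and normal data accesses, which the abstract identifies as a separate issue.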
Additional information
Foundation item: The Ocean Public Welfare Scientific Research Project of State Oceanic Administration of China under contract No. 20110533.
About this article
Cite this article
Shi, S., Xu, L., Dong, H. et al. Research on data pre-deployment in information service flow of digital ocean cloud computing. Acta Oceanol. Sin. 33, 82–92 (2014). https://doi.org/10.1007/s13131-014-0520-8
DOI: https://doi.org/10.1007/s13131-014-0520-8