Guest Editorial: The Parallel Storage, Processing and Analysis for Big Data
Big data has received significant attention from both academia and industry since the US government announced the Big Data Initiative in 2012. Big data is characterized by a number of Vs, including high volume, velocity, variety and veracity. The purpose of this special issue is to collate a selection of representative articles, primarily those presented at the 11th International Conference on Natural Computation (ICNC), held on 15–17 August 2015 in Zhangjiajie, China. The special issue was also opened to public submissions for wider inclusion.
MapReduce [1, 2, 3] has become a major computing model in support of big data applications, especially for processing huge volumes of data in an offline mode. As a result, MapReduce has been widely employed in the parallelization of data-intensive applications. In [4], Wang et al. employ MapReduce to parallelize a spatio-temporal convolutional neural network for automatic recognition of action events in surveillance videos. The Map and Reduce processes are implemented on a single computer with multiple CPU cores. Liu et al. [5] further evaluate the computational performance of MapReduce in comparison with Spark [6], an in-memory computing technology which can be deployed on MapReduce. A back propagation neural network (BPNN) is parallelized with MapReduce and Spark respectively, and the parallel BPNN using Spark achieves a significantly higher speedup than the one parallelized with MapReduce. Alternatively, Phan et al. [7] present an approach to speeding up the computation of neural networks inspired by the parallel circuits found in the human retina. Wu and Wang [8] extend MapReduce to support applications with data dependencies, facilitating coarse-grained parallelism.
Modern computers are equipped with multiple CPU cores, and a number of papers in this special issue target multi-core computing platforms. In [9], Wang et al. present a partition algorithm for multi-dimensional loop applications on heterogeneous multicore processors. In this work the authors take into account memory access pattern information and fully consider the heterogeneity of processors to achieve high processor utilization. Gu et al. [10] aim to discover the utilization of CPU resources by mining usage patterns. In [11], Liu et al. analyze deadlock conditions in concurrent programs, which are normally deployed on multi-core platforms. A Petri net is employed to model concurrent programs, and the correlation between the Petri net model and sub-process nets is analyzed. Ouyang et al. [12] look at hardware/software partitioning, taking into account communication overhead on heterogeneous multiprocessor system-on-chip platforms.
High-performance storage systems are of vital importance in support of big data applications. Ou et al. [13] fully exploit flash memory to build a high-performance PCIe SSD for large-capacity storage. Deduplication plays an increasingly important role in eliminating replicas and saving space and network bandwidth in various storage systems. For this purpose, Deng et al. [14] focus on deduplication over small-scale storage systems with adequate bandwidth in between and propose a deduplication system with a request-aware placement policy. Du et al. [15] take data migration into account to reduce the power consumption of embedded systems that are equipped with emerging scratch-pad memory and non-volatile memory.
Cloud computing providers such as Amazon still rely on Web services in support of big data applications. In [16], Du et al. look at big data issues in the form of Web services and employ service clusters to deal with the complexity in computation. Sun et al. [17] research the problem of repairing workflow models by employing a mirroring matrix based on the variant of footprints and parallel programs.
We hope that the perspectives presented in this special issue will be of great interest to readers. We also encourage readers to contribute to this exciting and fast-growing research area.
We would like to thank Ms. Renuka Nidhi, an Assistant of the Journals Editorial Office of Springer, for her great support in the publication of this special issue.
- 4. Wang, Q., Zhao, J., Gong, D., Shen, Y., Li, M., Lei, Y.: Parallelizing convolutional neural networks for action event recognition in surveillance videos. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0451-4
- 5. Liu, Y., Xu, L., Li, M.: The parallelization of back propagation neural network in MapReduce and Spark. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0401-1
- 6. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), Boston (2010)
- 7. Phan, K.T., Maul, T.H., Vu, T.T.: An empirical study on improving the speed and generalization of neural networks using a parallel circuit approach. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0435-4
- 8. Wu, H.H., Wang, C.M.: Generalization of large-scale data processing in one MapReduce job for coarse-grained parallelism. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0444-3
- 9. Wang, Y., Li, K., Li, K.: Partition scheduling on heterogeneous multicore processors for multi-dimensional loops applications. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0445-2
- 10. Gu, Z., He, L., Chang, C., Sun, J., Chen, H., Huang, C.: Developing an efficient pattern discovery method for CPU utilizations of computers. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0439-0
- 11. Liu, W., Wang, L., Du, Y., Li, M.: Deadlock property analysis of concurrent programs based on Petri Net structure. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0440-7
- 12. Ouyang, A., Peng, X., Liu, J., Sallam, A.: Hardware/software partitioning for heterogeneous MPSoC considering communication overhead. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0466-x
- 13. Ou, Y., Xiao, N., Liu, F., Chen, Z., Chen, W., Wu, L.: Gemini: a novel hardware and software implementation of high-performance PCIe SSD. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0449-y
- 14. Deng, M., Chen, W., Xiao, N., Yu, S., Hu, Y.: GLE-Dedup: a globally–locally even deduplication by request-aware placement for better read performance. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0450-5
- 15. Du, J., Li, R., Xiao, Z., Tong, Z., Zhang, L.: Optimization of data allocation on CMP embedded system with data migration. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0436-3
- 16. Du, Y., Wang, L., Qi, M.: Constructing service clusters based on service space. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0437-2
- 17. Sun, Y., Du, Y., Li, M.: A repair of workflow models based on mirroring matrices. Int. J. Parallel Prog. (2016). doi: 10.1007/s10766-016-0438-1