Abstract
Rapid growth in storage capacity requirements at a computer center can lead to the installation of additional disk racks. The challenge is not the installation itself but migrating existing data to the new storage pools. This paper presents a framework that parallelizes the data migration process using Linux clusters connected to Storage Area Network (SAN) storage, together with a Linux tool that carries out the migration efficiently within a High Performance Computing environment. Results show that using multiple nodes and multiple data copying streams per node achieves significant speedup over manual copying. The tool is demonstrated on four nodes using 178 data copying streams, achieving a speedup factor close to seven. The tool is scalable and can achieve higher speedup factors when more data-moving nodes are available.
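The idea of several data copying streams working on disjoint shares of the data can be illustrated with a minimal Python sketch. This is not the authors' tool: it runs on a single node, uses threads as streams, and balances the shares with a simple greedy split by file size. The names `partition_by_size`, `copy_stream`, and `migrate` are hypothetical, chosen for illustration only.

```python
import heapq
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def partition_by_size(sized_files, n_bins):
    """Greedily split (path, size) pairs into n_bins lists of similar total size."""
    bins = [[] for _ in range(n_bins)]
    heap = [(0, i) for i in range(n_bins)]  # (bytes assigned so far, bin index)
    heapq.heapify(heap)
    # Largest files first; each goes to the currently least-loaded bin.
    for path, size in sorted(sized_files, key=lambda f: f[1], reverse=True):
        load, i = heapq.heappop(heap)
        bins[i].append(path)
        heapq.heappush(heap, (load + size, i))
    return bins


def copy_stream(paths, src_root, dst_root):
    """One copy stream: copy its share of files in turn, preserving the layout."""
    for path in paths:
        target = os.path.join(dst_root, os.path.relpath(path, src_root))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)  # copies contents and metadata
    return len(paths)


def migrate(src_root, dst_root, n_streams=4):
    """Copy every file under src_root to dst_root using n_streams concurrent streams."""
    sized = []
    for dirpath, _dirs, names in os.walk(src_root):
        for name in names:
            full = os.path.join(dirpath, name)
            sized.append((full, os.path.getsize(full)))
    shares = partition_by_size(sized, n_streams)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        counts = pool.map(lambda share: copy_stream(share, src_root, dst_root), shares)
        return sum(counts)  # total number of files copied
```

In a multi-node setting, the same partitioning step could in principle be applied twice, first across nodes and then across the streams on each node; how the published tool actually distributes work is not stated in the abstract.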
Cite this article
Mudawar, M.F., AlGhuson, M.K. Parallel Data Migration Framework on Linux Clusters. Arab J Sci Eng 36, 785–794 (2011). https://doi.org/10.1007/s13369-011-0073-5
DOI: https://doi.org/10.1007/s13369-011-0073-5