Parallel Data Migration Framework on Linux Clusters

  • Research Article – Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

Rapid growth in the storage capacity requirements at a computer center can lead to the installation of additional disk racks. The challenging task is not the installation itself but the migration of existing data to the new storage pools. A framework to parallelize the data migration process, using Linux clusters connected to Storage Area Network storage, is presented. A Linux tool that efficiently parallelizes data migration by utilizing the High Performance Computing environment is developed. Results show that using multiple nodes and multiple data copying streams per node achieves significant speedup factors over manual copying. The tool is demonstrated on four nodes using 178 data copying streams, achieving a speedup factor close to seven. The tool is scalable and capable of higher speedup factors when more data moving nodes are available.
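The idea of combining multiple nodes with multiple copy streams per node can be illustrated with a minimal Python sketch. This is not the authors' tool. It assumes the old and new storage pools are both mounted as ordinary directories, divides the file list round-robin into one share per node, and uses a thread pool as a stand-in for the copy streams on a single node. The function names (`list_files`, `partition`, `copy_one`, `migrate_share`) are hypothetical, and dispatching each share to a real cluster node is left out.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_files(root):
    """Return every regular file under root as a sorted list of relative paths."""
    root = Path(root)
    return sorted(p.relative_to(root) for p in root.rglob("*") if p.is_file())


def partition(items, parts):
    """Split items round-robin into `parts` shares (one share per node)."""
    return [items[i::parts] for i in range(parts)]


def copy_one(src_root, dst_root, rel):
    """Copy one file, keeping its relative path; return the bytes copied."""
    target = Path(dst_root) / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(Path(src_root) / rel, target)
    return target.stat().st_size


def migrate_share(src_root, dst_root, files, streams):
    """Copy one node's share of files using `streams` concurrent copy streams."""
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(lambda rel: copy_one(src_root, dst_root, rel), files))
```

In this sketch each node would call `migrate_share` on its own entry of `partition(list_files(src), node_count)`. Round-robin partitioning ignores file sizes, so a real migration would probably balance the shares by total bytes rather than by file count.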

Author information

Corresponding author

Correspondence to Mohammed K. AlGhuson.

About this article

Cite this article

Mudawar, M.F., AlGhuson, M.K. Parallel Data Migration Framework on Linux Clusters. Arab J Sci Eng 36, 785–794 (2011). https://doi.org/10.1007/s13369-011-0073-5

  • DOI: https://doi.org/10.1007/s13369-011-0073-5
