Euro-Par 2006: Euro-Par 2006 Parallel Processing, pp. 437-447
Vigne: Towards a Self-healing Grid Operating System
Conference paper
Abstract
We consider building a Grid Operating System to relieve users and programmers of the burden of dealing with the highly distributed and volatile resources of computational grids. To tolerate the volatility of the nodes, the system should be self-healing, that is, it should continuously adapt to additions, removals, and failures of nodes. We present the self-healing architecture of the Vigne Grid Operating System through three of its services: system membership, application management, and volatile data management. Experimental results show that our approach is feasible.
Keywords
Application Manager; Overlay Network; Distributed Hash Table; Access Request; High Level Service
Copyright information
© Springer-Verlag Berlin Heidelberg 2006