Abstract
Software-based Distributed Shared Memory (DSM) systems have been the focus of considerable research effort, primarily aimed at improving performance and consistency protocols. Unfortunately, computer clusters present a number of challenges for any DSM system that cannot be solved through consistency protocols alone. These challenges relate to the ability of a DSM system to adjust to load fluctuations, to cope with computers being added to or removed from the cluster, to deal with faults, and to use DSM objects larger than the available physical memory. We present here a proposal for the Synergy Distributed Shared Memory System and its integration with the virtual memory, group communication and process migration services of the Genesis Cluster Operating System.
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Hobbs, M., Silcock, J., Goscinski, A. (2005). Toward a Comprehensive Software Based DSM System. In: Guo, M., Yang, L.T. (eds) New Horizons of Parallel and Distributed Computing. Springer, Boston, MA. https://doi.org/10.1007/0-387-28967-4_13
DOI: https://doi.org/10.1007/0-387-28967-4_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24434-1
Online ISBN: 978-0-387-28967-0
eBook Packages: Computer Science, Computer Science (R0)