S. Adve, V. Adve, M. Hill, and M. Vernon. Comparison of Hardware and Software Cache Coherence Schemes. In Proc. of the 18th ISCA, pp. 298–308, June 1991.
N.M. Amato, J. Perdue, A. Pietracaprina, G. Pucci, and M. Mathis. Predicting performance on SMPs. A case study: The SGI Power Challenge. In Proc. IPDPS, pp. 729–737, May 2000.
M. Auslander, H. Franke, B. Gamsa, O. Krieger, and M. Stumm. Customization lite. In Proc. of 6th Workshop on Hot Topics in Operating Systems (HotOS-VI), May 1997.
J. Appavoo, B. Gamsa, O. Krieger, and M. Stumm. Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system. In Proc. of OSDI, 1999.
B. Grant et al. An evaluation of staged run-time optimizations in DyC. In Proc. of the SIGPLAN 1999 PLDI, Atlanta, GA, May 1999.
G.E. Blelloch, P.B. Gibbons, Y. Matias, and M. Zagha. Accounting for memory bank contention and delay in high-bandwidth multiprocessors. IEEE Trans. Par. Dist. Sys., 8(9):943–958, 1997.
J. Mark Bull. Feedback guided dynamic loop scheduling: Algorithms and experiments. In EUROPAR98, Sept. 1998.
F. Dang and L. Rauchwerger. Speculative parallelization of partially parallel loops. In Proc. of the 5th Int. Workshop LCR 2000, Lecture Notes in Computer Science, May 2000.
D. Engler. Vcode: a portable, very fast dynamic code generation system. In Proc. of the SIGPLAN 1996 PLDI, Philadelphia, PA, May 1996.
D. Bailey et al. The NAS parallel benchmarks. Int. J. Supercomputer Appl., 5(3):63–73, 1991.
P. B. Gibbons, Y. Matias, and V. Ramachandran. Can a shared-memory model serve as a bridging-model for parallel computation? In Proc. ACM SPAA, pp. 72–83, 1997.
H. Han and C.-W. Tseng. Improving compiler and run-time support for adaptive irregular codes. In PACT’98, Oct. 1998.
R. Iyer, N. Amato, L. Rauchwerger, and L. Bhuyan. Comparing the memory system performance of the HP V-Class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications. In Proc. of ACM ICS, pp. 339–347, June 1999.
B. H. H. Juurlink and H. A. G. Wijshoff. A quantitative comparison of parallel computation models. In Proc. of ACM SPAA, pp. 13–24, 1996.
D. Keppel, S. J. Eggers, and R. R. Henry. A case for runtime code generation. TR UWCSE 91-11-04, Dept. of Computer Science and Engineering, Univ. of Washington, Nov. 1991.
O. Krieger and M. Stumm. Hfs: A performance-oriented flexible file system based on building-block compositions. IEEE Trans. Comput., 15(3):286–321, 1997.
S. Owicki and A. Agarwal. Evaluating the performance of software cache coherency. In Proc. of ASPLOS III, April 1989.
L. Rauchwerger, N. Amato, and D. Padua. A Scalable Method for Run-time Loop Parallelization. Int. J. Paral. Prog., 26(6):537–576, July 1995.
L. Rauchwerger and D. Padua. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. IEEE Trans. on Par. and Dist. Systems, 10(2), 1999.
L. Rauchwerger and D. Padua. Parallelizing WHILE Loops for Multiprocessor Systems. In Proc. of 9th IPPS, April 1995.
Silicon Graphics Corporation. SGI Power Challenge: User’s Guide, 1995.
R. Simoni and M. Horowitz. Modeling the Performance of Limited Pointer Directories for Cache Coherence. In Proc. of the 18th ISCA, pp. 309–318, June 1991.
J. Torrellas, J. Hennessy, and T. Weil. Analysis of Critical Architectural and Programming Parameters in a Hierarchical Shared Memory Multiprocessor. In ACM Sigmetrics Conf. on Measurement and Modeling of Computer Systems, pp. 163–172, May 1990.
H. Yu and L. Rauchwerger. Adaptive reduction parallelization. In Proc. of the 14th ACM ICS, Santa Fe, NM, May 2000.
Y. Zhang, L. Rauchwerger, and J. Torrellas. Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors. In Proc. of HPCA-4, pp. 162–173, 1998.
Y. Zhang, L. Rauchwerger, and J. Torrellas. Speculative Parallel Execution of Loops with Cross-Iteration Dependences in DSM Multiprocessors. In Proc. of HPCA-5, Jan. 1999.
Ye Zhang. DSM Hardware for Speculative Parallelization. Ph.D. Thesis, Department of ECE, Univ. of Illinois, Urbana, IL, Jan. 1999.