Abstract
The computation of iterative functions need not be limited by the rate at which external signals, such as a clock, can be supplied to an on-chip circuit. Instead, self-timed structures can compute without clock or latch delays. In particular, a self-timed ring is a loop of logical stages that, after initialization with operands, computes multiple cycles of an iterative computation without further external handshaking. Viewed as a whole, a self-timed ring has a total latency and throughput dependent not only on the individual stages' latencies and cycle times, but also on the total number of stages, tokens, and extra “bubbles” in the ring. This article derives the performance characteristics of self-timed rings, illustrates them with graphs, and discusses the implications for designing rings with optimal performance. Certain suggested ring configurations allow iteration with no latches and zero delay overhead, achieving a total latency equal to just the sum of the raw function-block delays. This property has been verified by measurements on a chip that demonstrates a self-timed ring for the example function of floating-point division. Fabricated in 1.2 μm CMOS, the ring occupies 7 mm² and generates a quotient bit every 2.8 ns.
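The zero-overhead claim can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the article: it assumes a single token circulating in a ring that never stalls it, so each pass through a stage costs only that stage's function-block delay. The function name `zero_overhead_latency`, the five-stage ring, and the count of 54 stage evaluations are hypothetical choices for illustration. Only the 2.8 ns figure comes from the abstract, and treating it as a uniform per-stage delay with one quotient bit per stage evaluation is an assumption made here.

```python
from itertools import cycle, islice


def zero_overhead_latency(stage_delays_ns, evaluations):
    """Total latency when one token makes `evaluations` stage evaluations
    around a ring, paying only each stage's raw function-block delay.

    Assumes (hypothetically) no latch or handshake overhead and no stalls.
    """
    if not stage_delays_ns:
        raise ValueError("ring needs at least one stage")
    # Walk the ring stage by stage, wrapping around, and sum the delays.
    return sum(islice(cycle(stage_delays_ns), evaluations))


# Hypothetical 5-stage ring, every stage taking 2.8 ns (the per-bit figure
# from the abstract), with one quotient bit assumed per stage evaluation.
ring = [2.8] * 5
total_ns = zero_overhead_latency(ring, 54)
print(f"{total_ns:.1f} ns for 54 stage evaluations")
```

Under these assumptions the total is simply the per-stage delays summed along the path the token travels. Any stall, for example from too few bubbles in the ring, would add to this figure, and that dependence is what the article's analysis addresses.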
Cite this article
Williams, T.E. Performance of iterative computation in self-timed rings. Journal of VLSI Signal Processing 7, 17–31 (1994). https://doi.org/10.1007/BF02108187