Cox-Rower Architecture for Fast Parallel Montgomery Multiplication
This paper proposes a fast parallel Montgomery multiplication algorithm based on Residue Number Systems (RNS). A fast modular exponentiation is easily constructed by applying the algorithm repeatedly. The main contribution of this paper is a new RNS base extension algorithm, which makes an efficient RNS Montgomery multiplication possible. The Cox-Rower Architecture described in this paper is hardware suited to the RNS Montgomery multiplication. In this architecture, the base extension algorithm is executed in parallel by multiple Rower units controlled by a Cox unit. Each Rower unit is a single-precision modular multiplier-and-accumulator, whereas the Cox unit is typically a 7-bit adder. Although the main body of the algorithm processes numbers in RNS form, efficient procedures for converting between RNS and a radix representation are also provided. The exponentiation algorithm can thus be adapted to the existing standard radix interface of the RSA cryptosystem.
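As a rough illustration of the data flow described above, the following Python sketch performs one RNS Montgomery multiplication over two small bases. It is a minimal sketch under stated assumptions, not the paper's algorithm: the function names (`to_rns`, `from_rns`, `base_extend`, `rns_montmul`) and the toy moduli are hypothetical, and the base extension here is an exact reconstruction via the Chinese Remainder Theorem, used as a stand-in for the new base extension algorithm that the paper contributes. Each list comprehension over a base corresponds to work that independent per-modulus units could do in parallel.

```python
from math import prod

def to_rns(x, base):
    """Residues of x modulo each (pairwise coprime) base element."""
    return [x % m for m in base]

def from_rns(residues, base):
    """Exact CRT reconstruction of the value in [0, prod(base))."""
    M = prod(base)
    total = 0
    for r, m in zip(residues, base):
        Mi = M // m
        total += r * pow(Mi, -1, m) * Mi
    return total % M

def base_extend(residues, src, dst):
    # Stand-in (assumption): exact extension through a full CRT
    # reconstruction, not the paper's base extension algorithm.
    return to_rns(from_rns(residues, src), dst)

def rns_montmul(x, y, N, base_a, base_b):
    """Return r with r * A == x * y (mod N), where A = prod(base_a)."""
    A = prod(base_a)
    xa, ya = to_rns(x, base_a), to_rns(y, base_a)
    xb, yb = to_rns(x, base_b), to_rns(y, base_b)
    # q = -x*y*N^{-1} in base A, so that x*y + q*N is divisible by A.
    q_a = [(xi * yi * -pow(N, -1, a)) % a
           for xi, yi, a in zip(xa, ya, base_a)]
    q_b = base_extend(q_a, base_a, base_b)
    # r = (x*y + q*N) / A, computed channel by channel in base B.
    r_b = [((xi * yi + qi * N) * pow(A, -1, b)) % b
           for xi, yi, qi, b in zip(xb, yb, q_b, base_b)]
    return from_rns(r_b, base_b)

# Toy parameters (assumptions): N is coprime to every base element,
# prod(base_a) > N, and prod(base_b) > 2 * N.
base_a = [13, 17, 19]
base_b = [23, 29, 31]
N = 1009
r = rns_montmul(123, 456, N, base_a, base_b)
```

With these parameters the result satisfies `r * prod(base_a) % N == (123 * 456) % N` and `r < 2 * N`, which is the usual Montgomery output range before a final conditional subtraction. A real design would keep the result in RNS form between multiplications and convert to radix only at the interface.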