Double Level Montgomery Cox-Rower Architecture, New Bounds

  • Conference paper
Smart Card Research and Advanced Applications (CARDIS 2014)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8968)

Abstract

Recently, the Residue Number System (RNS) and the Cox-Rower architecture have been used to compute Elliptic Curve Cryptography (ECC) efficiently on FPGAs. In this paper, we rewrite the conditions of Kawamura’s theorem for error-free base extension in order to define the maximal range of the set from which the moduli can be chosen to build a base. At the same time, we give a procedure to compute the truncation function of the Cox module correctly. We also present a modified ALU for the Rower architecture that uses a second level of Montgomery representation. This architecture allows us to select the moduli under the new upper bound defined by these conditions. This modification makes the Cox-Rower architecture suitable for computing 521-bit ECC with a radix down to 16 bits, compared to 18 bits with the classical Cox-Rower architecture. We validate our results through an FPGA implementation of scalar multiplication at classical cryptographic security levels (NIST curves). Our implementation uses 35% fewer LUTs than the state-of-the-art generic implementation of ECC using RNS for the same performance [5]. We also slightly improve the computation time (latency), and our implementation shows the best throughput/area ratio for RNS computation supporting any curve independently of the chosen base.

Notes

  1. Barrett reduction is also possible, but we would need larger multipliers for the same results.

  2. For \( m_i = 2^r \) (only one even number can be selected), we use a classical multiplier and keep the \( r \) least significant bits of the product.

  3. It is well known that the Montgomery representation is stable under addition and multiplication using Algorithm 3.

  4. The slice is the unit used to count cells on Xilinx FPGAs. A slice on a Kintex-7 includes 4 LUTs with 6 inputs and 8 registers.

  5. An ALM, in the Stratix-2 family, contains 2 LUTs with 5 inputs and 2 registers, and is equivalent to the Xilinx Virtex-4 slice.
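
The special case in note 2 can be illustrated with a minimal Python sketch. It assumes that reduction modulo \( 2^r \) amounts to masking the product; the function name `mul_mod_pow2` and the operand values are hypothetical and only serve the illustration.

```python
def mul_mod_pow2(a: int, b: int, r: int) -> int:
    """Multiply a and b modulo 2**r (illustrative helper, not from the paper).

    A plain multiplier produces the full product; keeping the r least
    significant bits is the same as reducing modulo 2**r.
    """
    mask = (1 << r) - 1
    return (a * b) & mask


# Example with r = 16 and arbitrary operands.
product_low_bits = mul_mod_pow2(0xBEEF, 0x1234, 16)
```

No reduction step is needed for this channel, which is why only one such modulus can appear in a base: any second even modulus would share a factor of 2 with it.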

References

  1. Antão, S., Bajard, J.-C., Sousa, L.: RNS-based elliptic curve point multiplication for massive parallel architectures. Comput. J. 55(5), 629–647 (2012)

  2. Bigou, K., Tisserand, A.: Improving modular inversion in RNS using the plus-minus method. In: Bertoni, G., Coron, J.-S. (eds.) CHES 2013. LNCS, vol. 8086, pp. 233–249. Springer, Heidelberg (2013)

  3. Cheung, R.C.C., Duquesne, S., Fan, J., Guillermin, N., Verbauwhede, I., Yao, G.X.: FPGA implementation of pairings using residue number system and lazy reduction. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 421–441. Springer, Heidelberg (2011)

  4. Güneysu, T., Paar, C.: Ultra high performance ECC over NIST primes on commercial FPGAs. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 62–78. Springer, Heidelberg (2008)

  5. Guillermin, N.: A high speed coprocessor for elliptic curve scalar multiplications over \(\mathbb{F}_p\). In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 48–64. Springer, Heidelberg (2010)

  6. Kawamura, S., Koike, M., Sano, F., Shimbo, A.: Cox-Rower architecture for fast parallel Montgomery multiplication. In: Preneel, B. (ed.) EUROCRYPT 2000. LNCS, vol. 1807, pp. 523–538. Springer, Heidelberg (2000)

  7. Ma, Y., Liu, Z., Pan, W., Jing, J.: A high-speed elliptic curve cryptographic processor for generic curves over \(\text{GF}(p)\). In: Lange, T., Lauter, K., Lisoněk, P. (eds.) SAC 2013. LNCS, vol. 8282, pp. 421–437. Springer, Heidelberg (2014)

  8. Nozaki, H., Motoyama, M., Shimbo, A., Kawamura, S.: Implementation of RSA algorithm based on RNS Montgomery multiplication. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 364–376. Springer, Heidelberg (2001)

  9. Posch, K.C., Posch, R.: Base extension using a convolution sum in residue number systems. Computing 50(2), 93–104 (1993)

  10. Posch, K.C., Posch, R.: Modulo reduction in residue number systems. IEEE Trans. Parallel Distrib. Syst. 6(5), 449–454 (1995)

  11. Schinianakis, D.M., Fournaris, A.P., Michail, H.E., Kakarountas, A.P., Stouraitis, T.: An RNS implementation of an \( F_{p} \) elliptic curve point multiplier. IEEE Trans. Circuits Syst. I: Regul. Pap. 56(6), 1202–1213 (2009)

  12. Szerwinski, R., Güneysu, T.: Exploiting the power of GPUs for asymmetric cryptography. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 79–99. Springer, Heidelberg (2008)

  13. Yao, G.X., Fan, J., Cheung, R.C.C., Verbauwhede, I.: Faster pairing coprocessor architecture. In: Abdalla, M., Lange, T. (eds.) Pairing 2012. LNCS, vol. 7708, pp. 160–176. Springer, Heidelberg (2013)

Author information

Corresponding author

Correspondence to Nabil Merkiche.

A Algorithm to Compute the Montgomery Reduction over RNS and Implementation Details

Let \( \varvec{\mathfrak {B}} \) and \( \varvec{\mathfrak {B}}' \) be two RNS bases such that \( \varvec{\mathfrak {B}} = \lbrace m_i \rbrace \) and \( \varvec{\mathfrak {B}}' = \lbrace {m'}_i \rbrace \), with \( M = \prod _{i=1}^{n} m_i \), \( M' = \prod _{i=1}^{n} {m'}_i \), \( \gcd (p, M) = 1 \) and \( \gcd (M, M') = 1 \). Algorithm 4 recalls the Montgomery reduction over RNS when using the classical ALU. Precomputed values are in bold.
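
To make the setting concrete, the following Python sketch walks through a textbook form of Montgomery reduction over two RNS bases. It is only an illustration under stated assumptions, not a transcription of Algorithm 4: base extension is done exactly through CRT reconstruction rather than with the approximate Cox-based extension, the constants that hardware would precompute are recomputed inline, and the helper names (`to_rns`, `from_rns`, `extend`, `rns_montgomery_reduce`), the small moduli and the prime `p` are all hypothetical choices. It needs Python 3.8 or later for `math.prod` and `pow(x, -1, m)`.

```python
from math import prod


def to_rns(x, base):
    """Residues of x for every modulus in base."""
    return [x % m for m in base]


def from_rns(residues, base):
    """Exact CRT reconstruction (stand-in for approximate base extension)."""
    M = prod(base)
    total = 0
    for r, m in zip(residues, base):
        Mi = M // m
        total += r * pow(Mi, -1, m) * Mi
    return total % M


def extend(residues, src, dst):
    """Exact base extension from base src to base dst."""
    return to_rns(from_rns(residues, src), dst)


def rns_montgomery_reduce(x_B, x_Bp, p, B, Bp):
    """Return s in both bases with s = x * M^{-1} (mod p) and s < 2p when x < M*p."""
    M = prod(B)
    # Step 1, in base B: q = -x * p^{-1} mod m_i, so that x + q*p = 0 (mod M).
    q_B = [(-x * pow(p, -1, m)) % m for x, m in zip(x_B, B)]
    # Step 2: extend q from B to B'.
    q_Bp = extend(q_B, B, Bp)
    # Step 3, in base B': s = (x + q*p) * M^{-1} mod m'_i (exact division by M).
    s_Bp = [((x + q * p) * pow(M, -1, mp)) % mp
            for x, q, mp in zip(x_Bp, q_Bp, Bp)]
    # Step 4: extend s back from B' to B.
    s_B = extend(s_Bp, Bp, B)
    return s_B, s_Bp


# Illustrative parameters (assumptions): pairwise coprime 8-bit moduli, 16-bit prime p.
B = [251, 253, 255]
Bp = [256, 247, 241]
p = 65521
x = 12345 * 54321  # product of two values below p, hence x < M*p
s_B, s_Bp = rns_montgomery_reduce(to_rns(x, B), to_rns(x, Bp), p, B, Bp)
```

Because the extension is exact here, the result satisfies \( s = (x + q p) / M \) as an integer, so \( s \equiv x M^{-1} \pmod p \). The bounds discussed in the paper concern what happens when this exact step is replaced by the approximate extension.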

Algorithm 5 gives the Montgomery reduction over RNS when using our ALU. The operation \( \otimes \) denotes the inner Montgomery multiplication and reduction (Algorithm 3), such that \( a \otimes b \mod m = a b 2^{-r} \mod m \).
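
The operation \( \otimes \) can be sketched in Python with the standard Montgomery reduction (REDC) for radix \( 2^r \). This is an assumption about the general technique and may differ in detail from Algorithm 3; the names `montgomery_setup` and `mont_mul` and the example modulus are hypothetical. It requires an odd modulus \( m < 2^r \) and Python 3.8 or later.

```python
def montgomery_setup(m: int, r: int) -> int:
    """Return -m^{-1} mod 2**r for an odd modulus m < 2**r (precomputed constant)."""
    if m % 2 == 0 or m >= (1 << r):
        raise ValueError("m must be odd and smaller than 2**r")
    return (-pow(m, -1, 1 << r)) % (1 << r)


def mont_mul(a: int, b: int, m: int, r: int, m_neg_inv: int) -> int:
    """Return a * b * 2**(-r) mod m for 0 <= a, b < m (standard REDC)."""
    mask = (1 << r) - 1
    t = a * b
    # u is chosen so that t + u*m is divisible by 2**r.
    u = ((t & mask) * m_neg_inv) & mask
    s = (t + u * m) >> r          # exact division, s < 2m
    return s - m if s >= m else s  # single conditional subtraction


# Illustrative parameters (assumptions): r = 16 and a 16-bit odd modulus.
r, m = 16, 65521
m_neg_inv = montgomery_setup(m, r)
a, b = 12345, 54321

# Montgomery representation: x_tilde = x * 2**r mod m.
a_t = (a << r) % m
b_t = (b << r) % m
prod_t = mont_mul(a_t, b_t, m, r, m_neg_inv)  # representation of a*b mod m
sum_t = (a_t + b_t) % m                       # representation of a+b mod m
```

The last two lines show why the representation is stable: the product of two represented values under \( \otimes \) is again a represented value, and so is their modular sum.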

[Figures d and e: images not available]
Table 2. Performance details

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bajard, J.-C., Merkiche, N. (2015). Double Level Montgomery Cox-Rower Architecture, New Bounds. In: Joye, M., Moradi, A. (eds) Smart Card Research and Advanced Applications. CARDIS 2014. Lecture Notes in Computer Science, vol 8968. Springer, Cham. https://doi.org/10.1007/978-3-319-16763-3_9

  • DOI: https://doi.org/10.1007/978-3-319-16763-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16762-6

  • Online ISBN: 978-3-319-16763-3

  • eBook Packages: Computer Science, Computer Science (R0)
