Abstract
Intel and AMD support the carry-less multiplication (CLMUL) instruction set in their x64 processors. We use CLMUL to implement an almost universal 64-bit hash family (CLHASH). We compare this new family with what might be the fastest almost universal family on x64 processors (VHASH). We find that CLHASH is at least 60 % faster. We also compare CLHASH with a popular hash function designed for speed (Google’s CityHash). We find that CLHASH is 40 % faster than CityHash on inputs larger than 64 bytes and just as fast otherwise.
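For readers unfamiliar with CLMUL, the following Python sketch models what a 64-bit carry-less multiplication computes. It is schoolbook binary multiplication in which the shifted partial products are combined with XOR instead of addition, so no carries propagate, and two 64-bit inputs yield a product of at most 127 bits. The function names `clmul` and `clmul64` are our own. Real code would use the hardware instruction (PCLMULQDQ on x64) rather than a bit loop, and this sketch is not an implementation of CLHASH itself.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of non-negative integers a and b.

    Each set bit i of b contributes a shifted copy (a << i); the copies are
    combined with XOR, so no carries propagate between bit positions.
    """
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def clmul64(a: int, b: int):
    """Model a 64-bit x 64-bit carry-less multiply with a 128-bit result,
    returned as (low 64 bits, high 64 bits)."""
    mask = (1 << 64) - 1
    assert 0 <= a <= mask and 0 <= b <= mask, "inputs must fit in 64 bits"
    product = clmul(a, b)
    return product & mask, product >> 64


# Ordinary versus carry-less multiplication of the same operands.
print(3 * 3, clmul(3, 3))            # prints "9 5"
print(11 * 7, clmul(0b1011, 0b111))  # prints "77 49"
```

Because XOR replaces addition, the carry-less product of \(3 = 11_2\) with itself is \(101_2 = 5\) rather than 9.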
Notes
The low-power AMD Jaguar microarchitecture does even better for carry-less multiplication, with a throughput of one cycle and a latency of three cycles.
In the present paper, \(\log n\) means \(\log _2 n\).
The general construction of a finite field of cardinality \(p^n\) for \(n>1\) is commonly explained in terms of polynomials with coefficients from \(\mathrm{GF}(p)\). To avoid unnecessary abstraction, we present finite fields of cardinality \(2^L\) using regular \(L\)-bit integers. Interested readers can see Mullen and Panario [30] for the alternative development.
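As an illustration of this integer representation (a sketch, not the paper's code), the Python functions below treat bit \(i\) of an integer as the coefficient of \(x^i\). Field addition is then XOR, and field multiplication is a carry-less product followed by reduction modulo an irreducible polynomial of degree \(L\). The helper names are hypothetical, and the degree-64 modulus \(x^{64}+x^4+x^3+x+1\) is our assumed example of an irreducible polynomial. The small case uses the familiar \(\mathrm{GF}(2^8)\) example from the AES standard (FIPS-197), where \(\{57\}\cdot\{83\}=\{c1\}\).

```python
# Arithmetic in GF(2^L) with field elements stored as ordinary L-bit integers.
# Bit i of an integer is the coefficient of x^i in the corresponding polynomial.

def clmul(a: int, b: int) -> int:
    """Carry-less product: partial products are combined with XOR, not addition."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(a: int, modulus: int) -> int:
    """Remainder of a divided by modulus, both read as polynomials over GF(2)."""
    m_deg = modulus.bit_length() - 1
    while a.bit_length() - 1 >= m_deg:
        # Cancel the leading term of a with a shifted copy of the modulus.
        a ^= modulus << (a.bit_length() - 1 - m_deg)
    return a


def gf_add(a: int, b: int) -> int:
    """Field addition (and subtraction) is bitwise XOR."""
    return a ^ b


def gf_mul(a: int, b: int, modulus: int) -> int:
    """Field multiplication: carry-less product, then reduction by an
    irreducible polynomial of degree L (modulus includes the x^L bit)."""
    return poly_mod(clmul(a, b), modulus)


# GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
AES_POLY = 0x11B
print(hex(gf_mul(0x57, 0x83, AES_POLY)))  # prints "0xc1" (FIPS-197 example)

# GF(2^64) with x^64 + x^4 + x^3 + x + 1 (assumed irreducible modulus).
P64 = (1 << 64) | 0b11011
x = gf_mul(0xFFFFFFFFFFFFFFFF, 2, P64)    # multiply by the element "x"
assert x < (1 << 64)                      # results stay within 64 bits
```

The reduction step is what keeps every product inside the \(L\)-bit range; a faster implementation would replace the bit-by-bit loops with hardware carry-less multiplication and a specialized reduction.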
This can be readily verified using a mathematical software package such as Sage or Maple.
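The property being verified is not restated in this note. Assuming it is the irreducibility of a reduction polynomial over \(\mathrm{GF}(2)\), which the construction of \(\mathrm{GF}(2^L)\) in the previous note requires, a check can also be scripted without Sage or Maple. The sketch below uses Rabin's irreducibility test: a polynomial \(f\) of degree \(n\) over \(\mathrm{GF}(2)\) is irreducible exactly when \(f\) divides \(x^{2^n}-x\) and \(\gcd(x^{2^{n/q}}-x, f)=1\) for every prime \(q\) dividing \(n\). Polynomials are stored as Python integers; all function names are hypothetical.

```python
# Rabin's irreducibility test for polynomials over GF(2).
# A polynomial is an int whose bit i is the coefficient of x^i.

def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_mod(a: int, m: int) -> int:
    """Remainder of a modulo m over GF(2)."""
    m_deg = m.bit_length() - 1
    while a.bit_length() - 1 >= m_deg:
        a ^= m << (a.bit_length() - 1 - m_deg)
    return a


def poly_gcd(a: int, b: int) -> int:
    """Euclid's algorithm over GF(2)[x]; the result is automatically monic."""
    while b:
        a, b = b, poly_mod(a, b)
    return a


def prime_factors(n: int) -> set:
    """Distinct prime divisors of n by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def is_irreducible(f: int) -> bool:
    n = f.bit_length() - 1           # degree of f
    if n < 1:
        return False                 # constants are not irreducible
    x = poly_mod(0b10, f)            # the polynomial x, reduced modulo f

    def x_pow_2k(k: int) -> int:
        """x^(2^k) mod f, computed by k repeated squarings."""
        h = x
        for _ in range(k):
            h = poly_mod(clmul(h, h), f)
        return h

    # f must divide x^(2^n) - x (subtraction is XOR in GF(2)) ...
    if x_pow_2k(n) != x:
        return False
    # ... and share no factor with x^(2^(n/q)) - x for each prime q | n.
    return all(poly_gcd(x_pow_2k(n // q) ^ x, f) == 1 for q in prime_factors(n))


print(is_irreducible(0x11B))       # prints "True" (AES polynomial)
print(is_irreducible((1 << 64) | 1))  # prints "False" (x^64 + 1 = (x + 1)^64)
```

For a degree-64 candidate such as \(x^{64}+x^4+x^3+x+1\), the only prime divisor of 64 is 2, so the test needs just two chains of squarings.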
Our benchmark software is made freely available under a liberal open-source license (https://github.com/lemire/StronglyUniversalStringHashing), and it includes the modified SMHasher as well as all the necessary software to reproduce our results.
For comparison, Dai and Krovetz reported that VHASH used 0.6 cycles per byte on an Intel Core 2 processor (Merom) [25].
References
Appleby, A.: SMHasher & MurmurHash (2012). http://code.google.com/p/smhasher. Last checked March 2015
ARM Limited: ARMv8 architecture reference manual (2014). http://infocenter.arm.com/help/topic/com.arm.doc.subset.architecture.reference/. Last checked March 2015
Aumasson, J.P., Bernstein, D.J.: SipHash: a fast short-input PRF. In: Galbraith, S., Nandi, M. (eds.) Progress in Cryptology (INDOCRYPT 2012). Lecture Notes in Computer Science, vol. 7668, pp. 489–508. Springer, Berlin (2012). doi:10.1007/978-3-642-34931-7_28
Aumasson, J.P., Bernstein, D.J.: SipHash: high-speed pseudorandom function (reference code) (2014). https://github.com/veorq/SipHash. Last checked Nov 2014
Barrett, P.: Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: Odlyzko, A.M. (ed.) Advances in Cryptology (CRYPTO’ 86). Lecture Notes in Computer Science, vol. 263, pp. 311–323. Springer, Berlin (1987). doi:10.1007/3-540-47721-7_24
Bernstein, D.J.: The Poly1305-AES message-authentication code. In: Fast Software Encryption. Lecture Notes in Computer Science, vol. 3557, pp. 32–49. Springer, Berlin (2005). doi:10.1007/11502760_3
Black, J., Halevi, S., Krawczyk, H., Krovetz, T., Rogaway, P.: UMAC: fast and secure message authentication. In: Wiener, M. (ed.) Advances in Cryptology (CRYPTO’ 99). Lecture Notes in Computer Science, vol. 1666, pp. 216–233. Springer, Berlin (1999). doi:10.1007/3-540-48405-1_14
Bluhm, M., Gueron, S.: Fast software implementation of binary elliptic curve cryptography. Tech. rep., Cryptology ePrint Archive (2013)
Bos, J.W., Özen, O., Stam, M.: Efficient hashing using the AES instruction set. In: Proceedings of the 13th International Conference on Cryptographic Hardware and Embedded Systems (CHES’11), pp. 507–522. Springer, Berlin (2011)
Carter, J.L., Wegman, M.N.: Universal classes of hash functions. J. Comput. System Sci. 18(2), 143–154 (1979). doi:10.1016/0022-0000(79)90044-8
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)
Dai, W., Krovetz, T.: VHASH security. Tech. Rep. 338, IACR Cryptology ePrint Archive (2007)
Estébanez, C., Hernandez-Castro, J.C., Ribagorda, A., Isasi, P.: Evolving hash functions by means of genetic programming. In: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pp. 1861–1862. ACM, New York (2006)
Etzel, M., Patel, S., Ramzan, Z.: Square hash: fast message authentication via optimized universal hash functions. In: Wiener, M. (ed.) Advances in Cryptology (CRYPTO’ 99). Lecture Notes in Computer Science, vol. 1666, pp. 234–251. Springer, Berlin (1999). doi:10.1007/3-540-48405-1_15
Fan, B., Andersen, D.G., Kaminsky, M., Mitzenmacher, M.D.: Cuckoo filter: practically better than Bloom. In: Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies (CoNEXT ’14), pp. 75–88. ACM, New York (2014). doi:10.1145/2674005.2674994
Fog, A.: Instruction tables: lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs. Tech. rep., Copenhagen University College of Engineering (2014). http://www.agner.org/optimize/instruction_tables.pdf. Last checked March 2015
Gueron, S., Kounavis, M.: Efficient implementation of the Galois Counter Mode using a carry-less multiplier and a fast reduction algorithm. Inf. Process. Lett. 110(14), 549–553 (2010). doi:10.1016/j.ipl.2010.04.011
Halevi, S., Krawczyk, H.: MMH: software message authentication in the Gbit/second rates. In: Biham, E. (ed.) Fast Software Encryption. Lecture Notes in Computer Science, vol. 1267, pp. 172–189. Springer, Berlin (1997). doi:10.1007/BFb0052345
Intel Corporation: Intel IACA tool: a static code analyser (2012). https://software.intel.com/en-us/articles/intel-architecture-code-analyzer. Last checked March 2015
IBM: Power ISA Version 2.07 (2013). https://www.power.org/wp-content/uploads/2013/05/PowerISA_V2.07_PUBLIC.pdf. Last checked March 2015
Intel Corporation: Intel Intrinsics Guide (2014). https://software.intel.com/sites/landingpage/IntrinsicsGuide/. Last checked March 2015
Knežević, M., Sakiyama, K., Fan, J., Verbauwhede, I.: Modular reduction in \(GF(2^n)\) without pre-computational phase. In: von zur Gathen, J., Imaña, J.L., Koç, C.K. (eds.) Arithmetic of Finite Fields. Lecture Notes in Computer Science, vol. 5130, pp. 77–87. Springer, Berlin (2008). doi:10.1007/978-3-540-69499-1_7
Knuth, D.E.: Sorting and Searching. The Art of Computer Programming, vol. 3. Addison-Wesley, Reading (1997)
Krovetz, T.: Message authentication on 64-bit architectures. In: Selected Areas in Cryptography. Lecture Notes in Computer Science, vol. 4356, pp. 327–341. Springer, Berlin (2007). doi:10.1007/978-3-540-74462-7_23
Krovetz, T., Dai, W.: VMAC and VHASH implementation (2007). http://fastcrypto.org/vmac/. Last checked March 2015
Lemire, D., Kaser, O.: Strongly universal string hashing is fast. Comput. J. 57(11), 1624–1638 (2014). doi:10.1093/comjnl/bxt070
Lim, H., Han, D., Andersen, D.G., Kaminsky, M.: MICA: a holistic approach to fast in-memory key-value storage. In: Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI’14), pp. 429–444. USENIX Association, Berkeley (2014)
Matsumoto, M., Nishimura, T.: Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul. 8(1), 3–30 (1998). doi:10.1145/272991.272995
Motzkin, T.S.: Evaluation of polynomials and evaluation of rational functions. Bull. Am. Math. Soc. 61(9), 163 (1955)
Mullen, G.L., Panario, D.: Handbook of Finite Fields, 1st edn. Chapman & Hall/CRC, London (2013)
Nguyen, L.H., Roscoe, A.W.: New combinatorial bounds for universal hash functions. Tech. Rep. 153, Cryptology ePrint Archive (2009)
Oliveira, T., Aranha, D.F., López, J., Rodríguez-Henríquez, F.: Fast point multiplication algorithms for binary elliptic curves with and without precomputation. In: Joux, A., Youssef, A. (eds.) Selected Areas in Cryptography (SAC 2014). Lecture Notes in Computer Science, pp. 324–344. Springer International Publishing, Switzerland (2014). doi:10.1007/978-3-319-13051-4_20
Oliveira, T., López, J., Aranha, D.F., Rodríguez-Henríquez, F.: Two is the fastest prime: lambda coordinates for binary elliptic curves. J. Cryptogr. Eng. 4(1), 3–17 (2014). doi:10.1007/s13389-013-0069-z
Paoloni, G.: How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures. Intel Corporation, Santa Clara (2010)
Pike, G., Alakuijala, J.: The CityHash family of hash functions (2011). https://code.google.com/p/cityhash/. Last checked March 2015
Stinson, D.R.: Universal hashing and authentication codes. Des. Codes Cryptogr. 4(4), 369–380 (1994). doi:10.1007/BF01388651
Stinson, D.R.: On the connections between universal hashing, combinatorial designs and error-correcting codes. Congr. Numer. 114, 7–28 (1996)
Su, C., Fan, H.: Impact of Intel’s new instruction sets on software implementation of \(GF (2)[x]\) multiplication. Inf. Process. Lett. 112(12), 497–502 (2012). doi:10.1016/j.ipl.2012.03.012
Taverne, J., Faz-Hernández, A., Aranha, D.F., Rodríguez-Henríquez, F., Hankerson, D., López, J.: Speeding scalar multiplication over binary elliptic curves using the new carry-less multiplication instruction. J. Cryptogr. Eng. 1(3), 187–199 (2011). doi:10.1007/s13389-011-0017-8
Acknowledgments
This work was supported by the National Research Council of Canada, under Grant 26143.
Cite this article
Lemire, D., Kaser, O. Faster 64-bit universal hashing using carry-less multiplications. J Cryptogr Eng 6, 171–185 (2016). https://doi.org/10.1007/s13389-015-0110-5