Abstract
The paper shows that fast counting of non-zero components in binary vectors (Hamming weights) and comparing vectors (Hamming distances) in large sets of data items is important for numerous practical applications. This problem has been broadly investigated by both software and hardware designers. It is frequently referred to as population count or vector set bits count (or simply popcount). This paper is dedicated to multi-core FPGA-based accelerators that compute Hamming weights/distances and compare the results with fixed thresholds and variable bounds. It is shown that the digital signal processing (DSP) slices widely available in contemporary FPGAs can be used efficiently for this purpose and yield the fastest and least resource-consuming solutions. A thorough analysis and comparison with the best known hardware and software alternatives is presented and supported by numerous experiments on the recent Nexys-4, ZedBoard and ZyBo prototyping systems. Complete hardware description language (VHDL) specifications of the core components are given, ready to be synthesized, implemented, tested and evaluated. Experiments with the proposed designs demonstrate a significant speed-up compared to known hardware/software alternatives.
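The abstract names the computations being accelerated. These are the Hamming weight of a binary vector, the Hamming distance between two vectors (the weight of their bitwise XOR), and comparison of the result with a fixed threshold or with variable bounds.

As a point of reference for the software side of that comparison, the following is a minimal Python sketch. It assumes vectors are stored as unsigned integers and processed in 64-bit words. It uses the well-known SWAR (parallel bit counting) technique, not the DSP-slice hardware proposed in the paper. All function names (`popcount64`, `hamming_weight`, `hamming_distance`, `exceeds_threshold`, `within_bounds`, `near_matches`) are hypothetical and chosen for illustration.

```python
import random

# Assumption (illustrative only): vectors are unsigned Python ints,
# processed in 64-bit words as a software popcount would do.
MASK64 = (1 << 64) - 1


def popcount64(x: int) -> int:
    """Hamming weight of one 64-bit word via SWAR (parallel bit counting)."""
    x &= MASK64
    x = x - ((x >> 1) & 0x5555555555555555)                         # counts per 2-bit field
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)  # counts per 4-bit field
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                         # counts per byte
    return ((x * 0x0101010101010101) & MASK64) >> 56                # sum of all 8 bytes


def hamming_weight(v: int, n_bits: int) -> int:
    """Number of set bits in an n_bits-wide vector, counted 64 bits at a time."""
    v &= (1 << n_bits) - 1  # ignore any bits beyond the vector width
    return sum(popcount64(v >> shift) for shift in range(0, n_bits, 64))


def hamming_distance(a: int, b: int, n_bits: int) -> int:
    """Number of positions in which two n_bits-wide vectors differ."""
    return hamming_weight(a ^ b, n_bits)


def exceeds_threshold(v: int, n_bits: int, threshold: int) -> bool:
    """Fixed-threshold comparison: is the Hamming weight greater than threshold?"""
    return hamming_weight(v, n_bits) > threshold


def within_bounds(v: int, n_bits: int, lower: int, upper: int) -> bool:
    """Variable-bound comparison: does lower <= weight <= upper hold?"""
    return lower <= hamming_weight(v, n_bits) <= upper


def near_matches(reference: int, items: list[int], n_bits: int, max_dist: int) -> list[int]:
    """Indices of items whose Hamming distance to reference is at most max_dist."""
    return [i for i, item in enumerate(items)
            if hamming_distance(reference, item, n_bits) <= max_dist]


if __name__ == "__main__":
    # Cross-check the SWAR count against a string-based count on random 256-bit vectors.
    rng = random.Random(1)
    n = 256
    for _ in range(1000):
        v = rng.getrandbits(n)
        assert hamming_weight(v, n) == bin(v).count("1")
    print(hamming_distance(0b1100, 0b1010, 4))                     # prints 2
    print(near_matches(0b1111, [0b1111, 0b0111, 0b0000], 4, 1))    # prints [0, 1]
```

Python 3.10 and later also provide `int.bit_count()`, which returns the same count directly. The word-by-word SWAR routine is shown here because it makes the tree of progressively wider partial sums explicit.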
Cite this article
Sklyarov, V., Skliarova, I. Multi-core DSP-based Vector Set Bits Counters/Comparators. J Sign Process Syst 80, 309–322 (2015). https://doi.org/10.1007/s11265-014-0915-y