Multi-core DSP-based Vector Set Bits Counters/Comparators
The paper shows that fast counting of non-zero components (Hamming weights) and comparison of the results (Hamming distances) in large sets of data items is important for numerous practical applications, and that this problem has been broadly investigated by software and hardware designers. It is frequently referred to as population count or vector set bits count (or simply popcount). This paper is dedicated to multi-core FPGA-based accelerators that compute Hamming weights/distances and compare the results with fixed thresholds and variable bounds. It is shown that the digital signal processing (DSP) slices widely available in contemporary FPGAs can be used efficiently and provide the fastest and least resource-consuming solutions. A thorough analysis and comparison with the best known alternatives, both in hardware and in software, is presented and supported by numerous experiments on the recent Nexys-4, ZedBoard and ZyBo prototyping systems. Complete VHDL (hardware description language) specifications of the core components are given, ready to be synthesized, implemented, tested and evaluated. Experiments with the proposed designs clearly demonstrate a significant speed-up compared to known hardware/software alternatives.
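To make the quantities in the abstract concrete, the following is a minimal Python software sketch, not the authors' VHDL/DSP-based design. It assumes 64-bit vectors stored as Python ints; the names `WIDTH`, `MASK`, `hamming_weight`, `hamming_distance`, `tree_popcount`, `count_at_least` and `within_bounds` are hypothetical. `hamming_weight` uses the well-known SWAR ("counting bits set in parallel") trick, the Hamming distance is the weight of the XOR of two vectors, `tree_popcount` models the binary adder tree that hardware counters commonly build (no claim is made that it matches the paper's DSP-slice mapping), and the last two functions illustrate comparison with a fixed threshold and with variable lower/upper bounds.

```python
# Hypothetical software model of Hamming weight/distance counters and comparators.
# Assumption: vectors are Python ints holding WIDTH-bit values.

WIDTH = 64
MASK = (1 << WIDTH) - 1


def hamming_weight(v: int) -> int:
    """Count set bits of a 64-bit value with the SWAR (parallel) method."""
    v &= MASK
    v = v - ((v >> 1) & 0x5555555555555555)                         # 2-bit partial sums
    v = (v & 0x3333333333333333) + ((v >> 2) & 0x3333333333333333)  # 4-bit partial sums
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0F                         # 8-bit partial sums
    # Multiplying by 0x0101... accumulates all byte sums into the top byte.
    return ((v * 0x0101010101010101) & MASK) >> 56


def hamming_distance(a: int, b: int) -> int:
    """Number of positions in which a and b differ: weight of a XOR b."""
    return hamming_weight(a ^ b)


def tree_popcount(v: int, width: int = WIDTH) -> int:
    """Model of a binary adder tree: add pairs of partial counts level by level."""
    level = [(v >> i) & 1 for i in range(width)]  # leaves are the individual bits
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)                       # pad an odd level with a zero
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def count_at_least(vectors, threshold: int) -> int:
    """Fixed-threshold comparison: how many vectors have weight >= threshold."""
    return sum(1 for v in vectors if hamming_weight(v) >= threshold)


def within_bounds(v: int, lower: int, upper: int) -> bool:
    """Variable-bounds comparison: is lower <= weight(v) <= upper?"""
    return lower <= hamming_weight(v) <= upper
```

For instance, `count_at_least([0b1011, 0xFF, 0, MASK], 4)` returns 2, because only `0xFF` (weight 8) and `MASK` (weight 64) reach the threshold. In software the SWAR form avoids a per-bit loop; in hardware the adder tree is the usual structure, and its depth grows logarithmically with the vector width.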
Keywords: Hamming weight/population/vector set bits counter; Hamming weight comparator; Field-programmable gate array; Digital signal processing slice; Hardware accelerator; On-chip architecture