
Multi-core DSP-based Vector Set Bits Counters/Comparators

Journal of Signal Processing Systems

Abstract

Fast counting of non-zero components (Hamming weights) and comparison of the results (Hamming distances) in large sets of data items is important for numerous practical applications, and the problem has been broadly investigated by software and hardware designers. It is frequently referred to as population count or vector set bits count (or simply popcount). This paper is dedicated to multi-core FPGA-based accelerators that compute Hamming weights/distances and compare the results with fixed thresholds and variable bounds. It is shown that the digital signal processing (DSP) slices widely available in contemporary FPGAs can be used efficiently and provide the fastest and least resource-consuming solutions. A thorough analysis and comparison with the best known alternatives, in both hardware and software, is presented and supported by numerous experiments on the recent Nexys-4, ZedBoard and ZyBo prototyping systems. Complete hardware description language (VHDL) specifications for the core components are given, ready to be synthesized, implemented, tested and evaluated. Experiments with the proposed designs clearly demonstrate significant speed-up compared to known hardware/software alternatives.
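
For readers unfamiliar with the terminology, the quantities named in the abstract can be illustrated with a minimal software sketch. It is not the paper's DSP-slice design; the function names (`hamming_weight`, `hamming_distance`, `exceeds_threshold`, `within_bounds`) are hypothetical and chosen only for this illustration, and binary vectors are assumed to be packed into non-negative Python integers.

```python
def hamming_weight(x: int) -> int:
    """Count the set bits (popcount) of a non-negative integer."""
    return bin(x).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    # XOR leaves a 1 exactly where the two vectors disagree.
    return hamming_weight(a ^ b)


def exceeds_threshold(x: int, threshold: int) -> bool:
    """Compare the Hamming weight of x with a fixed threshold."""
    return hamming_weight(x) > threshold


def within_bounds(x: int, lower: int, upper: int) -> bool:
    """Check the Hamming weight of x against variable (inclusive) bounds."""
    return lower <= hamming_weight(x) <= upper


if __name__ == "__main__":
    v1, v2 = 0b10110010, 0b10011010
    print(hamming_weight(v1))        # weight of v1
    print(hamming_distance(v1, v2))  # number of differing bits
    print(exceeds_threshold(v1, 3))  # fixed-threshold comparison
    print(within_bounds(v2, 2, 5))   # variable-bounds comparison
```

A hardware accelerator of the kind the abstract describes would compute these values for many long vectors in parallel; the sketch only fixes the definitions that such a circuit has to satisfy.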

Author information

Corresponding author

Correspondence to Iouliia Skliarova.

Cite this article

Sklyarov, V., Skliarova, I. Multi-core DSP-based Vector Set Bits Counters/Comparators. J Sign Process Syst 80, 309–322 (2015). https://doi.org/10.1007/s11265-014-0915-y

  • DOI: https://doi.org/10.1007/s11265-014-0915-y
