Journal of Signal Processing Systems, Volume 80, Issue 3, pp 309–322

Multi-core DSP-based Vector Set Bits Counters/Comparators

  • Valery Sklyarov
  • Iouliia Skliarova

Abstract

Fast counting of non-zero components (Hamming weights) and comparison of the results (Hamming distances) in large sets of data items is important for numerous practical applications, and the problem has been broadly investigated by software and hardware designers. It is frequently referred to as population count or vector set bits count (or simply popcount). This paper is dedicated to multi-core FPGA-based accelerators that compute Hamming weights/distances and compare the results with fixed thresholds and variable bounds. It is shown that the digital signal processing slices widely available in contemporary FPGAs can be used efficiently for this purpose and provide the fastest and least resource-consuming solutions. A thorough analysis and comparison with the best known alternatives, both in hardware and in software, is presented and supported by numerous experiments on the recent Nexys-4, ZedBoard and ZyBo prototyping systems. Complete hardware description language (VHDL) specifications for the core components are given, ready to be synthesized, implemented, tested and evaluated. Experiments with the proposed designs clearly demonstrate a significant speed-up compared with known hardware/software alternatives.
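As a point of reference for the quantities named in the abstract, the following is a minimal software sketch in Python. It is not the authors' VHDL design and does not model DSP slices; it only illustrates what a Hamming weight, a Hamming distance, and a comparison against a fixed threshold or a pair of bounds compute. All function names are hypothetical, and the 32-bit routine uses a well-known parallel bit-counting technique chosen here for illustration.

```python
def hamming_weight(x: int) -> int:
    """Return the number of set bits (popcount) of a non-negative integer."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return bin(x).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Return the number of bit positions in which a and b differ."""
    return hamming_weight(a ^ b)


def popcount32_parallel(x: int) -> int:
    """Popcount of the low 32 bits using parallel partial sums.

    Illustrative software technique only; not the method of the paper.
    """
    x &= 0xFFFFFFFF
    x = x - ((x >> 1) & 0x55555555)                  # sums of 2-bit groups
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # sums of 4-bit groups
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # sums of 8-bit groups
    # Multiplying adds the four byte sums into the top byte.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24


def weight_at_least(x: int, threshold: int) -> bool:
    """Compare the Hamming weight of x with a fixed threshold."""
    return hamming_weight(x) >= threshold


def weight_within(x: int, lower: int, upper: int) -> bool:
    """Check the Hamming weight of x against variable bounds (inclusive)."""
    return lower <= hamming_weight(x) <= upper
```

For example, `hamming_distance(0b1010, 0b0110)` counts the two positions in which the vectors differ, and `weight_within(0b1011, 2, 3)` checks whether the weight of a vector lies between two bounds.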

Keywords

  • Hamming weight/population/vector set bits counter
  • Hamming weight comparator
  • Field-programmable gate array
  • Digital signal processing slice
  • Hardware accelerator
  • On-chip architecture

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Electronics, Telecommunications and Informatics, IEETA, University of Aveiro, Aveiro, Portugal
