
Accelerating Machine-Learning Algorithms on FPGAs using Pattern-Based Decomposition

Journal of Signal Processing Systems

Abstract

Machine-learning algorithms are employed in a wide variety of applications to extract useful information from data sets, and many suffer from super-linear growth in computational time as the data size and the number of signals being processed (the data dimension) increase. Certain principal machine-learning algorithms are commonly embedded in larger detection, estimation, or classification operations. Three such algorithms are Parzen window-based non-parametric estimation of probability density functions (PDFs), K-means clustering, and correlation. Because they form an integral part of numerous machine-learning applications, fast and efficient execution of these algorithms is highly desirable. FPGA-based reconfigurable computing (RC) has been used successfully to accelerate computationally intensive problems in a wide variety of scientific domains, achieving speedup over traditional software implementations. However, this potential benefit is often not fully realized, because efficient FPGA designs are generally created in a laborious, case-specific manner that demands a great deal of redundant time and effort. In this paper, we propose an approach to algorithm acceleration on FPGAs based on pattern-based decomposition, which offers significant increases in productivity through design reuse. Using this approach, we design, analyze, and implement a multi-dimensional PDF estimation algorithm using Gaussian kernels on FPGAs. First, the algorithm's amenability to a hardware paradigm and its expected speedup are predicted. After implementation, the actual speedup and performance metrics are compared with the predictions, showing speedup on the order of 20× over a 3.2 GHz processor. Multi-core architectures are developed to further improve performance by scaling the design, and the portability of the hardware design across multiple FPGA platforms is also analyzed.
After the PDF algorithm is implemented, the value of pattern-based decomposition for supporting reuse is demonstrated through rapid development of the K-means and correlation algorithms.
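The reuse claim above rests on the three algorithms sharing a common computational structure. As a minimal software sketch (not the authors' FPGA design; the function names such as `pairwise_reduce` are hypothetical), each algorithm can be written as a kernel evaluated over all pairs of vectors followed by a per-row reduction: a Gaussian kernel averaged over samples for Parzen-window PDF estimation, squared Euclidean distance with an argmin for the K-means assignment step, and the Pearson coefficient kept as a full row for a correlation matrix. This particular choice of kernels and reductions is an assumption made for illustration.

```python
import math
from typing import Callable, List, Sequence, TypeVar

Vector = Sequence[float]
T = TypeVar("T")


def pairwise_reduce(
    queries: Sequence[Vector],
    references: Sequence[Vector],
    kernel: Callable[[Vector, Vector], float],
    reduce: Callable[[List[float]], T],
) -> List[T]:
    """Hypothetical shared pattern: evaluate `kernel` for every
    (query, reference) pair, then collapse each query's row with `reduce`."""
    return [reduce([kernel(q, r) for r in references]) for q in queries]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def parzen_pdf(points: Sequence[Vector], samples: Sequence[Vector], h: float) -> List[float]:
    """Parzen-window PDF estimate at each point, using an isotropic
    d-dimensional Gaussian kernel of width h:
    p(x) = (1/N) * sum_i (2*pi*h^2)^(-d/2) * exp(-||x - x_i||^2 / (2*h^2))."""
    d = len(samples[0])
    norm = (2.0 * math.pi * h * h) ** (d / 2.0)

    def gauss(x: Vector, xi: Vector) -> float:
        return math.exp(-sq_dist(x, xi) / (2.0 * h * h)) / norm

    # Reduction: average the kernel values over all N samples.
    return pairwise_reduce(points, samples, gauss, lambda row: sum(row) / len(row))


def kmeans_assign(data: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """K-means assignment step only: index of the nearest centroid per sample."""
    return pairwise_reduce(
        data, centroids, sq_dist,
        lambda row: min(range(len(row)), key=row.__getitem__),
    )


def pearson(a: Vector, b: Vector) -> float:
    """Pearson correlation coefficient of two equal-length signals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_matrix(signals: Sequence[Vector]) -> List[List[float]]:
    """Pairwise Pearson correlation between signals; each row is kept whole."""
    return pairwise_reduce(signals, signals, pearson, list)
```

For example, `kmeans_assign([[0, 0], [10, 10], [9, 8]], [[0, 0], [10, 10]])` returns `[0, 1, 1]`. Only the kernel and the reduction change between the three functions, which mirrors the kind of reuse attributed here to pattern-based decomposition. A complete K-means implementation would also iterate a centroid-update step, which this sketch omits.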




Acknowledgement

This work was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. EEC-0642422. The authors gratefully acknowledge vendor equipment and/or tools provided by Xilinx, Aldec, Cray, and Nallatech that helped make this work possible.

Author information

Corresponding author

Correspondence to Karthik Nagarajan.


About this article

Cite this article

Nagarajan, K., Holland, B., George, A.D. et al. Accelerating Machine-Learning Algorithms on FPGAs using Pattern-Based Decomposition. J Sign Process Syst 62, 43–63 (2011). https://doi.org/10.1007/s11265-008-0337-9


  • DOI: https://doi.org/10.1007/s11265-008-0337-9
