Fast Approximation of the Top-k Items in Data Streams Using a Reconfigurable Accelerator

  • Conference paper
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2021)

Abstract

This paper presents a novel method for finding the top-k items in data streams using a reconfigurable accelerator. The accelerator extracts an approximate list of the most frequently occurring items in an input stream that is scanned only once, with no need for random access. It is based on a hardware architecture that implements the well-known Probabilistic sampling algorithm by mapping its main processing stages to two custom systolic arrays. The proposed architecture is the first hardware implementation of this algorithm and shows better scalability than architectures based on other stream algorithms. When implemented on an Intel Arria 10 FPGA (10AX115N2F45E1SG), 50% of the FPGA chip is sufficient for 3000+ Processing Elements (PEs). Experimental results on both synthetic and real input datasets show very good accuracy and significant throughput gains compared to existing solutions. With achieved throughputs exceeding 300 million items/s, we report average speedups of 20x compared to typical software implementations, 1.5x compared to GPU-accelerated implementations, and 1.8x compared to the fastest FPGA implementation.
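
To make the problem setting concrete, the following is a minimal software sketch of approximate top-k extraction from a stream that is read once, in order, with bounded memory. The abstract does not spell out the Probabilistic sampling algorithm or its mapping to systolic arrays, so this sketch substitutes a different, deterministic technique, the Misra-Gries (Frequent) counter summary, purely as an illustration of the single-pass constraint. The names `approx_top_k` and `num_counters` are hypothetical and do not come from the paper.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def approx_top_k(
    stream: Iterable[Hashable], k: int, num_counters: int
) -> List[Tuple[Hashable, int]]:
    """Single-pass Misra-Gries summary (an assumed stand-in, not the paper's method).

    Keeps at most `num_counters` (item, count) pairs. Each stored count is a
    lower bound on the item's true frequency.
    """
    if k < 1 or num_counters < 1:
        raise ValueError("k and num_counters must be positive")

    counters: Dict[Hashable, int] = {}
    for item in stream:  # the stream is consumed exactly once, in order
        if item in counters:
            counters[item] += 1
        elif len(counters) < num_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and evict those
            # that reach zero. The incoming item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]

    # Rank the surviving candidates by their (under-)estimated counts.
    ranked = sorted(counters.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Calling `approx_top_k(["a"] * 5 + ["b"] * 3 + ["c", "d"], k=2, num_counters=2)` ranks `"a"` ahead of `"b"`, with counts that underestimate the true frequencies. More counters tighten the estimates at the cost of memory, which is loosely analogous to the role the number of Processing Elements plays in a hardware design, although that analogy is an assumption here rather than a claim from the paper.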

Author information

Corresponding author

Correspondence to Ali Ebrahim.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ebrahim, A., Khalifat, J. (2021). Fast Approximation of the Top-k Items in Data Streams Using a Reconfigurable Accelerator. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2021. Lecture Notes in Computer Science, vol 12700. Springer, Cham. https://doi.org/10.1007/978-3-030-79025-7_1

  • DOI: https://doi.org/10.1007/978-3-030-79025-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-79024-0

  • Online ISBN: 978-3-030-79025-7

  • eBook Packages: Computer Science, Computer Science (R0)
