Fast Approximation of the Top-k Items in Data Streams Using a Reconfigurable Accelerator

Ebrahim, Ali; Khalifat, Jalal

doi:10.1007/978-3-030-79025-7_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12700)

Included in the following conference series:

International Symposium on Applied Reconfigurable Computing


Abstract

This paper presents a novel method for finding the top-k items in data streams using a reconfigurable accelerator. The accelerator extracts an approximate list of the most frequently occurring items in an input stream, which is scanned only once and requires no random access. It is based on a hardware architecture that implements the well-known probabilistic sampling algorithm by mapping its main processing stages to two custom systolic arrays. The proposed architecture is the first hardware implementation of this algorithm, and it scales better than architectures based on other stream algorithms. When implemented on an Intel Arria 10 FPGA (10AX115N2F45E1SG), 50% of the FPGA chip is sufficient for 3000+ Processing Elements (PEs). Experimental results on both synthetic and real input datasets showed very good accuracy and significant throughput gains compared to existing solutions. With achieved throughputs exceeding 300 million items/s, we report average speedups of 20x over typical software implementations, 1.5x over GPU-accelerated implementations, and 1.8x over the fastest FPGA implementation.
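To make the idea of a one-pass, sampling-based top-k approximation concrete, the following is a minimal software sketch. It is not the authors' hardware design, and it may differ from the exact sampling variant the paper implements: it assumes a simple scheme in which an unseen item starts being monitored with a fixed probability and is counted exactly from then on. The function name `approx_top_k` and the parameters `sample_prob` and `seed` are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter


def approx_top_k(stream, k, sample_prob, seed=0):
    """Approximate the k most frequent items in a single pass.

    Assumed scheme (illustrative, not taken from the paper): an item that
    is not yet monitored is admitted with probability `sample_prob`; once
    admitted, every later occurrence is counted. Counts therefore never
    exceed the true frequencies.
    """
    rng = random.Random(seed)      # local generator keeps runs reproducible
    counts = Counter()
    for item in stream:            # one sequential scan, no random access
        if item in counts:
            counts[item] += 1      # already monitored: count this occurrence
        elif rng.random() < sample_prob:
            counts[item] = 1       # start monitoring this item
    return counts.most_common(k)   # list of (item, estimated_count) pairs


if __name__ == "__main__":
    demo = ["a"] * 50 + ["b"] * 30 + ["c"] * 5
    print(approx_top_k(demo, k=2, sample_prob=0.5))
```

Frequent items are very likely to be admitted early, so their estimated counts stay close to the true ones, while rare items are often never admitted and consume no counter. Setting `sample_prob` to 1.0 reduces the sketch to exact counting.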

Author information

Authors and Affiliations

University of Bahrain, Sakhir Campus, Bahrain
Ali Ebrahim & Jalal Khalifat

Corresponding author

Correspondence to Ali Ebrahim.

Editor information

Editors and Affiliations

IRISA, University of Rennes 1, Rennes, France
Steven Derrien
Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Frank Hannig
INESC-ID, Lisboa, Portugal
Pedro C. Diniz
ENSSAT, University of Rennes 1, Lannion, France
Daniel Chillet

About this paper

Cite this paper

Ebrahim, A., Khalifat, J. (2021). Fast Approximation of the Top-k Items in Data Streams Using a Reconfigurable Accelerator. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2021. Lecture Notes in Computer Science, vol 12700. Springer, Cham. https://doi.org/10.1007/978-3-030-79025-7_1

DOI: https://doi.org/10.1007/978-3-030-79025-7_1
Published: 23 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79024-0
Online ISBN: 978-3-030-79025-7
eBook Packages: Computer Science, Computer Science (R0)
