Abstract
This paper presents a novel method for finding the top-k items in data streams using a reconfigurable accelerator. The accelerator extracts an approximate list of the most frequently occurring items in an input stream, which is scanned only once and without random access. It is based on a hardware architecture that implements the well-known probabilistic sampling algorithm by mapping its main processing stages to two custom systolic arrays. The proposed architecture is the first hardware implementation of this algorithm, and it scales better than architectures based on other stream algorithms. When implemented on an Intel Arria 10 FPGA (10AX115N2F45E1SG), 50% of the FPGA chip is sufficient for more than 3000 Processing Elements (PEs). Experimental results on both synthetic and real input datasets show very good accuracy and significant throughput gains compared to existing solutions. With achieved throughputs exceeding 300 million items/s, we report average speedups of 20x over typical software implementations, 1.5x over GPU-accelerated implementations, and 1.8x over the fastest FPGA implementation.
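To make the single-pass, bounded-memory setting concrete, the following is a minimal software sketch. It is not the probabilistic sampling algorithm the accelerator implements, whose details the abstract does not give, and it says nothing about the systolic-array mapping. It uses the counter-based Frequent (Misra-Gries) summary as a stand-in, and the names `frequent_top_k`, `num_counters`, and `k` are hypothetical.

```python
def frequent_top_k(stream, num_counters, k):
    """One-pass Frequent (Misra-Gries) summary.

    Illustrative stand-in only: this is NOT the paper's probabilistic
    sampling algorithm. It keeps at most `num_counters` counters and
    returns up to `k` candidate items with their (under)estimated counts.
    """
    counters = {}
    for item in stream:  # each item is seen exactly once, in order
        if item in counters:
            counters[item] += 1
        elif len(counters) < num_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and evict zeros.
            # The incoming item itself is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    # Rank surviving candidates by estimated count, largest first.
    ranked = sorted(counters.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    demo = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b"]
    print(frequent_top_k(demo, num_counters=3, k=2))
```

Each estimated count never exceeds the true count, and it falls short by at most n / (num_counters + 1) for a stream of n items, so the returned list is approximate in the same spirit as the abstract describes: candidates are found in one scan, with bounded memory, at the cost of exact counts.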
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ebrahim, A., Khalifat, J. (2021). Fast Approximation of the Top-k Items in Data Streams Using a Reconfigurable Accelerator. In: Derrien, S., Hannig, F., Diniz, P.C., Chillet, D. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2021. Lecture Notes in Computer Science, vol 12700. Springer, Cham. https://doi.org/10.1007/978-3-030-79025-7_1
DOI: https://doi.org/10.1007/978-3-030-79025-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79024-0
Online ISBN: 978-3-030-79025-7
eBook Packages: Computer Science, Computer Science (R0)