Abstract
We investigate novel SoC-FPGA solutions for fast and energy-efficient ranking based on machine-learned ensembles of decision trees. Since the memory footprint of ranking ensembles limits the effective exploitation of programmable logic for large-scale inference tasks, we investigate binning and quantization techniques to reduce the memory occupation of the learned model, and we optimize the state-of-the-art ensemble-traversal algorithm for deployment on low-cost, energy-efficient FPGA devices. The results of experiments conducted on publicly available Learning-to-Rank datasets show that our model compression techniques do not significantly impact accuracy. Moreover, the reduced space requirements allow the models and the logic to be replicated on the FPGA device so that several inference tasks can be executed in parallel. We discuss in detail the experimental settings and the feasibility of deploying the proposed solution in a real setting. Our FPGA solution achieves state-of-the-art performance and consumes from 9× up to 19.8× less energy than an equivalent multi-threaded CPU implementation.
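The compression idea sketched in the abstract — binning split thresholds and quantizing leaf outputs so that narrow integers replace 32-bit floats — can be illustrated as follows. This is a minimal sketch under assumptions of our own: the function names (`bin_thresholds`, `quantize_leaves`), the 8-bit/16-bit widths, and the quantile-based binning are illustrative choices, not the paper's exact scheme.

```python
import numpy as np

def bin_thresholds(thresholds, n_bins=256):
    """Map floating-point split thresholds to small integer bin ids.

    The thresholds are replaced by indices into a shared, sorted
    bin-edge table, so an 8-bit id can stand in for a 32-bit float.
    """
    edges = np.quantile(thresholds, np.linspace(0.0, 1.0, n_bins))
    edges = np.unique(edges)  # drop duplicate edges
    ids = np.searchsorted(edges, thresholds).astype(np.uint8)
    return ids, edges

def quantize_leaves(leaf_values, n_levels=65536):
    """Uniformly quantize leaf outputs to 16-bit codes plus a scale/offset."""
    lo, hi = float(np.min(leaf_values)), float(np.max(leaf_values))
    scale = (hi - lo) / (n_levels - 1) if hi > lo else 1.0
    codes = np.round((np.asarray(leaf_values) - lo) / scale).astype(np.uint16)
    return codes, scale, lo  # dequantize with: lo + codes * scale
```

With 16-bit leaf codes the worst-case reconstruction error is half a quantization step (`scale / 2`), which is why, as reported in the abstract, accuracy is largely preserved while the model shrinks enough to be replicated across the FPGA fabric.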
Notes
- 4. Code available at https://github.com/hpclab/model_compression_for_ranking_on_fpga.
Acknowledgements
This work was partially supported by the project HAMLET: Hardware Acceleration of Machine LEarning Tasks, funded by CONICET (Argentina) and CNR (Italy) 2017-2018 collaboration program, by the TEACHING project, funded by the EU Horizon 2020 Research and Innovation program (Grant agreement ID: 871385), and by the OK-INSAID project, funded by the Italian Ministry of Education and Research (GA no. ARS01_00917).
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gil-Costa, V., Loor, F., Molina, R., Nardini, F., Perego, R., Trani, S. (2022). Ensemble Model Compression for Fast and Energy-Efficient Ranking on FPGAs. In: Hagen, M., et al. Advances in Information Retrieval. ECIR 2022. Lecture Notes in Computer Science, vol 13185. Springer, Cham. https://doi.org/10.1007/978-3-030-99736-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99735-9
Online ISBN: 978-3-030-99736-6