Sequential Model-Based Optimization for Natural Language Processing Data Pipeline Selection and Optimization

Arntong, Piyadanai; Pongpech, Worapol Alex

doi:10.1007/978-3-030-73280-6_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12672)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

Natural language processing (NLP) aims to analyze large amounts of natural language data. NLP processes textual data through a set of data processing elements that are sequentially connected to form a path, the data pipeline. For a given set of textual data, several data pipelines exist, and they yield varying degrees of model accuracy. Instead of trying the possible paths one by one to find an optimal path, as grid search or random search would, we use Bayesian optimization to search this space together with the space of learning hyper-parameters. In this study, we propose a data pipeline selection method for NLP based on Sequential Model-Based Optimization (SMBO). We implemented SMBO for the NLP data pipeline with the Hyperparameter Optimization (Hyperopt) library, using the Tree of Parzen Estimators (TPE) and Adaptive Tree of Parzen Estimators (A-TPE) models as the surrogate model and expected improvement (EI) as the acquisition function.
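
The SMBO loop described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation and not Hyperopt's: the search space `SPACE`, the step names, and the invented `toy_loss` (a stand-in for 1 minus validation accuracy) are all assumptions made for illustration. The surrogate is a simplified TPE-style model for purely categorical choices. Past trials are split into a "good" and a "bad" group, a smoothed density is fitted to each, and the next pipeline is the sampled candidate with the highest density ratio l(x)/g(x), which is the quantity TPE maximizes as a proxy for expected improvement.

```python
import math
import random

# Hypothetical search space: each pipeline step has a few candidate operators.
SPACE = {
    "tokenizer": ["whitespace", "wordpiece"],
    "stopwords": ["keep", "remove"],
    "features": ["bow", "tfidf", "bigram_tfidf"],
}


def toy_loss(cfg):
    """Stand-in for 1 - validation accuracy; the numbers are invented."""
    loss = 0.40
    loss -= 0.05 if cfg["tokenizer"] == "wordpiece" else 0.0
    loss -= 0.03 if cfg["stopwords"] == "remove" else 0.0
    loss -= {"bow": 0.0, "tfidf": 0.08, "bigram_tfidf": 0.10}[cfg["features"]]
    return loss


def density(values, options):
    """Laplace-smoothed categorical density over `options`."""
    total = len(values) + len(options)
    return {o: (values.count(o) + 1) / total for o in options}


def suggest(history, rng, gamma=0.25, n_candidates=20):
    """Sample candidates from the 'good' density l and keep the one
    with the highest ratio l(x)/g(x)."""
    ranked = sorted(history, key=lambda trial: trial[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good, bad = ranked[:n_good], ranked[n_good:]
    l = {k: density([c[k] for c, _ in good], v) for k, v in SPACE.items()}
    g = {k: density([c[k] for c, _ in bad], v) for k, v in SPACE.items()}

    def sample_from_l():
        return {
            k: rng.choices(v, weights=[l[k][o] for o in v])[0]
            for k, v in SPACE.items()
        }

    def ratio(cfg):
        return math.prod(l[k][cfg[k]] / g[k][cfg[k]] for k in SPACE)

    return max((sample_from_l() for _ in range(n_candidates)), key=ratio)


def smbo(n_trials=30, n_startup=8, seed=0):
    """Run random start-up trials, then let the surrogate propose pipelines."""
    rng = random.Random(seed)
    history = []  # list of (config, loss) pairs
    for i in range(n_trials):
        if i < n_startup:
            cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        else:
            cfg = suggest(history, rng)
        history.append((cfg, toy_loss(cfg)))
    best_cfg, best_loss = min(history, key=lambda trial: trial[1])
    return best_cfg, best_loss, history
```

Calling `best_cfg, best_loss, history = smbo()` returns the best pipeline found, its loss, and the full trial history. In a real study, `toy_loss` would be replaced by training and validating a model on the output of the chosen pipeline. A library such as Hyperopt also handles continuous and conditional parameters, which this sketch omits.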

Author information

Authors and Affiliations

Faculty of Applied Statistics, National Institute of Development Administration, Bangkok, Thailand
Piyadanai Arntong & Worapol Alex Pongpech

Corresponding author

Correspondence to Worapol Alex Pongpech.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Suphamit Chittayasothorn
Nanyang Technological University, Singapore, Singapore
Dusit Niyato
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Arntong, P., Pongpech, W.A. (2021). Sequential Model-Based Optimization for Natural Language Processing Data Pipeline Selection and Optimization. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science, vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_24

DOI: https://doi.org/10.1007/978-3-030-73280-6_24
Published: 05 April 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73279-0
Online ISBN: 978-3-030-73280-6
eBook Packages: Computer Science (R0)
