Skip to main content

Sequential Model-Based Optimization for Natural Language Processing Data Pipeline Selection and Optimization

  • Conference paper
  • First Online:
Intelligent Information and Database Systems (ACIIDS 2021)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 12672))

Included in the following conference series:

  • 1831 Accesses

Abstract

Natural language processing (NLP) aims to analyze a large amount of natural language data. The NLP computes textual data via a set of data processing elements which is sequentially connected to a path data pipeline. Several data pipelines exist for a given set of textual data with various degrees of model accuracy. Instead of trying all the possible paths, such as random search or grid search to find an optimal path, we utilized the Bayesian optimization to search along with the space of hyper-parameters learning. In this study, we proposed a data pipeline selection for NLP using Sequential Model-based Optimization (SMBO). We implemented the SMBO for the NLP data pipeline using Hyperparameter Optimization (Hyperopt) library with Tree of Parzen Estimators (TPE) model and Adaptive Tree of Parzen Estimators (A-TPE) model for a surface model with expected improvement (EI) acquired function.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Uysal, A.K., Gunal, S.: The impact of preprocessing on text classification. Inf. Process. Manage. 50(1), 104–112 (2014)

    Article  Google Scholar 

  2. Naymat, G., Etaiwi, W.: The impact of applying different preprocessing steps on review spam detection. In: The 8th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (2017)

    Google Scholar 

  3. Alam, S., Yao, N.: The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis. Comput. Math. Organ. Theory 25(3), 319–335 (2019)

    Article  Google Scholar 

  4. Tripathy, A., Agrawal, A., Rath, S.K.: Classification of sentiment reviews using N-gram machine learning approach. Expert Syst. Appl. 57, 117–126 (2016)

    Article  Google Scholar 

  5. Gaydhani, A., Doma, V., Kendre, S., Bhagwat, L.: Detecting hate speech and offensive language on twitter using machine learning: an N-gram and TFIDF based approach. arXiv preprint arXiv:1809.08651 (2018)

  6. Quemy, A.: Data pipeline selection and optimization. Design, Optimization, Languages and Analytical Processing of Big Data (2019)

    Google Scholar 

  7. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(1), 281–305 (2012)

    MathSciNet  MATH  Google Scholar 

  8. Mullapudi, R.T., Vasista, V., Bondhugula, U.: PolyMage: automatic optimization for image processing pipelines. ACM SIGARCH Comput. Archit. News 43(1), 429–443 (2015)

    Google Scholar 

  9. Moore, J.H., Olson, R.S.: TPOT: a tree-based pipeline optimization tool for automating machine learning, pp. 151–160 (2019)

    Google Scholar 

  10. Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: Coello, C.A.C. (ed.) LION 2011. LNCS, vol. 6683, pp. 507–523. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25566-3_40

    Chapter  Google Scholar 

  11. Chauhan, K., et al.: Automated machine learning: the new wave of machine learning, pp. 205–212 (2020)

    Google Scholar 

  12. Bergstra, J.S., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, pp. 2546–2554 (2011)

    Google Scholar 

  13. Bergstra, J., Yamins, D., Cox, D.D.: Hyperopt: A python library for optimizing the hyperparameters of machine learning algorithms. In: Proceedings of the 12th Python in Science Conference, vol. 13, p. 20. Citeseer (2013)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Worapol Alex Pongpech .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Arntong, P., Pongpech, W.A. (2021). Sequential Model-Based Optimization for Natural Language Processing Data Pipeline Selection and Optimization. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science(), vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-73280-6_24

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73279-0

  • Online ISBN: 978-3-030-73280-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics