Optimizing Natural Language Processing Pipelines: Opinion Mining Case Study

Part of the Lecture Notes in Computer Science book series (LNIP, volume 11896)

Abstract

This research presents NLP-Opt, an Auto-ML technique for optimizing pipelines of machine learning algorithms that can be applied to different Natural Language Processing (NLP) tasks. The process of selecting the algorithms and their parameters is modelled as an optimization problem, and a technique based on the metaheuristic Population-Based Incremental Learning (PBIL) is proposed to find an optimal combination. For validation purposes, this approach is applied to a standard opinion mining problem. NLP-Opt effectively optimizes the algorithms and parameters of pipelines. Additionally, NLP-Opt outputs probabilistic information about the optimization process, revealing the most relevant components of pipelines. The proposed technique can be applied to different NLP problems, and the information it provides can be used by researchers to gain insight into the characteristics of the best-performing pipelines. The source code is made available for other researchers. In contrast with other Auto-ML approaches, NLP-Opt provides a flexible mechanism for designing generic pipelines that can be applied to NLP problems. Furthermore, the probabilistic model offers a more comprehensive approach to the Auto-ML problem that enriches researchers' understanding of the possible solutions.
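To make the idea of PBIL-style pipeline selection concrete, the following is a minimal sketch and not the authors' implementation. It keeps one categorical probability distribution per pipeline stage, samples configurations from it, and shifts the distributions toward the best-scoring samples. The stage names, the option lists, the function names and the `toy_evaluate` scores are all hypothetical; in a real setting the evaluation function would train and validate an actual pipeline.

```python
import random

# Hypothetical search space: one categorical choice per pipeline stage.
SEARCH_SPACE = {
    "preprocessing": ["none", "stemming", "lemmatization"],
    "reduction": ["none", "svd", "chi2"],
    "classifier": ["naive_bayes", "svm", "logistic_regression"],
}


def init_model(space):
    """Start with a uniform distribution over the options of each stage."""
    return {stage: [1.0 / len(opts)] * len(opts) for stage, opts in space.items()}


def sample(model, space, rng):
    """Draw one pipeline configuration from the probabilistic model."""
    return {
        stage: rng.choices(space[stage], weights=model[stage])[0]
        for stage in space
    }


def update(model, space, best, learning_rate):
    """Move each stage's distribution toward the option frequencies in `best`."""
    for stage, opts in space.items():
        for i, option in enumerate(opts):
            target = sum(1 for cfg in best if cfg[stage] == option) / len(best)
            model[stage][i] = (1 - learning_rate) * model[stage][i] + learning_rate * target


def pbil(evaluate, space, generations=30, population=20, elite=5,
         learning_rate=0.2, seed=0):
    """Return the best configuration found, its score, and the final model."""
    rng = random.Random(seed)
    model = init_model(space)
    best_cfg, best_score = None, float("-inf")
    for _ in range(generations):
        candidates = [sample(model, space, rng) for _ in range(population)]
        # Sort by score only, so configurations (dicts) are never compared.
        scored = sorted(((evaluate(c), c) for c in candidates),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_cfg = scored[0]
        update(model, space, [c for _, c in scored[:elite]], learning_rate)
    return best_cfg, best_score, model


def toy_evaluate(cfg):
    """Hypothetical stand-in for training and validating a real pipeline."""
    score = {"none": 0.0, "stemming": 0.1, "lemmatization": 0.2}[cfg["preprocessing"]]
    score += {"none": 0.0, "svd": 0.05, "chi2": 0.15}[cfg["reduction"]]
    score += {"naive_bayes": 0.5, "svm": 0.6, "logistic_regression": 0.55}[cfg["classifier"]]
    return score


if __name__ == "__main__":
    cfg, score, model = pbil(toy_evaluate, SEARCH_SPACE)
    print("best configuration:", cfg, "score:", round(score, 3))
    for stage, probs in model.items():
        print(stage, dict(zip(SEARCH_SPACE[stage], (round(p, 3) for p in probs))))
```

The per-stage probabilities printed at the end play the role of the probabilistic information mentioned in the abstract: options that keep appearing in high-scoring configurations accumulate probability mass, which hints at which components matter most.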

Keywords

  • Natural Language Processing
  • Pipeline optimization
  • Metaheuristics
  • Opinion mining


Notes

  1. For blind-review purposes this link is omitted.

  2. As illustrated in Fig. 2, each evaluation consists of a full run of the classification pipeline with a specific combination of preprocessing, reduction and classification.

Acknowledgments

This research has been supported by a Carolina Foundation grant in accordance with the University of Alicante and the University of Havana. This work has also been partially funded by both aforementioned universities, the Generalitat Valenciana and the Spanish Government through the projects SIIA (PROMETEU/2018/089), LIVINGLANG (RTI2018-094653-B-C22) and INTEGER (RTI2018-094649-B-I00).

Author information

Corresponding author

Correspondence to Suilan Estevez-Velarde.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Estevez-Velarde, S., Gutiérrez, Y., Montoyo, A., Almeida-Cruz, Y. (2019). Optimizing Natural Language Processing Pipelines: Opinion Mining Case Study. In: Nyström, I., Hernández Heredia, Y., Milián Núñez, V. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2019. Lecture Notes in Computer Science, vol 11896. Springer, Cham. https://doi.org/10.1007/978-3-030-33904-3_15

  • DOI: https://doi.org/10.1007/978-3-030-33904-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33903-6

  • Online ISBN: 978-3-030-33904-3

  • eBook Packages: Computer Science (R0)