Generating Word Embeddings from an Extreme Learning Machine for Sentiment Analysis and Sequence Labeling Tasks

  • Paula Lauren
  • Guangzhi Qu
  • Jucheng Yang
  • Paul Watta
  • Guang-Bin Huang
  • Amaury Lendasse


Word Embeddings are low-dimensional distributed representations of words produced by a family of language modeling and feature learning techniques in Natural Language Processing (NLP). Words or phrases from the vocabulary are mapped to vectors of real numbers in a low-dimensional space. In previous work, we proposed using an Extreme Learning Machine (ELM) for generating word embeddings. In this research, we apply ELM-based Word Embeddings to Text Categorization, specifically the Sentiment Analysis and Sequence Labeling tasks. The ELM-based Word Embeddings use a count-based approach similar to the Global Vectors (GloVe) model: a word-context matrix is computed and then matrix factorization is applied. A comparative study is conducted with Word2Vec and GloVe, two popular state-of-the-art models. The results show that ELM-based Word Embeddings slightly outperform both methods on the Sentiment Analysis and Sequence Labeling tasks, and that the ELM requires only one hyperparameter whereas several must be tuned for the other methods. Overall, ELM-based Word Embeddings are comparable to the state-of-the-art Word2Vec and GloVe models. In addition, the count-based ELM model yields word similarities close to those of both the count-based GloVe and the prediction-based Word2Vec models, with subtle differences.
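As a rough illustration of the count-based approach described above (and not the authors' exact pipeline), the Python sketch below builds a word-context co-occurrence matrix from a toy corpus and reduces it with an ELM-style step: a fixed random hidden projection followed by a Moore-Penrose pseudoinverse solve. The toy corpus, window size, embedding dimension, and the choice to read the hidden activations as word vectors are assumptions made for this example.

```python
# Minimal sketch of a count-based, ELM-style word embedding (illustrative only).
import numpy as np

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Symmetric word-context counts within a fixed window (assumed window = 2).
window = 2
C = np.zeros((V, V))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# ELM-style reduction: pass the co-occurrence rows through a fixed random
# hidden layer, then solve the output weights with the pseudoinverse.
# The embedding dimension d plays the role of the single hyperparameter.
d = 5                                        # embedding dimension (assumed)
rng = np.random.default_rng(0)
W_in = rng.standard_normal((V, d))           # random, untrained input weights
b = rng.standard_normal(d)                   # random hidden biases
H = 1.0 / (1.0 + np.exp(-(C @ W_in + b)))    # sigmoid hidden activations
beta = np.linalg.pinv(H) @ C                 # output weights reconstructing C

word_vectors = H                             # one d-dimensional vector per word
print(word_vectors[idx["cat"]])
```

In this reading, each vocabulary word gets a d-dimensional vector from the hidden-layer activations of its co-occurrence row, while the pseudoinverse solve replaces the iterative optimization used by Word2Vec and GloVe; the actual method is described in the paper itself.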


Word embeddings · Extreme learning machine (ELM) · Word2Vec · Global vectors (GloVe) · Text categorization · Sentiment analysis · Sequence labeling


Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.

Informed Consent

Consent was not required, as no humans or animals were involved.

Human and Animal Rights

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Oakland University, Rochester, USA
  2. College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin, China
  3. Department of Electrical and Computer Engineering, The University of Michigan-Dearborn, Dearborn, USA
  4. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
  5. Department of Industrial and Systems Engineering, The University of Iowa, Iowa City, USA
