Continuous-Space Language Processing: Beyond Word Embeddings

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9918)

Abstract

Spoken and written language processing has shifted dramatically in recent years toward continuous-space representations of language, built with neural networks and other distributional methods; word embeddings in particular are now used in many applications. This paper examines the advantages of the continuous-space approach and the limitations of word embeddings, reviewing recent work that attempts to model more of the structure in language. We also discuss how current models characterize exceptions in language, and where advances can be made by integrating traditional and continuous approaches.
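
To ground the discussion, the sketch below (not from the paper; a minimal, self-contained Python/NumPy toy whose 4-dimensional vectors are hand-constructed rather than trained) illustrates the mechanics the abstract alludes to: each word type maps to a dense vector, similarity is measured geometrically, and some lexical relations surface as vector offsets, as in the well-known king - man + woman ~ queen analogy.

```python
# Toy illustration of word-embedding mechanics (illustrative only: these
# vectors are hand-picked, not trained like the models the paper reviews).
import numpy as np

embeddings = {
    "king":  np.array([0.8, 0.7, 0.1, 0.0]),
    "queen": np.array([0.8, 0.1, 0.7, 0.0]),
    "man":   np.array([0.2, 0.7, 0.1, 0.1]),
    "woman": np.array([0.2, 0.1, 0.7, 0.1]),
}

def cosine(u, v):
    # Cosine similarity: the usual geometric notion of word similarity.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Word similarity reduces to a vector operation on looked-up embeddings.
print(cosine(embeddings["king"], embeddings["queen"]))  # high: related words

# Analogy as vector arithmetic: king - man + woman should land near queen.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
best = max((w for w in embeddings if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(embeddings[w], target))
print(best)  # -> "queen" for these hand-constructed vectors
```

Trained embeddings estimate the table from corpora instead of hand-building it, but the one-vector-per-word-type lookup is exactly the limitation the paper looks beyond.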

Acknowledgments

I thank my students Hao Cheng, Hao Fang, Ji He, Brian Hutchinson, Aaron Jaech, Yi Luan, and Vicky Zayats for helping me gain insights into continuous-space language methods through their many experiments and our paper discussions.

Author information

Correspondence to Mari Ostendorf.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ostendorf, M. (2016). Continuous-Space Language Processing: Beyond Word Embeddings. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_1

  • DOI: https://doi.org/10.1007/978-3-319-45925-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45924-0

  • Online ISBN: 978-3-319-45925-7

  • eBook Packages: Computer Science, Computer Science (R0)
