Continuous-Space Language Processing: Beyond Word Embeddings

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9918)

Abstract

Spoken and written language processing has shifted dramatically in recent years toward continuous-space representations of language, built with neural networks and other distributional methods; word embeddings in particular are now used in many applications. This paper examines the advantages of the continuous-space approach and the limitations of word embeddings, reviewing recent work that attempts to model more of the structure in language. We also discuss how current models characterize exceptions in language, and where advances can be made by integrating traditional and continuous approaches.
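
To ground the discussion, the sketch below (not from the paper; a minimal, self-contained Python/NumPy toy whose 4-dimensional vectors are hand-constructed rather than trained) illustrates the mechanics the abstract alludes to: each word type maps to a dense vector, similarity is measured geometrically, and some lexical relations surface as vector offsets, as in the well-known king - man + woman ~ queen analogy.

```python
# Toy illustration of word-embedding mechanics (illustrative only: these
# vectors are hand-picked, not trained like the models the paper reviews).
import numpy as np

embeddings = {
    "king":  np.array([0.8, 0.7, 0.1, 0.0]),
    "queen": np.array([0.8, 0.1, 0.7, 0.0]),
    "man":   np.array([0.2, 0.7, 0.1, 0.1]),
    "woman": np.array([0.2, 0.1, 0.7, 0.1]),
}

def cosine(u, v):
    # Cosine similarity: the usual geometric notion of word similarity.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Word similarity reduces to a vector operation on looked-up embeddings.
print(cosine(embeddings["king"], embeddings["queen"]))  # high: related words

# Analogy as vector arithmetic: king - man + woman should land near queen.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
best = max((w for w in embeddings if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(embeddings[w], target))
print(best)  # -> "queen" for these hand-constructed vectors
```

Trained embeddings estimate the table from corpora instead of hand-building it, but the one-vector-per-word-type lookup is exactly the limitation the paper looks beyond.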

Acknowledgments

I thank my students Hao Cheng, Hao Fang, Ji He, Brian Hutchinson, Aaron Jaech, Yi Luan, and Vicky Zayats for helping me gain insights into continuous-space language methods through their many experiments and our paper discussions.

Author information

Correspondence to Mari Ostendorf.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Ostendorf, M. (2016). Continuous-Space Language Processing: Beyond Word Embeddings. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_1

  • DOI: https://doi.org/10.1007/978-3-319-45925-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45924-0

  • Online ISBN: 978-3-319-45925-7

  • eBook Packages: Computer Science, Computer Science (R0)
