Abstract
Deep learning provides new modeling methods for natural language processing. In recent years it has been applied to language modeling, text classification, machine translation, sentiment analysis, question answering, and distributed word representation, yielding a series of theoretical results. For the text representation task, this paper studies strategies for fusing global and local context information and proposes a word representation model, Topic-based CBOW, that integrates a deep neural network with topic information and word order information. Building on the distributed word representations produced by Topic-based CBOW, a short-text representation method based on TF–IWF-weighted pooling is then proposed. Finally, both the Topic-based CBOW model and the short-text representation are compared against baseline models; the results show that introducing the topic vector and retaining word order improves the quality of the distributed word representations to some extent, and that the resulting text representation also performs well on text classification tasks.
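The abstract names TF–IWF-weighted pooling as the step that turns word vectors into a short-text vector. As a rough illustration only, here is a minimal Python sketch. It assumes TF–IWF means term frequency within the text multiplied by an inverse word frequency computed from corpus word counts, and it uses a toy embedding dict in place of vectors trained with Topic-based CBOW; the paper's exact weighting scheme may differ.

```python
# Minimal sketch of TF-IWF-weighted pooling over pre-trained word vectors.
# Assumptions (not specified in the abstract): TF is the relative frequency
# of a word within the short text; IWF is log(N / count(w)) over corpus word
# counts. The embedding dict below is a toy stand-in for Topic-based CBOW
# output, not the paper's actual model.
import math
from collections import Counter

import numpy as np


def tf_iwf_pool(tokens, vectors, corpus_counts, total_count):
    """Return the TF-IWF-weighted average of the tokens' word vectors."""
    tf = Counter(tokens)
    pooled = np.zeros(next(iter(vectors.values())).shape)
    weight_sum = 0.0
    for word, freq in tf.items():
        if word not in vectors:
            continue  # skip out-of-vocabulary words
        tf_w = freq / len(tokens)                            # term frequency
        iwf_w = math.log(total_count / corpus_counts.get(word, 1))  # inverse word frequency
        w = tf_w * iwf_w
        pooled += w * vectors[word]
        weight_sum += w
    return pooled / weight_sum if weight_sum > 0 else pooled


# Toy usage with hypothetical 3-dimensional embeddings.
vectors = {"deep": np.array([0.1, 0.3, -0.2]),
           "learning": np.array([0.0, 0.5, 0.1]),
           "model": np.array([-0.4, 0.2, 0.3])}
corpus_counts = {"deep": 120, "learning": 150, "model": 400, "the": 9000}
total = sum(corpus_counts.values())
print(tf_iwf_pool(["deep", "learning", "model"], vectors, corpus_counts, total))
```

The weighting downplays words that are common across the corpus, so the pooled vector is dominated by the text's more distinctive terms rather than by function words.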
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 71473074), the Science and Technology Program Project of Qiannan Autonomous Prefecture (No. QNKHG201713), and the Scientific Research Project of Qiannan Normal University for Nationalities (No. QNSY2017006, 2018CG010, CST-2019SN02, ML-2018KF001).
Cite this article
Jiang, Z., Gao, S. & Chen, L. Study on text representation method based on deep learning and topic information. Computing 102, 623–642 (2020). https://doi.org/10.1007/s00607-019-00755-y
Keywords
- Natural language processing
- Word distributed representation
- Deep learning
- Topic information
- Text representation