Multi-task learning using a hybrid representation for text classification

  • Guangquan Lu
  • Jiangzhang Gan
  • Jian Yin
  • Zhiping Luo
  • Bo Li
  • Xishun Zhao
Multi-Source Data Understanding (MSDU)

Abstract

Text classification is an important task in machine learning. In particular, deep neural networks have shown a strong capability to improve performance in many fields, such as speech recognition, object recognition, and natural language processing. However, in most previous work, the extracted feature representations do not serve the related text tasks well. To address this issue, we introduce a novel multi-task learning approach, a hybrid representation-learning network, for text classification tasks. Our method consists of two network components: a bidirectional gated recurrent unit (GRU) with an attention module, and a convolutional neural network (CNN) module. The attention module allows each task to learn a private feature representation that captures local dependencies in its training texts, while the CNN module learns a global representation that is shared across tasks. Experiments on 16 subsets of the Amazon review data show that our method outperforms several baselines and demonstrate the effectiveness of jointly learning multiple related tasks.
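A minimal sketch of how such a hybrid network can be assembled, written here in Keras purely for illustration (this is not the authors' released code): one shared CNN branch produces the global representation, each task owns a private bidirectional GRU branch with a simple attention pooling, and each task head classifies the concatenation of the two. The vocabulary size, sequence length, layer widths, binary sentiment output, and single shared input are all assumptions.

import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_TASKS = 30000, 128, 256, 16

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)

# Shared CNN module: one global representation reused by every task.
conv = layers.Conv1D(filters=128, kernel_size=3, activation="relu")(embed)
shared = layers.GlobalMaxPooling1D()(conv)

outputs = []
for t in range(NUM_TASKS):
    # Private module: BiGRU, then attention pooling over the time axis.
    # Sizes and the simple attention form are illustrative, not the paper's exact settings.
    h = layers.Bidirectional(layers.GRU(64, return_sequences=True))(embed)
    scores = layers.Dense(1, activation="tanh")(h)   # (batch, MAX_LEN, 1)
    weights = layers.Softmax(axis=1)(scores)         # attention weights over positions
    private = layers.Lambda(
        lambda hw: tf.reduce_sum(hw[0] * hw[1], axis=1)  # weighted sum -> (batch, 128)
    )([h, weights])
    merged = layers.Concatenate()([shared, private])
    outputs.append(layers.Dense(2, activation="softmax", name=f"task_{t}")(merged))

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

In joint training, each of the 16 review subsets would feed its own task head and the per-task losses are summed, so the shared CNN receives gradients from every task while each BiGRU-attention branch sees only its own.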

Keywords

Text classification · Deep learning · Multi-task learning · Feature representation · LSTM · CNN · Big data


Acknowledgements

This work was supported by the National Key R&D Program of China (2018YFB1004404) and the National Natural Science Foundation of China (61472453, U1401256, U1501252, U1611264, U1711262, U1711261). This research was partially supported by NSSFC Grant 13&ZD186, the China Key Research Program (Grant No. 2016YFB1000905), and the Guangxi Science Research and Technology Development Program (Grant No. 15248003-8). We thank the anonymous reviewers for their thorough reviews and excellent suggestions.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, People's Republic of China
  2. Department of Philosophy, Institute of Logic and Cognition, Sun Yat-sen University, Guangzhou, People's Republic of China
  3. Guangdong Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou, People's Republic of China
