Aspect-level sentiment capsule network for micro-video click-through rate prediction


Micro-videos, a new form of video constrained in duration, have gained significant popularity in recent years. The volume and rate of online micro-videos urgently call for effective recommendation algorithms that help users find the ones that interest them. Although previous works have investigated how to model users’ historical behaviors to predict the click-through rate of micro-videos, they are generally based on positive feedback only and overlook negative feedback, which can help characterize user preference at a finer granularity. Positive and negative feedback jointly imply a user’s different sentiments toward different aspects, where each aspect is one component of a micro-video, such as video_scene or video_subject. To this end, we propose an aspect-level sentiment capsule network (ASCap) for micro-video click-through rate prediction that aggregates both positive and negative feedback, with the aim of making the prediction more explainable. More specifically, an aspect-specific gating mechanism is first used to extract aspect-level features from the target micro-video and from the user’s positive and negative feedback. Then, in the subsequent sentiment capsule network, the aspect-level features of the target micro-video are paired with those of the positive and negative feedback, respectively, to identify their sentiments and form the sentiment capsules. Finally, a prediction layer calculates the overall click probability from the sentiment capsules. Experimental results on two real-world micro-video datasets demonstrate that the proposed method significantly outperforms state-of-the-art methods.
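The three stages named in the abstract (aspect-specific gating, sentiment capsules, prediction layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, since the abstract gives no equations. Everything concrete here is an assumption made for illustration: the sigmoid gate with per-aspect parameters `W` and `b`, the element-wise pairing of target and feedback features, the mean pooling used in place of capsule routing, the squash nonlinearity borrowed from the capsule literature, and the prediction rule that compares capsule lengths. All function names, shapes, and values are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def aspect_gate(items, W, b):
    """Assumed gating: one sigmoid gate per aspect, applied to each item.

    items: (n, d) item embeddings; W: (M, d, d) and b: (M, d) are
    hypothetical per-aspect gate parameters. Returns (n, M, d).
    """
    gates = sigmoid(np.einsum("nd,mde->nme", items, W) + b)
    return gates * items[:, None, :]

def squash(v, eps=1e-9):
    """Capsule squash: keeps direction, maps vector length into [0, 1)."""
    sq = np.sum(v * v, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * v / np.sqrt(sq + eps)

def sentiment_capsules(target_aspects, feedback_aspects):
    """Pair target aspects (M, d) with feedback aspects (n, M, d).

    Pairing is an element-wise product and pooling is a plain mean,
    both stand-ins chosen for brevity rather than taken from the paper.
    """
    pairs = feedback_aspects * target_aspects[None, :, :]
    return squash(pairs.mean(axis=0))  # (M, d), one capsule per aspect

def click_probability(pos_caps, neg_caps, w):
    """Assumed prediction rule: weighted difference of capsule lengths."""
    pos_len = np.linalg.norm(pos_caps, axis=-1)
    neg_len = np.linalg.norm(neg_caps, axis=-1)
    return float(sigmoid(w @ (pos_len - neg_len)))

# Toy data: M aspects, d-dimensional embeddings, random feedback histories.
rng = np.random.default_rng(0)
M, d = 3, 8
W = rng.normal(scale=0.1, size=(M, d, d))
b = np.zeros((M, d))
target = rng.normal(size=(1, d))      # the candidate micro-video
positive = rng.normal(size=(5, d))    # items the user clicked
negative = rng.normal(size=(4, d))    # items the user skipped

target_aspects = aspect_gate(target, W, b)[0]
pos_caps = sentiment_capsules(target_aspects, aspect_gate(positive, W, b))
neg_caps = sentiment_capsules(target_aspects, aspect_gate(negative, W, b))
p = click_probability(pos_caps, neg_caps, np.ones(M))
print(pos_caps.shape, neg_caps.shape, p)
```

In this sketch the per-aspect capsule lengths are what would support an explanation of a prediction: a long positive capsule and a short negative capsule for one aspect suggest that the user responds favorably to that aspect of the target micro-video.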


This work was partially supported by the Zhejiang University Education Foundation under grants No. K18-511120-004, No. K17-511120-017, and No. K17-518051-021; the National Natural Science Foundation of China under grant No. 61672453; and the National Key R&D Program sub-project “Large Scale Cross-Modality Medical Knowledge Management” under grant No. 2018AAA0102100.

Author information


Corresponding author

Correspondence to Jian Wu.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Web Intelligence = Artificial Intelligence in the Connected World

Guest Editors: Yuefeng Li, Amit Sheth, Athena Vakali, and Xiaohui Tao

Yuqiang Han and Pan Gu contributed equally.

About this article

Cite this article

Han, Y., Gu, P., Gao, W. et al. Aspect-level sentiment capsule network for micro-video click-through rate prediction. World Wide Web 24, 1045–1064 (2021).

  • Aspect-level sentiment
  • Capsule network
  • Micro-video
  • Click-through rate prediction