
TPFN: Applying Outer Product Along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12369)

Abstract

Multimodal sentiment analysis (MSA) has been widely investigated in both computer vision and natural language processing. However, handling imperfect data, especially data with missing values, remains challenging and far from solved, even though the issue is ubiquitous in the real world. Although previous works achieve promising performance by exploiting low-rank structures of the fused features, they consider only first-order statistics of the temporal dynamics. To this end, we propose a novel network architecture termed Time Product Fusion Network (TPFN), which takes high-order statistics over both modalities and temporal dynamics into account. We construct the fused features by taking the outer product along adjacent time steps, so that richer modal and temporal interactions are exploited. In addition, we claim that the low-rank structures can be obtained by regularizing the Frobenius norm of the latent factors instead of the fused features. Experiments on the CMU-MOSI and CMU-MOSEI datasets show that TPFN competes with state-of-the-art approaches to multimodal sentiment analysis under both random and structured missing values.
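The fusion step described in the abstract can be sketched numerically. The following is a hypothetical minimal illustration, not the authors' implementation: the two-modality setup, the feature dimensions, and the constant 1 appended to each feature vector (a common device in tensor-fusion methods to retain unimodal terms in the outer product) are all assumptions made for this sketch.

```python
import numpy as np

# Toy sequences: T time steps of text and audio features.
rng = np.random.default_rng(0)
T, d_text, d_audio = 4, 3, 2
text = rng.standard_normal((T, d_text))
audio = rng.standard_normal((T, d_audio))

def fuse_adjacent(text, audio):
    """Outer product of concatenated features at adjacent time steps t and t+1.

    Each slice captures cross-modal AND cross-time second-order interactions,
    going beyond the first-order temporal statistics of earlier fusion methods.
    """
    fused = []
    for t in range(text.shape[0] - 1):
        z_t = np.concatenate([text[t], audio[t], [1.0]])          # features at t
        z_next = np.concatenate([text[t + 1], audio[t + 1], [1.0]])  # features at t+1
        fused.append(np.outer(z_t, z_next))
    return np.stack(fused)  # shape: (T-1, d_text+d_audio+1, d_text+d_audio+1)

F = fuse_adjacent(text, audio)
print(F.shape)  # (3, 6, 6)
```

In practice one would not materialize these outer products directly; the low-rank parameterization discussed in the paper avoids the quadratic blow-up by working with latent factors instead.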

B. Li and C. Li—Equal Contribution.


Notes

  1. Without ambiguity, we also use the notion of CP-rank to represent the number of rank-1 factors used in matrix/tensor approximation.

  2. See http://immortal.multicomp.cs.cmu.edu/raw_datasets/processed_data/.
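The abstract's claim that low-rank structure can be encouraged by penalizing latent factors rather than the fused features rests on a standard variational bound: for any factorization X = U Vᵀ, the nuclear norm (a convex surrogate for rank) satisfies ‖X‖₊ ≤ ½(‖U‖F² + ‖V‖F²). A quick numerical check of this inequality, illustrative only and not taken from the paper:

```python
import numpy as np

# Variational bound on the nuclear norm: for X = U @ V.T,
# ||X||_* <= 0.5 * (||U||_F^2 + ||V||_F^2),
# so penalizing the Frobenius norms of the factors bounds the rank surrogate.
rng = np.random.default_rng(1)
U = rng.standard_normal((5, 2))
V = rng.standard_normal((4, 2))
X = U @ V.T

nuclear = np.linalg.norm(X, ord="nuc")                      # sum of singular values
frob_bound = 0.5 * (np.linalg.norm(U) ** 2 + np.linalg.norm(V) ** 2)
print(nuclear <= frob_bound + 1e-9)  # True
```

This is why regularizing the factors suffices: minimizing the Frobenius penalty over all factorizations of X recovers the nuclear norm exactly, without ever forming the (large) fused tensor.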



Acknowledgment

Binghua and Chao contributed equally. We thank our colleagues Dr. Ming Hou and Zihao Huang for discussions that greatly improved the manuscript. This work was partially supported by the National Key R&D Program of China (No. 2017YFE0129700), the National Natural Science Foundation of China (No. 61673224), the Tianjin Natural Science Foundation for Distinguished Young Scholars (No. 18JCJQJC46100), and JSPS KAKENHI (Grant Nos. 20H04249, 20H04208, and 20K19875).

Author information

Corresponding author

Correspondence to Feng Duan.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 326 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, B., Li, C., Duan, F., Zheng, N., Zhao, Q. (2020). TPFN: Applying Outer Product Along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12369. Springer, Cham. https://doi.org/10.1007/978-3-030-58586-0_26

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58586-0_26

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58585-3

  • Online ISBN: 978-3-030-58586-0

  • eBook Packages: Computer Science (R0)
