Bahuleyan H (2018) Music genre classification using machine learning techniques. arXiv:1804.01149v1
Baltrusaitis T, Ahuja C, Morency LP (2018) Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell 41:423–443
Bottou L (2010) Large-scale machine learning with stochastic gradient descent. Springer proceedings of COMPSTAT’2010 177–186
Carreira J, Zisserman A (2018) Quo vadis, action recognition? A new model and the Kinetics dataset. arXiv:1705.07750v3
Chang WY, Hsu SH, Chien JH (2017) FATAUVA-Net: an integrated deep learning framework for facial attribute recognition, action unit detection, and valence-arousal estimation. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Choi K, Fazekas G, Sandler M, Cho K (2017) Transfer learning for music classification and regression tasks. International Society for Music Information Retrieval Conference, Suzhou, China 141–149
Clevert DA, Unterthiner T, Hochreiter S (2016) Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289
Cowen AS, Keltner D (2017) Self-report captures 27 distinct categories of emotion bridged by continuous gradients. PNAS 114(38):E7900–E7909
Dai W, Dai C, Qu S, Li J, Das S (2016) Very deep convolutional neural networks for raw waveforms. arXiv:1610.00087v1
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. IEEE Conference on Computer Vision and Pattern Recognition 248–255
Ding W, Xu M, Huang D, Lin W, Dong M, Yu X, Li H (2016) Audio and face video emotion recognition in the wild using deep neural networks and small datasets. International conference on multimodal interfaces. Tokyo, Japan
Elshaer MEA, Wisdom S, Mishra T (2019) Transfer learning from sound representations for anger detection in speech. arXiv:1902.02120v1
Fan Y, Lu X, Li D, Liu Y (2016) Video-based emotion recognition using CNN-RNN and C3D hybrid networks. International conference on multimodal interfaces. Tokyo, Japan
Fridman L, Brown DE, Glazer M, Angell W, Dodd S, Jenik B, Terwilliger J, Patsekin A, Kindelsberger J, Ding L, Seaman S, Mehler A, Sipperley A, Pettinato A, Seppelt B, Angell L, Mehler B, Reimer B (2019) MIT advanced vehicle technology study: large-scale naturalistic driving study of driver behavior and interaction with automation. arXiv:1711.06976v4
Gao Z, Xuan HZ, Zhang H, Wan S, Choo KKR (2018) Adaptive fusion and category-level dictionary learning model for multi-view human action recognition. IEEE Internet of Things Journal
Gao Z, Wang YL, Wan SH, Wang DY, Zhang H (2019) Cognitive-inspired class-statistic matching with triple-constrain for camera free 3D object retrieval. Futur Gener Comput Syst 94:641–653
Garces MLE (2018) Transfer learning for illustration classification. arXiv:1806.02682v1
Grekow J (2018) From content-based music emotion recognition to emotion maps of musical pieces. Springer
Hahnloser RHR, Sarpeshkar R, Mahowald MA, Douglas RJ, Seung HS (2000) Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405:947–951
Hinton G, Srivastava N, Swersky K (2012) Lecture 6d: a separate, adaptive learning rate for each connection. Slides of lecture Neural Networks for Machine Learning
Hong S, Im W, Yang HS (2017) Content-based video–music retrieval using soft intra-modal structure constraint. arXiv:1704.06761v2
Hussain M, Bird JJ, Faria DR (2018) A study on CNN transfer learning for image classification. UKCI 2018: Advances in Intelligent Systems and Computing (840):191–202, Springer
Kahou SE, Bouthillier X, Lamblin P, Gulcehre C et al (2015) EmoNets: multimodal deep learning approaches for emotion recognition in video. arXiv:1503.01800v2
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. IEEE Conference on Computer Vision and Pattern Recognition 1725–1732
Kay W, Carreira J, Simonyan K, Zhang B, Hillier C, Vijayanarasimhan S, Viola F, Green T, Back T, Natsev P, Suleyman M, Zisserman A (2017) The Kinetics human action video dataset. arXiv:1705.06950
Kaya H, Gürpınar F, Salah AA (2017) Video-based emotion recognition in the wild using deep transfer learning and score fusion. Image Vis Comput 65:66–75
Kingma D, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Koelstra S, Mühl C, Soleymani M, Lee JS, Yazdani A, Ebrahimi T, Pun T, Nijholt A, Patras I (2012) DEAP: a database for emotion analysis using physiological signals. IEEE Trans Affect Comput
Kunze J, Kirsch L, Kurenkov I, Krug A, Johannsmeier J, Stober S (2017) Transfer learning for speech recognition on a budget. arXiv:1706.00290v1
Lee J, Park J, Kim KL, Nam J (2018) SampleCNN: end-to-end deep convolutional neural networks using very small filters for music classification. Applied Sciences. https://doi.org/10.3390/app8010150
Liu X, Chen Q, Wu X, Yan L, Ann Yang L (2017) CNN based music emotion classification. arXiv:1704.05665
Lövheim H (2012) A new three-dimensional model for emotions and monoamine neurotransmitters. Med Hypotheses 78:341–348
Ma Y, Hao Y, Chen M, Chen J, Lu P, Košir A (2019) Audio-visual emotion fusion (AVEF): a deep efficient weighted approach. Information Fusion 46:184–192
Mahieux TB, Ellis DP, Whitman B, Lamere P (2011) The million song dataset. 12th International Conference on Music Information Retrieval, Miami, FL 591–596
Minaee S, Abdolrashidi A (2019) Deep-emotion: facial expression recognition using attentional convolutional network. arXiv:1902.01019v1
Ng JY, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R, Toderici G (2015) Beyond short snippets: deep networks for video classification. IEEE Conference on Computer Vision and Pattern Recognition 4694–4702
Nguyen D, Nguyen K, Sridharan S, Ghasemi A, Dean D, Fookes C (2017) Deep spatio-temporal features for multimodal emotion recognition. IEEE Winter Conference on Applications of Computer Vision
Noroozi F, Sapiński T, Kamińska D, Anbarjafari G (2017) Vocal-based emotion recognition using random forests and decision tree. International Journal of Speech Technology 20:239–246
Ortega JDS, Senoussaoui M, Granger E, Pedersoli M (2019) Multimodal fusion with deep neural networks for audio-video emotion recognition. arXiv:1907.03196v1
Ouyang X, Kawaai S, Goh EGH, Shen S, Ding W, Ming H, Huang DY (2017) Audio-visual emotion recognition using deep transfer learning and multiple temporal models. International conference on multimodal interfaces. Glasgow, UK
Pandeya YR, Lee J (2018) Domestic cat sound classification using transfer learning. International Journal of Fuzzy Logic and Intelligent Systems 18(2):154–160
Pandeya YR, Kim D, Lee J (2018) Domestic cat sound classification using learned features from deep neural nets. Applied Sciences 8:1949
Pini S, Ben-Ahmed O, Cornia M, Baraldi L, Cucchiara R, Huet B (2017) Modeling multimodal cues in a deep learning-based framework for emotion recognition in the wild. International conference on multimodal interfaces. Glasgow, UK
Poria S, Cambria E, Bajpai R, Hussain A (2017) A review of affective computing: from unimodal analysis to multimodal fusion. Information Fusion 37:98–125
Ringeval F, Sonderegger A, Sauer J, Lalanne D (2013) Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)
Rozgic V, Vitaladevuni SN, Prasad R (2013) Robust EEG emotion classification using segment level decision fusion. IEEE International Conference on Acoustics, Speech and Signal Processing
Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178
Zhang S, Zhang S, Huang T, Gao W, Tian Q (2018) Learning affective features with a hybrid deep model for audio-visual emotion recognition. IEEE Transactions on Circuits and Systems for Video Technology 28(10)
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Su YC, Chiu TH, Yeh CY, Huang HF, Hsu WH (2015) Transfer learning for video recognition with scarce training data for deep convolutional neural network. arXiv:1409.4127v2
Sun K, Yu J, Huang Y, Hu X (2009) An improved valence-arousal emotion space for video affective content representation and recognition. IEEE International Conference on Multimedia and Expo
Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C (2018) A survey on deep transfer learning. arXiv:1808.01974v1
Thayer RE (1989) The biopsychology of mood and arousal. Oxford University Press
Tian H, Tao Y, Pouyanfar S, Chen SC, Shyu ML (2019) Multimodal deep representation learning for video classification. World Wide Web 22:1325–1341
Tiwari SN, Duong NQK, Lefebvre F, Demarty CH, Huet B, Chevallier L (2016) Deep features for multimodal emotion classification. HAL-01289191
Torrey L, Shavlik J (2009) Transfer learning. IGI Global Publication Handbook of Research on Machine Learning Applications
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. IEEE International Conference on Computer Vision 4489–4497
Tremblay J, To T, Sundaralingam B, Xiang Y, Fox D, Birchfield S (2018) Deep object pose estimation for semantic robotic grasping of household objects. arXiv:1809.10790v1
Tripathi S, Acharya S, Sharma RD (2017) Using deep and convolutional neural networks for accurate emotion classification on DEAP dataset. Twenty-Ninth Association for the Advancement of Artificial Intelligence Conference on Innovative Applications
Tzirakis P, Trigeorgis G, Nicolaou MA, Schuller BW, Zafeiriou S (2017) End-to-end multimodal emotion recognition using deep neural networks. IEEE Journal of Selected Topics in Signal Processing 1301–1309
Wang S, Ji Q (2015) Video affective content analysis: a survey of state-of-the-art methods. IEEE Trans Affect Comput
Wang D, Zheng TF (2015) Transfer learning for speech and language processing. APSIPA Annual Summit and Conference 2015
Wu H, Chen Y, Wang N, Zhang Z (2019) Sequence level semantics aggregation for video object detection. arXiv:1907.06390v2
Xu YS, Fu TJ, Yang HK, Lee CY (2018) Dynamic video segmentation network. arXiv:1804.00931v2
Yang YH, Chen HH (2012) Machine recognition of music emotion: a review. ACM Transactions on Intelligent Systems and Technology 3(3):40
Zhang L, Zhang J (2018) Synchronous prediction of arousal and valence using LSTM network for affective video content analysis. arXiv:1806.00257
Zhang L, Tjondronegoro D, Chandran V (2014) Representation of facial expression categories in continuous arousal–valence space: feature and correlation. Image Vis Comput 32:1067–1079
Zhang S, Zhang S, Huang T, Gao W (2016) Multimodal deep convolutional neural network for audio-visual emotion recognition. ACM International Conference on Multimedia Retrieval 281–284