Multimedia Tools and Applications

Volume 75, Issue 15, pp 8921–8938

Multi-modal microblog classification via multi-task learning

  • Sicheng Zhao
  • Hongxun Yao (Email author)
  • Sendong Zhao
  • Xuesong Jiang
  • Xiaolei Jiang


Recent years have witnessed the flourishing of social media platforms (SMPs) such as Twitter, Facebook, and Sina Weibo. The rapid development of these SMPs has produced increasingly large-scale multimedia data, which has proven to have remarkable marketing value. There is an urgent need to classify these social media data into a specified list of entities of interest, such as brands, products, and events, in order to analyze their sales, popularity, or influence. This is a challenging task, however, because microblogs are short and conversational, their images and text are often incompatible, and their data are diverse. In this paper, we present a multi-modal microblog classification method in a multi-task learning framework. First, features of different modalities are extracted for each microblog. Specifically, we extract TF-IDF features from each microblog text, and low-level visual features and high-level semantic features from each microblog image. Then, multiple related classification tasks are learned simultaneously for each feature, which increases the sample size available to each task and improves prediction performance. Finally, the outputs for each feature are integrated by a Support Vector Machine that learns how to optimally combine and weight the features. We evaluate the proposed method on Brand-Social-Net, classifying the 100 brands it contains. Experimental results demonstrate the superiority of the proposed method over state-of-the-art approaches.
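As a minimal sketch of the text-feature step only, the TF-IDF weights for a handful of microblog texts could be computed as follows. The abstract does not say which TF-IDF variant or tokenizer is used, so the length-normalized term frequency, the `log(N / df)` inverse document frequency, the whitespace tokenization, the function name `tfidf`, and the toy posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy microblogs, tokenized by whitespace.
posts = [
    "new phone camera is great".split(),
    "great coffee this morning".split(),
    "phone battery dies fast".split(),
]
weights = tfidf(posts)
```

Terms that appear in fewer posts receive a larger weight, so in the first toy post "new" outweighs "great", which also occurs in the second post. In the method described above, vectors of this kind would form the text modality, alongside the visual and semantic image features.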


Keywords: Microblog classification; Multi-modal classification; Multi-task learning; Structural regularization; Social media analysis



This work was supported by the National Natural Science Foundation of China (No. 61472103) and Key Program (No. 61133003). Sicheng Zhao was also supported by the Ph.D. Short-Term Overseas Visiting Scholar Program of Harbin Institute of Technology.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Sicheng Zhao (1)
  • Hongxun Yao (1, Email author)
  • Sendong Zhao (1)
  • Xuesong Jiang (1)
  • Xiaolei Jiang (1)

  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
