
Heterogeneous Information Fusion based Topic Detection from Social Media Data

Information Systems Frontiers

Abstract

Due to the pervasive nature of social networking platforms and the proliferation of user-generated content, the internet has become a repository of unstructured multimedia data. Using this large volume of data to enhance user experience remains an open problem. Topic detection is one possible solution, but it has not been explored in the literature for this application. Topic detection methods can group together videos that have similar content or relate to the same topic. In this paper, a framework for topic detection from the textual metadata of web videos is developed. The key contribution is to leverage multimedia metadata to find web video topics using a two-step process. First, a transformer-based model performs topic modeling to identify topics from the heterogeneous textual data of web videos. Second, topic-based video retrieval is accomplished using a classification approach. Experiments are carried out on a publicly available dataset to assess the performance of the proposed method. The proposed work is compared with the state-of-the-art methods Discriminative Probabilistic Models (DPM), Event Clustering Based Method (ECBM), Multi-Modality Based Method (MMBM), Side-Information Based Method (SIBM), and Similarity Cascades (SC); the comparison shows that the proposed system outperforms these methods in terms of Precision, Recall, F-measure, and Accuracy. The experimental results demonstrate the effectiveness of the proposed method for topic detection.





Acknowledgements

This work is supported by the Council of Scientific and Industrial Research (CSIR), New Delhi, India, through a fellowship under award letter no. 09/135(0745)/2016-EMR-I.

Author information

Corresponding author

Correspondence to Seema Rani.

Ethics declarations

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rani, S., Kumar, M. Heterogeneous Information Fusion based Topic Detection from Social Media Data. Inf Syst Front 25, 513–528 (2023). https://doi.org/10.1007/s10796-022-10334-w

  • DOI: https://doi.org/10.1007/s10796-022-10334-w
