Abstract
Multimodal data processing has recently become popular thanks to technological advances and easier access to real video, audio, image, and text data. Such data are often processed with deep neural networks, which carry high time and computational costs. This work addresses the classification of the multimodal MM-IMDb dataset, in which a movie's genre is recognized from its poster and a brief plot description. For the experiments, 20 binary subsets were separated from the dataset, and features were then extracted from each. Text features were obtained using the tf-idf method, while the posters were reduced to a single color. Computer experiments on the resulting tabular data were conducted separately on each modality, as well as in the concatenated feature space. The results confirmed that classical approaches to feature extraction can yield satisfactory classification quality on multimodal data even with relatively simple pattern recognition algorithms.
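The pipeline summarized above (tf-idf text features, a single color per poster, and concatenation of the two feature sets) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The smoothed idf variant, the use of the mean RGB value as the "single color", the one-vs-rest construction of a binary subset, and all toy data and function names are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, rows) with one tf-idf row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf; one common variant, assumed here for illustration.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        # Term frequency (relative count) weighted by idf.
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, rows


def mean_color(pixels):
    """Reduce a poster, given as (r, g, b) pixels, to its mean color in [0, 1].

    Assumption: "a single color" is read here as the mean RGB value.
    """
    n = len(pixels)
    return [sum(p[c] for p in pixels) / n / 255.0 for c in range(3)]


def binary_labels(genre_sets, genre):
    """One-vs-rest labels for one genre (hypothetical binary subset)."""
    return [1 if genre in genres else 0 for genres in genre_sets]


# Hypothetical toy data standing in for plot descriptions, posters, and genres.
plots = ["a ghost haunts an old house", "two friends open a bakery"]
posters = [[(10, 10, 10), (30, 20, 20)], [(250, 200, 120), (240, 210, 140)]]
genres = [{"Horror", "Thriller"}, {"Comedy"}]

vocab, text_features = tfidf_vectors(plots)
image_features = [mean_color(p) for p in posters]

# Concatenated feature space: text features followed by image features.
fused = [t + i for t, i in zip(text_features, image_features)]
y_horror = binary_labels(genres, "Horror")
```

Each row of `text_features`, `image_features`, or `fused` could then be passed to any tabular classifier, which mirrors the comparison of single modalities against the concatenated space described in the abstract.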
Notes
1. GitHub repository: https://github.com/KoEj/Multimodal_classifiers
Acknowledgments
This work was supported by the statutory funds of the Department of Systems and Computer Networks, Faculty of Information and Communication Technology, Wroclaw University of Science and Technology.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Niedziółka, P., Zyblewski, P. (2023). A Non-deep Approach to Classifying Movie Genres Based on Multimodal Data. In: Burduk, R., Choraś, M., Kozik, R., Ksieniewicz, P., Marciniak, T., Trajdos, P. (eds) Progress on Pattern Classification, Image Processing and Communications. CORES IP&C 2023. Lecture Notes in Networks and Systems, vol 766. Springer, Cham. https://doi.org/10.1007/978-3-031-41630-9_3
DOI: https://doi.org/10.1007/978-3-031-41630-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41629-3
Online ISBN: 978-3-031-41630-9
eBook Packages: Intelligent Technologies and Robotics (R0)