Abstract
The rapid growth of multimedia documents has created strong demand for sophisticated multimedia knowledge discovery systems. Knowledge extraction from these documents relies mainly on the data representation model and the document representation model. Because a multimedia document comprises multimodal multimedia objects, the data representation depends on the modality of the objects. Multimodal objects require distinct processing and feature extraction methods, which yield different features with different dimensionalities. Managing multiple types of features is challenging for knowledge extraction tasks. A unified representation of multimedia documents benefits the knowledge extraction process, because all objects are represented by the same type of features. An appropriate document representation benefits the overall decision-making process by reducing search time and memory requirements. In this paper, we propose a domain conversion method, the Multimedia to Signal Converter (MSC), which represents a multimodal multimedia document in a unified form by converting its multimodal objects into signal objects. We also propose a tree-based structure, the Multimedia Feature Pattern (MFP) tree, for the compact representation of multimedia documents in terms of the features of their multimedia objects. The effectiveness of the proposed framework is evaluated through experiments on four multimodal datasets. Experimental results show that the unified representation of multimedia documents improves the classification accuracy for the documents. The MFP tree based representation not only reduces search time and memory requirements but also outperforms competing approaches for search and retrieval of multimedia documents.
Appendix:
This section presents the abbreviations and notation used in this paper.
- MMD - Multimedia Document
- UMD - Unified Multimedia Document
- VSD - Vector Space Document model
- STD - Suffix Tree Document model
- MSC - Multimedia to Signal Converter
- MFP - Multimedia Feature Pattern
- TTS - Text to Speech
- RMS - Root Mean Square
- VSMD - Vector Space Multimedia Document
- MFPC - Multimedia Feature Pattern based Clustering algorithm
- \(thresh_{ob}\) - Object Similarity Threshold
- \(DocId\) - Document Identifier
- \(umd_{i}\) - \(i^{th}\) UMD from the data set
- \(s_{j}\) - \(j^{th}\) signal object of a UMD
- \(fs_{m}\) - \(m^{th}\) feature value of a signal object
- \(b_{k}\) - \(k^{th}\) branch of an MFP tree
- \(fnode_{n}.\textit{val}\) - value stored in the \(\textit{val}\) attribute of the \(n^{th}\) feature node of an MFP tree branch
- \(fb_{m}\) - feature value stored in the \(m^{th}\) feature node
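To make the notation concrete, the following is a minimal sketch of how a feature-pattern tree over signal-object features could be organised. It is not the construction used in the paper, whose details are not given in this section. It assumes that each signal object \(s_{j}\) is reduced to an ordered list of feature values \(fs_{m}\), that values are rounded so equal prefixes can share feature nodes, and that the last node of a branch records the \(DocId\) values of the documents containing that object. The class names, the `decimals` parameter, and the exact-match rule are all hypothetical; in particular, \(thresh_{ob}\) is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class FeatureNode:
    """One feature node of a branch; `val` mirrors the fnode_n.val notation."""
    val: Optional[float] = None
    children: Dict[float, "FeatureNode"] = field(default_factory=dict)
    doc_ids: Set[str] = field(default_factory=set)  # DocIds ending at this node


class FeaturePatternTree:
    """Hypothetical prefix tree over rounded feature values (an assumption)."""

    def __init__(self, decimals: int = 2) -> None:
        self.root = FeatureNode()
        self.decimals = decimals  # assumed quantisation so prefixes can be shared
        self.node_count = 0       # number of feature nodes, excluding the root

    def insert(self, doc_id: str, features: List[float]) -> None:
        """Add one signal object's features as a branch, reusing shared prefixes."""
        node = self.root
        for fs in features:
            key = round(fs, self.decimals)
            if key not in node.children:
                node.children[key] = FeatureNode(val=key)
                self.node_count += 1
            node = node.children[key]
        node.doc_ids.add(doc_id)

    def lookup(self, features: List[float]) -> Set[str]:
        """Return the DocIds whose stored branch matches `features` exactly."""
        node = self.root
        for fs in features:
            node = node.children.get(round(fs, self.decimals))
            if node is None:
                return set()
        return set(node.doc_ids)


# Three signal objects from three documents; two share an identical branch.
tree = FeaturePatternTree()
tree.insert("umd_1", [0.12, 0.50, 0.33])
tree.insert("umd_2", [0.12, 0.50, 0.90])
tree.insert("umd_3", [0.12, 0.50, 0.33])
print(tree.node_count)
print(sorted(tree.lookup([0.12, 0.50, 0.33])))
```

In this sketch the three inserted objects occupy four feature nodes rather than nine, because the common prefix is stored once. That illustrates the kind of saving in memory and search effort that a compact tree representation aims for, under the assumptions stated above.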
Cite this article
Pushpalatha K, Ananthanarayana VS. Feature pattern based representation of multimedia documents for efficient knowledge discovery. Multimed Tools Appl 75, 9461–9487 (2016). https://doi.org/10.1007/s11042-016-3434-y
DOI: https://doi.org/10.1007/s11042-016-3434-y