Feature pattern based representation of multimedia documents for efficient knowledge discovery

Pushpalatha K; Ananthanarayana V S

doi:10.1007/s11042-016-3434-y

Published: 23 March 2016

Multimedia Tools and Applications, volume 75, pages 9461–9487 (2016)

Abstract

The rapid growth of multimedia documents has created a strong demand for sophisticated multimedia knowledge discovery systems. Knowledge extraction from these documents relies mainly on the data representation model and the document representation model. Because a multimedia document comprises multimodal multimedia objects, the data representation depends on the modality of the objects. Multimodal objects require distinct processing and feature extraction methods, which yield different features with different dimensionalities. Managing multiple types of features is challenging for knowledge extraction tasks. A unified representation of multimedia documents benefits the knowledge extraction process, because all objects are represented by the same type of features. An appropriate document representation benefits the overall decision making process by reducing the search time and memory requirements. In this paper, we propose a domain converting method known as the Multimedia to Signal Converter (MSC), which represents a multimodal multimedia document in a unified form by converting its multimodal objects into signal objects. A tree based approach known as the Multimedia Feature Pattern (MFP) tree is proposed for the compact representation of multimedia documents in terms of the features of their multimedia objects. The effectiveness of the proposed framework is evaluated through experiments on four multimodal datasets. Experimental results show that the unified representation of multimedia documents improves the classification accuracy for the documents. The MFP tree based representation not only reduces the search time and memory requirements, but also outperforms the competing approaches for search and retrieval of multimedia documents.

Author information

Authors and Affiliations

National Institute of Technology Karnataka, Surathkal, Mangalore, 575025, India
Pushpalatha K & Ananthanarayana V S

Corresponding author

Correspondence to Pushpalatha K.

Appendix:

This section presents the abbreviations and notation used in this paper.

MMD - Multimedia Document
UMD - Unified Multimedia Document
VSD - Vector Space Document model
STD - Suffix Tree Document model
MSC - Multimedia to Signal Converter
MFP - Multimedia Feature Pattern
TTS - Text to Speech
RMS - Root Mean Square
VSMD - Vector Space Multimedia Document
MFPC - Multimedia Feature Pattern based Clustering algorithm
\(thresh_{ob}\) - Object Similarity Threshold
\(DocId\) - Document Identifier
\(umd_{i}\) - \(i^{th}\) UMD from the data set
\(s_{j}\) - \(j^{th}\) signal object of a UMD
\(fs_{m}\) - \(m^{th}\) feature value of a signal object
\(b_{k}\) - \(k^{th}\) branch of an MFP tree
\(fnode_{n}.\textit{val}\) - value stored in the \(\textit{val}\) attribute of the \(n^{th}\) feature node of an MFP tree branch
\(fb_{m}\) - feature value stored in \(m^{th}\) feature node
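
As a rough illustration of how the notation above might fit together, the following Python sketch stores the feature values \(fs_{m}\) of signal objects as branches of a prefix tree, so that objects sharing leading feature values share feature nodes. It is not the authors' MFP tree construction, which the abstract does not detail. The class names `FeatureNode` and `FeaturePatternTree`, the quantization `step`, and the exact-match `lookup` are all assumptions made for illustration; in particular, the quantization step is only a simple stand-in for the similarity test governed by \(thresh_{ob}\).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class FeatureNode:
    """Hypothetical feature node; `val` mirrors fnode.val in the notation."""
    val: Optional[int] = None
    children: Dict[int, "FeatureNode"] = field(default_factory=dict)
    doc_ids: Set[int] = field(default_factory=set)  # DocIds ending at this node


class FeaturePatternTree:
    """Illustrative prefix tree over quantized feature values (an assumption,
    not the published MFP tree algorithm)."""

    def __init__(self, step: float = 0.1) -> None:
        self.root = FeatureNode()
        self.step = step  # assumed bin width used to treat close values as equal

    def _quantize(self, fs: List[float]) -> List[int]:
        # Map each feature value to the index of its nearest bin.
        return [round(v / self.step) for v in fs]

    def insert(self, doc_id: int, fs: List[float]) -> None:
        # Walk or extend one branch, reusing nodes for a shared prefix.
        node = self.root
        for val in self._quantize(fs):
            node = node.children.setdefault(val, FeatureNode(val))
        node.doc_ids.add(doc_id)

    def lookup(self, fs: List[float]) -> Set[int]:
        # Return the DocIds whose stored branch matches the quantized query.
        node = self.root
        for val in self._quantize(fs):
            nxt = node.children.get(val)
            if nxt is None:
                return set()
            node = nxt
        return set(node.doc_ids)

    def node_count(self) -> int:
        # Count feature nodes, excluding the root.
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            count += len(node.children)
            stack.extend(node.children.values())
        return count


if __name__ == "__main__":
    tree = FeaturePatternTree(step=0.1)
    tree.insert(1, [0.52, 0.31, 0.90])
    tree.insert(2, [0.48, 0.29, 0.10])
    print(tree.node_count(), tree.lookup([0.5, 0.3, 0.9]))
```

In this sketch the two inserted objects share their first two feature nodes, which is the kind of prefix sharing that makes a tree representation more compact than storing every feature vector separately.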

About this article

Cite this article

Pushpalatha K, Ananthanarayana V S. Feature pattern based representation of multimedia documents for efficient knowledge discovery. Multimed Tools Appl 75, 9461–9487 (2016). https://doi.org/10.1007/s11042-016-3434-y

Received: 31 October 2015
Revised: 22 February 2016
Accepted: 03 March 2016
Published: 23 March 2016
Issue Date: August 2016
DOI: https://doi.org/10.1007/s11042-016-3434-y
