
Feature pattern based representation of multimedia documents for efficient knowledge discovery

Multimedia Tools and Applications

Abstract

The rapid growth of multimedia documents has created a strong demand for sophisticated multimedia knowledge discovery systems. Knowledge extraction from these documents relies mainly on the data representation model and the document representation model. Because a multimedia document comprises multimodal multimedia objects, the data representation depends on the modality of each object. Multimodal objects require distinct processing and feature extraction methods, which yield different features with different dimensionalities. Managing multiple types of features is challenging for knowledge extraction tasks. A unified representation of multimedia documents benefits the knowledge extraction process, because all objects are represented by the same type of features. An appropriate document representation benefits the overall decision-making process by reducing search time and memory requirements. In this paper, we propose a domain conversion method, the Multimedia to Signal Converter (MSC), which represents a multimodal multimedia document in a unified form by converting its multimodal objects into signal objects. We also propose a tree-based approach, the Multimedia Feature Pattern (MFP) tree, for the compact representation of multimedia documents in terms of the features of their multimedia objects. The effectiveness of the proposed framework is evaluated through experiments on four multimodal datasets. The experimental results show that the unified representation improves the classification accuracy for the documents. The MFP tree based representation not only reduces search time and memory requirements but also outperforms competing approaches for the search and retrieval of multimedia documents.




Correspondence to Pushpalatha K.

Appendix:

This section presents the abbreviations and notation used in this paper.

  • MMD - Multimedia Document

  • UMD - Unified Multimedia Document

  • VSD - Vector Space Document model

  • STD - Suffix Tree Document model

  • MSC - Multimedia to Signal Converter

  • MFP - Multimedia Feature Pattern

  • TTS - Text to Speech

  • RMS - Root Mean Square

  • VSMD - Vector Space Multimedia Document

  • MFPC - Multimedia Feature Pattern based Clustering algorithm

  • \(thresh_{ob}\) - Object Similarity Threshold

  • \(DocId\) - Document Identifier

  • \(umd_{i}\) - \(i^{th}\) UMD from the data set

  • \(s_{j}\) - \(j^{th}\) signal object of a UMD

  • \(fs_{m}\) - \(m^{th}\) feature value of a signal object

  • \(b_{k}\) - \(k^{th}\) branch of an MFP tree

  • \(fnode_{n}.\textit{val}\) - value stored in the \(\textit{val}\) attribute of the \(n^{th}\) feature node of an MFP tree branch

  • \(fb_{m}\) - feature value stored in \(m^{th}\) feature node
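
To make the notation above concrete, the following Python sketch shows one way feature values of signal objects could be stored as branches of a tree with shared prefixes. It is an illustration only, not the authors' MFP tree algorithm: the class names FeatureNode and FeaturePatternTree, the tolerance parameter, and the choice to store DocId values at the end of a branch are all assumptions made for this example. In particular, tolerance is a hypothetical per-feature match tolerance and is not the object similarity threshold \(thresh_{ob}\).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeatureNode:
    """One feature node of a branch; `val` plays the role of fnode_n.val."""
    val: float
    children: list[FeatureNode] = field(default_factory=list)
    doc_ids: set[str] = field(default_factory=set)  # DocIds whose branch ends here


class FeaturePatternTree:
    """Prefix tree over feature vectors (illustrative, hypothetical design)."""

    def __init__(self, tolerance: float = 0.0) -> None:
        self.root = FeatureNode(val=float("nan"))  # root holds no feature value
        self.tolerance = tolerance  # assumed per-feature match tolerance

    def _match(self, node: FeatureNode, fs: float) -> FeatureNode | None:
        # Return the first child whose stored value is within the tolerance.
        for child in node.children:
            if abs(child.val - fs) <= self.tolerance:
                return child
        return None

    def insert(self, features: list[float], doc_id: str) -> None:
        """Add the feature values fs_1..fs_M of one signal object as a branch."""
        node = self.root
        for fs in features:
            child = self._match(node, fs)
            if child is None:
                child = FeatureNode(val=fs)
                node.children.append(child)
            node = child
        node.doc_ids.add(doc_id)

    def search(self, features: list[float]) -> set[str]:
        """Return the DocIds stored at the end of the matching branch, if any."""
        node = self.root
        for fs in features:
            node = self._match(node, fs)
            if node is None:
                return set()
        return set(node.doc_ids)

    def node_count(self) -> int:
        """Count feature nodes, excluding the root."""
        count, stack = 0, list(self.root.children)
        while stack:
            current = stack.pop()
            count += 1
            stack.extend(current.children)
        return count


tree = FeaturePatternTree()
tree.insert([0.2, 0.5, 0.9], "d1")
tree.insert([0.2, 0.5, 0.1], "d2")  # shares the prefix 0.2, 0.5 with d1
tree.insert([0.2, 0.5, 0.9], "d3")  # reuses the whole branch of d1
print(sorted(tree.search([0.2, 0.5, 0.9])), tree.node_count())
```

In this sketch, three signal objects with three feature values each occupy four feature nodes rather than nine, which illustrates how sharing common feature prefixes can reduce memory, and a lookup visits at most one node per feature value.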


Cite this article

Pushpalatha K, Ananthanarayana V S. Feature pattern based representation of multimedia documents for efficient knowledge discovery. Multimed Tools Appl 75, 9461–9487 (2016). https://doi.org/10.1007/s11042-016-3434-y


  • DOI: https://doi.org/10.1007/s11042-016-3434-y
