Recent years have witnessed the emergence and rapid growth of social platforms (e.g., Facebook, Instagram, YouTube, and Flickr), where people record, share, broadcast, and comment on media content such as images and videos, leading to an accelerated proliferation of social media on the Internet. This rapid increase has raised numerous new research challenges for multimedia content analysis. At the same time, the availability of massive social media data has opened many exciting opportunities, particularly within the field of multimedia content analysis.

This special issue aims to report recent research on large-scale social multimedia analysis and presents papers dealing with various topics in this field. After rigorous reviewing, 11 papers were accepted for publication in MTAP. These papers cover most aspects of social media analysis. In particular, the papers of this issue concern (i) feature description and indexing for social media content, (ii) semantic understanding of social media, (iii) efficiency strategies for large-scale practical social media analysis systems, (iv) social multimedia enabled applications, (v) social image processing, and finally (vi) evaluation for social video content analysis. They are briefly discussed as follows.

Image feature description is fundamental to social image analysis, and the efficacy of feature description directly affects the performance of image retrieval, annotation, and recognition. In the article “Fuzzy Bag of Visual Words for Social Image Description” (10.1007/s11042-014-2138-4), Li et al. propose to use fuzzy set theory to model and measure the ambiguity between image features and the visual words of the codebook in the Bag of Visual Words representation. A new fuzzy membership function is designed, and a genetic algorithm is proposed to obtain its optimal parameters. In the article “Binary Code Re-ranking for Large-scale Image Retrieval” (10.1007/s11042-014-2087-y), the authors address the issue of feature indexing in large-scale social image retrieval. Two image ranking methods, namely distance-weight-based re-ranking and bit-importance-based re-ranking, are developed to re-rank the hashing-indexed images for a given query. These methods assign high weights to important bits and small weights to less important bits, and thus achieve better retrieval performance than traditional hashing approaches based on Hamming distance.
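The general idea of bit-weighted re-ranking can be illustrated with a minimal sketch. It is not the authors' method: the function names and the weight values below are hypothetical, and how the article actually derives bit importance is not described here. The sketch only shows why weighting bits can separate candidates that plain Hamming distance ranks as ties.

```python
from typing import List, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the bit positions in which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def weighted_hamming_distance(
    a: Sequence[int], b: Sequence[int], weights: Sequence[float]
) -> float:
    """Sum the weights of the bit positions in which two codes differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def rerank(
    query: Sequence[int],
    candidates: Sequence[Sequence[int]],
    weights: Sequence[float],
) -> List[int]:
    """Return candidate indices sorted by weighted distance to the query.

    Ties are broken by the original candidate order.
    """
    scored = [
        (weighted_hamming_distance(query, code, weights), idx)
        for idx, code in enumerate(candidates)
    ]
    scored.sort()
    return [idx for _, idx in scored]


# Toy 4-bit codes; each candidate differs from the query in exactly one bit.
query = [1, 0, 1, 1]
candidates = [
    [0, 0, 1, 1],  # differs in bit 0
    [1, 0, 1, 0],  # differs in bit 3
    [1, 1, 1, 1],  # differs in bit 1
]

# Hypothetical per-bit importance weights (assumed for illustration only).
weights = [0.9, 0.5, 0.4, 0.1]

plain = [hamming_distance(query, c) for c in candidates]
order = rerank(query, candidates, weights)
```

In this toy case all three candidates have the same plain Hamming distance, so an unweighted ranking cannot order them, whereas the weighted distance places the candidate that differs only in the least important bit first.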

Social multimedia is characterized by a vast volume of data and great diversity of content, even within the same category. This poses major challenges for content understanding research and creates a demand for scalable pattern analysis and machine learning techniques for social media. In the article “Visual Concept Detection of Web Images based on Group Sparse Ensemble Learning” (10.1007/s11042-014-2179-8), the authors propose to collect large-scale training data based on dictionary coherence, covering a wide variety of samples to handle the huge intra-class variations in visual concept detection. To train efficiently on large-scale data, they also propose a group sparse ensemble learning approach based on Automatic Group Sparse Coding. The article “Active Learning SVM with Regularization Path for Image Classification” (10.1007/s11042-014-2141-9) reports recent work on active learning, which is useful in image retrieval and classification. Observing that the model parameters of an active learning support vector machine are closely related to the training set, Sun et al. argue that they should be dynamically adjusted instead of being kept fixed, and propose a novel approach that fits the entire solution path of the SVM for every value of the model parameters. The article “Salient object detection and classification for stereoscopic images” (10.1007/s11042-014-2142-8) describes visual content analysis for a particular kind of media, stereoscopic images, which have become increasingly prevalent following the rapid advances in 3D capturing and display techniques. Specifically, an iterative method that mutually boosts salient object detection and object classification is proposed for stereoscopic images.

Efficiency is critical in large-scale social media content analysis, and two accepted articles present good practice for this problem. The first article, “Realtime and Robust Object Matching with a Large Number of Templates” (10.1007/s11042-014-2305-7), presents a novel template-based object matching approach built on improved Dominant Orientation Templates (DOT). The authors first propose a compact representation for DOT that greatly reduces the size of the feature vector, and then propose a fast partial occlusion handling strategy to boost the robustness of DOT. The second article, “Large-Scale Paralleled Sparse Principal Component Analysis” (10.1007/s11042-014-2004-4), reports an efficient parallel method for sparse Principal Component Analysis using graphics processing units (GPUs). The parallel GPU implementation is up to 11 times faster than the corresponding CPU implementation, showing a significant advantage for large-scale data analysis.

User-generated content, available in massive amounts on the Internet, enables many potential applications. The article “Event-based Cross Media Question Answering” (10.1007/s11042-014-2085-0) targets the representation of events using multimedia data. The authors present a framework that leverages social media data to extract and illustrate social events automatically for any given query. Social media has also shown predictive power in various domains. In the article “Predicting Movie Box-Office Revenues by Exploiting Large-Scale Social Media Content” (10.1007/s11042-014-2270-1), the authors use the crowd wisdom of social media, especially user posts, to predict movie box-office revenues. Both a linear regression model and Support Vector Regression are employed for the prediction.

One article focuses on the processing of social media images. The article “Spatially guided local Laplacian filter for nature image detail enhancement” (10.1007/s11042-014-2058-3) addresses the enhancement of nature images in social media. Specifically, it proposes an improved local Laplacian filter that spatially guides the filtering strength by approximating the richness of image details. The proposed method endows the Laplacian filter with the ability to dynamically assign appropriate parameter values to different image content.

With the growing focus on community-contributed videos on the web, there is a strong demand for a well-designed web video benchmark for research on social-network-based video content analysis. The article “On Application-Unbiased Benchmarking of Web Videos from a Social Network Perspective” (10.1007/s11042-014-2245-2) releases such a benchmark, named MCG-WEBV 2.0, which contains 248,887 crawled YouTube videos and their corresponding social network structure with 123,063 video contributors. The benchmark can be used to explore the fusion of content and network information for several web video analysis tasks.

We thank all authors for their contributions to this special issue. We would also like to thank all the reviewers for their hard work in judging and improving the quality of the submitted papers.