1 Introduction

Essential to many tasks in multimedia research and development is the availability of a sufficiently large data set and its corresponding ground truth. However, most data available for multimedia research are either too specific (e.g., data for text retrieval), too small (e.g., face images), or lack ground truth, such as millions of unprocessed images gathered from the Web for testing. While it is relatively easy to crawl and store a huge amount of data, creating the ground truth necessary to systematically train, test, evaluate, and compare the performance of various algorithms and systems remains a challenging issue. For this reason, researchers tend to put (or redirect) effort into creating such corpora individually in order to carry out research on large-scale data sets. A unified approach to web-scale and distributed multimedia data management is therefore urgently needed and would benefit the entire multimedia research community.

This special issue presents and reports on the construction and analysis of large-scale multimedia data sets and resources, and provides a strong reference for multimedia researchers interested in large-scale multimedia data sets. In particular, the special issue demonstrates the emerging techniques and applications for large-scale multimedia data management.

This special issue contains original papers describing the latest developments, trends, and solutions, such as:

  • Algorithms, techniques, frameworks, and models for multimedia computing

  • Multimedia human-computer interaction

  • Mobile and multi-device empowered multimedia

  • Large-scale multimedia data management

  • Ubiquitous/pervasive data for multimedia

  • Social media and presence

  • Cloud-based multimedia services

  • Web-scale data management and analysis

  • Security issues for multimedia computing

  • Media/data transport, analysis, and delivery

  • Data searching, browsing, and discovery

  • Emerging systems, services, and middleware

  • Crowd-sourcing, authoring, and collaboration

  • Other recent issues in large-scale multimedia data

2 Related works

Seung-Hoon Chae et al. [3] suggested a method to auto-configure the initial contour in the level-set method. Multi-resolution analysis (MRA) helped reduce the time required by the auto-configuration process for the initial contour. In addition, the volume data of a CT image were used to prevent the data loss that occurs during the MRA transformation process.

Jia Uddin et al. [32] presented a graphics processing unit (GPU)-based implementation of the Bellman-Ford (BF) routing algorithm [1] using NVIDIA’s compute unified device architecture (CUDA) [19]. The BF algorithm computes the shortest paths from a single source vertex to all other vertices in a weighted graph. In the proposed GPU-based approach, multiple threads run concurrently over numerous streaming processors in the GPU to dynamically update routing information. Instead of computing the individual vertex distances one by one, a large number of threads concurrently updated the vertex distances, with each vertex distance handled by a single thread.
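To make the per-thread relaxation idea concrete, the following is a minimal sketch of "parallel" Bellman-Ford written with NumPy on the CPU, where one vectorized update stands in for the many concurrent GPU threads of [32]; the graph, the use of `np.minimum.at`, and the edge-list layout are illustrative assumptions, not the authors' CUDA implementation.

```python
import numpy as np

def bellman_ford_parallel(num_vertices, edges, source):
    """Vectorized Bellman-Ford relaxation: all edges are relaxed
    'simultaneously' in each round, mimicking one GPU thread per
    distance update (emulated here with NumPy on the CPU)."""
    u = np.array([e[0] for e in edges])            # edge sources
    v = np.array([e[1] for e in edges])            # edge targets
    w = np.array([e[2] for e in edges], dtype=float)

    dist = np.full(num_vertices, np.inf)
    dist[source] = 0.0

    for _ in range(num_vertices - 1):              # at most |V|-1 rounds
        candidate = dist[u] + w                    # tentative distances over all edges at once
        # np.minimum.at resolves concurrent writes to the same target
        # vertex, much like an atomicMin would on the GPU.
        np.minimum.at(dist, v, candidate)
    return dist

# toy weighted digraph: (source, target, weight)
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
print(bellman_ford_parallel(4, edges, source=0))   # -> [0. 3. 1. 4.]
```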

Kwangmu Shin et al. [28] proposed a novel stereo matching approach that was robust to various radiometric variations, both local and global. They designed a hybrid stereo matching approach based on transitions of pixel values and data fitting: transitions of pixel values were utilized in the coarse stereo matching stage, and polynomial curve fitting was used in the fine stereo matching stage. Consequently, they demonstrated that the proposed method was less sensitive to various radiometric variations and showed outstanding performance in terms of computational complexity.

Jianxin Liao et al. [20] presented LFFIR, a multi-feature image retrieval framework for content-similarity search in a distributed setting. The key idea was to effectively incorporate multi-feature image retrieval into the peer-to-peer (P2P) [29] paradigm. LFFIR fused the multiple features in order to capture the overall image characteristics, and then constructed distributed indexes for the fused feature by exploiting the properties of locality-sensitive hashing (LSH).
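As a rough illustration of the LSH indexing step, the sketch below buckets fused feature vectors with random-hyperplane (cosine) LSH; the fusion by simple concatenation, the class name, and the feature inputs are assumptions for illustration and not the LFFIR design itself. In a P2P deployment, the bucket key could also determine which peer stores the entry.

```python
import numpy as np
from collections import defaultdict

class RandomHyperplaneLSH:
    """Minimal random-hyperplane LSH index for fused feature vectors.
    Vectors whose signs agree on all random projections share a bucket,
    so similar images tend to collide (cosine-similarity LSH)."""
    def __init__(self, dim, num_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((num_bits, dim))
        self.buckets = defaultdict(list)

    def _hash(self, vec):
        bits = (self.planes @ vec) > 0
        return bits.tobytes()                      # hashable bucket key

    def index(self, image_id, color_hist, texture_desc):
        fused = np.concatenate([color_hist, texture_desc])   # simple feature fusion
        self.buckets[self._hash(fused)].append(image_id)

    def query(self, color_hist, texture_desc):
        fused = np.concatenate([color_hist, texture_desc])
        return self.buckets.get(self._hash(fused), [])
```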

Oh Jung Min et al. [23] presented a database management scheme for an intelligent surveillance system utilizing multiple visual sensors and RFID readers. Objects were tracked and identified by the multiple visual sensors and RFID readers. For effective data storage, they defined three types of data structures to store data consistently; these structures contained a global object number and identification as the common information of the same object. The global object number was uniquely assigned to each tracked object. Previously stored data lacking this common information were back-annotated once it became available in the system. Moreover, when the global object number changed because of imperfect detection and tracking, the system maintained consistency information between global object numbers for the same object by comparing their local target information or positions. Fragmented information for an object was also stitched together using map information.
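A minimal sketch of such records is given below, assuming hypothetical field names (the actual schema in [23] is richer); it only illustrates the shared global object number and the back-annotation step described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical record types; the common key idea is that every record
# eventually carries a global object number (GON).

@dataclass
class VisualTrackRecord:              # data from one visual-sensor track
    sensor_id: int
    local_track_id: int
    positions: List[Tuple[float, float]] = field(default_factory=list)
    global_object_number: Optional[int] = None   # back-annotated later

@dataclass
class RFIDRecord:                     # data from an RFID reader
    reader_id: int
    tag_id: str
    global_object_number: Optional[int] = None

@dataclass
class GlobalObjectRecord:             # one entry per physical object
    global_object_number: int
    identification: Optional[str] = None          # e.g. RFID tag identity

def back_annotate(track_records, gon, local_track_id):
    """Once a global object number is resolved, attach it to earlier
    records of the same local track (the back-annotation step)."""
    for rec in track_records:
        if rec.local_track_id == local_track_id and rec.global_object_number is None:
            rec.global_object_number = gon
```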

Myeongsu Kang et al. [16] showed that the formant synthesis process using multiple pairs of digital resonators and band-pass filters was accelerated with the power of a general-purpose graphics processing unit (GPGPU) [2, 8, 25, 27, 30, 31]. This research compared the performance of the proposed GPGPU-based parallel approach with the CPU-based sequential approach in order to validate the effectiveness of the proposed massively parallel method.
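For readers unfamiliar with formant synthesis, the sketch below shows a single second-order digital resonator driven by an impulse-train source, using the common textbook coefficient formulas; it is a sequential CPU illustration under assumed parameter values, whereas [16] parallelizes many resonator and band-pass filter pairs on the GPGPU.

```python
import numpy as np

def digital_resonator(x, formant_hz, bandwidth_hz, fs):
    """Second-order digital resonator, as used in formant synthesizers:
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], with coefficients derived from
    the desired formant frequency and bandwidth."""
    T = 1.0 / fs
    C = -np.exp(-2.0 * np.pi * bandwidth_hz * T)
    B = 2.0 * np.exp(-np.pi * bandwidth_hz * T) * np.cos(2.0 * np.pi * formant_hz * T)
    A = 1.0 - B - C
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = A * x[n] \
             + B * (y[n - 1] if n >= 1 else 0.0) \
             + C * (y[n - 2] if n >= 2 else 0.0)
    return y

# excite the resonator with a 100 Hz impulse train to get a vowel-like buzz
fs = 16000
source = np.zeros(fs // 10)            # 100 ms of excitation
source[::fs // 100] = 1.0
formant1 = digital_resonator(source, formant_hz=700.0, bandwidth_hz=130.0, fs=fs)
```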

Xavier Jerald Punithan et al. [24] proposed a two-player non-cooperative zero-sum game with incomplete information for dynamic intrusion signature configuration (DISC), in which intrusion signatures of various lengths are activated in a time-shared manner. After formulating the problem in a game-theoretic framework, they found the optimal strategy for DISC in the S-IDS. To the best of their knowledge, this work was the first of its kind to analyze the optimal DISC strategy against the various mutants of intrusion packets.
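The standard way to compute an optimal mixed strategy for a zero-sum game is a small linear program; the sketch below does this with SciPy for a toy 2x2 payoff matrix, which is a placeholder and not the DISC payoff structure of [24].

```python
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(payoff):
    """Mixed maximin strategy for the row player of a two-player zero-sum
    game, via the standard LP: maximize v s.t. P^T x >= v, sum(x) = 1."""
    payoff = np.asarray(payoff, dtype=float)
    m, n = payoff.shape
    # decision variables: x_1..x_m (mixed strategy) and v (game value)
    c = np.zeros(m + 1)
    c[-1] = -1.0                                        # minimize -v == maximize v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])      # v - sum_i P[i,j]*x_i <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# toy 2x2 game: rows = signature configurations, cols = attacker moves
strategy, value = maximin_strategy([[3.0, -1.0], [-2.0, 4.0]])
print(strategy, value)    # approx. [0.6 0.4] and game value 1.0
```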

ChangWon Jeong et al. [12] suggested a medical image information system environment using data synchronization methods. They designed synchronization methods based on detecting the creation of image data on the system's components. They also used a cloud-computing environment, which reduced the number of high-latency image transmissions. Finally, they showed the data synchronization process of the system with imaging application services based on a cloud-computing service.

Seung-Won Jung et al. [14] proposed a simple but effective method for obtaining an all-in-focus (AIF) color image from a database of color and depth image pairs. Since defocus blur is inherently depth-dependent, the color pixels were first grouped according to their depth values. The defocus blur parameters were then estimated from the amount of defocus blur of the grouped pixels. Given a defocused color image and its estimated blur parameters, the AIF image was produced by adopting the conventional pixel-wise mapping technique. In addition, the availability of the depth image disambiguated objects located far from or near to the in-focus object and thus facilitated image refocusing.
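The depth-grouping step can be sketched as follows: pixels are partitioned into quantized depth layers and a per-layer sharpness indicator is computed. The Laplacian-variance measure used here is only a common sharpness proxy, not the blur-parameter estimator of [14], and the number of depth levels is an arbitrary assumption.

```python
import numpy as np
import cv2

def per_depth_blur_estimate(color_bgr, depth, num_levels=8):
    """Group pixels by quantized depth and compute a sharpness proxy
    (variance of the Laplacian) per depth layer; lower values suggest
    stronger defocus blur for that layer."""
    gray = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)
    lap = cv2.Laplacian(gray, cv2.CV_64F)

    d_min, d_max = float(depth.min()), float(depth.max())
    levels = np.clip(((depth - d_min) / (d_max - d_min + 1e-9) * num_levels).astype(int),
                     0, num_levels - 1)

    estimates = {}
    for k in range(num_levels):
        mask = levels == k
        if mask.any():
            estimates[k] = float(lap[mask].var())
    return estimates
```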

Shingchem D. You et al. [34] studied the accuracy of detecting singing segments using the hidden Markov model (HMM) classifier with various features, including Mel frequency cepstral coefficients (MFCC) [26], linear predictive cepstral coefficients (LPCC), and linear prediction coefficients (LPC). Simulation results showed that detecting singing segments in a soundtrack was more difficult than detecting them among pure-instrument segments. In addition, combining MFCC and LPCC yielded higher accuracy. The bootstrapping technique provided only a limited accuracy improvement for detecting all singing segments in a soundtrack.
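A minimal front end for this kind of classifier is sketched below with librosa for MFCC extraction and hmmlearn for per-class Gaussian HMMs; the libraries, feature dimensions, model sizes, and the maximum-likelihood decision rule are assumptions for illustration rather than the exact setup of [34].

```python
import numpy as np
import librosa
from hmmlearn import hmm

def extract_mfcc(path, sr=22050, n_mfcc=13):
    """Frame-level MFCC features (one row per frame)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_class_hmm(feature_list, n_states=4):
    """One Gaussian HMM per class (e.g. 'singing' vs 'instrumental')."""
    X = np.vstack(feature_list)
    lengths = [len(f) for f in feature_list]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def classify_segment(segment_features, singing_hmm, instrumental_hmm):
    """Maximum-likelihood decision between the two class HMMs."""
    if singing_hmm.score(segment_features) > instrumental_hmm.score(segment_features):
        return "singing"
    return "instrumental"
```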

Shuai Liu et al. [22] applied fractal image encoding to image compression because of its high compression ratio, and extracted and analyzed the loss incurred in fractal encoding. To address the most important problems of the fractal image encoding method, namely its high computational complexity and long encoding time, they first applied statistical analysis to the fractal encoding method. They created a box-plot of the loss values to examine their distribution, then partitioned the values into several parts and mapped them to the given model. Finally, they presented a novel method to account for the loss and maintain the quality in image compression.
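The statistical step can be sketched as computing box-plot statistics of per-block encoding losses and partitioning the blocks accordingly; the synthetic loss values and the quartile-based partition below are illustrative assumptions, not the model used in [22].

```python
import numpy as np

def boxplot_partition(losses):
    """Box-plot statistics of per-block encoding losses and a partition
    of the blocks into quartile-based groups."""
    losses = np.asarray(losses, dtype=float)
    q1, q2, q3 = np.percentile(losses, [25, 50, 75])
    iqr = q3 - q1
    whisker_lo, whisker_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # which quartile range each block's loss falls into (0..3)
    groups = np.digitize(losses, bins=[q1, q2, q3])
    return {"Q1": q1, "median": q2, "Q3": q3,
            "whiskers": (whisker_lo, whisker_hi)}, groups

stats, groups = boxplot_partition(np.random.default_rng(0).gamma(2.0, 1.5, size=1000))
print(stats)
```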

Ing-Jr Ding et al. [7] explored the well-known HMM pattern recognition method, supported by the Kinect device, to classify human active gestures, together with a user adaptation scheme, MAP+GoSSRT, that enhances MAP adaptation by incorporating a group of states shifted by referenced transfer (GoSSRT).

Weina Fu et al. [9] proposed a novel method suitable for relatively high-resolution videos in which moving objects can be distinguished by their color and shape information. This method matched and tracked multiple moving objects in video by extracting and combining multiple features. Using a background re-construction method, the moving objects were separated from the background as sub-images; some valuable features, especially topological information, were first extracted from each sub-image. These features were then fed to a strong classifier accumulated from weak feature classifiers. Finally, after the initial matching of moving objects, kinematical features were extracted to reinforce the matching method.
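Accumulating weak feature classifiers into a strong one is what boosting does; as a loose stand-in for that step, the sketch below trains an AdaBoost ensemble of decision stumps on synthetic per-object feature vectors. The feature layout and labels are placeholders, and scikit-learn's AdaBoostClassifier is not claimed to be the classifier used in [9].

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Synthetic per-object feature vectors (e.g. color, shape, and topology
# descriptors concatenated per extracted sub-image); labels mark
# matching / non-matching object pairs.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 12))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 3] > 0).astype(int)

# Strong classifier accumulated from weak learners
# (the default weak learner is a depth-1 decision stump).
strong = AdaBoostClassifier(n_estimators=50)
strong.fit(X_train, y_train)

X_new = rng.standard_normal((5, 12))
print(strong.predict(X_new))
```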

Hao Chen et al. [4] addressed data integration in the context of the continuous development of information technology, where various multimedia data constantly emerge and are both autonomous and heterogeneous; how to integrate and analyze such data correctly and efficiently has become a challenging problem. First, in order to improve the quality of the integrated data, two real-time threads combined with a data adapter were used to monitor and efficiently refresh necessary updates from the heterogeneous data. Once the original data had been updated, the real-time data were loaded into the data center promptly. Second, a data reverse cleaning method was proposed to improve data quality; it used the data source tree built during the data integration process to quickly locate the original data after reverse cleaning. Finally, a data accuracy assessment algorithm, based on a Bayesian network and a path condition algorithm, was designed for data quality assessment.

Hyosook Jung et al. [15] proposed an approach to introduce the Semantic Web to novice users. To this end, they built an easy-to-use system that helps users create simple RDF documents and construct a small-scale Semantic Web-like environment. Their system could take user-provided input and create an RDF document; all the user needed to do was define a string for the RDF document according to the grammar. Users could also define simple rules using the grammar and practice programming with RDF documents.
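For context, the sketch below builds and serializes a tiny RDF document programmatically with rdflib; the namespace and triples are purely illustrative and unrelated to the grammar of the system in [15].

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

# Illustrative namespace and resources only.
EX = Namespace("http://example.org/")

g = Graph()
alice = URIRef(EX["alice"])
g.add((alice, RDF.type, FOAF.Person))         # "alice is a Person"
g.add((alice, FOAF.name, Literal("Alice")))   # "alice's name is 'Alice'"
g.add((alice, FOAF.knows, URIRef(EX["bob"])))

# Serialize to RDF/XML, i.e. a small RDF document a novice could inspect.
print(g.serialize(format="xml"))
```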

Gelan Yang et al. [33] proposed a new wavelet-energy-based approach for the automated classification of MR human brain images as normal or abnormal. An SVM was used as the classifier, and biogeography-based optimization (BBO) was introduced to optimize the weights of the SVM. The study offered a new means to detect abnormal brains with excellent performance.
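A bare-bones sketch of the feature side is shown below: wavelet-energy features of a 2-D slice (via PyWavelets) feeding a scikit-learn SVM. The BBO step that tunes the SVM weights in [33] is omitted, and the wavelet family, decomposition level, and toy data are assumptions.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def wavelet_energy_features(image, wavelet="haar", level=2):
    """Energy of each wavelet sub-band of a 2-D image, as a compact
    feature vector for normal/abnormal classification."""
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
    feats = [np.sum(coeffs[0] ** 2)]                 # approximation energy
    for (cH, cV, cD) in coeffs[1:]:                  # detail sub-bands per level
        feats.extend([np.sum(cH ** 2), np.sum(cV ** 2), np.sum(cD ** 2)])
    return np.log1p(np.asarray(feats))               # log scale keeps values comparable

# toy data: random "images" with binary labels (placeholders for MR slices)
rng = np.random.default_rng(0)
X = np.array([wavelet_energy_features(rng.standard_normal((64, 64))) for _ in range(40)])
y = rng.integers(0, 2, size=40)

clf = SVC(kernel="rbf")        # default parameters here; tuned by BBO in [33]
clf.fit(X, y)
print(clf.predict(X[:5]))
```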

Ruijun Liu et al. [21] proposed a novel feature encoding method called label constrained sparse coding (LCSC) for visual representation. The visual similarities between local features were jointly considered with the corresponding label information of the local features. This was achieved by combining the label constraints with the encoding of local features. In this way, they could ensure that similar local features with the same label were encoded with similar parameters, while local features with different labels were encoded with dissimilar parameters to increase the discriminative power of the encoded parameters. Moreover, instead of optimizing the coding parameters of each local feature separately, they jointly encoded the local features within one sub-region in a spatial pyramid manner to combine the spatial and contextual information of local features. They applied this label constrained sparse coding technique to classification tasks on several public image data sets to evaluate its effectiveness.
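As a loose illustration only, the sketch below approximates "label-constrained" encoding by restricting each labeled local feature to a class-specific sub-dictionary and sparse-coding against it with scikit-learn's SparseCoder; LCSC itself couples the label constraint into a joint optimization rather than hard-partitioning the dictionary, and the dictionary, sparsity level, and class layout here are invented for the example.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

# Toy dictionary whose atoms are grouped per class.
rng = np.random.default_rng(0)
n_classes, atoms_per_class, dim = 3, 16, 64
dictionary = rng.standard_normal((n_classes * atoms_per_class, dim))
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

def encode_with_label(feature, label, alpha=0.5):
    """Sparse-code a local feature using only the atoms of its own class,
    so features with different labels get structurally different codes."""
    sub_dict = dictionary[label * atoms_per_class:(label + 1) * atoms_per_class]
    coder = SparseCoder(dictionary=sub_dict,
                        transform_algorithm="lasso_lars", transform_alpha=alpha)
    code = np.zeros(n_classes * atoms_per_class)
    code[label * atoms_per_class:(label + 1) * atoms_per_class] = \
        coder.transform(feature.reshape(1, -1))[0]
    return code

local_feature = rng.standard_normal(dim)
print(encode_with_label(local_feature, label=1).nonzero()[0])
```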

Dong-yuan Ge et al. [10] proposed a new approach for binocular vision system calibration and 3D re-construction. For calibration, the sum of squared distances to the fitted hyperplane, computed from the vectors obtained by recombining the coordinates of feature points in the world frame with those in the image frame, was taken as the objective function. An orthogonal learning neural network was designed, in which a self-adaptive minor component extraction method was adopted. When the network reached equilibrium, the projective matrices of the two cameras were obtained from the eigenvectors of the autocorrelation matrix corresponding to the minimum eigenvalues, which completed the calibration of the binocular vision system. For 3D re-construction, an autocorrelation matrix was obtained from the feature point coordinates in the image planes and the calibration data, and an orthogonal learning network was again designed. After the network was trained, the eigenvector of the autocorrelation matrix corresponding to the minimum eigenvalue was obtained, from which the 3D coordinates were also derived. The proposed approach was a novel application of minor component analysis and orthogonal learning networks to binocular vision systems and 3D re-construction.
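The minor-component step itself is easy to sketch: it is the eigenvector of the autocorrelation matrix associated with the smallest eigenvalue. The version below uses a direct eigendecomposition on placeholder data, rather than the orthogonal learning network of [10], purely to show what quantity the network converges to.

```python
import numpy as np

def minor_component(samples):
    """Eigenvector of the sample autocorrelation matrix belonging to the
    smallest eigenvalue (the 'minor component'); in [10] this vector,
    obtained by an orthogonal learning network instead of a direct
    eigendecomposition, yields the projection parameters / 3D coordinates."""
    R = samples.T @ samples / len(samples)     # autocorrelation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending eigenvalues
    return eigvecs[:, 0], eigvals[0]

# placeholder data standing in for the recombined feature-point coordinate vectors
rng = np.random.default_rng(0)
vec, lam = minor_component(rng.standard_normal((500, 12)))
print(lam, np.linalg.norm(vec))                # smallest eigenvalue, unit-norm vector
```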

Ching-Nung Yang et al. [18] presented a new data hiding scheme for verifying the embedding rate during the embedding and extracting phases; the proposed "Hamming+3" scheme, a reasonably acceptable steganography method, showed better performance than "Hamming+1." Woogyoung Jun et al. [13] proposed a block-histogram-based duplicate video detection method for large-scale multimedia, which uses a dynamic matching algorithm for fast, real-time processing of large-scale data. Ran Choi et al. [5] proposed a method that uses a lattice block pattern, tested on a white plaster sphere of 14-cm diameter, to reconstruct 3D surface curvature. Junchul Chun et al. [6] presented an approach to 3D face pose estimation based on robust real-time tracking of facial features, in which the facial region and major facial features are detected using Haar-like features and the AdaBoost learning algorithm. Finally, Jin-Mook Kim et al. [17] suggested a novel model of combined risk probability map generation to predict crime frequency; it analyzes risk in collective residential areas using urban spatial information and then generates two risk probability maps from that information based on terrain.
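Related to the facial-region detection mentioned for Junchul Chun et al. [6], the short sketch below runs the stock OpenCV Haar-cascade detector (Haar-like features combined with AdaBoost) on a single frame; the image path is a placeholder and this is the pre-trained OpenCV detector, not the authors' full pose-estimation pipeline.

```python
import cv2

# Pre-trained Haar cascade shipped with OpenCV (Haar-like features + AdaBoost).
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("frame.jpg")                  # placeholder frame path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns (x, y, w, h) boxes for detected facial regions.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("frame_faces.jpg", image)
```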