Big data is an emerging paradigm applied to massive datasets whose sizes exceed the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data is essential for understanding the world. It is captured from many sources, including social media, sensor networks, scientific applications, surveillance, Internet texts and documents, business intelligence, and web logs, and it exhibits several typical characteristics: large size, heterogeneous structure, and complex processing requirements.

The big data world involves many important techniques, including novel programming models, new system architectures, novel data storage schemes, and novel data partitioning schemes. Since P2P technologies have shown good scalability and low processing cost in recent years for media streaming and file-sharing services, they can be applied to big data management to solve several key problems: how to organize big data for indexing, searching, and processing in a well-distributed way, and how to assign data processing jobs in a distributed environment, as sketched below.
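
To make the distributed organization idea concrete, the following minimal Python sketch illustrates consistent hashing, the key-to-node assignment scheme underlying DHTs such as Chord; the class and node names are illustrative and not taken from any paper in this issue.

```python
import hashlib
from bisect import bisect_right

def ring_hash(key: str) -> int:
    # Map a key or node name to a position on a 2**32 identifier ring.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (1 << 32)

class ConsistentHashRing:
    """Each data key is stored on the first node whose ring position
    follows the key's position clockwise."""

    def __init__(self, nodes):
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key: str) -> str:
        idx = bisect_right(self.positions, ring_hash(key))
        return self.ring[idx % len(self.ring)][1]

ring = ConsistentHashRing(["node-A", "node-B", "node-C"])
print(ring.lookup("user:42"))  # each key deterministically maps to one node
```

Because only neighboring nodes are affected when a node joins or leaves, this style of partitioning scales well, which is precisely the P2P property the papers in this issue exploit.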

The purpose of this special issue is to provide a comprehensive view of recent advances in big data management. Our call for papers explicitly asked for contributions describing either experiences in building real-world data-related systems based on peer-to-peer technologies, or innovative methodologies for assessing such applications with respect to non-functional requirements such as performance, scalability, fault tolerance, availability, and security. The call attracted 13 submissions, each of which was carefully reviewed by at least three reviewers. At the end of the reviewing process, we selected the six papers presented in this special issue.

“Probabilistic Nearest Neighbor Queries of Uncertain Data via Wireless Data Broadcast” by Fangzhou Zhu et al. focuses on the performance of location-dependent query processing for smart mobile devices. The key idea leverages the properties of the Voronoi Diagram for Uncertain Data (UV-Diagram). The study proposes effective algorithms, illustrated with examples, to improve the performance of processing “big location-based data”.
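
For intuition, the toy sketch below shows the certain-data analogue: an ordinary Voronoi partition makes the nearest neighbor piecewise constant, so answers can be precomputed per cell and indexed for broadcast. The UV-Diagram generalizes the cells to uncertain objects; this code is an assumption-laden illustration, not the paper's algorithm.

```python
import math

# Sites with certain (known) positions; the UV-Diagram handles the
# harder case where each object's position is uncertain.
sites = {"a": (0.0, 0.0), "b": (4.0, 0.0), "c": (2.0, 3.0)}

def nearest(q):
    # The answer is constant within each Voronoi cell, so it can be
    # precomputed per cell and indexed for wireless broadcast.
    return min(sites, key=lambda s: math.dist(q, sites[s]))

print(nearest((0.5, 0.2)))  # -> "a"
print(nearest((3.8, 0.1)))  # -> "b"
```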

In “DS2: a DHT-based Substrate for Distributed Services”, Lichun Li, Xin Xu, Jun Wang and Wei Wang focus on massive service deployment in IMS (IP Multimedia Subsystem) systems and present DS2, a DHT-based substrate designed for application server farms providing distributed services. DS2 facilitates the deployment of DHT-based distributed services through a powerful data model for managing complex data, and it supports application message routing and workload migration. The authors have implemented a DS2 prototype and used it successfully in ZTE’s system to provide IMS services and service routing.
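
As a rough illustration of what such a substrate offers, the hypothetical sketch below routes each application message to the server that owns the hash of its session key, so services need not track placement themselves; the names and the simple modulo placement are assumptions, not DS2's actual API.

```python
import hashlib

SERVERS = ["as1.example.com", "as2.example.com", "as3.example.com"]

def owner(session_key: str) -> str:
    # Hash the session key and pick the responsible application server;
    # a real DHT substrate would also handle servers joining and leaving.
    h = int(hashlib.sha1(session_key.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

def route(session_key: str, message: str) -> None:
    print(f"deliver {message!r} to {owner(session_key)}")

route("sip:alice@example.com", "INVITE")
```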

“An SSD-Based Accelerator for Directory Parsing in Storage Systems Containing Massive Files” by Zhiguang Chen, Nong Xiao and Fang Liu focuses on big data storage based on SSDs (Solid State Drives) and presents an accelerator that helps file systems fetch file metadata rapidly. Experimental results demonstrate that the accelerator can speed up directory parsing by nearly four times compared with a file system without it.

“Handling Partitioning Skew in MapReduce using LEEN” by Shadi Ibrahim et al. addresses skew in big data processing and presents LEEN, a novel algorithm for locality-aware and fairness-aware key partitioning in MapReduce. The authors have integrated LEEN into Hadoop, and their experiments demonstrate that LEEN efficiently achieves higher locality and reduces the amount of shuffled data. More importantly, LEEN guarantees a fair distribution of the reduce inputs.
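
To illustrate the trade-off involved, here is a hypothetical greedy sketch in the spirit of LEEN that assigns each intermediate key to the node balancing data locality against even reducer load; the scoring function and weights are stand-ins, not the paper's actual formulation.

```python
from collections import defaultdict

def partition(key_freq, nodes, alpha=0.5):
    """key_freq: {key: {node: local record count}}; returns a
    key -> reducer-node assignment trading locality for fairness."""
    load = defaultdict(int)   # reduce input already assigned per node
    assignment = {}
    # Place heavy keys first so lighter keys can rebalance around them.
    for key, freqs in sorted(key_freq.items(),
                             key=lambda kv: -sum(kv[1].values())):
        total = sum(freqs.values())

        def score(node):
            locality = freqs.get(node, 0) / total            # records already local
            balance = load[node] / (sum(load.values()) + 1)  # share of load so far
            return alpha * locality - (1 - alpha) * balance

        best = max(nodes, key=score)
        assignment[key] = best
        load[best] += total
    return assignment

key_freq = {"k1": {"n1": 80, "n2": 20}, "k2": {"n2": 60}, "k3": {"n1": 50}}
print(partition(key_freq, ["n1", "n2"]))  # {'k1': 'n1', 'k2': 'n2', 'k3': 'n1'}
```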

Yan Zhang et al. in “A Novel Cooperative Caching Algorithm for Massive P2P Caches” focus on massive cache data management among ISPs (Internet Service Providers). They first model the cooperative caching problem and show that it is NP-complete; they then propose a cooperative caching algorithm named cLGV (Cooperative, Lowest Global Value). cLGV uses a new concept, the global value, to estimate the benefit of caching or replacing an object in the cooperative caching system.
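
As a rough illustration of the replacement idea, the sketch below evicts the cached object with the lowest global value, using a hypothetical value function (popularity discounted by replica count) as a stand-in for the paper's definition.

```python
def global_value(obj, popularity, replicas):
    # Hypothetical stand-in: an object's system-wide caching benefit is
    # its popularity discounted by how many cooperating caches hold it.
    return popularity[obj] / replicas.get(obj, 1)

def insert(cache, capacity, new_obj, popularity, replicas):
    # On a full cache, evict the object with the lowest global value.
    if len(cache) >= capacity:
        victim = min(cache, key=lambda o: global_value(o, popularity, replicas))
        cache.discard(victim)
    cache.add(new_obj)

cache = {"a", "b"}
insert(cache, 2, "c",
       popularity={"a": 10, "b": 3, "c": 7},
       replicas={"a": 1, "b": 1, "c": 2})
print(cache)  # "b" had the lowest global value and was evicted
```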

“Tag-based Personalized Image Ranking in Event Browsing” by Yeqi Lu, Yao Shen and Minyi Guo focuses on the problem of ranking large image collections and presents a new tag-based personalized image ranking algorithm for event browsing. The approach first adopts a local matching model that assigns each image an initial score based on whether it satisfies the user’s query and preferences; it then applies a global ranking model that takes the local scores as initial values and iteratively smooths the salience scores over all images returned from the events of the query.
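
For intuition about the two-stage design, the following generic manifold-ranking-style sketch shows how local scores can seed an iterative smoothing step over an image-similarity graph; the update rule and parameters are illustrative assumptions, not the paper's exact models.

```python
def smooth(local_scores, sim, alpha=0.5, iters=20):
    """Iteratively blend each image's initial local score with the
    similarity-weighted scores of the other returned images."""
    n = len(local_scores)
    scores = list(local_scores)
    for _ in range(iters):
        for i in range(n):
            neigh = sum(sim[i][j] * scores[j] for j in range(n) if j != i)
            norm = sum(sim[i][j] for j in range(n) if j != i) or 1.0
            scores[i] = (1 - alpha) * local_scores[i] + alpha * neigh / norm
    return scores

sim = [[0.0, 0.9, 0.1],
       [0.9, 0.0, 0.2],
       [0.1, 0.2, 0.0]]
print(smooth([1.0, 0.0, 0.5], sim))  # similar images pull scores together
```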