Top-k temporal keyword search over social media data


Social media services have become major sources for monitoring emerging topics and sensing real-life events. A social media platform manages a social stream consisting of a huge volume of timestamped user-generated data, including original data and repost data. However, previous research on keyword search over social media data mainly emphasizes the recency of information. In this paper, we first propose the problem of the top-k most significant temporal keyword query to enable more complex query analysis. Such a query returns the top-k most popular social items that contain the query keywords within the given query time window. Then, we design a temporal inverted index with a two-tier posting list to index social time series, and a segment store to compute the exact social significance of social items. Next, we implement a basic query algorithm based on the proposed index structure and give a detailed performance analysis of this algorithm. Based on the analysis, we further refine the query algorithm with a piecewise maximum approximation (PMA) sketch. Finally, extensive empirical studies on a real-life microblog dataset demonstrate that the combination of the two-tier posting list and the PMA sketch achieves remarkable performance improvement under different query settings.
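To make the query semantics concrete, the following Python sketch answers a top-k temporal keyword query by brute force. It is only an illustrative baseline, not the two-tier posting list, segment store, or PMA sketch proposed in the paper. The class name `TemporalKeywordIndex`, its methods, and the definition of significance as the number of activity timestamps (for example, reposts) that fall inside the query window are all assumptions made for this example.

```python
import heapq
from bisect import bisect_left, bisect_right
from collections import defaultdict


class TemporalKeywordIndex:
    """Hypothetical brute-force baseline for top-k temporal keyword queries."""

    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> ids of items containing it
        self.activity = {}                # item id -> sorted activity timestamps

    def add_item(self, item_id, text, timestamps):
        # Index every distinct lowercase token of the item's text.
        for word in set(text.lower().split()):
            self.postings[word].add(item_id)
        self.activity[item_id] = sorted(timestamps)

    def significance(self, item_id, start, end):
        # Assumed measure: number of timestamps in the closed window [start, end].
        ts = self.activity[item_id]
        return bisect_right(ts, end) - bisect_left(ts, start)

    def top_k(self, keywords, start, end, k):
        # Candidates must contain all query keywords (conjunctive semantics,
        # also an assumption of this sketch).
        sets = [self.postings.get(w.lower(), set()) for w in keywords]
        if not sets:
            return []
        candidates = set.intersection(*sets)
        scored = ((self.significance(i, start, end), i) for i in candidates)
        # Drop items with no activity in the window, then keep the k largest.
        return heapq.nlargest(k, (pair for pair in scored if pair[0] > 0))


index = TemporalKeywordIndex()
index.add_item(1, "earthquake in tokyo", [1, 2, 3, 10])
index.add_item(2, "tokyo earthquake update", [5, 6, 7, 8, 9])
index.add_item(3, "tokyo food", [5, 6])
result = index.top_k(["tokyo", "earthquake"], start=1, end=10, k=2)
```

Each query here scans every candidate item and counts its timestamps in the window, so the cost grows with the number of items matching the keywords. This is the kind of exhaustive evaluation that an index with upper-bound estimates is meant to avoid.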

This work is partially supported by the National High-tech R&D Program (863 Program) under grant number 2015AA015307, and by the National Science Foundation of China under grant numbers 61432006 and 61672232.

Xia, F., Yu, C., Xu, L. et al. Top-k temporal keyword search over social media data. World Wide Web 20, 1049–1069 (2017).

  • Social media
  • Temporal keyword query
  • Top-k query