Exploring market competition over topics in spatio-temporal document collections

  • Regular Paper
The VLDB Journal

Abstract

With the prominence of location-based services and social networks in recent years, huge spatio-temporal document collections (e.g., geo-tagged tweets) have been generated. These collections often reflect users' opinions on different products and thus help business owners explore the hot topics of their brands and their competition with other brands in different spatial regions and time periods. In this work, we aim to mine the topics, and the market competition among brands over each topic, for a category of business (e.g., coffeehouses) from spatio-temporal documents within a user-specified region and time period. To support such spatio-temporal search online in an exploratory manner, we propose a novel framework equipped with (1) a generative model for mining topics and market competition, (2) an Octree-based offline pre-training method for the model, and (3) an efficient algorithm that combines pre-trained models to return the topics and the market competition on each topic within a user-specified region and time span. Extensive experiments show that our framework improves the runtime by up to an order of magnitude compared with baselines while achieving similar model quality in terms of training log-likelihood.
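To make the idea of answering a region-and-time query from pre-trained cells more concrete, the following is a minimal sketch of a generic octree range decomposition over (x, y, t) space. It is not the paper's algorithm, which is not shown in this preview. The names `Box`, `children`, `decompose`, and the `max_depth` parameter are hypothetical, and the sketch only finds which cells cover a query; how the models of those cells would be combined is left out.

```python
from itertools import product
from typing import List, Tuple

# A box is three closed intervals: (x_lo, x_hi), (y_lo, y_hi), (t_lo, t_hi).
Interval = Tuple[float, float]
Box = Tuple[Interval, Interval, Interval]


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely inside `outer` on every axis."""
    return all(o[0] <= i[0] and i[1] <= o[1] for o, i in zip(outer, inner))


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of positive volume."""
    return all(p[0] < q[1] and q[0] < p[1] for p, q in zip(a, b))


def children(cell: Box) -> List[Box]:
    """Split a cell at the midpoint of each axis, giving 8 sub-cells."""
    halves = []
    for lo, hi in cell:
        mid = (lo + hi) / 2
        halves.append(((lo, mid), (mid, hi)))
    return [tuple(c) for c in product(*halves)]


def decompose(cell: Box, query: Box, depth: int, max_depth: int) -> List[Box]:
    """Return octree cells that together cover the part of `query` inside `cell`.

    A cell fully inside the query is returned as-is. A cell at `max_depth`
    that only partially overlaps is also returned, so the cover may be
    slightly larger than the query (an assumption of this sketch).
    """
    if not overlaps(cell, query):
        return []
    if contains(query, cell) or depth == max_depth:
        return [cell]
    cover: List[Box] = []
    for child in children(cell):
        cover.extend(decompose(child, query, depth + 1, max_depth))
    return cover


root: Box = ((0.0, 8.0), (0.0, 8.0), (0.0, 8.0))
query: Box = ((0.0, 4.0), (0.0, 4.0), (0.0, 8.0))
cover = decompose(root, query, depth=0, max_depth=3)
```

In this usage example the query spans one spatial quadrant over the whole time range, so the cover consists of two depth-1 cells rather than many small leaves. Reusing a few large cells in this way is one plausible reason pre-training on a hierarchy can reduce query-time work.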

Notes

  1. https://about.twitter.com/company (accessed Feb 9, 2016).

  2. We collect the data using the Streaming API of Twitter, which continuously returns a 1% sample of tweets.

  3. Empirically, \(s=3\) is enough in our experiments.

  4. https://www.yelp.com/dataset/challenge.

  5. https://lucene.apache.org/.

  6. Training the models in different cells is independent across cells and thus can be fully parallelized with multi-threading.

  7. https://radimrehurek.com/gensim/models/coherencemodel.html.
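Note 6's point about independent per-cell training can be illustrated with a small sketch. The function `train_cell` below is a hypothetical stand-in that merely counts words; it is not the paper's generative model, and `train_all_cells` and the `workers` parameter are likewise assumptions made for illustration. The sketch uses threads to mirror the note's wording. In standard CPython, CPU-bound pure-Python training would likely need a process pool instead to see a speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def train_cell(docs: List[str]) -> Counter:
    """Hypothetical stand-in for fitting one cell's model: word frequencies."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def train_all_cells(cells: Dict[str, List[str]], workers: int = 4) -> Dict[str, Counter]:
    """Train every cell concurrently; no cell depends on another cell's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {cid: pool.submit(train_cell, docs) for cid, docs in cells.items()}
        # Collect results per cell id; result() blocks until that cell is done.
        return {cid: future.result() for cid, future in futures.items()}


cells = {
    "cell_a": ["Latte latte mocha", "mocha espresso"],
    "cell_b": ["espresso"],
}
models = train_all_cells(cells, workers=2)
```

Because each task reads only its own cell's documents and writes only its own result, no locking is needed, which is what makes this kind of workload straightforward to parallelize.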

References

  1. Ahmed, A., Aly, M., Gonzalez, J., Narayanamurthy, S., Smola, A.J.: Scalable inference in latent variable models. In: WSDM, pp. 123–132 (2012)

  2. AlSumait, L., Barbará, D., Domeniconi, C.: On-line LDA: adaptive topic models for mining text streams with applications to topic detection and tracking. In: ICDM, pp. 3–12 (2008)

  3. Angel, A., Koudas, N., Sarkas, N., Srivastava, D.: What’s on the grapevine? In: SIGMOD, pp. 1047–1050 (2009)

  4. Archak, N., Ghose, A., Ipeirotis, P.G.: Show me the money!: Deriving the pricing power of product features by mining consumer reviews. In: KDD, pp. 56–65 (2007)

  5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  6. Ding, B., Zhao, B., Lin, C.X., Han, J., Zhai, C., Srivastava, A.N., Oza, N.C.: Efficient keyword-based search for top-k cells in text cube. IEEE Trans. Knowl. Data Eng. 23(12), 1795–1810 (2011)

  7. Duchi, J., Shalev-Shwartz, S., Singer, Y., Chandra, T.: Efficient projections onto the \(\ell_1\)-ball for learning in high dimensions. In: ICML, pp. 272–279 (2008)

  8. Feng, W., Zhang, C., Zhang, W., Han, J., Wang, J., Aggarwal, C., Huang, J.: STREAMCUBE: hierarchical spatio-temporal hashtag clustering for event exploration over the twitter stream. In: ICDE, pp. 1561–1572 (2015)

  9. Glance, N., Hurst, M., Nigam, K., Siegler, M., Stockton, R., Tomokiyo, T.: Deriving marketing intelligence from online discussion. In: KDD, pp. 419–428 (2005)

  10. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, Burlington (2000)

  11. Hong, L., Ahmed, A., Gurumurthy, S., Smola, A.J., Tsioutsiouliklis, K.: Discovering geographical topics in the twitter stream. In: WWW, pp. 769–778 (2012)

  12. Hong, L., Convertino, G., Chi, E.H.: Language matters in twitter: a large scale study. In: Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain (2011)

  13. Li, G., Hu, J., Feng, J., Tan, K.: Effective location identification from microblogs. In: ICDE, pp. 880–891 (2014)

  14. Lin, C.X., Ding, B., Han, J., Zhu, F., Zhao, B.: Text cube: computing IR measures for multidimensional text database analysis. In: ICDM, pp. 905–910 (2008)

  15. Mathioudakis, M., Koudas, N.: Twittermonitor: trend detection over the twitter stream. In: SIGMOD, pp. 1155–1158 (2010)

  16. Mimno, D., Wallach, H.M., Talley, E., Leenders, M., McCallum, A.: Optimizing semantic coherence in topic models. In: EMNLP, pp. 262–272 (2011)

  17. Morstatter, F., Pfeffer, J., Liu, H.: When is it biased? Assessing the representativeness of twitter’s streaming api. In: WWW, WWW ’14 Companion, pp. 555–556 (2014)

  18. Sarkas, N., Angel, A., Koudas, N., Srivastava, D.: Efficient identification of coupled entities in document collections. In: ICDE, pp. 769–772 (2010)

  19. Simitsis, A., Baid, A., Sismanis, Y., Reinwald, B.: Multidimensional content exploration. PVLDB 1(1), 660–671 (2008)

  20. Sizov, S.: Geofolk: latent spatial semantics in web 2.0 social media. In: WSDM, pp. 281–290 (2010)

  21. Smola, A., Narayanamurthy, S.: An architecture for parallel topic models. PVLDB 3(1–2), 703–710 (2010)

  22. Strötgen, J., Gertz, M.: Timetrails: a system for exploring spatio-temporal information in documents. PVLDB 3(2), 1569–1572 (2010)

  23. Strötgen, J., Gertz, M., Popov, P.: Extraction and exploration of spatio-temporal information in documents. In: GIR (2010)

  24. Wang, Y., Bai, H., Stanton, M., Chen, W.Y., Chang, E.Y.: Plda: Parallel latent dirichlet allocation for large-scale applications. In: AAIM, pp. 301–314 (2009)

  25. Wu, S., Rand, W., Raschid, L.: Recommendations in social media for brand monitoring. In: RecSys, pp. 345–348 (2011)

  26. Xu, K., Liao, S.S., Li, J., Song, Y.: Mining comparative opinions from customer reviews for competitive intelligence. Decis. Support Syst. 50(4), 743–754 (2011)

  27. Yan, X., Guo, J., Lan, Y., Xu, J., Cheng, X.: A probabilistic model for bursty topic discovery in microblogs. In: AAAI, pp. 353–359 (2015)

  28. Yao, L., Mimno, D., McCallum, A.: Efficient methods for topic model inference on streaming document collections. In: KDD, pp. 937–946 (2009)

  29. Yin, H., Cui, B., Chen, L., Hu, Z., Huang, Z.: A temporal context-aware model for user behavior modeling in social media systems. In: SIGMOD, pp. 1543–1554 (2014)

  30. Yin, Z., Cao, L., Han, J., Zhai, C., Huang, T.: Geographical topic discovery and comparison. In: WWW, pp. 247–256 (2011)

  31. Yuan, Q., Cong, G., Ma, Z., Sun, A., Thalmann, N.M.: Who, where, when and what: discover spatio-temporal topics for twitter users. In: SIGKDD, pp. 605–613 (2013)

  32. Zhang, D., Zhai, C., Han, J.: Topic cube: topic modeling for OLAP on multidimensional text databases. In: SDM, pp. 1124–1135 (2009)

  33. Zhang, D., Zhai, C., Han, J.: Mitexcube: microtextcluster cube for online analysis of text cells. In: CIDU, pp. 204–218 (2011)

  34. Zhang, D., Zhai, C., Han, J., Srivastava, A.N., Oza, N.C.: Topic modeling for OLAP on multidimensional text databases: topic cube and its applications. Stat. Anal. Data Min. 2(5–6), 378–395 (2009)

  35. Zhang, H., Kim, G., Xing, E.P.: Dynamic topic modeling for monitoring market competition from online text and image data. In: KDD, pp. 1425–1434 (2015)

  36. Zhao, B., Lin, C.X., Ding, B., Han, J.: Texplorer: keyword-based object search and exploration in multidimensional text databases. In: CIKM, pp. 1709–1718 (2011)

  37. Zhao, K., Chen, L., Cong, G.: Topic exploration in spatio-temporal document collections. In: SIGMOD, pp. 985–998 (2016)

  38. Zhao, K., Cong, G., Yuan, Q., Zhu, K.Q.: SAR: a sentiment-aspect-region model for user preference analysis in geo-tagged reviews. In: ICDE, pp. 675–686 (2015)

  39. Zhu, C., Zhu, H., Xiong, H., Ding, P., Xie, F.: Recruitment market trend analysis with sequential latent variable models. In: KDD, pp. 383–392 (2016)

  40. Zhu, J., Xing, E.P.: Sparse topical coding. CoRR arXiv:1202.3778 (2012)

Acknowledgements

This work was supported in part by a MOE Tier-2 grant MOE2016-T2-1-137, a MOE Tier-1 grant RG31/17, and NSFC under the grant 61772537. It was also partially supported under the A*STAR TSRP fund 1424200021.

Author information

Corresponding author

Correspondence to Kaiqi Zhao.

About this article

Cite this article

Zhao, K., Cong, G., Chin, JY. et al. Exploring market competition over topics in spatio-temporal document collections. The VLDB Journal 28, 123–145 (2019). https://doi.org/10.1007/s00778-018-0522-9

  • DOI: https://doi.org/10.1007/s00778-018-0522-9
