Data Mining and Knowledge Discovery, Volume 31, Issue 5, pp 1189–1217

Micro-review synthesis for multi-entity summarization

  • Thanh-Son Nguyen
  • Hady W. Lauw
  • Panayiotis Tsaparas
Article
Part of the following topical collections:
  1. Journal Track of ECML PKDD 2017

Abstract

Location-based social networks (LBSNs), exemplified by Foursquare, are fast gaining popularity. One important feature of LBSNs is the micro-review. Upon check-in at a particular venue, a user may leave a short review (up to 200 characters long), also known as a tip. These tips are an important source of information for others to learn about various aspects of an entity (e.g., a restaurant), such as food, waiting time, or service. However, a user is often interested not in one particular entity, but rather in several entities collectively, for instance within a neighborhood or a category. In this paper, we address the problem of summarizing the tips of multiple entities in a collection by synthesizing new micro-reviews that pertain to the collection, rather than to the individual entities per se. We formulate this problem in terms of first finding a representation of the collection, by identifying a number of “aspects” that link common threads across two or more entities within the collection. We express these aspects as dense subgraphs in a graph of sentences derived from the multi-entity corpora. This leads to a formulation of maximal multi-entity quasi-cliques, as well as a heuristic algorithm to find K such quasi-cliques maximizing the coverage over the multi-entity corpora. To synthesize a summary tip for each aspect, we select a small number of sentences from the corresponding quasi-clique, balancing conciseness and representativeness in terms of a facility location problem. Our approach performs well on collections of Foursquare entities based on localities and categories, producing more representative and diverse summaries than the baselines.
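
The two building blocks named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the formal definitions, so the code assumes a density-based quasi-clique (at least a fraction gamma of all possible sentence pairs are linked), assumes that "multi-entity" means the sentences come from at least two entities, and swaps in a simple greedy heuristic for the facility location step. All function and parameter names are hypothetical.

```python
from itertools import combinations


def is_multi_entity_quasi_clique(nodes, edges, entity_of, gamma=0.6, min_entities=2):
    """Assumed definition: edge density >= gamma and sentences drawn
    from at least min_entities distinct entities."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return False
    possible = len(nodes) * (len(nodes) - 1) // 2
    # edges is a set of frozenset({u, v}) pairs (undirected sentence graph)
    present = sum(1 for u, v in combinations(nodes, 2) if frozenset((u, v)) in edges)
    entities = {entity_of[n] for n in nodes}
    return present / possible >= gamma and len(entities) >= min_entities


def greedy_facility_location(candidates, distance, open_cost):
    """Greedy heuristic (an assumption, not the paper's method).
    Total cost = open_cost * (number of selected sentences)
               + sum of each sentence's distance to its nearest selected sentence.
    Keeps adding the sentence that lowers the total cost most, then stops."""
    def total_cost(selected):
        service = sum(min(distance(c, s) for s in selected) for c in candidates)
        return open_cost * len(selected) + service

    selected = []
    best = float("inf")
    while True:
        options = [(total_cost(selected + [c]), c) for c in candidates if c not in selected]
        if not options:
            break
        cost, choice = min(options, key=lambda t: t[0])
        if cost >= best:
            break  # opening another "facility" no longer pays off
        selected.append(choice)
        best = cost
    return selected
```

In this sketch, open_cost stands in for conciseness (each extra sentence in the summary is penalized) and the distance term stands in for representativeness (every sentence in the quasi-clique should be close to some selected sentence).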

Keywords

Micro-review synthesis, Multi-entity summarization, Maximal quasi-clique

Copyright information

© The Author(s) 2017

Authors and Affiliations

  • Thanh-Son Nguyen (1)
  • Hady W. Lauw (1)
  • Panayiotis Tsaparas (2)
  1. Singapore Management University, Singapore, Singapore
  2. University of Ioannina, Ioannina, Greece
