The VLDB Journal, Volume 25, Issue 6, pp 791–816

Diverse and proportional size-l object summaries using pairwise relevance

Regular Paper


The abundance and ubiquity of graphs (e.g., online social networks such as Google+ and Facebook; bibliographic graphs such as DBLP) necessitate effective and efficient search over them. Given a set of keywords that can identify a data subject (DS), a recently proposed keyword search paradigm produces a set of object summaries (OSs) as results. An OS is a tree structure rooted at the DS node (i.e., a node containing the keywords) with surrounding nodes that summarize all data held on the graph about the DS. OS snippets, denoted as size-l OSs, have also been investigated. A size-l OS is a partial OS containing l nodes such that the sum of their importance scores is the maximum possible. However, the set of nodes that maximizes the total importance score may result in an uninformative size-l OS, as very important nodes may be repeated in it, dominating other representative information. In view of this limitation, in this paper we investigate the effective and efficient generation of two novel types of OS snippets, i.e., diverse and proportional size-l OSs, denoted as DSize-l and PSize-l OSs. Namely, besides the importance of each node, we also consider its pairwise relevance (similarity) to the other nodes in the OS and the snippet. We conduct an extensive evaluation on two real graphs (DBLP and Google+). We verify effectiveness by collecting user feedback, e.g., by asking DBLP authors (i.e., the DSs themselves) to evaluate our results. In addition, we verify the efficiency of our algorithms and evaluate the quality of the snippets that they produce.
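To make the contrast between importance-only and diversity-aware snippets concrete, the following is a minimal sketch on toy data. It is not the paper's DSize-l or PSize-l algorithm: the function `greedy_snippet`, the `penalty` parameter, the node names, and all scores are hypothetical, and the greedy rule (importance minus the maximum similarity to already selected nodes) is a simple MMR-style heuristic swapped in for illustration. The only constraint taken from the abstract is that a snippet is a partial OS of l nodes rooted at the DS node, so a node is eligible only once its parent has been selected.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical toy OS: a DS node "author" with four child nodes (e.g., papers).
parent: Dict[str, Optional[str]] = {
    "author": None, "p1": "author", "p2": "author", "p3": "author", "p4": "author",
}
# Hypothetical importance scores (assumed given, e.g., by some ranking function).
importance: Dict[str, float] = {
    "author": 1.0, "p1": 0.9, "p2": 0.85, "p3": 0.8, "p4": 0.5,
}
# Hypothetical pairwise similarity; unlisted pairs default to 0.0.
similarity: Dict[FrozenSet[str], float] = {
    frozenset(("p1", "p2")): 0.9,
    frozenset(("p1", "p3")): 0.9,
    frozenset(("p2", "p3")): 0.9,
    frozenset(("p1", "p4")): 0.1,
    frozenset(("p2", "p4")): 0.1,
    frozenset(("p3", "p4")): 0.1,
}


def greedy_snippet(parent, importance, similarity, l, penalty=0.0):
    """Greedily grow a connected l-node subtree from the root (heuristic sketch).

    penalty == 0.0 ranks candidates by importance alone; penalty > 0.0
    subtracts penalty * (max similarity to any already selected node).
    """
    root = next(n for n, p in parent.items() if p is None)
    selected: List[str] = [root]
    while len(selected) < l:
        # Only nodes whose parent is already selected keep the result a tree.
        candidates = [n for n, p in parent.items()
                      if n not in selected and p in selected]
        if not candidates:
            break

        def gain(n: str) -> float:
            redundancy = max(similarity.get(frozenset((n, s)), 0.0)
                             for s in selected)
            return importance[n] - penalty * redundancy

        selected.append(max(candidates, key=gain))
    return selected


importance_only = greedy_snippet(parent, importance, similarity, l=3)
diverse = greedy_snippet(parent, importance, similarity, l=3, penalty=1.0)
print(importance_only)  # ['author', 'p1', 'p2']
print(diverse)          # ['author', 'p1', 'p4']
```

With these made-up scores, the importance-only run picks two near-duplicate nodes (p1 and p2), while the penalized run swaps the second for the less important but dissimilar p4, which is the kind of effect the abstract describes as the motivation for diverse snippets.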


Keyword search, Diversity, Proportionality, Snippets, Summaries



Georgios Fakas was supported by GRF Grant 617412 from Hong Kong RGC. Zhi Cai was supported by Research Foundation of Beijing Municipal Education Commission Grant KM201610005022 and Natural Science Foundation of China Grant 91546111.

Supplementary material

Supplementary material 1: 778_2016_433_MOESM1_ESM.pdf (PDF, 1151 KB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
  2. College of Computer Science, Beijing University of Technology, Beijing, China
  3. Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
