Knowledge and Information Systems, Volume 43, Issue 2, pp 249–280

Finding top-\(k\) \(r\)-cliques for keyword search from graphs in polynomial delay

Regular Paper


Keyword search over structured data offers an alternative way to explore and query databases for users who are not familiar with the structure of the data and/or a query language. Structured data are usually modeled as graphs. In this context, an answer is a substructure of the graph that contains all or some of the query keywords. Most previous works return as the answer a minimal tree that covers all the query keywords. Some recent works propose finding subgraphs rather than minimal trees and show that subgraphs might be more informative for users. However, current methods suffer from two problems. First, although some of the content nodes (i.e., nodes that contain input keywords) in an answer are close to each other, others might be far apart. Second, while searching for the best answer, current methods explore the whole graph rather than only the content nodes, which can increase the run time and lead to poor performance. To address these problems, we propose to find top-\(k\) \(r\)-cliques as the answers to the graph keyword search problem. An \(r\)-clique is a set of content nodes that covers all the input keywords such that the distance between each pair of nodes is less than or equal to \(r\). We propose a new weight function: the sum of the distances between each pair of content nodes. We prove that minimizing this weight function is NP-hard and propose an approximation algorithm that produces \(r\)-cliques with a 2-approximation ratio in polynomial delay. We further improve the run time of the approximation algorithm at the cost of a higher approximation ratio. Extensive performance studies using three large real datasets confirm the efficiency and accuracy of finding \(r\)-cliques in graphs.
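
The two definitions in the abstract, the \(r\)-clique condition and the pairwise-distance weight, can be made concrete with a short Python sketch. It assumes an undirected graph with non-negative edge weights, takes "distance" to mean shortest-path distance (computed here with Dijkstra's algorithm), and uses a hypothetical mapping `keyword_nodes` from each keyword to its content nodes; all function names are illustrative. The search below is exhaustive, so it is exponential in the number of keywords, which is consistent with the NP-hardness result. It is not the authors' approximation algorithm.

```python
import heapq
from itertools import combinations, product


def undirected(edges):
    """Build an adjacency dict {node: [(neighbor, weight), ...]} from (u, v, w) edges."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def shortest_distances(graph, source):
    """Dijkstra from `source`; assumes non-negative edge weights."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def all_pairs(graph):
    """Shortest-path distances from every node to every reachable node."""
    return {u: shortest_distances(graph, u) for u in graph}


def distance(dist, u, v):
    # Unreachable pairs are treated as infinitely far apart.
    return dist[u].get(v, float("inf"))


def is_r_clique(nodes, dist, r):
    """True if every pair of nodes is within distance r."""
    return all(distance(dist, u, v) <= r for u, v in combinations(nodes, 2))


def weight(nodes, dist):
    """Sum of shortest-path distances over all unordered pairs of nodes."""
    return sum(distance(dist, u, v) for u, v in combinations(nodes, 2))


def best_r_clique(keyword_nodes, dist, r):
    """Exhaustive search: pick one content node per keyword, keep the lightest r-clique.

    Returns (weight, nodes) or None if no r-clique exists. Only for tiny inputs.
    """
    best = None
    for choice in product(*keyword_nodes.values()):
        nodes = tuple(sorted(set(choice)))  # one node may cover several keywords
        if not is_r_clique(nodes, dist, r):
            continue
        w = weight(nodes, dist)
        if best is None or w < best[0]:
            best = (w, nodes)
    return best


# Toy example: a-b (1), b-c (1), c-d (2), a-e (3); three keywords x, y, z.
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 2), ("a", "e", 3)]
dist = all_pairs(undirected(edges))
keyword_nodes = {"x": ["a", "d"], "y": ["b", "e"], "z": ["c"]}
print(best_r_clique(keyword_nodes, dist, r=2))  # → (4, ('a', 'b', 'c'))
print(best_r_clique(keyword_nodes, dist, r=1))  # → None
```

Precomputing all-pairs distances keeps the \(r\)-clique test and the weight function to simple lookups; on a large graph one could instead stop each Dijkstra search once distances exceed \(r\), since farther nodes can never appear together in an \(r\)-clique.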


Keywords: Keyword search; Graph data; Polynomial delay; Approximation algorithm



Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  1. Department of Computer Science and Engineering, York University, Toronto, Canada