Knowledge and Information Systems, Volume 50, Issue 3, pp 883–916

Top-k coupled keyword recommendation for relational keyword queries

  • Xiangfu Meng (corresponding author)
  • Longbing Cao
  • Xiaoyan Zhang
  • Jingyu Shao
Regular Paper

Abstract

Users who cannot formulate appropriate queries to express imprecise query intentions would benefit from being offered the top-k typical relevant keyword queries. By extracting the semantic relationships between keywords and between keyword queries, this paper proposes a new keyword query suggestion approach that provides typical queries semantically related to a given query. First, a keyword coupling relationship measure is proposed that considers both the intra- and inter-couplings between each pair of keywords. The semantic similarity between keyword queries is then measured using a semantic matrix that preserves the coupling relationships between the keywords in the queries. Based on these query semantic similarities, an approximation algorithm that uses probability density estimation finds the most typical queries in the query history. Lastly, a threshold-based top-k query selection method is proposed to evaluate the top-k typical relevant queries efficiently. We demonstrate that the keyword coupling relationship and query semantic similarity measures accurately capture the coupling relationships between keywords and the semantic similarities between keyword queries. The efficiency of the query typicality analysis and the top-k query selection algorithm is also demonstrated.
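
The typicality step can be illustrated with a minimal sketch, assuming that pairwise query similarities in [0, 1] are already available. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names `typicality` and `top_k_typical`, the Gaussian kernel, the bandwidth value, the conversion of similarity to distance as `1 - similarity`, and the toy similarity matrix. The sketch computes exact kernel density scores in quadratic time; it is not the paper's approximation algorithm, and it ranks by a plain sort and does not implement the threshold-based selection method.

```python
import math


def typicality(sim, bandwidth=0.3):
    """Score each query by a Gaussian kernel density estimate.

    sim is a square matrix of pairwise query similarities in [0, 1].
    Distance is taken as 1 - similarity (an assumption of this sketch).
    """
    n = len(sim)
    norm = n * bandwidth * math.sqrt(2.0 * math.pi)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            d = 1.0 - sim[i][j]
            total += math.exp(-(d * d) / (2.0 * bandwidth ** 2))
        scores.append(total / norm)
    return scores


def top_k_typical(sim, k, bandwidth=0.3):
    """Return the indices of the k queries with the highest typicality."""
    scores = typicality(sim, bandwidth)
    order = sorted(range(len(sim)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical query history: queries 0, 1 and 2 are mutually similar,
# query 3 is an outlier.
sim = [
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.20],
    [0.80, 0.85, 1.00, 0.10],
    [0.10, 0.20, 0.10, 1.00],
]

ranking = top_k_typical(sim, k=len(sim))
print(ranking)
```

In this toy history the query that lies closest to the rest of the dense group ranks first and the outlier ranks last, which matches the intuition that a typical query is one surrounded by many similar queries.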

Keywords

Web database, Keyword query, Coupling relationship, Typicality estimation, Top-k selection

Notes

Acknowledgments

This work is supported by the National Science Foundation for Young Scientists of China (No. 61003162) and the Young Scholars Growth Plan of Liaoning (No. LJQ2013038).

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Xiangfu Meng (1, corresponding author)
  • Longbing Cao (2)
  • Xiaoyan Zhang (1)
  • Jingyu Shao (2)

  1. College of Electronic and Information Engineering, Liaoning Technical University, Huludao, China
  2. Advanced Analytics Institute, University of Technology Sydney, Sydney, Australia
