HyKSS: Hybrid Keyword and Semantic Search

  • Original Article
Journal on Data Semantics

Abstract

Keyword search suffers from a number of issues: ambiguity, synonymy, and an inability to handle semantic constraints. Semantic search helps resolve these issues but is limited by the quality of annotations which are likely to be incomplete or imprecise. Hybrid search, a search technique that combines the merits of both keyword and semantic search, appears to be a promising solution. In this paper we describe and evaluate HyKSS, a hybrid search system driven by extraction ontologies for both annotation creation and query interpretation. For displaying results, HyKSS uses a dynamic ranking algorithm. We show that over data sets of short topical documents, the HyKSS ranking algorithm outperforms both keyword and semantic search in isolation, as well as a number of other non-HyKSS hybrid approaches to ranking.

Notes

  1. In [2], we used HyKSS as our query system to explore cross-language query and search. Here, we focus on HyKSS itself, describing all its features and providing an in-depth statistical evaluation of its performance relative to alternative systems. Except for the necessary background about extraction ontologies and some parts of the running example and its figures, which serve to illustrate the ideas in HyKSS, this paper is new, describing HyKSS and its features in depth and providing a statistical evaluation substantiating our claimed contributions.

  2. http://www.craigslist.org.

  3. http://www.wikipedia.org.

  4. http://lucene.apache.org.

  5. http://deg.byu.edu (in the OntologyWorkbench tool).

  6. See the HyKSS demo at http://deg.byu.edu/demos.

  7. To see a sample of a generated form, enter some query like the running example query to provide context and then click on Advanced Search in the HyKSS demo (http://deg.byu.edu/demos).

  8. In general, logic form identification in natural language is difficult (e.g., see [8]). In our query here, for example, note that “and” denotes disjunction rather than its more common conjunctive denotation.

  9. http://www.openrdf.org.

  10. See http://lucene.apache.org/core/3_6_1/scoring.html for details.

  11. http://www.deg.byu.edu/demos/askontos.

  12. http://provo.craigslist.org.

  13. http://saltlakecity.craigslist.org.

  14. The document sets may have contained duplicate or similar advertisements due to cross posting and re-posting tendencies of craigslist users. We did not check for this, but rather just took the ads as posted.

  15. http://www.wikipedia.org.

  16. http://en.wikipedia.org/wiki/List_of_mountains.

  17. http://en.wikipedia.org/wiki/List_of_roller_coaster_rankings.

  18. We established the keyword and semantic weights as those that maximized mean average precision over the 50 validation documents. To find the keyword weight \(k\), we executed the 100 queries in the training query set using 101 weight combinations (0.0–1.0) and chose the best. The semantic weight \(s\) became \(1 - k\).

  19. See [13] for a full discussion of the principles of ranking in information retrieval.

  20. We began our analysis using a split plot design, but when we examined the interactions among our factors and found that the interactions were not major sources of response variability, we simplified to a block statistical model.

  21. In our experiment, the keyword results in the representative case turned out to be literally constant, but in general they need not be, and indeed the Keyword—Pre-processing approach (\(K_p\)) yielded results that were not quite constant.

  22. http://www.bing.com.

  23. January, 2012.
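
The weight selection described in note 18 can be sketched as a small grid search. This is only an illustration under stated assumptions, not the HyKSS implementation: the note does not say how the keyword and semantic scores are combined, so the sketch assumes a weighted sum \(k \cdot \text{keyword} + (1 - k) \cdot \text{semantic}\), and the function names (`average_precision`, `hybrid_rank`, `tune_keyword_weight`) and the per-query data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# One query = (keyword scores, semantic scores, relevant doc ids).
# This layout is an assumption made for the sketch.
Query = Tuple[Dict[str, float], Dict[str, float], Set[str]]


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant docs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant hit
    return total / len(relevant)


def hybrid_rank(keyword: Dict[str, float],
                semantic: Dict[str, float],
                k: float) -> List[str]:
    """Rank docs by an ASSUMED weighted sum with semantic weight s = 1 - k."""
    s = 1.0 - k
    docs = set(keyword) | set(semantic)
    score = {d: k * keyword.get(d, 0.0) + s * semantic.get(d, 0.0)
             for d in docs}
    # Highest score first; ties broken by doc id so results are deterministic.
    return sorted(docs, key=lambda d: (-score[d], d))


def tune_keyword_weight(queries: Sequence[Query],
                        steps: int = 100) -> Tuple[float, float]:
    """Try k = 0.00, 0.01, ..., 1.00 (101 values) and keep the best MAP."""
    best_k, best_map = 0.0, -1.0
    for i in range(steps + 1):
        k = i / steps
        mean_ap = sum(
            average_precision(hybrid_rank(kw, sem, k), rel)
            for kw, sem, rel in queries
        ) / len(queries)
        if mean_ap > best_map:  # strict '>' keeps the smallest k on ties
            best_k, best_map = k, mean_ap
    return best_k, best_map


if __name__ == "__main__":
    toy_queries: List[Query] = [
        ({"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}, {"b"}),
    ]
    k, mean_ap = tune_keyword_weight(toy_queries)
    print(f"k={k:.2f}, s={1 - k:.2f}, MAP={mean_ap:.3f}")
```

Fixing \(s = 1 - k\) reduces the search to a single free parameter, so 101 evenly spaced values of \(k\) cover the whole range at a resolution of 0.01.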

References

  1. Bhagdev R, Chapman S, Ciravegna F, Lanfranchi V, Petrelli D (2008) Hybrid search: effectively combining keywords and ontology-based searches. In: Proceedings of the 5th European semantic web conference (ESWC’08), Tenerife, Canary Islands, Spain, June 2008, pp 554–568

  2. Embley DW, Liddle SW, Lonsdale DW, Park JS, Shin B-J, Zitzelberger A (2012) Cross-language hybrid keyword and semantic search. In: Proceedings of the 31st international conference on conceptual modeling (ER 2012), Florence, Italy, October 2012, pp 190–203

  3. Embley DW, Zitzelberger A (2010) Theoretical foundations for enabling a web of knowledge. In: Proceedings of the sixth international symposium on foundations of information and knowledge systems (FoIKS’10), Sofia, Bulgaria, February 2010, pp 211–229

  4. Stuckenschmidt H (2012) Data semantics on the web. J Data Semant 1(1):1–9

  5. Embley DW, Campbell DM, Jiang YS, Liddle SW, Lonsdale DW, Ng Y-K, Smith RD (1999) Conceptual-model-based data extraction from multiple-record web pages. Data Knowl Eng 31(3):227–251

  6. Buitelaar P, Cimiano P, Haase P, Sintek M (2009) Towards linguistically grounded ontologies. In: Proceedings of the 6th European semantic web conference (ESWC’09), Heraklion, Greece, May/June 2009, pp 111–125

  7. Embley DW (1980) Programming with data frames for everyday data items. In: Proceedings of the 1980 national computer conference, Anaheim, California, May 1980, pp 301–305

  8. Rus V (2004) A first evaluation of logic form identification systems. In: Mihalcea R, Edmonds P (eds) Senseval-3: third international workshop on the evaluation of systems for the semantic analysis of text, Barcelona, Spain, March 2004, pp 37–40

  9. Tao C, Embley DW, Liddle SW (2009) FOCIH: form-based ontology creation and information harvesting. In: Proceedings of the 28th international conference on conceptual modeling (ER2009), Gramado, Brazil, November 2009, pp 346–359

  10. Lopez V, Nikolov A, Fernández M, Sabou M, Uren VS, Motta E (2009) Merging and ranking answers in the semantic web: the wisdom of crowds. In: Gómez-Pérez A, Yu Y, Ding Y (eds) ASWC, volume 5926 of lecture notes in computer science. Springer, pp 135–152

  11. Fernandez M, Lopez V, Sabou M, Uren V, Vallet D, Motta E, Castells P (2008) Semantic search meets the web. In: Proceedings of the second IEEE international conference on semantic computing (ICSC’08), Santa Clara, California, pp 253–260

  12. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  13. Liu T-Y (2011) Learning to rank for information retrieval, 1st edn. Springer, Berlin

  14. Rencher AC, Christensen WF (2012) Methods of multivariate analysis, 3rd edn. Wiley, Hoboken, NJ

  15. Castells P, Fernandez M, Vallet D (2007) An adaptation of the vector-space model for ontology-based information retrieval. IEEE Trans Knowl Data Eng 19(2):261–272

  16. Fazzinga B, Gianforme G, Gottlob G, Lukasiewicz T (2010) Semantic web search based on ontological conjunctive queries. In: Proceedings of the sixth international symposium on foundations of information and knowledge systems (FoIKS’10), Sofia, Bulgaria, February 2010, pp 153–172

  17. Giannopoulos G, Bikakis N, Dalamagas T, Sellis TK (2010) GoNTogle: a tool for semantic annotation and search. In: Proceedings of the seventh European semantic web conference (ESWC’10), pp 376–380

  18. Pound J, Ilyas IF, Weddell G (2010) Expressive and flexible access to web-extracted data: a keyword-based structured query language. In: Proceedings of the 2010 international conference on management of data (SIGMOD’10), Indianapolis, Indiana, June 2010, pp 423–434

  19. Wang H, Tran T, Liu C (2008) CE\(^2\): towards a large scale hybrid search engine with integrated ranking support. In: Proceedings of the 17th ACM conference on information and knowledge management (CIKM’08), Napa Valley, California, October 2008, pp 1323–1324

  20. Zhang L, Liu Q, Zhang J, Wang H, Pan Y, Yu Y (2007) Semplore: an IR approach to scalable hybrid query of semantic web data. In: Proceedings of the 6th international semantic web conference and 2nd Asian semantic web conference (ISWC/ASWC’07), Busan, Korea, November 2007, pp 652–665

  21. Lopez V, Uren V, Sabou MR, Motta E (2009) Cross ontology query answering on the semantic web: an initial evaluation. In: Proceedings of the fifth international conference on knowledge capture (K-CAP’09), Redondo Beach, California, September 2009, pp 17–24

  22. Lopez V, Fernández M, Motta E, Stieler N (2012) PowerAqua: supporting users in querying and exploring the semantic web. Semant Web 3(3):249–265

  23. Damljanovic D, Agatonovic M, Cunningham H (2010) Natural language interfaces to ontologies: Combining syntactic analysis and ontology-based lookup through the user interaction. In: Proceedings of the 7th extended semantic web conference (ESWC10), Heraklion, Greece, May/June 2010, pp 106–120

  24. Egozi O, Markovitch S, Gabrilovich E (2011) Concept-based information retrieval using explicit semantic analysis. ACM Trans Inf Syst 29(2):1–34

  25. Lopez V, Uren VS, Sabou M, Motta E (2011) Is question answering fit for the semantic web?: a survey. Semant Web 2(2):125–155

  26. He H, Wang H, Yang J, Yu PS (2007) BLINKS: ranked keyword searches on graphs. In: Chan CY, Ooi BC, Zhou A (eds) SIGMOD Conference. ACM, pp 305–316

  27. Tran T, Wang H, Rudolph S, Cimiano P (2009) Top-k exploration of query candidates for efficient keyword search on graph-shaped (RDF) data. In: Ioannidis YE, Lee DL, Ng RT (eds) ICDE. IEEE, pp 405–416

  28. De Virgilio R, Maccioni A, Cappellari P (2013) A linear and monotonic strategy to keyword search over RDF data. In: Daniel F, Dolog P, Li Q (eds) ICWE, volume 7977 of lecture notes in computer science. Springer, pp 338–353

  29. Lonsdale DW, Embley DW, Ding Y, Xu L, Hepp M (2009) Reusing ontologies and language components for ontology generation. Data Knowl Eng 69(4):318–330

Correspondence to David W. Embley.

Cite this article

Zitzelberger, A.J., Embley, D.W., Liddle, S.W. et al. HyKSS: Hybrid Keyword and Semantic Search. J Data Semant 4, 213–229 (2015). https://doi.org/10.1007/s13740-014-0046-4

  • DOI: https://doi.org/10.1007/s13740-014-0046-4
