Abstract
Keyword search suffers from a number of issues: ambiguity, synonymy, and an inability to handle semantic constraints. Semantic search helps resolve these issues but is limited by the quality of its annotations, which are likely to be incomplete or imprecise. Hybrid search, which combines the merits of both keyword and semantic search, appears to be a promising solution. In this paper we describe and evaluate HyKSS, a hybrid search system driven by extraction ontologies for both annotation creation and query interpretation. To order the results it displays, HyKSS uses a dynamic ranking algorithm. We show that over data sets of short topical documents, the HyKSS ranking algorithm outperforms both keyword search and semantic search in isolation, as well as a number of other non-HyKSS hybrid approaches to ranking.
Notes
In [2], we used HyKSS as our query system to explore cross-language query and search. Here, we focus on HyKSS itself, describing all its features and providing an in-depth statistical evaluation of its performance relative to alternative systems. Except for the necessary background on extraction ontologies and parts of the running example and its figures, which serve to illustrate the ideas in HyKSS, the material in this paper is new.
http://deg.byu.edu (in the OntologyWorkbench tool).
See the HyKSS demo at http://deg.byu.edu/demos.
To see a sample of a generated form, enter a query such as the running example query to provide context, and then click Advanced Search in the HyKSS demo (http://deg.byu.edu/demos).
In general, logic form identification in natural language is difficult (see, e.g., [8]). In our example query, for instance, “and” denotes disjunction rather than its more common conjunctive meaning.
See http://lucene.apache.org/core/3_6_1/scoring.html for details.
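As a rough illustration of what the linked Lucene 3.6 page describes, the sketch below computes a simplified version of Lucene's classic TF-IDF practical scoring function, score(q, d) = coord(q, d) · queryNorm(q) · Σ tf(t, d) · idf(t)² · norm(t, d), with tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), and norm = 1/√(document length). It assumes every boost is 1.0, treats the query as a set of distinct terms, and uses exact length norms rather than Lucene's lossy one-byte encoding. The function name `lucene_classic_score` is hypothetical; this is not Lucene's code and not necessarily how HyKSS invokes Lucene.

```python
import math
from collections import Counter
from typing import List, Sequence


def lucene_classic_score(query_terms: Sequence[str],
                         doc_terms: Sequence[str],
                         corpus: List[Sequence[str]]) -> float:
    """Simplified Lucene 3.x classic (TF-IDF) score of one document.

    Assumes all query, document, and field boosts are 1.0, treats the
    query as a set of distinct terms, and uses the exact length norm
    1/sqrt(len(doc)) instead of Lucene's one-byte norm encoding.
    """
    distinct = set(query_terms)
    if not distinct or not doc_terms or not corpus:
        return 0.0

    num_docs = len(corpus)

    def idf(term: str) -> float:
        # idf(t) = 1 + ln(numDocs / (docFreq + 1)); always > 0 for a non-empty corpus
        doc_freq = sum(1 for d in corpus if term in d)
        return 1.0 + math.log(num_docs / (doc_freq + 1))

    idfs = {t: idf(t) for t in distinct}
    # queryNorm = 1 / sqrt(sum of squared query-term weights)
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idfs.values()))

    freqs = Counter(doc_terms)
    length_norm = 1.0 / math.sqrt(len(doc_terms))
    matched = [t for t in distinct if freqs[t] > 0]

    # coord rewards documents that match more of the query's terms
    coord = len(matched) / len(distinct)
    # per matching term: tf(t in d) * idf(t)^2 * norm(t, d), with tf = sqrt(freq)
    total = sum(math.sqrt(freqs[t]) * idfs[t] ** 2 * length_norm for t in matched)
    return coord * query_norm * total
```

Ranking a document collection for a keyword query then amounts to sorting its documents by this score in descending order.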
The document sets may have contained duplicate or similar advertisements because craigslist users tend to cross-post and re-post their ads. We did not check for such duplicates but instead took the ads as posted.
We established the keyword and semantic weights as those that maximized mean average precision over the 50 validation documents. To find the keyword weight \(k\), we executed the 100 queries in the training query set using 101 candidate weights (0.00 to 1.00 in steps of 0.01) and chose the best. The semantic weight was then \(s = 1 - k\).
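The weight tuning described in this footnote is a one-dimensional grid search that maximizes mean average precision (MAP). The sketch below illustrates it under stated assumptions: it assumes the hybrid score is the weighted sum \(k \cdot \text{keyword} + s \cdot \text{semantic}\) with \(s = 1 - k\) (the footnote gives the weights but not the combination formula), it assumes keyword and semantic scores are already computed per document, and every name in it (`tune_keyword_weight`, `hybrid_ranking`, and so on) is hypothetical rather than taken from HyKSS.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-query input: doc id -> (keyword score, semantic score),
# paired with the set of doc ids judged relevant for that query.
Scores = Dict[str, Tuple[float, float]]
Query = Tuple[Scores, Set[str]]


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@r over the ranks r at which relevant docs appear."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for r, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / r
    return precision_sum / len(relevant)


def hybrid_ranking(scores: Scores, k: float) -> List[str]:
    """Rank docs by the assumed linear blend k*keyword + (1 - k)*semantic."""
    s = 1.0 - k
    combined = {doc: k * kw + s * sem for doc, (kw, sem) in scores.items()}
    # Descending score; ties broken by doc id so results are repeatable.
    return sorted(combined, key=lambda doc: (-combined[doc], doc))


def mean_average_precision(queries: List[Query], k: float) -> float:
    aps = [average_precision(hybrid_ranking(sc, k), rel) for sc, rel in queries]
    return sum(aps) / len(aps)


def tune_keyword_weight(queries: List[Query],
                        steps: int = 100) -> Tuple[float, float, float]:
    """Grid-search k over 0.00, 0.01, ..., 1.00; return (k, s, best MAP)."""
    best_k, best_map = 0.0, -1.0
    for i in range(steps + 1):  # steps + 1 = 101 candidate weights
        k = i / steps
        current = mean_average_precision(queries, k)
        if current > best_map:  # strict '>' keeps the smallest k on ties
            best_k, best_map = k, current
    return best_k, 1.0 - best_k, best_map
```

Because the search space is a single scalar with only 101 values, exhaustive evaluation is cheap and avoids the local-optimum issues a gradient-based tuner might have on a step-shaped metric such as MAP.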
See [13] for a full discussion of the principles of ranking in information retrieval.
We began our analysis with a split-plot design, but after examining the interactions among our factors and finding that they were not major sources of response variability, we simplified to a block statistical model.
In our experiment, the keyword results in the representative case turned out to be literally constant, but in general they need not be, and indeed the Keyword—Pre-processing approach (\(K_p\)) yielded results that were not quite constant.
January 2012.
References
Bhagdev R, Chapman S, Ciravegna F, Lanfranchi V, Petrelli D (2008) Hybrid search: effectively combining keywords and ontology-based searches. In: Proceedings of the 5th European semantic web conference (ESWC’08), Tenerife, Canary Islands, Spain, June 2008, pp 554–568
Embley DW, Liddle SW, Lonsdale DW, Park JS, Shin B-J, Zitzelberger A (2012) Cross-language hybrid keyword and semantic search. In: Proceedings of the 31st international conference on conceptual modeling (ER 2012), Florence, Italy, October 2012, pp 190–203
Embley DW, Zitzelberger A (2010) Theoretical foundations for enabling a web of knowledge. In: Proceedings of the sixth international symposium on foundations of information and knowledge systems (FoIKS’10), Sofia, Bulgaria, February 2010, pp 211–229
Stuckenschmidt H (2012) Data semantics on the web. J Data Semant 1(1):1–9
Embley DW, Campbell DM, Jiang YS, Liddle SW, Lonsdale DW, Ng Y-K, Smith RD (1999) Conceptual-model-based data extraction from multiple-record web pages. Data Knowl Eng 31(3):227–251
Buitelaar P, Cimiano P, Haase P, Sintek M (2009) Towards linguistically grounded ontologies. In: Proceedings of the 6th European semantic web conference (ESWC’09), Heraklion, Greece, May/June 2009, pp 111–125
Embley DW (1980) Programming with data frames for everyday data items. In: Proceedings of the 1980 national computer conference, Anaheim, California, May 1980, pp 301–305
Rus V (2004) A first evaluation of logic form identification systems. In: Mihalcea R, Edmonds P (eds) Senseval-3: third international workshop on the evaluation of systems for the semantic analysis of text, Barcelona, Spain, March 2004, pp 37–40
Tao C, Embley DW, Liddle SW (2009) FOCIH: form-based ontology creation and information harvesting. In: Proceedings of the 28th international conference on conceptual modeling (ER 2009), Gramado, Brazil, November 2009, pp 346–359
Lopez V, Nikolov A, Fernández M, Sabou M, Uren VS, Motta E (2009) Merging and ranking answers in the semantic web: the wisdom of crowds. In: Gómez-Pérez A, Yu Y, Ding Y (eds) ASWC, volume 5926 of lecture notes in computer science. Springer, pp 135–152
Fernandez M, Lopez V, Sabou M, Uren V, Vallet D, Motta E, Castells P (2008) Semantic search meets the web. In: Proceedings of the second IEEE international conference on semantic computing (ICSC’08), Santa Clara, California, pp 253–260
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York
Liu T-Y (2011) Learning to rank for information retrieval, 1st edn. Springer, Berlin
Rencher AC, Christensen WF (2012) Methods of multivariate analysis, 3rd edn. Wiley, Hoboken, NJ
Castells P, Fernandez M, Vallet D (2007) An adaptation of the vector-space model for ontology-based information retrieval. IEEE Trans Knowl Data Eng 19(2):261–272
Fazzinga B, Gianforme G, Gottlob G, Lukasiewicz T (2010) Semantic web search based on ontological conjunctive queries. In: Proceedings of the sixth international symposium on foundations of information and knowledge systems (FoIKS’10), Sofia, Bulgaria, February 2010, pp 153–172
Giannopoulos G, Bikakis N, Dalamagas T, Sellis TK (2010) GoNTogle: a tool for semantic annotation and search. In: Proceedings of the seventh European semantic web conference (ESWC’10), pp 376–380
Pound J, Ilyas IF, Weddell G (2010) Expressive and flexible access to web-extracted data: a keyword-based structured query language. In: Proceedings of the 2010 international conference on management of data (SIGMOD’10), Indianapolis, Indiana, June 2010, pp 423–434
Wang H, Tran T, Liu C (2008) CE\(^2\): towards a large scale hybrid search engine with integrated ranking support. In: Proceedings of the 17th ACM conference on information and knowledge management (CIKM’08), Napa Valley, California, October 2008, pp 1323–1324
Zhang L, Liu Q, Zhang J, Wang H, Pan Y, Yu Y (2007) Semplore: an IR approach to scalable hybrid query of semantic web data. In: Proceedings of the 6th international semantic web conference and 2nd Asian semantic web conference (ISWC/ASWC’07), Busan, Korea, November 2007, pp 652–665
Lopez V, Uren V, Sabou MR, Motta E (2009) Cross ontology query answering on the semantic web: an initial evaluation. In: Proceedings of the fifth international conference on knowledge capture (K-CAP’09), Redondo Beach, California, September 2009, pp 17–24
Lopez V, Fernández M, Motta E, Stieler N (2012) PowerAqua: supporting users in querying and exploring the semantic web. Semant Web 3(3):249–265
Damljanovic D, Agatonovic M, Cunningham H (2010) Natural language interfaces to ontologies: combining syntactic analysis and ontology-based lookup through the user interaction. In: Proceedings of the 7th extended semantic web conference (ESWC’10), Heraklion, Greece, May/June 2010, pp 106–120
Egozi O, Markovitch S, Gabrilovich E (2011) Concept-based information retrieval using explicit semantic analysis. ACM Trans Inf Syst 29(2):1–34
Lopez V, Uren VS, Sabou M, Motta E (2011) Is question answering fit for the semantic web?: a survey. Semant Web 2(2):125–155
He H, Wang H, Yang J, Yu PS (2007) BLINKS: ranked keyword searches on graphs. In: Chan CY, Ooi BC, Zhou A (eds) SIGMOD Conference. ACM, pp 305–316
Tran T, Wang H, Rudolph S, Cimiano P (2009) Top-k exploration of query candidates for efficient keyword search on graph-shaped (RDF) data. In: Ioannidis YE, Lee DL, Ng RT (eds) ICDE. IEEE, pp 405–416
De Virgilio R, Maccioni A, Cappellari P (2013) A linear and monotonic strategy to keyword search over RDF data. In: Daniel F, Dolog P, Li Q (eds) ICWE, volume 7977 of lecture notes in computer science. Springer, pp 338–353
Lonsdale DW, Embley DW, Ding Y, Xu L, Hepp M (2009) Reusing ontologies and language components for ontology generation. Data Knowl Eng 69(4):318–330
About this article
Cite this article
Zitzelberger, A.J., Embley, D.W., Liddle, S.W. et al. HyKSS: Hybrid Keyword and Semantic Search. J Data Semant 4, 213–229 (2015). https://doi.org/10.1007/s13740-014-0046-4