Abstract
Most search engines perform text query and retrieval based on keyword phrases. However, publishers cannot anticipate every way in which users will search for the items in their documents. Often there is no direct keyword match between a search phrase and the descriptions of items that are perfect “hits” for that search. We present a highly automated solution to the problem of bridging this semantic gap between item information and search phrases. Our system learns rule-based definitions that can be ascribed to search phrases with dynamic connotations by extracting structured item information from product catalogs and applying a frequent itemset mining algorithm. We present experimental results for a realistic e-commerce domain. We also compare our rule-mining approach to vector-based relevance feedback retrieval techniques and show that our system yields definitions that are easier to validate and that perform better.
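To make the idea of mining a rule-based definition concrete, the following is a minimal sketch, not the system described in the chapter. It assumes that each item known to be a hit for a search phrase has already been reduced to a set of `attribute=value` strings, and it uses a plain level-wise (Apriori-style) search. The phrase "budget laptop", the attribute names, the function `frequent_itemsets`, and the support threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for attribute sets that occur in
    at least a `min_support` fraction of the transactions."""
    n = len(transactions)
    # Level 1 candidates: every single attribute seen in the data.
    candidates = {frozenset([a]) for t in transactions for a in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-sets into (k+1)-set candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical structured records for items judged to be hits for the
# search phrase "budget laptop" (illustrative data, not from the chapter).
hits = [
    frozenset({"category=laptop", "price=low", "brand=A"}),
    frozenset({"category=laptop", "price=low", "brand=B"}),
    frozenset({"category=laptop", "price=low", "brand=A", "screen=small"}),
    frozenset({"category=laptop", "price=mid", "brand=B"}),
]

frequent = frequent_itemsets(hits, min_support=0.75)

# Treat the largest frequent attribute set as a candidate definition.
definition = max(frequent, key=len)
print(sorted(definition))
```

A definition in this form is a conjunction of attribute constraints, so a person can inspect and validate it directly, which is the property the abstract contrasts with vector-based relevance feedback. Choosing the largest frequent set is one simple selection heuristic among several possible ones.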
Copyright information
© 2007 Springer
About this paper
Cite this paper
Davulcu, H., Nguyen, H.V., Ramachandran, V. (2007). BOOSTING ITEM FINDABILITY: BRIDGING THE SEMANTIC GAP BETWEEN SEARCH PHRASES AND ITEM INFORMATION. In: Chen, CS., Filipe, J., Seruca, I., Cordeiro, J. (eds) Enterprise Information Systems VII. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-5347-4_24
DOI: https://doi.org/10.1007/978-1-4020-5347-4_24
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-5323-8
Online ISBN: 978-1-4020-5347-4
eBook Packages: Computer Science, Computer Science (R0)