Mining “Hidden Phrase” Definitions from the Web
Keyword searching is the most common form of document search on the Web. Many Web publishers manually annotate the META tags and titles of their pages with frequently queried phrases in order to improve their placement and ranking in search results. A “hidden phrase” is defined as a phrase that occurs in the META tag of a Web page but not in its body. In this paper we present an algorithm that mines the definitions of hidden phrases from Web documents. Phrase definitions allow (i) publishers to find relevant phrases with high query frequency, and (ii) search engines to test whether the body of a document matches the phrases. We use co-occurrence clustering and association rule mining algorithms to learn phrase definitions from high-dimensional data sets. We also provide experimental results.
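The definition of a hidden phrase can be illustrated with a minimal sketch. It is not the mining algorithm of the paper; it only shows the META-versus-body test for a single page. It assumes that phrases are the comma-separated entries of a `<meta name="keywords">` tag and that a case-insensitive substring match against the body text is an adequate test. The function name `hidden_phrases` and the helper class are hypothetical.

```python
from html.parser import HTMLParser


class _MetaBodyParser(HTMLParser):
    """Collects META keyword phrases and the text inside <body> for one page."""

    def __init__(self):
        super().__init__()
        self.meta_phrases = []
        self.body_text = []
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # Assumption: phrases are comma-separated in the content attribute.
            content = attrs.get("content") or ""
            for phrase in content.split(","):
                phrase = " ".join(phrase.lower().split())
                if phrase:
                    self.meta_phrases.append(phrase)
        elif tag == "body":
            self._in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:
            self.body_text.append(data)


def hidden_phrases(html):
    """Return META keyword phrases that do not occur in the page body."""
    parser = _MetaBodyParser()
    parser.feed(html)
    parser.close()
    # Normalize whitespace and case so the substring test is less brittle.
    body = " ".join(" ".join(parser.body_text).lower().split())
    return [p for p in parser.meta_phrases if p not in body]


page = (
    '<html><head><meta name="keywords" content="cheap flights, airline tickets">'
    "</head><body><p>Book airline tickets online.</p></body></html>"
)
print(hidden_phrases(page))
```

The substring test is a simplification: it ignores stemming and word boundaries, and it treats script text inside the body as content, so a fuller treatment would tokenize both sides first.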