Abstract
Despite its success as a preferred or de facto source of information, the Web poses two key challenges: first, to provide improved systems that retrieve the most relevant information available, and second, to target search at information that satisfies the user’s need with an appropriate balance of novelty and relevance. Moreover, Web content is not always easy to use. Because Web pages are unstructured or semi-structured and Website designs are idiosyncratic, organizing and managing content from the Web is a challenging task. Web Mining tries to solve these issues, which arise from the WWW phenomenon. This paper proposes a novel context-based paradigm for improving Web Information Retrieval, given a multi-term query. The technique, referred to as the Contextual Proximity Model (CPM), captures query context and matches it against term context in documents to determine term significance and topical relevance. It uses the co-information metric to detect the query context. This contextual evidence serves as an additional input to disambiguate and augment the user’s explicit query, and it contributes dynamically to the term frequency metric with the aim of improving retrieval accuracy.
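The abstract does not give the formulas behind CPM, so the following is only a minimal sketch of the underlying idea, not the authors' method. It assumes the simplest case of co-information, two terms, where co-information reduces to mutual information, and it estimates that quantity from the presence or absence of each term across a small document collection. The function name `mutual_information`, the whitespace tokenization, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(docs, term_a, term_b):
    """Mutual information (in bits) between the presence of two terms.

    Hypothetical helper: each document is a string, tokenized on whitespace,
    and each term is treated as a binary present/absent variable per document.
    For two variables, co-information equals mutual information.
    """
    n = len(docs)
    joint = Counter()
    for doc in docs:
        tokens = set(doc.lower().split())
        joint[(term_a in tokens, term_b in tokens)] += 1

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Marginal probabilities obtained by summing the joint counts.
        p_a = sum(c for (x, _), c in joint.items() if x == a) / n
        p_b = sum(c for (_, y), c in joint.items() if y == b) / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy collection: "web" and "mining" always appear together or not at all.
docs = ["web mining", "web mining", "data", "data"]
score = mutual_information(docs, "web", "mining")
```

A high score suggests that the two query terms tend to share a context in the collection, which is the kind of evidence the abstract describes using to weight terms; how CPM combines such a score with term frequency is not stated here.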
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhatia, M.P.S., Kumar Khalid, A. (2007). Contextual Proximity Based Term-Weighting for Improved Web Information Retrieval. In: Zhang, Z., Siekmann, J. (eds) Knowledge Science, Engineering and Management. KSEM 2007. Lecture Notes in Computer Science, vol 4798. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76719-0_28
DOI: https://doi.org/10.1007/978-3-540-76719-0_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76718-3
Online ISBN: 978-3-540-76719-0
eBook Packages: Computer Science, Computer Science (R0)