Contextual Proximity Based Term-Weighting for Improved Web Information Retrieval

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4798)

Abstract

Despite its success as the preferred, de facto source of information, the Web poses two key challenges: first, building systems that retrieve the most relevant information available, and second, targeting search at information that satisfies the user's need with an appropriate balance of novelty and relevance. Web content is also not always easy to use. Because Web pages are unstructured or semi-structured and Website designs are idiosyncratic, organizing and managing content from the Web is a challenging task. Web Mining tries to address the issues that arise from the WWW phenomenon. This paper proposes a novel context-based paradigm for improving Web Information Retrieval for multi-term queries. The technique, referred to as the Contextual Proximity Model (CPM), captures query context and matches it against term context in documents to determine term significance and topical relevance. It uses the co-information metric to detect the query context. This contextual evidence serves as an additional input that disambiguates and augments the user's explicit query and dynamically contributes to the term frequency metric, with the aim of improving retrieval accuracy.
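
The abstract does not give the CPM formulas, so the following is only an illustrative sketch of the general idea of letting query-term proximity contribute to term frequency. It is not the authors' method, and it does not implement the co-information metric. The function name `proximity_weighted_tf`, the `window` parameter, and the bonus rule are all assumptions introduced for this example.

```python
from collections import Counter

def proximity_weighted_tf(doc_tokens, query_terms, window=5):
    """Hypothetical proximity-boosted term frequency (not the paper's CPM).

    Each occurrence of a query term counts 1.0, plus a bonus equal to the
    fraction of the *other* query terms that appear within `window` tokens.
    """
    query = set(query_terms)
    # Positions of every query-term occurrence in the document.
    positions = [(i, tok) for i, tok in enumerate(doc_tokens) if tok in query]
    weights = Counter()
    for i, tok in positions:
        # Distinct other query terms found close to this occurrence.
        neighbours = {t for j, t in positions
                      if t != tok and abs(i - j) <= window}
        bonus = len(neighbours) / (len(query) - 1) if len(query) > 1 else 0.0
        weights[tok] += 1.0 + bonus
    return dict(weights)

doc = "jaguar speed in the wild jaguar car dealership sells the jaguar".split()
weights = proximity_weighted_tf(doc, ["jaguar", "speed"], window=2)
print(weights)
```

In this toy document, only the first occurrence of "jaguar" lies near "speed", so it alone receives the bonus. Occurrences that appear outside the query's context count as plain term frequency, which is one simple way contextual evidence could help disambiguate a multi-term query.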



Editor information

Zili Zhang Jörg Siekmann

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bhatia, M.P.S., Kumar Khalid, A. (2007). Contextual Proximity Based Term-Weighting for Improved Web Information Retrieval. In: Zhang, Z., Siekmann, J. (eds) Knowledge Science, Engineering and Management. KSEM 2007. Lecture Notes in Computer Science, vol 4798. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76719-0_28

  • DOI: https://doi.org/10.1007/978-3-540-76719-0_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76718-3

  • Online ISBN: 978-3-540-76719-0

  • eBook Packages: Computer Science (R0)
