A Wikipedia Matching Approach to Contextual Advertising

World Wide Web

Abstract

Contextual advertising is an important part of today’s Web. It benefits all parties: Web site owners and the advertising platform share the revenue, advertisers gain new customers, and Web site visitors get useful reference links. The relevance of the ads selected for a Web page is essential for the whole system to work. Problems such as homonymy and polysemy, low keyword overlap, and context mismatch can lead to the selection of irrelevant ads, so simple keyword matching yields poor accuracy. In this paper, we propose a novel “Wikipedia matching” technique that improves the relevance of contextual ads by using Wikipedia articles as “reference points” for ad selection. We show how to combine the new method with existing solutions to increase overall performance. We conduct an experimental evaluation on a set of real ads and a set of pages from news Web sites. The results show that the proposed method performs better than existing matching strategies, and that using Wikipedia matching in combination with existing approaches provides up to a 50% lift in average precision. The TREC standard measure bpref-10 also confirms the positive effect of Wikipedia matching on ad selection.

Author information

Corresponding author

Correspondence to Chin-Wan Chung.

About this article

Cite this article

Pak, A.N., Chung, CW. A Wikipedia Matching Approach to Contextual Advertising. World Wide Web 13, 251–274 (2010). https://doi.org/10.1007/s11280-010-0084-2

  • DOI: https://doi.org/10.1007/s11280-010-0084-2
