Enriching Product Ads with Metadata from HTML Annotations

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)


Product ads are a popular form of search advertising offered by major search engines, including Yahoo, Google and Bing. Unlike traditional search ads, product ads include structured product specifications, which allow search engine providers to perform better keyword-based ad retrieval. However, the level of completeness of the product specifications varies and strongly influences the performance of ad retrieval.

On the other hand, online shops are increasingly adopting semantic markup languages such as Microformats, RDFa and Microdata to annotate their content, making large amounts of product description data publicly available. In this paper, we present an approach for enriching product ads with structured data extracted from thousands of online shops offering Microdata annotations. In our approach, we use structured product ads as supervision for training feature extraction models able to extract attribute-value pairs from unstructured product descriptions. We use these features to identify matching products across different online shops and enrich product ads with the extracted data. Our evaluation on three product categories related to electronics shows promising results in terms of enriching product ads with useful product data.
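
As a rough illustration of the pipeline outlined above, the following sketch builds a per-attribute value dictionary from structured product ads, uses it to tag attribute-value pairs in a free-text product description, and scores a candidate match by the overlap of the extracted pairs. The function names, the plain dictionary lookup (standing in for the trained extraction models), the Jaccard score and the sample data are all illustrative assumptions, not the method evaluated in the paper.

```python
import re
from collections import defaultdict


def build_dictionary(structured_ads):
    """Collect the known values per attribute from structured ads (the supervision)."""
    values = defaultdict(set)
    for ad in structured_ads:
        for attribute, value in ad.items():
            values[attribute].add(value.lower())
    return values


def extract_pairs(description, dictionary):
    """Tag attribute-value pairs in free text by dictionary lookup (assumed stand-in)."""
    text = description.lower()
    pairs = set()
    for attribute, known_values in dictionary.items():
        for value in known_values:
            # Accept the value only when it is not embedded in a longer token.
            if re.search(r"(?<!\w)" + re.escape(value) + r"(?!\w)", text):
                pairs.add((attribute, value))
    return pairs


def jaccard(pairs_a, pairs_b):
    """Overlap of two sets of attribute-value pairs, in the range [0, 1]."""
    if not pairs_a and not pairs_b:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


# Hypothetical sample data, invented for this example only.
ads = [
    {"brand": "Samsung", "display_size": "55 inch", "resolution": "4K"},
    {"brand": "LG", "display_size": "42 inch", "resolution": "1080p"},
]
dictionary = build_dictionary(ads)

offer = "Samsung 55 inch 4K Smart TV with HDR"
offer_pairs = extract_pairs(offer, dictionary)

# Compare the pairs extracted from the shop offer with each ad's own pairs.
scores = [
    jaccard(offer_pairs, {(a, v.lower()) for a, v in ad.items()})
    for ad in ads
]
```

Under these assumptions, the offer would be matched to the ad with the highest score, and any pairs extracted from matching shop descriptions that the ad lacks would be the candidates for enrichment.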


Keywords: Microdata, Data integration, Product data



We would like to acknowledge Roi Blanco (Yahoo Labs) and Christian Bizer (University of Mannheim) for their helpful comments on our work. We would also like to acknowledge the support, help and insights of the Yahoo Gemini Product Ads engineering and the Yahoo Labs Advertising Sciences teams, in particular Nagaraj Kota and Ben Shahshahani.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany
  2. Yahoo Labs, London, UK
