Enriching Product Ads with Metadata from HTML Annotations
Product ads are a popular form of search advertising offered by major search engines, including Yahoo, Google and Bing. Unlike traditional search ads, product ads include structured product specifications, which allow search engine providers to perform better keyword-based ad retrieval. However, the completeness of these product specifications varies and strongly influences the performance of ad retrieval.
On the other hand, online shops are increasingly adopting semantic markup languages such as Microformats, RDFa and Microdata to annotate their content, making large amounts of product description data publicly available. In this paper, we present an approach for enriching product ads with structured data extracted from thousands of online shops offering Microdata annotations. We use structured product ads as supervision to train feature extraction models that extract attribute-value pairs from unstructured product descriptions. We then use these features to identify matching products across different online shops and enrich product ads with the extracted data. Our evaluation on three electronics-related product categories shows promising results in terms of enriching product ads with useful product data.
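The abstract describes the pipeline only at a high level: ad attributes supervise an extractor, the extractor turns shop descriptions into attribute-value pairs, and matched offers fill gaps in the ad. The following is a minimal sketch of that flow under heavy simplifying assumptions, not the paper's implementation. All function names and the toy data are hypothetical; a plain token dictionary stands in for the trained extraction models, and matching is reduced to agreement on shared attributes with an arbitrary threshold of 0.5.

```python
from collections import defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenization (assumption; real data needs more care).
    return text.lower().split()


def label_tokens(ad_attrs, description):
    """Distant supervision: tag each description token with the ad attribute
    whose value it matches (as a token n-gram), or 'O' if none matches."""
    tokens = tokenize(description)
    labels = ["O"] * len(tokens)
    for attr, value in ad_attrs.items():
        value_toks = tokenize(value)
        n = len(value_toks)
        for i in range(len(tokens) - n + 1):
            if n and tokens[i:i + n] == value_toks:
                for j in range(i, i + n):
                    labels[j] = attr
    return list(zip(tokens, labels))


def train_dictionary(labeled_corpus):
    """Hypothetical stand-in for a learned extractor: remember which attribute
    each labelled token was seen with. A sequence model would generalize further."""
    vocab = defaultdict(set)
    for sentence in labeled_corpus:
        for token, label in sentence:
            if label != "O":
                vocab[token].add(label)
    return vocab


def extract(vocab, description):
    """Collect tokens by attribute; tokens seen with several attributes are skipped."""
    pairs = defaultdict(list)
    for token in tokenize(description):
        labels = vocab.get(token)
        if labels and len(labels) == 1:
            pairs[next(iter(labels))].append(token)
    return {attr: " ".join(toks) for attr, toks in pairs.items()}


def match_score(a, b):
    """Fraction of attributes present in both records whose values agree."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[k].lower() == b[k].lower() for k in shared) / len(shared)


def enrich(ad_attrs, offer_attrs, threshold=0.5):
    """If the offer matches the ad, add the attributes the ad is missing."""
    enriched = dict(ad_attrs)
    if match_score(ad_attrs, offer_attrs) >= threshold:
        for k, v in offer_attrs.items():
            enriched.setdefault(k, v)
    return enriched


# Toy training pairs: (structured ad attributes, unstructured shop description).
training = [
    ({"brand": "Samsung", "resolution": "4K"}, "Samsung 55 inch 4K UHD Smart TV"),
    ({"brand": "LG", "resolution": "1080p"}, "LG 1080p LED monitor 24 inch"),
]
vocab = train_dictionary([label_tokens(a, d) for a, d in training])

ad = {"brand": "Samsung"}  # an incomplete product ad
offer = extract(vocab, "New Samsung 4K television with HDR")
print(offer)              # {'brand': 'samsung', 'resolution': '4k'}
print(enrich(ad, offer))  # {'brand': 'Samsung', 'resolution': '4k'}
```

In this toy run the ad only states a brand; the offer's extracted brand agrees with it, so the offer's resolution is copied into the ad. A realistic matcher would weigh attributes differently and use fuzzier value comparison than exact string equality.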
Keywords: Microdata, schema.org, Data integration, Product data
We would like to thank Roi Blanco (Yahoo Labs) and Christian Bizer (University of Mannheim) for their helpful comments on our work. We would also like to acknowledge the support, help and insights of the Yahoo Gemini Product Ads engineering and Yahoo Labs Advertising Sciences teams, in particular Nagaraj Kota and Ben Shahshahani.