International Conference on Electronic Commerce and Web Technologies

E-Commerce and Web Technologies pp 83-99

Exploiting Microdata Annotations to Consistently Categorize Product Offers at Web Scale

  • Robert Meusel
  • Anna Primpeli
  • Christian Meilicke
  • Heiko Paulheim
  • Christian Bizer
Conference paper

DOI: 10.1007/978-3-319-27729-5_7

Volume 239 of the book series Lecture Notes in Business Information Processing (LNBIP)
Cite this paper as:
Meusel R., Primpeli A., Meilicke C., Paulheim H., Bizer C. (2015) Exploiting Microdata Annotations to Consistently Categorize Product Offers at Web Scale. In: Stuckenschmidt H., Jannach D. (eds) E-Commerce and Web Technologies. Lecture Notes in Business Information Processing, vol 239. Springer, Cham

Abstract

Semantically annotated data, embedded using markup languages such as RDFa and Microdata, has become increasingly available on the Web, especially in the area of e-commerce. As a result, a large number of structured product descriptions are freely available and can be used for applications such as product search or recommendation. However, little effort has been made to analyze the categories of the available product descriptions. Although some products have an explicit category assigned, the categorization schemes vary widely, as the products originate from thousands of different sites. This heterogeneity makes supervised methods, which most previous works have proposed, hard to apply. In this paper, we therefore explain how distantly supervised approaches can exploit the heterogeneous category information to map the products to a set of target categories from an existing product catalogue. Our results show that, even though this task is far from trivial, we can reach almost 56% accuracy for classifying products into 37 categories.
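To make the idea of distant supervision concrete, the following is a minimal sketch and not the authors' implementation. It assumes that site-specific category strings can be mapped to target categories by token overlap, and that the resulting noisy labels are then used to train a simple naive Bayes classifier on offer names. The target categories, the example offers, and all function names are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical target catalogue and offers; none of this data is from the paper.
TARGETS = ["Books", "Clothing", "Computers and Accessories"]
OFFERS = [
    {"name": "Laptop 15 inch 8GB RAM", "site_category": "Electronics > Computers"},
    {"name": "Wireless mouse for laptop", "site_category": "Computers / Accessories"},
    {"name": "Cotton t-shirt blue", "site_category": "Mens Clothing"},
    {"name": "Wool sweater", "site_category": "clothing > knitwear"},
    {"name": "Paperback novel", "site_category": "Books"},
    {"name": "Gaming laptop 16GB RAM", "site_category": None},  # no category given
]


def tokens(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distant_label(site_category, target_categories):
    """Return the target category with the highest Jaccard token overlap
    with the site-specific category, or None if nothing overlaps."""
    site = set(tokens(site_category))
    best, best_score = None, 0.0
    for target in target_categories:
        tgt = set(tokens(target))
        union = site | tgt
        score = len(site & tgt) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = target, score
    return best


def train(offers, target_categories):
    """Build per-category token counts from offers whose site category
    could be mapped to a target category (the noisy training set)."""
    counts = defaultdict(Counter)
    for offer in offers:
        if not offer.get("site_category"):
            continue  # offers without a category provide no distant label
        label = distant_label(offer["site_category"], target_categories)
        if label is not None:
            counts[label].update(tokens(offer["name"]))
    return counts


def classify(name, counts):
    """Multinomial naive Bayes with add-one smoothing and uniform priors."""
    vocab = {tok for c in counts.values() for tok in c}
    best, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        score = sum(
            math.log((c[tok] + 1) / (total + len(vocab))) for tok in tokens(name)
        )
        if score > best_score:
            best, best_score = label, score
    return best


model = train(OFFERS, TARGETS)
print(classify("Gaming laptop 16GB RAM", model))
```

In this sketch, the offer without a site category is classified using only the labels derived from the other offers' heterogeneous categories, so no manually labelled training data is needed. Token overlap is a deliberately crude stand-in for whatever matching strategy a real system would use.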

Keywords

  • Microdata
  • RDFa
  • Structured web data
  • Classification

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Robert Meusel (1)
  • Anna Primpeli (1)
  • Christian Meilicke (1)
  • Heiko Paulheim (1)
  • Christian Bizer (1)

  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany