Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis

  • Christian Bizer
  • Kai Eckert
  • Robert Meusel
  • Hannes Mühleisen
  • Michael Schuhmacher
  • Johanna Völker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8219)

Abstract

More and more websites embed structured data describing, for instance, products, reviews, blog posts, people, organizations, events, and cooking recipes into their HTML pages using markup standards such as Microformats, Microdata, and RDFa. This development has accelerated in the last two years as major Web companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to use the embedded data within their applications. In this paper, we analyze the adoption of RDFa, Microdata, and Microformats across the Web. Our study is based on a large public Web crawl dating from early 2012 and consisting of 3 billion HTML pages that originate from over 40 million websites. The analysis reveals the deployment of the different markup standards, the main topical areas of the published data, and the vocabularies that are used within each topical area to represent data. What distinguishes our work from earlier studies, published by the large Web companies, is that the analyzed crawl as well as the extracted data are publicly available. This allows our findings to be verified and to be used as starting points for further domain-specific investigations as well as for focused information extraction endeavors.

Keywords

Web Science, Web of Data, RDFa, Microdata, Microformats

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Christian Bizer (1)
  • Kai Eckert (1)
  • Robert Meusel (1)
  • Hannes Mühleisen (2)
  • Michael Schuhmacher (1)
  • Johanna Völker (1)

  1. Data and Web Science Group, University of Mannheim, Germany
  2. Centrum Wiskunde & Informatica, Database Architectures Group, The Netherlands
