Why Are There More Hotels in Tyrol than in Austria? Analyzing Schema.org Usage in the Hotel Domain
It has been almost 4 years since the world’s leading search engine operators, Bing, Google, Yahoo! and Yandex, decided to start working on schema.org, an initiative to enrich web pages with structured data. Since then, many webmasters and others responsible for web pages have started adopting this technology to enrich websites with semantic information. This paper analyzes parts of the structured data in the largest publicly available web crawl, the Common Crawl, to find out how the hotel sector is using schema.org. Using schema.org/Hotel as a use case, this paper studies who uses it, how it is applied, and whether the classes and properties of the vocabulary are used in a syntactically and semantically correct way. Further, this paper compares usage figures from 2013 and 2014 to determine whether usage has increased. We observe a wide and growing distribution of schema.org, but also a large variety of erroneous and restricted usage within the data set, which makes the data hard to use for real-life applications. In the geographical comparison, the United States is far in the lead in annotating hotels with schema.org, and Europe still has work to do to catch up.
Keywords: Schema.org, Semantic annotation, Analysis, Hotel, Tourism
This work has been partially supported by research projects co-funded by FFG – TourPack (http://tourpack.sti2.at), ÖAD – LDCT (http://ldct.sti2.at), and the European Commission in FP7 and H2020: BYTE (http://byte-project.eu) and EuTravel (http://www.eutravelproject.eu). The authors thank their colleagues for useful input, the reviewers for useful comments, and Amy Strub for proofreading the English.