Automatic Web Page Annotation with Google Rich Snippets

  • Conference paper
On the Move to Meaningful Internet Systems, OTM 2010 (OTM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6427)

Abstract

Web pages are designed to be read by people, not machines. Consequently, searching and reusing information on the Web is difficult without human participation. Adding semantics (i.e., meaning) to a Web page would help machines understand Web content and better support the Web search process. One of the latest developments in this field is Google’s Rich Snippets, a service that lets Web site owners add semantics to their Web pages. In this paper, we provide an approach to automatically annotate a Web page with Rich Snippets RDFa tags. Exploiting several heuristics and a named entity recognition technique, our method is capable of recognizing and annotating a subset of the Rich Snippets vocabulary, i.e., all attributes of its Review concept, and the names of the Person and Organization concepts. We implemented an on-line service and evaluated the accuracy of the approach on real e-commerce Web sites.
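
To make the target of the annotation concrete, the following minimal sketch shows what emitting Rich Snippets RDFa markup for a Review could look like once the field values are already known. It is not the authors' implementation: the heuristics and named entity recognition that find the values are not shown, the function name annotate_review and the sample values are hypothetical, and the property names (itemreviewed, reviewer, rating, summary) and the data-vocabulary.org namespace are assumed from Google's public Rich Snippets documentation of that period.

```python
from html import escape
from typing import Mapping

# Namespace of the Rich Snippets RDFa vocabulary (assumed from Google's
# documentation of the time; not stated in the abstract itself).
RICH_SNIPPETS_NS = "http://rdf.data-vocabulary.org/#"


def annotate_review(fields: Mapping[str, str]) -> str:
    """Wrap already-extracted review values in RDFa spans.

    `fields` maps a Review property name (e.g. "rating") to its text.
    Detecting these values on a real page is the hard part and is
    deliberately left out of this sketch.
    """
    spans = [
        f'<span property="v:{escape(name, quote=True)}">{escape(value)}</span>'
        for name, value in fields.items()
    ]
    body = "\n  ".join(spans)
    return (
        f'<div xmlns:v="{RICH_SNIPPETS_NS}" typeof="v:Review">\n'
        f"  {body}\n"
        "</div>"
    )


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(annotate_review({
        "itemreviewed": "Acme Digital Camera",
        "reviewer": "J. Doe",
        "rating": "4.5",
        "summary": "Sharp pictures, weak battery",
    }))
```

Values are HTML-escaped before insertion so that text extracted from a page cannot break the surrounding markup.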

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hop, W., Lachner, S., Frasincar, F., De Virgilio, R. (2010). Automatic Web Page Annotation with Google Rich Snippets. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems, OTM 2010. Lecture Notes in Computer Science, vol 6427. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16949-6_21

  • DOI: https://doi.org/10.1007/978-3-642-16949-6_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16948-9

  • Online ISBN: 978-3-642-16949-6

  • eBook Packages: Computer Science, Computer Science (R0)
