Automatic Web Page Annotation with Google Rich Snippets

  • Walter Hop
  • Stephan Lachner
  • Flavius Frasincar
  • Roberto De Virgilio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6427)

Abstract

Web pages are designed to be read by people, not machines. Consequently, searching and reusing information on the Web is difficult without human participation. Adding semantics (i.e., meaning) to a Web page would help machines understand Web content and better support the Web search process. One of the latest developments in this field is Google’s Rich Snippets, a service that lets Web site owners add semantics to their Web pages. In this paper we present an approach for automatically annotating a Web page with Rich Snippets RDFa tags. Using several heuristics and a named entity recognition technique, our method recognizes and annotates a subset of the Rich Snippets vocabulary: all attributes of its Review concept, and the names of its Person and Organization concepts. We implemented an online service and evaluated the accuracy of the approach on real e-commerce Web sites.
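As an illustration of what annotating a page with Rich Snippets RDFa tags amounts to, the following minimal sketch wraps already-extracted review fields in RDFa markup. It is not the authors' implementation: the function name `annotate_review` and its parameters are hypothetical, the extraction step (heuristics and named entity recognition) is assumed to have run beforehand, and the namespace and property names (`v:Review`, `v:itemreviewed`, `v:reviewer`, `v:rating`, `v:summary`) follow Google's 2009 Rich Snippets vocabulary as an assumption, since this page does not list them.

```python
from html import escape

# Namespace of Google's original Rich Snippets RDFa vocabulary
# (assumption: not stated on this page).
RDFA_NS = "http://rdf.data-vocabulary.org/#"


def annotate_review(item, reviewer, rating, summary):
    """Wrap already-extracted review fields in Rich Snippets RDFa markup.

    Hypothetical helper: the arguments are assumed to be the output of
    an earlier extraction step, which is out of scope here.
    """
    return (
        f'<div xmlns:v="{RDFA_NS}" typeof="v:Review">\n'
        f'  <span property="v:itemreviewed">{escape(item)}</span>\n'
        f'  <span property="v:reviewer">{escape(reviewer)}</span>\n'
        f'  <span property="v:rating">{escape(str(rating))}</span>\n'
        f'  <span property="v:summary">{escape(summary)}</span>\n'
        f'</div>'
    )


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(annotate_review("Acme Blender", "J. Doe", 4.5, "Quiet & fast"))
```

Escaping each value before embedding it keeps the generated markup well-formed even when the extracted text contains characters such as `&` or `<`.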

Keywords

Name Entity Recognition, Entity Recognition, Page Title, Page Area, Natural Text
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Walter Hop (1)
  • Stephan Lachner (1)
  • Flavius Frasincar (1)
  • Roberto De Virgilio (2)
  1. Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, The Netherlands
  2. Dipartimento di Informatica e Automazione, Università Roma Tre, Rome, Italy
