Automatic Web Page Annotation with Google Rich Snippets
Abstract
Web pages are designed to be read by people, not machines. Consequently, searching and reusing information on the Web is difficult without human participation. Adding semantics (i.e., meaning) to a Web page would help machines understand Web content and better support the Web search process. One of the latest developments in this field is Google’s Rich Snippets, a service that lets Web site owners add semantics to their Web pages. In this paper we provide an approach to automatically annotate a Web page with Rich Snippets RDFa tags. Exploiting several heuristics and a named entity recognition technique, our method is capable of recognizing and annotating a subset of the Rich Snippets vocabulary, namely all attributes of its Review concept and the names of its Person and Organization concepts. We implemented an on-line service and evaluated the accuracy of the approach on real e-commerce Web sites.
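To make the target output concrete, the following is a minimal sketch of what Rich Snippets RDFa markup for a Review might look like once the fields have been recognized. It is not the method described in the paper: the helper name `annotate_review` and the sample values are hypothetical, and the recognition step (heuristics and named entity recognition) is assumed to have already produced the field values. The namespace and the property names `itemreviewed`, `reviewer`, and `rating` follow Google's original Rich Snippets vocabulary.

```python
from html import escape

# Namespace of the vocabulary used by Google's original Rich Snippets RDFa markup.
V_NS = "http://rdf.data-vocabulary.org/#"


def annotate_review(fields):
    """Wrap already-extracted review fields in RDFa spans.

    Hypothetical helper for illustration only: `fields` maps a Review
    property name (e.g. "rating") to the text recognized for it.
    """
    parts = [f'<div xmlns:v="{V_NS}" typeof="v:Review">']
    for prop, text in fields.items():
        # Escape the text so that extracted content cannot break the markup.
        parts.append(f'  <span property="v:{prop}">{escape(text)}</span>')
    parts.append("</div>")
    return "\n".join(parts)


# Sample values are invented for the example.
example = annotate_review({
    "itemreviewed": "Acme Camera X100",
    "reviewer": "Jane Doe",
    "rating": "4.5",
})
print(example)
```

In a real page the spans would be attached to the existing elements that contain the recognized text rather than generated from scratch; the sketch only shows the shape of the annotations.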
Keywords
- Name Entity Recognition
- Entity Recognition
- Page Title
- Page Area
- Natural Text