RDFa Based Annotation of Web Pages through Keyphrases Extraction

  • Roberto De Virgilio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7045)


The goal of the Semantic Web is the creation of a linked mesh of information that is easily processable by machines, on a global scale. Upgrading current Web pages to machine-understandable units of information relies on semantic annotation. A typical semantic annotation process includes three main tasks: (i) the identification of an ontology describing the domain of interest, (ii) the discovery of the concepts of the ontology in the target Web pages, and (iii) the annotation of each page with links to Web resources describing the content of the page. The goal is to support an ontology-aware agent in the interpretation of target documents. In this paper, we present an approach to the automatic annotation of Web pages. Exploiting a data reverse engineering technique, our approach is capable of recognizing entities in Web pages, extracting keyphrases from them, and annotating such pages with RDFa tags that map the discovered entities to Linked Data repositories matching the extracted keyphrases. We have implemented the approach and evaluated its accuracy on real e-commerce Web sites.
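The final annotation step described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `LINKED_DATA` table, the `annotate` function, the example DBpedia URI, and the choice of `rdfs:seeAlso` as the linking property are all assumptions made for illustration, and the sketch further assumes that keyphrases have already been extracted and that the `rdfs` prefix is declared in the enclosing page.

```python
import html
import re

# Hypothetical keyphrase-to-URI table (keys must be lowercase). A real system
# would obtain these matches by querying a Linked Data repository.
LINKED_DATA = {
    "digital camera": "http://dbpedia.org/resource/Digital_camera",
}


def annotate(text, mapping):
    """Wrap each known keyphrase found in `text` in a span with RDFa attributes."""
    if not mapping:
        return html.escape(text)
    # Try longer phrases first so they win over shorter phrases they contain.
    phrases = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    out, pos = [], 0
    for match in pattern.finditer(text):
        # Escape the plain text between the previous match and this one.
        out.append(html.escape(text[pos:match.start()]))
        uri = mapping[match.group(0).lower()]
        out.append(
            '<span rel="rdfs:seeAlso" resource="%s">%s</span>'
            % (html.escape(uri, quote=True), html.escape(match.group(0)))
        )
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(annotate("A compact Digital Camera & lens", LINKED_DATA))
```

Matching is case-insensitive and restricted to word boundaries, so a keyphrase is not annotated when it appears inside a longer word; text outside the matches is HTML-escaped so the result can be embedded in a page.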


Semantic Concept; Semantic Annotation; Target Pattern; Link Open Data; Keyphrases Extraction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Roberto De Virgilio (1)
  1. Dipartimento di Informatica e Automazione, Università Roma Tre, Rome, Italy
