Context-Driven Semantic Enrichment of Italian News Archive

  • Andrei Tamilin
  • Bernardo Magnini
  • Luciano Serafini
  • Christian Girardi
  • Mathew Joseph
  • Roberto Zanoli
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6088)

Abstract

Semantic enrichment of textual data is the operation of linking mentions to the entities they refer to, and then enriching those entities with the background knowledge about them available in one or more knowledge bases (or on the entire web). Information about the context in which a mention occurs (e.g., the time, the topic, and the place the text refers to) is a critical resource for correct semantic enrichment, for two reasons. First, without context, mentions are “too little text” to refer unambiguously to a single entity. Second, knowledge about entities is itself context dependent (e.g., in the context of Illinois political life in 1996, Obama is a Senator, while from 2009 on, Obama is the US president). In this paper, we describe a concrete approach to context-driven semantic enrichment, built upon four core sub-tasks: detection of mentions in text (i.e., finding references to people, locations, and organizations); determination of the context of discourse of the text; identification of the referred entities in the knowledge base; and enrichment of each entity with the knowledge relevant to the context. Such an approach also requires contextualized background knowledge. To address this, we propose a customization of Sesame, one of the state-of-the-art knowledge repositories, to support representation of and reasoning with contextualized knowledge. The approach has been fully implemented in a system that has been deployed and applied to the textual archive of the local Italian newspaper “L’Adige”, covering the years from 1999 to 2009.
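
As an illustration of the second point (knowledge about an entity depends on context), the following minimal Python sketch attaches a validity period to each fact and returns only the facts that hold in a given year. It is not the Sesame customization described in the paper; the `ContextualFact` class, the `facts_in_context` function, and the toy data (including the 2008 end year) are hypothetical and chosen only to mirror the Obama example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContextualFact:
    """A statement about an entity that holds only within a span of years (inclusive)."""
    entity: str
    prop: str
    value: str
    start_year: int
    end_year: Optional[int] = None  # None means the fact has no known end


def facts_in_context(
    facts: List[ContextualFact], entity: str, year: int
) -> List[ContextualFact]:
    """Return the facts about `entity` that hold in the temporal context `year`."""
    return [
        f
        for f in facts
        if f.entity == entity
        and f.start_year <= year
        and (f.end_year is None or year <= f.end_year)
    ]


# Hypothetical toy knowledge base; the 2008 end year is an assumption
# made for illustration, not a value taken from the paper.
KB = [
    ContextualFact("Obama", "role", "Senator", 1996, 2008),
    ContextualFact("Obama", "role", "US president", 2009, None),
]

if __name__ == "__main__":
    for year in (1996, 2010):
        roles = [f.value for f in facts_in_context(KB, "Obama", year)]
        print(year, roles)
```

A fuller treatment would presumably also index facts by topic and location, the other context dimensions named in the abstract; the sketch covers time only to stay short.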

Keywords

Background Knowledge, Entity Recognition, Soccer Match, Knowledge Repository, Semantic Enrichment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Andrei Tamilin (1)
  • Bernardo Magnini (1)
  • Luciano Serafini (1)
  • Christian Girardi (1)
  • Mathew Joseph (1)
  • Roberto Zanoli (1)

  1. FBK, Center for Information Technology - IRST, Povo di Trento, Italy
