Context-Driven Semantic Enrichment of Italian News Archive

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 6088)

Abstract

Semantic enrichment of textual data is the operation of linking mentions to the entities they refer to and then enriching those entities with the background knowledge about them available in one or more knowledge bases (or on the entire web). Information about the context in which a mention occurs (e.g., the time, the topic, and the place to which the text relates) is a critical resource for correct semantic enrichment, for two reasons. First, without context, mentions are “too little text” to refer unambiguously to a single entity. Second, knowledge about entities is itself context dependent (e.g., when speaking about the political life of Illinois during 1996, Obama is a Senator, whereas since 2009 Obama is the US president). In this paper, we describe a concrete approach to context-driven semantic enrichment built upon four core sub-tasks: detection of mentions in text (i.e., finding references to people, locations, and organizations); determination of the context of discourse of the text; identification of the referred entities in the knowledge base; and enrichment of each entity with the knowledge relevant to the context. Such an approach also requires contextualized background knowledge. To address this, we propose a customization of Sesame, one of the state-of-the-art knowledge repositories, to support representation of and reasoning with contextualized knowledge. The approach has been fully implemented in a system that has been deployed and applied to the textual archive of the local Italian newspaper “L’Adige”, covering the decade from 1999 to 2009.
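The idea of context-dependent knowledge can be illustrated with a minimal Python sketch. It is not the system described in the paper, which relies on a customized Sesame repository; the names `Context`, `Fact`, `FACTS`, and `enrich` are hypothetical, and the toy data simply mirrors the Obama example given in the abstract. Each fact is stored together with the context (topic, location, time span) in which it holds, and enrichment returns only the facts whose context covers the context of the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical context: a topic, a location, and an inclusive year range."""
    topic: str
    location: str
    start_year: int
    end_year: Optional[int] = None  # None means the context is still open

    def covers(self, topic: str, location: str, year: int) -> bool:
        # A context covers a text if topic and location match
        # and the year falls inside the (possibly open-ended) range.
        in_time = self.start_year <= year and (
            self.end_year is None or year <= self.end_year
        )
        return self.topic == topic and self.location == location and in_time


@dataclass(frozen=True)
class Fact:
    """A single statement about an entity, valid only inside `context`."""
    entity: str
    prop: str
    value: str
    context: Context


# Toy data mirroring the example in the abstract (assumed, illustrative only).
FACTS: List[Fact] = [
    Fact("Barack Obama", "role", "Senator",
         Context("politics", "Illinois", 1996, 1996)),
    Fact("Barack Obama", "role", "US president",
         Context("politics", "US", 2009)),
]


def enrich(entity: str, topic: str, location: str, year: int,
           facts: List[Fact] = FACTS) -> Dict[str, str]:
    """Return the properties of `entity` that hold in the given context."""
    return {
        f.prop: f.value
        for f in facts
        if f.entity == entity and f.context.covers(topic, location, year)
    }


if __name__ == "__main__":
    print(enrich("Barack Obama", "politics", "Illinois", 1996))
    print(enrich("Barack Obama", "politics", "US", 2010))
```

In this sketch a context must match exactly on topic and location; a fuller treatment would presumably also handle relations between contexts (for example, a narrower location inside a broader one), which is left out here.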

Keywords

  • Background Knowledge
  • Entity Recognition
  • Soccer Match
  • Knowledge Repository
  • Semantic Enrichment

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tamilin, A., Magnini, B., Serafini, L., Girardi, C., Joseph, M., Zanoli, R. (2010). Context-Driven Semantic Enrichment of Italian News Archive. In: , et al. The Semantic Web: Research and Applications. ESWC 2010. Lecture Notes in Computer Science, vol 6088. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13486-9_25

  • DOI: https://doi.org/10.1007/978-3-642-13486-9_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13485-2

  • Online ISBN: 978-3-642-13486-9

  • eBook Packages: Computer Science, Computer Science (R0)