An Architecture for Data and Knowledge Acquisition for the Semantic Web: The AGROVOC Use Case

  • Maria Teresa Pazienza
  • Armando Stellato
  • Alexandra Gabriela Tudorache
  • Andrea Turbati
  • Flaminia Vagnoni
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7567)

Abstract

We are surrounded by ever-growing volumes of unstructured and weakly structured information, and for a human being, domain expert or not, it is nearly impossible to read, understand and categorize such information in a reasonable amount of time. Moreover, different user categories have different expectations: end users need easy-to-use tools and services for specific tasks; knowledge engineers require robust tools for knowledge acquisition, knowledge categorization and semantic resource development; and semantic application developers demand flexible frameworks for fast, easy and standardized development of complex applications. This work is an experience report on the use of the CODA framework for rapid prototyping and deployment of knowledge acquisition systems for RDF. The system integrates independent NLP tools and custom libraries complying with UIMA standards. In our experiment, a document set was processed to populate the AGROVOC thesaurus with two new relationships.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Maria Teresa Pazienza (1)
  • Armando Stellato (1)
  • Alexandra Gabriela Tudorache (1)
  • Andrea Turbati (1)
  • Flaminia Vagnoni (1)

  1. University of Rome Tor Vergata, Rome, Italy