Abstract
We are surrounded by ever-growing volumes of unstructured and weakly structured information, and for a human being, domain expert or not, it is nearly impossible to read, understand and categorize such information in a reasonable amount of time. Moreover, different user categories have different expectations: end users need easy-to-use tools and services for specific tasks; knowledge engineers require robust tools for knowledge acquisition, knowledge categorization and semantic resource development; and semantic application developers demand flexible frameworks for fast, easy and standardized development of complex applications. This work is an experience report on the use of the CODA framework for rapid prototyping and deployment of knowledge acquisition systems for RDF. The system integrates independent NLP tools and custom libraries that comply with UIMA standards. In our experiment, a document set was processed to populate the AGROVOC thesaurus with two new relationships.
Keywords
- Knowledge Acquisition
- Resource Description Framework
- Relation Extraction
- Ontology Development
- Negative Sentence
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pazienza, M.T., Stellato, A., Tudorache, A.G., Turbati, A., Vagnoni, F. (2012). An Architecture for Data and Knowledge Acquisition for the Semantic Web: The AGROVOC Use Case. In: Herrero, P., Panetto, H., Meersman, R., Dillon, T. (eds) On the Move to Meaningful Internet Systems: OTM 2012 Workshops. OTM 2012. Lecture Notes in Computer Science, vol 7567. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33618-8_58
DOI: https://doi.org/10.1007/978-3-642-33618-8_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33617-1
Online ISBN: 978-3-642-33618-8
eBook Packages: Computer Science, Computer Science (R0)