Abstract
This chapter presents an example of a software architecture, developed by the authors, for performing Ontology Based Information Extraction (OBIE) using an arbitrary ontology. The goal of the architecture is to allow applications to be deployed for arbitrary domains without reprogramming the system. To that end, one or more human operators define the semantics of the application and provide some examples of ontology concepts in target texts; the system then learns how to extract information according to the defined ontology.
An instantiation of the proposed architecture using freely available, high-performance software tools is also presented. The instantiation processes texts in Portuguese, a natural language that was not the original target of most of the tools, and the chapter shows and discusses how to prepare the tools for languages other than those supported out of the box.
Copyright information
© 2015 The Authors
About this chapter
Cite this chapter
Rodrigues, M., Teixeira, A. (2015). Extracting Relevant Information Using a Given Semantic. In: Advanced Applications of Natural Language Processing for Performing Information Extraction. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-15563-0_4
DOI: https://doi.org/10.1007/978-3-319-15563-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15562-3
Online ISBN: 978-3-319-15563-0
eBook Packages: Engineering, Engineering (R0)