Extracting Relevant Information Using a Given Semantic

Rodrigues, Mário; Teixeira, António

doi:10.1007/978-3-319-15563-0_4

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter presents an example software architecture, developed by the authors, for performing Ontology-Based Information Extraction (OBIE) with an arbitrary ontology. The goal of the architecture is to allow applications to be deployed for arbitrary domains without reprogramming the system. To that end, one or more human operators define the semantics of the application and provide some examples of ontology concepts in the target texts; the system then learns how to extract information according to the defined ontology.

An instantiation of the proposed architecture using freely available, high-performance software tools is also presented. This instantiation processes texts in Portuguese, a natural language that was not the original target of most of the tools, and the chapter shows and discusses how to prepare the tools for languages other than those supported out of the box.

Author information

Authors and Affiliations

ESTGA/IEETA, University of Aveiro, Aveiro, Portugal
Mário Rodrigues
DETI/IEETA, University of Aveiro, Aveiro, Portugal
António Teixeira

About this chapter

Cite this chapter

Rodrigues, M., Teixeira, A. (2015). Extracting Relevant Information Using a Given Semantic. In: Advanced Applications of Natural Language Processing for Performing Information Extraction. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-15563-0_4

DOI: https://doi.org/10.1007/978-3-319-15563-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15562-3
Online ISBN: 978-3-319-15563-0
eBook Packages: Engineering, Engineering (R0)
