Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter presents an example software architecture, developed by the authors, for performing Ontology-Based Information Extraction (OBIE) with an arbitrary ontology. The goal of the architecture is to allow applications to be deployed for arbitrary domains without reprogramming the system. To that end, human operators define the semantics of the application and provide some examples of ontology concepts in target texts; the system then learns how to extract information according to the defined ontology.
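The workflow described above (an operator names the ontology concepts, labels a few examples in target texts, and the system generalizes from them) can be illustrated with a deliberately small Python sketch. This is not the authors' architecture: the function names, the concept names, the sample sentences, and the "word before the mention" learning rule are all assumptions chosen only to make the idea concrete.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercased word tokens; a real system would use a language-specific tokenizer.
    return re.findall(r"\w+", text.lower())

def learn_cues(examples, concepts):
    """Map each ontology concept to the words seen right before its labeled mentions."""
    cues = defaultdict(set)
    for text, mention, concept in examples:
        if concept not in concepts:
            raise ValueError(f"unknown concept: {concept}")
        tokens = tokenize(text)
        target = tokenize(mention)
        # Start at 1 so every match has a preceding word to record as a cue.
        for i in range(1, len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                cues[concept].add(tokens[i - 1])
    return cues

def extract(text, cues):
    """Return (concept, word) pairs for words that directly follow a learned cue."""
    tokens = tokenize(text)
    found = []
    for prev, word in zip(tokens, tokens[1:]):
        for concept, words in cues.items():
            if prev in words:
                found.append((concept, word))
    return found

# Hypothetical ontology concepts and operator-provided examples (illustrative only).
CONCEPTS = {"City", "Person"}
EXAMPLES = [
    ("A reunião decorreu em Aveiro.", "Aveiro", "City"),
    ("O presidente Silva abriu a sessão.", "Silva", "Person"),
]

CUES = learn_cues(EXAMPLES, CONCEPTS)
RESULT = extract("O presidente Costa esteve em Lisboa.", CUES)
print(RESULT)
```

Changing the domain here means editing only `CONCEPTS` and `EXAMPLES`, not the functions, which is the property the architecture aims for. A single preceding word is a very weak cue, so a realistic system would rely on much richer linguistic evidence.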

An instantiation of the proposed architecture using freely available, high-performance software tools is also presented. The instantiation processes texts in Portuguese, a natural language that was not the original target of most of the tools, and the chapter shows and discusses how to prepare the tools for languages other than those supported out of the box.

Notes

  1. http://www.linguateca.pt/floresta/CoNLL-X/

  2. http://ftp.jaist.ac.jp/pub/sourceforge/a/ak/aktivemedia/



Copyright information

© 2015 The Authors

About this chapter

Cite this chapter

Rodrigues, M., Teixeira, A. (2015). Extracting Relevant Information Using a Given Semantic. In: Advanced Applications of Natural Language Processing for Performing Information Extraction. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-15563-0_4

  • DOI: https://doi.org/10.1007/978-3-319-15563-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15562-3

  • Online ISBN: 978-3-319-15563-0

  • eBook Packages: Engineering (R0)
