DptOIE: a Portuguese open information extraction based on dependency analysis

Artificial Intelligence Review

Abstract

It is estimated that more than 80% of the information on the Web is stored in textual form, and it has become increasingly difficult for humans to sort through the daily influx of data and extract useful information from it. To automate this process, open information extraction (OIE) methods have been proposed, which extract facts from large textual bases. Although most OIE methods were initially developed for English, recent literature increasingly recognizes the importance of developing methods for other languages, such as Portuguese. OIE methods based on hand-crafted rules and shallow syntactic analysis have performed well for English. Methods based on similar approaches for Portuguese, however, have not achieved equivalent success. We believe that the shallow syntactic patterns previously explored in the literature do not cover important aspects of Portuguese syntax. For this reason, we propose DptOIE, a method that combines a new set of syntax-based rules, applied over dependency parses with a depth-first search (DFS) algorithm, with a set of grammar-based rules that cover specific syntactic phenomena of the language. DptOIE was compared against state-of-the-art OIE systems for Portuguese and obtained favorable results both in our empirical evaluation and in the IberLEF evaluation track of OIE systems for Portuguese. Furthermore, we believe our method can be easily adapted to other Romance languages related to Portuguese.
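
To make the idea of traversing a dependency parse with depth-first search more concrete, the following minimal Python sketch may help. It is not the DptOIE rule set: the hand-written toy parse, the helper names (`Token`, `children_of`, `subtree_text`, `extract_triples`), and the single subject-verb-object rule are all assumptions made for illustration, using Universal Dependencies relation labels (`nsubj`, `obj`).

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    head: int     # index of the governing token, 0 for the root
    deprel: str   # Universal Dependencies relation label


def children_of(tokens, head_idx):
    """Return the direct dependents of the token at position head_idx."""
    return [t for t in tokens if t.head == head_idx]


def subtree_text(tokens, root):
    """Collect the subtree under `root` with an iterative DFS, in sentence order."""
    stack, seen = [root], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(children_of(tokens, node.idx))
    return " ".join(t.form for t in sorted(seen, key=lambda t: t.idx))


def extract_triples(tokens):
    """Hypothetical toy rule: a head with nsubj and obj children yields (arg1, rel, arg2)."""
    triples = []
    for tok in tokens:
        kids = children_of(tokens, tok.idx)
        subj = [k for k in kids if k.deprel == "nsubj"]
        obj = [k for k in kids if k.deprel == "obj"]
        if subj and obj:
            triples.append((subtree_text(tokens, subj[0]), tok.form,
                            subtree_text(tokens, obj[0])))
    return triples


# Hand-written parse of "O menino comprou um livro" ("The boy bought a book").
SENTENCE = [
    Token(1, "O", 2, "det"),
    Token(2, "menino", 3, "nsubj"),
    Token(3, "comprou", 0, "root"),
    Token(4, "um", 5, "det"),
    Token(5, "livro", 3, "obj"),
]

TRIPLES = extract_triples(SENTENCE)
print(TRIPLES)
```

In this sketch the DFS only gathers the words that belong to each argument; a real system would obtain the parse from a dependency parser and would need many more rules to handle phenomena such as coordination, subordinate clauses, or omitted subjects.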


Notes

  1. https://www.internetlivestats.com/ (03/24/2021).

  2. https://www.statista.com/statistics/262946/share-of-the-most-common-languages-on-the-internet/.

  3. http://www.internetworldstats.com/stats7.htm (03/24/2021).

  4. ArgOE and DepOE identify clause-like structures in the sentence by applying manually crafted rules to the structure of the dependency trees. Their approach is more rigid in handling the rich variation in syntactic realizations of semantic relations and information structures.

  5. https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2515 (accessed 03/24/2021).

  6. http://universaldependencies.org/format.html.

  7. https://github.com/citiususc/Linguakit (03/24/2021).

  8. Available at: https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2515 (accessed 03/24/2021).

  9. An improved version with a Portuguese tokenizer has been developed.

  10. http://www.linguateca.pt/cetenfolha/.

  11. http://formas.ufba.br.

  12. https://github.com/FORMAS/DptOIE.

Acknowledgements

We would like to thank FAPESB and CAPES Finance Code 001 for their financial support.

Author information

Corresponding author

Correspondence to Daniela Barreiro Claro.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Oliveira, L., Claro, D.B. & Souza, M. DptOIE: a Portuguese open information extraction based on dependency analysis. Artif Intell Rev 56, 7015–7046 (2023). https://doi.org/10.1007/s10462-022-10349-4

  • DOI: https://doi.org/10.1007/s10462-022-10349-4
