Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines

Jaradeh, Mohamad Yaser; Singh, Kuldeep; Stocker, Markus; Both, Andreas; Auer, Sören

doi:10.1007/978-3-030-74296-6_19

Conference paper
First Online: 11 May 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12706)

Abstract

We propose Plumber, the first framework that brings together the research community’s disjoint information extraction (IE) efforts. The Plumber architecture comprises 33 reusable components for various knowledge graph (KG) information extraction subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable information extraction pipelines and offers 264 distinct pipelines overall. We study the optimization problem of choosing suitable pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting KG triples using standard datasets over two KGs: DBpedia and the Open Research Knowledge Graph (ORKG). Our results demonstrate the effectiveness of Plumber in dynamically generating KG information extraction pipelines, outperforming all baselines regardless of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components, and discuss their limitations.
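
The idea of composing pipelines from interchangeable components and then choosing one per input sentence can be illustrated with a minimal sketch. This is not Plumber's implementation: the component names, the `enumerate_pipelines` and `select_pipeline` helpers, and the `score` callable (a stand-in for the trained transformer-based classifier) are all hypothetical, and the toy registry does not reproduce the 33 components or 264 pipelines reported above.

```python
from itertools import product

# Hypothetical registry: subtask -> interchangeable components.
# All names are placeholders, not Plumber's actual component list.
COMPONENTS = {
    "coreference_resolution": ["coref_a", "coref_b"],
    "entity_linking": ["el_a", "el_b", "el_c"],
    "relation_extraction": ["re_a", "re_b"],
}


def enumerate_pipelines(components):
    """Return every pipeline that picks exactly one component per subtask.

    Each pipeline is a tuple of (subtask, component) pairs in subtask order.
    """
    stages = list(components)
    choices = product(*(components[stage] for stage in stages))
    return [tuple(zip(stages, choice)) for choice in choices]


def select_pipeline(sentence, pipelines, score):
    """Pick the pipeline with the highest score for the given sentence.

    `score(sentence, pipeline)` is an assumed stand-in for a trained
    classifier that maps sentence embeddings to a pipeline preference.
    """
    return max(pipelines, key=lambda pipeline: score(sentence, pipeline))


pipelines = enumerate_pipelines(COMPONENTS)


# Toy scoring rule used only for illustration: prefer pipelines using "el_c".
def toy_score(sentence, pipeline):
    return 1.0 if ("entity_linking", "el_c") in pipeline else 0.0


best = select_pipeline("Some input sentence.", pipelines, toy_score)
```

In this toy setting the number of pipelines is simply the product of the component counts per subtask. A real system would replace `toy_score` with a learned model and might constrain which components can be chained together.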

Acknowledgements

This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the TIB Leibniz Information Centre for Science and Technology.

Author information

Authors and Affiliations

L3S Research Center, Leibniz University Hannover, Hanover, Germany
Mohamad Yaser Jaradeh
Zerotha-Research and Cerence GmbH, Aachen, Germany
Kuldeep Singh
TIB Leibniz Information Centre for Science and Technology, Hanover, Germany
Markus Stocker & Sören Auer
Anhalt University of Applied Sciences, Bernburg, Germany
Andreas Both

Corresponding author

Correspondence to Mohamad Yaser Jaradeh.

Editor information

Editors and Affiliations

Dipartimento di Elettronica, Politecnico di Milano, Milan, Italy
Marco Brambilla
E2S UPPA, LIUPPA, Université de Pau et des Pays de l’Adour, Anglet, France
Richard Chbeir
Econometric Institute, Erasmus University Rotterdam, Rotterdam, The Netherlands
Flavius Frasincar
Inria Saclay-Île-de-France, Institut Polytechnique de Paris, Palaiseau, France
Ioana Manolescu

About this paper

Cite this paper

Jaradeh, M.Y., Singh, K., Stocker, M., Both, A., Auer, S. (2021). Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines. In: Brambilla, M., Chbeir, R., Frasincar, F., Manolescu, I. (eds) Web Engineering. ICWE 2021. Lecture Notes in Computer Science, vol 12706. Springer, Cham. https://doi.org/10.1007/978-3-030-74296-6_19

DOI: https://doi.org/10.1007/978-3-030-74296-6_19
Published: 11 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74295-9
Online ISBN: 978-3-030-74296-6
eBook Packages: Computer Science, Computer Science (R0)