Abstract
We propose Plumber, the first framework that brings together the research community's disjoint information extraction (IE) efforts. The Plumber architecture comprises 33 reusable components for various knowledge graph (KG) information extraction subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable information extraction pipelines and offers 264 distinct pipelines in total. We study the optimization problem of choosing a suitable pipeline for a given input sentence. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and selects an appropriate pipeline. We study the efficacy of Plumber for extracting KG triples using standard datasets over two KGs: DBpedia and the Open Research Knowledge Graph (ORKG). Our results demonstrate the effectiveness of Plumber in dynamically generating KG information extraction pipelines, outperforming all baselines irrespective of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components, and discuss their limitations.
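To make the idea of composing pipelines from reusable components concrete, the following is a minimal sketch, assuming a pipeline picks exactly one component per subtask. The registry, the component names, and the per-subtask counts are hypothetical placeholders and do not reflect Plumber's actual 33 components or its 264 pipelines; only the three subtask names come from the abstract.

```python
from itertools import product

# Hypothetical registry: subtask names follow the abstract, but the
# component names and counts are placeholders, not Plumber's real ones.
COMPONENTS = {
    "coreference_resolution": ["coref_a", "coref_b"],
    "entity_linking": ["el_a", "el_b", "el_c"],
    "relation_extraction": ["re_a", "re_b"],
}


def enumerate_pipelines(components):
    """Return every pipeline that picks one component per subtask."""
    stages = list(components)
    return [
        dict(zip(stages, combo))
        for combo in product(*components.values())
    ]


pipelines = enumerate_pipelines(COMPONENTS)
print(len(pipelines))  # 2 * 3 * 2 = 12 candidate pipelines
```

Under this assumption the number of candidate pipelines is the product of the per-subtask component counts, which is why selecting one pipeline per input sentence becomes a classification problem over a fixed set of labels.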
Acknowledgements
This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the TIB Leibniz Information Centre for Science and Technology.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Jaradeh, M.Y., Singh, K., Stocker, M., Both, A., Auer, S. (2021). Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines. In: Brambilla, M., Chbeir, R., Frasincar, F., Manolescu, I. (eds) Web Engineering. ICWE 2021. Lecture Notes in Computer Science, vol 12706. Springer, Cham. https://doi.org/10.1007/978-3-030-74296-6_19
DOI: https://doi.org/10.1007/978-3-030-74296-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-74295-9
Online ISBN: 978-3-030-74296-6
eBook Packages: Computer Science, Computer Science (R0)