Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12706)

Abstract

We propose Plumber, the first framework that brings together the research community’s disjoint information extraction (IE) efforts. The Plumber architecture comprises 33 reusable components for various knowledge graph (KG) information extraction subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable information extraction pipelines and offers 264 distinct pipelines overall. We study the optimization problem of choosing a suitable pipeline for a given input sentence. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and selects an appropriate pipeline. We study the efficacy of Plumber for extracting KG triples using standard datasets over two KGs: DBpedia and the Open Research Knowledge Graph (ORKG). Our results demonstrate the effectiveness of Plumber in dynamically generating KG information extraction pipelines, outperforming all baselines regardless of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among the integrated components, and discuss their limitations.
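
The idea of composing pipelines from interchangeable components and then choosing one per input sentence can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the component names, the three-stage layout, and the toy scoring rule are hypothetical and are not Plumber's actual 33 components, its 264 pipelines, or its transformer-based classifier. The sketch only shows the general pattern of enumerating one component per stage and picking the highest-scoring combination.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: stage name -> interchangeable component names.
# Names and counts are illustrative only, not Plumber's real components.
COMPONENTS: Dict[str, List[str]] = {
    "coreference_resolution": ["coref_a", "coref_b"],
    "entity_linking": ["el_a", "el_b", "el_c"],
    "relation_extraction": ["re_a", "re_b"],
}

Pipeline = Tuple[str, ...]


def enumerate_pipelines(components: Dict[str, List[str]]) -> List[Pipeline]:
    """Build every pipeline that uses exactly one component per stage."""
    stages = list(components)  # dicts preserve insertion order
    return list(product(*(components[stage] for stage in stages)))


def has_pronoun(sentence: str) -> bool:
    """Very rough pronoun check, used only by the toy scorer below."""
    tokens = sentence.lower().replace(".", "").split()
    return any(token in {"he", "she", "it", "they"} for token in tokens)


def toy_score(sentence: str, pipeline: Pipeline) -> float:
    """Stand-in for a learned classifier (assumption, not the paper's model).

    Rewards pipelines that use the hypothetical 'coref_b' component exactly
    when the sentence contains a pronoun.
    """
    return 1.0 if ("coref_b" in pipeline) == has_pronoun(sentence) else 0.0


def select_pipeline(
    sentence: str,
    pipelines: List[Pipeline],
    score: Callable[[str, Pipeline], float],
) -> Pipeline:
    """Return the highest-scoring pipeline; ties go to the first candidate."""
    return max(pipelines, key=lambda pipeline: score(sentence, pipeline))


if __name__ == "__main__":
    pipelines = enumerate_pipelines(COMPONENTS)
    print(len(pipelines), "candidate pipelines")
    print(select_pipeline("She founded it.", pipelines, toy_score))
```

In this toy registry the number of candidates is simply the product of the per-stage counts (2 × 3 × 2). A learned selector would replace `toy_score` with a model that maps the sentence to a pipeline label.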

Acknowledgements

This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the TIB Leibniz Information Centre for Science and Technology.

Author information

Corresponding author

Correspondence to Mohamad Yaser Jaradeh.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Jaradeh, M.Y., Singh, K., Stocker, M., Both, A., Auer, S. (2021). Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines. In: Brambilla, M., Chbeir, R., Frasincar, F., Manolescu, I. (eds) Web Engineering. ICWE 2021. Lecture Notes in Computer Science, vol 12706. Springer, Cham. https://doi.org/10.1007/978-3-030-74296-6_19

  • DOI: https://doi.org/10.1007/978-3-030-74296-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74295-9

  • Online ISBN: 978-3-030-74296-6

  • eBook Packages: Computer Science, Computer Science (R0)
