From Low Resource Information Extraction to Identifying Influential Nodes in Knowledge Graphs

Cai, Erica; Simek, Olga; Miller, Benjamin A.; Sullivan, Danielle; Young, Evan; Smith, Christopher L.

doi:10.1007/978-3-031-57515-0_2

Erica Cai, Olga Simek, Benjamin A. Miller (ORCID: orcid.org/0000-0002-1649-1401), Danielle Sullivan, Evan Young & Christopher L. Smith

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Included in the following conference series:

International Conference on Complex Networks

Abstract

We propose a pipeline for identifying important entities from intelligence reports that constructs a knowledge graph, where nodes correspond to entities of fine-grained types (e.g. traffickers) extracted from the text and edges correspond to extracted relations between entities (e.g. cartel membership). The important entities in intelligence reports then map to central nodes in the knowledge graph. We introduce a novel method that extracts fine-grained entities in a few-shot setting (few labeled examples), given limited resources available to label the frequently changing entity types that intelligence analysts are interested in. It outperforms other state-of-the-art methods. Next, we identify challenges facing previous evaluations of zero-shot (no labeled examples) methods for extracting relations, affecting the step of populating edges. Finally, we explore the utility of the pipeline: given the goal of identifying important entities, we evaluate the impact of relation extraction errors on the identification of central nodes in several real and synthetic networks. The impact of these errors varies significantly by graph topology, suggesting that confidence in measurements based on automatically extracted relations should depend on observed network features.
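
The last step of the abstract, checking how relation extraction errors affect the identification of central nodes, can be illustrated with a small sketch. It is not the authors' implementation. It assumes degree centrality as the importance measure, models extraction errors only as missed relations, and uses made-up triples; the names `TRIPLES`, `build_graph`, `degree_centrality`, `top_k`, and `drop_edges` are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical (head, relation, tail) triples standing in for extracted relations.
TRIPLES = [
    ("A", "member_of", "Cartel"),
    ("B", "member_of", "Cartel"),
    ("C", "member_of", "Cartel"),
    ("A", "met_with", "B"),
    ("D", "supplies", "A"),
]


def build_graph(triples):
    """Build undirected adjacency sets; relation labels are ignored in this sketch."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by (n - 1), an assumed centrality measure."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def top_k(adj, k):
    """Return the k most central nodes, breaking ties by node name."""
    scores = degree_centrality(adj)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _score in ranked[:k]]


def drop_edges(triples, miss_rate, rng):
    """Simulate missed relations: each triple is dropped with probability miss_rate."""
    return [t for t in triples if rng.random() >= miss_rate]


clean = build_graph(TRIPLES)
noisy = build_graph(drop_edges(TRIPLES, miss_rate=0.3, rng=random.Random(0)))

clean_top = top_k(clean, 2)
noisy_top = top_k(noisy, 2)

# Fraction of the clean top-2 nodes that survive in the noisy top-2.
overlap = len(set(clean_top) & set(noisy_top)) / 2
print(clean_top, noisy_top, overlap)
```

Repeating the comparison over many random seeds and over graphs with different topologies would give a rough picture of how sensitive a centrality ranking is to extraction errors, which is the kind of question the abstract raises.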

Acknowledgements

This material is based upon work supported by the Department of Defense under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Department of Defense.

Author information

Authors and Affiliations

MIT Lincoln Laboratory, Lexington, MA, 02421, USA
Erica Cai, Olga Simek, Benjamin A. Miller, Danielle Sullivan, Evan Young & Christopher L. Smith
University of Massachusetts Amherst, Amherst, MA, 01003, USA
Erica Cai

Corresponding author

Correspondence to Erica Cai.

Editor information

Editors and Affiliations

Department of Computer Science, University of Exeter, Exeter, UK
Federico Botta
Department of Data Science, Northeastern University London, London, UK
Mariana Macedo
Department of Computer Science, University of Exeter, Exeter, UK
Hugo Barbosa
Department of Computer Science, University of Exeter, Exeter, UK
Ronaldo Menezes

About this paper

Cite this paper

Cai, E., Simek, O., Miller, B.A., Sullivan, D., Young, E., Smith, C.L. (2024). From Low Resource Information Extraction to Identifying Influential Nodes in Knowledge Graphs. In: Botta, F., Macedo, M., Barbosa, H., Menezes, R. (eds) Complex Networks XV. CompleNet-Live 2024. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-031-57515-0_2

DOI: https://doi.org/10.1007/978-3-031-57515-0_2
Published: 14 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57514-3
Online ISBN: 978-3-031-57515-0
eBook Packages: Mathematics and Statistics (R0)
