A Minwise Hashing Method for Addressing Relationship Extraction from Text

Batista, David S.; Silva, Rui; Martins, Bruno; Silva, Mário J.

doi:10.1007/978-3-642-41154-0_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8181)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Relationship extraction concerns the detection and classification of semantic relationships between entities mentioned in a collection of textual documents. This paper proposes a simple, on-line approach to the automated extraction of semantic relations, based on nearest neighbor classification and leveraging a minwise hashing method to measure the similarity between relationship instances. Experiments with three datasets that are commonly used for benchmarking relationship extraction methods show promising results, both in terms of classification performance and scalability.
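
To make the idea in the abstract concrete, the following is a minimal sketch of nearest neighbor classification over minwise hashing signatures. It is an illustration only, not the authors' implementation: the feature representation (sets of string features such as `w:founded`), the use of seeded BLAKE2b as the hash family, the signature length of 64, and the names `minhash_signature`, `estimated_jaccard`, and `classify` are all assumptions made for this example. It relies on the standard property that the fraction of positions at which two minwise signatures agree estimates the Jaccard similarity of the underlying sets.

```python
import hashlib
from collections import Counter


def _hash(token: str, seed: int) -> int:
    """Seeded 64-bit hash of a feature string (assumed hash family)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes=64):
    """Per-seed minimum hash value over a non-empty set of string features."""
    return [min(_hash(f, seed) for f in features) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def classify(query_features, labelled_examples, k=3, num_hashes=64):
    """Majority vote among the k labelled examples most similar to the query.

    labelled_examples is a list of (feature_set, label) pairs. Signatures are
    recomputed on every call to keep the sketch short; a real system would
    compute and index them once.
    """
    query_sig = minhash_signature(query_features, num_hashes)
    scored = sorted(
        (
            (estimated_jaccard(query_sig, minhash_signature(feats, num_hashes)), label)
            for feats, label in labelled_examples
        ),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy training data: feature sets describing the text
    # between two entity mentions, paired with a relation label.
    train = [
        ({"w:founded", "w:by", "pos:VERB"}, "founder-of"),
        ({"w:born", "w:in", "pos:ADP"}, "born-in"),
    ]
    print(classify({"w:founded", "w:by", "pos:VERB"}, train, k=1))
```

This sketch compares the query against every labelled example, which is linear in the size of the training set. Scalable variants typically bucket signatures (for instance with locality-sensitive hashing bands) so that only likely neighbors are compared; whether and how the paper does this is not stated in the abstract.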

Author information

Authors and Affiliations

Instituto Superior Técnico and INESC-ID, Lisboa, Portugal
David S. Batista, Rui Silva, Bruno Martins & Mário J. Silva

Editor information

Editors and Affiliations

The University of New South Wales, Sydney, NSW, Australia
Xuemin Lin
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
AT&T Labs-Research, Florham Park, NJ, USA
Divesh Srivastava
Victoria University, Melbourne, Australia
Guangyan Huang

About this paper

Cite this paper

Batista, D.S., Silva, R., Martins, B., Silva, M.J. (2013). A Minwise Hashing Method for Addressing Relationship Extraction from Text. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41154-0_16

DOI: https://doi.org/10.1007/978-3-642-41154-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41153-3
Online ISBN: 978-3-642-41154-0
eBook Packages: Computer Science, Computer Science (R0)