Recursive Sequence Mining to Discover Named Entity Relations

Cellier, Peggy; Charnois, Thierry; Plantevit, Marc; Crémilleux, Bruno

doi:10.1007/978-3-642-13062-5_5

Peggy Cellier¹⁹,
Thierry Charnois¹⁹,
Marc Plantevit²⁰ &
…
Bruno Crémilleux¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6065))

Included in the following conference series:

International Symposium on Intelligent Data Analysis

886 Accesses

Abstract

Extraction of named entity relations in textual data is an important challenge in natural language processing. For that purpose, we propose a new data mining approach based on recursive sequence mining. The contribution of this work is twofold. First, we present a method based on a cross-fertilization of sequence mining under constraints and recursive pattern mining to produce a user-manageable set of linguistic information extraction rules. Moreover, unlike most works from the state-of-the-art in natural language processing, our approach does not need syntactic parsing of the sentences neither resource except the training data. Second, we show in practice how to apply the computed rules to detect new relations between named entities, highlighting the interest of hybridization of data mining and natural language processing techniques in the discovery of knowledge. We illustrate our approach with the detection of gene interactions in biomedical literature.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Agrawal, R., Srikant, R.: Mining sequential patterns. In: ICDE. IEEE, Los Alamitos (1995)
Google Scholar
Bunescu, R.C., Mooney, R.J.: A shortest path dependency kernel for relation extraction. In: HLT/EMNLP, pp. 724–731. ACL (2005)
Google Scholar
Cellier, P., Charnois, T., Plantevit, M.: Sequential patterns to discover and characterise biological relations. In: Gelbukh, A. (ed.) CICLing 2010. LNCS, vol. 6008, pp. 537–548. Springer, Heidelberg (2010)
Google Scholar
Crémilleux, B., Soulet, A., Klema, J., Hébert, C., Gandrillon, O.: Discovering knowledge from local patterns in sage data. In: Data Mining and Medical Knowledge Management: Cases and Applications, pp. 251–267. IGI Publishing (2009)
Google Scholar
Fundel, K., Küffner, R., Zimmer, R.: Relex - Relation extraction using dependency parse trees. Bioinformatics 23(3), 365–371 (2007)
Article Google Scholar
Garofalakis, M.N., Rastogi, R., Shim, K.: Spirit: Sequential pattern mining with regular expression constraints. In: Proc. Int. Conf. on Very Large Data Bases, pp. 223–234. Morgan Kaufmann, San Francisco (1999)
Google Scholar
Giuliano, C., Lavelli, A., Romano, L.: Exploiting shallow linguistic information for relation extraction from biomedical literature. In: EACL, pp. 401–408 (2006)
Google Scholar
Hakenberg, J., Plake, C., Royer, L., Strobelt, H., Leser, U., Schroeder, M.: Gene mention normalization and interaction extraction with context models and sentence motifs. Genome biology 9(Suppl. 2), S14 (2008)
Article Google Scholar
Joshi, S., Ramakrishnan, G., Balakrishnan, S., Srinivasan, A.: Information extraction using non-consecutive word sequences. In: Workshop on Text Mining and Link Analysis IJCAI (2007)
Google Scholar
Krallinger, M., Leitner, F., Rodriguez-Penagos, C., Valencia, A.: Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biology 9(Suppl. 2), S4 (2008)
Article Google Scholar
Nanni, M., Rigotti, C.: Extracting trees of quantitative serial episodes. In: Džeroski, S., Struyf, J. (eds.) KDID 2006. LNCS, vol. 4747, pp. 170–188. Springer, Heidelberg (2007)
Chapter Google Scholar
Nédellec, C.: Machine learning for information extraction in genomics - state of the art and perspectives. In: Studies in Fuzziness and Soft Comp. Sirmakessis (2004)
Google Scholar
Ng, R.T., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: ACM SIGMOD (1998)
Google Scholar
Pei, J., Han, J., Lakshmanan, L.V.S.: Mining frequent itemsets with convertible constraints. In: ICDE, pp. 433–442. IEE Computer Society (2001)
Google Scholar
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: Prefixspan: Mining sequential patterns by prefix-projected growth. In: ICDE, pp. 215–224. IEEE Computer Society, Los Alamitos (2001)
Google Scholar
Rosario, B., Hearst, M.A.: Multi-way relation classification: Application to protein-protein interactions. In: HLT/EMNLP, pp. 732–739. ACL (2005)
Google Scholar
Schmid, H.: Probabilistic part-of-speech tagging using decision trees. In: Proc. of Int. Conf. on New Methods in Language Processing (September 1994)
Google Scholar
Schneider, G., Kaljurand, K., Rinaldi, F.: Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources. In: Gelbukh, A. (ed.) CICLing 2009. LNCS, vol. 5449, pp. 406–417. Springer, Heidelberg (2009)
Chapter Google Scholar
Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 3–17. Springer, Heidelberg (1996)
Chapter Google Scholar
Tanabe, L., Xie, N., Thom, L.H., Matten, W., Wilbur, J.: GENETAG: a tagged corpus for gene/protein named entity recognition. BMC Bioinformatics 6, 10 (2005)
Article Google Scholar
Yeh, A., Morgan, A., Colosimo, M., Hirschman, L.: BioCreAtIvE Task 1A: Gene mention finding evaluation. BMC Bioinformatics 6(Suppl. 1), S2 (2005)
Article Google Scholar
Zaki, M.: Spade: An efficient algorithm for mining frequent sequences. Machine Learning 42(1/2), 31–60 (2001)
Article MATH Google Scholar
Zweigenbaum, P., Demner-Fushman, D., Yu, H., Cohen, K.B.: Frontiers of biomedical text mining: current progress. Brief. Bioinform. 8(5), 358–375 (2007)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Université de Caen, GREYC, CNRS, UMR6072, F-14032, France
Peggy Cellier, Thierry Charnois & Bruno Crémilleux
Université Lyon 1, LIRIS, UMR5205, F-69622, France
Marc Plantevit

Authors

Peggy Cellier
View author publications
You can also search for this author in PubMed Google Scholar
Thierry Charnois
View author publications
You can also search for this author in PubMed Google Scholar
Marc Plantevit
View author publications
You can also search for this author in PubMed Google Scholar
Bruno Crémilleux
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science, University of Arizona, 1040 East 4th Street, 85721, Tucson, AZ, USA
Paul R. Cohen
Department of Mathematics, Imperial College London, South Kensington Campus, SW7 2PG, London, UK
Niall M. Adams
Department of Computer and Information Science, University of Konstanz, Box 712, 78457, Konstanz, Germany
Michael R. Berthold

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Cellier, P., Charnois, T., Plantevit, M., Crémilleux, B. (2010). Recursive Sequence Mining to Discover Named Entity Relations. In: Cohen, P.R., Adams, N.M., Berthold, M.R. (eds) Advances in Intelligent Data Analysis IX. IDA 2010. Lecture Notes in Computer Science, vol 6065. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13062-5_5

Download citation

DOI: https://doi.org/10.1007/978-3-642-13062-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13061-8
Online ISBN: 978-3-642-13062-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics