Pattern Mining with Natural Language Processing: An Exploratory Approach

Mendes, Ana Cristina; Antunes, Cláudia

doi:10.1007/978-3-642-03070-3_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5632)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

Pattern mining arises from the need to discover hidden knowledge in very large amounts of data, regardless of the form in which the data is presented. Natural Language Processing (NLP), in turn, arose from the human need to be understood by computers. In this paper we present an exploratory approach that aims to bring together the best of both worlds. Our goal is to discover patterns in linguistically processed texts by combining state-of-the-art NLP tools with traditional pattern mining algorithms.

Articles from a Portuguese newspaper serve as input to the series of tests described in this paper. First, the articles are processed by an NLP chain, which performs a deep linguistic analysis of the text; afterwards, the pattern mining algorithms Apriori and GenPrefixSpan are applied. The results show that sequential pattern mining techniques are applicable to structured textual data, and they also provide evidence about the structure of the language.
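The idea of mining sequential patterns from linguistically processed text can be illustrated with a minimal sketch. The code below is not the authors' GenPrefixSpan; it is a plain PrefixSpan-style miner restricted to single-item sequences, written only to show the general technique. The function name `prefixspan`, the part-of-speech tag set, and the three toy sentences are all assumptions made for this illustration; they do not come from the paper or from its NLP chain.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sequence = List[str]
Pattern = Tuple[str, ...]


def prefixspan(sequences: List[Sequence], min_support: int) -> Dict[Pattern, int]:
    """Return each sequential pattern found in at least min_support sequences.

    A pattern matches a sequence when its items appear in the same order,
    not necessarily contiguously. Support is the number of matching sequences.
    """
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # `projected` holds (sequence index, start offset) pairs: for each
        # sequence that contains `prefix`, the suffix after its first match.
        first_pos: Dict[str, Dict[int, int]] = defaultdict(dict)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                # Keep only the earliest occurrence of each item per sequence.
                first_pos[seq[pos]].setdefault(seq_id, pos)

        for item in sorted(first_pos):
            occurrences = first_pos[item]
            support = len(occurrences)
            if support < min_support:
                continue  # infrequent extension: prune this branch
            pattern = prefix + (item,)
            results[pattern] = support
            # Recurse on the database projected by the extended prefix.
            mine(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical part-of-speech sequences standing in for the output of an
# NLP chain (toy data, invented for this example).
tagged_sentences = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["NOUN", "VERB", "ADV"],
]

patterns = prefixspan(tagged_sentences, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(" -> ".join(pattern), support)
```

With a minimum support of 2, tags that occur in only one toy sentence (such as ADJ) are pruned, while ordered combinations such as NOUN followed by VERB are kept. A real experiment would replace the toy sequences with the annotations produced by the NLP chain and would likely mine richer items than bare tags.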

Author information

Authors and Affiliations

Spoken Language Systems Laboratory - L2F/INESC-ID Instituto Superior Técnico, Technical University of Lisbon, R. Alves Redol, 9 - 2º, 1000-029, Lisboa, Portugal
Ana Cristina Mendes
Department of Computer Science and Engineering Instituto Superior Técnico, Technical University of Lisbon, Av. Rovisco Pais 1, 1049-001, Lisboa, Portugal
Cláudia Antunes

Editor information

Editors and Affiliations

Institut für Bildverarbeitung und angewandte Informatik, Körnerstr. 10, 04107, Leipzig, Germany
Petra Perner

Cite this paper

Mendes, A.C., Antunes, C. (2009). Pattern Mining with Natural Language Processing: An Exploratory Approach. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_20

DOI: https://doi.org/10.1007/978-3-642-03070-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03069-7
Online ISBN: 978-3-642-03070-3
eBook Packages: Computer Science, Computer Science (R0)
