
Pattern Mining with Natural Language Processing: An Exploratory Approach

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5632)

Abstract

Pattern mining derives from the need to discover hidden knowledge in very large amounts of data, regardless of the form in which it is presented. Natural Language Processing (NLP), in turn, arose from the human need to be understood by computers. In this paper we present an exploratory approach that aims to bring together the best of both worlds. Our goal is to discover patterns in linguistically processed texts, using state-of-the-art NLP tools and traditional pattern mining algorithms.

Articles from a Portuguese newspaper are the input to a series of tests described in this paper. First, they are processed by an NLP chain, which performs a deep linguistic analysis of the text; afterwards, the pattern mining algorithms Apriori and GenPrefixSpan are applied. The results show that sequential pattern mining techniques are applicable to structured textual data, and also provide several pieces of evidence about the structure of the language.
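To make the second stage of this pipeline concrete, the following is a minimal sketch of prefix-projected sequential pattern mining over sentences that have already been reduced to part-of-speech tag sequences. It is not the paper's implementation and does not reproduce GenPrefixSpan; it is a simplified PrefixSpan-style routine in which every sequence element is a single item and gaps between matched items are unrestricted. The function name `prefixspan`, the tag labels, and the toy sentences are all assumptions made for illustration.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern (tuple): support} for every frequent subsequence.

    Simplified PrefixSpan sketch (an assumption, not the paper's code):
    each element is a single item and gaps between items are unrestricted.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffix of
        # each sequence that follows the first match of the current prefix.
        support = defaultdict(set)
        for seq_id, start in projected:
            for item in sequences[seq_id][start:]:
                support[item].add(seq_id)

        for item in sorted(support):
            seq_ids = support[item]
            if len(seq_ids) < min_support:
                continue  # infrequent extension, prune this branch
            pattern = prefix + (item,)
            results[pattern] = len(seq_ids)

            # Project every supporting sequence past its first match of item.
            next_projected = []
            for seq_id, start in projected:
                try:
                    pos = sequences[seq_id].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                next_projected.append((seq_id, pos + 1))
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical output of an NLP chain: one tag sequence per sentence.
tagged_sentences = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["NOUN", "VERB", "ADV"],
]

patterns = prefixspan(tagged_sentences, min_support=3)
for pattern, count in sorted(patterns.items()):
    print(" -> ".join(pattern), count)
```

Support here counts the number of sentences that contain the pattern as a subsequence, so a tag that repeats within one sentence is still counted once for that sentence.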

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mendes, A.C., Antunes, C. (2009). Pattern Mining with Natural Language Processing: An Exploratory Approach. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science (LNAI), vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_20

  • DOI: https://doi.org/10.1007/978-3-642-03070-3_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03069-7

  • Online ISBN: 978-3-642-03070-3

  • eBook Packages: Computer Science, Computer Science (R0)
