Discovering Linguistic Patterns Using Sequence Mining

  • Nicolas Béchet
  • Peggy Cellier
  • Thierry Charnois
  • Bruno Crémilleux
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7181)

Abstract

In this paper, we present a method based on data mining techniques to automatically discover linguistic patterns matching appositive qualifying phrases. We develop an algorithm that mines sequential patterns made of itemsets, subject to gap and linguistic constraints. The itemsets allow several kinds of information to be associated with one term. The advantage is that the extracted linguistic patterns are more expressive than the usual sequential patterns. In addition, the constraints make it possible to prune irrelevant patterns automatically. To manage the set of generated patterns, we propose a solution based on a partial ordering, so that a human user can easily validate them as relevant linguistic patterns. We illustrate the efficiency of our approach on two corpora drawn from a newspaper.
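
As a rough illustration of what mining sequential patterns of itemsets under a gap constraint involves, the following minimal Python sketch checks whether a pattern occurs in a sentence and counts its support. It is not the authors' algorithm: the names occurs, support, is_more_general and max_gap, the toy sentences, and the choice of word form plus part-of-speech tag as items are all assumptions made for the example, and the linguistic constraints are not modelled. The generalisation test at the end is one plausible reading of a partial ordering on patterns, not the ordering defined in the paper.

```python
from typing import List, Optional, Sequence, Set

Itemset = Set[str]  # assumed encoding: one token = a set of items (form, POS tag, ...)


def occurs(pattern: Sequence[Itemset], sentence: Sequence[Itemset], max_gap: int) -> bool:
    """True if every itemset of `pattern` is included in some itemset of
    `sentence`, in order, skipping at most `max_gap` tokens between two
    consecutive matches."""

    def extend(p_idx: int, prev_pos: Optional[int]) -> bool:
        if p_idx == len(pattern):
            return True  # all pattern itemsets have been matched
        if prev_pos is None:
            candidates = range(len(sentence))  # first itemset may start anywhere
        else:
            last = min(len(sentence), prev_pos + max_gap + 2)
            candidates = range(prev_pos + 1, last)  # positions allowed by the gap
        return any(
            pattern[p_idx] <= sentence[pos] and extend(p_idx + 1, pos)
            for pos in candidates
        )

    return extend(0, None)


def support(pattern: Sequence[Itemset], database: List[List[Itemset]], max_gap: int) -> int:
    """Number of sentences in `database` that contain `pattern`."""
    return sum(occurs(pattern, sentence, max_gap) for sentence in database)


def is_more_general(p: Sequence[Itemset], q: Sequence[Itemset]) -> bool:
    """Assumed ordering: p is more general than q if p occurs in q with no gap limit."""
    return occurs(p, q, max_gap=len(q))


# Toy corpus (hypothetical): each token carries its word form and a POS tag.
database: List[List[Itemset]] = [
    [{"the", "DET"}, {"minister", "NOUN"}, {",", "PUNCT"},
     {"very", "ADV"}, {"confident", "ADJ"}, {",", "PUNCT"}],
    [{"the", "DET"}, {"mayor", "NOUN"}, {",", "PUNCT"}, {"furious", "ADJ"}],
    [{"a", "DET"}, {"dog", "NOUN"}, {"barks", "VERB"}],
]

# Candidate pattern: a noun, then punctuation, then an adjective.
pattern: List[Itemset] = [{"NOUN"}, {"PUNCT"}, {"ADJ"}]

print(support(pattern, database, max_gap=0))  # 1: only the second sentence matches
print(support(pattern, database, max_gap=1))  # 2: the adverb in the first sentence is skipped
```

In this sketch a pattern would be kept when its support reaches a chosen threshold, and tightening max_gap discards matches whose elements are too far apart, which is one way a constraint can prune candidates.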

Keywords

Partial Order; Sequential Pattern; Data Mining Technique; Support Threshold; Grammatical Category
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nicolas Béchet (1)
  • Peggy Cellier (2)
  • Thierry Charnois (1)
  • Bruno Crémilleux (1)
  1. GREYC, Université de Caen Basse-Normandie, Caen Cedex, France
  2. INSA Rennes/IRISA, Rennes Cedex, France
