Data Mining for Grammatical Inference with Bioinformatics Criteria

  • Vivian F. López
  • Ramiro Aguilar
  • Luis Alonso
  • María N. Moreno
  • Juan M. Corchado
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6077)

Abstract

In this paper we describe theoretical and practical results of a novel data mining process that combines association analysis with classical sequencing algorithms from genomics to generate the grammatical structures of a specific language. We used a compiler generator system to develop a practical application in the area of grammarware, where the concepts of language analysis are applied to other disciplines, such as bioinformatics. The tool allows the complexity of the obtained grammar to be measured automatically from textual data. We present a technique for the incremental discovery of sequential patterns that yields simplified production rules, which are then compacted using bioinformatics criteria to make up a grammar.
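The abstract does not spell out the discovery algorithm, so the following is only a minimal sketch of the general idea of turning repeated sequential patterns into production rules. It assumes a greedy pair-replacement scheme in the style of Re-Pair, which is a stand-in and not the authors' method. The names `infer_rules`, `expand`, and `min_support`, and the `<S>` / `<N1>` nonterminal labels, are hypothetical; the compaction with bioinformatics criteria is not modelled.

```python
from collections import Counter


def infer_rules(sequence, min_support=2):
    """Greedy pair replacement (Re-Pair style), used here as an assumed
    stand-in for incremental sequential-pattern discovery.

    Returns a dict mapping nonterminal names to tuples of symbols.
    "<S>" is the start symbol; "<N1>", "<N2>", ... are discovered patterns.
    """
    symbols = list(sequence)
    rules = {}
    while True:
        # Count every adjacent pair of symbols in the current sequence.
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_support:
            break
        # Introduce a new nonterminal for the most frequent pair.
        name = f"<N{len(rules) + 1}>"
        rules[name] = pair
        # Rewrite the sequence left to right, replacing non-overlapping
        # occurrences of the pair with the new nonterminal.
        rewritten, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                rewritten.append(name)
                i += 2
            else:
                rewritten.append(symbols[i])
                i += 1
        symbols = rewritten
    rules["<S>"] = tuple(symbols)
    return rules


def expand(rules, symbol="<S>"):
    """Expand a symbol back into the terminal string it derives."""
    if symbol not in rules:
        return symbol  # terminal symbol
    return "".join(expand(rules, s) for s in rules[symbol])


if __name__ == "__main__":
    grammar = infer_rules("ACGTACGTACGT")
    for head, body in grammar.items():
        print(head, "->", " ".join(body))
```

Expanding the start symbol with `expand(grammar)` reproduces the input string, which is a quick way to check that the inferred rules are lossless. Greedy pair replacement was chosen only because it is short and well known; any frequent-sequence miner could take its place.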

Keywords

Grammatical Inference; Bioinformatics; Context-Free Grammar; DNA sequential patterns

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Vivian F. López (1)
  • Ramiro Aguilar (1)
  • Luis Alonso (1)
  • María N. Moreno (1)
  • Juan M. Corchado (1)

  1. Departamento Informática y Automática, University of Salamanca, Salamanca