Data Mining for Grammatical Inference with Bioinformatics Criteria

  • Conference paper
Hybrid Artificial Intelligence Systems (HAIS 2010)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 6077))

Abstract

In this paper we describe theoretical and practical results of a novel data mining process that combines association analysis with classical sequencing algorithms from genomics to generate the grammatical structures of a specific language. We used a compiler-generator system to develop a practical application in the area of grammarware, where the concepts of language analysis are applied to other disciplines, such as bioinformatics. The tool automatically measures the complexity of the grammar obtained from textual data. We present a technique for the incremental discovery of sequential patterns that yields simplified production rules, which are then compacted using bioinformatics criteria to make up a grammar.
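To make the idea of turning repeated sequential patterns into production rules concrete, the following is a minimal sketch and not the method of the paper. It assumes a simple pair-replacement scheme in which the most frequent adjacent pair of symbols is repeatedly replaced by a new nonterminal. The function names `infer_rules` and `expand`, the `min_support` threshold, and the `<N1>`/`<S>` naming of nonterminals are all hypothetical choices made for this illustration.

```python
from collections import Counter


def infer_rules(sequence, min_support=2):
    """Build production rules by repeatedly replacing the most frequent
    adjacent pair of symbols with a fresh nonterminal.

    Illustrative assumption only: the paper's own discovery and
    compaction steps are not reproduced here.
    """
    symbols = list(sequence)
    rules = {}
    next_id = 1
    while True:
        # Count adjacent pairs (overlapping occurrences are counted naively).
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_support:
            break
        # Multi-character names cannot collide with one-character terminals.
        nonterminal = f"<N{next_id}>"
        next_id += 1
        rules[nonterminal] = pair
        # Rewrite the sequence left to right, replacing each occurrence.
        rewritten = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                rewritten.append(nonterminal)
                i += 2
            else:
                rewritten.append(symbols[i])
                i += 1
        symbols = rewritten
    # The start rule holds whatever remains after all replacements.
    rules["<S>"] = tuple(symbols)
    return rules


def expand(symbol, rules):
    """Expand a symbol back into the terminal string it derives."""
    if symbol not in rules:
        return symbol
    return "".join(expand(part, rules) for part in rules[symbol])


if __name__ == "__main__":
    grammar = infer_rules("ACGTACGTACGT")
    for head, body in grammar.items():
        print(head, "->", " ".join(body))
    # The grammar is lossless: the start symbol derives the input again.
    assert expand("<S>", grammar) == "ACGTACGTACGT"
```

Under these assumptions, the number and length of the resulting rules give a rough, automatically computable indication of how compressible, and in that sense how structurally simple, the input text is.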

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

López, V.F., Aguilar, R., Alonso, L., Moreno, M.N., Corchado, J.M. (2010). Data Mining for Grammatical Inference with Bioinformatics Criteria. In: Corchado, E., Graña Romay, M., Manhaes Savio, A. (eds) Hybrid Artificial Intelligence Systems. HAIS 2010. Lecture Notes in Computer Science, vol 6077. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13803-4_7

  • DOI: https://doi.org/10.1007/978-3-642-13803-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13802-7

  • Online ISBN: 978-3-642-13803-4

  • eBook Packages: Computer Science (R0)
