Automated Enzyme Classification by Formal Concept Analysis

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 8478)

Abstract

Enzymes are macromolecules (linear sequences of linked molecules) whose catalytic activity makes them essential for any biochemical reaction. High-throughput genomic techniques give access to the sequences of new enzymes found in living organisms. Predicting an enzyme’s functional activity from its sequence is a crucial task that can be approached by comparing new sequences with those of known enzymes labeled with a family class. The task is difficult because the activity depends on a combination of small sequence patterns and because sequences have diverged greatly over time. This paper presents a classifier based on identifying subsequence blocks common to known and new enzymes and searching for formal concepts built on the cross product of blocks and sequences for each class. Since new enzyme families may emerge, it is important to propose a first classification of enzymes that cannot be assigned to a known family. Formal Concept Analysis (FCA) offers a suitable framework for casting the task as an optimization problem on the set of concepts. The classifier has been tested successfully on the haloacid dehalogenase superfamily, a set of enzymes present in a large variety of species.
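To make the notion of a formal concept on a sequences-by-blocks relation concrete, here is a minimal Python sketch. It is not the authors' classifier: the toy context, the sequence and block names, and the brute-force enumeration are all assumptions chosen for illustration. A concept is a pair (set of sequences, set of blocks) in which the sequences are exactly those containing all the blocks, and the blocks are exactly those shared by all the sequences.

```python
from itertools import combinations

# Hypothetical toy context: which shared blocks occur in which sequences.
# Names (seq1, b1, ...) are illustrative only, not data from the paper.
context = {
    "seq1": {"b1", "b2"},
    "seq2": {"b1", "b2", "b3"},
    "seq3": {"b3"},
}

def common_blocks(seqs, context):
    """Blocks shared by every sequence in seqs (all blocks if seqs is empty)."""
    result = set().union(*context.values())
    for s in seqs:
        result &= context[s]
    return result

def sequences_with(blocks, context):
    """Sequences that contain every block in blocks."""
    return {s for s, bs in context.items() if blocks <= bs}

def formal_concepts(context):
    """Enumerate all formal concepts by closing every subset of sequences.

    Brute force (exponential in the number of sequences); fine for a toy
    example, not for real data.
    """
    concepts = set()
    seqs = sorted(context)
    for r in range(len(seqs) + 1):
        for subset in combinations(seqs, r):
            intent = common_blocks(subset, context)
            extent = sequences_with(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

On this toy context the enumeration yields four concepts, for example the pair ({seq1, seq2}, {b1, b2}). A classifier in the spirit of the abstract would then select among such concepts per class, which this sketch does not attempt.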

Keywords

  • bioinformatics
  • protein classification
  • FCA application


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Coste, F., Garet, G., Groisillier, A., Nicolas, J., Tonon, T. (2014). Automated Enzyme Classification by Formal Concept Analysis. In: Glodeanu, C.V., Kaytoue, M., Sacarea, C. (eds) Formal Concept Analysis. ICFCA 2014. Lecture Notes in Computer Science, vol 8478. Springer, Cham. https://doi.org/10.1007/978-3-319-07248-7_17

  • DOI: https://doi.org/10.1007/978-3-319-07248-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07247-0

  • Online ISBN: 978-3-319-07248-7

  • eBook Packages: Computer Science (R0)