Automated Enzyme Classification by Formal Concept Analysis

  • François Coste
  • Gaëlle Garet
  • Agnès Groisillier
  • Jacques Nicolas
  • Thierry Tonon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8478)


Enzymes are macromolecules (linear sequences of linked molecules) with a catalytic activity that makes them essential for any biochemical reaction. High-throughput genomic techniques give access to the sequences of new enzymes found in living organisms. Guessing an enzyme's functional activity from its sequence is a crucial task that can be approached by comparing new sequences with those of already known enzymes labeled by a family class. This task is difficult because the activity depends on a combination of small sequence patterns, and because the sequences have diverged greatly over time. This paper presents a classifier based on identifying common subsequence blocks between known and new enzymes and on searching for formal concepts built on the cross product of blocks and sequences for each class. Since new enzyme families may emerge, it is important to propose a first classification of enzymes that cannot be assigned to a known family. Formal Concept Analysis (FCA) offers a convenient framework for stating the task as an optimization problem on the set of concepts. The classifier has been successfully tested on a particular set of enzymes present in a large variety of species, the haloacid dehalogenase superfamily.
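To make the notion of "formal concepts built on the cross product of blocks and sequences" concrete, here is a minimal sketch. It assumes each sequence of one class has already been reduced to the set of conserved blocks it contains (block identification is not shown), and the sequence and block names (`s1`, `b1`, ...) as well as the helper names `extent_of`, `intent_of` and `formal_concepts` are hypothetical. It enumerates every concept (extent, intent): a maximal set of sequences paired with the maximal set of blocks they all share. This is a generic FCA enumeration, not the authors' classifier or their concept-selection optimization.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A formal concept is a pair (extent, intent): a set of sequences and a set of blocks.
Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def extent_of(context: Dict[str, Set[str]], intent: FrozenSet[str]) -> FrozenSet[str]:
    """Sequences that contain every block in `intent`."""
    return frozenset(seq for seq, blocks in context.items() if intent <= blocks)


def intent_of(context: Dict[str, Set[str]], extent: FrozenSet[str],
              all_blocks: FrozenSet[str]) -> FrozenSet[str]:
    """Blocks shared by every sequence in `extent` (all blocks if `extent` is empty)."""
    common = set(all_blocks)
    for seq in extent:
        common &= context[seq]
    return frozenset(common)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate all formal concepts of a (hypothetical) sequence x block context.

    Every concept intent is an intersection of some sequences' block sets,
    the empty intersection being the full block set. We build all such
    intersections, then compute the extent of each one. Naive and exponential
    in the worst case; adequate only for small illustrative contexts.
    """
    all_blocks = frozenset().union(*context.values())
    intents = {all_blocks}
    for blocks in context.values():
        intents |= {intent & blocks for intent in intents}
    concepts = [(extent_of(context, intent), intent) for intent in intents]
    # Largest extents first, ties broken by the sorted block names.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


# Hypothetical toy context: which conserved blocks each sequence of one class contains.
context = {
    "s1": {"b1", "b2", "b3"},
    "s2": {"b1", "b2"},
    "s3": {"b2", "b4"},
    "s4": {"b2", "b3", "b4"},
}

concepts = formal_concepts(context)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

On this toy context the sketch finds seven concepts, among them ({s1, s2, s3, s4}, {b2}) and ({s1, s4}, {b2, b3}). Building intents as intersections of the sequences' block sets is the simplest correct construction; dedicated FCA algorithms such as NextClosure scale better, and choosing which concepts to use for classification is a separate optimization step that this sketch does not attempt.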


Keywords: bioinformatics, protein classification, FCA application





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • François Coste (1)
  • Gaëlle Garet (1)
  • Agnès Groisillier (2)
  • Jacques Nicolas (1)
  • Thierry Tonon (2)
  1. Irisa / Inria Rennes, Rennes cedex, France
  2. Integrative Biology of Marine Models, Sorbonne Universités, UPMC Univ Paris 06, UMR 8227, and CNRS, UMR 8227, Roscoff cedex, France
