Granular Support Vector Machine Based Method for Prediction of Solubility of Proteins on Overexpression in Escherichia coli

  • Pankaj Kumar
  • V. K. Jayaraman
  • B. D. Kulkarni
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4815)


We employed granular support vector machines (GSVM) to predict whether proteins are soluble on overexpression in Escherichia coli. Granular computing splits the feature space into a set of subspaces (or information granules) such as classes, subsets, clusters and intervals [14]. Following the principle of divide and conquer, it decomposes a large, complex problem into smaller, computationally simpler problems. Each granule is then solved independently, and the results are aggregated to form the final solution. Association rules were employed for granulation. The results indicate that a difficult imbalanced classification problem can be successfully solved by employing GSVM.
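The divide-and-conquer scheme described above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the authors' GSVM: features are assumed to be already discretized into integer bins; only single-item rules of the form "feature j = v → minority class" are mined, filtered by hypothetical `min_sup` and `min_conf` thresholds; samples covered by any rule form one granule and the remaining samples form the other; and each granule gets its own linear SVM trained with Pegasos-style stochastic subgradient descent on the hinge loss. That solver is swapped in so the code needs only the standard library; the paper's kernel, solver, and rule-mining settings are not given in this abstract. All function names and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical item encoding: (feature index j, discretized value v).
Item = Tuple[int, int]
# A granule model is either a weight vector (trained linear SVM) or a constant label.
Model = Union[List[float], int]


def mine_rules(X: Sequence[Sequence[int]], y: Sequence[int], target: int,
               min_sup: float, min_conf: float) -> List[Item]:
    """Keep single-item rules 'x[j] == v -> class target' passing support/confidence thresholds."""
    n = len(X)
    items = sorted({(j, row[j]) for row in X for j in range(len(row))})
    rules: List[Item] = []
    for j, v in items:
        covered = [i for i in range(n) if X[i][j] == v]
        hits = sum(1 for i in covered if y[i] == target)
        support = hits / n                # fraction of all samples matching item AND target
        confidence = hits / len(covered)  # fraction of item-matching samples that are target
        if support >= min_sup and confidence >= min_conf:
            rules.append((j, v))
    return rules


def granule_of(x: Sequence[int], rules: List[Item]) -> int:
    """Granule 1 if any mined rule fires on x, otherwise granule 0."""
    return int(any(x[j] == v for j, v in rules))


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be -1/+1. A constant 1.0 feature is appended so the last weight acts
    as a bias (for simplicity it is regularized along with the other weights).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            xi = [float(v) for v in X[i]] + [1.0]
            margin = y[i] * sum(wk * xk for wk, xk in zip(w, xi))
            w = [(1.0 - eta * lam) * wk for wk in w]  # shrink step from the regularizer
            if margin < 1.0:                           # hinge loss is active for this sample
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, xi)]
    return w


def train_gsvm(X: Sequence[Sequence[int]], y: Sequence[int],
               rules: List[Item]) -> Dict[int, Model]:
    """Divide: split samples into granules. Conquer: fit one model per granule."""
    models: Dict[int, Model] = {}
    for g in (0, 1):
        idx = [i for i, x in enumerate(X) if granule_of(x, rules) == g]
        if not idx:
            continue
        labels = {y[i] for i in idx}
        if len(labels) == 1:
            models[g] = labels.pop()  # pure granule: an SVM is unnecessary
        else:
            models[g] = train_linear_svm([X[i] for i in idx], [y[i] for i in idx])
    return models


def predict_gsvm(models: Dict[int, Model], rules: List[Item],
                 x: Sequence[int], default: int = -1) -> int:
    """Aggregate: route x to its granule's model; use `default` if that granule was empty."""
    m = models.get(granule_of(x, rules), default)
    if isinstance(m, int):
        return m
    score = sum(wk * xk for wk, xk in zip(m, [float(v) for v in x] + [1.0]))
    return 1 if score >= 0 else -1


# Toy usage on made-up discretized data; +1 plays the role of the minority class.
X = [[1, 0], [1, 1], [1, 2], [0, 0], [0, 1], [0, 2], [0, 0], [0, 1], [0, 2], [0, 2]]
y = [1, 1, 1, -1, -1, -1, -1, -1, -1, 1]
rules = mine_rules(X, y, target=1, min_sup=0.2, min_conf=0.8)
models = train_gsvm(X, y, rules)
print(rules, predict_gsvm(models, rules, [1, 0]))  # prints: [(0, 1)] 1
```

On the toy data only the rule (0, 1), i.e. feature 0 = 1 → +1, meets both thresholds (support 0.3, confidence 1.0). The granule it covers contains only +1 samples, so it receives a constant prediction, while the mixed remainder granule gets a trained SVM. A fuller version could replace the single-item rules with multi-item rules mined by an Apriori-style algorithm [2] and the linear solver with a kernel SVM.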




  1. Agrawal, R., et al.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, D.C., pp. 207–216 (May 1993)
  2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. 20th Int. Conf. Very Large Data Bases (VLDB), pp. 487–499. Morgan Kaufmann, San Francisco (1994)
  3. Baneyx, F.: Recombinant protein expression in Escherichia coli. Curr. Opin. Biotechnol. 10, 411–421 (1999)
  4. Bertone, P., et al.: SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics. Nucleic Acids Res. 29, 2884–2898 (2001)
  5. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(2), 121–167 (1998)
  6. Davis, G.D., Elisee, C., Newham, D.M., Harrison, R.G.: New fusion protein systems designed to give soluble expression in Escherichia coli. Biotechnol. Bioeng. 65, 382–388 (1999)
  7. Goh, C.S., et al.: Mining the structural genomics pipeline: identification of protein properties that affect high-throughput experimental analysis. J. Mol. Biol. 336, 115–130 (2004)
  8. Harrison, R.G.: Expression of soluble heterologous proteins via fusion with NusA protein. inNovations 11, 4–7 (2000)
  9. Hirota, K., Pedrycz, W.: Fuzzy computing for data mining. Proceedings of the IEEE 87, 1575–1600 (1999)
  10. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
  11. Idicula-Thomas, S., Balaji, P.V.: Understanding the relationship between the primary structure of proteins and its propensity to be soluble on overexpression in Escherichia coli. Protein Sci. 14, 582–592 (2005)
  12. Idicula-Thomas, S., Kulkarni, A.J., Kulkarni, B.D., Jayaraman, V.K., Balaji, P.V.: A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in Escherichia coli. Bioinformatics 22, 278–284 (2006)
  13. Keerthi, S.S., Lin, C.-J.: Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation 15(7), 1667–1689 (2003)
  14. Lin, T.Y.: Granular computing. Announcement of the BISC Special Interest Group on Granular Computing (1997)
  15. Luan, C.H., et al.: High-throughput expression of C. elegans proteins. Genome Res. 14, 2102–2110 (2004)
  16. Tang, Y., Jin, B., Zhang, Y.-Q.: Granular support vector machines with association rules mining for protein homology prediction. Artificial Intelligence in Medicine (special issue: Computational Intelligence Techniques in Bioinformatics) 35(1-2), 121–134 (2005)
  17. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
  18. Wilkinson, D.L., Harrison, R.G.: Predicting the solubility of recombinant proteins in Escherichia coli. Biotechnology 9, 443–448 (1991)
  19. Yao, Y.Y.: Granular computing: basic issues and possible solutions. In: Wang, P.P. (ed.) Proceedings of the 5th Joint Conference on Information Sciences, Atlantic City, New Jersey, USA. Association for Intelligent Machinery, vol. I, pp. 186–189 (2000)
  20. Zadeh, L.A.: Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems 90(2), 111–127 (1997)
  21. Zhong, W., He, J., Harrison, R., Tai, P.C., Pan, Y.: Clustering support vector machines for protein local structure prediction. Expert Systems with Applications 32(2), 518–526 (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Pankaj Kumar, Department of Chemical Engineering, Indian Institute of Technology, Kharagpur 721302, India
  • V. K. Jayaraman, Chemical Engineering Division, National Chemical Laboratory, Pune 411008, India
  • B. D. Kulkarni, Chemical Engineering Division, National Chemical Laboratory, Pune 411008, India
