Journal of Structural and Functional Genomics

Volume 6, Issue 2–3, pp 195–202

Automatic Classification and Pattern Discovery in High-throughput Protein Crystallization Trials

  • Christian Cumbaa
  • Igor Jurisica


Conceptually, protein crystallization can be divided into two phases: search and optimization. Robotic protein crystallization screening can speed up the search phase and has the potential to increase process quality. Automated image classification helps to increase throughput and to generate consistent, objective results. Although there is room to improve classification accuracy, our image analysis system classifies images from 1536-well plates with an accuracy of 85% and a ROC score of 0.87, as evaluated on 127 human-classified protein screens containing 5600 crystal images and 189472 non-crystal images. Data mining can integrate results from high-throughput screens with information about crystallizing conditions, intrinsic protein properties, and results from crystallization optimization. We apply association mining, a data mining approach that identifies frequently occurring patterns among variables and their values. This approach segregates proteins into groups based on how they react across a broad range of conditions, and clusters cocktails to reflect their potential to achieve crystallization. These results may lead to crystallization screen optimization and may reveal associations between protein properties and crystallization conditions. We also postulate that past experience may help identify initial conditions favorable to crystallization for novel proteins.
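To make the idea of association mining concrete, the following is a minimal sketch of how frequently co-occurring attribute values can be counted and turned into rules with a support and a confidence. It is an illustration only, not the authors' implementation: the trial records, the attribute names (`precipitant`, `pH`, `outcome`), the thresholds, and the helper functions `frequent_itemsets` and `rules` are all hypothetical, and the brute-force counting stands in for a proper frequent-pattern algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records (NOT data from the paper): each set holds the
# attribute=value items observed for one crystallization trial.
trials = [
    {"precipitant=PEG", "pH=acidic", "outcome=crystal"},
    {"precipitant=PEG", "pH=neutral", "outcome=crystal"},
    {"precipitant=salt", "pH=acidic", "outcome=clear"},
    {"precipitant=PEG", "pH=acidic", "outcome=crystal"},
    {"precipitant=salt", "pH=neutral", "outcome=precipitate"},
]


def frequent_itemsets(records, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items
    whose support (fraction of records containing them) >= min_support."""
    counts = Counter()
    for record in records:
        items = sorted(record)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    n = len(records)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def rules(supports, min_confidence):
    """Derive single-item rules A -> B from frequent 2-item sets.
    confidence(A -> B) = support({A, B}) / support({A})."""
    found = []
    for itemset, supp in supports.items():
        if len(itemset) != 2:
            continue
        for antecedent in itemset:
            consequent = next(iter(itemset - {antecedent}))
            conf = supp / supports[frozenset([antecedent])]
            if conf >= min_confidence:
                found.append((antecedent, consequent, supp, conf))
    return found


supports = frequent_itemsets(trials, min_support=0.4)
for antecedent, consequent, supp, conf in rules(supports, min_confidence=0.8):
    print(f"{antecedent} -> {consequent} "
          f"(support={supp:.2f}, confidence={conf:.2f})")
```

In this toy setting, a rule with high support and confidence linking a cocktail property to an outcome is the kind of pattern that could suggest which conditions to keep or drop in a screen. Real screens would involve far more attributes and require the pruning that dedicated frequent-pattern algorithms provide.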

Key words

association-rule discovery, image analysis, protein crystallization





Copyright information

© Springer 2005

Authors and Affiliations

  1. Ontario Cancer Institute, Northeast Structural Genomics Consortium, Toronto, Canada
