Automatic Classification and Pattern Discovery in High-throughput Protein Crystallization Trials

Journal of Structural and Functional Genomics

Abstract

Conceptually, protein crystallization can be divided into two phases: search and optimization. Robotic protein crystallization screening can speed up the search phase and has the potential to increase process quality. Automated image classification helps to increase throughput and to generate consistent, objective results. Although there is room for improvement, our image analysis system classifies images from 1536-well plates with high accuracy (85%) and ROC score (0.87), as evaluated on 127 human-classified protein screens containing 5600 crystal images and 189472 non-crystal images. Data mining can integrate results from high-throughput screens with information about crystallizing conditions, intrinsic protein properties, and results from crystallization optimization. We apply association mining, a data mining approach that identifies frequently occurring patterns among variables and their values. This approach segregates proteins into groups based on how they react across a broad range of conditions, and clusters cocktails according to their potential to achieve crystallization. These results may lead to crystallization screen optimization and reveal associations between protein properties and crystallization conditions. We also postulate that past experience may help identify initial conditions favorable to crystallization for novel proteins.
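
To make the idea of association mining concrete, the following is a minimal sketch, not the authors' implementation. It assumes each crystallization trial is encoded as a set of hypothetical attribute=value items (the item names and the four toy trials are invented for illustration), and it enumerates itemsets by brute force rather than using the pruning of a full Apriori-style algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items
    that occur in at least a min_support fraction of transactions."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        # Brute-force enumeration; real miners prune infrequent candidates.
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def rule_confidence(supports, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A and C) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return supports[both] / supports[a]


# Hypothetical toy data: one set of attribute=value items per trial.
trials = [
    {"salt=high", "pH=acidic", "outcome=crystal"},
    {"salt=high", "pH=neutral", "outcome=crystal"},
    {"salt=low", "pH=acidic", "outcome=clear"},
    {"salt=high", "pH=acidic", "outcome=precipitate"},
]

supports = frequent_itemsets(trials, min_support=0.5)
conf = rule_confidence(supports, {"salt=high"}, {"outcome=crystal"})
print(f"confidence(salt=high -> outcome=crystal) = {conf:.2f}")
```

In this toy data, "salt=high" appears in three of four trials and co-occurs with "outcome=crystal" in two, so the rule has support 0.5 and confidence 2/3. Patterns of this kind, computed over real screens, are what would group proteins by response and cocktails by crystallization potential.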

Author information

Corresponding author

Correspondence to Igor Jurisica.

About this article

Cite this article

Cumbaa, C., Jurisica, I. Automatic Classification and Pattern Discovery in High-throughput Protein Crystallization Trials. J Struct Funct Genomics 6, 195–202 (2005). https://doi.org/10.1007/s10969-005-5243-9

  • DOI: https://doi.org/10.1007/s10969-005-5243-9
