Conditional Random Fields for Transmembrane Helix Prediction
It is estimated that 20% of genes in the human genome encode integral membrane proteins (IMPs), and some estimates are much higher. IMPs control a broad range of events essential to the proper functioning of cells, tissues and organisms, and they are the most common target of clinically useful drugs. However, there is a dearth of high-resolution 3D structural information on IMPs, so good methods for predicting IMP structure are highly valuable. In this paper we apply Conditional Random Fields (CRFs) to build a probabilistic model for the membrane protein helix prediction problem. The advantage of CRFs is that they allow seamless and principled integration of biological domain knowledge into the model. Our results show that the CRF model outperforms other well-known helix prediction approaches on several important measures.
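The abstract does not spell out the model, so what follows is only a minimal sketch of how a generic linear-chain CRF (as introduced by Lafferty et al.) labels a protein sequence. Such a CRF scores a labeling y of a sequence x as p(y | x) = exp(Σ_t Σ_k λ_k f_k(y_{t-1}, y_t, x, t)) / Z(x), where Z(x) sums the same exponentiated score over all labelings. The sketch assumes a hypothetical two-label set (`O` for non-membrane, `M` for membrane helix), a single hypothetical feature (windowed Kyte-Doolittle hydropathy), and hand-set weights standing in for trained parameters. It is not the authors' feature set or training procedure. Viterbi decoding finds the best labeling, and the forward algorithm computes log Z(x) so that the path can be given a probability.

```python
import math

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical label set: O = outside the membrane, M = membrane helix.
LABELS = ("O", "M")

# Hypothetical hand-set weights; a real CRF would learn these from data.
W_HYDRO = 1.0
TRANSITION = {
    ("O", "O"): 1.0, ("M", "M"): 1.0,    # reward staying in the same state
    ("O", "M"): -3.0, ("M", "O"): -3.0,  # penalize switching states
}


def window_hydropathy(seq, t, half=3):
    """Mean Kyte-Doolittle value in a window centred on position t."""
    lo, hi = max(0, t - half), min(len(seq), t + half + 1)
    return sum(KD[a] for a in seq[lo:hi]) / (hi - lo)


def node_score(seq, t, label):
    """Weighted feature score that depends only on y_t and x."""
    h = window_hydropathy(seq, t)
    return W_HYDRO * h if label == "M" else -W_HYDRO * h


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def viterbi(seq):
    """Return (best label string, its unnormalized log-score).

    Z(x) does not depend on y, so it cancels in the argmax.
    """
    best = [{y: node_score(seq, 0, y) for y in LABELS}]
    back = [{}]
    for t in range(1, len(seq)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev, score = max(
                ((yp, best[t - 1][yp] + TRANSITION[(yp, y)]) for yp in LABELS),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + node_score(seq, t, y)
            back[t][y] = prev
    # Trace back from the best final label.
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for t in range(len(seq) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return "".join(reversed(path)), max(best[-1].values())


def log_partition(seq):
    """log Z(x) via the forward algorithm over all labelings."""
    alpha = {y: node_score(seq, 0, y) for y in LABELS}
    for t in range(1, len(seq)):
        alpha = {
            y: node_score(seq, t, y)
            + logsumexp([alpha[yp] + TRANSITION[(yp, y)] for yp in LABELS])
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))


if __name__ == "__main__":
    # Toy sequence: hydrophilic flank, hydrophobic stretch, hydrophilic flank.
    seq = "DEKRSNQ" + "LLIVALLIVAFLLIV" + "KRDEQNS"
    path, score = viterbi(seq)
    print(seq)
    print(path)
    print("p(best path | x) =", math.exp(score - log_partition(seq)))
```

On this toy input the hydrophobic stretch should be labeled `M` and the charged flanks `O`. The transition penalty is what discourages the decoder from fragmenting a helix into short pieces; in a trained CRF, both it and the feature weights would be estimated from labeled membrane proteins rather than set by hand.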