False Positive Reduction in Automatic Segmentation System
An application has been developed for the automatic segmentation of Potyvirus polyproteins using stochastic pattern recognition models. These models usually find the correct location of the cleavage site, but they also suggest other possible locations, called false positives. To reduce the number of false positives, we evaluated three methods. The first shrinks the search range by skipping portions of the polyprotein that have a low probability of containing the cleavage site. The second and third use a measure to rank candidate locations so that the correct cleavage site is ranked as high as possible. As alternative measures, we evaluate the probability emitted by Hidden Markov Models (HMM) and the Minimum Editing Distance (MED). Our results indicate that HMM probability is a better quality measure of a candidate location than MED. This probability is useful for eliminating most false positives. It also makes it possible to quantify the quality of an automatic segmentation.
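The filtering and ranking ideas described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the reference motif, the window size, and the search range are all hypothetical, and edit distance is used as the score only because it is self-contained. A per-window HMM probability, which the results favour, could be substituted for the scoring function (with the sort order reversed, since higher probability is better).

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the classic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_candidates(sequence: str,
                    candidates: List[int],
                    motif: str,
                    search_range: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Drop candidates outside the search range, then rank the rest.

    Each surviving candidate is scored by the edit distance between the
    window of residues around it and a reference motif (hypothetical).
    Lower distance ranks first.
    """
    lo, hi = search_range
    half = len(motif) // 2
    scored = []
    for pos in candidates:
        if not (lo <= pos < hi):
            continue  # outside the search range: skipped without scoring
        window = sequence[max(0, pos - half): pos + half]
        scored.append((pos, edit_distance(window, motif)))
    return sorted(scored, key=lambda item: item[1])


# Toy data, invented for illustration only.
toy_sequence = "MMMMVRHQSGTTTTVKHQAGLLLL"
ranking = rank_candidates(toy_sequence, [2, 7, 17], "VRHQSG", (4, 24))
```

In this toy run the candidate at position 2 is discarded by the search range, and the remaining candidates are ordered by how closely their surrounding window matches the motif, so a threshold on the score can then separate likely cleavage sites from false positives.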
Keywords: Hidden Markov Model, Cleavage Site, Automatic Segmentation, Search Range, Candidate Location