Exact Protein Structure Classification Using the Maximum Contact Map Overlap Metric
In this work we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that this measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric on that space makes it possible to avoid pairwise comparisons against the entire database and thus to explore the protein space significantly faster than in non-metric spaces. On gold-standard classification benchmark sets of 6,759 and 67,609 proteins, our exact k-nearest neighbor scheme classifies up to 95% and 99% of queries correctly, respectively. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on contact map overlap.
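To illustrate why a metric helps avoid comparing a query against every database entry, here is a minimal sketch of triangle-inequality pruning with a single pivot. It is not the authors' algorithm. The Jaccard distance on contact sets is used only as a simple stand-in metric, since computing max-CMO itself requires solving an alignment optimization problem that the abstract does not detail. All names (`jaccard_distance`, `nearest_with_pivot`, the toy contact sets) are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Contact = Tuple[int, int]          # a pair of residue indices in contact
ContactSet = FrozenSet[Contact]    # toy stand-in for a contact map


def jaccard_distance(a: ContactSet, b: ContactSet) -> float:
    """Stand-in metric (NOT max-CMO): 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def nearest_with_pivot(
    query: ContactSet,
    database: Sequence[ContactSet],
    pivot_index: int,
    pivot_to_db: Sequence[float],
    dist: Callable[[ContactSet, ContactSet], float],
) -> Tuple[int, float, int]:
    """Return (index of nearest entry, its distance, number of distance calls).

    pivot_to_db[i] holds dist(pivot, database[i]), assumed precomputed offline.
    For any metric, |d(q, p) - d(p, x)| <= d(q, x), so an entry whose lower
    bound is already no better than the best distance found can be skipped.
    """
    d_qp = dist(query, database[pivot_index])
    evaluations = 1
    best_i, best_d = pivot_index, d_qp
    for i, entry in enumerate(database):
        if i == pivot_index:
            continue
        lower_bound = abs(d_qp - pivot_to_db[i])
        if lower_bound >= best_d:
            continue  # pruned without evaluating the distance
        d = dist(query, entry)
        evaluations += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, evaluations


# Toy data: four "proteins" described by their contact sets.
database: List[ContactSet] = [
    frozenset({(0, 1), (1, 2), (2, 3)}),
    frozenset({(0, 1), (1, 2), (2, 4)}),
    frozenset({(5, 6), (6, 7)}),
    frozenset({(8, 9)}),
]
query: ContactSet = frozenset({(0, 1), (1, 2), (2, 3), (3, 4)})

pivot_index = 0
pivot_to_db = [jaccard_distance(database[pivot_index], x) for x in database]
best_i, best_d, evaluations = nearest_with_pivot(
    query, database, pivot_index, pivot_to_db, jaccard_distance
)
print(best_i, best_d, evaluations)
```

The pruning step is only valid because the distance satisfies the triangle inequality; with a non-metric similarity score the lower bound would not hold and every database entry would have to be compared. A k-NN variant would keep the k best distances and prune against the k-th best instead of the single best.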
Keywords: k-nearest neighbours; metric spaces; maximum contact map overlap; automatic classification of proteins