Abstract
In this work we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space makes it possible to avoid pairwise comparisons against the entire database and thus to explore the protein space significantly faster than in non-metric spaces. We show on gold-standard classification benchmark sets of 6,759 and 67,609 proteins that our exact k-nearest neighbor (k-NN) scheme classifies up to 95% and 99% of queries correctly, respectively. Our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on contact map overlap.
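To illustrate why a metric lets a nearest-neighbor search skip comparisons, the following minimal sketch uses the triangle inequality with a single reference point (a "pivot"). It is not the classification algorithm of the paper: the function name `knn_pivot_pruned`, the single-pivot scheme, and the toy distance on plain numbers are all assumptions made for illustration, and the toy distance merely stands in for an expensive metric such as max-CMO. For any metric d, |d(q, p) - d(p, x)| is a lower bound on d(q, x), so a candidate x whose bound already reaches the current k-th best distance cannot improve the result and never needs to be compared.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def knn_pivot_pruned(
    query: T,
    database: Sequence[T],
    pivot_dists: Sequence[float],
    query_pivot_dist: float,
    dist: Callable[[T, T], float],
    k: int,
) -> Tuple[List[Tuple[float, int]], int]:
    """Exact k-NN search that prunes candidates via the triangle inequality.

    pivot_dists[i] is the precomputed distance d(pivot, database[i]);
    query_pivot_dist is d(query, pivot). Returns the k nearest neighbors as
    (distance, index) pairs plus the number of calls made to dist.
    """
    # Lower bound on d(query, x) for every x, valid only because dist is a metric.
    bounds = sorted(
        (abs(query_pivot_dist - dp), i) for i, dp in enumerate(pivot_dists)
    )
    heap: List[Tuple[float, int]] = []  # max-heap via negated distances
    evaluated = 0
    for lower_bound, i in bounds:
        # Candidates are visited by increasing lower bound, so once a bound
        # reaches the current k-th best distance, no later one can improve it.
        if len(heap) == k and lower_bound >= -heap[0][0]:
            break
        d = dist(query, database[i])  # the expensive comparison
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    neighbors = sorted((-neg_d, i) for neg_d, i in heap)
    return neighbors, evaluated


# Toy usage: numbers with absolute difference as the metric (an assumption,
# standing in for protein structures and the max-CMO distance).
def toy_dist(a: float, b: float) -> float:
    return abs(a - b)


database = [0.0, 1.0, 2.0, 10.0, 11.0, 50.0]
pivot = 0.0
pivot_dists = [toy_dist(pivot, x) for x in database]  # computed once, offline
query = 10.4
neighbors, evaluated = knn_pivot_pruned(
    query, database, pivot_dists, toy_dist(query, pivot), toy_dist, k=2
)
print([i for _, i in neighbors], evaluated)
```

In this toy run only two of the six database entries are compared against the query; without the metric property the lower bounds would not be valid and every entry would have to be compared.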
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wohlers, I., Le Boudic-Jamin, M., Djidjev, H., Klau, G.W., Andonov, R. (2014). Exact Protein Structure Classification Using the Maximum Contact Map Overlap Metric. In: Dediu, A.H., Martín-Vide, C., Truthe, B. (eds) Algorithms for Computational Biology. AlCoB 2014. Lecture Notes in Computer Science, vol 8542. Springer, Cham. https://doi.org/10.1007/978-3-319-07953-0_21
DOI: https://doi.org/10.1007/978-3-319-07953-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07952-3
Online ISBN: 978-3-319-07953-0
eBook Packages: Computer Science (R0)