Granular Approach for Protein Sequence Analysis
Granular computing uses granules as its basic units of computation. Granules can be formed by either information abstraction or information decomposition. In this paper, we view information decomposition as a paradigm for processing data with complex structures. More specifically, we apply lossless information decomposition to protein sequence analysis. By decomposing a protein sequence into a set of proper granules and applying dynamic programming to align the position sequences of two corresponding granules, we can distribute the calculation of the pairwise similarity of protein sequences across multiple parallel processes, each of which is less time-consuming than a calculation based on an alignment of the original sequences.
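The pipeline described above can be illustrated with a minimal sketch. The abstract does not define the granules or the alignment scoring, so everything below is an assumption made for illustration only: each granule is taken to be one residue type together with the sorted positions where it occurs, a matched pair of positions costs their absolute difference, an unmatched position costs a flat `gap` penalty, and the function names `decompose`, `reconstruct`, `align_positions`, and `granular_distance` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def decompose(seq):
    """Split a sequence into granules: residue -> sorted list of its positions."""
    granules = defaultdict(list)
    for pos, residue in enumerate(seq):
        granules[residue].append(pos)
    return dict(granules)


def reconstruct(granules):
    """Rebuild the sequence from its granules (shows the split is lossless)."""
    length = sum(len(positions) for positions in granules.values())
    out = [None] * length
    for residue, positions in granules.items():
        for pos in positions:
            out[pos] = residue
    return "".join(out)


def align_positions(a, b, gap=5):
    """Global dynamic-programming alignment cost of two position lists.

    Assumed scoring (not from the paper): |p - q| for a matched pair,
    a flat `gap` penalty for each unmatched position.
    """
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match pair
                cost[i - 1][j] + gap,                           # skip a[i-1]
                cost[i][j - 1] + gap,                           # skip b[j-1]
            )
    return cost[n][m]


def granular_distance(seq1, seq2, gap=5, workers=4):
    """Sum the per-granule alignment costs, one independent task per residue."""
    g1, g2 = decompose(seq1), decompose(seq2)
    residues = sorted(set(g1) | set(g2))

    def task(residue):
        return align_positions(g1.get(residue, []), g2.get(residue, []), gap)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(task, residues))


if __name__ == "__main__":
    print(granular_distance("ACCA", "ACA"))
```

Under these assumptions, each task fills a table whose size is the product of the two granule lengths, which is never larger than the table for the full sequences. The thread pool is used only to keep the sketch short and runnable; in CPython, a process pool would be needed for CPU-bound tasks to run in parallel.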
Keywords: Position Series, Position Sequence, Pairwise Similarity, Information Granulation, Protein Sequence Analysis