3D visualization and cluster analysis of unstructured protein sequences using ARCSA with a file conversion approach

Vignesh, U.; Parvathi, R.

doi:10.1007/s11227-018-2319-4

Published: 05 April 2018

The Journal of Supercomputing, volume 76, pages 4287–4301 (2020)

Abstract

This work describes the synthesis of protein structures using clustering, an unsupervised learning method. Protein structure prediction was performed on several crab and egg datasets, with inputs collected from the Protein Data Bank (PDB IDs: 3LIG, 2W3Z, 3ZVQ, 2KLR and 2YIZ). The three-dimensional protein structures were merged using MergeSets, an instance-filtering technique built into data mining tools. The proposed methodology, attribute-related cluster sequence analysis (ARCSA), aims to identify a well-performing algorithm for clustering protein structures by comparing four existing algorithms: k-means, expectation maximization, farthest first and COBWEB. Experiments were conducted with the BioWeka data mining tool, Modeler 9.15 and PyMOL, with scripts written in Python. The results indicate that expectation maximization performed best of the four for clustering structured proteins, which also paves the way for identifying better algorithms for supervised learning methods.
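As a rough illustration of one of the four algorithms named in the abstract, the following is a minimal sketch of farthest-first clustering in plain Python. It is not the authors' ARCSA pipeline and not BioWeka's implementation; the function names (`farthest_first_centers`, `assign`) and the 2D toy points are hypothetical and chosen only to show the idea: pick a starting center, repeatedly add the point farthest from all centers chosen so far, then label every point by its nearest center.

```python
import math

def farthest_first_centers(points, k, start=0):
    """Pick k centers: begin with points[start], then repeatedly add the
    point that is farthest from every center chosen so far."""
    centers = [points[start]]
    # dist[i] holds the distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        # Refresh nearest-center distances against the newly added center
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

# Hypothetical 2D feature vectors (illustrative only, not data from the paper)
toy = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0), (9.8, 0.3)]
centers = farthest_first_centers(toy, k=3)
labels = assign(toy, centers)
print(centers)  # [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
print(labels)   # [0, 0, 2, 2, 1, 1]
```

In a real comparison, the feature vectors would come from attributes extracted from the protein files rather than hand-written coordinates, and the same labels would be scored against those produced by k-means, expectation maximization and COBWEB.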

Author information

Authors and Affiliations

School of Computing Science and Engineering, VIT University - Chennai Campus, Vandalur – Kelambakkam Road, Chennai, 600 127, India
U. Vignesh & R. Parvathi

Corresponding author

Correspondence to U. Vignesh.

About this article

Cite this article

Vignesh, U., Parvathi, R. 3D visualization and cluster analysis of unstructured protein sequences using ARCSA with a file conversion approach. J Supercomput 76, 4287–4301 (2020). https://doi.org/10.1007/s11227-018-2319-4

Published: 05 April 2018
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11227-018-2319-4
