3D visualization and cluster analysis of unstructured protein sequences using ARCSA with a file conversion approach

The Journal of Supercomputing

Abstract

This work describes the synthesis of protein structures based on clustering, an unsupervised learning method. Protein structure prediction was performed for crab and egg datasets whose inputs were collected from the Protein Data Bank (PDB IDs: 3LIG, 2W3Z, 3ZVQ, 2KLR and 2YIZ). The three-dimensional protein structures were merged using MergeSets, an instance-filtering feature built into the data mining tool. The proposed methodology, attribute-related cluster sequence analysis (ARCSA), aims to identify a well-performing algorithm for clustering protein structures by comparing four existing algorithms: k-means, expectation maximization (EM), farthest first and COBWEB. Experiments were conducted with the BioWeka data mining tool, MODELLER 9.15 and PyMOL, using scripts written in Python. This paper shows that EM performs best of the four algorithms for clustering structured proteins, and the work is expected to pave the way for identifying better algorithms for supervised learning.
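The abstract names the compared algorithms but not how they operate on PDB input. Below is a minimal, hypothetical sketch in plain Python (standard library only, Python 3.8+ for `math.dist`), assuming each structure is reduced to its C-alpha atom coordinates; the paper's actual attributes and the file conversion it uses for BioWeka are not given in the abstract. It covers two of the four algorithms, farthest-first traversal and k-means (Lloyd's algorithm), and scores a clustering by its within-cluster sum of squared errors (SSE). All function names are illustrative and are not part of BioWeka, MODELLER or PyMOL.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def parse_ca_coords(pdb_text: str) -> List[Point]:
    """Return (x, y, z) of every C-alpha atom in PDB-format text.

    Assumption: clustering on C-alpha coordinates, not the paper's attributes.
    PDB ATOM records are fixed-width: atom name in columns 13-16 and
    x, y, z in columns 31-38, 39-46 and 47-54 (1-based, inclusive).
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Index of the nearest center (Euclidean distance) for each point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def farthest_first(points: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Farthest-first traversal: start from a random point, then repeatedly
    add the point farthest from its nearest already-chosen center."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    # Distance from each point to its nearest chosen center so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers


def kmeans(points: Sequence[Point], k: int, init: Optional[Sequence[Point]] = None,
           iters: int = 100, seed: int = 0) -> List[Point]:
    """Lloyd's k-means; starts from `init` if given, else from k random points."""
    centers = list(init) if init is not None else random.Random(seed).sample(list(points), k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the center to the mean of its members.
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep the previous center if the cluster became empty.
                new_centers.append(centers[c])
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return centers


def sse(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Within-cluster sum of squared distances (lower means tighter clusters)."""
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
```

Farthest first is a single greedy pass, while k-means refines its centers iteratively; passing `init=farthest_first(coords, k)` to `kmeans` is one way to avoid poor random starts. EM and COBWEB would need separate implementations (or Weka's own) and are not sketched here.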

Fig. 1–Fig. 10

Author information

Correspondence to U. Vignesh.

About this article

Cite this article

Vignesh, U., Parvathi, R. 3D visualization and cluster analysis of unstructured protein sequences using ARCSA with a file conversion approach. J Supercomput 76, 4287–4301 (2020). https://doi.org/10.1007/s11227-018-2319-4

  • DOI: https://doi.org/10.1007/s11227-018-2319-4