Abstract
Growing interest in genomic signal processing has created a need for computational software that combines signal processing tools with automatic analysis. This paper presents a computational tool for mapping and clustering DNA sequences. It implements several DNA numerical representations, a feature extraction method, the K-means algorithm, and several clustering evaluation metrics. The software lets researchers perform genomic signal analysis through a graphical user interface, without requiring programming skills. The tool is designed to be extended with additional algorithms or modules. A comparative analysis of eleven DNA numerical representations is also presented. The results show that the Electron-ion and Voss representations perform best when clustering genomic signals with K-means. Cluster quality was measured using the adjusted Rand index (ARI).
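To make the two best-performing representations and the evaluation metric concrete, the following is a minimal sketch, not the tool's actual implementation. It assumes that "Electron-ion" refers to the electron-ion interaction pseudopotential (EIIP) mapping and uses the EIIP values commonly cited in the genomic signal processing literature. The function names `voss_map`, `eiip_map`, and `adjusted_rand_index` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import comb

# EIIP values as commonly cited in the literature (an assumption here;
# the abstract does not list the values used by the tool).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def voss_map(seq):
    """Voss representation: one binary indicator sequence per nucleotide."""
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}


def eiip_map(seq):
    """EIIP representation: one real value per nucleotide.

    Symbols other than A, C, G, T are mapped to 0.0 (an illustrative choice).
    """
    return [EIIP.get(s, 0.0) for s in seq.upper()]


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the pair counts of two labelings."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pairs of items grouped together in both, in the true, and in the
    # predicted labeling, respectively.
    sum_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_both - expected) / (max_index - expected)


if __name__ == "__main__":
    print(voss_map("ACGT")["A"])
    print(eiip_map("AG"))
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

In practice, a library routine such as scikit-learn's `adjusted_rand_score` would normally be used for the metric. The hand-written version above only shows what the score measures: agreement between cluster assignments and reference labels, corrected for chance and independent of how the clusters are numbered.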
Ethics declarations
The authors declare that they have no conflict of interest.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Ramírez, V., Román-Godínez, I., Torres-Ramos, S. (2020). DNA-MC: Tool for Mapping and Clustering DNA Sequences. In: González Díaz, C., et al. VIII Latin American Conference on Biomedical Engineering and XLII National Conference on Biomedical Engineering. CLAIB 2019. IFMBE Proceedings, vol 75. Springer, Cham. https://doi.org/10.1007/978-3-030-30648-9_98
DOI: https://doi.org/10.1007/978-3-030-30648-9_98
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30647-2
Online ISBN: 978-3-030-30648-9
eBook Packages: Engineering, Engineering (R0)