Abstract
Chapter 3 noted that text-mining techniques can be used to classify genomes. Of all the methods considered in the previous chapter, the N-gram technique is among the most appropriate for genome text classification. In linguistics, the N-gram concept has always been marginal and isolated. Similarly, for genetic texts, a set of N-grams is in no way a set of functional elements. For the needs of formal text recognition, however, the N-gram technique has proved exceptionally useful. The notion of a “word”, on the other hand, has not yet been used successfully in the genetic context. As shown above, it is nevertheless possible in this context to define a “word” as having a certain functional meaning. Even so, the word as an element of the genetic text (as in hieroglyphic written languages) is not as flexible and universal as it can be in European languages.
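To make the N-gram idea concrete, the following is a minimal Python sketch of counting N-grams in a DNA string. It assumes that an N-gram is any overlapping window of N consecutive letters; the chapter's own definition of the spectrum is not given in this abstract, and the function name `ngram_spectrum` and the sample sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter


def ngram_spectrum(sequence, n):
    """Count every overlapping window of length n in the sequence.

    Assumption: overlapping windows with a step of one letter; this is an
    illustrative choice, not necessarily the chapter's definition.
    """
    seq = sequence.upper()
    # A sequence shorter than n produces an empty range, hence an empty Counter.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


# Hypothetical toy sequence, not real genomic data.
spectrum = ngram_spectrum("ACGTACGT", 2)
for ngram, count in sorted(spectrum.items()):
    print(ngram, count)
```

The resulting counts form a frequency vector indexed by N-grams, which is the kind of purely formal representation the abstract contrasts with functionally meaningful "words".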
© 2010 Springer-Verlag Berlin Heidelberg
Cite this chapter
Bolshoy, A., Volkovich, Z., Kirzhner, V., Barzily, Z. (2010). N-Gram Spectra of the DNA Text. In: Genome Clustering. Studies in Computational Intelligence, vol 286. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12952-0_5
DOI: https://doi.org/10.1007/978-3-642-12952-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12951-3
Online ISBN: 978-3-642-12952-0
eBook Packages: Engineering, Engineering (R0)