IWOCA 2013: Combinatorial Algorithms pp 337-348

# Suffix Tree of Alignment: An Efficient Index for Similar Data

• Joong Chae Na
• Heejin Park
• Maxime Crochemore
• Jan Holub
• Costas S. Iliopoulos
• Laurent Mouchard
• Kunsoo Park
Conference paper

DOI: 10.1007/978-3-642-45278-9_29

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8288)
Cite this paper as:
Na J.C. et al. (2013) Suffix Tree of Alignment: An Efficient Index for Similar Data. In: Lecroq T., Mouchard L. (eds) Combinatorial Algorithms. IWOCA 2013. Lecture Notes in Computer Science, vol 8288. Springer, Berlin, Heidelberg

## Abstract

We consider index data structures for similar strings. One candidate is the generalized suffix tree: for two strings A and B, it is a compacted trie representing all suffixes of A and B. It has |A| + |B| leaves and can be constructed in O(|A| + |B|) time. However, when the two strings are similar, the generalized suffix tree is inefficient because it does not exploit their similarity, which is usually represented as an alignment of A and B.
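
To make the leaf count concrete, the following is a minimal Python sketch of a generalized suffix structure for two strings. It is an assumption-laden illustration, not the paper's construction: it builds a naive, uncompacted suffix trie in quadratic time and space rather than a compacted tree in linear time, and the names `Node`, `build_generalized_suffix_trie`, `collect_leaves`, and `search` are hypothetical. The terminators `#` and `$` are assumed not to occur in either string.

```python
class Node:
    """Trie node; a leaf stores which string its suffix came from and where it starts."""

    def __init__(self):
        self.children = {}
        self.label = None  # (string name, start index), set on leaves only


def build_generalized_suffix_trie(a, b):
    """Insert every non-empty suffix of a and b, each ended by its own terminator.

    Naive and uncompacted: fine for short strings, not the linear-time
    construction mentioned in the abstract.
    """
    root = Node()
    for name, text, terminator in (("A", a, "#"), ("B", b, "$")):
        for start in range(len(text)):
            node = root
            for ch in text[start:] + terminator:
                node = node.children.setdefault(ch, Node())
            node.label = (name, start)
    return root


def collect_leaves(node):
    """Return the labels of all leaves below (or at) the given node."""
    if not node.children:
        return [node.label]
    labels = []
    for child in node.children.values():
        labels.extend(collect_leaves(child))
    return labels


def search(root, pattern):
    """Walk down along the pattern, then report every suffix that starts with it."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return sorted(collect_leaves(node))


root = build_generalized_suffix_trie("abcab", "abxab")
print(len(collect_leaves(root)))  # 10, i.e. |A| + |B|
print(search(root, "ab"))         # [('A', 0), ('A', 3), ('B', 0), ('B', 3)]
```

Because every suffix ends in a terminator that appears nowhere else, no suffix is a prefix of another, so each of the |A| + |B| suffixes ends at its own leaf. Note that the common prefix and suffix of the two example strings are indexed twice, which is the redundancy the paper targets.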

In this paper we propose a space- and time-efficient suffix tree of alignment that exploits the similarity captured by an alignment. Our suffix tree for an alignment of A and B has |A| + l_d + l_1 leaves, where l_d is the sum of the lengths of all parts of B that differ from A and l_1 is the sum of the lengths of some common parts of A and B. The space reduction does not compromise pattern search: our suffix tree can be searched for a pattern P in O(|P| + occ) time, where occ is the number of occurrences of P in A and B. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires O(|A| + l_d + l_1 + l_2) time, where l_2 is the sum of the lengths of other common substrings of A and B. When the suffix tree of A is already given, it requires O(l_d + l_1 + l_2) time.
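
As a rough illustration of the leaf-count comparison, the Python sketch below computes the sum of the lengths of all parts of B that differ from A (called `l_d` in the code) from a toy alignment. The alignment representation, a list of `(part_of_A, part_of_B)` pairs, and the function name `alignment_sizes` are assumptions made for this example. The term for the common parts is deliberately not computed, because the abstract describes it only as the total length of "some" common parts.

```python
def alignment_sizes(alignment):
    """Return (|A|, |B|, l_d) for an alignment given as (part_of_A, part_of_B) pairs.

    A pair with equal parts is a common region; a pair with different parts is
    a region where B differs from A. This representation is hypothetical.
    """
    a_len = sum(len(a_part) for a_part, _ in alignment)
    b_len = sum(len(b_part) for _, b_part in alignment)
    l_d = sum(len(b_part) for a_part, b_part in alignment if a_part != b_part)
    return a_len, b_len, l_d


# Toy alignment: two common regions around one short substitution.
alignment = [("acgtacgt", "acgtacgt"), ("tt", "g"), ("ccgga", "ccgga")]
a_len, b_len, l_d = alignment_sizes(alignment)

print(a_len + b_len)  # 29 leaves in the generalized suffix tree, |A| + |B|
print(a_len + l_d)    # 16, the leaf count of the proposed tree before adding the common-part term
```

The gap between the two numbers grows with the length of the shared regions, which is why an index built on the alignment can be much smaller for highly similar strings.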

### Keywords

Indexes for similar data, suffix trees, alignments

## Authors and Affiliations

• Joong Chae Na (1)
• Heejin Park (2)
• Maxime Crochemore (3)
• Jan Holub (4)
• Costas S. Iliopoulos (3)
• Laurent Mouchard (5)
• Kunsoo Park (6)

1. Sejong University, Korea
2. Hanyang University, Korea
3. King’s College London, UK
4. Czech Technical University in Prague, Czech Republic
5. University of Rouen, France
6. Seoul National University, Korea