Abstract
The hierarchical overlap graph (HOG for short) is an overlap encoding graph that efficiently represents the overlaps among a given set P of n strings. A previously known algorithm constructs the HOG in \(O(\vert \vert P \vert \vert + n^2)\) time and \(O(\vert \vert P \vert \vert +n\times \min (n,\max \{|s|:s\in P\}))\) space, where \(\vert \vert P \vert \vert \) is the sum of the lengths of the n strings in P. We present a new algorithm that computes the HOG in \(O(\vert \vert P \vert \vert \log n)\) time and \(O(\vert \vert P \vert \vert )\) space by exploiting the segment tree data structure. We also propose an alternative algorithm that uses \(O(\vert \vert P \vert \vert \frac{\log n}{\log \log n})\) time and \(O(\vert \vert P \vert \vert )\) space in the word RAM model of computation.
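To make the objects in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the segment-tree algorithm announced above. It assumes the usual definition from the HOG literature, which the abstract itself does not state: an overlap from s to t is a string that is both a proper suffix of s and a proper prefix of t, and the HOG's node set consists of the strings of P, the longest overlap of every ordered pair (a string paired with itself included), and the empty string. The function names `longest_overlap`, `hog_nodes`, and `total_length` are hypothetical and chosen for illustration only.

```python
def longest_overlap(s, t):
    """Longest string that is a proper suffix of s and a proper prefix of t.

    Assumption: "proper" excludes the whole string, so the overlap is
    strictly shorter than both s and t. Returns "" when none exists.
    """
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s[-k:] == t[:k]:
            return s[-k:]
    return ""


def hog_nodes(P):
    """Node set of the HOG under the assumed definition (naive version).

    Compares every ordered pair of strings directly, so it is far slower
    than the bounds quoted in the abstract; it only illustrates the output.
    """
    nodes = set(P) | {""}
    for s in P:
        for t in P:
            nodes.add(longest_overlap(s, t))
    return nodes


def total_length(P):
    """||P||: the sum of the lengths of the strings in P."""
    return sum(len(s) for s in P)


if __name__ == "__main__":
    P = ["aba", "bab"]
    print(sorted(hog_nodes(P), key=lambda x: (len(x), x)))
    print(total_length(P))
```

For P = {aba, bab}, the longest overlaps are "a" (aba to aba), "ba" (aba to bab), "ab" (bab to aba), and "b" (bab to bab), so under this assumed definition the node set has seven elements including the empty string, and ||P|| = 6.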
S. G. Park and K. Park: Supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00551, Framework of Practical Algorithms for NP-hard Graph Problems).
E. Rivals: ER acknowledges funding from Labex NUMEV, GEM project (ANR 2011-LABX-076).
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Park, S.G., Cazaux, B., Park, K., Rivals, E. (2020). Efficient Construction of Hierarchical Overlap Graphs. In: Boucher, C., Thankachan, S.V. (eds) String Processing and Information Retrieval. SPIRE 2020. Lecture Notes in Computer Science, vol 12303. Springer, Cham. https://doi.org/10.1007/978-3-030-59212-7_20
DOI: https://doi.org/10.1007/978-3-030-59212-7_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59211-0
Online ISBN: 978-3-030-59212-7
eBook Packages: Computer Science, Computer Science (R0)