Parallel construction of wavelet trees on multicore architectures


The wavelet tree has become a very useful data structure for efficiently representing and querying large volumes of data in many different domains, from bioinformatics to geographic information systems. One problem with wavelet trees is their construction time. In this paper, we introduce two algorithms that reduce the time complexity of wavelet tree construction by taking advantage of today's ubiquitous multicore machines. Our first algorithm constructs all the levels of the wavelet tree in parallel in \(O(n)\) time using \(O(n\lg \sigma + \sigma \lg n)\) bits of working space, where \(n\) is the size of the input sequence and \(\sigma \) is the size of the alphabet. Our second algorithm constructs the wavelet tree in a domain-decomposition fashion, using our first algorithm in each segment, reaching \(O(\lg n)\) time and \(O(n\lg \sigma + p\sigma \lg n/\lg \sigma )\) bits of extra space, where \(p\) is the number of available cores. Both algorithms are practical and achieve good speedups on large real datasets.
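To make the idea of building every level independently more concrete, the following is a minimal Python sketch. It is not the authors' implementation. It assumes an integer alphabet \([0, \sigma)\), a balanced levelwise layout with \(\lceil \lg \sigma \rceil\) levels, and plain Python lists instead of succinct bitmaps. The names `build_level`, `build_wavelet_tree` and `access` are hypothetical and chosen only for illustration. Each level is derived directly from the input sequence by counting symbols per node, taking a prefix sum to obtain node offsets, and placing one bit per symbol, so no level depends on another.

```python
from concurrent.futures import ThreadPoolExecutor


def build_level(seq, level, height):
    """Build the bitmap of one level directly from the input sequence."""
    shift = height - level  # number of bits below the node-identifying prefix
    nodes = 1 << level
    # Count how many symbols fall into each node of this level.
    counts = [0] * nodes
    for s in seq:
        counts[s >> shift] += 1
    # Exclusive prefix sum: start offset of each node inside the level bitmap.
    offsets = [0] * nodes
    for node in range(1, nodes):
        offsets[node] = offsets[node - 1] + counts[node - 1]
    # Stable placement: write the next bit of each symbol into its node's slot.
    bits = [0] * len(seq)
    for s in seq:
        node = s >> shift
        bits[offsets[node]] = (s >> (shift - 1)) & 1
        offsets[node] += 1
    return bits


def build_wavelet_tree(seq, sigma, workers=4):
    """Return one bitmap per level; levels are submitted as independent tasks."""
    height = max(1, (sigma - 1).bit_length())  # ceil(lg sigma), at least 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda lvl: build_level(seq, lvl, height),
                             range(height)))


def access(levels, i):
    """Recover seq[i] with a naive linear-time rank (for checking only)."""
    lo, hi = 0, len(levels[0])  # interval [lo, hi) of the current node
    symbol = 0
    for bits in levels:
        b = bits[lo + i]
        ones_before = sum(bits[lo:lo + i])
        zeros = (hi - lo) - sum(bits[lo:hi])
        if b == 0:
            i -= ones_before      # rank of position i among the zeros
            hi = lo + zeros       # descend to the left child
        else:
            i = ones_before       # rank of position i among the ones
            lo += zeros           # descend to the right child
        symbol = (symbol << 1) | b
    return symbol


if __name__ == "__main__":
    seq = [3, 0, 2, 1, 3, 0]
    levels = build_wavelet_tree(seq, sigma=4)
    print(levels)
    print([access(levels, i) for i in range(len(seq))] == seq)
```

Because of the global interpreter lock in CPython, the thread pool here only illustrates the task structure and will not by itself yield the speedups discussed in the abstract. A compiled language with a work-stealing runtime would be the natural setting for real measurements.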




  1. We use \(\lg x = \log _2 x\).

  2. Notice that the RAM model is a subset of the DYM model where the outdegree of every vertex \(v \in V\) is \({\le }1\).

  3. We also tested a new version of Libcds called Libcds2; however, the former had better running times for the construction of wtrees.

  4. (April, 2015).

  5. (April, 2015).

  6. (April, 2015).

  7. (March, 2013).

  8. In order to be less sensitive to outliers, we use the median time instead of other statistics. In our experiments, the pwt algorithm showed a larger deviation with respect to the number of threads than the other algorithms. However, the differences were not statistically significant.

  9. A complete report of running times and everything needed to replicate these results is available at

  10. The Unicode Consortium:

  11. The construction time of shun with the src.2GB dataset exceeds 1 h. To make the algorithms in the figures comparable, we report the running times for the dataset src.1GB.

  12. The computer tested is a dual-processor \(\hbox {Intel}^{\circledR }\) \(\hbox {Xeon}^{\circledR }\) CPU (E5645) with six cores per processor, for a total of 12 physical cores running at 2.50GHz. Hyperthreading was disabled. The computer runs Linux 3.5.0-17-generic, in 64-bit mode. This machine has per-core L1 and L2 caches of sizes 32KB and 256KB, respectively, one shared L3 cache of 12MB per processor, and 5,958MB (\(\sim \hbox {6GB}\)) of DDR3 RAM.

  13. To ensure a constant memory access cost, we use the numactl command with the “interleave \(=\) all” option. The command allocates memory in round-robin fashion across the NUMA nodes.


  1. Arroyuelo D, Costa VG, González S, Marín M, Oyarzún M (2012) Distributed search based on self-indexed compressed text. Inf Process Manag 48(5):819–827. doi:10.1016/j.ipm.2011.01.008

  2. Bingmann T (2015) malloc_count - tools for runtime memory usage analysis and profiling. (2013). Last accessed: 17 Jan 2015

  3. Blumofe RD, Leiserson CE (1999) Scheduling multithreaded computations by work stealing. J ACM 46(5):720–748. doi:10.1145/324133.324234

  4. Brisaboa NR, Luaces MR, Navarro G, Seco D (2013) Space-efficient representations of rectangle datasets supporting orthogonal range querying. Inf Syst 38(5):635–655. doi:10.1016/

  5. Burrows M, Wheeler DJ (1994) A block-sorting lossless data compression algorithm. Tech. rep., Digital Equipment Corporation

  6. Claude F (2011) A compressed data structure library. Last accessed: 13 August 2015

  7. Claude F, Navarro G (2009) Practical rank/select queries over arbitrary sequences. In: SPIRE. Springer, Berlin, pp 176–187. doi:10.1007/978-3-540-89097-3_18

  8. Claude F, Navarro G (2012) The wavelet matrix. In: SPIRE, vol 7608. Springer, Berlin, pp 167–179. doi:10.1007/978-3-642-34109-0_18

  9. Claude F, Nicholson PK, Seco D (2011) Space efficient wavelet tree construction. In: SPIRE, vol 7024. Springer, Berlin, pp 185–196

  10. Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn., chap. Multithreaded algorithms. The MIT Press, Cambridge, pp 772–812

  11. Faro S, Külekci MO (2012) Fast multiple string matching using streaming SIMD extensions technology. In: SPIRE. Springer, Berlin, pp 217–228. doi:10.1007/978-3-642-34109-0_23

  12. Ferragina P, Manzini G (2000) Opportunistic data structures with applications. In: Proceedings of the 41st annual symposium on foundations of computer science, FOCS ’00. IEEE Computer Society, Washington, DC, USA, p 390.

  13. Ferragina P, Manzini G, Mäkinen V, Navarro G (2004) String processing and information retrieval: 11th international conference, SPIRE 2004, Padova, Italy, 5–8 October 2004. Proceedings, chap. An Alphabet-Friendly FM-Index. Springer, Berlin, pp 150–160. doi:10.1007/978-3-540-30213-1_23

  14. Ferragina P, Manzini G, Mäkinen V, Navarro G (2007) Compressed representations of sequences and full-text indexes. ACM Trans Algorithms 3(2):20. doi:10.1145/1240233.1240243

  15. Fuentes-Sepúlveda J, Elejalde E, Ferres L, Seco D (2014) Efficient wavelet tree construction and querying for multicore architectures. In: Gudmundsson J, Katajainen J (eds) Experimental algorithms, Lecture Notes in Computer Science, vol 8504. Springer, Berlin, pp 150–161. doi:10.1007/978-3-319-07959-2_13

  16. Gog S (2015) Succinct data structure library 2.0. (2012). Last accessed: 17 Jan 2015

  17. González R, Grabowski S, Mäkinen V, Navarro G (2005) Practical implementation of rank and select queries. In: WEA. CTI Press, Greece, pp 27–38. Poster

  18. Grossi R, Gupta A, Vitter JS (2003) High-order entropy-compressed text indexes. In: SODA. Soc. Ind. Appl. Math., Philadelphia, pp 841–850

  19. Helman DR, JáJá J (2001) Prefix computations on symmetric multiprocessors. J Parallel Distrib Comput 61(2):265–278. doi:10.1006/jpdc.2000.1678

  20. Illumina, Inc. (2016) An introduction to next-generation sequencing technology.

  21. Ladra S, Pedreira O, Duato J, Brisaboa NR (2012) Exploiting SIMD instructions in current processors to improve classical string algorithms. In: ADBIS. Springer, Berlin, pp 254–267. doi:10.1007/978-3-642-33074-2_19

  22. Makris C (2012) Wavelet trees: a survey. Comput Sci Inf Syst 9(2):585–625

  23. Matsumoto M, Nishimura T (1998) Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans Model Comput Simul 8(1):3–30. doi:10.1145/272991.272995

  24. Navarro G (2012) Wavelet trees for all. In: CPM. Springer, Berlin, pp 2–26. doi:10.1007/978-3-642-31265-6_2

  25. Navarro G, Nekrich Y, Russo LMS (2013) Space-efficient data-analysis queries on grids. Theor Comput Sci 482:60–72. doi:10.1016/j.tcs.2012.11.031

  26. Pantaleoni J, Subtil N (2016) Nvbio library. Accessed 12 April 2016

  27. Raman R, Raman V, Satti SR (2007) Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans Algorithms 3(4):43. doi:10.1145/1290672.1290680

  28. Schnattinger T, Ohlebusch E, Gog S (2012) Bidirectional search in a string with wavelet trees and bidirectional matching statistics. Inf Comput 213:13–22. doi:10.1016/j.ic.2011.03.007. Special Issue: Combinatorial Pattern Matching (CPM 2010)

  29. Shun J (2015) Parallel wavelet tree construction. In: Proceedings of the IEEE data compression conference, Utah, USA, pp 63–72. doi:10.1109/DCC.2015.7

  30. Singer J (2012) A wavelet tree based fm-index for biological sequences in SeqAn. Master’s thesis, Freie Universität Berlin.

  31. Tischler G (2011) On wavelet tree construction. In: CPM. Springer, Berlin, pp 208–218

  32. Touati SAA, Worms J, Briais S (2013) The Speedup-Test: a statistical methodology for program speedup analysis and computation. Concurr Comput Pract Exp 25(10):1410–1426. doi:10.1002/cpe.2939. Article first published online: 15 Oct 2012

  33. Välimäki N, Mäkinen V (2007) Space-efficient algorithms for document retrieval. In: CPM, LNCS, vol. 4580. Springer, Berlin, pp 205–215. doi:10.1007/978-3-540-73437-6_22

  34. Wetterstrand KA (2016) DNA sequencing costs: data from the NHGRI genome sequencing program (GSP). Accessed 12 April 2016



This work was supported in part by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 690941 and by CONICYT doctoral scholarships Nos. 21120974 and 63130228 (first and second authors, respectively). We would also like to thank Roberto Asín for making his multicore computers, Mastropiero and Günther Frager, available to us.

Author information



Corresponding authors

Correspondence to José Fuentes-Sepúlveda or Leo Ferres.

Additional information

A previous version of this paper appeared in the 13th International Symposium on Experimental Algorithms (SEA 2014) [15].


About this article


Cite this article

Fuentes-Sepúlveda, J., Elejalde, E., Ferres, L. et al. Parallel construction of wavelet trees on multicore architectures. Knowl Inf Syst 51, 1043–1066 (2017).



  • Succinct data structure
  • Wavelet tree construction
  • Multicore
  • Parallel algorithm