Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 9954)

Abstract

For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to characters. In this paper we first show how, given a probability distribution over an alphabet of \(\sigma\) characters, we can store a nearly optimal alphabetic prefix-free code in \(o(\sigma)\) bits such that we can encode and decode any character in constant time. We then consider a kind of code introduced recently to reduce the space usage of wavelet matrices (Claude, Navarro, and Ordóñez, Information Systems, 2015). They showed how to build an optimal prefix-free code such that the codewords' lengths are non-decreasing when they are arranged such that their reverses are in lexicographic order. We show how to store such a code in \(\mathcal{O}(\sigma \log L + 2^{\epsilon L})\) bits, where \(L\) is the maximum codeword length and \(\epsilon\) is any positive constant, such that we can encode and decode any character in constant time under reasonable assumptions. Otherwise, we can always encode and decode a codeword of \(\ell\) bits in time \(\mathcal{O}(\ell)\) using \(\mathcal{O}(\sigma \log L)\) bits of space.
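To make the two kinds of code in the abstract concrete, the following is a minimal sketch that only checks the defining properties on small examples. It does not implement the compact representations from the paper. The function names and the toy codes are assumptions introduced for illustration, and "alphabetic" is taken in its usual sense: the lexicographic order of the codewords matches the order of the characters.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    ordered = sorted(codewords)
    # In sorted order, a word that is a prefix of any other word is also a
    # prefix of its immediate successor, so adjacent pairs suffice.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def is_alphabetic(code):
    """True if codewords increase lexicographically with their characters.

    `code` maps each character to its codeword (a string of '0'/'1').
    """
    words = [code[ch] for ch in sorted(code)]
    return all(a < b for a, b in zip(words, words[1:]))


def lengths_nondecreasing_in_reversed_order(codewords):
    """Check the wavelet-matrix property described in the abstract.

    Arrange the codewords so that their reverses are in lexicographic
    order, then test that the codeword lengths never decrease.
    """
    by_reverse = sorted(codewords, key=lambda w: w[::-1])
    lengths = [len(w) for w in by_reverse]
    return all(x <= y for x, y in zip(lengths, lengths[1:]))


# Hypothetical toy codes over a four-character alphabet.
code_a = {"a": "0", "b": "10", "c": "110", "d": "111"}
code_b = {"a": "1", "b": "01", "c": "000", "d": "001"}

for name, code in (("code_a", code_a), ("code_b", code_b)):
    print(
        name,
        is_prefix_free(code.values()),
        is_alphabetic(code),
        lengths_nondecreasing_in_reversed_order(code.values()),
    )
```

Both toy codes are prefix-free with the same multiset of codeword lengths, but only `code_a` satisfies the alphabetic and reversed-order properties, which illustrates that these are constraints on how codewords are assigned rather than on the lengths alone.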

Funded in part by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 690941 (project BIRDS). The first author was supported by: MINECO (PGE and FEDER) grants TIN2013-47090-C3-3-P and TIN2015-69951-R; MINECO and CDTI grant ITC-20151305; ICT COST Action IC1302; and Xunta de Galicia (co-funded with FEDER) grant GRC2013/053. The second author was supported by Academy of Finland grants 268324 and 250345 (CoECGR). The fourth author was supported by Millennium Nucleus Information and Coordination in Networks ICM/FIC P10-024F, Chile.

Notes

  1. Since the code tree has height \(L\) and \(\sigma\) leaves, it follows that \(L < \sigma\).

  2. This descent is conceptual; we do not have a concrete node \(v\) at each level, but we do know \(r_v\).
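
The bound in Note 1 can be justified by a short counting argument. The sketch below assumes the code tree is a full binary tree, meaning every internal node has exactly two children, as is the case for an optimal prefix-free code. This assumption is not stated in the note itself.

```latex
% A longest root-to-leaf path has L edges and therefore passes through
% L distinct internal nodes (the root down to the parent of the leaf).
% A full binary tree with \sigma leaves has exactly \sigma - 1 internal nodes.
\[
  L \;\le\; \#\{\text{internal nodes}\} \;=\; \sigma - 1 \;<\; \sigma .
\]
```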

References

  1. Claude, F., Navarro, G., Ordóñez, A.: The wavelet matrix: an efficient wavelet tree for large alphabets. Inf. Syst. 47, 15–32 (2015)

  2. Evans, W., Kirkpatrick, D.G.: Restructuring ordered binary trees. J. Algorithms 50, 168–193 (2004)

  3. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms 3(2), 20 (2007)

  4. Gagie, T., He, M., Munro, J.I., Nicholson, P.K.: Finding frequent elements in compressed 2D arrays and strings. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 295–300. Springer, Heidelberg (2011)

  5. Gagie, T., Navarro, G., Nekrich, Y., Ordóñez, A.: Efficient and compact representations of prefix codes. IEEE Trans. Inf. Theory 61(9), 4999–5011 (2015)

  6. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proceedings SODA, pp. 841–850 (2003)

  7. Itai, A.: Optimal alphabetic trees. SIAM J. Comp. 5, 9–18 (1976)

  8. Kraft, L.G.: A device for quantizing, grouping, and coding amplitude modulated pulses. M.Sc. thesis, EE Dept., MIT (1949)

  9. Munro, J.I., Raman, V.: Succinct representation of balanced parentheses and static trees. SIAM J. Comp. 31(3), 762–776 (2001)

  10. Navarro, G.: Wavelet trees for all. J. Discr. Algorithms 25, 2–20 (2014)

  11. Navarro, G., Providel, E.: Fast, small, simple rank/select on bitmaps. In: Klasing, R. (ed.) SEA 2012. LNCS, vol. 7276, pp. 295–306. Springer, Heidelberg (2012)

  12. Pǎtraşcu, M.: Succincter. In: Proceedings FOCS, pp. 305–313 (2008)

  13. Schwartz, E.S., Kallick, B.: Generating a canonical prefix encoding. Commun. ACM 7, 166–169 (1964)

  14. Wessner, R.L.: Optimal alphabetic search trees with restricted maximal height. Inf. Proc. Lett. 4, 90–94 (1976)

Acknowledgements

This research was carried out in part at the University of A Coruña, Spain, while the second author was visiting and the fifth author was a PhD student there. It started at a StringMasters workshop at the Research Center on Information and Communication Technologies (CITIC) of the university. The workshop was partly funded by EU RISE project BIRDS (Bioinformatics and Information Retrieval Data Structures). The authors thank Nieves Brisaboa and Susana Ladra.

Corresponding author

Correspondence to Travis Gagie.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Fariña, A., Gagie, T., Manzini, G., Navarro, G., Ordóñez, A. (2016). Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes. In: Inenaga, S., Sadakane, K., Sakai, T. (eds) String Processing and Information Retrieval. SPIRE 2016. Lecture Notes in Computer Science, vol 9954. Springer, Cham. https://doi.org/10.1007/978-3-319-46049-9_5

  • DOI: https://doi.org/10.1007/978-3-319-46049-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46048-2

  • Online ISBN: 978-3-319-46049-9

  • eBook Packages: Computer Science, Computer Science (R0)