Abstract
We solve an open problem related to an optimal encoding of a straight line program (SLP), a canonical form of grammar compression deriving a single string deterministically. We show that an information-theoretic lower bound for representing an SLP with n symbols requires at least 2n + logn! + o(n) bits. We then present a succinct representation of an SLP; this representation is asymptotically equivalent to the lower bound. The space is at most 2nlogρ(1 + o(1)) bits for \(\rho \leq 2\sqrt{n}\), while supporting random access to any production rule of an SLP in O(loglogn) time. In addition, we present a novel dynamic data structure associating a digram with a unique symbol. Such a data structure is called a naming function and has been implemented using a hash table that has a space-time tradeoff. Thus, the memory space is mainly occupied by the hash table during the development of production rules. Alternatively, we build a dynamic data structure for the naming function by leveraging the idea behind the wavelet tree. The space is strictly bounded by 2nlogn(1 + o(1)) bits, while supporting O(logn) query and update time.
Keywords
- Directed Acyclic Graph
- Production Rule
- Query Time
- Naming Function
- Entropy Bound
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
This study was supported by KAKENHI(23680016,20589824) and JST PRESTO program.
This is a preview of subscription content, access via your institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Apostolico, A., Lonardi, S.: Off-line Compression by Greedy Textual Substitution. Proceedings of the IEEE 88, 1733–1744 (2000)
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient Substructure Discovery from Large Semi-structured Data. In: SDM, pp. 158–174 (2002)
Barbay, J., Gagie, T., Navarro, G., Nekrich, Y.: Alphabet Partitioning for Compressed Rank/Select and Applications. In: Cheong, O., Chwa, K.-Y., Park, K. (eds.) ISAAC 2010, Part II. LNCS, vol. 6507, pp. 315–326. Springer, Heidelberg (2010)
Barbay, J., Navarro, G.: Compressed Representations of Permutations, and Applications. In: STACS, pp. 111–122 (2009)
Botelho, F.C., Pagh, R., Ziviani, N.: Simple and Space-Efficient Minimal Perfect Hash Functions. In: Dehne, F., Sack, J.-R., Zeh, N. (eds.) WADS 2007. LNCS, vol. 4619, pp. 139–150. Springer, Heidelberg (2007)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inform. Theory 51, 2554–2576 (2005)
Claude, F., Navarro, G.: Self-Indexed Grammar-Based Compression. Fundam. Inform. 111, 313–337 (2011)
Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theor. Comput. Sci. 372, 115–121 (2007)
Fomin, F.V., Kratsch, D., Novelli, J.-C.: Approximating minimum cocolorings. Inf. Process. Lett. 84, 285–290 (2002)
Golynski, A., Munro, J.I., Rao, S.S.: Rank/select operations on large alphabets: a tool for text indexing. In: SODA, pp. 368–373 (2006)
González, R., Navarro, G.: Statistical Encoding of Succinct Data Structures. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 294–305. Springer, Heidelberg (2006)
Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: SODA, pp. 841–850 (2003)
Jacobson, G.: Space-efficient Static Trees and Graphs. In: FOCS, pp. 549–554 (1989)
Jansson, J., Sadakane, K., Sung, W.-K.: Ultra-succinct representation of ordered trees with applications. J. Comput. Syst. Sci. 78, 619–631 (2012)
Jansson, J., Sadakane, K., Sung, W.-K.: CRAM: Compressed Random Access Memory. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012, Part I. LNCS, vol. 7391, pp. 510–521. Springer, Heidelberg (2012)
Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid Identification of Repeated Patterns in Strings, Trees and Arrays. In: STOC, pp. 125–136 (1972)
Karp, R.M., Rabin, M.O.: Efficient Randomized Pattern-Matching Algorithms. IBM Journal of Research and Development 31, 249–260 (1987)
Karpinski, M., Rytter, W., Shinohara, A.: An Efficient Pattern-Matching Algorithm for Strings with Short Descriptions. Nordic J. Comp. 4, 172–186 (1997)
Larsson, N.J., Moffat, A.: Offline Dictionary-Based Compression. In: DCC, pp. 296–305 (1999)
Lehman, E.: Approximation Algorithms for Grammar-Based Compression. PhD thesis, MIT (2002)
Lehman, E., Shelat, A.: Approximation algorithms for grammar-based compression. In: SODA, pp. 205–212 (2002)
Maruyama, S., Nakahara, M., Kishiue, N., Sakamoto, H.: ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 398–409. Springer, Heidelberg (2011)
Maruyama, S., Nakahara, M., Kishiue, N., Sakamoto, H.: ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing. J. Discrete Algorithms 18, 100–112 (2013)
Maruyama, S., Sakamoto, H., Takeda, M.: An Online Algorithm for Lightweight Grammar-Based Compression. Algorithms 5, 213–235 (2012)
Navarro, G.: Wavelet Trees for All. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 2–26. Springer, Heidelberg (2012)
Navarro, G., Providel, E.: Fast, small, simple rank/Select on bitmaps. In: Klasing, R. (ed.) SEA 2012. LNCS, vol. 7276, pp. 295–306. Springer, Heidelberg (2012)
Okanohara, D., Sadakane, K.: Practical Entropy-Compressed Rank/Select Dictionary. In: ALENEX (2007)
Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 233–242 (2002)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302, 211–222 (2003)
Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: SODA, pp. 1230–1239 (2006)
Sakamoto, H.: A fully linear-time approximation algorithm for grammar-based compression. J. Discrete Algorithms 3, 416–430 (2005)
Takabatake, Y., Tabei, Y., Sakamoto, H.: Variable-Length Codes for Space-Efficient Grammar-Based Compression. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 398–410. Springer, Heidelberg (2012)
Yehuda, R.B., Fogel, S.: Partitioning a Sequence into Few Monotone Subsequences. Acta. Inf. 35, 421–440 (1998)
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: KDD, pp. 71–80 (2002)
Ziv, J., Lempel, A.: A Universal Algorithm for Sequential Data Compression. IEEE Trans. Inform. Theory 23, 337–343 (1977)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory 24, 530–536 (1978)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tabei, Y., Takabatake, Y., Sakamoto, H. (2013). A Succinct Grammar Compression. In: Fischer, J., Sanders, P. (eds) Combinatorial Pattern Matching. CPM 2013. Lecture Notes in Computer Science, vol 7922. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38905-4_23
Download citation
DOI: https://doi.org/10.1007/978-3-642-38905-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38904-7
Online ISBN: 978-3-642-38905-4
eBook Packages: Computer ScienceComputer Science (R0)