Range Majorities and Minorities in Arrays

Belazzougui, Djamal; Gagie, Travis; Munro, J. Ian; Navarro, Gonzalo; Nekrich, Yakov

doi:10.1007/s00453-021-00799-7

Published: 19 March 2021

Algorithmica, Volume 83, pages 1707–1733 (2021)

Djamal Belazzougui¹,
Travis Gagie²,
J. Ian Munro³,
Gonzalo Navarro ORCID: orcid.org/0000-0002-2286-741X⁴ &
…
Yakov Nekrich⁵

Abstract

The problem of parameterized range majority asks us to preprocess a string of length $n$ such that, given the endpoints of a range, one can quickly find all the distinct elements whose relative frequencies in that range are more than a threshold $\tau$. This is a more tractable version of the classical problem of finding the range mode, which is unlikely to be solvable in polylogarithmic time and linear space. In this paper we give the first linear-space solution with optimal $\mathcal {O}\!\left( {1 / \tau } \right)$ query time, even when $\tau$ can be specified with the query. We then consider data structures whose space is bounded by the entropy of the distribution of the symbols in the sequence. For the case when the alphabet size $\sigma$ is polynomial in the computer word size, we retain the optimal time within optimally compressed space (i.e., with sublinear redundancy). Otherwise, either the compressed space is increased by an arbitrarily small constant factor or the time rises to any function in $(1/\tau )\cdot \omega (1)$. We obtain the same results on the complementary problem of parameterized range minority.
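To make the query concrete, the following brute-force baseline simply scans the range and reports every element whose relative frequency exceeds $\tau$. It is only a reference sketch for the problem statement, not one of the data structures of the paper: it takes time linear in the range length instead of $\mathcal {O}\!\left( {1 / \tau } \right)$, and the function name `range_majorities` and the 0-based inclusive endpoints are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction


def range_majorities(seq, i, j, tau):
    """Return the distinct elements of seq[i..j] (inclusive, 0-based) whose
    relative frequency in that range is strictly greater than tau.

    Brute-force baseline: scans the whole range, so it is NOT the
    O(1/tau)-time structure described in the abstract.
    """
    length = j - i + 1
    counts = Counter(seq[i:j + 1])
    # An element is a tau-majority when count / length > tau.
    return sorted(s for s, c in counts.items() if c > tau * length)


if __name__ == "__main__":
    text = "abracadabra"
    print(range_majorities(text, 0, 10, Fraction(1, 4)))
    print(range_majorities(text, 0, 10, Fraction(1, 6)))
```

Passing $\tau$ as a `Fraction` keeps the comparison exact; at most $1/\tau$ elements can ever be reported, which is where the optimal query time comes from.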

Notes

1. In that paper they find the predecessor of x, which is the largest $x_i \le x$, but the problem is analogous.
2. M. Pǎtraşcu, personal communication, 2009.
3. This could have been simply $X \leftarrow (Y-X)~\textsc {and}~Y$ if there was an unused highest bit set to zero in the $(\lg \sigma ')$-length fields of X. Instead, we have to use this more complex formula that first zeroes the highest bit of the fields and later considers them separately.

References

1. Barbay, J., Claude, F., Gagie, T., Navarro, G., Nekrich, Y.: Efficient fully-compressed sequence representations. Algorithmica 69(1), 232–268 (2014)
2. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practice of monotone minimal perfect hashing. ACM J. Exp. Algorithm. 16(3), 2 (2011)
3. Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Linear-time string indexing and analysis in small space. ACM Trans. Algorithm. 16(2), 17:1–17:54 (2020)
4. Belazzougui, D., Gagie, T., Navarro, G.: Better space bounds for parameterized range majority and minority. In: Proceedings of the 12th Annual Workshop on Algorithms and Data Structures (WADS), LNCS 8037, pp. 121–132 (2013)
5. Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. ACM Trans. Algorithm. 10(4), 23 (2014)
6. Belazzougui, D., Navarro, G.: Optimal lower and upper bounds for representing sequences. ACM Trans. Algorithm. 11(4), 31 (2015)
7. Bose, P., Kranakis, E., Morin, P., Tang, Y.: Approximate range mode and range median queries. In: Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science (STACS), pp. 377–388 (2005)
8. Boyer, R.S., Moore, J.S.: MJRTY–a fast majority vote algorithm. In: Automated Reasoning, pp. 105–117. Springer, Berlin (1991)
9. Chan, T.M., Durocher, S., Larsen, K.G., Morrison, J., Wilkinson, B.T.: Linear-space data structures for range mode query in arrays. Theory Comput. Syst. 55(4), 719–741 (2014)
10. Chan, T.M., Durocher, S., Skala, M., Wilkinson, B.T.: Linear-space data structures for range minority query in arrays. Algorithmica 72(4), 901–913 (2015)
11. Cormode, G.: Misra-Gries summaries. In: Encyclopedia of Algorithms, pp. 1334–1337. Springer, Berlin (2016)
12. Cormode, G., Muthukrishnan, S.: Data stream methods. http://www.cs.rutgers.edu/~muthu/198-3.pdf, 2003. Lecture 3 of Rutger’s 198:671 Seminar on Processing Massive Data Sets
13. Demaine, E.D., López-Ortiz, A., Munro, J. I.: Frequency estimation of internet packet streams with limited space. In: Proceedings of the 10th European Symposium on Algorithms (ESA), pp. 348–360 (2002)
14. Durocher, S., He, M., Munro, J.I., Nicholson, P.K., Skala, M.: Range majority in constant time and linear space. Inf. Comput. 222, 169–179 (2013)
15. Durocher, S., Shah, R., Skala, M., Thankachan, S.V.: Linear-space data structures for range frequency queries on arrays and trees. Algorithmica 74(1), 344–366 (2016)
16. Elmasry, A., He, M., Munro, J.I., Nicholson, P.K.: Dynamic range majority data structures. Theoret. Comput. Sci. 647, 59–73 (2016)
17. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Trans. Algorithm. 3(2), 1–22 (2007)
18. Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theoret. Comput. Sci. 371(1), 115–121 (2007)
19. Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011)
20. Gagie, T., He, M., Munro, J.I., Nicholson, P.K.: Finding frequent elements in compressed 2D arrays and strings. In: Proceedings of the 18th Symposium on String Processing and Information Retrieval (SPIRE), pp. 295–300 (2011)
21. Gagie, T., He, M., Navarro, G.: Compressed dynamic range majority and minority data structures. Algorithmica 82(7), 2063–2086 (2020)
22. Gawrychowski, P., Nicholson, P.K.: Optimal query time for encoding range majority. In: Proceedings of the 15th International Symposium on Algorithms and Data Structures (WADS), pp. 409–420 (2017). Extended version in CoRR abs/1704.06149
23. Greve, M., Jørgensen, A.G., Larsen, K.D., Truelsen, J.: Cell probe lower bounds and approximations for range mode. In: Proceedings of the 37th International Colloquium on Automata, Languages and Programming (ICALP), pp. 605–616 (2010)
24. Grossi, R., Orlandi, A., Raman, R., Srinivasa Rao, S.: More haste, less waste: Lowering the redundancy in fully indexable dictionaries. In: Proceedings of the 26th International Symposium on Theoretical Aspects of Computer Science (STACS), pp. 517–528 (2009)
25. Hon, W.-K., Shah, R., Vitter, J.: Space-efficient framework for top-$k$ string retrieval problems. In: Proceedings of the 50th IEEE Annual Symposium on Foundations of Computer Science (FOCS), pp. 713–722 (2009)
26. Karp, R.M., Shenker, S., Papadimitriou, C.H.: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28(1), 51–55 (2003)
27. Karpinski, M., Nekrich, Y.: Searching for frequent colors in rectangles. In: Proceedings of the 20th Canadian Conference on Computational Geometry (CCCG), pp. 11–14 (2008)
28. Kosaraju, R., Manzini, G.: Compression of low entropy strings with Lempel-Ziv algorithms. SIAM J. Comput. 29(3), 893–911 (2000)
29. Krizanc, D., Morin, P., Smid, M.H.M.: Range mode and range median queries on lists and trees. Nordic J. Comput. 12(1), 1–17 (2005)
30. Lai, Y.K., Poon, C.K., Shi, B.: Approximate colored range and point enclosure queries. J. Discrete Algorithm. 6(3), 420–432 (2008)
31. Misra, J., Gries, D.: Finding repeated elements. Sci. Comput. Program. 2(2), 143–152 (1982)
32. Munro, J.I., Navarro, G., Nekrich, Y.: Fast compressed self-indexes with deterministic linear-time construction. Algorithmica 82(2), 316–337 (2020)
33. Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: Proceedings of the 13th Symposium on Discrete Algorithms (SODA), pp. 657–666 (2002)
34. Navarro, G., Thankachan, S.V.: Optimal encodings for range majority queries. Algorithmica 74(3), 1082–1098 (2016)
35. Petersen, H.: Improved bounds for range mode and range median queries. In: Proceedings of the 34th Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), pp. 418–423 (2008)
36. Petersen, H., Grabowski, S.: Range mode and range median queries in constant time and sub-quadratic space. Inf. Process. Lett. 109(4), 225–228 (2009)
37. Pǎtraşcu, M.: Succincter. In: Proceedings of the 49th Symposium on Foundations of Computer Science (FOCS), pp. 305–313 (2008)
38. Sadakane, K.: Succinct data structures for flexible text retrieval systems. J. Discrete Algorithm. 5(1), 12–22 (2007)
39. Wei, Z., Yi, K.: Beyond simple aggregates: indexing for summary queries. In: Proceedings of the 30th Symposium on Principles of Database Systems (PODS), pp. 117–128 (2011)

Acknowledgements

Many thanks to Patrick Nicholson for helpful comments.

Author information

Authors and Affiliations

Research Center on Technical and Scientific Information (CERIST), Algiers, Algeria
Djamal Belazzougui
Faculty of Computer Science, Dalhousie University, Halifax, Canada
Travis Gagie
David Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
J. Ian Munro
Millennium Institute for Foundational Research on Data, Department of Computer Science, University of Chile, Santiago, Chile
Gonzalo Navarro
Department of Computer Science, Michigan Technological University, Houghton, USA
Yakov Nekrich

Corresponding author

Correspondence to Gonzalo Navarro.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Funded in part by Academy of Finland Grant 268324, by NSERC Grant RGPIN-07185-2020, by ANID – Millennium Science Initiative Program – Code ICN17_002, and by Fondecyt Grant 1-200038, Chile. An early partial version of this article appeared in Proc. WADS 2013.

Appendix: Finding $\tau '$-Majorities on Tiny Alphabets

We show how to find $\tau '$-majorities in time $\mathcal {O}\!\left( {1/\tau '} \right)$ on ranges of length $\mathcal {O}\!\left( {(1/\tau ')w^\beta } \right)$, over alphabet $[0..\sigma '-1]$, with $\sigma '=w^\beta$, in the case $1/\tau ' < \sigma '$. We will compute an array of $\sigma '$ counters with the frequency of the symbols in the range, and then report those exceeding the threshold. The maximum size of the range is $(4/\tau ')w^\beta /4 \le \sigma ' w^\beta =w^{2\beta }$, and thus $2\beta \lg w$ bits suffice to represent each counter. The $\sigma '$ counters then require $2\beta w^\beta \lg w$ bits and can be maintained in a computer word (although we will store them somewhat spaced for technical reasons). We can read the elements in $S_v$ by chunks of $w^\beta$ symbols, and compute in constant time the corresponding counters for those symbols. Then we sum the current counters and the counters for the chunk, all in constant time because they are fields in a single computer word. The range is then processed in time $\mathcal {O}\!\left( {1/\tau '} \right)$.
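The bit budget in the paragraph above can be checked numerically. The sketch below assumes, purely for convenience, that $w$ is a power of two and that $\beta = 1/b$ for an integer $b$ dividing $\lg w$, so that $\sigma ' = w^\beta$ is itself a power of two; the function name `layout` and the dictionary keys are hypothetical.

```python
def layout(w: int, beta_inv: int) -> dict:
    """Field sizes for word size w and beta = 1 / beta_inv.

    Assumptions of this sketch: w is a power of two and beta_inv divides
    lg w, so that sigma' = w**beta is an exact power of two.
    """
    assert w > 1 and (w & (w - 1)) == 0, "w must be a power of two"
    lg_w = w.bit_length() - 1
    assert lg_w % beta_inv == 0, "beta * lg w must be an integer"

    ell = lg_w // beta_inv          # bits per symbol, lg sigma' = beta * lg w
    sigma = 1 << ell                # alphabet size sigma' = w**beta
    k = sigma                       # symbols per chunk, k = w**beta
    counter_bits = 2 * ell          # 2 * beta * lg w bits per counter
    return {
        "ell": ell,
        "sigma": sigma,
        "k": k,
        "counter_bits": counter_bits,
        "range_max": sigma * sigma,              # w**(2*beta)
        "compact_bits": sigma * counter_bits,    # 2 * beta * w**beta * lg w
        "spaced_bits": 2 * k * ell * sigma,      # 2 * beta * w**(2*beta) * lg w
    }


if __name__ == "__main__":
    info = layout(256, 4)   # w = 256, beta = 1/4
    print(info)
    print("spaced counters fit in one word:", info["spaced_bits"] <= 256)
```

For $w = 256$ and $\beta = 1/4$ this gives $\sigma ' = 4$, counters of 4 bits, 16 bits if the counters were stored contiguously, and 64 bits in the spaced layout, which is well below $w$.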

To compute the counters corresponding to $w^\beta$ symbols, we extend the popcounting algorithm of Belazzougui and Navarro [6, Sec. 4.1]. Assume we extract the $w^\beta$ symbols from $S_v$ and have them packed in the lowest $k\ell$ bits of a computer word X, where $k=w^\beta$ is the number of symbols and $\ell =\lg \sigma '$ is the number of bits used per symbol. We first create $\sigma '$ copies of the sequence at distance $2k\ell$ from each other: $X \leftarrow X \cdot (0^{2k\ell -1}1)^{\sigma '}$. In each copy we will count the occurrences of a different symbol. To have the $(i+1)$th copy count the occurrences of symbol i, for $0 \le i < \sigma '$, we perform

$$\begin{aligned} X ~\leftarrow ~ X~~\textsc {xor}~~0^{k\ell } ((\sigma '-1)_\ell )^k \ldots 0^{k\ell } (2_\ell )^k ~ 0^{k\ell } (1_\ell )^k ~ 0^{k\ell } (0_\ell )^k, \end{aligned}$$

where $i_\ell$ is the number i written in $\ell$ bits. Thus in the $(i+1)$th copy the symbols equal to i become zero and the others nonzero. We then set a 1 at the highest bit of the symbols equal to i in the $(i+1)$th copy, with

$$\begin{aligned} X ~\leftarrow ~ (Y - (X~\textsc {and~not}~Y))~\textsc {and}~Y~\textsc {and~not}~X, \end{aligned}$$

where $Y=(0^{k\ell } (10^{\ell -1})^k)^{\sigma '}$ (see Footnote 3). Now we add all the 1s in each copy with $X \leftarrow X \cdot 0^{k\ell (2\sigma '-1)} (0^{\ell -1}1)^k$. This spreads several sums across the $2k\ell$ bits of each copy, and in particular the kth sum adds up all the 1s of the copy. Each sum requires $\lg k$ bits, which is precisely the $\ell$ bits we have allocated per field. Finally, we isolate the desired counters using $X \leftarrow X~\textsc {and}~(0^{k\ell }1^\ell 0^{(k-1)\ell })^{\sigma '}$. The $\sigma '$ counters are not contiguous in the computer word, but we can still afford to store them spaced: we use $2k\ell \sigma ' = 2 \beta w^{2\beta }\lg w$ bits, which, since $\beta \le 1/4$, is always less than w.
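The chunk-counting steps above can be traced in Python, whose unbounded integers stand in for the computer word. This is a minimal sketch under stated assumptions, not the authors' implementation. The helper names `rep`, `chunk_counters` and `read_counters` are hypothetical. The chunk length $k$ is taken strictly smaller than $2^\ell$ so that every sum fits in $\ell$ bits. The sketch also shifts the marker bits down by $\ell -1$ positions before the multiplication, so that the $k$th sum lands exactly on the field selected by the final mask; that shift is an assumption of this sketch.

```python
def rep(pattern: int, width: int, times: int) -> int:
    """Concatenate `times` copies of a `width`-bit pattern, lowest copy first."""
    word = 0
    for t in range(times):
        word |= pattern << (t * width)
    return word


def chunk_counters(symbols, ell: int, sigma: int) -> int:
    """Count each symbol of one chunk with word-level operations only.

    Counter i ends up at bit offset i*2*k*ell + (k-1)*ell, in ell bits.
    """
    k = len(symbols)
    assert sigma == 1 << ell and all(0 <= s < sigma for s in symbols)
    assert 0 < k < (1 << ell)      # sketch assumption: a sum fits in ell bits
    span = 2 * k * ell             # distance between consecutive copies

    # Pack the chunk into the lowest k*ell bits of X.
    x = 0
    for j, s in enumerate(symbols):
        x |= s << (j * ell)

    # X <- X * (0^{2k ell - 1} 1)^{sigma'}: make sigma' copies of the chunk.
    x *= rep(1, span, sigma)

    # XOR copy i with (i_ell)^k: symbols equal to i become zero fields.
    xor_mask = 0
    for i in range(sigma):
        xor_mask |= rep(i, ell, k) << (i * span)
    x ^= xor_mask

    # X <- (Y - (X and not Y)) and Y and not X: mark the zero fields
    # with a 1 in their highest bit.
    y = rep(rep(1 << (ell - 1), ell, k), span, sigma)
    x = (y - (x & ~y)) & y & ~x

    # Sketch assumption: move each marker to the lowest bit of its field.
    x >>= ell - 1

    # Multiply by (0^{ell-1} 1)^k: the k-th field of each copy now holds
    # the total number of markers in that copy.
    x *= rep(1, ell, k)

    # Keep only the k-th field of every copy.
    x &= rep(((1 << ell) - 1) << ((k - 1) * ell), span, sigma)
    return x


def read_counters(x: int, k: int, ell: int, sigma: int, width: int):
    """Unpack the sigma' spaced counters (each `width` bits wide) into a list."""
    span = 2 * k * ell
    mask = (1 << width) - 1
    return [(x >> (i * span + (k - 1) * ell)) & mask for i in range(sigma)]


if __name__ == "__main__":
    word = chunk_counters([2, 0, 2], ell=2, sigma=4)
    print(read_counters(word, k=3, ell=2, sigma=4, width=2))
```

On a real machine the constant words built by `rep` would be precomputed once, so that each chunk costs a constant number of word operations.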

Since the range is of length at most $w^{2\beta }$, the cumulative counters need $\lg (w^{2\beta }) = 2\ell$ bits. We will store them in a computer word A separated by $2k\ell$ bits so that we can directly add the resulting word X after processing a chunk of $w^\beta$ symbols of the range in $S_v$: $A \leftarrow A+X$. If the last chunk is of length $l < w^\beta$, we complete it with zeros and then subtract those spurious $w^\beta -l$ occurrences from the first counter, $A \leftarrow A - (w^\beta -l) \cdot 2^{(k-1)\ell }$.
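The accumulation $A \leftarrow A+X$ and the correction for a padded last chunk can be sketched as follows. To keep the example short and self-contained, the per-chunk word X is built here by plain counting in the hypothetical helper `pack_counts`, which is only a stand-in for the constant-time bit-parallel computation described earlier. The names `accumulate` and `read_counters` are likewise assumptions, and the range is assumed shorter than $2^{2\ell }$ so that every cumulative counter fits in its $2\ell$ bits.

```python
from collections import Counter


def pack_counts(chunk, k: int, ell: int, sigma: int) -> int:
    """Stand-in for the bit-parallel step: place the count of symbol i
    at bit offset i*2*k*ell + (k-1)*ell."""
    span = 2 * k * ell
    counts = Counter(chunk)
    word = 0
    for i in range(sigma):
        word |= counts[i] << (i * span + (k - 1) * ell)
    return word


def accumulate(seq, k: int, ell: int, sigma: int) -> int:
    """Process the range in chunks of k symbols, adding each chunk word to A."""
    assert len(seq) < (1 << (2 * ell))   # sketch assumption: counters fit in 2*ell bits
    a = 0
    for start in range(0, len(seq), k):
        chunk = list(seq[start:start + k])
        pad = k - len(chunk)
        chunk += [0] * pad                # complete the last chunk with zeros
        a += pack_counts(chunk, k, ell, sigma)
        # Remove the spurious occurrences of symbol 0 added by the padding:
        # A <- A - (k - l) * 2^{(k-1) ell}.
        a -= pad << ((k - 1) * ell)
    return a


def read_counters(a: int, k: int, ell: int, sigma: int):
    """Unpack the sigma' cumulative counters, each 2*ell bits wide."""
    span = 2 * k * ell
    mask = (1 << (2 * ell)) - 1
    return [(a >> (i * span + (k - 1) * ell)) & mask for i in range(sigma)]


if __name__ == "__main__":
    a = accumulate([2, 0, 2, 1, 2, 3, 0], k=3, ell=2, sigma=4)
    print(read_counters(a, k=3, ell=2, sigma=4))
```

Because every counter sits in its own $2k\ell$-bit region, a single integer addition updates all $\sigma '$ counters at once without carries between them.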

The last challenge is to output the counters that are at least $y = \lfloor \tau ' (j-i+1) \rfloor +1$ after processing the range. We use

$$\begin{aligned} A \leftarrow A + (2^{2\ell }-y) \cdot (0^{k\ell +\ell -1} 1 0^{(k-1)\ell })^{\sigma '} \end{aligned}$$

so that the counters reaching y will overflow to the next bit. We isolate those overflow bits with $A \leftarrow A~\textsc {and}~(0^{(k-1)\ell -1} 1 0^{(k+1)\ell })^{\sigma '}$, so that we have to report the symbol i if and only if $A~\textsc {and}~0^{(k(2\sigma '-2i-1)-1)\ell -1} 1 0^{(k(2i+1)+1)\ell } \not = 0$. We then repeatedly isolate the lowest bit of A with

$$\begin{aligned} D \leftarrow (A~\textsc {xor}~(A-1))~\textsc {and}~A, \end{aligned}$$

and then remove it with $A \leftarrow A~\textsc {and}~(A-1)$, until $A=0$. Once we have a position isolated in D, we find the position in constant time by using a monotone minimal perfect hash function over the set $\{ 2^{(k(2i+1)+1)\ell },~0 \le i < \sigma '\}$, which uses $\mathcal {O}\!\left( {\sigma ' \lg w} \right) =o(w)$ bits [2]. Only one such data structure is needed for all the sequences, and it takes less space than a single system-wide pointer.
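The reporting step can be sketched in Python as well. The names `pack`, `threshold` and `report` are hypothetical, and an ordinary dictionary is swapped in for the monotone minimal perfect hash function, so the sketch illustrates the bit manipulation rather than the space bound. The lowest set bit is isolated as $(A~\textsc {xor}~(A-1))~\textsc {and}~A$.

```python
def pack(counts, k: int, ell: int) -> int:
    """Place counter i (2*ell bits wide) at bit offset i*2*k*ell + (k-1)*ell."""
    span = 2 * k * ell
    word = 0
    for i, c in enumerate(counts):
        assert 0 <= c < (1 << (2 * ell))
        word |= c << (i * span + (k - 1) * ell)
    return word


def threshold(tau_num: int, tau_den: int, length: int) -> int:
    """y = floor(tau' * length) + 1 for tau' = tau_num / tau_den."""
    return (tau_num * length) // tau_den + 1


def report(a: int, y: int, k: int, ell: int, sigma: int):
    """Return the symbols whose counter in A is at least y."""
    span = 2 * k * ell
    low = (k - 1) * ell            # lowest bit of each counter
    width = 2 * ell                # bits per cumulative counter
    assert k >= 2 and 1 <= y <= (1 << width)

    # Add 2^{2 ell} - y to every counter: those reaching y overflow
    # into the bit just above the counter.
    ones = sum(1 << (i * span + low) for i in range(sigma))
    a += ((1 << width) - y) * ones

    # Keep only the overflow bits, at offset (k+1)*ell inside each copy.
    overflow = sum(1 << (i * span + low + width) for i in range(sigma))
    a &= overflow

    # Dictionary used here in place of the monotone minimal perfect hash.
    position_to_symbol = {1 << (i * span + low + width): i for i in range(sigma)}

    found = []
    while a:
        d = (a ^ (a - 1)) & a      # isolate the lowest set bit
        found.append(position_to_symbol[d])
        a &= a - 1                 # remove it
    return found


if __name__ == "__main__":
    a = pack([2, 1, 3, 1], k=3, ell=2)          # counters for a range of length 7
    print(report(a, threshold(1, 3, 7), k=3, ell=2, sigma=4))
    print(report(a, threshold(1, 4, 7), k=3, ell=2, sigma=4))
```

The loop runs once per reported symbol, so the reporting cost is proportional to the number of $\tau '$-majorities, which is less than $1/\tau '$.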

About this article

Cite this article

Belazzougui, D., Gagie, T., Munro, J.I. et al. Range Majorities and Minorities in Arrays. Algorithmica 83, 1707–1733 (2021). https://doi.org/10.1007/s00453-021-00799-7

Received: 16 April 2018
Accepted: 06 January 2021
Published: 19 March 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00453-021-00799-7
