Resizing

Mailund, Thomas

doi:10.1007/978-1-4842-4066-3_4

Abstract

If you know how performance degrades as the load factor of a hash table increases, you can use this to pick a table size where the expected performance matches your needs, presuming that you know how many keys the table will need to store. If you do not know the number of elements you need to store, n, then you cannot choose a table size, m, that ensures that α = n/m is below a desired upper bound. In most applications, you do not know n before you run your program. Therefore, you must adjust m as n increases by resizing the table.
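The adjustment of m as n grows can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it assumes a chained table that doubles m whenever the next insertion would push α = n/m above a chosen bound. The class name `ResizingTable`, the doubling policy, the default bound of 0.5, and the `moves` counter are all assumptions made for the example.

```python
class ResizingTable:
    """Toy chained hash set that doubles m whenever n/m would exceed max_load."""

    def __init__(self, initial_size=8, max_load=0.5):
        self.m = initial_size        # number of bins
        self.n = 0                   # number of stored keys
        self.max_load = max_load     # upper bound on alpha = n / m (assumed value)
        self.bins = [[] for _ in range(self.m)]
        self.moves = 0               # keys re-inserted during resizes

    def _bin(self, key, m):
        # Python's % with a positive modulus is never negative.
        return hash(key) % m

    def _resize(self, new_m):
        # Allocate a new bin array and re-insert every key under the new m.
        old_bins = self.bins
        self.bins = [[] for _ in range(new_m)]
        self.m = new_m
        for chain in old_bins:
            for key in chain:
                self.bins[self._bin(key, new_m)].append(key)
                self.moves += 1

    def insert(self, key):
        chain = self.bins[self._bin(key, self.m)]
        if key in chain:
            return
        # Grow before the load factor would pass the bound.
        if (self.n + 1) / self.m > self.max_load:
            self._resize(2 * self.m)
            chain = self.bins[self._bin(key, self.m)]
        chain.append(key)
        self.n += 1

    def contains(self, key):
        return key in self.bins[self._bin(key, self.m)]

    def load_factor(self):
        return self.n / self.m
```

Under these assumptions, the load factor never exceeds the bound even though n is not known in advance, and the total number of keys moved by all resizes stays below 2n, which is one way to see why the per-insertion cost can be treated as constant in an amortized sense.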

Notes

1. Strictly speaking, amortized means that you write off expensive operations over time, and this suggests that cheaper ones follow costly operations. Doing this would not give you the runtime guarantee you are after. If you stop an algorithm right after an expensive operation and do not follow it with a series of cheap operations, you will be in trouble. What you do with amortized running time is that you save up some “computation” when doing cheap operations such that you can guarantee that you have enough computation in your bank account when you need to pay for an expensive operation.
2. Technically, you could compute these primes as needed, but this would be much slower than all the other hash table operations, so tabulating the primes you need is the only practical way. You can go to https://primes.utm.edu/lists/ to get a list of the first 1000, 10,000, or 50 million primes and build a table from these by filtering them according to your choice of β.
3. You do not necessarily need your table size to be prime just because you reduce keys modulo a prime to get your bins. You can first reduce the key modulo a prime and then use a bit mask to keep only the lower bits of the result. This way, you get a table size that is easier to work with; you can grow it and shrink it by a power of two, but, of course, at the cost of needing two operations to get your bin index. Since computing this index is unlikely to be the most time-critical step in using a hash table, this is a small price to pay.
4. The reason we say that n insertions take (amortized) linear time is that the cost per operation does not depend on n. It does depend on β, however, as you can see from the figure.
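Notes 2 and 3 can be illustrated with a short sketch. It rests on assumptions that this excerpt does not confirm: β is taken to be the factor by which the table grows at each resize, the prime `P` is an arbitrary choice (the Mersenne prime 2^31 - 1), and the function names `filter_primes` and `bin_index` are hypothetical.

```python
P = 2_147_483_647  # 2**31 - 1, a prime; an arbitrary choice for this sketch


def filter_primes(primes, beta):
    """From an ascending list of primes, keep a subsequence in which each
    kept prime is at least beta times the previously kept one."""
    kept = []
    for p in primes:
        if not kept or p >= beta * kept[-1]:
            kept.append(p)
    return kept


def bin_index(key, table_bits):
    """Two-step bin index for a table with 2**table_bits bins."""
    h = hash(key) % P                    # step 1: reduce modulo a prime
    return h & ((1 << table_bits) - 1)   # step 2: keep the low table_bits bits
```

With the second function, growing or shrinking the table by a power of two only changes `table_bits`; the prime stays fixed, so no table of primes is needed for that variant.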

Author information

Authors and Affiliations

Aarhus N, Denmark
Thomas Mailund

About this chapter

Cite this chapter

Mailund, T. (2019). Resizing. In: The Joys of Hashing. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4066-3_4

DOI: https://doi.org/10.1007/978-1-4842-4066-3_4
Published: 10 February 2019
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4065-6
Online ISBN: 978-1-4842-4066-3
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)