Abstract
If you know how performance degrades as the load factor of a hash table increases, you can use this to pick a table size where the expected performance matches your needs, presuming that you know how many keys the table will need to store. If you do not know the number of elements you need to store, n, then you cannot choose a table size, m, that ensures that α = n/m is below a desired upper bound. In most applications, you do not know n before you run your program. Therefore, you must adjust m as n increases by resizing the table.
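As a rough illustration of adjusting m as n grows, here is a minimal sketch in C. It is not the chapter's implementation. The names (`struct table`, `new_table`, `insert`, `contains`, `resize`), the use of linear probing, the bound α ≤ 1/2, and the doubling policy are all assumptions made for the example, and the initial size is assumed to be at least 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical linear-probing set of 32-bit keys. */
struct table {
    uint32_t *keys;
    bool *used;
    size_t m; /* number of bins */
    size_t n; /* number of stored keys */
};

struct table *new_table(size_t m)
{
    struct table *t = malloc(sizeof *t);
    assert(t != NULL);
    t->keys = malloc(m * sizeof *t->keys);
    t->used = calloc(m, sizeof *t->used);
    assert(t->keys != NULL && t->used != NULL);
    t->m = m;
    t->n = 0;
    return t;
}

/* Insert without checking the load factor. */
void raw_insert(struct table *t, uint32_t key)
{
    size_t i = key % t->m;
    while (t->used[i]) {
        if (t->keys[i] == key) return; /* already present */
        i = (i + 1) % t->m;
    }
    t->keys[i] = key;
    t->used[i] = true;
    t->n++;
}

/* Allocate new_m bins and re-insert every stored key. */
void resize(struct table *t, size_t new_m)
{
    uint32_t *old_keys = t->keys;
    bool *old_used = t->used;
    size_t old_m = t->m;

    t->keys = malloc(new_m * sizeof *t->keys);
    t->used = calloc(new_m, sizeof *t->used);
    assert(t->keys != NULL && t->used != NULL);
    t->m = new_m;
    t->n = 0;
    for (size_t i = 0; i < old_m; i++)
        if (old_used[i]) raw_insert(t, old_keys[i]);
    free(old_keys);
    free(old_used);
}

/* Assumed policy: keep alpha = n/m at or below 1/2 by doubling m. */
void insert(struct table *t, uint32_t key)
{
    if (2 * (t->n + 1) > t->m) resize(t, 2 * t->m);
    raw_insert(t, key);
}

bool contains(const struct table *t, uint32_t key)
{
    size_t i = key % t->m;
    while (t->used[i]) { /* terminates: the table is never full */
        if (t->keys[i] == key) return true;
        i = (i + 1) % t->m;
    }
    return false;
}
```

The caller never chooses m in advance here; `insert` checks the load factor before each insertion and grows the table when the assumed bound would be exceeded.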
Notes
- 1.
Strictly speaking, amortized means that you write off expensive operations over time, which suggests that cheap operations follow the costly ones. Doing this would not give you the runtime guarantee you are after: if you stop an algorithm right after an expensive operation, with no series of cheap operations following it, you will be in trouble. With amortized running time, you instead save up some “computation” when doing cheap operations, so you can guarantee that you have enough computation in your bank account when you need to pay for an expensive operation.
- 2.
Technically, you could compute these primes as needed, but this would be much slower than all the other hash table operations, so tabulating the primes you need is the only practical way. You can go to https://primes.utm.edu/lists/ to get a list of the first 1000, 10,000 or 50 million primes and build a table from these by filtering them according to your choice of β.
- 3.
You do not necessarily need your table size to be prime just because you compute your bins modulo a prime. You can first reduce the key modulo a prime to get a random-looking value and then apply a mask that keeps only the lower bits. This way, you get a table size that is easier to work with; you can grow and shrink it by factors of two, but, of course, at the cost of needing two operations to get your bin index. Since getting this index is unlikely to be the most time-critical step in using a hash table, this is a small price to pay.
- 4.
The reason we say that n insertions take (amortized) linear time is that the cost per operation does not depend on n. It does depend on β, however, as you can see from the figure.
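The amortized argument in notes 1 and 4 can be checked with a small cost count. The sketch below is illustrative only. The function name `total_cost`, the cost model (one unit per insertion, one unit per key moved during a resize), the integer growth factor `beta` of at least 2, and the policy of resizing only when the table is completely full are all assumptions made for the example.

```c
#include <stddef.h>

/* Count the work done by n insertions into a table that starts with
   m0 >= 1 bins and grows by a factor beta >= 2 whenever it is full.
   Each insertion costs 1; each resize costs 1 per key moved. */
size_t total_cost(size_t n, size_t m0, size_t beta)
{
    size_t m = m0;
    size_t cost = 0;
    for (size_t stored = 0; stored < n; stored++) {
        if (stored == m) {  /* table full: move every stored key */
            cost += stored;
            m *= beta;
        }
        cost += 1;          /* the insertion itself */
    }
    return cost;
}
```

Under this model the keys moved across all resizes form a geometric series that sums to less than nβ/(β − 1), so the total cost stays below n(1 + β/(β − 1)). The cost is linear in n, with a constant that depends on β. For example, `total_cost(1000, 1, 2)` is 2023 (below 3000) and `total_cost(1000, 1, 4)` is 1341. In the bank-account picture, each cheap insertion deposits a constant amount, and those deposits cover the next resize.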
Copyright information
© 2019 Thomas Mailund
About this chapter
Cite this chapter
Mailund, T. (2019). Resizing. In: The Joys of Hashing. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4066-3_4
DOI: https://doi.org/10.1007/978-1-4842-4066-3_4
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4065-6
Online ISBN: 978-1-4842-4066-3
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)