Searching and Sorting

Mailund, Thomas

doi:10.1007/978-1-4842-7077-6_5

Abstract

In this chapter, we will explore two fundamental problems: sorting sequences and searching for an element in them. Both serve as building blocks for a variety of other algorithms, and Python already has built-in functionality for solving them. You should practically always use the existing implementations; they are well engineered and optimized, so you are not likely to implement faster solutions yourself. The exception is when you have some a priori knowledge about your data that you can exploit, while Python, lacking that knowledge, must use a general algorithm. Different algorithms have different pros and cons, which we will briefly discuss, and knowing more about the data than Python does lets you choose the right algorithm for the job. Optimizing your choice of algorithm this way is rarely worthwhile, though, so you are usually better off just using what Python already has. Anyway, onward to the algorithms.
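
As a small illustration of the built-in functionality the abstract refers to, the sketch below uses sorted, the in operator, and the standard-library bisect module. The helper name contains_sorted and the sample data are hypothetical, chosen only for this example; the helper assumes its input is already sorted in ascending order.

```python
import bisect

numbers = [5, 3, 8, 1, 9, 2]

# sorted() returns a new sorted list; numbers.sort() would sort in place.
ordered = sorted(numbers)

# On a list, the `in` operator scans the elements one by one.
found_linear = 8 in numbers


def contains_sorted(seq, item):
    """Membership test by binary search; assumes seq is sorted ascending."""
    i = bisect.bisect_left(seq, item)  # leftmost position where item could go
    return i < len(seq) and seq[i] == item


found_binary = contains_sorted(ordered, 8)
```

The two membership tests give the same answer here; the second one only pays off when the sequence is known to be sorted and the search is a bottleneck.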

Notes

1.
In this chapter, we assume we are working with numbers, but we can search for any type of data as long as we can compare two items to see if they are equal. With only this assumption, the linear search will get the job done. If we furthermore assume that our data has a total order, that is, for any two items we can decide if they are equal or if the first is smaller than or greater than the second, then the elements can be sorted and we can use binary search. These properties are satisfied by more than just numbers, and we briefly discuss what it takes to handle more general sequences at the end of the chapter.
2.
If we have a list or a tuple of numbers, we can always get the element at any given index in constant time. This property is called random access, and data structures where we can get an element by index in constant time are called random-access data structures. The distinction matters because many data structures do not offer constant-time access by index. In Chapter 13, we shall see one common sequence structure, linked lists, that enables us to scan through its elements in linear time but not access items by index in constant time. There are many more.
3.
Unless you have a good reason not to, you should use the in operator to check membership of an item in a data structure. It will work for all sequences, using the linear search algorithm, but for data structures that allow for faster lookup, such as dictionaries or sets, the in operator will use the faster algorithms. The only case I can think of where you wouldn’t necessarily use in is for sorted, random-access sequences. Python cannot know if a sequence is sorted or not, so it will use linear search when you use the in operator on general sequences. Even for sorted sequences, I would probably use in unless the search is a bottleneck in my algorithm, because of the simpler syntax and because it makes it easier to replace a sequence with another data structure that provides faster membership checks.
4.
Timsort modifies the input list in place when it can get away with it but uses additional memory to speed up the computations when necessary.
5.
“Seeing the wrong solution to a problem (and understanding why it is wrong) is often as informative as seeing the correct solution.” —W. Richard Stevens
6.
Strictly speaking, we are not guaranteed that appending to a list is always in constant time, but there is a guarantee that if we append to a list n times, then all n operations can be done in O(n). So although some individual operations are not in O(1), on average they are; that is, appending takes amortized constant time. Using a different data structure, a so-called doubly linked list (see Chapter 13), we can achieve worst-case constant time for all append and prepend operations, at the cost of linear time to look up elements. If we use these, then we need a list x with constant-time lookup, as we have for Python list objects, and other lists, y, where prepend or append is in constant time, but where we do not need to access random indices. We consider linked lists later in the book; for the sorting algorithms we consider now, we can still use Python’s list objects without affecting the worst-case running time.
7.
At this point, you are excused for getting the impression that we can always sort numbers in linear time using radix sort. It sounds like that is what I just wrote. This isn’t true, however. I did write that we needed the numbers to fit into a constant number of computer words. This puts a bound on how large the integers can be. If you want to be able to hold at least n distinct elements in your input, you need at least log n bits per number. The time usage depends on the size of the integers, so they cannot grow arbitrarily large. If you want to sort n distinct numbers, you need a logarithmic number of subkeys, and then you have the same runtime complexity, O(n log n), as the fastest comparison sorting algorithms.
8.
A suffix of a string is a substring that starts somewhere inside the string and continues to the end of the string. For a string x, any substring x[i:] is a suffix.
9.
It is somewhat arbitrary that I chose 32-bit integers here. Python’s integers are not restricted to 32 bits and can be arbitrarily large. The underlying hardware, however, will work with fixed-size integers, typically 32 or 64 bits, and this section is about the underlying hardware and not Python’s integer representation.
10.
What happens to the bits at the left end of a shifted word can vary from hardware to hardware and according to which type of shift we use. There are really two different versions of right-shift. One will always fill the bits on the left of the result with zeros. This is called a logical shift. The other, called an arithmetic shift, will fill the leftmost bits with the same value as the most significant bit in the input, so with zeros if that bit is zero and with ones if it is one. The right-shift operator in Python, >>, performs an arithmetic shift. The reason that you might fill with ones if the top bit is one has to do with how negative numbers are represented.
11.
It is still necessary for division.
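
Notes 1 and 2 tie each search algorithm to what it assumes about the data: linear search needs only equality tests, while binary search needs a total order, sorted input, and constant-time indexing. A minimal sketch of both follows. The function names and the convention of returning -1 for a missing item are choices made for this example, not necessarily those used in the chapter.

```python
def linear_search(seq, item):
    """Return the first index i with seq[i] == item, or -1. Needs only ==."""
    for i, value in enumerate(seq):
        if value == item:
            return i
    return -1


def binary_search(seq, item):
    """Return an index i with seq[i] == item, or -1.

    Assumes seq is sorted ascending and supports constant-time indexing,
    as Python lists and tuples do.
    """
    low, high = 0, len(seq)  # the item, if present, is in seq[low:high]
    while low < high:
        mid = (low + high) // 2
        if seq[mid] == item:
            return mid
        if seq[mid] < item:
            low = mid + 1  # item can only be to the right of mid
        else:
            high = mid  # item can only be to the left of mid
    return -1
```

Because the code only compares elements with == and <, it also works on other totally ordered data, for example a sorted tuple of strings.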
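
Notes 7, 9, and 10 discuss radix sort on fixed-size integers and the shift operations used to extract subkeys. The sketch below is one way this could look, assuming non-negative integers that fit in 32 bits and 8-bit subkeys; the function name radix_sort and the parameters key_bits and word_bits are hypothetical and not taken from the chapter. Each pass is a stable bucket sort on one subkey, starting with the least significant one.

```python
def radix_sort(numbers, key_bits=8, word_bits=32):
    """Sort non-negative integers smaller than 2**word_bits.

    Runs word_bits / key_bits passes, each a stable bucket sort on one
    subkey, from the least significant subkey to the most significant.
    """
    mask = (1 << key_bits) - 1  # selects the lowest key_bits bits
    result = list(numbers)  # work on a copy, leave the input untouched
    for shift in range(0, word_bits, key_bits):
        buckets = [[] for _ in range(mask + 1)]
        for n in result:
            # Shift the current subkey down to the low bits and mask it out.
            buckets[(n >> shift) & mask].append(n)
        # Concatenating the buckets in order keeps equal subkeys stable.
        result = [n for bucket in buckets for n in bucket]
    return result
```

With a fixed word size the number of passes is constant, which is where the linear running time comes from; as note 7 points out, that bound relies on the integers not growing with n. For non-negative integers, the logical and arithmetic right shifts described in note 10 give the same result, so the distinction does not affect this sketch.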

Author information

Authors and Affiliations

Aarhus N, Denmark
Thomas Mailund

About this chapter

Cite this chapter

Mailund, T. (2021). Searching and Sorting. In: Introduction to Computational Thinking. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-7077-6_5

DOI: https://doi.org/10.1007/978-1-4842-7077-6_5
Published: 17 July 2021
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-7076-9
Online ISBN: 978-1-4842-7077-6
eBook Packages: Professional and Applied Computing; Apress Access Books; Professional and Applied Computing (R0)