Databases usually expose a specific communication protocol for their users. This protocol is the foundation of communication between clients and servers, so it’s often well-documented and has a formal specification. Some databases, like PostgreSQL, implement their own binary format on top of the TCP/IP stack.Footnote 1 Others, like Amazon DynamoDB,Footnote 2 build theirs on top of HTTP, which is a little more verbose, but also more versatile and compatible with web browsers. It’s also not uncommon to see a database exposing a protocol based on gRPCFootnote 3 or any other well-established framework.

Regardless of the implementation details, users seldom use the bare protocol themselves because it’s usually a fairly low-level API. What’s used instead is a driver—a programming interface written in a particular language, implementing a higher-level abstraction for communicating with the database. Drivers hide all the nitty-gritty details behind a convenient interface, which saves users from having to manually handle connection management, parsing, validation, handshakes, authentication, timeouts, retries, and so on.

In a distributed environment (which a scalable database cluster usually is), clients, and therefore drivers, are an extremely important part of the ecosystem. The clients are usually the most numerous group of actors in the system, and they are also very heterogeneous in nature, as visualized in Figure 5-1. Some clients are connected via local network interfaces, while others connect via a questionable Wi-Fi hotspot on another continent and thus have vastly different latency characteristics and error rates. Some might run on microcontrollers with 1MiB of random access memory, while others utilize 128-core bare metal machines from a cloud provider. Due to this diversity, it’s very important to take drivers into consideration when thinking about performance, scalability, and resilience to failures. Ultimately, it’s the drivers that generate traffic and its concurrency, so cooperation between them and the database nodes is crucial for the whole system to remain healthy and efficient.

Note

As a reminder, concurrency, in the context of this book, is the measure of how many operations are in progress at the same point in time. It’s conceptually related to, but distinct from, parallelism. With parallelism, the operations physically execute at the same time (e.g., on multiple CPU cores or multiple machines). Concurrency does not require that; the operations might just as well be executed in small interleaved steps on a single machine. Nowadays, distributed systems must provide high concurrency in order to remain competitive and keep up with ever-developing technology.

This chapter takes a look at how drivers impact performance—through the eyes of someone who has engineered drivers for performance. It provides insight into various ways that drivers can support efficient client-server interactions and shares tips for getting the most out of a driver, particularly from the performance perspective. Finally, the chapter wraps up with several considerations to keep in mind as you’re selecting a driver.

Relationship Between Clients and Servers

Scalability is a measure of how well your system reacts to increased load. This load is usually generated by clients using their drivers, so keeping the relationship between your clients and servers sound is an important matter. The more you know about your workloads, your clients’ behavior, and their usage patterns, the better you’re prepared to handle both sudden spikes in traffic and sustained, long-term growth in usage.

Each client is different and should be treated as such. The differences come both from clients’ characteristics, like their number and volume, and from their requirements. Some clients demand strict latency guarantees, even at the cost of higher error rates. Others do not particularly care about the latency of any single database query, but just want a steady pace of progress in their long-running queries. Some databases target specific types of clients (e.g., analytical databases that expect clients processing large aggregate queries operating on huge volumes of historical data). Others strive to be universal, handling all kinds of clients and balancing the load so that everyone is happy (or, more precisely, “happy enough”).

Figure 5-1

Visualization of clients and servers in a distributed system

Workload Types

There are multiple ways of classifying database clients. One particularly interesting way is to delineate between clients processing interactive and batch (e.g., analytical) workloads, also known as OLTP (online transaction processing) vs OLAP (online analytical processing)—see Figure 5-2.

Figure 5-2

Difference between interactive and batch (analytical) workloads

Interactive Workloads

A client processing an interactive workload typically wants certain latency guarantees. Receiving a response fast is more important than ensuring that the query succeeded. In other words, it’s better to return an error in a timely manner than make the client indefinitely wait for the correct response. Such workloads are often characterized by unbounded concurrency, which means that the number of in-progress operations is hard to predict.

A prime example of an interactive workload is a server handling requests from web browsers. Imagine an online game, where players interact with the system straight from their favorite browsers. High latency for such a player means a poor user experience because people tend to despise waiting for online content for more than a few hundred milliseconds; with multi-second delays, most will just ditch the game as unusable and try something else. It’s therefore particularly important to be as interactive as possible and return the results quickly—even if the result happens to be a temporary error. In such a scenario, client concurrency varies and is beyond the database’s control. Sometimes there might be a large influx of players, and the database might need to refuse some of them to avoid overload.

Batch (Analytical) Workloads

A batch (analytical) workload is the conceptual opposite of an interactive one. With such workloads, it doesn’t matter whether any single request is processed in a few milliseconds or hours. The important thing is that the processing makes steady progress with a satisfactory error rate, which is ideally zero. Batch workloads tend to have fixed concurrency, which makes it easier for the database to keep the load under control.

A good example of a batch workload is an Apache SparkFootnote 4 job performing analytics on a big dataset (think terabytes). There are only a few connections established to the database, and they continuously send requests in order to fetch data for long computations. Because the concurrency is predictable, the database can easily respond to an increased load by applying backpressure (e.g., by delaying the responses a little bit). The analytical processing will simply slow down, adjusting its speed according to the speed at which the database can consume queries.

Mixed Workloads

Certain workloads cannot be easily qualified as fully interactive or fully batch. The clients are free to intermix their requirements, concurrency, and load however they please—so the databases should also be ready for surprises. For example, a batch workload might suddenly experience a giant temporary spike in concurrency. Databases should, on the one hand, maintain a level of trust in the workload’s typical patterns, but on the other hand anticipate that workloads can simply change over time—due to bugs, hardware changes, or simply because the use case has diverged from its original goal.

Throughput vs Goodput

A healthy distributed database cluster is characterized by stable goodput, not throughput. Goodput is an interesting portmanteau of good + throughput, and it’s a measure of useful data being transferred between clients and servers over the network, as opposed to just any data. Goodput disregards errors and other churn, such as redundant retries, and is used to judge how effective the communication actually is.

This distinction is important.

Imagine an extreme case of an overloaded node that keeps returning errors for each incoming request. Even though stable and sustainable throughput can be observed, this database brings no value to the end-user. Thus, it’s essential to track how much useful data can be delivered in an acceptable time. For example, this can be achieved by tracking both the total throughput and throughput spent on sending back error messages and then subtracting one from another to see how much valid data was transferred (see Figure 5-3).

Figure 5-3
Two panels: client-received request throughput (KB/s) and write timeouts per second.

Note how a fraction of the throughput times out, effectively requiring more work from clients to achieve goodput
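To make the subtraction concrete, here is a minimal sketch of deriving goodput from two counters a client (or a monitoring agent) might already track. The counter names are hypothetical; substitute whatever your metrics stack exposes.

# Minimal sketch: estimate goodput from client-side counters.
# bytes_received and error_bytes_received are hypothetical metric names.

def goodput_bps(bytes_received: int, error_bytes_received: int,
                interval_seconds: float) -> float:
    """Bytes per second of useful data: total minus error responses."""
    useful = bytes_received - error_bytes_received
    return max(useful, 0) / interval_seconds

# Example: 100 KB received in one second, 40 KB of it spent on error responses.
print(goodput_bps(100_000, 40_000, 1.0))  # 60000.0 -> 60 KB/s of goodput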

Maximizing goodput is a delicate operation and it heavily depends on the infrastructure, workload type, clients’ behavior, and many other factors. In some cases, the database shedding load might be beneficial for the entire system. Shedding is a rather radical measure of dealing with overload: Requests qualified as “risky” are simply ignored by the server, or immediately terminated with an error. This type of overload protection is especially useful against issues induced by interactive workloads with unbounded concurrency (there’s not much a database can do to protect itself except drop some of the incoming requests early).

The database server isn’t an oracle; it can’t accurately predict whether a request is going to fail due to overload, so it must guess. Fortunately, there are quite a few ways of making that guess educated:

  • Shedding load if X requests are already being processed, where X is the estimated maximum a database node can handle.

  • Refusing a request if its estimated memory usage is larger than the database could handle at the moment.

  • Probabilistically refusing a request if Y requests are already being processed, where Y is a percentage of the maximum a database node can handle, with the probability rising to 100 percent once a certain threshold is reached.

  • Refusing a request if its estimated execution time indicates that it isn’t going to finish in time and is likely to time out anyway.

While refusing clients’ requests is detrimental to user experience, sometimes it’s simply the lesser of two evils. If dropping a number of requests allows even more requests to successfully finish in time, it increases the cluster’s goodput.
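As an illustration of the probabilistic heuristic above, here is a minimal sketch of load shedding on the server side. The capacity estimate and thresholds are made-up numbers, not recommendations.

import random

MAX_INFLIGHT = 10_000   # hypothetical estimate of per-node capacity
SHED_START = 0.8        # begin shedding at 80 percent of capacity

def should_shed(inflight_requests: int) -> bool:
    load = inflight_requests / MAX_INFLIGHT
    if load < SHED_START:
        return False    # plenty of headroom, accept everything
    if load >= 1.0:
        return True     # at or above capacity, reject outright
    # In between, reject with probability rising linearly from
    # 0 percent at SHED_START to 100 percent at full capacity.
    probability = (load - SHED_START) / (1.0 - SHED_START)
    return random.random() < probability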

Clients can help the database maximize goodput and keep latency low by declaring how long a request should be considered valid. For instance, in high-frequency trading, a request that takes more than a couple of milliseconds is just as good as a request that failed. By letting the database know that’s the case, you allow it to retire some requests early, leaving valuable resources for other requests that still have a chance to be successful. Proper timeout management is a broad topic and deserves a separate section.

Timeouts

In a distributed system, there are two fundamental types of timeouts that influence one another: client-side timeouts and server-side timeouts. While both are conceptually similar, they have different characteristics. It’s vital to properly configure both of them to prevent problems like data races and consistency issues.

Client-Side Timeouts

This type of timeout is generally configured in the database driver. It specifies how long the driver waits before deciding that a response from the server is unlikely to ever arrive. In a perfect world built on top of a perfect network, all parties always respond to their requests. However, in practice, there are numerous reasons for a response to be late or lost:

  • The recipient died

  • The recipient is busy with other tasks

  • The network failed, maybe due to hardware malfunction

  • The network has a significant delay because packets get stuck in an intermediate router

  • A software bug caused the packet to be lost

  • And so on

Since in a distributed environment it’s usually impossible to guess what happened, the client must sometimes decide that a request is lost. The alternative is to wait indefinitely. That might work for a select set of use cases, but it’s often simply unacceptable. If a single failed request holds a resource for an unspecified time, the system is eventually doomed to fail. Hence, client-side timeouts are used as a mechanism to make sure that the system can operate even in the event of communication issues.

A unique characteristic of a client-side timeout is that the decision to give up on a request is made solely by the client, in the absence of any feedback from the server. It’s entirely possible that the request in question is still being processed and utilizes the server’s resources. And, worst of all, the unaware server can happily return the response to the client after it’s done processing, even though nobody’s interested in this stale data anymore! That presents another aspect of error handling: Drivers must be ready to handle stray, expired responses correctly.

Server-Side Timeouts

A server-side timeout determines when a database node should start considering a particular request as expired. Once this point in time has passed, there is no reason to continue processing the query. (Doing so would waste resources which could have otherwise been used for serving other queries that still have a chance to succeed.) When the specified time has elapsed, databases often return an error indicating that the request took too long.

Using reasonable values for server-side timeouts helps the database manage its priorities in a more precise way, allocating CPU, memory, and other scarce resources to queries likely to succeed in a timely manner. Drivers that receive an error indicating that a server-side timeout has occurred should also act accordingly—perhaps by reducing the pressure on a particular node or retrying on another node that hasn’t experienced timeouts lately.

A Cautionary Tale

The CQL protocol, which specifies the communication layer in Apache Cassandra and ScyllaDB, comes with built-in support for concurrency. Namely, each request is assigned a stream ID, unique within its connection, which is later used to match an incoming response with its original request. This stream ID is encoded as a 16-bit integer with the first bit reserved by the protocol, which leaves the drivers 32,768 unique values for handling in-flight requests per single connection. That’s not a particularly large number, given that modern systems are known to handle millions of requests per second, so drivers eventually need to reuse previously assigned stream IDs.

But the CQL driver for Python had a bug.Footnote 5 In the event of a client-side timeout, it assumed that the stream ID of an expired request was immediately free to reuse. While the assumption holds true if the server dies, it is incorrect if processing simply takes longer than expected. It was therefore possible that once a response with a given stream ID arrived, another request had already reused the stream ID, and the driver would mistakenly match the response with the new request. If the user was lucky, they would simply receive garbage data that did not pass validation. Unfortunately, data from the mismatched response might appear correct, even though it originates from a totally different request. This is the kind of bug that looks innocent at first glance, but may cause people to log in to other people’s bank accounts and wreak havoc on their lives.

A rule of thumb for client-side timeouts is to make sure that a server-side timeout also exists and is strictly shorter than the client-side one. It should take into account clock synchronization between clients and servers (or lack thereof), as well as estimated network latency. Such a procedure minimizes the chances for a late response to arrive at all, and thus removes the root cause of many issues and vulnerabilities.
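The sketch below shows one way to apply that rule of thumb with the Python driver for Cassandra/ScyllaDB. The USING TIMEOUT clause is a ScyllaDB-specific CQL extension, and the addresses, keyspace, and timeout values are purely illustrative.

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

# Client-side: give up waiting after 2 seconds.
profile = ExecutionProfile(request_timeout=2.0)
cluster = Cluster(["127.0.0.1"],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("my_keyspace")

# Server-side: ask the database to abandon the request after 1.5 seconds,
# strictly less than the client-side timeout, so a late response is
# unlikely to arrive after the client has already moved on.
session.execute(
    "INSERT INTO my_table (id, descr) VALUES (%s, %s) USING TIMEOUT 1500ms",
    (42, "forty two"),
)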

Contextual Awareness

At this point it should be clear that both servers and clients can make better, more educated, and mutually beneficial decisions if they know more about each other. Exchanging timeout information is important, but drivers and servers can do even more to keep each other up to date.

Topology and Metadata

Database servers are often combined into intricate topologies where certain nodes are grouped in a single geographical location, others are used only as a fast cache layer, and yet others store seldom accessed cold data in a cheap place, for emergency purposes only.

Not every database exposes its topology to the end-user. For example, DynamoDB takes that burden off of its clients and exposes only a single endpoint, taking care of load balancing, overload prevention, and retry mechanisms on its own. On the other hand, a fair share of popular databases (including ScyllaDB, Cassandra, and ArangoDB) rely on the drivers to connect to each node, decide how many connections to keep, when to speculatively retry, and when to close connections if they are suspected of malfunctioning. In the ScyllaDB case, sharing up-to-date topology information with the drivers helps them make the right decisions. This data can be shared in multiple ways:

  • Clients periodically fetching topology information from the servers

  • Clients subscribing to events sent by the servers

  • Clients taking an active part in one of the information exchange protocols (e.g., gossipFootnote 6)

  • Any combination of these

Depending on the database model, another valuable piece of information often cached client-side is metadata—a prime example of which is database schema. SQL databases, as well as many NoSQL ones, keep the data at least partially structured. A schema defines the shape of a database row (or column), the kinds of data types stored in different columns, and various other characteristics (e.g., how long a database row is supposed to live before it’s garbage-collected). Based on up-to-date schemas, drivers can perform additional validation, making sure that data sent to the server has a proper type and adheres to any constraints required by the database. On the other hand, when a driver-side cache for schemas gets out of sync, clients can experience their queries failing for no apparent reason.

Synchronizing full schema information can be costly in terms of performance, and finding a good compromise in how often to update highly depends on the use case. A rule of thumb is to update only as often as needed to ensure that the traffic induced by metadata exchange never negatively impacts the user experience. It’s also worth noting that in a distributed database, clients are not always up to date with the latest schema information, and the system as a whole should be prepared to handle it and provide tactics for dealing with such inconsistencies.
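As a small illustration, the sketch below inspects the topology and schema metadata that the Python driver for Cassandra/ScyllaDB keeps cached on the client side. The attribute names reflect that particular driver; other drivers expose similar information under different names.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Topology: which nodes exist and where they are located.
for host in cluster.metadata.all_hosts():
    print(host.address, host.datacenter, host.rack)

# Schema: the driver's cached view of a table's columns.
table_meta = cluster.metadata.keyspaces["my_keyspace"].tables["my_table"]
print(list(table_meta.columns))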

Current Load

Overload protection and request latency optimization are tedious tasks, but they can be substantially facilitated by exchanging as much context as possible between interested parties.

The following methods can be applied to distribute the load evenly across the distributed system and prevent unwanted spikes:

  1. Gathering latency statistics per each database connection in the drivers:

     a. What’s the average latency for this connection?

     b. What’s the 99th percentile latency?

     c. What’s the maximum latency experienced in a recent time frame?

  2. Exchanging information about server-side caches:

     a. Is the cache full?

     b. Is the cache warm (i.e., filled with useful data)?

     c. Are certain items experiencing elevated traffic and/or latency?

  3. Interpreting server events:

     a. Has the server started replying with “overload errors”?

     b. How often do requests for this server time out?

     c. What is the general rate of errors for this server?

     d. What is the measured goodput from this server?

Based on these indicators, drivers should try to adjust the amount of data they send, their concurrency, and the rate of retries and speculative executions, which can keep the whole distributed system in a healthy, balanced state. It’s ultimately in the driver’s interest to ease the pressure on nodes that start showing symptoms of overload, be it by reducing the concurrency of operations, limiting the frequency and number of retries, temporarily giving up on speculatively sent requests, and so on. Otherwise, if the database servers get overloaded, all clients may experience symptoms like failed requests, timeouts, and increased latency.
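A driver’s bookkeeping for the first group of indicators can be as simple as the following sketch, which keeps a sliding window of latency samples per connection. It is purely illustrative; real drivers tend to use cheaper estimators such as histograms or decaying reservoirs.

from collections import deque

class ConnectionStats:
    """Tracks recent request latencies (in seconds) for one connection."""

    def __init__(self, window: int = 1024):
        self.samples = deque(maxlen=window)  # most recent latencies only

    def record(self, latency: float) -> None:
        self.samples.append(latency)

    def average(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def p99(self) -> float:
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        return ordered[int(0.99 * (len(ordered) - 1))]

    def maximum(self) -> float:
        return max(self.samples, default=0.0)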

Request Caching

Many database management systems, ranging from SQLite, MySQL, and Postgres to NoSQL databases, implement an optimization technique called prepared statements. While the language used to communicate with the database is usually human-readable (or at least developer-readable), it is not the most efficient way of transferring data from one computer to another.

Let’s take a look at the (simplified) lifecycle of an unprepared statement once it’s sent from a ScyllaDB driver to the database and back. This is illustrated in Figure 5-4.

Figure 5-4

Lifecycle of an unprepared statement

  1. A query string is created:

     INSERT INTO my_table(id, descr) VALUES (42, 'forty two');

  2. The string is packed into a CQL frame by the driver. Each CQL frame consists of a header, which describes the purpose of a particular frame. Following the header, a specific payload may be sent as well. The full protocol specification is available at https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec.

  3. The CQL frame is sent over the network.

  4. The frame is received by the database.

  5. Once the frame is received, the database interprets the frame header and then starts parsing the payload. For an unprepared statement, the payload is simply the query string, as seen in Step 1.

  6. The database parses the string in order to validate its contents and interpret what kind of operation is requested: Is it an insertion, an update, a deletion, or a selection?

  7. Once the statement is parsed, the database can continue processing it (e.g., by persisting data on disk, fetching whatever’s necessary, etc.).

Now, imagine that a user wants to perform a hundred million operations on the database in quick succession because the data is migrated from another system. Even if parsing a query string is a relatively fast operation that takes only 50 microseconds, the total time spent parsing strings adds up to 100,000,000 × 50 µs = 5,000 seconds—well over an hour of CPU time. That sounds like an obvious target for optimization.

The key observation is that operations performed on a database are usually similar to one another and follow a certain pattern. For instance, migrating a table from one system to another may mean sending lots of requests with the following schema:

INSERT INTO my_table(id, descr) VALUES (?, ?)

where ? denotes the only part of the string that varies between requests.

This query string with question marks instead of real values is actually also valid CQL! While it can’t be executed as is (because some of the values are not known), it can be prepared.

Preparing such a statement means that the database will meticulously analyze the string, parse it, and create an internal representation of the statement in its own memory. Once done, a unique identifier is generated and sent back to the driver. The client can now execute the statement by providing only its identifier (which is a 128-bit UUIDFootnote 7 in ScyllaDB) and all the values missing from the prepared query string. The process of replacing question marks with actual values is called binding and it’s the only thing that the database needs to do instead of launching a CQL parser, which offers a significant speedup.
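Here is what that flow looks like with the Python driver for Cassandra/ScyllaDB, reusing the table from the earlier example; the contact point and keyspace are illustrative.

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# Parsed and validated once; the driver gets back an identifier.
prepared = session.prepare("INSERT INTO my_table (id, descr) VALUES (?, ?)")

# Each execution sends only the identifier plus the bound values.
for i, descr in [(42, "forty two"), (43, "forty three")]:
    session.execute(prepared, (i, descr))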

Preparing statements without care can also be detrimental to overall cluster performance, though. When a statement gets prepared, the database needs to keep a certain amount of information about it in memory, which is hardly a limitless resource. Caches for prepared statements are usually relatively small, under the assumption that the driver’s users (app developers) are kind and only prepare queries that are used frequently. If, on the other hand, a user were to prepare lots of unique statements that aren’t going to be reused any time soon, the database cache might evict existing entries for frequently used queries. The exact heuristics of how entries are invalidated depend on the algorithm used in the cache, but a naive LRU (least recently used) eviction policy is susceptible to this problem. Therefore, other cache algorithms resilient to such edge cases should be considered when designing a cache without full information about expected usage patterns. Some notable examples include the following:

  • LFU (least frequently used)

    Aside from keeping track of which item was most recently accessed, LFU also counts how many times it was needed in a given time period, and tries to keep frequently used items in the cache.

  • LRU with two pools

    One probationary pool for new entries, and another, usually larger, pool for frequently used items. This algorithm avoids cache thrashing when lots of one-time entries are inserted in the cache, because they only evict other items from the probationary pool, while more frequently accessed entries are safe in the main pool. (A minimal sketch of this approach follows the list.)
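The following is a minimal sketch of such a two-pool LRU cache. The pool sizes and the promotion rule (promote on second access) are illustrative choices, not a description of any particular database’s implementation.

from collections import OrderedDict

class TwoPoolLRU:
    def __init__(self, probation_size: int = 32, protected_size: int = 128):
        self.probation = OrderedDict()  # new, not-yet-proven entries
        self.protected = OrderedDict()  # entries accessed at least twice
        self.probation_size = probation_size
        self.protected_size = protected_size

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)            # refresh recency
            return self.protected[key]
        if key in self.probation:
            # Second access: promote to the protected pool.
            value = self.probation.pop(key)
            self.protected[key] = value
            if len(self.protected) > self.protected_size:
                self.protected.popitem(last=False)     # evict oldest protected
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
            return
        # New entries can only displace other probationary entries, so a
        # burst of one-off statements cannot flush the frequently used set.
        self.probation[key] = value
        self.probation.move_to_end(key)
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)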

Finally, regardless of the algorithm used for cache eviction implemented server-side, drivers should take care not to prepare queries too aggressively, especially if it happens automatically, which is often the case in ORMs (object-relational mappings). Making an interface convenient for the user may sound tempting, and developer experience is indeed an important factor when designing a driver, but being too eager with reserving precious database resources may be disadvantageous in the long term.

Query Locality

In distributed systems, any kind of locality is welcome because it reduces the chances of failure, keeps the latency low, and generally prevents many undesirable events. While database clients, and thus also drivers, do not usually share the same machines with the database cluster, it is possible to keep the distance between them short. “Distance” might mean either a physical measure or the number of intermediary devices in the network topology. Either way, for latency’s sake, it’s good to minimize it between parties that need to communicate with each other frequently.

Many database management systems allow their clients to announce their “location,” for example, by declaring which datacenter is their local, default one. Drivers should take that information into account when communicating with the database nodes. As long as all consistency requirements are fulfilled, it’s usually better to send data directly to a nearby node, under the assumption that it will spend less time in transit. Short routes also usually imply fewer middlemen, and that in turn translates to fewer potential points of failure.
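For example, with the Python driver for Cassandra/ScyllaDB, the local datacenter can be declared through the load balancing policy; the datacenter name and contact point below are made up.

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy

# Prefer nodes from the client's local datacenter for routing requests.
profile = ExecutionProfile(
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="dc-eu-west"))
cluster = Cluster(["10.0.0.1"],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect("my_keyspace")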

Drivers can make much more educated choices though. Quite a few NoSQL databases can be described as “distributed hash tables” because they partition their data and spread it across multiple nodes which own a particular set of hashes. If the hashing algorithm is well known and deterministic, drivers can leverage that fact to try to optimize the queries even further—sending data directly to the appropriate node, or even the appropriate CPU core.

ScyllaDB, Cassandra, and other NoSQL databases apply a concept of tokenFootnote 8 awareness (see Figures 5-5, 5-6, and 5-7):

  1. A request arrives.

  2. The receiving node computes the hash of the given input.

  3. Based on the value of this hash, it computes which database nodes are responsible for this particular value.

  4. Finally, it forwards the request directly to the owning nodes.

Figure 5-5

Naive clients route queries to any node (coordinator)

However, in certain cases, the driver can compute the token locally on its own, and then use the cluster topology information to route the request straight to the owning node. This local node-level routing saves at least one network round-trip as well as the CPU time of some of the nodes.

Figure 5-6

Token-aware clients route queries to the right node(s)

In the Cassandra/ScyllaDB case, this is possible because each table has a well-defined “partitioner,” which simply means a hash function implementation. The default choice—used in Cassandra—is murmur3,Footnote 9 which returns a 64-bit hash value, has a satisfactory distribution, and is relatively cheap to compute. ScyllaDB takes it one step further and allows the drivers to calculate which CPU core of which database node owns a particular datum. When a driver is cooperative and proactively establishes a separate connection per each core of each machine, it can send the data not only to the right node, but also straight to the single CPU core responsible for handling it. This not only saves network bandwidth, but is also very friendly to CPU caches.
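In driver terms, token awareness is usually just another layer of the load balancing policy. The sketch below wraps the datacenter-aware policy from the previous example in the Python driver’s TokenAwarePolicy; ScyllaDB’s fork of that driver additionally makes the connection pool shard-aware (one connection per CPU core) without extra configuration, at least as of this writing.

from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

# Route each request directly to a replica that owns the partition,
# preferring replicas in the declared local datacenter.
policy = TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc="dc-eu-west"))
# This policy slots into the same ExecutionProfile as in the previous sketch.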

Figure 5-7

Shard-aware clients route queries to the correct node(s) + core

Retries

In a perfect system, no request ever fails and logic implemented in the drivers can be kept clean and minimal. In the real world, failures happen disturbingly often, so the drivers should also be ready to deal with them. One such mechanism for failure tolerance is a driver’s retry policy. A retry policy’s job is to decide whether a request should be sent again because it failed (or at least the driver strongly suspects that it did).

Error Categories

Before diving into techniques for retrying requests in a smart way, there’s a more fundamental question to consider: does a retry even make sense? The answer is not that obvious and it depends on many internal and external factors. When a request fails, the error can fall into the following categories, presented with a few examples:

  1. Timeouts

     a. Read timeouts

     b. Write timeouts

  2. Temporary errors

     a. Database node overload

     b. Dead target node

     c. Temporary schema mismatch

  3. Permanent errors

     a. Incorrect query syntax

     b. Authentication error

     c. Insufficient permissions

Depending on the category, the retry decision may be vastly different. For instance, it makes absolutely no sense to retry a request that has incorrect syntax. It will not magically start being correct, and such a retry attempt would only waste bandwidth and database resources.

Idempotence

Error categories aside, retry policy must also consider one important trait of the request itself: its idempotence. An idempotent request can be safely applied multiple times, and the result will be indistinguishable from applying it just once.

Why does this need to be taken into account at all? For certain classes of errors, the driver cannot be sure whether the request actually succeeded. A prime example of such an error is a timeout. The fact that the driver did not manage to get a response in time does not mean that the server did not successfully process the request. It’s a similar situation if the network connection goes down: The driver won’t know whether the database server actually managed to apply the request.

When in doubt, the driver should make an educated guess in order to ensure consistency. Imagine a request that withdraws $100 from somebody’s bank account. You certainly don’t want to retry the same request again if you’re not absolutely sure that it failed; otherwise, the bank customer might become a bit resentful. This is a perfect example of a non-idempotent request: Applying it multiple times changes the ultimate outcome.

Fortunately, there’s a large subset of idempotent queries that can be safely retried, even when it’s unclear whether they already succeeded:

  1. Read-only requests

     Since they do not modify any data, they won’t have any side effects, no matter how often they’re retried.

  2. Certain conditional requests that have compare-and-set characteristics (e.g., “bump the value by 1 if the previous value is 42”)

     Depending on the use case, such a condition may be enough to guarantee idempotence. Once this request is applied, applying it again would have no effect since the previous value would then be 43.

  3. Requests with unique timestamps

     When each request has a unique timestamp (represented in wall clock time or based on a logical clockFootnote 10), applying it multiple times can be idempotent. A retry attempt will contain a timestamp identical to the original request, so it will only overwrite data identified by this particular timestamp. If newer data arrives in-between with a newer timestamp, it will not be overwritten by a retry attempt with an older timestamp.

In general, it’s a good idea for drivers to give users an opportunity to declare their requests’ idempotence explicitly. Some queries can be trivially deduced to be idempotent by the driver (e.g., when it’s a read-only SELECT statement in the database world), but others may be less obvious. For example, the conditional request from Step 2 in the previous list is idempotent if the value is never decremented, but not in the general case. Imagine the following counter-example:

  1. The current value is 42.

  2. A request “bump the value by 1 if the previous value is 42” is sent.

  3. A request “bump the value by 1 if the previous value is 42” is retried.

  4. Another request, “decrement the value by 1,” is sent.

  5. The request from Step 2 arrives and is applied—changing the value to 43.

  6. The request from Step 4 arrives and is applied—changing the value to 42.

  7. The retry from Step 3 is applied—changing the value back to 43 and interfering with the effect of the query from Step 4. This wasn’t idempotent after all!

Since it’s often impossible to guess if a request is idempotent just by analyzing its contents, it’s best for drivers to have a set_idempotent() function exposed in their API. It allows the users to explicitly mark some queries as idempotent, and then the logic implemented in the driver can assume that it’s safe to retry such a request when the need arises.
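In the Python driver for Cassandra/ScyllaDB, this flag happens to be exposed as an attribute rather than a set_idempotent() call; other drivers spell it differently. A brief sketch, reusing the session from an earlier example:

from cassandra.query import SimpleStatement

stmt = SimpleStatement("SELECT descr FROM my_table WHERE id = %s")
stmt.is_idempotent = True            # safe for the driver to retry on timeout
rows = session.execute(stmt, (42,))  # session created in an earlier sketch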

Retry Policies

Finally, there’s enough context to discuss actual retry policies that a database driver could implement. The sole job of a retry policy is to analyze a failed query and return a decision. This decision depends on the database system and its intrinsics, but it’s often one of the following (see Figure 5-8):

  • Do not retry

  • Retry on the same database node

  • Retry, but on a different node

  • Retry, but not immediately—apply some delay

Figure 5-8
Three questions, each leading to “safe” when answered yes: Is the request idempotent? Is the failure definite? Is it OK to apply the write twice? A “no” to the last question leads to “not safe.”

Decision graph for retrying a query

Deciding not to retry is often a decent choice—and it’s the only correct one when the driver isn’t certain whether a non-idempotent query really failed or just timed out. It’s also the obvious choice for permanent errors; there’s no point in retrying a request that was previously refused due to incorrect syntax. And whenever the system is overloaded, the “do not retry” approach might help the entire cluster. Although the immediate effect (preventing a user’s request from being driven to completion) is not desirable, it provides a level of overload protection that might pay off in the future because it keeps the overload condition from escalating. Otherwise, once a node gets too much traffic, it refuses more requests, which triggers more retries, which bring even more traffic—a vicious circle.

Retrying on the same database node is generally a good option for timeouts. Assuming that the request is idempotent, the same node can probably resolve potential conflicts faster. Retrying on a different node is a good idea if the previous node showed symptoms of overload, or had an input/output error that indicated a temporary issue.
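Put together, the decision logic might look like the following sketch. The error taxonomy mirrors the categories listed earlier, but the function itself is hypothetical; real drivers expose this as a pluggable retry policy interface.

from enum import Enum, auto

class Decision(Enum):
    DONT_RETRY = auto()
    RETRY_SAME_NODE = auto()
    RETRY_OTHER_NODE = auto()
    RETRY_WITH_DELAY = auto()

def decide(error_kind: str, idempotent: bool, cluster_overloaded: bool) -> Decision:
    if error_kind == "permanent":        # e.g., syntax or authentication error
        return Decision.DONT_RETRY
    if error_kind == "timeout":
        # Without confirmation of failure, only idempotent requests are safe.
        return Decision.RETRY_SAME_NODE if idempotent else Decision.DONT_RETRY
    if error_kind == "overload":
        return (Decision.RETRY_WITH_DELAY if cluster_overloaded
                else Decision.RETRY_OTHER_NODE)
    return Decision.RETRY_OTHER_NODE     # other temporary errors, e.g., dead node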

Finally, in certain cases, it’s a good idea to delay the retry instead of firing it off immediately (see Figure 5-9).

Figure 5-9

Retry attempts eventually resulting in a successful query

When the whole cluster shows the symptoms of overload—be it high reported CPU usage or perceived increased latency—retrying immediately after a request failed may only exacerbate the problem. What a driver can do instead is apply a gentle backoff algorithm, giving the database cluster time to recover. Remember that even a failed retry costs resources: networking, CPU, and memory. Therefore, it’s better to balance the costs and chances for success in a reasonable manner.

The three most common backoff strategies are constant, linear, and exponential backoff, as visualized in Figure 5-10.

Figure 5-10

Constant, linear, and exponential backoffs

The first type (constant) simply waits a certain predefined amount of time before retrying. Linear backoff increases the time between attempts in a linear fashion; it could wait one second before the first attempt, two seconds before the second one, and so forth. Finally, exponential backoff, arguably the most commonly used method, increases the delay by multiplying it by a constant each time. Usually it just doubles it—because both processors and developers love multiplying and dividing by two (the latter ones mostly just to show off their intricate knowledge of the bitwise shift operator). Exponential backoff has especially nice characteristics for overload prevention. The retry rate drops exponentially, and so does the pressure that the driver places on the database cluster.
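The three strategies can be expressed in a few lines. The base delay and cap below are arbitrary, and the random jitter in the exponential variant is a common addition (not shown in Figure 5-10) that keeps many clients from retrying in lockstep.

import random

BASE = 0.1   # seconds; illustrative
CAP = 30.0   # never wait longer than this

def constant_backoff(attempt: int) -> float:
    return BASE

def linear_backoff(attempt: int) -> float:
    return min(BASE * attempt, CAP)

def exponential_backoff(attempt: int) -> float:
    delay = min(BASE * (2 ** (attempt - 1)), CAP)
    return random.uniform(0, delay)   # "full jitter" variant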

Paging

Databases usually store amounts of data that are orders of magnitude larger than a single client machine could handle. If you fetch all available records, the result is unlikely to fit into your local disks, not to mention your available RAM. Nonetheless, there are many valid cases for processing large amounts of data, such as analyzing logs or searching for specific documents. It is quite acceptable to ask the database to serve up all the data it has—but you probably want it to deliver that data in smaller bits.

That technique is customarily called paging, and it is ubiquitous. It’s exactly what you’ve experienced when browsing through page 17 of Google search results in a futile search for an answer to a question that was asked only on an inactive forum seven years ago—or getting all the way to page 24 of eBay listings, hunting for that single perfect offer. Databases and their drivers also implement paging as a mechanism beneficial for both parties. Drivers get their data in smaller chunks, each of which can be delivered with lower latency. And databases receive smaller queries, which helps with cache management, workload prioritization, memory usage, and so on.

Different database models may have a different view of exactly what paging involves and how you interface with it. Some systems may offer fine-grained control, which allows you to ask for “page 16” of your data. Others are “forward-only”: They reduce the user-facing interface to “here’s the current page—you can ask for the next page if you want.” Your ability to control the page size also varies. Sometimes it’s possible to specify the size in terms of a number of database records or bytes. In other cases, the page size is fixed.

On top of a minimal interface that allows paging to be requested, drivers can offer many interesting features and optimizations related to paging. One of them is readahead—which usually means that the driver transparently and speculatively fetches new pages before you actually ask for them to be read. Readahead is a classic example of a double-edged sword. On the one hand, it makes certain read operations faster, especially if the workload consists of large consecutive reads. On the other, it may cause prohibitive overhead, especially if the workload is based on small random reads.

Although most drivers support paging, it’s important to check whether the feature is opt-in or opt-out and consciously decide what’s best for a specific workload. In particular, pay attention to the following aspects:

  1. What’s the default behavior (would a read query be paged or unpaged)?

  2. What’s the default page size and is it configurable? If so, in what units can a size be specified? Bytes? Number of records?

  3. Is readahead on by default? Can it be turned on/off?

  4. Can readahead be configured further? For example, can you specify how many pages to fetch or when to decide to start fetching (e.g., “When at least three consecutive read requests already occurred”)?

Setting up paging properly is important because a single unpaged response can be large enough to be problematic for both the database servers forced to produce it, and for the client trying to receive it. On the other hand, too granular paging can lead to unnecessary overhead (just imagine trying to read a billion records row-by-row, due to the default page size of “1 row”). Finally, readahead can be a fantastic optimization technique—but it can also be entirely redundant, fetching unwanted pages that cost memory, CPU time, and throughput, as well as confuse the metrics and logs. With paging configuration, it’s best to be as explicit as possible.
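With the Python driver for Cassandra/ScyllaDB, for example, the page size is set per statement and subsequent pages are fetched transparently while iterating. The fetch size and the process() handler below are illustrative.

from cassandra.query import SimpleStatement

stmt = SimpleStatement("SELECT id, descr FROM my_table", fetch_size=1000)
for row in session.execute(stmt):  # session created in an earlier sketch
    process(row)                   # process() is a hypothetical per-row handler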

Concurrency

In many cases, the only way to utilize a database to the fullest—and achieve optimal performance—is to also achieve high concurrency. That often requires the drivers to perform many I/O operations at the same time, and that’s in turn customarily achieved by issuing asynchronous tasks. That being said, let’s take quite a few steps back to explain what that really means and what’s involved in achieving that from both a hardware and software perspective.

Note

High concurrency is not a silver bullet. When it’s too high, it’s easy to overload the system and ruin the quality of service for other users—see Figure 5-11 for its effect on latency. Chapter 1 includes a cautionary tale on what can happen when concurrency gets out of bounds and Chapter 2 also touches on the dangers of unbounded concurrency.

Modern Hardware

Back in the old days, making decisions around I/O concurrency was easy because magnetic storage drives (HDDs) had an effective concurrency of 1. There was (usually) only a single actuator arm used to navigate the platters, so only a single sector of data could be read at a time. Then, an SSD revolution happened. Suddenly, disks could read from multiple offsets concurrently. Moreover, it became next to impossible to fully utilize a disk (i.e., to read and write with the speeds advertised in the shiny numbers printed on its label) without actually asking for multiple operations to be performed concurrently. Now, with enterprise-grade NVMe drives and inventions like Intel Optane,Footnote 11 concurrency is a major factor when benchmarking input/output devices. See Figure 5-11.

Figure 5-11

Relationship between the system’s concurrency and latency

Networking technology is not lagging behind either. Modern networking cards have multiple independent queues, which, with the help of receive-side scaling (RSSFootnote 12), enable previously unimaginable levels of performance, with throughput measured in Tbps.Footnote 13 With such advanced hardware, achieving high concurrency in software is required to simply utilize the available capabilities.

CPU cores obviously deserve to be mentioned here as well. That’s the part of computer infrastructure that’s undoubtedly most commonly associated with concurrency. Buying a 64-core consumer-grade processor is just a matter of going to the hardware store next door, and the assortment of professional servers is even more plentiful.

Operating systems focus on facilitating highly concurrent programs too. io_uringFootnote 14 by Jens Axboe is a novel addition to the Linux kernel. As noted in Chapter 3, it was developed for asynchronous I/O, which in turn plays a major part in allowing high concurrency in software to become the new standard. Some database drivers already utilize io_uring underneath, and many more put the integration very high in the list of priorities.

Modern Software

How could modern software adapt to the new, highly concurrent era? Historically, a popular model of ensuring that multiple operations can be performed at the same time was to keep a pool of operating system threads, with each thread having its own queue of tasks. That only scales in a limited way though, so now the industry leans toward so-called “green threads,” which are conceptually similar to their operating system namesakes, but are instead implemented in userspace, in a much more lightweight manner.

For example, in Seastar (a high-performance asynchronous framework implemented in C++ and based on a future-promise modelFootnote 15), there are quite a few ways of expressing a single flow of execution, which could be called a green thread. A fiber of execution can be created by chaining futures, and you can also use the C++ coroutines mechanism to build asynchronous programs in a clean way, with the compiler assisting in making the code async-friendly.

In the Rust language, the asynchronous model is quite unique. There, a future represents a computation that makes no progress on its own; it’s the responsibility of the program (usually via an executor) to poll it and thereby advance this asynchronous state machine. Other languages, like JavaScript, Go, and Java, also come with well-defined and standardized support for asynchronous programming.

This async programming support is good, because database drivers are prime examples of software that should support asynchronous operations from day one. Drivers are generally responsible for communicating over the network with highly specialized database clusters, capable of performing lots of I/O operations at the same time. We can’t emphasize enough that high concurrency is the only way to utilize the database to the fullest. Asynchronous code makes that substantially easier because it allows high levels of concurrency to be achieved without straining the local resources. Green threads are lightweight and there can be thousands of them even on a consumer-grade laptop. Asynchronous I/O is a perfect fit for this use case as well because it allows efficiently sending thousands of requests over the network in parallel, without blocking the CPU and forcing it to wait for any of the operations to complete, which was a known bottleneck in the legacy threadpool model.
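For instance, the Python driver for Cassandra/ScyllaDB ships an asynchronous API and helpers built on top of it. The sketch below issues a large batch of inserts with bounded concurrency; the row count and concurrency limit are illustrative, and the session comes from an earlier sketch.

from cassandra.concurrent import execute_concurrent_with_args

rows = [(i, f"row {i}") for i in range(100_000)]
prepared = session.prepare("INSERT INTO my_table (id, descr) VALUES (?, ?)")

# Keeps up to 100 requests in flight at any time without spawning
# an operating system thread per request.
results = execute_concurrent_with_args(session, prepared, rows, concurrency=100)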

Note

The future-promise model and asynchronous I/O are introduced in Chapter 3.

What to Look for When Selecting a Driver

Database drivers are commonly available as open-source software. It’s a great model that allows people to contribute and also makes the software easily accessible, ergo popular (precisely what database vendors want). Drivers can be developed by the database vendor, by another company, or simply by your next-door open-source contributor. This kind of competition is very healthy for the entire ecosystem, but it also forces users to make a choice: Which driver to use? For instance, at the time of this writing, the official PostgreSQL documentation lists six drivers for C/C++ alone, with the complete list being much longer.Footnote 16

Choosing a driver should be a very deliberate decision, tailored to your unique situation and preceded by tests, benchmarks, and evaluations. Nevertheless, there are some general rules of thumb that can help guide you:

  1. Clear documentation

     Clear documentation is often initially underestimated by database drivers’ users and developers alike. However, in the long term, it’s the most important repository of knowledge for everyone, where implementation details, good practices, and hidden assumptions can be thoroughly explained. Choosing an undocumented driver is a lottery—buying a pig in a poke. Don’t get distracted by shiny benchmarks on the front page; the really valuable part is thorough documentation. Note that it does not have to be a voluminous book. On the contrary—concise, straight-to-the-point docs with clear, working examples are even better.

  2. Long-term support and active maintainership

     Officially supported drivers are often maintained by their vendors, get released regularly, and have their security vulnerabilities fixed faster. External open-source drivers might look appealing at first, easily winning in their self-presented benchmarks, but it’s important to research how often they get released, how often bugs are fixed, and how likely they are to be maintained in the foreseeable future. On the other hand, sometimes the situation is reversed: The most modern, efficient code can be found in an open-source driver, while the official one is hardly maintained at all!

  3. Asynchronous API

     Your code is eventually going to need high concurrency, so it’s better to bet on an async-friendly driver, even if you’re not ready to take advantage of that quite yet. The decision will likely pay off later. While it’s easy to use an asynchronous driver in a synchronous manner, the opposite is not true.

  4. Decent test coverage

     Testing is extremely important not only for the database nodes, but also for the drivers. They are the first proxy between the users and the database cluster, and any error in the driver can quickly propagate to the whole system. If the driver corrupts outgoing data, it may get persisted on the database, eventually making the whole cluster unusable. If the driver incorrectly interprets incoming data, its users will have a false picture of the database state. And if it produces data based on this false picture, it can just as well corrupt the entire database cluster. A driver that cannot properly handle its load balancing and retry policy can inadvertently overload a database node with excess requests, which is detrimental to the whole system. If the driver is at least properly tested, users can assume a higher level of trust in it.

  5. Database-specific optimizations

     A good driver should cooperate with its database. The more context it gathers from the cluster, the more educated decisions it can make. Remember that clients, and therefore drivers, are often the most ubiquitous group of agents in distributed systems, directly contributing to the cluster-wide concurrency. That makes it especially important for them to be cooperative.

Summary

This chapter provided insights into how the choice of a database driver impacts performance and highlighted considerations to keep in mind when selecting a driver. Drivers are often an overlooked part of a distributed system. That’s a shame because drivers are so close to database users, both physically and figuratively! Proximity is an extremely important factor in all networked systems because it directly translates to latency. The next chapter ponders proximity from a subtly different point of view: How to get the data itself closer to the application users.