Getting Started with Couchbase Server

  • David Ostrovsky
  • Yaniv Rodenski

Relational databases have dominated the data landscape for over three decades. Emerging in the 1970s and early 1980s, relational databases offered a searchable mechanism for persisting complex data with minimal use of storage space. Conserving storage space was an important consideration during that era, due to the high price of storage devices. For example, in 1981, Morrow Designs offered a 26 MB hard drive for $3,599—which was a good deal compared to the 18 MB North Star hard drive for $4,199, which had appeared just six months earlier. Over the years, the relational model progressed, with the various implementations providing more and more functionality.

One of the things that allowed relational databases to provide such a rich set of capabilities was the fact that they were optimized to run on a single machine. For many years, running on a single machine scaled nicely, as newer and faster hardware became available at frequent intervals. This method of scaling is known as vertical scaling. And while most relational databases could also scale horizontally (that is, across multiple machines), doing so introduced additional complexity to the application and database design, and often resulted in inferior performance.

From SQL to NoSQL

This balance was finally disrupted with the appearance of what is known today as Internet scale, or web scale, applications. Companies such as Google and Facebook needed new approaches to database design in order to handle the massive amounts of data they had. Another aspect of the rapidly growing industry was the need to cope with constantly changing application requirements and data structure. Out of these new necessities for storing and accessing large amounts of frequently changing data, the NoSQL movement was born. These days, the term NoSQL is used to describe a wide range of mechanisms for storing data in ways other than with relational tables. Over the past few years, dozens of open-source projects, commercial products, and companies have begun offering NoSQL solutions.

The CAP Theorem

In 2000, Eric Brewer, a computer scientist from the University of California, Berkeley, proposed the following conjecture:

It is impossible for a distributed computer system to satisfy the following three guarantees simultaneously (which together form the acronym CAP):
  • Consistency: All components of the system see the same data.

  • Availability: All requests to the system receive a response, whether success or failure.

  • Partition tolerance: The system continues to function even if some components fail or some message traffic is lost.

A few years later, Brewer further clarified that consistency and availability in CAP should not be viewed as binary, but rather as a range—and distributed systems can compromise with weaker forms of one or both in return for better performance and scalability. Seth Gilbert and Nancy Lynch of MIT offered a formal proof of Brewer’s conjecture. While the formal proof spoke of a narrower use of CAP, and its status as a “theorem” is heavily disputed, the essence is still useful for understanding distributed system design.

Traditional relational databases generally provide some form of the C and A parts of CAP and struggle with horizontal scaling because they are unable to provide resilience in the face of node failure. The various NoSQL products offer different combinations of CA/AP/CP. For example, some NoSQL systems provide a weaker form of consistency, known as eventual consistency, as a compromise for having high availability and partition tolerance. In such systems, data arriving at one node isn’t immediately available to others—the application logic has to handle stale data appropriately. In fact, letting the application logic make up for weaker consistency or availability is a common approach in distributed systems that use NoSQL data stores.
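To make the idea of stale data concrete, here is a minimal sketch of an eventually consistent store. It is a toy model only: the class names, the explicit replicate() step, and the key rant:1 are all assumptions made for illustration and do not describe how any particular NoSQL product works.

```python
# Toy model of eventual consistency. A write lands on one node and reaches
# the other nodes only when replicate() runs. All names are hypothetical.
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]
        self.pending = []  # writes not yet copied to the other nodes

    def write(self, node_index, key, value):
        # The write is acknowledged as soon as a single node has it.
        self.nodes[node_index].data[key] = value
        self.pending.append((node_index, key, value))

    def read(self, node_index, key):
        # A read sees only what this particular node currently holds.
        return self.nodes[node_index].data.get(key)

    def replicate(self):
        # Deliver every pending write to all the other nodes.
        for source, key, value in self.pending:
            for index, node in enumerate(self.nodes):
                if index != source:
                    node.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["node-a", "node-b"])
store.write(0, "rant:1", "v1")
stale = store.read(1, "rant:1")  # node-b has not received the write yet
store.replicate()
fresh = store.read(1, "rant:1")  # after replication both nodes agree
```

Between the write and the call to replicate(), a read from the second node returns nothing. That window is exactly what application logic has to tolerate in an eventually consistent system.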

As you’ll see in this book, Couchbase Server provides cluster-level consistency and good partition tolerance through replication.

NoSQL and Couchbase Server

NoSQL databases have made a rapid entrance onto the main stage of the database world. In fact, it is the wide variety of available NoSQL products that makes it hard to find the right choice for your needs. When comparing NoSQL solutions, we often find ourselves forced to compare different products feature by feature in order to make a decision. In this dense and competitive marketplace each product must offer unique capabilities to differentiate itself from its brethren.

Couchbase Server is a distributed NoSQL database, which stands out due to its high performance, high availability, and scalability. Reliably providing these features in production is not a trivial thing, but Couchbase achieves this in a simple and easy manner. Let’s take a look at how Couchbase deals with these challenges.
  • Scaling: In Couchbase Server, data is distributed automatically over nodes in the cluster, allowing the database to share and scale out the load of performing lookups and disk IO horizontally. Couchbase achieves this by storing each data item in a vBucket, a logical partition (sometimes called a shard), which resides on a single node. The fact that Couchbase shards the data automatically simplifies the development process. Couchbase Server also provides a cross-datacenter replication (XDCR) feature, which allows Couchbase Server clusters to scale across multiple geographical locations.

  • High availability: Couchbase can replicate each vBucket across multiple nodes to support failover. When a node in the cluster fails, the Couchbase Server cluster makes one of the replica vBuckets available automatically.

  • High performance: Couchbase has an extensive integrated caching layer. Keys, metadata, and frequently accessed data are kept in memory in order to increase read/write throughput and reduce data access latency.
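The failover behavior described in the list above can be sketched in a few lines. This is a simplified model under stated assumptions: the map layout (vBucket number mapped to an active node and a list of replica nodes), the fail_over function, and the node names are hypothetical, and the real cluster performs considerably more work.

```python
def fail_over(vbucket_map, failed_node):
    """Promote a surviving replica for every vBucket whose active copy
    lived on failed_node, and drop failed_node from all replica lists."""
    new_map = {}
    for vbucket, (active, replicas) in vbucket_map.items():
        survivors = [node for node in replicas if node != failed_node]
        if active == failed_node:
            if not survivors:
                raise RuntimeError(f"vBucket {vbucket} has no surviving replica")
            # The first surviving replica becomes the new active copy.
            active, survivors = survivors[0], survivors[1:]
        new_map[vbucket] = (active, survivors)
    return new_map


# Three vBuckets spread over three nodes, one replica each (hypothetical).
vbucket_map = {
    0: ("node-a", ["node-b"]),
    1: ("node-b", ["node-c"]),
    2: ("node-c", ["node-a"]),
}
after_failover = fail_over(vbucket_map, "node-b")
```

In this example vBucket 1 loses its active copy and is served from node-c afterward, while vBucket 0 keeps its active copy but is left without a replica until the data is rebalanced.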

To understand how unique Couchbase Server is, we need to take a closer look at each of these features and how they’re implemented. We will do so later in this chapter, because first we need to understand Couchbase as a whole. Couchbase Server, as we know it today, is the progeny of two products: Apache CouchDB and Membase. CouchOne Inc. was a company founded by Damien Katz, the creator of CouchDB. The company provided commercial support for the Apache CouchDB open-source database. In February 2011, CouchOne Inc. merged with Membase Inc., the company behind the open-source Membase distributed key-value store. Membase was created by a few of the core contributors of Memcached, the popular distributed cache project, and provided persistence and querying on top of the simplicity and high-performance key-value mechanism provided by Memcached.

The new company, called Couchbase Inc., released Couchbase Server, a product that was based on Membase’s scalable high-performance capabilities, to which they eventually added capabilities from CouchDB, including storage, indexing, and querying. The initial version of Couchbase Server included a caching layer, which traced its origins directly back to Membase, and a persistence layer, which owed a lot to Apache CouchDB.

Membase and CouchDB represent two of the leading approaches in the NoSQL world today: key-value stores and document-oriented databases. Both approaches still exist in today’s Couchbase Server.

Couchbase as Key-Value Store vs. Document Database

Key-value stores are, in essence, managed hash tables. A key-value store uses keys to access values in a straightforward and relatively efficient way. Different key-value stores expose different functionality on top of the basic hash-table-based access and focus on different aspects of data manipulation and retrieval.

As a key-value store, Couchbase is capable of storing multiple data types. These include simple data types such as strings, numbers, datetime, and booleans, as well as arbitrary binary data. For most of the simple data types, Couchbase offers a scalable, distributed data store that provides both key-based access and minimal operations on the values. For example, for numbers you can use atomic operations such as increment and decrement. Operations are covered in depth in Chapter 4.
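To illustrate what an atomic increment buys you, the following sketch implements a tiny in-memory key-value store. It is not the Couchbase API: TinyKeyValueStore and its method names are hypothetical stand-ins, and a lock plays the role of whatever mechanism the real server uses to keep counter updates atomic.

```python
import threading


class TinyKeyValueStore:
    """In-memory stand-in for a key-value store (hypothetical, not an SDK)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data[key]

    def incr(self, key, delta=1, initial=0):
        # The read-modify-write runs under one lock, so concurrent
        # callers can never overwrite each other's update.
        with self._lock:
            value = self._data.get(key, initial) + delta
            self._data[key] = value
            return value

    def decr(self, key, delta=1, initial=0):
        return self.incr(key, -delta, initial)


kv = TinyKeyValueStore()
kv.set("greeting", "hello")   # a string value
kv.set("avatar", b"\x89PNG")  # arbitrary binary data


def count_views():
    for _ in range(1000):
        kv.incr("page_views")


workers = [threading.Thread(target=count_views) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Four threads each add 1,000 to the same counter. Because every increment is atomic, no updates are lost, which would not be guaranteed if each thread performed a separate get followed by a set.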

Document databases differ from key-value stores in the way they represent the stored data. Key-value stores generally treat their data as opaque blobs and do not try to parse it, whereas document databases encapsulate stored data into “documents” that they can operate on. A document is simply an object that contains data in some specific format. For example, a JSON document holds data encoded in the JSON format, while a PDF document holds data encoded in the Portable Document binary format.

Note

JavaScript Object Notation (JSON) is a widely used, lightweight, open data interchange format. It uses human-readable text to encode data objects as collections of name–value pairs. JSON is a very popular choice in the NoSQL world, both for exchanging and for storing data. You can read more about it at www.json.org.

One of the main strengths of this approach is that documents don’t have to adhere to a rigid schema. Each document can have different properties and parts that can be changed on the fly without affecting the structure of other documents. Furthermore, document databases actually “understand” the content of the documents and typically offer functionality for acting on the stored data, such as changing parts of the document or indexing documents for faster retrieval. Couchbase Server can store data as JSON documents, which lets it index and query documents by specific fields.
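The following sketch shows both ideas at once: documents in the same store with different properties, and an index built on a specific field. The sample documents, their keys, and the build_index helper are assumptions made for illustration. Couchbase's own indexing mechanism is different and is covered later in the book.

```python
import json
from collections import defaultdict

# Three JSON documents with no shared schema (hypothetical sample data).
raw_documents = {
    "user:1": '{"type": "user", "name": "alice", "city": "Haifa"}',
    "user:2": '{"type": "user", "name": "bob"}',
    "rant:1": '{"type": "rant", "author": "alice", "text": "Mondays."}',
}


def build_index(raw_docs, field):
    """Map each value of `field` to the sorted keys of the documents
    that contain that field."""
    index = defaultdict(list)
    for key, raw in raw_docs.items():
        document = json.loads(raw)  # the store "understands" the content
        if field in document:       # documents lacking the field are skipped
            index[document[field]].append(key)
    return {value: sorted(keys) for value, keys in index.items()}


by_type = build_index(raw_documents, "type")
by_city = build_index(raw_documents, "city")
```

Only one of the three documents has a city property, yet all of them can live side by side, and the index on city simply ignores the documents that lack it.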

Couchbase Server Architecture

A Couchbase Server cluster consists of between 1 and 1024 nodes, with each node running exactly one instance of the Couchbase Server software. The data is partitioned and distributed between the nodes in the cluster. This means that each node holds some of the data and is responsible for some of the storing and processing load. Distributing data this way is often referred to as sharding, with each partition referred to as a shard.

Each Couchbase Server node has two major components: the Cluster Manager and the Data Manager, as shown in Figure 1-1. Applications use the Client Software Development Kits (SDKs) to communicate with both of these components. The Couchbase Client SDKs are covered in depth in Chapter 3.
  • The Cluster Manager: The Cluster Manager is responsible for configuring nodes in the cluster, managing the rebalancing of data between nodes, handling replicated data after a failover, monitoring nodes, gathering statistics, and logging. The Cluster Manager maintains and updates the cluster map, which tells clients where to look for data. Lastly, it also exposes the administration API and the web management console. The Cluster Manager component is built with Erlang/OTP, which is particularly suited for creating concurrent, distributed systems.

  • The Data Manager: The Data Manager, as the name implies, manages data storage and retrieval. It contains the memory cache layer, the disk persistence mechanism, and the query engine. Couchbase clients use the cluster map provided by the Cluster Manager to discover which node holds the required data and then communicate with the Data Manager on that node to perform database operations.

Figure 1-1. Couchbase Server architecture

Data Storage

Couchbase manages data in buckets—logical groupings of related resources. You can think of buckets as being similar to databases in Microsoft SQL Server, or to schemas in Oracle. Typically, you would have separate buckets for separate applications. Couchbase supports two kinds of buckets: Couchbase and memcached.

Memcached buckets store data in memory as binary blobs of up to 1 MB in size. Data in memcached buckets is not persisted to disk or replicated across nodes for redundancy. Couchbase buckets, on the other hand, can store data as JSON documents, primitive data types, or binary blobs, each up to 20 MB in size. This data is cached in memory and persisted to disk and can be dynamically rebalanced between nodes in a cluster to distribute the load. Furthermore, Couchbase buckets can be configured to maintain between one and three replica copies of the data, which provides redundancy in the event of node failure. Because each copy must reside on a different node, replication requires at least one node per replica, plus one for the active instance of data.
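The node requirement follows directly from the rule that every copy must live on a different node. A minimal sketch of the arithmetic, with hypothetical function names, and assuming that a replica count of zero (replication turned off) is also allowed:

```python
def total_copies(replicas):
    """Active copy plus the configured number of replica copies."""
    if not 0 <= replicas <= 3:
        raise ValueError("replica count must be between 0 and 3")
    return replicas + 1


def minimum_nodes(replicas):
    """Each copy needs its own node, so nodes >= copies."""
    return total_copies(replicas)


requirements = {replicas: minimum_nodes(replicas) for replicas in range(4)}
```

With three replicas configured, for example, the cluster holds four copies of the data and therefore needs at least four nodes.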

Documents in a bucket are further subdivided into virtual buckets (vBuckets) by their key. Each vBucket owns a subset of all the possible keys, and documents are mapped to vBuckets according to a hash of their key. Every vBucket, in turn, belongs to one of the nodes of the cluster. As shown in Figure 1-2, when a client needs to access a document, it first hashes the document key to find out which vBucket owns that key. The client then checks the cluster map to find which node hosts the relevant vBucket. Lastly, the client connects directly to the node that stores the document to perform the get operation.
Figure 1-2. Sharding and replicating a bucket across nodes
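The three lookup steps (hash the key, find the vBucket, find the node) can be sketched as follows. Treat the details as assumptions: the count of 1,024 vBuckets, the use of a plain CRC32 hash modulo the vBucket count, the round-robin cluster map, and all function and node names are chosen for illustration, and the hashing used by the real client libraries may differ.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count, not stated in the text


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Step 1: hash the document key to find the vBucket that owns it."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_cluster_map(nodes, num_vbuckets=NUM_VBUCKETS):
    """Assign vBuckets to nodes round-robin; a stand-in for the real
    cluster map that the Cluster Manager maintains."""
    return [nodes[vbucket % len(nodes)] for vbucket in range(num_vbuckets)]


def node_for_key(key, cluster_map):
    """Steps 2 and 3: look up the node that hosts the key's vBucket."""
    return cluster_map[vbucket_for_key(key, len(cluster_map))]


nodes = ["node-a", "node-b", "node-c"]
cluster_map = build_cluster_map(nodes)
owner = node_for_key("rant:1", cluster_map)
```

Because the hash depends only on the key, every client computes the same vBucket for the same document and can contact the owning node directly, without asking a central coordinator.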

In addition to maintaining replicas of data within buckets, Couchbase can replicate data between entire clusters. Cross-Datacenter Replication (XDCR) adds further redundancy and brings data geographically closer to its users. Both in-bucket replication and XDCR occur in parallel. XDCR is covered in depth in Chapter 9.

Installing Couchbase Server

Installing and configuring Couchbase Server is very straightforward. You pick the platform, the correct edition for your needs, and then download and run the installer. After the installation finishes, you use the web console, which guides you through a quick setup process.

Selecting a Couchbase Server Edition

Couchbase Server comes in two different editions: Enterprise Edition and Community Edition. There are some differences between them:
  • Enterprise Edition (EE) is the latest stable version of Couchbase, which includes all the bugfixes and has passed a rigorous QA process. It is free for use with any number of nodes for testing and development purposes, and with up to two nodes for production. You can also purchase an annual support plan with this edition.

  • Community Edition (CE) lags behind the EE by about one release cycle and does not include all the latest fixes or commercial support. However, it is open source and entirely free for use in testing and, if you’re very brave, in production. This edition is largely meant for enthusiasts and non-critical systems.

When you are ready to give Couchbase a hands-on try, download the appropriate installation package for your system from www.couchbase.com/download.

Installing Couchbase on Different Operating Systems

The installation step is very straightforward. Let’s take a look at how it works on different operating systems.

Linux

Couchbase is officially supported on several Linux distributions: Ubuntu 10.04 and higher, Red Hat Enterprise Linux (RHEL) 5 and 6, CentOS 5 and 6, and Amazon Linux. Unofficially, you can get Couchbase to work on most distributions; however, we recommend sticking to the supported operating systems in production environments.

Couchbase also requires OpenSSL to be installed separately. To install OpenSSL on RHEL run the following command:

> sudo yum install openssl

On Ubuntu, you can install OpenSSL using the following command:

> sudo apt-get install openssl

With OpenSSL installed, you can now install the Couchbase package you downloaded earlier.

RHEL:

> sudo rpm --install couchbase-server-<version>.rpm

Ubuntu:

> sudo dpkg -i couchbase-server-<version>.deb

Note that <version> is the version of the installer you have downloaded.

After the installation completes, you will see a confirmation message that Couchbase Server has been started.

Windows

On a Windows system, run the installer you’ve downloaded and follow the instructions in the installation wizard.

Mac OS X

Download and unzip the install package and then move the Couchbase Server.app folder to your Applications folder. Double-click Couchbase Server.app to start the server.

Note

Couchbase Server is not supported on Mac OS X for production purposes. It is recommended that you only use it for testing and development.

Configuring Couchbase Server

With Couchbase installed, you can now open the administration web console and configure the server. Open the following address in your browser: http://<server>:8091, where <server> is the machine on which you’ve installed Couchbase. The first time you open the web console, you’re greeted with the screen shown in Figure 1-3.
Figure 1-3. Opening the web console for the first time
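If the page does not load, it can help to check from a script whether anything is answering at the console address. The sketch below uses only the Python standard library. The function names are hypothetical, and the check confirms only that some HTTP server responds on port 8091, not that it is Couchbase.

```python
import urllib.error
import urllib.request


def console_url(server, port=8091):
    """Build the web console address for the given machine."""
    return f"http://{server}:{port}"


def console_is_reachable(server, port=8091, timeout=3.0):
    """Return True if an HTTP server answers at the console address."""
    try:
        with urllib.request.urlopen(console_url(server, port), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even though it returned an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, host not found, or the request timed out.
        return False
```

Calling console_is_reachable("localhost") on the machine where you installed Couchbase should return True once the server has started.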

Click Setup to begin the configuration process, as shown in Figure 1-4.
Figure 1-4. Configuring Couchbase Server, step 1

The Databases Path field is the location where Couchbase will store its persisted data. The Indices Path field is where Couchbase will keep the indices created by views. Both locations refer only to the current server node. Placing the index data on a different physical disk than the document data is likely to result in better performance, especially if you will be using many views or creating views on the fly. Indexing and views are covered in Chapter 6.

In a Couchbase cluster, every node must have the same amount of RAM allocated. The RAM quota you set when starting a new cluster will be inherited by every node that joins the cluster in the future. It is possible to change the server RAM quota later through the command-line administration tools.

The Sample Buckets screen (shown in Figure 1-5) lets you create buckets with sample data and views so that you can test some of the features of Couchbase Server with existing samples. Throughout this book you’ll build your own sample application, so you won’t need the built-in samples, but feel free to install them if you’re curious.
Figure 1-5. Configuring Couchbase Server, step 2

The next step, shown in Figure 1-6, is creating the default bucket. Picking the memcached bucket type will hide the unsupported configuration options, such as replicas and read-write concurrency.
Figure 1-6. Configuring Couchbase Server, step 3

The memory size is the amount of RAM that will be allocated for this bucket on every node in the cluster. Note that this is the amount of RAM that will be allocated on every node, not the total amount that will be split between all nodes. The per-node bucket RAM quota can be changed later through the web console or via the command-line administration tools.
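Because the quota is per node, the memory a bucket receives across the cluster grows with every node you add. A short sketch of the arithmetic, using a hypothetical function name and example figures:

```python
def cluster_bucket_ram_mb(per_node_quota_mb, node_count):
    """Total RAM a bucket is allocated across the whole cluster."""
    if per_node_quota_mb < 0 or node_count < 1:
        raise ValueError("quota must be non-negative and node_count at least 1")
    return per_node_quota_mb * node_count


single_node_total = cluster_bucket_ram_mb(100, 1)  # 100 MB on one node
three_node_total = cluster_bucket_ram_mb(100, 3)   # 300 MB across three nodes
```

A 100 MB quota on a three-node cluster therefore reserves 300 MB in total, not 100 MB divided three ways.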

Couchbase buckets can replicate data across multiple nodes in the cluster. With replication enabled, all data will be copied up to three times to different nodes. If a node fails, Couchbase will make one of the replica copies available for use. Note that the “number of replicas” setting refers to copies of data. For example, setting it to 3 will result in a total of four instances of your data in the cluster, which also requires a minimum of four nodes.

Enabling index replication will also create copies of the indices. This has the effect of increasing traffic between nodes, but also means that the indices will not need to be rebuilt in the event of node failure. The “disk read-write concurrency” setting controls the number of threads that will perform disk IO operations for this bucket. Chapter 8 goes into more detail about disk-throughput optimization. For now, we’ll leave this set at the default value. The Flush Enable checkbox controls whether the Flush command is enabled for the bucket. The Flush command deletes all data and is useful for testing and development, but should not be enabled for production databases.

The next step, Notifications, is shown in Figure 1-7.
Figure 1-7. Configuring Couchbase Server, step 4

Update Notifications will show up in the web console to alert you of important news or product updates. Note that enabling update notifications will send anonymous data about your product version and server configuration to Couchbase (the company). This step also lets you register to receive email notifications and information related to Couchbase products.

The final step, Configure Server, as you can see in Figure 1-8, is to configure the administrator username and password. These credentials are used for administrative actions, such as logging into the web console or adding new nodes to the cluster. Data buckets you create are secured separately and do not use the administrator password.
Figure 1-8. Configuring Couchbase Server, step 5

Tip

Avoid using the same combination as on your luggage.

Click Next to finish the setup process, and you will be taken to the Cluster Overview screen in the web console. Couchbase will need about a minute to finalize the setup process and then initialize the default bucket, after which you should see something similar to Figure 1-9.
Figure 1-9. The Cluster Overview tab of the Couchbase web console

Congratulations—your Couchbase Server is now fully operational!

Creating a Bucket

Throughout this book, you’ll be building a sample application that will demonstrate the various features of Couchbase Server. RanteR, the Anti-Social Network, is a web application that lets users post “rants,” comment on rants, and follow their favorite—using the word loosely—ranters. It bears no resemblance whatsoever to any existing, well-known web applications. At all.

To start building your RanteR application, you’ll need a Couchbase bucket to hold all your data. Couchbase Server administration is covered in depth in the chapters in Part III, so for now you’ll only create a new bucket with mostly default values.

Click Create New Data Bucket on the Data Buckets tab of the web console to open the Create Bucket dialog, as shown in Figure 1-10.
Figure 1-10. The Data Buckets tab of the Couchbase web console

Enter ranter as the bucket name, as shown in Figure 1-11, and set the RAM quota to a reasonable amount. RanteR doesn’t need much RAM for now. Leave the Access Control set to the standard port. You can enter a password to secure your data bucket.
Figure 1-11. Creating a new Couchbase bucket

Because you only have one Couchbase node installed and configured, you cannot use replication, so make sure to uncheck the Replicas box as shown in Figure 1-12. For convenience, and because this is not a production server, enable the Flush command for this bucket. Leave the other settings at their default values for now. Click Create, and you are done.
Figure 1-12. Creating a new Couchbase bucket, continued

Summary

As you saw in this chapter, setting up Couchbase Server is a fast and straightforward process. Now that you have it up and running, it’s time to consider how you’re going to use it. The next chapter examines the various considerations for designing a document database and mapping your application entities to documents.

Copyright information

© David Ostrovsky and Yaniv Rodenski 2014

Authors and Affiliations

  • David Ostrovsky (1)
  • Yaniv Rodenski (1)
  1. Basingstoke, United Kingdom