Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Databases as a Service

  • Renato Luiz de Freitas Cunha
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_83-1

Definitions

Cloud provider: A cloud infrastructure provider, such as Amazon Web Services, Microsoft Azure, or IBM Cloud.

End user: A user of cloud services, a person or company that uses services from the cloud provider.

Database server: A database management system that uses the client-server model.

Database as a service: Cloud computing service model that abstracts the setup of hardware, software, or tuning for providing access to a database.

Acronyms

IaaS: Infrastructure as a service

PaaS: Platform as a service

SaaS: Software as a service

DBaaS: Database as a service

IT: Information technology

DBA: Database administrator

API: Application programming interface

Overview

Configuring and maintaining database systems is a complicated endeavor that includes planning the hardware to be used, configuring disks and file systems, installing and tuning database management systems, determining whether and how replication will be done, and managing backups. In traditional information technology (IT) environments, setting up database systems requires the work of different professionals, such as database administrators (DBAs), systems administrators, and storage administrators. Moreover, tuning databases requires broad knowledge and a principled approach to measuring performance (Shasha and Bonnet 2002), and a well-tuned database system can perform faster than an improperly tuned system by a factor of two or more (Stonebraker and Cattell 2011).

In a database as a service (DBAAS) setting, a service provider hosts databases and provides mechanisms to create, store, and access them at the provider. Users usually interact with the service via web interfaces and application programming interfaces (APIs). Although popularized in recent years by cloud computing and big data, the concept is not new and has been available for more than a decade (Hacigumus et al. 2002).

More recently, cloud providers have made DBAAS offerings available. In a DBAAS model, application developers neither have to maintain machines nor have to install and maintain database management systems themselves. Instead, the database service provider is responsible for installing and maintaining database systems, and users are charged according to their usage of the service.

To better place the DBAAS model in the cloud computing service models, a quick review of the cloud computing definition will be presented. Interested readers are directed to other sources (Mell and Grance 2011; Armbrust et al. 2010) for a more detailed definition. Readers familiar with cloud computing concepts can safely skip the next section.

Defining Characteristics of Cloud Computing

The essential characteristics of the cloud computing model are broad network access, resource pooling, rapid elasticity, measured service, and on-demand self-service (Mell and Grance 2011). Broad network access is a necessity in clouds because cloud providers can be geographically distant from end users, who nevertheless rely on cloud services. In clouds, resources are pooled, meaning that they are already available at the provider and are assigned to users upon request. These resource pools enable the rapid elasticity of clouds, allowing resources to be added and removed in a short amount of time.

Services in clouds can be provided in three main models:

Software as a service (SAAS)

In this model, applications execute on a cloud infrastructure, and users normally access them from a web browser. Usually, consumers are end users, who will not do development on top of such applications.

Platform as a service (PAAS)

In this model, application developers and service providers are able to deploy onto the cloud infrastructure applications created using programming languages, libraries, tools, and services supported by the cloud provider.

Infrastructure as a service (IAAS)

In this model, customers have access to computing, storage, network, and similar services where the customer can execute arbitrary operating systems and software.

This entry makes no distinction between private and public clouds, because the discussion considers the perspective of the end user and the cloud provider. In private clouds, the infrastructure is provisioned for exclusive use by a single organization. In this model, different business units are usually the end users of cloud services. In public clouds, the infrastructure is provisioned for open use by the general public.

Positioning DBAAS

IAAS allows one to abstract the actual hardware configuration away, but the issues of properly configuring databases remain. Hence, end users still have to devote time and energy to properly configuring and maintaining database systems, when such energy would be better applied to developing the system they wanted to write in the first place.

From the cloud computing definition, DBAAS has aspects of both SAAS and PAAS: although databases are software packages, they expose a programming interface, the query language, with which developers interact. In public clouds, DBAAS is usually offered in the PAAS layer.

Advantages of the DBAAS Model

There are many advantages in adopting a DBAAS model, all of them resulting from using a cloud-based service model. In DBAAS, due to the self-service nature of the cloud, to create new database instances, a user only needs to make a request in the cloud administration portal. Also, there is no need to plan for machines, storage, or replication strategies, since cloud providers either provide options to enable replication automatically in different zones or manage storage transparently, depending on provider and service.

The agility of cloud deployments should be contrasted with traditional IT operations. Consider, for example, that a business unit is developing a new application that requires a database server connection. To set up the server, a system administrator would have to be involved to allocate the machine and install the database server for the business unit. Between requesting a database server and actually being able to use it, the business unit could have to wait one or two business days. In the cloud, due to its high levels of automation, a new database instance can be provisioned in a matter of minutes.

Assuming the business unit of the previous example used IAAS as a model for installing database servers, whoever provisioned the computing resource would still be required to implement corporate policies, such as creating different accounts for different users and configuring passwords and firewalls according to corporate policy. Essentially, the business unit would still need access to a system administrator to properly configure the computing resource. Some of the work involved in this can be reduced by using virtual machine templates, but, given the control users can have over virtual machines, some human verification may still be required.

What makes DBAAS stand out is the heavy use of abstraction and automation, another characteristic of the cloud. To provision a new database server instance, the user specifies the database engine and version, the amount of RAM, and the amount of storage required, and the cloud provider takes care of actually provisioning the database instance.
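
The provisioning request described above can be pictured as a small data structure. The following is a minimal sketch, not any provider's actual API: the `InstanceRequest` class, its field names, and the `SUPPORTED_ENGINES` catalog are hypothetical and only mirror the parameters named in the text.

```python
from dataclasses import dataclass

# Hypothetical catalog of engines and versions a provider might offer.
SUPPORTED_ENGINES = {
    "postgresql": {"9.6", "10"},
    "mysql": {"5.7"},
}


@dataclass(frozen=True)
class InstanceRequest:
    """The parameters the user supplies when asking for an instance."""
    engine: str
    version: str
    ram_gb: int
    storage_gb: int


def validate(request: InstanceRequest) -> list:
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    versions = SUPPORTED_ENGINES.get(request.engine)
    if versions is None:
        problems.append(f"unsupported engine: {request.engine}")
    elif request.version not in versions:
        problems.append(f"unsupported version: {request.engine} {request.version}")
    if request.ram_gb <= 0:
        problems.append("ram_gb must be positive")
    if request.storage_gb <= 0:
        problems.append("storage_gb must be positive")
    return problems
```

Everything else (machines, disks, installation, tuning) is left to the provider, which is why the request can stay this small.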

In principle, DBAAS works well for both small and large users. Teams that need small databases benefit from the fast provisioning times and from not requiring a DBA, while teams with large database requirements benefit from the ability to scale up instances whenever needed.

Disadvantages of the DBAAS Model

Although DBAAS is a good solution for many companies and teams, it is not without its issues. Users might need database extensions not available in the cloud provider. Consider, for example, PostGIS. PostGIS is an extension that adds support for geographic objects to the PostgreSQL database. To add this support, PostGIS is installed as a set of helper programs, shared library files, and SQL scripts. If an end user needs PostGIS enabled in a database, but the cloud provider does not support it, then the user has no means of adding geographic object support for that database instance. The same argument can be made about legacy versions of database servers or unpopular database engines: if the cloud provider does not support them, they cannot be used.

With regard to scaling, not all providers might support scaling down database server instances. If, for example, a user provisioned a database instance larger than what was actually needed, the only solution to scaling down might be provisioning a new database instance and migrating the data from the old instance to the new one. Scaling database services out might suffer from a similar issue. If the cloud provider lacks support for multiple server instances serving the same database – this can occur, for example, when a database grows to be bigger than the biggest instance type supported by the cloud provider – then the user has to implement support manually, i.e., by partitioning the data among different database servers.
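
To give an idea of what implementing this support manually involves, the sketch below routes each record key to one of several database servers using a stable hash. It is only an illustration under stated assumptions: the server names in `SHARDS` and the `shard_for` helper are hypothetical, and hash-based partitioning is just one of several possible schemes.

```python
import hashlib

# Hypothetical names of the database servers that hold the partitions.
SHARDS = ["db-0", "db-1", "db-2"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Map a record key to one server using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a server.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

With this scheme the application must call `shard_for` before every query, and adding a server changes the modulus, so many keys would have to be moved. This is part of the burden that falls on the user when the provider does not handle scale-out.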

Depending on the needs of the end user, the premium for using DBAAS might be higher than deploying databases traditionally in IAAS clouds or on in-house machines. Therefore, a cost-benefit analysis must be made prior to choosing one option or another.

Types of Databases

Currently there are two dominant database paradigms: relational (SQL) and the so-called NoSQL databases. Relational databases are based on the relational calculus theory and make use of the structured query language (SQL). Different relational databases implement different dialects of SQL, but the language itself is standardized, and SQL databases make use of similar terminology. Relational databases provide ACID (atomicity, consistency, isolation, and durability) guarantees:

Atomicity: requires that each transaction happen all at once or not at all. If one part of the transaction fails, then the whole transaction fails. To the outside world, committed transactions appear indivisible.

Consistency: ensures that transactions change the database only between valid states, making sure that all defined rules, such as cascades, constraints, and triggers, are always valid.

Isolation: ensures that one transaction cannot interfere with another and that the concurrent execution of transactions results in the same state that would be obtained if they were executed sequentially.

Durability: ensures that once a transaction has been committed, it will remain committed, even in the event of crashes.
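
Atomicity and consistency can be observed with a small experiment. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `accounts` table, the `transfer` and `balances` helpers, and the amounts are invented for illustration. A `CHECK` constraint forbids negative balances, so a transfer that would overdraw the source account fails as a whole, including the credit that was already applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dictionary keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances untouched, even though the first `UPDATE` had already run inside the transaction.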

The term NoSQL is newer and relates to databases with different data models, such as tuples (where row fields are predefined in a schema), documents (which allow attributes to be nested documents or lists of values as well as scalars, without the definition of a schema), extensible records (hybrids between tuples and documents), and objects (analogous to objects in programming languages, but without methods) (Cattell 2011). In contrast to relational databases, NoSQL databases provide BASE (basically available, soft state, eventually consistent) properties:

Basically available: the system guarantees availability, in that every request receives a response, though without a guarantee that the response contains the most recent write.

Soft state: state can change, even without input, due to eventual consistency.

Eventually consistent: the system will become consistent over time, provided it does not receive input during that time.
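
The three BASE properties can be illustrated with a toy in-process model. This is a deliberately simplified sketch, not how any particular NoSQL system is implemented: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, replication is modeled as an explicit `propagate` call, and conflicts are resolved by keeping the write with the highest version number.

```python
class Replica:
    """One copy of the data; holds the value and the version it has seen."""

    def __init__(self):
        self.value = None
        self.version = 0

    def apply(self, value, version):
        # Keep only the newest write (last-writer-wins by version number).
        if version > self.version:
            self.value, self.version = value, version


class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # (replica index, value, version) not yet delivered
        self.clock = 0

    def write(self, value):
        """Acknowledge after updating one replica; queue the rest."""
        self.clock += 1
        self.replicas[0].apply(value, self.clock)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, value, self.clock))

    def read(self, replica_index):
        """Always answers, possibly with a stale value (basically available)."""
        return self.replicas[replica_index].value

    def propagate(self):
        """Deliver all queued updates (the background replication step)."""
        for i, value, version in self.pending:
            self.replicas[i].apply(value, version)
        self.pending.clear()
```

After a `write`, a `read` from replica 1 may still return the old value. Once `propagate` runs with no further writes, all replicas agree, and the state of replicas 1 and 2 has changed without any client input, which is the soft state property.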

NoSQL databases became popular due to big Internet companies, such as Amazon, Facebook, and Google, using them for scaling their services. Traditionally, relational databases were considered hard to scale out, particularly due to the ACID guarantees. Hence, companies with millions of users needed better ways to scale their systems. In a way, these database systems were introduced to address the “velocity” and “volume” of the big data Vs. Examples of early successful systems are Dynamo (DeCandia et al. 2007) and BigTable (Chang et al. 2008).

The gains in performance of NoSQL databases over SQL databases come from the BASE properties. By relaxing the ACID properties, NoSQL databases can return from queries faster, allowing for faster responses. Also, the eventual consistency allows for easier load balancing between back-end servers, since the global view does not have to be consistent the whole time. In fact, “customers tolerate airline over-booking, and orders that are rejected when items in an online shopping cart are sold out before the order is finalized” (Cattell 2011), which makes a case for using such databases.

Current DBAAS Offerings

As of the fourth quarter of 2017, all the big cloud providers have a DBAAS offering in their cloud systems, whether relational or NoSQL. Table 1 displays a non-exhaustive list of the offerings of three big cloud providers. As can be seen in the table, various databases with different data models are available, allowing users to select the model that best fits the problem being tackled.
Table 1. Non-exhaustive list of DBAAS offerings of three big cloud providers as of the fourth quarter of 2017. Table sorted by provider name

Provider | Type       | Databases supported
Amazon   | Relational | Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, Microsoft SQL Server
Amazon   | NoSQL      | Amazon SimpleDB, Amazon Redshift, Amazon DynamoDB, Amazon Neptune, JanusGraph
Google   | Relational | PostgreSQL, MySQL, Cloud Spanner
Google   | NoSQL      | Cloud Bigtable, Cloud Datastore
IBM      | Relational | IBM DB2, PostgreSQL, MySQL
IBM      | NoSQL      | IBM Cloudant, MongoDB, JanusGraph, Redis, RethinkDB, ScyllaDB

DBAAS and Big Data

Advances in database technologies enabled the collection and analysis of big data. As the ideas used by Internet companies to implement scalable database services became popular, application developers started using such technologies to implement faster applications. Today, DBAAS enables companies from various industries to collect, process, and analyze big data in the cloud.

Future Directions for Research

Research in DBAAS is related to the areas discussed previously: relational and NoSQL databases and cloud computing. Therefore, it makes sense that future research directions should be related to a tighter integration between these areas.

Relational database management systems achieve scalability by partitioning data between different database servers, a technique known as “sharding.” One example of how sharding can be done is by splitting the data based on the continent users live in – e.g., the data that belongs to a user from Europe would be placed on a server different from a server that stores data about a user from Asia. When a user migrates between continents, migrating user data between regions would reduce access times but would incur data transfer costs. One interesting study would be to analyze the trade-off between higher latencies for users and the costs of such migrations. Also related to costs, cloud providers could benefit from more accurate cost models that better reflect their fail-over strategies (Mansouri 2017).
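
The continent-based sharding example can be sketched with a small directory that records where each user's data lives and how much data has been moved. All names here are hypothetical (the `SHARD_BY_CONTINENT` mapping, the server names, and the `UserDirectory` class), and the byte counter is only a stand-in for the data transfer costs mentioned above, not a real cost model.

```python
# Hypothetical mapping from continent to the server holding that shard.
SHARD_BY_CONTINENT = {"Europe": "db-eu", "Asia": "db-asia"}


class UserDirectory:
    """Tracks which continent-based shard currently stores each user."""

    def __init__(self):
        self.home = {}        # user id -> continent of the storing shard
        self.bytes_moved = 0  # running total of migrated data

    def register(self, user_id, continent):
        self.home[user_id] = continent

    def shard_for(self, user_id):
        return SHARD_BY_CONTINENT[self.home[user_id]]

    def migrate(self, user_id, new_continent, user_data_bytes):
        """Move a user's data to another shard and account for the transfer."""
        if self.home[user_id] != new_continent:
            self.home[user_id] = new_continent
            self.bytes_moved += user_data_bytes
```

A study of the trade-off would compare a quantity like `bytes_moved` (priced by the provider) against the latency users experience when their data stays on a distant shard.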

In recent years, we have seen database management systems originally written as NoSQL databases adopting SQL as a query language and becoming relational databases (Corbett et al. 2013; Bacon et al. 2017). We have also seen new relational databases being written specifically to work on the cloud (Verbitski et al. 2017). Such systems were designed to overcome the technical limitations of both traditional relational databases (which were hard to scale horizontally due to the inherent complexity of sharding) and NoSQL databases (which typically only provide eventual consistency), while providing ACID guarantees without requiring explicit sharding. These new relational databases were made generally available to public cloud customers, but currently there seem to be no private cloud alternatives to such database systems. It would be interesting to investigate how cloud software stacks (e.g., OpenStack (Sefraoui et al. 2012)) could integrate these improvements and whether current DBAAS software architectures (as implemented by cloud providers) could accommodate these new designs.

As for the perceived “rigidity” of relational databases and SQL, vendors of database management systems have added support for a JSON data type to their products. Such data types add support for the JSON syntax, which makes the programming model more flexible and allows for more complex queries without the need to change the database schema every time a field is added or modified (in the JSON columns). Previous research (Chasseur et al. 2013; Tahara et al. 2014) suggests that such integration of SQL and NoSQL properties results in systems with higher performance and better consistency guarantees than only relational or only NoSQL systems, but this may need further evaluation on Internet-scale and DBAAS systems.
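
The flexibility gained from JSON columns can be illustrated with SQLite, used here only because it ships with Python. Two caveats apply: SQLite stores JSON as text rather than in a dedicated JSON data type, and the sketch assumes that the SQLite library bundled with the Python installation includes the JSON functions (such as `json_extract`). The `events` table and its contents are invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Fixed relational columns plus one free-form JSON column.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, payload TEXT)"
)
conn.execute(
    "INSERT INTO events (kind, payload) VALUES (?, ?)",
    ("login", json.dumps({"user": "ana", "device": {"os": "linux"}})),
)
# A later record carries a new field ("mfa"); no ALTER TABLE is required.
conn.execute(
    "INSERT INTO events (kind, payload) VALUES (?, ?)",
    ("login", json.dumps({"user": "bo", "device": {"os": "macos"}, "mfa": True})),
)

# Ordinary SQL filters on a value nested inside the JSON document.
rows = conn.execute(
    "SELECT json_extract(payload, '$.user') FROM events "
    "WHERE json_extract(payload, '$.device.os') = ? ORDER BY id",
    ("linux",),
).fetchall()
```

The relational columns keep their schema and constraints, while the `payload` column accepts documents whose fields vary from row to row.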

Further Reading

Hacigumus et al. (2002) describe the design of one of the first web-based DBAAS systems before the term cloud computing was coined. Mell and Grance (2011) and Armbrust et al. (2010) define the term “cloud computing” and discuss the characteristics of the cloud. DeCandia et al. (2007) and Chang et al. (2008) describe two early successful NoSQL database systems. Cattell (2011) examines SQL and NoSQL systems and categorizes NoSQL according to data model while also presenting use cases for each of the defined categories. Abourezq and Idrissi (2016) present an overview of database as a service for big data, reviewing the database solutions offered as DBAAS.

References

  1. Abourezq M, Idrissi A (2016) Database-as-a-service for big data: an overview. Int J Adv Comput Sci Appl (IJACSA) 7(1):157–177
  2. Armbrust M, Fox A, Griffith R, Joseph AD, Katz R, Konwinski A, Lee G, Patterson D, Rabkin A, Stoica I, Zaharia M (2010) A view of cloud computing. Commun ACM 53(4):50–58. https://doi.org/10.1145/1721654.1721672
  3. Bacon DF, Bales N, Bruno N, Cooper BF, Dickinson A, Fikes A, Fraser C, Gubarev A, Joshi M, Kogan E, et al (2017) Spanner: becoming a SQL system. In: Proceedings of the 2017 ACM international conference on management of data. ACM, pp 331–343
  4. Cattell R (2011) Scalable SQL and NoSQL data stores. SIGMOD Rec 39(4):12–27. http://doi.org/10.1145/1978915.1978919
  5. Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst (TOCS) 26(2):4
  6. Chasseur C, Li Y, Patel JM (2013) Enabling JSON document stores in relational systems. In: WebDB, vol 13, pp 14–15
  7. Corbett JC, Dean J, Epstein M, Fikes A, Frost C, Furman JJ, Ghemawat S, Gubarev A, Heiser C, Hochschild P, et al (2013) Spanner: Google’s globally distributed database. ACM Trans Comput Syst (TOCS) 31(3):8
  8. DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: Amazon’s highly available key-value store. ACM SIGOPS Oper Syst Rev 41(6):205–220
  9. Hacigumus H, Iyer B, Mehrotra S (2002) Providing database as a service. In: Data engineering, 2002. Proceedings 18th international conference on IEEE, pp 29–38
  10. Mansouri Y (2017) Brokering algorithms for data replication and migration across cloud-based data stores. PhD thesis
  11. Mell P, Grance T (2011) The NIST definition of cloud computing. Technical report 800-145, Computer Security Division, Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg
  12. Sefraoui O, Aissaoui M, Eleuldj M (2012) Openstack: toward an open-source solution for cloud computing. Int J Comput Appl 55(3):38–42
  13. Shasha D, Bonnet P (2002) Database tuning: principles, experiments, and troubleshooting techniques. Morgan Kaufmann, Amsterdam
  14. Stonebraker M, Cattell R (2011) 10 rules for scalable performance in “simple operation” datastores. Commun ACM 54(6):72–80
  15. Tahara D, Diamond T, Abadi DJ (2014) Sinew: a SQL system for multi-structured data. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. ACM, pp 815–826
  16. Verbitski A, Gupta A, Saha D, Brahmadesam M, Gupta K, Mittal R, Krishnamurthy S, Maurice S, Kharatishvili T, Bao X (2017) Amazon aurora: design considerations for high throughput cloud-native relational databases. In: Proceedings of the 2017 ACM international conference on management of data. ACM, pp 1041–1052

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. IBM Research, São Paulo, Brazil

Section editors and affiliations

  • Rodrigo N. Calheiros (1)
  • Marcos Dias de Assuncao (2)
  1. School of Computing, Engineering and Mathematics, Western Sydney University, Penrith, Australia
  2. Inria, LIP, ENS Lyon, Lyon, France