Parameters tuning of multi-model database based on deep reinforcement learning

The performance of a database management system (DBMS) is directly linked to a vast array of knobs that control various aspects of system operation, ranging from memory and thread-count settings to I/O optimization. Improper settings of configuration parameters have been shown to have detrimental effects on the performance, reliability and availability of the overall database management system. This is also true for multi-model databases, which use a single platform to support multiple data models. Existing approaches for automatic DBMS knob tuning are not directly applicable to multi-model databases due to the diversity of multi-model database instances and workloads. Firstly, in cloud environments, they have difficulty adapting to changing environments and diverse workloads. Secondly, they rely on large-scale high-quality training samples that are difficult to obtain. Finally, they focus primarily on throughput metrics, ignoring tuning requirements for resource utilization. Therefore, in this paper, we propose a configuration parameter tuning solution for multi-model databases named MMDTune. It selects influential parameters and recommends optimal configurations in a high-dimensional continuous space. For different workloads, the TD3 algorithm is improved to generate reasonable parameter adjustment plans according to the internal state of the multi-model database. We conduct extensive experiments under 5 different workloads on real cloud databases to evaluate MMDTune. Experimental results show that MMDTune adapts well to new hardware environments or workloads, and significantly outperforms representative tuning tools such as OtterTune and CDBTune.


Introduction
As the world becomes more interconnected, we are witnessing a torrent of digital data with different structures produced by various hardware and software. How to store and manage data of multiple models (Sawadogo & Darmont, 2021) and how to facilitate data interoperation (Braun et al., 2022) have become key issues. Multi-model databases (Płuciennik & Zgorzałek, 2017) have become a feasible and burgeoning solution. A multi-model database can be understood as a database that stores data in different formats (relational, document, graph, object, etc.) under one management system.
The performance of a multi-model database depends mainly on hundreds of tunable knobs that control many aspects, such as memory allocation, I/O optimization, query planning overhead, and other behaviors (Gordon-Ross & Vahid, 2007). Due to the diversity of workloads and the flexibility of the environment, multi-model databases often do not run in their best state, and their performance may even deteriorate. It is therefore not feasible to rely on a few experienced database administrators (DBAs) to set appropriate knob configurations. Most existing studies on automatic database tuning rely on search-based or learning-based algorithms to recommend knobs. However, they either adapt poorly to the changing environments and more diverse workloads in the cloud, or rely on large-scale high-quality training samples that are difficult to obtain. Moreover, since a multi-model database manages multiple data models at the same time, it can perform CRUD operations on various data models as well as complex cross-model transactions. However, existing benchmarking platforms focus on relational databases and single-data-model NoSQL stores (Davoudian et al., 2018; Huang et al., 2017), which significantly limits tuning. The challenge has two main aspects: the complexity of the parameters and the heterogeneity of multi-model database workloads. For example, the workloads of a multi-model database are more diversified than those of traditional databases, and may contain retrieval of documents as well as operations on graph data. Meanwhile, some existing database benchmarks do not support long-term stress testing, and they evaluate performance only by execution time and lack other metrics, such as latency.
To solve the above problems, enrich the research on knob tuning of multi-model databases, and explore the feasibility and effectiveness of tuning methods on multi-model databases, we propose MMDTune, a performance tuning solution for multi-model databases based on deep reinforcement learning. It consists of three parts: the multi-round sensitivity analysis method (Borgonovo & Plischke, 2016; Sobol, 2001; Zadeh et al., 2017), the knob tuning algorithm based on improved TD3 (Fujimoto et al., 2018; Dankwa & Zheng, 2019) and the benchmarking platform oriented to multi-model databases. Among them, the multi-round sensitivity analysis method is used to select the configuration parameters that have a significant impact on the metrics. The tuning algorithm based on improved TD3 is used to recommend knob settings for multi-model databases. It interacts with the real environment of the multi-model database, so tuning requires no prior preparation of training samples. Moreover, the trial-and-error strategy makes the interaction samples more diverse and increases the chance of finding the optimal configuration. The benchmarking platform includes various workloads and collects performance metrics. In addition, it introduces Prometheus (Prometheus Team, 2022) to collect the performance metrics of the multi-model database, so that it not only accurately evaluates the throughput of the multi-model database, but also monitors the resource utilization of the system in real time.
The contributions of our work are summarized as follows:
(1) An extended Sobol method is used to carry out multi-round sensitivity analysis on the tunable knobs and extract the key parameters, which reduces the size of the network search space and effectively avoids overfitting.
(2) An improved TD3 algorithm is proposed, and its effectiveness is verified on the proposed benchmarking platform for different tuning tasks, operating environments and tuning objects. The experimental results show that MMDTune can recommend optimal configuration schemes under different scenarios. Moreover, compared with the existing database tuning methods OtterTune (Van Aken et al., 2017) and CDBTune (Zhang et al., 2019), MMDTune further improves performance, namely it yields higher throughput and lower resource utilization.
The rest of this paper is organized as follows. Section 2 introduces the related work. Section 3 introduces the overall framework and components of MMDTune. Section 4 describes the details and results of the experiments. Finally, we conclude the paper and discuss future work.

Related work
There are two classes of representative studies in DBMS configuration tuning. As shown in Table 1, they are the search-based approaches and the learning-based approaches.

Search-based approaches
IBM DB2 has released a self-tuning memory manager (Storm et al., 2006; Tian et al., 2003) that combines runtime simulation modeling with cost-benefit analysis to efficiently allocate memory to the DBMS's internal components using a heuristic approach. BestConfig (Zhu et al., 2017) divides the high-dimensional knob space into several subspaces, and uses heuristic methods to search for the optimal configuration from the history records, so as to recommend an optimal configuration under limited resources. Tran et al. (2008) use linear and quadratic regression models to conduct buffer adjustment, which can optimize buffer partitions, ensure fair buffer recovery, and dynamically adjust allocation when workloads change. Wei et al. (2014) propose a performance tuning framework that can generate rules and use those rules for tuning. Similarly, D-Tunes (PN et al., 2013) provides a tuning solution for distributed database storage that uses an analysis model to capture the relationship between workloads and database performance and introduces self-tuning algorithms to accommodate workload changes over a short time horizon.
Table 1
Category | References | Description
Search-based approaches | 13, 14, 15, 16, 17, 18 | Using rules or heuristics to search for the best database parameter configuration.
Learning-based approaches | 11, 12, 19, 20, 21, 31, 32 | Learning the mappings between parameter combinations and target data to recommend the configuration for optimal database performance.
The search-based approaches above have some limitations: when there are many parameters and the state space is large, the tuning process needs to test a very large number of samples, which is very inefficient, and the end result may fall into a local optimum. Duan et al. (2009) introduce iTuned, the first tool for database knob tuning using predefined experiments. iTuned uses statistical methods to find the most influential knobs, and establishes a Gaussian process response surface model for automatic configuration adjustment. Researchers from Carnegie Mellon University developed the automatic parameter adjustment tool OtterTune (Van Aken et al., 2017). It builds an ML (Machine Learning) model by maintaining a knowledge base accumulated in previous tuning processes, and recommends knob settings by capturing the response of the database system to different parameter settings. Basu et al. (2016) propose a learning method to adjust database performance. It learns the cost model through reinforcement learning, and models the execution of queries and updates as a Markov decision process. Its state is the database configuration, its action is a configuration change, and its return is a function of the configuration change cost and the evaluation of queries and updates. However, this method has only been shown to be feasible in index tuning, and whether it is suitable for other aspects of database configuration needs further study. Zhang et al. (2019) design CDBTune, an end-to-end automatic tuning system for cloud databases using deep reinforcement learning. It uses the DDPG algorithm (Wu et al., 2018; Fekry et al., 2020) to find the optimal configuration for cloud databases in a high-dimensional continuous space. Then, Li et al. (2019) propose QTune, a database tuning system in the query dimension.
Similarly, the system combines reinforcement learning with neural networks, and adds a predictor on top of DDPG to predict the changes of external metrics before and after query processing, which demonstrates the effectiveness of the model. Zhang et al. (2021) release an improved version, CDBTune+. Compared with the original one, a major improvement is the use of prioritized experience replay (Schaul et al., 2015) in the tuning process, which speeds up the convergence of model training and greatly improves the efficiency of tuning. However, the DDPG algorithm suffers from overestimation during training. Van Aken et al. (2021) conduct a comprehensive evaluation of ML-based DBMS knob tuning methods in an enterprise database application, and verify that GPR, DNN and DDPG can be effectively applied to knob tuning while analyzing their differences in this scenario. The validity of the learning-based algorithms for databases is thereby tested. However, the applicability and effectiveness of the above methods in multi-model databases have not been verified.

Learning-based approaches
Research on automatic database tuning has achieved some results, and the quality of tuning is gradually improving. However, due to the complexity and diversity of multi-model databases, the related research is not abundant. In addition, the methods based on ML do not perform well in high-dimensional continuous spaces. Although the methods based on deep learning have a certain ability to understand and recommend configurations, they are one-sided and inefficient to some extent.

Framework of MMDTune
To tune automatically once the tuning target is determined, we propose a knob tuning solution named MMDTune for multi-model databases. Figure 1 shows the overall framework, which consists of three parts: the multi-round sensitivity analysis method, the parameter tuning algorithm and the benchmarking platform. Among them, the multi-round sensitivity analysis method explores important parameters related to the tuning task through the iterative Sobol method. The tuning algorithm based on improved TD3 is used to recommend the optimal configuration. The benchmarking platform is used to generate workloads and collect performance metrics from the multi-model database system. Throughout the tuning process, the benchmarking platform provides metrics data for both the multi-round sensitivity analysis and the knob tuning algorithm. The first step of program execution is to receive the specified tuning task, including the workload and the performance metrics to be optimized. Next, a corresponding file repository is generated to store the metrics collected by the benchmarking platform and the network parameters of the tuning algorithm. In the initial stage of automatic tuning, the multi-round sensitivity analysis calculates the sensitivity of each parameter for the specific tuning task to reduce the size of the search space. Once the key parameters are identified, the tuning algorithm begins to build policy and value networks based on deep reinforcement learning to explore the optimal configuration. The benchmarking platform again plays an important role at this stage. It serves as middleware: one end connects to the running environment, and the other end connects to the agent in the algorithm and provides it with external state data. For different multi-model databases, we can extend the interfaces in the benchmark to guide experiments and recommend the best parameters.

Identifying important parameters
Parameters have a significant impact on overall performance (Lu & Holubová, 2019). Trying to adjust parameters that have no effect is not only a waste of resources, but can also lead to over-fitted results. Therefore, in the initial stage of tuning, parameters that are positively or negatively correlated with performance should first be found, and only these parameters should be tuned and trained to achieve better learning results and higher efficiency. MMDTune uses sensitivity analysis to investigate how the variation in execution cost (for example, execution time) of a multi-model database is attributable to different configuration parameters. The relationship between them is not simply linear. There are also dependencies between some of the multi-model database's knobs, so changing one parameter may affect another. For example, in OrientDB, the maximum heap space and the disk cache are tunable parameters. In theory, increasing the heap space and disk cache will improve the performance of the multi-model database, but if their sum is too high, it will cause a huge slowdown. Based on the above two points, MMDTune uses a global sensitivity analysis method to select important parameters.
Traditional sensitivity analysis typically involves running intensive off-line benchmarks with many different configuration values and constructing a set of influential parameters by analyzing the performance differences caused by each configuration parameter. Applying this approach directly is expensive: it takes hundreds of executions, which can be quite time consuming. Therefore, MMDTune uses an iterative Sobol method to find key parameters that have a significant impact on performance metrics. In each iteration, the approach first uses the Monte Carlo method to sample in the parameter space of the multi-model database. It then combines the sampled values into configurations and applies each of them separately to the multi-model database. The benchmarking platform is then used to execute specific workloads, and the resulting measurements are used to calculate the corresponding sensitivity metrics. The Sobol method decomposes the variance of the model output into the sum of the variances contributed by individual parameters and by combinations of parameters. Therefore, for a configuration parameter $p_i$ in a multi-model database, its first-order sensitivity is expressed as the ratio of the variance attributable to that parameter to the total variance; the calculation is shown in (1).
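The first-order index described above can be illustrated with a small Monte Carlo sketch. It assumes the standard Sobol definition (variance of the conditional expectation of the output given $p_i$, divided by the total output variance) and uses a common pick-and-freeze estimator; this is not necessarily the exact estimator used by MMDTune. The function `model` is a hypothetical stand-in for "apply a configuration, run the workload on the benchmarking platform, and return the measured metric", and the parameter bounds are illustrative.

```python
import random
from statistics import pvariance


def first_order_sobol(model, bounds, n_samples=2000, seed=0):
    """Estimate first-order Sobol indices with a pick-and-freeze estimator.

    model  -- hypothetical callable: list of knob values -> measured metric
    bounds -- list of (low, high) ranges, one per knob (assumed uniform)
    """
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Two independent sample matrices A and B.
    a_rows = [draw() for _ in range(n_samples)]
    b_rows = [draw() for _ in range(n_samples)]
    y_a = [model(row) for row in a_rows]
    y_b = [model(row) for row in b_rows]
    total_var = pvariance(y_a + y_b)

    indices = []
    for i in range(len(bounds)):
        acc = 0.0
        for a, b, ya, yb in zip(a_rows, b_rows, y_a, y_b):
            # Row of A with only knob i taken from B.
            ab = a[:i] + [b[i]] + a[i + 1:]
            acc += yb * (model(ab) - ya)
        indices.append(acc / n_samples / total_var)
    return indices


if __name__ == "__main__":
    # Toy metric: knob 0 matters far more than knob 1.
    print(first_order_sobol(lambda x: 4 * x[0] + x[1], [(0, 1), (0, 1)]))
```

In a real tuning run each call to `model` is a full benchmark execution, so the sample count directly determines the cost of a sensitivity round.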
The relationship between database parameters can be obtained by calculating higher-order sensitivities, as shown in (2), where $S_{p_1,p_2,\ldots,p_k}$ is the $k$-order sensitivity. In a multi-model database, besides single configuration parameters that affect performance, interactions between parameters mostly take the form of two parameters working together to change database performance.
Therefore, MMDTune focuses on the first-order and second-order sensitivity indicators of each parameter. In each round, the first-order sensitivities of the parameters are sorted from high to low. When the sensitivity drops significantly from the top-$k$ to the top-$(k+1)$ parameter (i.e., $S_{\text{top-}k} - S_{\text{top-}(k+1)} > S_{\text{top-}(k+1)}$), we choose the parameters with high sensitivity as the key parameters. In addition, when the second-order sensitivity between two parameters is greater than 0.5, they are also included in the selection range.
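The selection rule above can be sketched as follows. The sketch makes two assumptions that the text does not state: the cut is placed at the first rank where the drop condition holds, and all parameters are kept if no such drop exists. The knob names in the usage example are hypothetical.

```python
def select_key_parameters(first_order, second_order, pair_threshold=0.5):
    """Pick key knobs from one round of sensitivity results.

    first_order  -- dict: knob name -> first-order sensitivity
    second_order -- dict: (knob, knob) -> second-order sensitivity
    """
    ranked = sorted(first_order, key=first_order.get, reverse=True)

    # Assumption: keep everything if no sharp drop is found.
    selected = list(ranked)
    for k in range(1, len(ranked)):
        s_top_k = first_order[ranked[k - 1]]
        s_next = first_order[ranked[k]]
        # Drop condition from the text: S_top-k - S_top-(k+1) > S_top-(k+1).
        if s_top_k - s_next > s_next:
            selected = ranked[:k]
            break

    # Knobs that interact strongly are added even if their own index is low.
    for (p, q), value in second_order.items():
        if value > pair_threshold:
            for name in (p, q):
                if name not in selected:
                    selected.append(name)
    return selected


if __name__ == "__main__":
    first = {"knob_a": 0.5, "knob_b": 0.4, "knob_c": 0.1, "knob_d": 0.05}
    second = {("knob_a", "knob_d"): 0.7, ("knob_b", "knob_c"): 0.2}
    print(select_key_parameters(first, second))
```

Here `knob_d` is selected only because of its strong interaction with `knob_a`, which mirrors how low first-order knobs can still enter the selection range through a high second-order sensitivity.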

Multi-model database parameter tuning algorithm based on deep reinforcement learning
To simulate the trial-and-error method that DBAs adopt and to overcome the shortcomings of regression, we introduce reinforcement learning, which originates from trial-and-error learning in animal psychology and is a key technology for solving the NP-hard problem of database tuning in continuous space (Zhang et al., 2019). Therefore, it is a reasonable choice to use deep reinforcement learning to find suitable knobs for multi-model databases.
After determining the parameters to be adjusted, MMDTune adjusts the value of each parameter based on the idea of the TD3 algorithm. As shown in Fig. 2, the algorithm is mainly composed of the environment and the agent. The detailed Actor-Critic network and parameters of TD3 are shown in Table 2.
Among them, the environment is a multi-model database cluster, which constantly interacts with the agent to provide a quantifiable internal state and network training data. The agent is composed of an Actor and a Critic, which are two independent deep neural networks. The task of the Actor network is to map the observed internal state of the multi-model database to a set of parameters that maximizes the cumulative reward; that is, it takes the internal state of the multi-model database as input and outputs a vector of parameter values. The Critic network takes the internal state and configuration parameters of the multi-model database as input and outputs a Q value that reflects whether the action output by the Actor is valid. In the initial stage, the agent is a model without knowledge, and it learns through a series of fine-tuning actions. As it gains experience with configuration parameters and performance, it will recommend the optimal configuration parameters for a multi-model database. The detailed description of each part is as follows:
1) State space $S$: The internal state obtained when the multi-model database cluster finishes executing the workload is $s$, that is, counter information. For example, in OrientDB, counter information includes global information at the server level and count information at the session level.
2) Action space $A$: Suppose that, after multi-round sensitivity analysis, the selected set of parameters is $P = \{p_1, p_2, \ldots, p_m\}$, where $m$ is the number of key parameters. Then action $a$ is a set of values for these parameters, expressed as $\{c_1, c_2, \ldots, c_m\}$, where $c_i$ is the value of parameter $p_i$.
3) Reward function $R$: For parameter tuning problems in a multi-model database, the reward reflects the performance change before and after applying the new knob configuration recommended by MMDTune. So the reward needs to consider three aspects: a) it can provide valuable feedback on the performance of the system; b) it can provide an accurate evaluation of knob tuning with the maximum probability for the RL network; c) multiple system performance metrics can be dynamically assigned different weights to indicate their relative importance. Formally, the performance index of the database is expressed as $M = \{m_1, m_2, m_3\}$, where $m_1$, $m_2$ and $m_3$ correspond to throughput, CPU utilization and memory utilization respectively. It is assumed that the measured index values at time $t$ are $Y_t = \{y_{t,1}, y_{t,2}, y_{t,3}\}$. Here, $y_{t,i}$ corresponds to the value of $m_i$. In particular, $y_{0,i}$ is the index value under the default configuration. Since in a multi-model database environment the system must pay some cost to execute a given workload, $y_{t,i}$ is always positive. To make positive and negative rewards distinguish good actions from bad ones, the specific calculation process of rewards is as follows.
The essence of the optimization problem is to find the configuration parameters that make the throughput of the multi-model database as high as possible and the resource utilization as low as possible. Firstly, we calculate the change of each performance metric relative to the initial time and to the previous time. The external metrics at the initial time are $y_{0,i}$, and the external metrics at the previous time are $y_{t-1,i}$, so the performance difference between the current moment and the initial time, and between the current moment and the previous moment, is calculated according to (3) and (4) respectively.
We combine these two index differences into (5) to get the reward of $m_i$.
There are three cases. Firstly, the reward is negative if the knobs currently recommended by the tuning system yield worse performance than the default knobs. Secondly, the reward is positive if the currently recommended knobs yield better performance than all the knobs previously recommended. Finally, if the currently recommended knobs are better than the default, but not as good as the historically optimal knobs, the reward is 0.
Note that different tuning tasks may choose different tuning metrics (in type or quantity); for example, throughput and latency can be tuned at the same time. Therefore, the tuning system in this paper assigns a weight coefficient $w_i$ to each tuning indicator to indicate the direction of tuning preference, so that the tuning system can tune multiple indicators simultaneously. Then, the final total reward can be expressed as (6). As long as the optimization goal stays the same (for example, throughput), the reward function does not need to change, because it is independent of changes in the hardware environment and workload and depends only on the optimization goal. Therefore, the reward function needs to be redesigned only when the optimization goal changes. Algorithm 1 describes the specific flow of the tuning algorithm. To find the optimal strategy, we start with an arbitrary strategy $\mu$. Before the iteration, the initial state of the multi-model database is needed, which is the internal state and external metrics of the multi-model database after the workload is executed in the default configuration. Unlike reinforcement learning in general, the multi-model database is configured so that its transition from one state to another is deterministic. Therefore, there is no need to re-measure at the beginning of each tuning cycle.
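The reward computation described above (per-metric differences against the default and the previous measurement, three sign cases, then a weighted total) can be sketched as below. The exact forms of (3) to (6) are assumptions here: relative differences are used for (3) and (4), the per-metric reward borrows the CDBTune-style quadratic form, the zero case is tested against the previous measurement rather than a tracked historical best, and the total is taken as a weighted sum. The `higher_is_better` flag is a hypothetical parameter introduced so that throughput (maximize) and utilization (minimize) share one function.

```python
def relative_change(current, reference, higher_is_better=True):
    """Relative improvement of `current` over `reference` (assumed form)."""
    delta = (current - reference) / reference
    return delta if higher_is_better else -delta


def metric_reward(y_t, y_prev, y_0, higher_is_better=True):
    """Reward for one metric m_i at time t.

    y_t    -- current measurement
    y_prev -- measurement at the previous step
    y_0    -- measurement under the default configuration
    """
    d_init = relative_change(y_t, y_0, higher_is_better)
    d_prev = relative_change(y_t, y_prev, higher_is_better)
    if d_init > 0:
        if d_prev <= 0:
            # Better than default but no better than before: zero reward.
            return 0.0
        return ((1 + d_init) ** 2 - 1) * abs(1 + d_prev)
    # Worse than (or equal to) the default: non-positive reward.
    return -((1 - d_init) ** 2 - 1) * abs(1 - d_prev)


def total_reward(rewards, weights):
    """Weighted combination of per-metric rewards (assumed weighted sum)."""
    return sum(w * r for w, r in zip(weights, rewards))


if __name__ == "__main__":
    r_tps = metric_reward(120.0, 110.0, 100.0)                          # throughput
    r_cpu = metric_reward(40.0, 45.0, 50.0, higher_is_better=False)     # CPU %
    print(total_reward([r_tps, r_cpu], [0.7, 0.3]))
```

Keeping the direction of improvement in one flag means the weights $w_i$ only express preference, not sign.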
Algorithm 1 Parameter tuning algorithm based on TD3.
In addition, sampling uniformly at random from the experience replay pool leads to a low probability of obtaining useful data and therefore to some meaningless iterations. Therefore, the tuning algorithm uses prioritized experience replay to train the Actor and Critic, where the Actor updates the weights of its neural network according to the Q value and uses deterministic policy gradient iteration to calculate the optimal strategy. The Critic updates the weights of its neural network based on the reward value.
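To make the idea of prioritized sampling concrete, the following is a minimal list-based sketch of proportional prioritized experience replay in the spirit of Schaul et al. (2015). It is an illustration under assumptions, not the buffer used in MMDTune: the exponents `alpha` and `beta`, the small constant `eps`, and the class and method names are all illustrative, and a production version would typically use a sum-tree instead of recomputing probabilities on every sample.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (simple O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=None):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.eps = eps          # keeps every priority strictly positive
        self.transitions = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current maximum priority.
        priority = max(self.priorities, default=1.0)
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.transitions)
        idxs = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the batch maximum.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.transitions[i] for i in idxs], weights

    def update(self, idxs, td_errors):
        # Priority follows the magnitude of the latest TD error.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps


if __name__ == "__main__":
    buf = PrioritizedReplayBuffer(capacity=100, seed=0)
    for step in range(10):
        buf.add(("state", "action", float(step), "next_state"))
    idxs, batch, weights = buf.sample(4)
    buf.update(idxs, [0.5, 0.1, 2.0, 0.0])
```

Transitions with large TD errors are replayed more often, which is the mechanism the text relies on to avoid uninformative iterations.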
The traditional TD3 algorithm always targets the minimum between two estimates when updating the Critic network. This update rule does not introduce any additional overestimation risk as traditional Q-Learning does, but it can also lead to underestimation bias. While underestimation does not spread during the learning process, it can have some negative performance effects. Therefore, in order to reduce overestimation while minimizing the negative effects of underestimation, the tuning algorithm uses a positive parameter α (α < 1) to mix the minimum and maximum output of the two Critic target networks to update the target, rather than just using the minimum Q value.
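The mixed update target can be sketched as a small function. The text does not say which of the two estimates $\alpha$ multiplies, so the sketch assumes $\alpha$ weights the minimum and $(1 - \alpha)$ the maximum; the discount factor `gamma`, the `done` flag and the default values are illustrative assumptions as well.

```python
def mixed_td_target(reward, q1_next, q2_next, alpha=0.75, gamma=0.99, done=False):
    """Critic update target mixing min and max of the two target critics.

    Assumption: alpha weights the minimum, (1 - alpha) the maximum.
    With alpha = 1 this reduces to the standard TD3 clipped target.
    """
    q_min = min(q1_next, q2_next)
    q_max = max(q1_next, q2_next)
    mixed = alpha * q_min + (1.0 - alpha) * q_max
    return reward + (0.0 if done else gamma * mixed)


if __name__ == "__main__":
    print(mixed_td_target(1.0, 2.0, 4.0, alpha=0.75, gamma=0.5))
```

A value of $\alpha$ close to 1 stays near the conservative TD3 target, while smaller values move the target toward the larger estimate and so trade underestimation for some overestimation risk.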

Benchmarking platform
To meet the requirements of performance monitoring in the process of multi-model database tuning, we propose a benchmarking platform for multi-model databases and integrate it into MMDTune. As shown in Fig. 3, it adopts a multi-layer structure, which is mainly divided into five parts: infrastructure layer, data storage layer, message transmission layer, workload implementation layer and metrics collection layer.
For infrastructure layer, in essence, it is a computer cluster or cloud computing environment, which provides hardware foundation or virtual machine running environment for multi-model database.
Data storage layer consists of various NoSQL stores. Two well-known multi-model databases, ArangoDB and OrientDB, have been integrated with MMDTune.
The message transmission layer is the core that allows different databases to be benchmarked fairly. To simulate the real situation of streaming data transmission, the messaging mechanism used is Apache Kafka (Dunning & Friedman, 2016). MMDTune uses Kafka to interact with a variety of multi-model databases for a variety of workload operations. When a user needs to add a new multi-model database to the platform, the approach is to implement the corresponding services according to the standard interface.
In the workload implementation layer, the most important aspect is to evaluate a multi-model database as comprehensively as possible. More specifically, it provides four parameters for generating workloads: multi-model database operations, how data requests are distributed, number of threads, and number of operations, enabling dynamic generation of variable workloads. As shown in Table 3, the platform implements the following multi-model database operations for generating workloads:
1. Inserting operation. It writes data of different models to the database, including documents, graphs, and key/values.
2. Joining query across models. A defining feature of a multi-model database is that it can manage multiple data models at the same time, so the join query is its most important function. Querying in a single statement by joining different data models (for example, joining data from JSON documents and a sub-graph) realizes its cross-model characteristics.
3. Shortest path query. Both ArangoDB and OrientDB provide a shortest path query statement that can directly retrieve all shortest paths between two nodes.
4. Aggregating query. This operation aggregates information from multiple records using the aggregation functions unique to the multi-model database.
5. Updating/deleting records. The platform also implements further workloads, such as updating documents and deleting records.
In different application scenarios, data access always follows a certain distribution. For example, on news sites, the most recently published news is more likely to be searched and visited. On platforms like MicroBlog, the higher the traffic to an item, the more likely it is to be retrieved, regardless of freshness. Therefore, to make the simulated workloads realistic, different data request distributions are introduced in the design of workloads, including Zipfian, Poisson, Uniform, and Latest. Each distribution determines which records to retrieve or which data to insert into the database. In particular, under Zipfian and Poisson, data are selected according to Zipf's law and the Poisson distribution respectively. Uniform means reading data with equal probability. Finally, in the Latest distribution, the probability of data being accessed is closely related to the order in which it is inserted; that is, the most recently inserted record becomes the most popular, while previously popular data becomes less popular.
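Three of the request distributions above can be sketched as record-index choosers. This is a simplified illustration, not the platform's generator: records are assumed to be identified by their insertion order (index 0 is the oldest), the Zipfian exponent `theta` is an illustrative value, and the Poisson variant is omitted because the text does not specify how its samples map to records.

```python
import random
from itertools import accumulate


def make_key_chooser(distribution, n_records, theta=0.99, seed=None):
    """Return a function that picks a record index in [0, n_records).

    distribution -- "uniform", "zipfian" or "latest"
    """
    rng = random.Random(seed)
    if distribution == "uniform":
        return lambda: rng.randrange(n_records)

    # Zipf-like weights: the record at rank r gets weight 1 / r**theta.
    cum_weights = list(
        accumulate(1.0 / (rank ** theta) for rank in range(1, n_records + 1))
    )

    def zipf_rank():
        return rng.choices(range(n_records), cum_weights=cum_weights, k=1)[0]

    if distribution == "zipfian":
        # Popularity is fixed per record, independent of freshness.
        return zipf_rank
    if distribution == "latest":
        # The same skew, but anchored at the most recently inserted record.
        return lambda: n_records - 1 - zipf_rank()
    raise ValueError(f"unknown distribution: {distribution}")


if __name__ == "__main__":
    choose = make_key_chooser("latest", n_records=1000, seed=42)
    print([choose() for _ in range(5)])
```

Expressing Latest as a mirrored Zipfian keeps the skew comparable between the two workloads, so differences in measured performance come from which records are hot rather than how hot they are.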
In the metrics collection layer, Prometheus is selected as the fine-grained performance metrics mechanism. It allows us to obtain the resource consumption of a machine over a specific period of time through a simple expression. It uses carefully designed data structures and algorithms to achieve very low per-node overhead and high concurrency, so that it has little impact on the machine. Therefore, this study uses a series of functions provided by Prometheus to obtain the desired measurements indirectly. Table 4 lists the corresponding calculations for CPU and memory.
Existing benchmarking tools, such as YCSB (Cooper et al., 2010;Matallah et al., 2017), provide throughput and other metrics. Throughput reflects the number of operations processed by the database system in a fixed amount of time. This performance metric is also added to the platform. Table 4 also lists the calculation methods of throughput.
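As an illustration of how such measurements might be requested, the sketch below builds instant-query URLs for the Prometheus HTTP API and computes throughput from an operation count. The PromQL expressions are assumptions: they rely on host metrics exported by node_exporter and may differ from the calculations listed in Table 4. The base URL and helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed PromQL expressions based on node_exporter metric names.
CPU_UTILIZATION = (
    '100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m])))'
)
MEMORY_UTILIZATION = (
    "100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)"
)


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus instant-query endpoint /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def throughput(completed_operations, elapsed_seconds):
    """Operations processed per second over one benchmark run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed_operations / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical Prometheus address; no request is sent here.
    print(instant_query_url("http://localhost:9090", CPU_UTILIZATION))
    print(throughput(100000, 50.0))
```

The URL can then be fetched with any HTTP client; the response is JSON whose `data.result` field holds the sampled values.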

Experiments and result analysis
To verify the effectiveness and adaptability of MMDTune, we take OrientDB as the specific research object and apply it to different experimental scenarios to carry out tuning experiments. MMDTune is similarly and easily applied to other multi-model databases. The performance changes of OrientDB are tested by setting different workloads, tunable parameters, optimization objectives and operating environments. Then, it is compared with the existing works, and the tuning effect of MMDTune is investigated through various experiments. Finally, the method is extended to ArangoDB for experiments to verify that the method can be effectively applied to other multi-model database tuning objects.

Experimental environment
The experimental environment consists of four Alibaba Cloud servers, one of which is the client node, while the other three are used to build the OrientDB cluster. Their hardware and software versions and configurations are identical, as shown in Table 5.
Fig. 4 An example of a multi-model dataset

Experimental dataset
To evaluate the performance of multi-model databases, we need to generate and use large-scale multi-model data. MMDTune uses seed datasets from Unibench (Zhang et al., 2018) to generate large-scale multi-model data. The dataset simulates a scenario combining a social network with e-commerce, and contains four data models (key-value, document, graph, relational) that are supported by OrientDB and ArangoDB. Figure 4 shows an example of each entity and the relationships between them. The customer is the core of this dataset, and most other entities are related to it. For example, the relationships between customers form a social network, and the publishing relationships between customers and posts form another network. Orders are document-type data, each containing an embedded array of order line records. The product information in the order record together with the customer information forms the key of the feedback data, which indicates the customer's rating of the purchased product.

Important parameter identification experiment
This section starts from different tuning goals and uses multiple rounds of sensitivity analysis to identify critical parameters. Here, without loss of generality, we take the multi-model database operation Q1 as an example (see Table 3). In OrientDB, some knobs do not need to be considered, including those that are obviously not directly related to performance (such as pathnames) and those that are not allowed to be tuned (because changing them can cause serious problems), so the experiment ends up with 187 adjustable sorted knobs. For each tuning target, 3 rounds of sensitivity analysis are performed separately.

Identification of important parameters related to throughput
Throughput is the number of operations the database can handle per second. For the throughput task, the parameters shown below are finally selected, and Fig. 5 lists the first-order sensitivities of these parameters.
As shown in Fig. 5, the first-order sensitivities of "query.parallelMinimumRecords" and "query.parallelResultQueueSize" are not high, but their second-order sensitivities with "query.parallelAuto" are 0.726354 and 0.702862 respectively, so they are also included in the candidate set. In addition, among these parameters, the database connection pool size and the number of concurrent sessions have a greater impact on throughput. This is because OrientDB needs to establish a communication session between the client and the server through a remote connection when executing the workload. At the same time, the workload in the experiment consists of on the order of 100000 multi-model database operations executed by multiple threads. Increasing the number of concurrent sessions and the size of the connection pool allows multiple database connections to be established at the same time, thereby reducing the execution time. However, creating too many sessions can also stress the system and slow down operations.
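The first-order sensitivities discussed above can be illustrated with a variance-based (Sobol-style) estimator. The following is only a minimal sketch: this section does not state which estimator MMDTune uses, so the pick-freeze estimator, the function names and the toy response function are all assumptions made for illustration, and second-order indices are not covered.

```python
import numpy as np

def first_order_sobol(f, dim, n=20000, seed=0):
    """Estimate first-order Sobol indices of f on the unit hypercube."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, dim))          # two independent sample matrices
    b = rng.random((n, dim))
    fa, fb = f(a), f(b)
    total_var = np.var(np.concatenate([fa, fb]))
    indices = np.empty(dim)
    for i in range(dim):
        ab = a.copy()
        ab[:, i] = b[:, i]            # matrix A with column i taken from B
        # Pick-freeze estimate of the variance explained by input i alone
        indices[i] = np.mean(fb * (f(ab) - fa)) / total_var
    return indices

def toy_throughput(x):
    """Hypothetical stand-in for a benchmark run: knob 0 dominates."""
    return x[:, 0] + 0.1 * x[:, 1]

sensitivities = first_order_sobol(toy_throughput, dim=2)
print(sensitivities)
```

In a real tuning run, each evaluation of `f` would correspond to executing the workload under one normalized knob configuration, which is far more expensive than the toy function used here.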

Identification of important parameters related to memory
Memory tuning is the process of determining optimal cache parameter values for the multi-model database OrientDB. For example, OrientDB with a large working set benefits from a large cache; a small cache incurs excess cost as data is swapped in and out. Conversely, OrientDB with a small working set benefits from a small cache; a large cache is a waste of resources due to the high cost of each fetch and unnecessary static power. Therefore, in this experiment, with memory as the tuning task, the important parameters shown in Fig. 6 are selected. As can be seen from Fig. 6, there are some settings in OrientDB that allow it to run on systems with limited resources, and they are mainly concentrated in two areas: cache and log. Among them, "storage.diskCache.bufferSize" has the highest first-order sensitivity of all parameters, which means that it plays an important role in OrientDB's memory tuning.
Fig. 6 Multi-round sensitivity related to memory

Identification of important parameters related to the CPU
When the number of threads set in the workload is too high, OrientDB may take up to 100% of the CPU, which puts a serious burden on the system. If other programs are running, multiple applications will compete for the CPU, resulting in extremely slow operation or even crashes. Therefore, for applications with a limited operating environment, it is very important to reduce the CPU utilization of the storage system. The user cannot directly reduce the CPU usage of OrientDB by modifying the configuration parameters, but can reduce the number of threads running in parallel. This is also verified by the results of multiple rounds of sensitivity analysis. Table 6 lists the configuration parameters related to the CPU.
In the experiment, due to the small variance of the measured CPU utilization, the importance of each parameter is high. In theory, when "environment.concurrent" is set to false, the database turns off its internal lock management, so that the multi-model database OrientDB executes in a single-threaded environment, which can reduce the CPU utilization of the system. The parameter "distributed.dbWorkerThreads" has an important impact on both the throughput and the CPU utilization of OrientDB. Analysing it, one finds that the two tuning goals pull this parameter in opposite directions. Increasing the number of parallel worker threads inevitably increases the CPU utilization of the database and thus shortens the execution time. Conversely, decreasing it reduces the CPU used by the system and prolongs the running time. Therefore, one cannot meet the demands of increasing throughput and reducing resource cost at the same time.

Tuning experiment and analysis
To verify that MMDTune can adapt to different tuning tasks, we use the workloads shown in Table 7. The five workloads involve typical multi-model database operations, including single-model read-only queries, cross-model join queries, and a combination of read-write operations. Together they test the tuning effect under different workloads and reduce the chance that the tuning results are accidental.
To verify the effectiveness of the tuning algorithm and the multi-round sensitivity analysis method, the experiment selects the tunable parameters with the random method and with the multi-round sensitivity analysis method respectively, and performs tuning on different workloads with throughput as the tuning objective. At the same time, as the reinforcement learning algorithm explores the parameter space, random noise is added to the output of the model to enhance its exploration ability and make it more likely to find the optimal configuration. To obtain the correlation between the number of tuning steps and the tuning results, we observe the performance of the system in increments of five tuning steps. The experimental results are shown in Fig. 7. Figure 7(a)∼(e) respectively show the best performance of the different workloads after different numbers of tuning steps, where the horizontal coordinate represents the number of tuning steps and the vertical coordinate represents the maximum throughput after tuning with MMDTune. A step count of 0 corresponds to the performance of OrientDB under the default configuration. Curve If denotes using the multi-round sensitivity analysis method to select the adjustable parameters, and Curve RC denotes adjusting randomly selected knobs.
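The exploration noise mentioned above can be sketched as follows. The source only states that random noise is added to the model output; the Gaussian distribution, the normalization of knob values to [0, 1], and the names `explore` and `policy_output` are assumptions made for this illustration.

```python
import numpy as np

def explore(action, sigma, rng):
    """Perturb a normalized knob vector with Gaussian noise (assumed form)."""
    noise = rng.normal(0.0, sigma, size=action.shape)
    # Keep every knob inside its normalized range after perturbation
    return np.clip(action + noise, 0.0, 1.0)

rng = np.random.default_rng(42)
policy_output = np.array([0.20, 0.95, 0.50])   # hypothetical actor output
candidate = explore(policy_output, sigma=0.1, rng=rng)
print(candidate)
```

A larger `sigma` widens the search around the configuration proposed by the policy, while `sigma=0` falls back to pure exploitation.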
Firstly, for all workloads, OrientDB performs better than under the default configuration after five tuning steps, indicating that the tuning algorithm can learn from past experience and achieve high efficiency. As the number of tuning steps increases, the agent can fine-tune its actions to gradually adapt to the current workload, thereby continuously improving the system throughput. Secondly, if users accept longer tuning times, they obtain better configurations and higher performance. However, as the number of tuning steps continues to increase, the throughput gain does not keep growing but tends to stabilize. This is because the prioritized experience replay method is used in the algorithm, which leads to fast convergence. Thirdly, comparing the If and RC curves in Fig. 7, the two parameter selection strategies show the same performance trend and both improve performance, but the multi-round sensitivity analysis method achieves a much higher performance gain than the random method. For example, in Fig. 7(a), the throughput of OrientDB increases by 30% when the tunable parameters are selected using multi-round sensitivity analysis, but by only 11.42% when randomly selected knobs are adjusted. This is because the former extracts the parameters that are highly correlated with performance, which helps the neural network learn very effectively. In contrast, randomly selected parameters may have little effect on the performance of OrientDB, or may interact with other parameters in complex ways that affect database performance. As a result, they can lead to longer learning times and less noticeable performance improvement. These results verify the validity of multi-round sensitivity analysis: it does select the parameters related to performance.
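To illustrate why prioritized experience replay can speed up convergence, the sketch below samples stored transitions with probability proportional to a power of their absolute TD error. This is the commonly used proportional form, assumed here for illustration; the section does not give MMDTune's exact priority definition, and the parameters `alpha` and `eps` are hypothetical defaults. Importance-sampling corrections are omitted.

```python
import numpy as np

def sample_prioritized(td_errors, batch_size, alpha=0.6, eps=1e-6, rng=None):
    """Sample transition indices with probability proportional to |TD error|^alpha."""
    if rng is None:
        rng = np.random.default_rng()
    # eps keeps transitions with zero error from never being replayed
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idx = rng.choice(len(td_errors), size=batch_size, p=probs)
    return idx, probs

td_errors = np.array([0.1, 2.0, 0.5, 0.05])    # toy values, not measured data
idx, probs = sample_prioritized(td_errors, batch_size=3,
                                rng=np.random.default_rng(0))
print(idx, probs)
```

Transitions on which the critic is most wrong are replayed more often, so informative tuning steps influence the networks sooner than under uniform sampling.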
Assume that the application system runs on a server with a small number of cores and limited memory. Users will then want to limit the resource utilization of the database system. By default, OrientDB tries to use as much memory as possible, which can easily cause the database to fail or even crash. Here, W3 is used as the tuning workload. It includes both read-write operations and join queries of the multi-model database, which avoids tuning results that hold only for a special case. Figure 8 describes the changes in resource utilization after different numbers of training steps. Similar to the throughput task, the reduction in resource utilization tends to stabilize as the number of tuning steps increases. Compared to the database under the standard configuration, the memory utilization of OrientDB decreases by 36.88%, while the CPU utilization is reduced by two-thirds. Thus MMDTune can effectively optimize a combination of metrics. This is due to the reward function in reinforcement learning, which takes the importance of each metric into account and makes the tuning results meet the target requirements as far as possible.
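A reward that weighs several metrics could look like the sketch below. The actual reward function of MMDTune is not given in this section, so the weighted sum of relative improvements, the function name `reward`, the metric names and all numbers are hypothetical and serve only to show how metric importance can enter the reward.

```python
def reward(current, baseline, weights):
    """Weighted sum of relative improvements over the baseline (assumed form)."""
    total = 0.0
    for name, (weight, higher_is_better) in weights.items():
        change = (current[name] - baseline[name]) / baseline[name]
        # For resource metrics, a decrease counts as an improvement
        total += weight * (change if higher_is_better else -change)
    return total

# Toy numbers for illustration only, not experimental results
baseline = {"throughput": 1000.0, "memory": 80.0, "cpu": 90.0}
tuned = {"throughput": 1200.0, "memory": 60.0, "cpu": 45.0}
weights = {
    "throughput": (0.5, True),
    "memory": (0.25, False),
    "cpu": (0.25, False),
}
r = reward(tuned, baseline, weights)
print(r)
```

Shifting weight from throughput to memory or CPU turns the same agent from a throughput tuner into a resource-utilization tuner without changing the learning algorithm.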
In a cloud environment, different users have their own database memory size and disk capacity. As models migrate to different hardware environments, the knowledge about disk size, memory size, and computing power needs to be updated. Therefore, adapting to a new hardware environment is a challenge. To verify that MMDTune can optimize the performance of the multi-model database under different hardware configurations, this experiment tunes OrientDB in different hardware environments. Table 8 shows the two operating environments used in the experiment. Table 9 lists the tuning results of OrientDB running different workloads in both hardware environments. Comparing the two configurations, instance B is better than instance A in every respect, so the database performs better on instance B. As can be seen from Table 9, MMDTune also achieves larger gains when instance B is used as the running environment. For example, for workload W3, the database throughput increases by 71.22% in environment B, but by only 65.08% in environment A. This is because the policy network has a larger configuration space to explore in environment B, and there is more room for performance improvement. In general, the experimental results verify that MMDTune can adapt to different hardware environments and recommend a better configuration.
To verify the efficiency of MMDTune, its online tuning efficiency is compared with that of OtterTune and CDBTune. Since CDBTune does not perform parameter selection, the multi-round sensitivity analysis method is used in the experiment to select its adjustable parameters. In addition, because the benchmark tools cannot generate workloads suitable for multi-model databases, our proposed multi-model database benchmark platform is integrated into the system to support the tuning of OrientDB.
By blindly reducing the cache configuration and the number of worker threads, we can limit memory consumption and reduce CPU utilization, but this sacrifices the execution speed of the system. This is not reasonable in modern application systems. Users hope that the multi-model database not only meets certain throughput requirements, but also uses as few system resources as possible. In most cases, the multi-model database system achieves faster execution only when it makes full use of the CPU, so it cannot meet the requirements of high throughput and low CPU utilization at the same time. Therefore, throughput and memory utilization are set as the tuning goals. When the rewards stabilize, tuning stops. The final experimental results are shown in Fig. 9. Among the compared methods, OtterTune uses 1000 pieces of tuning data stored during the MMDTune tuning process as training data.
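The stopping rule "when the rewards stabilize, tuning stops" can be made concrete in several ways. The sketch below is one simple possibility and not the criterion used in the paper: it treats the rewards as stable when the spread of the last few values falls below a tolerance. The function name, `window` and `tol` are hypothetical.

```python
def rewards_stabilized(rewards, window=5, tol=0.01):
    """Return True when the last `window` rewards vary by at most `tol`."""
    if len(rewards) < window:
        return False                  # not enough history to judge
    recent = rewards[-window:]
    return max(recent) - min(recent) <= tol

# Toy reward trace for illustration only
history = [0.10, 0.30, 0.50, 0.510, 0.512, 0.509, 0.511, 0.510]
print(rewards_stabilized(history))       # True: the last five values are flat
print(rewards_stabilized(history[:4]))   # False: too few tuning steps
```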
According to Fig. 9, all tuning methods achieve better performance than the default configuration, and MMDTune has the best tuning effect. For workload W1, the throughput obtained using MMDTune is 7.35% higher than that of CDBTune, and the memory utilization is reduced by 11.54%. The reasons are as follows. Firstly, CDBTune adopts the DDPG algorithm to recommend configurations, which suffers from overestimation when updating the network, resulting in cumulative error. MMDTune is based on the TD3 algorithm and effectively alleviates this problem by mixing the estimates of two Critic networks. Secondly, CDBTune relies on the policy network every time it selects parameters. However, at the initial stage of training, the policy network can only recommend parameter values according to a local optimum, which limits the exploration space of CDBTune. So the tuning effect of CDBTune is worse than that of MMDTune. But because CDBTune can effectively learn from past experience, it has better tuning capabilities than OtterTune. Of all the methods, OtterTune achieves the lowest performance gain. For the same workload W1, the database throughput is increased by 16.21% with MMDTune compared to OtterTune, and the memory utilization is reduced by 16.55%. This is because OtterTune uses Gaussian processes to map configurations. Although it can learn from history, this regression model is still too simple to explore new knowledge and refine itself compared with a neural network, so the performance gain is very limited. The deep reinforcement learning method enables the neural network to learn in the direction of optimization and recommend reasonable parameter settings for the current workload and hardware environment. Different multi-model databases have different system parameters, including different meanings, types, names and value ranges.
Fig. 9 Experimental results of different tuning methods
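The overestimation argument can be illustrated by comparing the bootstrapped targets of DDPG and standard TD3. The sketch below shows the standard TD3 rule, which takes the minimum of two critic estimates. MMDTune uses an improved TD3 that "mixes" the two estimates in a way not detailed in this section, so the code should be read as the textbook baseline rather than the authors' exact update. All numbers are toy values.

```python
import numpy as np

def ddpg_target(reward, q_next, done, gamma=0.99):
    """DDPG target: bootstraps from a single critic estimate."""
    return reward + gamma * (1.0 - done) * q_next

def td3_target(reward, q1_next, q2_next, done, gamma=0.99):
    """Standard TD3 target: bootstraps from the smaller of two critic estimates."""
    q_next = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_next

reward = np.array([1.0])
q1_next = np.array([10.0])   # hypothetical, possibly overestimated value
q2_next = np.array([8.0])
done = np.array([0.0])

print(ddpg_target(reward, q1_next, done))          # uses 10.0 directly
print(td3_target(reward, q1_next, q2_next, done))  # uses min(10.0, 8.0)
```

Because the target never exceeds what either critic predicts, an optimistic error in one critic is not propagated into later updates, which is the cumulative error the comparison with DDPG refers to.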
To verify that MMDTune can be effectively applied to different multi-model databases, ArangoDB is also used to evaluate MMDTune. To reduce the chance of accidental tuning results, throughput, memory utilization and CPU utilization are taken as optimization objectives, and W3 is used as the workload. MMDTune is compared with OtterTune and CDBTune, and the experimental results are shown in Tables 10 and 11. From Table 10, when workload W3 is executed, ArangoDB's throughput increases by 126.15% with MMDTune compared to the default configuration, and from Table 11, memory utilization and CPU utilization decrease by 41.57% and 31.33%, respectively. Because they are not constrained by a throughput requirement, all three tuning methods significantly reduce the memory and CPU utilization of ArangoDB. This is because ArangoDB provides a large number of parameters for modifying buffers and worker threads, so the algorithm has a good chance to adjust resource utilization. Comparing the tuning effects of the different methods in the tables, MMDTune always achieves better performance. In summary, MMDTune can efficiently adapt to different multi-model database systems while maintaining relatively good performance.

Conclusion
As more and more applications are proposed to deal with multi-model data, the task of managing and tuning multi-model databases becomes very important. In practice, research on multi-model database tuning based on deep reinforcement learning helps not only to adjust parameter values according to different tuning objectives and workloads, but also to set the best parameter configuration for the database under different hardware environments, improving the stability and reliability of the system. To solve the problem of parameter selection and adjustment in the tuning process, we propose MMDTune, a parameter tuning method for multi-model databases that can recommend good configurations in complex environments. It uses multi-round sensitivity analysis and an improved TD3 algorithm to improve the tuning results of the database. At the same time, it uses the benchmarking platform of the multi-model database to generate the workload and collect performance metrics, meeting the performance monitoring requirements of the tuning process. We vary the workloads, optimization targets, and running environments in the tuning experiments, and the results show that MMDTune adapts well regardless of how the tuning tasks change.