Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA

WEKA is a widely used, open-source machine learning platform. Due to its intuitive interface, it is particularly popular with novice users. However, such users often find it hard to identify the best approach for their particular dataset among the many available. We describe the new version of Auto-WEKA, a system designed to help such users by automatically searching through the joint space of WEKA's learning algorithms and their respective hyperparameter settings to maximize performance, using a state-of-the-art Bayesian optimization method. Our new package is tightly integrated with WEKA, making it just as accessible to end users as any other learning algorithm.


The Principles Behind Auto-WEKA
The WEKA machine learning software (Hall et al., 2009) puts state-of-the-art machine learning techniques at the disposal of even novice users. However, such users do not typically know how to choose among the dozens of machine learning procedures implemented in WEKA and each procedure's hyperparameter settings to achieve good performance.
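Auto-WEKA¹ addresses this problem by treating all of WEKA as a single, highly parametric machine learning framework, and using Bayesian optimization to find a strong instantiation for a given dataset. Specifically, it considers the combined space of WEKA's learning algorithms $\mathcal{A} = \{A^{(1)}, \dots, A^{(k)}\}$ and their associated hyperparameter spaces $\Lambda^{(1)}, \dots, \Lambda^{(k)}$ and aims to identify the combination of algorithm $A^{(j)} \in \mathcal{A}$ and hyperparameters $\lambda \in \Lambda^{(j)}$ that minimizes cross-validation loss,

$$A^{*}_{\lambda^{*}} \in \operatorname*{argmin}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}} \frac{1}{K} \sum_{i=1}^{K} \mathcal{L}\bigl(A^{(j)}_{\lambda}, \mathcal{D}^{(i)}_{\mathrm{train}}, \mathcal{D}^{(i)}_{\mathrm{test}}\bigr),$$

where $K$ is the number of cross-validation folds and $\mathcal{L}(A_{\lambda}, \mathcal{D}^{(i)}_{\mathrm{train}}, \mathcal{D}^{(i)}_{\mathrm{test}})$ denotes the loss achieved by algorithm $A$ with hyperparameters $\lambda$ when trained on $\mathcal{D}^{(i)}_{\mathrm{train}}$ and evaluated on $\mathcal{D}^{(i)}_{\mathrm{test}}$. We call this the combined algorithm selection and hyperparameter optimization (CASH) problem.

CASH can be seen as a blackbox function optimization problem: determining $\operatorname*{argmin}_{\theta \in \Theta} f(\theta)$, where each configuration $\theta \in \Theta$ comprises the choice of algorithm $A^{(j)} \in \mathcal{A}$ and its hyperparameter settings $\lambda \in \Lambda^{(j)}$. In this formulation, the hyperparameters of algorithm $A^{(j)}$ are conditional on $A^{(j)}$ being selected. For a given $\theta$ representing algorithm $A^{(j)} \in \mathcal{A}$ and hyperparameter settings $\lambda \in \Lambda^{(j)}$, $f(\theta)$ is then defined as the cross-validation loss $\frac{1}{K} \sum_{i=1}^{K} \mathcal{L}(A^{(j)}_{\lambda}, \mathcal{D}^{(i)}_{\mathrm{train}}, \mathcal{D}^{(i)}_{\mathrm{test}})$.²

Bayesian optimization (see, e.g., Brochu et al., 2010), also known as sequential model-based optimization, is an iterative method for solving such blackbox optimization problems. In its $n$-th iteration, it fits a probabilistic model based on the first $n-1$ function evaluations $\langle \theta_i, f(\theta_i) \rangle_{i=1}^{n-1}$, uses this model to select the next $\theta_n$ to evaluate (trading off exploration of new parts of the space vs. exploitation of regions known to be good) and evaluates $f(\theta_n)$.

¹ Thornton et al. (2013) first introduced Auto-WEKA and empirically demonstrated state-of-the-art performance. Here we describe an improved and more broadly accessible implementation of Auto-WEKA, focussing on usability and software design.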
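The CASH formulation above can be made concrete with a small sketch. The Python below is purely illustrative and is not Auto-WEKA code: the dataset, the two stand-in learners (`fit_majority`, `fit_knn`), the `SPACE` dictionary, and the function names are all assumptions introduced here. It shows a configuration θ as an algorithm choice plus hyperparameters that exist only for that algorithm, and f(θ) as the mean loss over cross-validation folds.

```python
import random

# Hypothetical toy dataset: one feature, label is 1 when x > 0.5.
def make_data(n=60, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.5)) for x in xs]

# Two stand-in "learning algorithms". Each takes training data plus its own
# hyperparameters and returns a predict(x) function.
def fit_majority(train):
    ones = sum(y for _, y in train)
    label = int(2 * ones >= len(train))
    return lambda x: label

def fit_knn(train, n_neighbors):
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
        return int(2 * sum(y for _, y in nearest) >= len(nearest))
    return predict

# Conditional space: an algorithm's hyperparameters are only part of the
# configuration when that algorithm is selected.
SPACE = {
    "majority": {},
    "knn": {"n_neighbors": [1, 3, 5, 7]},
}
FITTERS = {"majority": fit_majority, "knn": fit_knn}

def sample_configuration(rng):
    """Draw a random theta = (algorithm, hyperparameters of that algorithm)."""
    algorithm = rng.choice(sorted(SPACE))
    params = {name: rng.choice(values) for name, values in SPACE[algorithm].items()}
    return {"algorithm": algorithm, "params": params}

def cv_loss(theta, data, folds=5):
    """f(theta): mean misclassification rate over the cross-validation folds."""
    losses = []
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        model = FITTERS[theta["algorithm"]](train, **theta["params"])
        errors = sum(model(x) != y for x, y in test)
        losses.append(errors / len(test))
    return sum(losses) / folds
```

Any optimizer that can propose dictionaries of this shape and call `cv_loss` on them is solving a (tiny) instance of the blackbox problem described above; the optimizer never needs to look inside the learners.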
While Bayesian optimization based on Gaussian process models is known to perform well for low-dimensional problems with numerical hyperparameters (see, e.g., Snoek et al., 2012), tree-based models have been shown to be more effective for high-dimensional, structured, and partly discrete problems (Eggensperger et al., 2013), such as the highly conditional space of WEKA's learning algorithms and their corresponding hyperparameters we face here.³ Thornton et al. (2013) showed that tree-based Bayesian optimization methods yielded the best performance in Auto-WEKA, with the random-forest-based SMAC (Hutter et al., 2011) performing better than the tree-structured Parzen estimator, TPE (Bergstra et al., 2011). Auto-WEKA uses SMAC to determine the classifier with the best performance on the given data.
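To illustrate the fit, select, evaluate cycle of sequential model-based optimization with a tree-based model, here is a minimal Python sketch. It is not SMAC and makes no claim about SMAC's internals: the one-dimensional `objective`, the ensemble of bootstrap regression stumps used as a surrogate, and the lower-confidence-bound selection rule are all simplifying assumptions chosen to keep the example short.

```python
import random
import statistics

def objective(x):
    """Hypothetical blackbox f(theta) on [0, 1]; stands in for a cross-validation loss."""
    return (x - 0.3) ** 2

def fit_stump(points, rng):
    """Fit one randomized regression stump on a bootstrap sample of the history."""
    sample = [rng.choice(points) for _ in points]
    split = rng.random()
    left = [y for x, y in sample if x <= split]
    right = [y for x, y in sample if x > split]
    overall = statistics.fmean([y for _, y in sample])
    left_mean = statistics.fmean(left) if left else overall
    right_mean = statistics.fmean(right) if right else overall
    return lambda x: left_mean if x <= split else right_mean

def fit_surrogate(points, rng, n_trees=30):
    """Ensemble of stumps; the spread across trees serves as an uncertainty estimate."""
    trees = [fit_stump(points, rng) for _ in range(n_trees)]
    def predict(x):
        preds = [tree(x) for tree in trees]
        return statistics.fmean(preds), statistics.pstdev(preds)
    return predict

def smbo(n_iterations=20, n_initial=3, n_candidates=200, kappa=1.0, seed=0):
    rng = random.Random(seed)
    # Start from a few random evaluations.
    history = [(x, objective(x)) for x in [rng.random() for _ in range(n_initial)]]
    for _ in range(n_iterations):
        # 1. Fit the model on all evaluations made so far.
        predict = fit_surrogate(history, rng)
        # 2. Pick the candidate with the lowest optimistic estimate:
        #    low predicted mean (exploitation) or high spread (exploration).
        candidates = [rng.random() for _ in range(n_candidates)]
        def acquisition(x):
            mean, std = predict(x)
            return mean - kappa * std
        x_next = min(candidates, key=acquisition)
        # 3. Evaluate the true function at the chosen point.
        history.append((x_next, objective(x_next)))
    best = min(history, key=lambda p: p[1])
    return best, history
```

Calling `smbo()` returns the best `(x, f(x))` pair found together with the full evaluation history. In a CASH setting, `x` would be replaced by a structured configuration θ, and a tree-based surrogate would split on the algorithm choice as well as on numerical hyperparameters.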

Auto-WEKA 2.0
Since the initial release of a usable research prototype in 2013, we have made substantial improvements to the Auto-WEKA package described by Thornton et al. (2013). At a prosaic level, we have fixed bugs, improved tests and documentation, and updated the software to work with the latest versions of WEKA and Java. We have also added four major features.
First, we now support regression algorithms, expanding Auto-WEKA beyond its previous focus on classification (starred entries in Fig. 1). Second, we now support the optimization of all performance metrics WEKA supports. Third, we now natively support parallel runs (on a single machine) to find good configurations faster and save the N best configurations of each run instead of just the single best. Fourth, Auto-WEKA 2.0 is now fully integrated with WEKA. This is important, because the crux of Auto-WEKA lies in its simplicity: providing a push-button interface that requires no knowledge about the available learning algorithms or their hyperparameters, asking the user to provide, in addition to the dataset to be processed, only a memory bound (1 GB by default) and the overall time budget available for the entire learning process.⁴

The usability of the earlier research prototype was hampered by the fact that users had to download Auto-WEKA manually and run it separately from WEKA. In contrast, Auto-WEKA 2.0 is now available through WEKA's package manager. Users do not need to install software separately; everything is included in the package and installed automatically upon request. After installation, Auto-WEKA 2.0 can be used in two different ways:

1. As a meta-classifier: Auto-WEKA can be run like any other machine learning algorithm in WEKA: via the GUI, the command-line interface, or the public API. Figure 2 shows how to run it from the command line.
2. Through the Auto-WEKA tab: This provides a customized interface that hides some of the complexity. Figure 3 shows the output of an example run.

Source code for Auto-WEKA is hosted on GitHub (https://github.com/automl/autoweka) and is available under the GPL license (version 3). Releases are published to the WEKA package repository and available both through the WEKA package manager and from the Auto-WEKA project website (http://automl.org/autoweka). A manual describes how to use the WEKA package and gives a high-level overview for developers; we also provide lower-level Javadoc documentation. An issue tracker on GitHub, JUnit tests and the continuous integration system Travis facilitate bug tracking and correctness of the code. Since its release on March 1, 2016, Auto-WEKA 2.0 has been downloaded more than 15 000 times, with an average of about 400 downloads per week.

    java -cp autoweka.jar weka.classifiers.meta.AutoWEKAClassifier -timeLimit 5 -t iris.arff -no-cv

Figure 2: Command-line call for running Auto-WEKA with a time limit of 5 minutes on training dataset iris.arff. Auto-WEKA performs cross-validation internally, so we disable WEKA's cross-validation (-no-cv). Running with -h lists the available options.

Figure 3: Example Auto-WEKA run on the iris dataset. The resulting best classifier along with its parameter settings is printed first, followed by its performance. While Auto-WEKA runs, it logs to the status bar how many configurations it has evaluated so far.

² In fact, on top of machine learning algorithms and their respective hyperparameters, we also include attribute selection methods and their respective hyperparameters in the configurations θ, thereby jointly optimizing over their choice and the choice of algorithms.
³ Conditional dependencies can also be accommodated in the Gaussian process framework (Swersky et al., 2013), but currently, tree-based methods achieve better performance.
⁴ The overall budget is set to 15 minutes by default to accommodate impatient users; longer runs allow the Bayesian optimizer to search the space more thoroughly; we recommend at least several hours for production runs.
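The idea of running several searches in parallel and keeping the N best configurations of each run can be pictured with the following Python sketch. It does not reflect how Auto-WEKA implements this feature: the toy loss, the use of plain random search inside each run, the use of threads, and all function names are assumptions made for illustration only.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor

def toy_loss(theta):
    """Hypothetical loss of a configuration; stands in for a cross-validation loss."""
    return (theta["x"] - 0.3) ** 2

def single_run(seed, n_evaluations=50, n_best=3):
    """One independent search run (random search here) that keeps its n_best results."""
    rng = random.Random(seed)  # each run has its own seeded generator
    evaluated = []
    for _ in range(n_evaluations):
        theta = {"x": rng.random()}
        evaluated.append((toy_loss(theta), seed, theta))
    # Sorted from lowest to highest loss, truncated to the n_best entries.
    return heapq.nsmallest(n_best, evaluated, key=lambda e: e[0])

def parallel_runs(seeds, n_best=3):
    """Run one search per seed concurrently and return each run's n_best list."""
    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        return list(pool.map(lambda s: single_run(s, n_best=n_best), seeds))
```

For example, `parallel_runs([0, 1, 2, 3])` returns four lists of three `(loss, seed, configuration)` entries each. Keeping several configurations per run, rather than only the incumbent, lets a user compare near-optimal alternatives after the search has finished.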

Related Implementations
Auto-WEKA was the first method to use Bayesian optimization to automatically instantiate a highly parametric machine learning framework at the push of a button. This automated machine learning (AutoML) approach has recently also been applied to Python and scikit-learn (Pedregosa et al., 2011) in Auto-WEKA's sister package, Auto-sklearn (Feurer et al., 2015). Auto-sklearn uses the same Bayesian optimizer as Auto-WEKA, but comprises a smaller space of models and hyperparameters, since scikit-learn does not implement as many different machine learning techniques as WEKA; however, Auto-sklearn includes additional meta-learning techniques.
It is also possible to optimize hyperparameters using WEKA's own grid search and MultiSearch packages. However, these packages only permit tuning one learner and one filtering method at a time. Grid search handles only one hyperparameter. Furthermore, hyperparameter names and possible values have to be specified by the user.