Compatibility of neural network estimators with TPOT
The two NN estimators we have currently designed for TPOT (PyTorchLRClassifier and PyTorchMLPClassifier) integrate seamlessly into the TPOT workflow. Because TPOT-NN pipelines frequently take longer to train than base TPOT pipelines, the NN estimators are not enabled by default; they can be enabled by passing the parameter config_dict='TPOT NN' when initializing TPOT.
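A minimal usage sketch follows, assuming a scikit-learn-style train/test split and default TPOT settings otherwise; the dataset and parameter values are illustrative only:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Illustrative dataset; any binary classification task can be substituted
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# config_dict='TPOT NN' adds the PyTorch estimators to the search space
tpot = TPOTClassifier(
    generations=5,          # kept small here for illustration
    population_size=20,
    config_dict='TPOT NN',
    verbosity=2,
    random_state=42,
)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_nn_pipeline.py')
```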
Overall, TPOT has not been tested extensively on specialized classification tasks (such as image classification, text classification, and others), which comprise an important topic that we have prioritized for future exploration. However, the inclusion of NN estimators is an important step toward bridging this gap: neural networks implemented in PyTorch and other neural computing libraries have proven remarkably flexible across a vast range of applications, which opens many exciting opportunities for expanding TPOT and other AutoML tools.
This, however, points to one of the major challenges in developing NN estimators for TPOT: while most shallow estimators can be included in TPOT simply by referring to their modules in Scikit-Learn or XGBoost, TPOT-NN estimators need to be implemented ‘from scratch’ in PyTorch (or another neural computing library). Once these are written and incorporated into TPOT’s codebase, an appropriate set of tunable metaparameters also needs to be defined. For most non-NN estimators, this is as simple as enumerating the possible arguments provided by their source libraries, but for NN estimators it can include architectural characteristics whose optimal values depend strongly on the underlying dataset, such as the number of hidden layers, layer widths, activation functions, and dropout rates, among many others (see the sketch below).
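As a rough illustration of what such a definition looks like, the snippet below sketches a TPOT-style configuration dictionary entry for an MLP estimator; the operator path, parameter names, and value ranges are illustrative assumptions rather than the exact search space shipped with TPOT-NN:

```python
# Hypothetical configuration-dictionary entry for a PyTorch MLP estimator.
# TPOT configuration dictionaries map an operator's import path to the
# hyperparameter values the genetic algorithm may choose from.
tpot_nn_config = {
    'tpot.builtins.PyTorchMLPClassifier': {
        # Optimization hyperparameters can be enumerated much like those
        # of a shallow estimator
        'learning_rate': [1e-3, 1e-2, 1e-1],
        'batch_size': [8, 16, 32],
        'num_epochs': [5, 10, 15],
        # Architectural hyperparameters (hidden layers, layer widths,
        # dropout rates, ...) are harder to enumerate statically, since
        # sensible values depend on the dataset; placeholders shown only
        # 'hidden_layer_width': [16, 64, 256],
        # 'dropout_rate': [0.0, 0.25, 0.5],
    },
}

# The dictionary can then be passed to TPOT in place of a built-in one:
# TPOTClassifier(config_dict=tpot_nn_config, ...)
```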
Nevertheless, the fact that TPOT is supported by contributions from the open-source community—as well as the continued development of more streamlined neural computing interfaces (such as Keras)—suggests that these barriers will prove less challenging to handle in the future. The examples we show in this study illustrate the early potential of TPOT-NN and demonstrate how it behaves in comparison to base TPOT.
TPOT-NN significantly improves classification accuracy and reduces variance, but only for some datasets
A major criticism of neural networks and deep learning is that they have often been unfairly touted as a “magic bullet” that is ideal for solving most problems in AI. Recent research acknowledges that the reality is more nuanced [31], and that shallow learners actually outperform deep learning in many cases, in spite of deep models being, in theory, universal approximators. We observed that TPOT-NN performs substantially better than non-NN TPOT on 2 of the 6 datasets we tested (HV-with-noise and HV-without-noise), and we did not find a situation in which it performs worse than base TPOT in terms of classification accuracy.
Of possibly equal importance, we observed that repeated experiments using TPOT-NN yield more consistent results (lower variance in classification accuracy). This effect was statistically significant in 3 of 6 datasets (HV-with-noise, HV-without-noise, and ionosphere), but the observed variance measurements were smaller using TPOT-NN versus base TPOT in all 6 datasets. This observation suggests that TPOT-NN (and similar tools, in the future) could be used to improve the reproducibility of ML analyses. As this result was unexpected, we intend to explore this phenomenon more comprehensively in future studies.
This highlights one of the chief strengths of AutoML, and one of the major motivations for developing TPOT-NN: neural network models are clearly advantageous for certain classification tasks performed on certain datasets, but simpler shallow models may work better on other datasets, including smaller datasets like those tested in this study. Furthermore, TPOT pipelines that incorporate both NN and non-NN estimators with different optimization objectives have the potential to outperform simpler pipelines containing only one estimator, especially when datasets contain complex sets of features made up of different data types. Finally, the inclusion of feature transformer and feature selector operators in TPOT adds model introspection capabilities to experiments that use ANNs.
Assessing the tradeoff between model performance and training efficiency
The amount of time needed to train a pipeline is an important pragmatic consideration in real-world applications of ML. This certainly extends to the case of AutoML: the parameters we use for TPOT include between 50 and 100 training generations with a population size of 100 in each generation, meaning that we evaluate roughly 5,000 to 10,000 candidate pipelines (each consisting of a variable number of independently optimizable operators) for every experiment, of which there were 720 in the present study (see the calculation below). As shown in Table 4, we generally expect a non-NN pipeline to train in the range of several hours to slightly over 1 day, depending on the dataset.
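As a back-of-the-envelope check on that scale (simple arithmetic, not part of the TPOT API):

```python
# Rough count of candidate-pipeline evaluations per experiment
generations = (50, 100)   # lower and upper bounds used in this study
population_size = 100     # candidate pipelines per generation

low, high = (g * population_size for g in generations)
print(f"roughly {low:,} to {high:,} pipeline evaluations per experiment")
# -> roughly 5,000 to 10,000 pipeline evaluations per experiment
```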
Our relatively simple NN estimators sit at the lower end (complexity-wise) of the components used to build DL architectures, and likewise are among the simplest to train. Nevertheless, using either the LR or MLP PyTorch estimator in a TPOT experiment can increase the average training time significantly: in our experiments, the average training time on the mushroom dataset increased 18-fold when comparing the tpot-base to tpot-mlp configurations, and the datasets we used are smaller than those used in most DL applications, which can have millions of datapoints comprising thousands of features each [35]. Users will have to determine, on a case-by-case basis, whether a potential accuracy increase of at most several percentage points is worth the additional time and computational investment inherent to ANNs.
Nonetheless, our results make it clear that it is unlikely for a TPOT-NN pipeline to perform worse than a (non-NN) TPOT pipeline. In ‘mission critical’ settings where training time is not a major concern, TPOT-NN can be expected to perform at least as well as standard TPOT. Furthermore, the surprising observation that TPOT-NN seems to yield pipelines with less variance in their classification accuracy suggests that TPOT-NN’s new estimators may make the results of experiments more reliable and reproducible. However, this claim needs to be tested on additional datasets and explored further before any definitive conclusions can be made.
AutoML as a tool to discover novel neural network architectures
Based on the results we describe in Sect. 4.4, AutoML (and TPOT-NN, in particular) may be useful for discovering new neural network “motifs” to be composed into larger networks. For example, by repeating the internal architecture shown in Fig. 4 to a final depth of 152 hidden layers, converting the fully connected layers to convolutional layers, and adjusting the number of nodes in those layers, the result is highly similar to the version of ResNet that won first place in 5 categories at two major image recognition competitions in 2015 [14] (a simplified sketch of this repetition is given below). In the near future, we plan to investigate whether this phenomenon could be scaled into a larger, fully data-driven approach for generating modular neural network motifs that can be composed into models effective for a myriad of learning tasks.
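The sketch below conveys the general idea in PyTorch; the block contents, widths, and depth are placeholders rather than the exact motif from Fig. 4:

```python
import torch
import torch.nn as nn

class DenseMotif(nn.Module):
    """A small fully connected block standing in for a discovered motif."""

    def __init__(self, width: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(width, width),
            nn.ReLU(),
            nn.Linear(width, width),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection, as in residual architectures such as ResNet
        return torch.relu(x + self.block(x))

def stack_motif(width: int, depth: int) -> nn.Sequential:
    """Repeat the motif to an arbitrary depth, analogous to scaling a
    learned architecture up into a much deeper network."""
    return nn.Sequential(*[DenseMotif(width) for _ in range(depth)])

model = stack_motif(width=64, depth=8)   # depth chosen arbitrarily here
```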
However, there are two main challenges that need to be addressed before TPOT can automatically learn models of substantially greater depth. First, new strategies for efficiently learning pipelines using large training datasets need to be implemented. As it currently stands, TPOT pipelines become computationally infeasible to learn in a reasonable amount of time (and with reasonable computational resources) when datasets reach tens of thousands of samples, which is substantially smaller than many of the popular datasets used to train highly performant DL models. Second, TPOT penalizes larger pipelines in favor of smaller (and more interpretable) pipelines. Since increasing depth would result in larger pipelines, TPOT-NN would need to compensate for this penalty somehow. Both of these challenges are currently on the roadmap of tasks to address for TPOT in the near future.
Future work on integrating AutoML and ANNs
Since one of our primary goals in this work was to provide a baseline for future development of neural network models in the context of AutoML, the two PyTorch models we have currently built (logistic regression and multilayer perceptron) are structurally simple. Future work on TPOT-NN will expand its functionality, improving the capabilities of the existing models and incorporating other, more complex architectures, such as convolutional neural networks, recurrent neural networks, and other applications of deep learning. Additionally, we will be adding support for NN-based regression.
In implementing these neural network estimators, we also intend to evaluate and improve TPOT for use with other, more complex datatypes, including images, text, graph data, and others, all of which have played important roles in the success of ANNs and modern applications of AI. However, by first evaluating TPOT-NN on simple binary classification datasets made up of regularly structured, pre-extracted features, we have laid a strong foundation for future development in many exciting directions.