Abstract
Process-based modeling is an approach to constructing explanatory models of dynamical systems from knowledge and data. The knowledge encodes information about potential processes that explain the relationships between the observed system entities. The resulting process-based models provide both an explanatory overview of the system components and closed-form equations that allow for simulating the system behavior. In this paper, we present three recent improvements of the process-based approach: (i) improving predictive performance of process-based models using ensembles, (ii) extending the scope of process-based models towards handling uncertainty and (iii) addressing the task of automated process-based design.
Keywords
- Process-based Models (PBM)
- Exploratory Overview
- Candidate Model Structures
- Stochastic PBM
- Computational Scientific Discovery
1 Introduction
Process-based modeling (PBM) supports knowledge discovery by learning understandable and communicable models of dynamical systems. PBM uses domain-specific knowledge as declarative bias in combination with observed time-series data to address the task of modeling real-world systems. It performs both structure identification and parameter estimation, resulting in a process-based model which specifies a set of differential equations. In turn, such models accurately capture the complex and nonlinear behavior of a dynamical system through time.
Learning models of dynamical systems is a supervised machine learning task: the predictive variables correspond to the observed system variables, while the targets correspond to their time derivatives. However, the task has two specific properties that limit the use of traditional machine learning approaches. First, the resulting models take the form of a set of entities, processes and differential equations, i.e., artifacts used by scientists and engineers to construct explanatory models. Traditional machine learning methods, in contrast, operate on classes of predictive models that generalize well over arbitrary data, while keeping the complexity of training and evaluation procedures low. Second, the observed variables are measured at consecutive time points, so the data instances violate the common assumption of mutual independence.
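The supervised view described above can be sketched in a few lines. The example below is only an illustration: it assumes the derivative targets are estimated by central differences, which the text does not state, and the function name `derivative_targets` is hypothetical.

```python
def derivative_targets(times, values):
    """Pair each interior observation with a central-difference
    estimate of its time derivative."""
    pairs = []
    for i in range(1, len(times) - 1):
        dxdt = (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        pairs.append((values[i], dxdt))
    return pairs


# Synthetic observations of x(t) = t^2, whose true derivative is 2t.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [t * t for t in times]

# Each pair is (predictive variable, target) = (x, dx/dt).
training_pairs = derivative_targets(times, values)
```

Note that neighboring pairs are computed from overlapping observations, which is one concrete way to see why such data instances are not mutually independent.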
The PBM approach relies on the paradigm of computational scientific discovery [3] and, more specifically, on approaches to inductive process modeling. On the one hand, research in this area has a long tradition and has been applied to a variety of domains [1, 2, 10, 11]; while successful, it has remained at the margins of mainstream machine learning. On the other hand, the PBM approach has so far focused primarily on a narrow class of problems that emphasize descriptive and deterministic models as output, given a single data type as input. In terms of output, such models are typically simulated and analyzed using the learning data. Therefore, they tend to overfit, which renders them incapable of accurately predicting the system’s future behavior. Also, these models do not capture the intrinsic uncertainty of the interactions in the system: they always predict exactly the same behavior, determined only by the initial conditions, and ignore the uncertainty in real-world systems. In terms of input, PBM assumes that time series of observations are always available and sufficient. This, however, does not hold for problems with limited observability or for tasks, such as design, where different types of input are required.
In response, our recent developments of the PBM approach aim at bridging the gap between machine learning and domains of application within the physical and life sciences. We address the limitations of the PBM approach by broadening the classes of tasks it can address. We build on the tradition of constant performance improvement, but also extend the scope of potential applications. In particular, to improve the performance on the task of predictive modeling, we support the learning of different types of ensembles of process-based models [4, 5, 6]. Next, we extend the output to include process-based models that describe stochastic interactions [7]. Finally, in order to address tasks of modeling dynamical systems under limited observability and tasks of design of dynamical systems, we consider different types of input data. Namely, in addition to time series of observations of system variables, we allow for the definition of expected properties of the behavior of the dynamical system [8, 9].
2 Methods
The PBM learning task takes domain-specific knowledge and time-series data as input (Fig. 1). The resulting model comprises system variables, represented as entities, and their interactions, represented as processes, which together define the underlying model structure. This representation allows for a straightforward mapping of process-based models into a set of differential equations. The model parameters are fitted to the data using evolutionary optimization methods with the sum-of-squares loss function as the objective. The PBM approach, however, adds an extra layer to the model equations. In particular, the models are constructed using components from a library of domain knowledge, represented by template entities and processes. These templates encode taxonomies of variable and constant properties of the constituents of the dynamical systems, as well as taxonomies of the processes (interactions) among them. The (partial) instantiations of such templates, taken from arbitrary levels of the respective taxonomies, define and constrain the model structure search space for a specific modeling task.
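To make the parameter-fitting step concrete, the following is a minimal sketch of fitting the parameters of one fixed model structure by evolutionary search with a sum-of-squares objective. Everything specific here is an assumption for illustration: the logistic-growth equation, the explicit Euler integrator, the tiny elitist search in `evolve`, and all names and bounds. It is not the optimizer used in the PBM implementation.

```python
import random


def simulate_logistic(growth_rate, capacity, x0, n_steps, dt=0.1):
    """Simulate dx/dt = r * x * (1 - x / K) with the explicit Euler method."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x + dt * growth_rate * x * (1.0 - x / capacity))
    return xs


def sum_of_squares(params, observed, x0):
    """Sum of squared differences between simulated and observed values."""
    simulated = simulate_logistic(params[0], params[1], x0, len(observed) - 1)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def evolve(observed, x0, bounds, pop_size=30, generations=60, seed=0):
    """Elitist evolutionary search: keep the better half of the population
    and refill it with Gaussian-mutated copies clipped to the bounds."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]

    def loss(p):
        return sum_of_squares(p, observed, x0)

    initial_best = min(loss(p) for p in population)
    for _ in range(generations):
        population.sort(key=loss)
        parents = population[: pop_size // 2]
        children = [
            [min(hi, max(lo, gene + rng.gauss(0.0, 0.05 * (hi - lo))))
             for gene, (lo, hi) in zip(parent, bounds)]
            for parent in parents
        ]
        population = parents + children
    best = min(population, key=loss)
    return best, loss(best), initial_best


# Synthetic, noise-free "observations" generated from known parameters.
TRUE_PARAMS = (0.8, 100.0)
BOUNDS = [(0.1, 2.0), (10.0, 200.0)]
observed = simulate_logistic(TRUE_PARAMS[0], TRUE_PARAMS[1], 5.0, 50)
best, best_loss, initial_best = evolve(observed, 5.0, BOUNDS)
```

In a full PBM setting, a search of this kind would be repeated for every candidate model structure, and the structures would then be compared by their fitted loss.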
PBM has four distinguishing features. First, it produces understandable models, which give clear insight into the structure of a dynamical system while building on the traditional mathematical description. The processes relate specific parts of the set of differential equations to understandable real-world causal relations between the system’s components. Second, process-based models retain the utility of traditional mathematical models. They can be readily simulated and analyzed using well-established numerical approaches. Third, PBM is generally applicable to domains that require models described in terms of equations. Finally, the PBM approach is modular. The domain-knowledge library can be instantiated into a number of different modeling components specific to a particular modeling task. It captures the basic modeling principles in a given domain and can be reused for different modeling applications within the same domain.
We report on three extensions of PBM (Fig. 1). To improve the capability to predict the system’s future behavior, we consider learning ensembles of process-based models. The constituent base models are learned from different samples of the measured data [4], from random samples of the library of domain knowledge [6], or from both [5]. Such sampling approaches have a direct effect on the generalization ability of the ensembles, leading to improved predictive performance. Moreover, ensembles of process-based models can provide long-term predictions that rely only on the initial values of the state variables, as opposed to traditional machine learning ensembles for time series, which are typically used for short-term prediction.
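The data-sampling variant can be illustrated with a small bagging sketch. It makes several simplifying assumptions that are not from the source: the base learner is a stand-in exponential-growth model dx/dt = r * x fitted by a closed-form least-squares slope, rather than a full process-based model, and all function names are hypothetical.

```python
import math
import random


def fit_growth_rate(sample, x0):
    """Least-squares slope through the origin of log(x / x0) against t,
    i.e. the rate r of dx/dt = r * x started from x0."""
    numerator = sum(t * math.log(x / x0) for t, x in sample)
    denominator = sum(t * t for t, _ in sample)
    return numerator / denominator


def simulate(rate, x0, times):
    """Closed-form trajectory of dx/dt = r * x started from x0."""
    return [x0 * math.exp(rate * t) for t in times]


def bagged_ensemble(data, x0, n_models=25, seed=0):
    """Fit one base model per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_growth_rate([rng.choice(data) for _ in data], x0)
            for _ in range(n_models)]


def ensemble_predict(rates, x0, times):
    """Average the simulated trajectories of all base models."""
    trajectories = [simulate(r, x0, times) for r in rates]
    return [sum(values) / len(values) for values in zip(*trajectories)]


# Noisy synthetic measurements at t = 0.5, 1.0, ..., 10.0.
noise = random.Random(1)
x0, true_rate = 2.0, 0.3
train_times = [0.5 * k for k in range(1, 21)]
data = [(t, x0 * math.exp(true_rate * t + noise.gauss(0.0, 0.05)))
        for t in train_times]

rates = bagged_ensemble(data, x0)
# Long-term prediction: simulate from the initial value only,
# at time points beyond the training horizon.
forecast = ensemble_predict(rates, x0, [12.0, 15.0])
```

The ensemble prediction is obtained by simulating every base model from the same initial value and combining the trajectories, so no measured values are needed at prediction time.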
To capture the intrinsic uncertainty of interactions within real-world dynamical systems, we propose an improved, finer-grained formalism for representing domain knowledge [7]. It encodes the interactions between entities, i.e., the processes, in the form of reaction equations, which allows for both a deterministic and a stochastic interpretation of process-based models and knowledge.
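The two readings of a reaction-based description can be sketched as follows. The example assumes a hypothetical two-reaction system (constant production and first-order degradation of a single species) and uses explicit Euler integration for the deterministic reading and Gillespie's direct method for the stochastic one; none of these specifics are taken from the formalism in [7].

```python
import random

# Two reactions on one species X (hypothetical example):
#   production:   (nothing) -> X   with rate constant k
#   degradation:  X -> (nothing)   with rate constant d
REACTIONS = [
    {"change": +1, "propensity": lambda x, k, d: k},
    {"change": -1, "propensity": lambda x, k, d: d * x},
]


def simulate_deterministic(x0, k, d, t_end, dt=0.01):
    """Deterministic reading: dx/dt = sum(change * rate), explicit Euler."""
    x = float(x0)
    for _ in range(int(round(t_end / dt))):
        x += dt * sum(r["change"] * r["propensity"](x, k, d)
                      for r in REACTIONS)
    return x


def simulate_stochastic(x0, k, d, t_end, seed=0):
    """Stochastic reading: Gillespie's direct method on the same reactions.
    Returns the sequence of molecule counts after each reaction event."""
    rng = random.Random(seed)
    t, x = 0.0, int(x0)
    counts = [x]
    while True:
        propensities = [r["propensity"](x, k, d) for r in REACTIONS]
        total = sum(propensities)
        if total == 0:
            break
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        threshold, cumulative = rng.random() * total, 0.0
        for reaction, a in zip(REACTIONS, propensities):
            cumulative += a
            if threshold < cumulative:
                x += reaction["change"]
                break
        counts.append(x)
    return counts


steady_state = simulate_deterministic(0, 10.0, 0.1, 100.0)
counts = simulate_stochastic(0, 10.0, 0.1, 50.0, seed=1)
```

Both simulators read the same `REACTIONS` list, which is the point of the reaction-equation encoding: one piece of knowledge, two interpretations. The deterministic run settles near k / d, while each stochastic run yields a different integer-valued trajectory fluctuating around it.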
We have extended the input to the PBM approach to different types of data, which allows handling a broader set of tasks, ranging from completely data-driven to completely knowledge-driven modeling. In this context, we first strengthen the evaluation bias for modeling tasks with limited observability [9]. We use domain-specific criteria for model selection as part of a general regularized objective function for parameter optimization and model selection. Second, we formulate the novel task of process-based design of dynamical systems [8]. This approach does not take measured data as input, but is based entirely on the description of the desired properties of the behavior of a dynamical system. We further generalize the task by taking advantage of methods for simultaneous optimization of multiple conflicting objectives (desired properties of the behavior). We use the complete information from the Pareto front of optimal solutions (obtained for every candidate design) to rank the designs and make a well-informed selection.
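As a reminder of what a Pareto front is, the sketch below filters a set of objective vectors down to its non-dominated subset, assuming that all objectives are minimized. The function names and the sample points are illustrative only, and the sketch deliberately stops at computing one front: how the fronts of different candidate designs are compared in order to rank them is described in [8] and is not reproduced here.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective values (two conflicting objectives) obtained for
# different parameter settings of one candidate design.
points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
front = pareto_front(points)
```

Here (3.0, 4.0) is dominated by (2.0, 3.0) and (5.0, 5.0) is dominated by (1.0, 5.0), so three trade-off solutions remain on the front.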
3 Significance and Challenges
The methodology for learning ensembles of PBMs extends the scope of the traditional ensemble paradigm in machine learning towards modeling dynamical systems. It improves the generalization power of PBMs, providing more accurate simulation of the future behavior of the modeled systems. The proposed methodology employs four different methods for constructing ensembles of process-based models. Each of these significantly improves the predictive performance over individual models (a relative improvement of up to 60% on average) on tasks of modeling population dynamics in three lake ecosystems [4, 5, 6].
The extension of the PBM approach towards stochastic process-based models has allowed us to model dynamical systems that are out of the scope of deterministic models. We have demonstrated that the stochastic PBM is capable of reconstructing known, manually constructed models from synthetic and real-world data in the domains of systems biology and epidemiology [7].
The capability of PBM to handle different inputs and multiple modeling objectives has led to important contributions in the domains of systems and synthetic biology. In particular, PBM can address the problem of high structural uncertainty (many candidate model structures) and incomplete data (i.e., limited observability of the system variables). In systems biology, our approach can alleviate the model selection problem by strengthening the evaluation bias with domain-specific model selection criteria [9]. In synthetic biology, we can now use PBM to solve the task of automated design. Our results show that PBM is capable of reconstructing known, good designs, as well as proposing novel alternative designs of a synthetic stochastic switch and a synthetic oscillator [8].
Note, finally, that all three extensions of the PBM approach are designed and implemented as independent modular components. Therefore, they are interoperable. They can be, in principle, arbitrarily combined and applied to novel tasks, such as learning ensembles of stochastic process-based models.
Several challenges remain in PBM, which we are aware of and are currently working on. The exhaustive combinatorial search currently in use is computationally inefficient and does not scale well with the number of candidate model structures. It is therefore necessary to integrate methods for heuristic search into our current implementation. An alternative approach to reducing search complexity is to use higher-level constraints on model structures that are more expressive than the current constraints. They can be based on the topological properties of the candidate model structures, or can define a probability distribution over the model structures. Finally, both process-based modeling and design require further evaluation on other related domains, such as neurobiology, systems pharmacology and systems medicine, or on completely new domains. The new applications will most likely open up new directions for improvement of the PBM approach.
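The scaling problem of exhaustive search can be seen in a toy enumeration. The library below is entirely hypothetical (its process slots and alternative names are not taken from any PBM library); it only illustrates that the number of candidate structures is the product of the number of alternatives per process.

```python
from itertools import product

# Hypothetical library: each process slot lists its alternative formulations.
# None marks an optional process that is left out of the structure.
library = {
    "growth": ["exponential", "logistic"],
    "grazing": ["linear", "saturating_type2", "saturating_type3"],
    "mortality": ["linear", "quadratic", None],
}


def enumerate_structures(library):
    """Yield every combination of one alternative per process slot."""
    slots = list(library)
    for choice in product(*(library[slot] for slot in slots)):
        yield dict(zip(slots, choice))


structures = list(enumerate_structures(library))
n_candidates = len(structures)  # 2 * 3 * 3 combinations
```

Every additional slot multiplies the number of candidates, and each candidate requires its own parameter optimization, which is why heuristic search or more expressive structural constraints become necessary as libraries grow.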
References
1. Bridewell, W., Langley, P., Todorovski, L., Džeroski, S.: Inductive process modelling. Mach. Learn. 71, 109–130 (2008)
2. Džeroski, S., Langley, P., Todorovski, L.: Computational discovery of scientific knowledge. In: Džeroski, S., Todorovski, L. (eds.) Computational Discovery of Scientific Knowledge. LNCS (LNAI), vol. 4660, pp. 1–14. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73920-3_1
3. Langley, P., Simon, H.A., Bradshaw, G.L., Zytkow, J.M.: Scientific Discovery: Computational Explorations of the Creative Processes. MIT Press, Cambridge (1992)
4. Simidjievski, N., Todorovski, L., Džeroski, S.: Predicting long-term population dynamics with bagging and boosting of process-based models. Expert Syst. Appl. 42(22), 8484–8496 (2015)
5. Simidjievski, N., Todorovski, L., Džeroski, S.: Learning ensembles of process-based models by bagging of random library samples. In: Calders, T., Ceci, M., Malerba, D. (eds.) DS 2016. LNCS (LNAI), vol. 9956, pp. 245–260. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46307-0_16
6. Simidjievski, N., Todorovski, L., Džeroski, S.: Modeling dynamic systems with efficient ensembles of process-based models. PLoS One 11(4), 1–27 (2016)
7. Tanevski, J., Todorovski, L., Džeroski, S.: Learning stochastic process-based models of dynamical systems from knowledge and data. BMC Syst. Biol. 10(1), 1–30 (2016)
8. Tanevski, J., Todorovski, L., Džeroski, S.: Process-based design of dynamical biological systems. Sci. Rep. 6(1), 1–13 (2016)
9. Tanevski, J., Todorovski, L., Kalaidzidis, Y., Džeroski, S.: Domain-specific model selection for structural identification of the Rab5-Rab7 dynamics in endocytosis. BMC Syst. Biol. 9(1), 1–31 (2015)
10. Todorovski, L., Bridewell, W., Shiran, O., Langley, P.: Inducing hierarchical process models in dynamic domains. In: Proceedings of the Twentieth National Conference on Artificial Intelligence, pp. 892–897. AAAI Press (2005)
11. Čerepnalkoski, D., Taškova, K., Todorovski, L., Atanasova, N., Džeroski, S.: The influence of parameter fitting methods on model structure selection in automated modeling of aquatic ecosystems. Ecol. Model. 245, 136–165 (2012)
Acknowledgements
The authors acknowledge the financial support of the Slovenian Research Agency (research core funding No. P2-0103, No. P5-0093 and project No. N2-0056 Machine Learning for Systems Sciences) and the Ministry of Education, Science and Sport of Slovenia (agreement No. C3330-17-529021).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tanevski, J., Simidjievski, N., Todorovski, L., Džeroski, S. (2017). Process-Based Modeling and Design of Dynamical Systems. In: Altun, Y., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2017. Lecture Notes in Computer Science(), vol 10536. Springer, Cham. https://doi.org/10.1007/978-3-319-71273-4_35
DOI: https://doi.org/10.1007/978-3-319-71273-4_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71272-7
Online ISBN: 978-3-319-71273-4
eBook Packages: Computer Science (R0)