Robust and accurate regression-based techniques for period inference in real-time systems

With the growth in complexity of real-time embedded systems, there is an increasing need for tools and techniques to understand and compare the observed runtime behavior of a system with the expected one. Since many real-time applications require periodic interactions with the environment, one of the fundamental problems in guaranteeing their temporal correctness is to be able to infer the periodicity of certain events in the system. The practicability of a period inference tool, however, depends on both its accuracy and its robustness (i.e., its resilience) against noise in the output trace of the system, e.g., when the system trace is impacted by the presence of aperiodic tasks, release jitters, and runtime variations in the execution time of the tasks. This work (i) presents the first period inference framework that uses regression-based machine-learning (RBML) methods, and (ii) thoroughly investigates the accuracy and robustness of different families of RBML methods in the presence of uncertainties in the system parameters. We show, on both synthetically generated traces and traces from actual systems, that our solutions can reduce the error of period estimation by two to three orders of magnitude w.r.t. the state of the art.


Introduction
The rapid growth of software size and complexity in real-time embedded systems poses pressing challenges to the ability to debug systems, identify runtime deviations from the correct service (Young et al. 2019), and detect (and evade) security attacks at runtime (Nasri et al. 2019). This creates a need for tools and techniques to understand (or infer) the runtime behavior of a system from its observable outputs, such as the traces of output messages, task executions, and actuations, without impacting the system itself or, in some cases, without access to the source code or the internal parts of the system.
In this paper, we focus on developing a tool for inferring the timing properties of a system. Such a tool can be used to (i) find time-bugs during the development phase, for example, to check if activities happen with the expected frequency or period, or to act as an automated test oracle (Barr et al. 2015), (ii) detect timing anomalies and security attacks that leave a trace on the observable timing profile of the system during the operation phase (e.g., those explained by Nasri et al. (2019), Salem et al. (2016), and Iegorov and Fischmeister (2018) to spot anomalies in the regularity of an activity in the system), and (iii) diagnose the system after applying a patch or an upgrade during the maintenance phase (e.g., to check if a data-consumer application still performs periodically after installing an upgrade on the data-producer application).
Since many real-time applications require periodic interactions with the environment (Akesson et al. 2020), one of the primary use cases of a timing inference tool is to infer the periodicity of events from a system's output traces (Berberidis et al. 2002; McKilliam et al. 2014; Puech et al. 2019). What makes this very first step challenging is that the observable timing traces are typically obtained from the components' interfaces and hence are impacted by the internal structure of the application, operating system, hardware platform, and their interactions. For instance, consider an execution trace that indicates the time intervals during which a certain task has occupied the processor. It is easy to infer the period if the task exclusively runs on top of dedicated hardware. It becomes harder if the task is one of the low-priority tasks in a set of periodic tasks running on top of a real-time operating system (RTOS) with a preemptive fixed-priority scheduling (FP) policy because then the task's execution intervals are affected (e.g., preempted) due to the interference from the higher-priority periodic tasks. Finally, it becomes much harder if the latter system also includes high-priority aperiodic or event-driven activities (such as interrupt services), sporadic tasks, release jitters, and deadline misses. A timing inference tool, therefore, must be robust against these interferences, dynamic behavior, and uncertainties; otherwise, it might not be able to address true challenges faced by real systems and hence becomes useless in practice. Furthermore, it must be accurate, else it will not be helpful to find time bugs or to detect deviations from the expected periodicity.
Related work. Iegorov et al. (2017) are among the few pioneers who proposed a solution for the problem of inferring periods from execution traces. They created an algorithm that identifies the time intervals between consecutive jobs and computes the period as the mode of the intervals' distribution. However, their method performs poorly when the tasks have runtime execution-time variation and/or the true period of the task under analysis does not divide all other smaller periods in the task set, i.e., it is not harmonic with the rest of the tasks. Young et al. (2019) use a fast Fourier transform to infer the periodicity of messages sent on a controller area network (CAN) in order to detect security attacks that impact the timing of the messages. Their problem, however, is only a subset of ours since CAN applies a non-preemptive fixed-priority policy and messages typically have a fixed size with a low runtime variation in the message length.
Data-driven methods such as k-nearest neighbors and dynamic time-warping algorithms as well as long short-term memory (LSTM) neural networks have been used in reverse engineering real-time systems to identify tasks from their runtime power traces by Lamichhane et al. (2018) and to reconstruct traces affected by noise by Sucholutsky et al. (2019). However, to the best of our knowledge, no study so far has utilized regression-based machine learning (RBML) methods to infer the timing properties of real-time systems. We not only provide the first such solution, but also extensively investigate the accuracy and robustness of various families of RBML for this problem.
Finding the periodicity of a signal is a well-studied problem in signal processing research (Schuster 1898; Berberidis et al. 2002; Vlachos et al. 2005; Li 2012; McKilliam et al. 2014; Malode et al. 2015; Unnikrishnan and Jothiprakash 2018; Puech et al. 2019; Gubner 2006). Periodogram (Schuster 1898) and circular autocorrelation (Gubner 2006) are among the widely used methods to find a plausible set of periods for a signal. However, as we will see in the experimental section, these methods perform poorly when used on signals generated from preempted tasks. Nonetheless, despite their limitations, we found them to be helpful to generate an initial set of candidate periods and hence will use them only in the first step of our solution to extract features from execution traces.
This paper. We consider the problem of inferring a task's period from a timed sequence of zeros and ones (called a binary projection) that shows when the task was occupying the resource (see Sect. 2). We consider a single processing resource (it can be a CPU, a network link, a CAN bus, etc.) that is governed by a work-conserving job-level fixed-priority (JLFP) scheduling policy. We assume no prior knowledge about the number of other tasks in the system and their parameters, execution model (preemptive or non-preemptive), runtime execution-time variations, and release jitters.
Our framework uses two signal-processing techniques, i.e., periodogram and circular autocorrelation, to extract features from the binary projection, treats and reduces the size and the number of features, and then uses them to train a set of RBML methods (in Sect. 3). This work (i) presents the first period inference framework that utilizes RBML methods, and (ii) thoroughly investigates the accuracy and robustness of different families of RBML methods in the presence of uncertainties in the system parameters and of noise resulting from aperiodic tasks in the input data or missed jobs.
Our results show that RBML methods infer tasks' periods with an average error of 0.4% (for periodic tasks with or without execution-time variation), 1.1% (for periodic tasks with release jitter), and 0.4% (for task sets with a mixture of periodic, sporadic, and aperiodic tasks) while the state of the art (Iegorov and Fischmeister 2018) has an average error of 1160%, 1950%, and 156%, respectively. On case studies from actual systems (Lee et al. 2017; Seo et al. 2018), the error of our (best) solution was below 1.7%. Sect. 6 provides insight on the strengths and weaknesses of different families of RBML methods for the problem of period inference.

System model
We assume a system with a single (processing) resource (such as a CPU core, I/O or CAN bus, or a link on the network). The resource can be occupied/used by a set of tasks $\tau = \{\tau_1, \tau_2, \ldots, \tau_n\}$, scheduled by a work-conserving job-level fixed-priority (JLFP) scheduling policy on the resource, i.e., only the highest-priority job among the ready jobs can be dispatched on the resource, where a job is an instance of a task in $\tau$. JLFP policies include widely implemented/used scheduling algorithms in real-time systems such as the earliest-deadline first (EDF), fixed-priority (FP), and first-in-first-out (FIFO) scheduling policies. A work-conserving scheduling policy is one that does not leave the resource idle if there is a task that is ready to occupy the resource. Furthermore, we assume no restriction on whether each task executes preemptively or non-preemptively.
A task in $\tau$ can be activated periodically, sporadically, or aperiodically. A periodic or sporadic task is identified by a tuple $\tau_i = (C^{\min}_i, C^{\max}_i, T_i, D_i, \sigma_i)$, where $C^{\min}_i$ and $C^{\max}_i$ are the best-case and worst-case execution times (BCET and WCET), $T_i$ is the period, $D_i$ is the relative deadline (which is assumed to be equal to the period), and $\sigma_i$ is the maximum release jitter of the task. Following Audsley's convention (Audsley et al. 1993), we assume positive release jitter, i.e., the $k$-th job ($k \geq 1$) of a periodic task $\tau_i$ is supposed to be released during the interval $[(k-1) \cdot T_i, (k-1) \cdot T_i + \sigma_i]$. If the task is sporadic, its period indicates the minimum inter-arrival time between its activations. An aperiodic task is identified by a 3-tuple $\tau_j = (C^{\min}_j, C^{\max}_j, D_j)$, where $C^{\min}_j$ and $C^{\max}_j$ are the BCET and WCET and $D_j$ is the relative deadline of the task, respectively.
We further assume that all timing parameters are positive integer values in $\mathbb{N}^+$ with the exception of $C^{\min}_i$ and the maximum release jitter of the task, which can be 0. The total utilization of the system is denoted by $U$ and is the sum of the utilization of all periodic and sporadic tasks, i.e., $U = \sum_i u_i$, where $u_i = C^{\max}_i / T_i$. The hyperperiod of a task set, denoted by $H$, is the least common multiple of the periods.
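As a small illustration of these two quantities, the sketch below computes $U$ and $H$ for a task set given as (WCET, period) pairs. The function names and the example task set are hypothetical and only serve the illustration; exact fractions are used so that the utilization is not affected by rounding.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def utilization(tasks):
    """Total utilization U: sum of C_max / T over periodic and sporadic tasks."""
    return sum(Fraction(c_max, period) for c_max, period in tasks)


def hyperperiod(tasks):
    """Hyperperiod H: least common multiple of all periods."""
    return reduce(lambda a, b: a * b // gcd(a, b),
                  (period for _, period in tasks), 1)


# Hypothetical task set of (C_max, T) pairs; not taken from the paper.
example_tasks = [(1, 4), (2, 10), (4, 20)]
U = utilization(example_tasks)   # 1/4 + 1/5 + 1/5
H = hyperperiod(example_tasks)   # lcm(4, 10, 20)
```

Aperiodic tasks are left out of the input on purpose, since only periodic and sporadic tasks contribute to $U$ in the definition above.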
A task $\tau_i$ generates an infinite number of instances, called jobs, during the lifetime of the system. We use $J_{i,j}$ to denote the $j$-th job of a task $\tau_i$. The priority of a job $J_{i,j}$ is denoted by $p_{i,j}$ and is determined by the scheduling policy. We assume that at any time instant $t$, either one of the tasks in $\tau$ or the idle task, denoted by $\tau_0$, is running on the resource.
A trace $\mathcal{T} = ([t_s, t_e], \langle \lambda_1, \lambda_2, \ldots, \lambda_N \rangle)$ is a time-ordered sequence of symbols that represents a schedule generated by the JLFP scheduler for the task set $\tau \cup \{\tau_0\}$ from the time $t_s$ to $t_e$. Each symbol $\lambda_i$ in the trace $\mathcal{T}$ is an identifier (index) of the task that was occupying the resource at time $i$, where $i \in \{t_s, t_{s+1}, \ldots, t_e\}$. Hence, $\lambda_i \in \{0, 1, \ldots, n\}$. The length of a trace is $|\mathcal{T}| = t_e - t_s$. Figure 1a and b show a schedule of a task set with 4 tasks and the equivalent trace of that schedule.

Problem definition
To formally define the problems considered in the paper, we need to introduce three other notions that are tied to a trace: binary projection, ternary projection, and quaternary projection (shown in Fig. 1c to e).
A binary projection for a task $\tau_i$ is a sequence of zeros and ones that represents the times at which a job of the task under observation was occupying the resource in the trace. Figure 1c shows the binary projection of the task $\tau_3$.
Definition 1 A binary projection of a trace $\mathcal{T}$ for a task $\tau_i$, denoted by $P^B_i = \langle p_1, p_2, \ldots, p_{|\mathcal{T}|} \rangle$, is a time-ordered sequence of elements $p_k$, where
$$p_k = \begin{cases} 1 & \text{if } \tau_i \text{ occupies the resource at time } k, \\ 0 & \text{otherwise.} \end{cases}$$
Fig. 1 […] $= 2$) and one periodic task ($\tau_3$ with $C^{\max}_3 = 4$ and $T_3 = 10$) with release jitter scheduled by a FP policy (assuming $p_i = i$). (a) shows a schedule, (b) shows the trace of the schedule, and (c)-(e) show the binary, ternary, and quaternary projections of task $\tau_3$ in the task set.
A ternary projection (Fig. 1d) for a task contains resource-idle intervals in addition to what is stored in a binary projection.
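Definition 2 A ternary projection of a trace $\mathcal{T}$ for a task $\tau_i$, denoted by $P^T_i = \langle p_1, p_2, \ldots, p_{|\mathcal{T}|} \rangle$, is a time-ordered sequence of elements $p_k$, where
$$p_k = \begin{cases} 1 & \text{if } \tau_i \text{ occupies the resource at time } k, \\ \text{idle} & \text{if the resource is idle at time } k, \\ 0 & \text{otherwise.} \end{cases}$$
A quaternary projection of a task $\tau_i$ is similar to the ternary projection except that it also includes the intervals in which a job of a lower-priority task than $\tau_i$ is occupying the resource. That is, quaternary projections can be derived for the FP scheduling policy (and not for EDF).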
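Definition 3 A quaternary projection of a trace $\mathcal{T}$ for a task $\tau_i$, denoted by $P^Q_i = \langle p_1, p_2, \ldots, p_{|\mathcal{T}|} \rangle$, is a time-ordered sequence of elements $p_k$, where
$$p_k = \begin{cases} 1 & \text{if } \tau_i \text{ occupies the resource at time } k, \\ \text{idle} & \text{if the resource is idle at time } k, \\ \text{low} & \text{if } lp(\mathcal{T}, \tau_i, k, \lambda_k) \text{ is true}, \\ 0 & \text{otherwise.} \end{cases}$$
Here, $\lambda_k$ denotes the symbol of the trace at time $k$, and $lp(\mathcal{T}, \tau_i, k, \lambda_k)$ returns true only if it is possible to verify that the job of $\tau_{\lambda_k}$ that occupies the resource at time $k$ has a lower priority than the latest released job of $\tau_i$ at time $k$. Figure 1e shows the quaternary projection of the task $\tau_3$. As can be seen, a low-priority task ($\tau_4$) creates two time intervals, i.e., [10, 11) and [17, 18), with the label "low" in the quaternary projection of $\tau_3$. Note that quaternary projections do not distinguish low-priority tasks from each other (namely, "low" can represent "any" of the low-priority tasks).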
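The three projections can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a trace is represented as a plain list of task indices (0 for the idle task), the function names are hypothetical, and the `is_lower` predicate stands in for the $lp$ function, whose actual implementation depends on what the observer can verify.

```python
def binary_projection(trace, i):
    """1 where task i occupies the resource, 0 elsewhere."""
    return [1 if sym == i else 0 for sym in trace]


def ternary_projection(trace, i):
    """Like the binary projection, but idle slots (symbol 0) are marked 'idle'."""
    return [1 if sym == i else "idle" if sym == 0 else 0 for sym in trace]


def quaternary_projection(trace, i, is_lower):
    """Like the ternary projection, plus 'low' where is_lower(k, sym) holds.

    is_lower is a caller-supplied stand-in for lp: it should return True only
    when the task running at time k is verifiably of lower priority than task i.
    """
    projection = []
    for k, sym in enumerate(trace):
        if sym == i:
            projection.append(1)
        elif sym == 0:
            projection.append("idle")
        elif is_lower(k, sym):
            projection.append("low")
        else:
            projection.append(0)
    return projection


# Hypothetical trace over tasks 1..4 (0 = idle); not the trace of Fig. 1.
example_trace = [1, 3, 3, 2, 3, 4, 0, 0, 1, 3]

# Assumption for this example: FP with priority equal to the task index,
# so any task with a larger index than 3 has a lower priority.
pb = binary_projection(example_trace, 3)
pt = ternary_projection(example_trace, 3)
pq = quaternary_projection(example_trace, 3, lambda k, sym: sym > 3)
```

Slots occupied by higher-priority tasks (or by tasks whose priority cannot be verified) stay 0 in all three projections, which is why a partial `is_lower` predicate still yields a valid quaternary projection.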
We conclude this section by defining three versions of the period inference (PI) problem:
Problem 1 Find the period of $\tau_i$ from its projection $P^B_i$.
Problem 2 Find the period of $\tau_i$ from its projection $P^T_i$ provided that $P^T_i$ does not include a deadline miss from $\tau_i$.
Problem 3 Find the period of $\tau_i$ from its projection $P^Q_i$ provided that $P^Q_i$ does not include a deadline miss from $\tau_i$.
It is worth noting that the only input to Problems 1, 2, and 3 is a projection. Since a projection is just a sequence of limited symbols ('0', '1', 'idle', and 'low'), it does not contain any information about the scheduling policy or tasks' parameters (such as the execution times, release jitter, periods, etc.). Moreover, the projections themselves do not contain information about whether or not there are tardy jobs (i.e., jobs that have completed after their deadline) in the original trace.

Obtaining projections
In practice, one may utilize operating system commands such as top and trace commands or the Linux trace toolkit to obtain a trace of a certain task. There is no need to distinguish or annotate preemptions, the start of a new job, blocking times by lower-priority jobs, or self-suspensions because that is the part that the period inference problem answers. It is also fine if the information gathered in a projection is incomplete (namely, it misses some jobs of the task or some of its execution intervals).
In the context of a network resource, for example, on a CAN bus, one can obtain projections by observing the messages transferred on the bus to form a trace or to just use the outputs of the message filters of the CAN controller to get a binary, ternary, or quaternary projection for the message ID of interest.
Ternary and quaternary projections require slightly more detailed observations from the system. Still, the same tools mentioned above can be used to derive ternary projections (as the only added information in a ternary projection is the moments in which the resource is idle).
Quaternary projections are helpful only if there are "tasks" with a lower priority than the one being observed in the system (namely, the scheduling policy is not EDF). In our paper, we do not need to include information of all lower-priority tasks in a quaternary projection, i.e., only a partial observation would suffice (for example, only some of the lower-priority tasks can be observed but not all). The more information (about the lower-priority tasks) can be added to a quaternary projection, the better would be the period bounds that we will derive from the quaternary projections in Sect. 4.2.
In some cases, a task may roughly know the execution window of other higher- or lower-priority tasks if it collaborates with them, e.g., when it sends messages to them and waits until it receives an acknowledgement or response. However, when tasks are independent or isolated from other tasks, they may not be able to obtain information needed for quaternary projections without the help of an operating system. This may then restrict the applicability of quaternary projections to cases where the operating system also takes part in the safety-monitoring activity. For example, consider a case where a system is equipped with a safety-monitoring component whose goal is to ensure that certain activities happen periodically within an expected period range. To improve monitoring accuracy, the architect may even equip the operating system with extra functions/APIs that gather binary, ternary, or quaternary projections and feed them to the safety-monitoring component.
When the period inference is used in runtime monitoring tools or time-debugger tools designed for a special system with known parameters, it is typically possible to obtain richer projection types such as ternary and quaternary projections as we explained earlier. We will later (in Sect. 4) investigate how the extra information in these projections can be used to improve the accuracy of the period inference Problems 2 and 3.

Regression-based period mining
This section first introduces the challenges of the period inference (PI) problem and then presents our solution framework.
Challenges. As mentioned earlier, the PI problem has a long history in signal processing. Methods such as periodogram (Schuster 1898) and circular autocorrelation (or autocorrelation for short) (Gubner 2006) have been applied to infer periodicity of a signal and shown to work well in the presence of small (or standard) noise. However, they do not perform well (see Sect. 5) when applied to the PI problem because: (i) they may generate many period candidates most of which are irrelevant, (ii) although they assign a weight (called power) to each candidate, there is no direct relation between the weight and the true period, (iii) they cannot cope with preemptions well because they perceive each preemption as a new occurrence of the event under analysis (which adds a significant amount of noise to their inputs), and (iv) the true period is not necessarily among their generated candidates (especially in the autocorrelation method).
We therefore looked into learning-based methods that could work well on the PI problem. We focused especially on those whose decision logic is explainable and traceable by a human. Therefore, we deliberately avoided using deep neural networks for the problem or for the feature extraction. However, this raised the next challenge: how to extract meaningful and helpful features from a projection? A starting point could be to use the whole binary projection as a feature and let the machine-learning method figure out the period. However, that could lead to two major issues: (i) dimensionality problems with the feature space, and (ii) inputs of varying length.
High-dimensional feature spaces typically lead to sparse data which in turn reduces the efficiency and increases the runtime of model learning (Bellman 1961). Moreover, most machine-learning methods require a fixed input size which implies that the input projection must be cut (uniformly for all projections of all training and testing task sets). However, since task sets have different hyperperiods, putting a predetermined cut-off threshold could either lead to low accuracy (if the cut-off is too short) or to a huge runtime and low efficiency in learning the model (if the cut-off is too long).
Solution highlights. The framework we propose to solve the period inference problem suggests a four-stage pipeline where Stage 0 extracts features and Stages 1 to 3 are for accuracy improvement of period estimation. Figure 2 shows the pipeline and the stages.
In Stage 0, we extract a fixed set of features from the top k highest-rank candidates of the periodogram and autocorrelation methods (see Sect. 3.1). In Stage 1, we use supervised-learning methods, and in particular, the regression-based machine learning (RBML) methods, to determine the relationship between our feature vectors and the target output, i.e., task period (see Sect. 3.2). RBML methods are commonly used when the goal is to predict a continuous output that takes order into consideration (in our case, the period). We call our RBML solution regression-based period miner (RPM). In Stage 2, we further adjust the predictions of RPM according to a set of high-ranked candidates from periodogram and autocorrelation. This aims to use RPM as a referee whose purpose is to highlight the most accurate peak from the two signal-processing methods (see Sect. 3.3). Finally, Stage 3 introduces some pruning rules using the extra information provided in ternary (Sect. 4.1) and quaternary (Sect. 4.2) projections to further restrict the number of candidates.

Feature extraction
Next, we explain our feature extraction and briefly introduce the periodogram and autocorrelation methods.
Periodogram (Schuster 1898). Consider the binary projection as a sequence $P^B_i$ (where $p_n$ is the $n$th item of the projection) and its discrete Fourier transform $X(f)$. The periodogram $P$ gives an estimation of the spectral density of the discrete signal $P^B_i$ and is obtained from the squared magnitude of the Fourier coefficients $X(f)$, as presented in Leondes (1996):
$$P(f) = \frac{1}{N} \lVert X(f) \rVert^2,$$
where $N = |P^B_i|$ is the sequence length and $P(f)$ is the power of frequency $f$. The Fourier coefficients $X(f)$ can be obtained from the sequence $P^B_i$ as follows:
$$X(f) = \sum_{n=1}^{N} p_n \, e^{-\mathrm{j} 2 \pi f n / N}. \tag{5}$$
The norm of a Fourier coefficient is the magnitude of that coefficient, namely,
$$\lVert X(f) \rVert = \sqrt{\mathrm{Re}\{X(f)\}^2 + \mathrm{Im}\{X(f)\}^2},$$
where $\mathrm{Re}\{X(f)\}$ and $\mathrm{Im}\{X(f)\}$ are the real and imaginary coefficients for each frequency $f$, respectively.
Figure 3a and b show two periodograms obtained for two periodic tasks with period 1000 and 5000 from a task set with four tasks scheduled by rate monotonic. As can be seen in Fig. 3a, the highest peak in this example (here, peak refers to a jump in the diagram) of the periodogram indicates the true period of the task, i.e., 1000. However, for the task with period 5000, this observation does not hold; the true period of this task is not the highest peak but the 5th highest peak. The lower the priority of a task, the higher is the amount of interference it will have in its schedule. These interferences make the projections less regular and hence result in a more irregular periodogram that has many peaks.
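To make the periodogram computation concrete, the following sketch evaluates the power of each frequency of a short binary projection and converts the strongest frequency into a candidate period. It is an illustration under stated assumptions rather than the authors' implementation: a naive $O(N^2)$ discrete Fourier transform is used to keep the example self-contained (an FFT would be used in practice), the sequence is indexed from 0 (an index shift only changes the phase of a coefficient, not its power), the $1/N$ scaling is an assumption that does not affect the ranking of the peaks, and the function names and the example projection are hypothetical.

```python
import cmath


def dft(seq):
    """Naive discrete Fourier transform (O(N^2)); for illustration only."""
    n = len(seq)
    return [sum(x * cmath.exp(-2j * cmath.pi * f * k / n)
                for k, x in enumerate(seq))
            for f in range(n)]


def periodogram(projection):
    """Power of each frequency index f = 1 .. N/2.

    f = 0 is skipped because it only reflects the mean of the signal, and the
    upper half of the spectrum mirrors the lower half for a real-valued input.
    """
    n = len(projection)
    coeffs = dft(projection)
    return {f: abs(coeffs[f]) ** 2 / n for f in range(1, n // 2 + 1)}


def strongest_period(projection):
    """Candidate period N / f of the frequency with the highest power."""
    power = periodogram(projection)
    best_f = max(power, key=power.get)
    return len(projection) / best_f


# Hypothetical projection: a task that runs for 2 time units every 10,
# never preempted, observed for 4 periods.
example_projection = ([1, 1] + [0] * 8) * 4
candidate = strongest_period(example_projection)
```

For this undisturbed projection the strongest frequency corresponds to the true period of 10; with preemptions and jitter the spectrum contains many more peaks, which is the situation the regression stage is meant to handle.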
Circular Autocorrelation (Gubner 2006). It is a metric that describes how similar a sequence is to its past values for different circular phase shifts. We use the method of Vlachos et al. (2005) to compute the circular autocorrelation:
$$ACF(w) = \frac{1}{N} \sum_{n=1}^{N} p_n \cdot p_{n+w}, \tag{6}$$
where $N$ is the sequence length, $w$ is the phase shift, and the index $n + w$ wraps around circularly (modulo $N$). In the case of the period inference problem, we would expect that the highest value of the autocorrelation function would be at a lag $w$ equal to the true period.
Fig. 3 Periodogram and circular autocorrelation methods applied on task projections for (a) a task with period 1000 and (b) a task with period 5000, from a system containing 4 tasks with a total utilization of 30% scheduled by rate monotonic. The other two tasks have a period of 2000 and 10000, respectively.
A practical way to compute the ACF is to translate the operations into the frequency domain. Since (6) is a convolution, one can compute it with the dot product between the Fourier coefficients of the sequence and their complex conjugates (Vlachos et al. 2005):
$$ACF = \frac{1}{N} \, \mathcal{F}^{-1}\{X \cdot X^{*}\},$$
where $\mathcal{F}^{-1}$ is the inverse discrete Fourier transform and $X^{*}$ is the complex conjugate of $X$. In this paper, we apply the discrete Fourier transform on the projection and extract the Fourier coefficients using (5). Furthermore, we perform the dot product between the coefficients and their complex conjugates and apply the inverse Fourier transform on the result to obtain the autocorrelation. An implementation of our method can be found on GitHub, along with the rest of the framework.
Figure 3a and b illustrate the usage of autocorrelation when the input is a projected trace. Firstly, we notice that the highest value that this technique exhibits is for a lag (period) of $w = 0$. This behavior is normal, since the highest similarity between a signal and itself is present when the two signals perfectly overlap with each other, e.g., at time 0 (see Eq. 6). Hence, the peak at 0 is excluded from the examination. The other observation is that, similar to the periodogram, the autocorrelation method is able to discover the true period only in the case from Fig. 3a, while for the second period, its top peak indicates an erroneous value. Moreover, we observe that the autocorrelation is sensitive to low utilization values.
In Fig. 3b, we see that the start time of the task with period 5000 is not at integer multiples of 5000 and varies a bit due to the interference caused by other high-priority tasks in the system. However, even though this task has not been preempted, we see that the projection does not have any overlap with itself when it is shifted by the true period of 5000 (i.e., at $w = 5000$). As a result, the autocorrelation method could not detect the actual periodic behavior. However, it could observe two smaller peaks slightly smaller and slightly larger than 5000, at 4635 and 5365, respectively.
It is worth noting that both the periodogram and autocorrelation methods have an $O(N \log N)$ time complexity, where $N$ is the length of the projection.
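The equivalence between the direct and the frequency-domain computation of the circular autocorrelation can be checked with a short sketch. The assumptions are as follows: a naive $O(N^2)$ Fourier transform is used only to keep the example dependency-free (the $O(N \log N)$ complexity stated above requires an FFT), the sequence is indexed from 0, the $1/N$ scaling is an assumption, and all function names and the example projection are hypothetical.

```python
import cmath


def circular_acf_direct(seq):
    """ACF(w) = (1/N) * sum_k seq[k] * seq[(k + w) mod N], for every lag w."""
    n = len(seq)
    return [sum(seq[k] * seq[(k + w) % n] for k in range(n)) / n
            for w in range(n)]


def dft(seq, inverse=False):
    """Naive (inverse) discrete Fourier transform; for illustration only."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(x * cmath.exp(sign * 2j * cmath.pi * f * k / n)
               for k, x in enumerate(seq))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


def circular_acf_fourier(seq):
    """Same ACF, computed as the inverse transform of X(f) * conj(X(f))."""
    n = len(seq)
    spectrum = [c * c.conjugate() for c in dft(seq)]
    return [v.real / n for v in dft(spectrum, inverse=True)]


def best_lag(acf):
    """Lag with the highest ACF value, excluding the trivial peak at lag 0."""
    n = len(acf)
    return max(range(1, n // 2 + 1), key=lambda w: acf[w])


# Hypothetical projection: 2 time units of execution every 10, 4 periods.
example_projection = ([1, 1] + [0] * 8) * 4
acf_direct = circular_acf_direct(example_projection)
acf_fourier = circular_acf_fourier(example_projection)
lag = best_lag(acf_direct)
```

Restricting the search to lags up to $N/2$ is a design choice of this sketch: in a circular autocorrelation, every multiple of the period scores as high as the period itself, so the smallest such lag is taken as the candidate.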
Extracting fixed-size features. Our fixed-size candidate list is constructed from the top k = 3 peaks of the outputs of the two methods, namely, we gather k-highest peaks from periodogram and k-highest peaks from the autocorrelation methods. It is worth noting that the width of a peak is correlated with the position of the peak in periodogram (the further from the origin the larger the width). Thus, it does not provide enough information to be considered as a feature for regression.
Having a feature set of size $2k = 6$ (i.e., $k = 3$ peaks from each of the two methods) allows us to work on a much smaller dimension for the input data and have a fixed input size to use with our regression-based solution. For the cases when there are fewer than $k$ peaks for a method, the number of features is completed by appending the highest peak of that method until we reach the desired $k$. The choice of the number of features, i.e., $k = 3$, was made after evaluating the impact of $k$ on various scenarios and finding the value that results in a high accuracy without increasing the dimensions of the feature space (Fig. 7b in Sect. 5 compares different choices).
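The construction of the fixed-size feature vector, including the padding rule for methods that yield fewer than $k$ peaks, can be sketched as follows. The input format (lists of (period, power) pairs), the function names, and the power values in the example are assumptions made for illustration; how peaks are detected in the first place is outside the scope of this sketch.

```python
def top_k_peaks(candidates, k=3):
    """Periods of the k highest-power peaks, padded with the highest peak.

    candidates is a list of (period, power) pairs for one method.
    """
    ranked = [period for period, _ in
              sorted(candidates, key=lambda c: c[1], reverse=True)]
    if not ranked:
        raise ValueError("at least one candidate peak is required")
    top = ranked[:k]
    # Fewer than k peaks: repeat the highest peak until k entries exist.
    return top + [ranked[0]] * (k - len(top))


def feature_vector(periodogram_peaks, autocorrelation_peaks, k=3):
    """Feature vector <P1..Pk, A1..Ak> of fixed size 2k."""
    return (top_k_peaks(periodogram_peaks, k)
            + top_k_peaks(autocorrelation_peaks, k))


# Hypothetical peaks; the power values are made up for the example.
periodogram_peaks = [(500, 0.4), (1000, 0.9), (333, 0.2), (250, 0.1)]
autocorrelation_peaks = [(2000, 0.5), (1000, 0.8)]  # only two peaks found
features = feature_vector(periodogram_peaks, autocorrelation_peaks)
```

In the example, the autocorrelation contributes only two peaks, so its highest peak (1000) is repeated to fill the third slot.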

Regression methods
Regression analysis is a method originating from statistics, whose purpose is to estimate the relationship between a dependent variable (or "outcome") denoted by $Y$ and one or more independent variables (or "features") denoted by $X$. In machine learning, regression is employed when the aim is to predict a continuous output, which takes order into consideration. A regression model is formally described by
$$Y_i = f(X_i, \beta) + e_i,$$
where $Y_i$ is the outcome variable, $X_i$ is a feature vector, $\beta$ represents unknown parameters, and $e_i$ is an additive error term (residual) associated with the prediction.
Since we try to estimate the period from a projection, in our regression scenario, the dependent variable Y i is the task's period T i . The independent variables X i contain the features we extracted at the previous step, while the function f comes from the choice of a regression algorithm, whose parameters need to be estimated during the training phase.
In other words, our goal is to choose the form of function $f$ and to compute the estimates of the parameters $\hat{\beta}$ such that the function has the best fit on the data. In order to assess how well the model fits the data, the predicted outcome, i.e., $f(X_i, \hat{\beta})$, is compared against the true dependent variable. The comparison takes the form of a loss function $L(Y, f(X, \hat{\beta}))$, where $Y$ is a vector containing the outcome variables and $X$ includes all vectors of independent variables. For instance, the most commonly used loss function is the mean square error (MSE) (also used in our paper):
$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( Y_i - f(X_i, \hat{\beta}) \right)^2,$$
where $N$ is the total number of observations.
Table 1 Overview of best performing families of regression algorithms and for each family the best model (Delgado et al. 2019). Columns: Algorithm, Nickname, Category. First row: Cubist Regression (Quinlan 1992, 1993, 2014) […]
The choice of regression methods. Table 1 lists the overall best performing families of regression algorithms and for each family the best model, as suggested by Delgado et al. (2019) in their extensive recent survey on the performance and effectiveness of regression methods. These methods present distinctive characteristics in their implementation, namely, they do not theoretically dominate each other. Hence, in order to answer the question "which regression method performs best for the period-inference problem", we implemented and investigated all of these methods to gather insights about their performance on our particular problem.
We, however, anticipate that the tree-based solutions (cubist, gbm, extraTrees, bartMachine) will perform better than svr and avNNet because we expect the transition from a set of candidate periods (the features) to the true period to be better approximated by a set of rules and/or comparisons rather than a linear or non-linear combination of these features as in svr and avNNet, respectively.
Regression trees. A majority of the RBML methods in Table 1 are variations of regression trees. A regression tree (Breiman et al. 1984) recursively partitions the feature space of the data into smaller regions until the final sub-divisions are similar enough to be summarized by a simple model in a leaf. This model can be simply the average of the outcomes from that sub-division. Figure 4 shows the rules generated by a regression tree that was trained on the automotive task sets with four periodic tasks and 30% utilization (see details of the task set generation in Sect. 5.1). The features used for training are the three highest peaks from the periodogram (denoted by P1, P2, and P3) and autocorrelation (denoted by A1, A2, and A3) methods. The non-terminal nodes represent the rules that will be used to guide the inference process by narrowing down the period estimate of a new task.
To make it more tangible, we explain how to use the regression tree in Fig. 4 to estimate the period of the two tasks in Fig. 3a and b. In the first step, we derive the three highest peaks of the periodogram and autocorrelation methods to build the feature vectors $X_1$ and $X_2$ for the first and second tasks, respectively. Here, $X_1$ = ⟨P1=1000, P2=500, P3=333, A1=1000, A2=2000, A3=3000⟩ and $X_2$ = ⟨P1=769, P2=666, P3=5000, A1=4635, A2=10000, A3=5365⟩. Next, we traverse the tree by evaluating the rules starting from the root node. For example, for the first task, P1 = 1000 and hence the condition in the root node (i.e., P1 ≤ 60000) is satisfied. Thus, we go to the right branch and repeat the process until we reach a leaf. The value in the leaf is the period estimate. In this example, the trained model can accurately estimate both tasks' periods.
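The traversal described above can be sketched with a tiny hand-written tree. Only the root condition (P1 ≤ 60000) and the two feature vectors come from the text; every other threshold and leaf value below is invented for the example and does not reproduce the tree of Fig. 4. The dictionary layout and the function name are likewise hypothetical.

```python
# Hypothetical tree: only the root test P1 <= 60000 is taken from the text;
# the remaining nodes and leaf values are made up for this illustration.
example_tree = {
    "feature": "P1", "threshold": 60000,
    "if_true": {
        "feature": "P3", "threshold": 2500,
        "if_true": {"leaf": 1000.0},
        "if_false": {"leaf": 5000.0},
    },
    "if_false": {"leaf": 100000.0},
}


def predict(node, features):
    """Follow the rules from the root until a leaf is reached."""
    while "leaf" not in node:
        satisfied = features[node["feature"]] <= node["threshold"]
        node = node["if_true"] if satisfied else node["if_false"]
    return node["leaf"]


# Feature vectors X1 and X2 as given in the text.
x1 = {"P1": 1000, "P2": 500, "P3": 333, "A1": 1000, "A2": 2000, "A3": 3000}
x2 = {"P1": 769, "P2": 666, "P3": 5000, "A1": 4635, "A2": 10000, "A3": 5365}

estimate_1 = predict(example_tree, x1)
estimate_2 = predict(example_tree, x2)
```

Each prediction costs one comparison per level of the tree, which is what makes the decision logic easy to trace by hand.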
An interesting observation in Fig. 4 is the exclusion of A2 and A3 from the tree's rules, which basically means that these two features had no impact on the final period estimate. With further investigation, we observed that typically in task sets with low utilization, the trained regression trees tend to be smaller and rules contain fewer features because there are fewer preemptions (and hence, less noise) in the input. However, with an increase in utilization, the tree is forced to consider more features and even become deeper to keep the estimation error low.
Training a regression tree can be done in O(m ⋅ N ⋅ log N) , where m is the number of features (in our case it is a constant value equal to 6) and N is the number of samples (projections) used for training. Later in Sect. 5, we provide an evaluation on the runtime and memory consumption of various RBML methods.
Understanding how a simple regression tree works, we can now discuss the actual RBML methods used in our work according to the suggestions of Delgado et al. (2019). Note that four of these methods are extended variations of the regression trees but none is as simple as the tree shown in Fig. 4.
Cubist Regression (cubist) (Kuhn and Quinlan 2020; Quinlan 1992, 1993, 2014). It is a regression tree whose leaves embed linear regression models instead of simple 'estimates of the output'. The tree can be further reduced by combining or pruning the rules via collapsing the nodes of the trees into rules.
By training a cubist regression model on the same data-set as in Fig. 4, we obtain the following rules: In this example, we observe that while the rules and outputs rely on the top candidates of the periodogram, they are not limited to them. For example, rule 2 outputs the period 5000 which is not among the three top features of periodogram. The cubist regression uses these rules to compensate for projections where the periodogram is wrong.

Cubist regression consumes notably less memory than the regression trees (see Sect. 5) and hence it is a better choice when the solution must have low memory consumption and runtime. However, we also noticed a growth in the number of rules when it is trained on task sets with high utilization because then the underlying regression tree from which the cubist regression rules are obtained gets larger and deeper when the number of preemptions increases.
Generalized Boosting Regression (gbm) (Greenwell et al. 2019;Friedman 2002). This algorithm is a regression tree-based solution which uses a committee of regression trees of fixed size. The initial prediction of the algorithm starts from a leaf, which contains the average value of the outcome variables (i.e., the periods). The next step is to compute the residuals of this initial prediction against the true output (true period). Next, a regression tree is fitted on the data, but having the previously computed residuals as the outcome variables.
In order to preserve the generalization capabilities of the model, the results from the tree are multiplied by a constant value. Afterwards, the output from the tree is added to the initial leaf to obtain a new set of predictions, which are again used to compute residuals. The process is repeated until a maximum number of trees is reached.
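The residual-fitting loop described above can be sketched as follows. This is a simplified illustration rather than the gbm package itself: it assumes a single input feature, uses depth-one trees (stumps) as the fixed-size trees, and the names `fit_stump`, `fit_gbm`, `predict_gbm`, and `learning_rate` (the constant multiplier) are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes the squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_gbm(xs, ys, n_trees=100, learning_rate=0.1):
    """Start from the mean outcome, then repeatedly fit trees on residuals."""
    initial = sum(ys) / len(ys)          # the initial leaf
    predictions = [initial] * len(ys)
    stumps = []
    for _ in range(n_trees):             # stop at the maximum number of trees
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return initial, stumps, learning_rate


def predict_gbm(model, x):
    """Add the scaled output of every tree to the initial leaf."""
    initial, stumps, learning_rate = model
    return initial + sum(
        learning_rate * (left_mean if x <= threshold else right_mean)
        for threshold, left_mean, right_mean in stumps
    )


# Toy data: one feature (e.g., the highest periodogram peak) and true periods.
model = fit_gbm([1000, 1010, 5000, 5020], [1000, 1000, 5000, 5000])
print(round(predict_gbm(model, 1005)))  # 1000
```

Scaling each tree by a constant smaller than one means that no single tree can fit the residuals completely, which is what preserves the generalization capability mentioned above.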
Extremely Randomized Regression Trees (extraTrees) (Simm et al. 2014;Geurts et al. 2006). The algorithm relies on a committee of regression trees for its predictions. When building the trees, this method randomly picks a rule for each feature (instead of searching for a rule that minimizes the error) and then chooses the one that provides the lowest error. Hence, a randomized regression tree is much faster to build than a regular regression tree.
Bayesian Additive Regression Tree (bartMachine) (Kapelner and Bleich 2016;Chipman et al. 2010). Similar to gbm, this method also relies on a group of trees, where each tree is fit on the residuals of the predictions from a previous tree. The major difference is that bartMachine is based on a probability model containing a set of priors for the tree structure and a likelihood for the leaves' values. extraTrees, gbm, and bartMachine stop building the model when a given (maximum) number of trees is achieved.
Averaged Neural Network (avNNet) (Kuhn 2020; Ripley 2007). The technique involves a committee of five multilayer perceptrons having the same size, but trained using different random seeds. The network is set to have linear output neurons, which makes it suitable for regression. Finally, the predictions from the five networks are averaged to provide the final estimate.
Support Vector Regression (svr) (Meyer et al. 2019; Cortes and Vapnik 1995). The goal of svr is to find a line or a hyperplane that fits as many data points as possible within a certain margin from it. Moreover, it can accommodate nonlinear trends by fitting the line in a transformed feature space using a kernel function.
Sections 5 and 6 provide further insights on the performance of the RBML methods.

Candidate selection
As the example in Fig. 3 shows, the true period is among the peaks of the periodogram and autocorrelation, although it is not always the highest peak. After further investigations, we observed that, on the one hand, in a majority of projections, the true period is indeed among the peaks of periodogram and autocorrelation. However, it is hard to know which of those peaks it is just by looking at their power or rank. On the other hand, the RBML methods typically predict only an approximation of the real period which is not always equal to the true one (resulting in non-zero errors in most cases). Thus, we introduce a further pruning phase on the output of our RPM method and create a method called RPM with period adjustment (RPMPA).
RPMPA treats the RPM method as a referee which chooses the right period from a set of candidates. Namely, it first calculates the period estimate using the RPM method and then finds the closest period to this estimate from a fixed set of values gathered from the 20 highest peaks of each of the periodogram and autocorrelation (hence, 40 candidates in total). The number of candidates (i.e., 40) is a hand-tuned value and comes from experimenting on many task sets (see Sect. 5.2).
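The adjustment step can be sketched as a nearest-candidate search. The function name `adjust_period` and the example estimate of 4870 are hypothetical; the candidate values reuse the peaks of $X_2$ from Sect. 3 (three per method here instead of 20, to keep the example short).

```python
def adjust_period(rpm_estimate, periodogram_peaks, autocorrelation_peaks):
    """Return the candidate peak that is closest to the regression estimate."""
    candidates = list(periodogram_peaks) + list(autocorrelation_peaks)
    return min(candidates, key=lambda c: abs(c - rpm_estimate))


# Hypothetical RPM output of 4870 for the second task.
print(adjust_period(4870, [769, 666, 5000], [4635, 10000, 5365]))  # 5000
```

If the regression estimate is far from the true period, the closest candidate may be a wrong peak, which is the failure mode addressed in Sect. 4.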

Deriving period bounds to improve accuracy
Why. As will be shown in our experiments, despite the success of the RPMPA method in improving accuracy, in some scenarios, the "adjustment" step increases the error instead of reducing it (see Sect. 5). Those cases happen when the underlying regression algorithm (as a part of the RPM method) produces an output that significantly deviates from the true period. As a result, when the RPMPA chooses a candidate, it introduces more error. To reduce the chance of deviating from the true period, this section presents methods to derive upper and lower bounds on the period directly from the input projections so that the search space for RPMPA is further narrowed down and its final error is reduced.
What. We present a space-pruning method (SPM) whose goal is to derive a lower and an upper bound on the possible set of period values by looking at the higher-order projections such as ternary and quaternary projections. These bounds are meant to remove the impossible period values from the candidate set generated from the highest 20 peaks of each of the periodogram and autocorrelation methods before they are fed to the RPMPA (recall Fig. 2).
It is worth noting that if applying the lower and upper bounds on the 40 period candidates results in an empty set (i.e., all 40 candidates are outside of the bounds), we suggest using the upper bound as the period estimate. Later in Sect. 5.3 (Fig. 9), we show that choosing the upper bound results in higher accuracy than just using the output of the RPM (regression) method.
How. Ternary and quaternary projections include information about the idle times and the execution of lower-priority tasks, respectively. This information, together with some basic knowledge about the scheduling policy, can help derive upper and lower bounds on the actual periods. For example, under a work-conserving scheduling policy, we can deduce that "if a task has accessed the resource between two idle times in a ternary projection, then it must have released a job somewhere between those idle times". In the example shown in Fig. 1d, at least one job of $\tau_3$ must have been released in the interval [10, 19) since there is at least a '1' in the ternary projection of $\tau_3$ during this interval. Similarly, another job must have been released in the interval [20, 25). An upper bound on the period of this task can be derived from the largest inter-arrival times observed in the projection. In Sects. 4.1.2 and 4.1.3, we will elaborate on how to derive such an upper bound when tasks do not have or have release jitter, respectively.
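The pruning step and its fallback can be sketched as follows. The function name `prune_and_adjust` and the numeric bounds in the example are hypothetical; the sketch assumes a strict lower bound and a non-strict upper bound, in line with the bounds derived later in this section.

```python
def prune_and_adjust(rpm_estimate, candidates, lower, upper):
    """Drop candidates outside (lower, upper], then pick the closest one.

    Falls back to the upper bound when no candidate survives the pruning.
    """
    feasible = [c for c in candidates if lower < c <= upper]
    if not feasible:
        return upper
    return min(feasible, key=lambda c: abs(c - rpm_estimate))


candidates = [769, 666, 5000, 4635, 10000, 5365]
# Hypothetical bounds 2000 < T <= 5200 remove 5365, which would otherwise
# be the closest candidate to the (hypothetical) estimate of 5300.
print(prune_and_adjust(5300, candidates, lower=2000, upper=5200))  # 5000
# All candidates fall outside the bounds: the upper bound is returned.
print(prune_and_adjust(5300, candidates, lower=100, upper=500))    # 500
```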

Improving the accuracy for ternary projections
Our key idea to derive an upper bound on the task's period is to traverse the ternary projection to find pairs of consecutive intervals separated by idle times in which the task has occupied the resource. We call them effective intervals. Then by looking at every three consecutive effective intervals, we can obtain one upper bound on the task's period. After traversing the whole projection, the smallest upper bound found is the bound we use to prune the period candidates obtained from the peaks of periodogram and autocorrelation.
In the rest of this section, we first discuss how to obtain the effective intervals (see Sect. 4.1.1), and then how to derive upper bounds for tasks with no release jitter (see Sect. 4.1.2) and with bounded release jitter (see Sect. 4.1.3). It is worth noting that our upper bounds for the period are tighter than that of Vădineanu and Nasri (2020). Finally, in Sect. 4.1.4, we show how to calculate a lower bound on the period.

Extracting effective intervals from ternary projections
Assumptions (to derive the upper bounds). Before we explain how to obtain the upper bounds, we summarize the required assumptions: (A1) the scheduling policy is work-conserving and (A2) the task under analysis neither skips a job (the BCET of the task is not zero) nor suspends itself. If these assumptions do not hold, then the upper bound is ∞. In practice, it is easy to check if the scheduling policy that governs the resource is work-conserving. Most well-known scheduling policies implemented by operating systems are work-conserving, for example, EDF, fixed-priority scheduling, FIFO scheduling, etc. To check if the assumption A2 holds, one may use a separate monitoring tool that checks whether each instance of the task has been completed. If the code of the task is available, an easier solution is to instrument the task so that it sends a signal whenever it finishes. If no 'missed' job occurs during the time the projection is being stored, then the upper bounds that we derive in Sects. 4.1.2 and 4.1.3 can be used.
Let $P^T_i$ be a ternary projection and $x$ be a time instant at which $p_x = 1$ and $\exists z < x$ in the ternary projection such that $p_z = \text{idle}$. Then, the beginning of the effective interval that contains the time instant $x$, called the effective point (denoted by $I^s(x, z)$), is a function that returns the latest idle-time prior to the execution of $\tau_i$, namely,
$$I^s(x, z) = \max\{\, y \mid z \le y < x \,\wedge\, p_y = \text{idle} \,\}. \tag{10}$$
Note that the effective points are only defined for time slots $x$ in which $p_x = 1$. By traversing through the projection once, one can obtain the starting points of all effective intervals.
In the example shown in Fig. 5a and b, the effective points are $I^s$ = ⟨9, 14, 18, 28, 33, 39⟩. Note that when calculating $I^s(31, 18)$, the idle slot at time 26 does not satisfy the conditions of Eq. (10) because there exists another idle slot at a later time than 26, i.e., at time 28. If it is not certain that the starting point of the ternary projection was aligned with an idle time, the first idle slot in the projection will be considered as the first effective point.
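The single-pass extraction of effective points can be sketched as follows. The encoding of the ternary projection (1 when the task occupies the resource, 0 when another task does, and the string "idle" otherwise) and the example projection are assumptions made for this illustration; the example is not the one of Fig. 5.

```python
IDLE = "idle"  # assumed encoding: 1 = task runs, 0 = other task runs


def effective_points(projection):
    """Collect, in one pass, the latest idle slot before each execution."""
    points = []
    last_idle = None
    for x, symbol in enumerate(projection):
        if symbol == IDLE:
            last_idle = x
        elif symbol == 1 and last_idle is not None:
            # Latest idle slot prior to the execution at time x.
            if not points or points[-1] != last_idle:
                points.append(last_idle)
    return points


# Hypothetical projection of a task with period 5 released at 1, 6, 11, 16.
projection = [IDLE, 1, 0, 0, IDLE, 0, 0, 1, IDLE, IDLE,
              IDLE, 1, IDLE, IDLE, IDLE, IDLE, 0, 1, IDLE]
print(effective_points(projection))  # [0, 4, 10, 15]
```

Idle slots that are followed by another idle slot before the next execution (such as slots 8 and 9 above) are skipped, which mirrors the treatment of the idle slot at time 26 in the example of Fig. 5.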

Deriving an upper bound for tasks with no release jitter
We start with a case where the task under analysis does not have release jitter. Later (in Sect. 4.1.3), we will extend our discussions to tasks with bounded release jitter.
Let $I^s_{j-1}, I^s_j, I^s_{j+1} \in I^s$ be three consecutive effective points in the ternary projection $P^T_i$. In order to obtain an upper bound on the period, we calculate the largest possible distance between the releases of two consecutive jobs of the task that have been released in the intervals $[I^s_{j-1}, I^s_j)$ and $[I^s_j, I^s_{j+1})$. To achieve this goal, we calculate the earliest possible release (denoted by $e(I^s_{j-1}, I^s_j)$) of a job of $\tau_i$ that is released in the interval $[I^s_{j-1}, I^s_j)$ and the latest release time of the first job of $\tau_i$ that is released in the interval $[I^s_j, I^s_{j+1})$ (denoted by $l(I^s_j, I^s_{j+1})$). Since the ternary projections do not include any special information that allows us to distinguish two jobs of the same task from each other, and since we have no knowledge about the execution time of the task, apart from that the BCET is not zero (i.e., the task does not skip a job), the earliest time at which a job of $\tau_i$ might have been released in the interval $[I^s_{j-1}, I^s_j)$ is:
$$e(I^s_{j-1}, I^s_j) = I^s_{j-1} + 1. \tag{11}$$
To obtain $l(I^s_j, I^s_{j+1})$, we find the earliest time at which a job of task $\tau_i$ has occupied the resource in the interval $[I^s_j, I^s_{j+1})$. Namely,
$$l(I^s_j, I^s_{j+1}) = \min\{\, k \mid I^s_j < k < I^s_{j+1} \,\wedge\, p_k = 1 \,\}. \tag{12}$$
Note that more than one job might have been released in the interval $[I^s_{j-1}, I^s_j)$. For example, in Fig. 5a and b, the interval [18, 28) contains two actual jobs of $\tau_i$ (released at time instants 20 and 25) but we assume there is only one (which is released at time e(18, 28) = 19) since we have no evidence in the projection that suggests that the occupation of the resource at the time slot 25 belongs to a new job of $\tau_i$. The next step is to obtain an upper bound on $T_i$ using the difference between $e(I^s_{j-1}, I^s_j)$ and $l(I^s_j, I^s_{j+1})$:
$$T_i \le l(I^s_j, I^s_{j+1}) - e(I^s_{j-1}, I^s_j). \tag{13}$$
The following theorem proves that Eq. (13) is a sound upper bound for the period.
Proof The proof is trivial. By the definition of effective points, we know that there is at least one time instant in each of the intervals $I_{j-1}$ and $I_j$ at which task $\tau_i$ has occupied the resource. Since the scheduling policy is work-conserving and at time $I^s_{j-1}$ the resource was idle, the earliest time at which a job of task $\tau_i$ could have been released in the interval $I_{j-1}$ is $I^s_{j-1} + 1$ (calculated by Eq. 11). Moreover, from Eq. (12), we know that $l(I^s_j, I^s_{j+1})$ is the earliest instant at which the task has occupied the resource within the interval $I_j$. Hence, the latest release of the first job of the task within this interval must have been at or before $l(I^s_j, I^s_{j+1})$. Consequently, the distance between two releases of the task $\tau_i$ in the intervals $I_{j-1}$ and $I_j$ cannot be larger than $l(I^s_j, I^s_{j+1}) - e(I^s_{j-1}, I^s_j)$. Hence, Eq. (13) provides a safe upper bound on the period of $\tau_i$. ◻
Real-Time Systems (2022) 58:313-357
Figure 5c shows how to calculate four upper bounds for $T_i$ from different sets of effective intervals in the example shown in Fig. 5a. These upper bounds are 6, 7, 12, and 6. Thus, $T_i \le 6$ is the tightest upper bound that our SPM method obtains from the ternary projections for this example.
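The computation of Eqs. (11) to (13) over every three consecutive effective points can be sketched as follows. The projection encoding (1, 0, and the string "idle"), the function name, and the example projection with its effective points are assumptions of this illustration and do not correspond to Fig. 5.

```python
IDLE = "idle"  # assumed encoding: 1 = task runs, 0 = other task runs


def period_upper_bounds(projection, points):
    """Return one upper bound per triple of consecutive effective points."""
    bounds = []
    for j in range(1, len(points) - 1):
        earliest = points[j - 1] + 1                      # Eq. (11)
        executions = [k for k in range(points[j] + 1, points[j + 1])
                      if projection[k] == 1]
        if not executions:
            continue  # not an effective interval; no bound from this triple
        latest = min(executions)                          # Eq. (12)
        bounds.append(latest - earliest)                  # Eq. (13)
    return bounds


# Hypothetical projection of a task with period 5 released at 1, 6, 11, 16,
# together with its effective points.
projection = [IDLE, 1, 0, 0, IDLE, 0, 0, 1, IDLE, IDLE,
              IDLE, 1, IDLE, IDLE, IDLE, IDLE, 0, 1, IDLE]
bounds = period_upper_bounds(projection, [0, 4, 10, 15])
print(bounds, min(bounds))  # [6, 6] 6
```

In this toy example the tightest bound (6) is indeed not smaller than the actual period (5). When no triple yields a bound, the upper bound should be treated as ∞.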
It is worth noting that the difference between the upper bound in the current paper and in our prior work (Vădineanu and Nasri 2020) is that here, we calculate the latest arrival time of the first job of the task in the interval $[I_j, I_{j+1})$, whereas in our prior work, we calculated the last arrival time of a job of the task in the interval $[I_j, I_{j+1})$. This has been captured by Eq. (10) in Vădineanu and Nasri (2020) as follows:
$$fin(I_j, I_{j+1}) = \max\{\, k \mid I^s_j < k < I^s_{j+1} \,\wedge\, p_k = 1 \,\wedge\, \forall p_y,\ k < y < I^s_{j+1},\ p_y \neq 1 \,\}.$$
As can be seen, $fin(I_j, I_{j+1})$ produces a value that is always larger than or equal to $l(I_j, I_{j+1})$ (defined in Eq. 12). Hence, the new upper bound in Theorem 1 is always smaller than or equal to the upper bound in Vădineanu and Nasri (2020).

Deriving an upper bound for tasks with bounded release jitter
When a task has release jitter, Eq. (13) may not hold anymore. For example, by applying Eq. (13) on the intervals [30, 34) and [34, 39) in the example shown in Fig. 6a (which represents a periodic task with at most two units of positive release jitter, i.e., $\sigma_i = 2$), one may mistakenly conclude that the period must be smaller than or equal to 4 because e(30, 34) = 31 and l(34, 39) = 35. However, by looking at the actual release times of the task, we see that the idle slot at time 30 is caused by the release jitter of a job of $\tau_i$ that has been released at time 32 instead of 30.
Assumptions and requirements. To be able to derive an upper bound on the period of a task that has release jitter, we need to know the maximum amount of release jitter that the task may suffer (i.e., $\sigma_i$). Such information is typically available when the period inference framework is used for runtime monitoring of a known system. If the exact value of $\sigma_i$ is not known, it is fine to use a safe upper bound on it, if available.
If no safe upper bound on the maximum release jitter can be provided, then only RPM and RPMPA (but not the SPM) solutions can be used to estimate the period. As we will see later in Sect. 5.3, these two solutions can accurately predict the period even when there is release jitter.
Solution idea. Deriving an upper bound for $T_i$ requires finding an upper bound on the largest distance between arrival times of two consecutive jobs of the task, where the arrival time is the expected release time when there is no release jitter³. We will derive the latter upper bound by calculating a lower bound on the arrival time of a job released in the effective interval $I_{j-1}$ and an upper bound on the arrival time of the next job released in the effective interval $I_j$.
From the definition of the effective intervals, we know that the task has occupied the resource during the interval $[I^s_{j-1}, I^s_j)$. The earliest time at which a job of the task could actually be released in this interval is $I^s_{j-1} + 1$ (since the resource was idle at $I^s_{j-1}$). However, in the presence of release jitter, the arrival time of that job could be earlier than $I^s_{j-1} + 1$. By subtracting the maximum value of release jitter, i.e., $\sigma_i$, from the release time, we obtain a safe lower bound on the earliest possible arrival time of that job, namely $I^s_{j-1} + 1 - \sigma_i$. Since we consider positive release jitter, the actual release time of a job is already an upper bound on the arrival time of that job. Hence, we can use Eq. (12) to obtain an upper bound on the arrival time of the "next" job of the task (in the effective interval $[I_j, I_{j+1})$). Hence, the new upper bound on the period of a task with at most $\sigma_i$ units of release jitter is
$$T_i \le l(I^s_j, I^s_{j+1}) - \left(I^s_{j-1} + 1 - \sigma_i\right). \tag{14}$$
Proof The proof is trivial and follows from the above discussion. ◻
Figure 6c shows the calculations of the upper bound for the example shown in Fig. 6a. As can be seen, there are four upper bounds for $T_i$ and the tightest one is 6.
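As a worked instance of the jitter-aware bound, consider again the intervals [30, 34) and [34, 39) of Fig. 6a, for which e(30, 34) = 31 and l(34, 39) = 35, and a task with at most two units of release jitter. Writing $\sigma_i$ for the maximum release jitter (the symbol is an assumption of this sketch), the earliest possible arrival is moved back by $\sigma_i$ before taking the difference:

```latex
\begin{align*}
T_i &\le l(34, 39) - \bigl(e(30, 34) - \sigma_i\bigr) \\
    &= 35 - (31 - 2) \\
    &= 6 .
\end{align*}
```

Without the jitter term, the same pair of intervals would give 35 − 31 = 4, which is the mistaken conclusion discussed at the beginning of this section.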

Calculating a lower bound for period
Assumptions. To obtain a lower bound, we would need the following assumptions: (A1) the scheduling policy is work-conserving, (A2) the task under analysis does not skip a job (e.g., there is no execution path in the task that has zero execution time and the activation of this task is not conditional to some external events) and the task does not self-suspend, (A3) the task has a constrained deadline and the projection does not contain any deadline misses.
If any of these assumptions does not hold, the lower bound on the period will be 0. Note that A3 can be known, for example, in systems that are equipped with separate monitoring tools that report any deadline miss or dropped jobs to the period-inference tool.
Key idea. To obtain a lower bound on the period, we extract the largest interval with length L in which the task $\tau_i$ does not occupy the resource. This can be obtained as follows:
$$L = \max\{\, b - a \mid p_a = p_b = 1 \,\wedge\, \forall j,\ a < j < b,\ p_j \neq 1 \,\}. \tag{15}$$
Equation (15) finds the largest interval [a, b] in the projection between two time instants a and b, such that the task under analysis has occupied the resource at time a and b but not between them, i.e., $p_a = p_b = 1$ and $\forall j,\ a < j < b,\ p_j \neq 1$.
Under assumptions A1, A2, and A3, the length of the largest interval during which no job of the task $\tau_i$ has occupied the resource is upper bounded by $|L| \le 2 \cdot T_i$. The reason is that in the worst case, the largest interval during which the task does not occupy the resource happens when one job of the task is executed right after its arrival time and the next job completes right before its deadline, resulting in a value slightly smaller than $2T_i$. Given that the periodogram can contain many peaks at small periods, having a lower bound can help reduce the error efficiently. Note that this bound holds whether the task has release jitter or not.
Theorem 3 Given an interval $g = (a, b)$ during which $\tau_i$ is not present on the projection, i.e., $p_{a-1} = p_b = 1 \,\wedge\, \forall j,\ a \le j < b,\ p_j \neq 1$, if assumptions A1, A2, and A3 in Sect. 4.1.4 hold, then $T_i > 0.5\,|g|$ is a safe lower bound on the period of the task.
Proof The proof is trivial and follows the above discussion. According to A2 and A3, the task under analysis does not miss a job and does not have a tardy job; hence, its earliest finish time is when it starts its execution right at its arrival time and it has at most one unit of execution. From A3, we know that the latest theoretical upper bound on the completion time of a job is when it completes at its deadline. Since A3 assumes a constrained deadline, an upper bound on the deadline is the period of the task. Now putting these two facts together, the largest interval during which a job of the task does not appear on the projection happens when one job completes as early as possible, i.e., if it is supposed to arrive at $t_1$, it completes at $t_1 + 1$, and the 'next' job completes as late as possible, i.e., at $t_2 = (t_1 + T_i) + T_i$. Consequently, the largest interval during which the task is not executing is upper bounded by $t_2 - t_1 = 2T_i$, in the worst case. Hence, if an interval g is found during which the task does not occupy the resource, $|g| < 2T_i$ because otherwise the task must have had a deadline miss (which would violate A3). ◻
It is worth noting that any interval $g = (a, b)$ that is consistent with Theorem 3 can be used to derive a lower bound on $T_i$ regardless of what has occupied the resource during that interval (i.e., the resource might be idle or executing some tasks other than $\tau_i$). However, such a lower bound might be too small (and hence ineffective). For example, one lower bound that can be obtained from Fig. 6a is for the interval [36, 37) which will result in $T_i > 0.5$. Obviously, it is less effective than the lower bound that is obtained from interval [26, 32) which results in $T_i > 2.5$.
Since both the lower bound and the upper bound can be calculated at the same time (by passing through the projection only once), they have a linear time complexity w.r.t. the projection length.
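The lower bound of Theorem 3 can be computed in the same single pass, as the following sketch shows. The projection encoding (1, 0, and the string "idle"), the function name, and the example projection are assumptions of this illustration; the gap length follows the convention of Theorem 3, i.e., it counts the slots strictly between two executions of the task.

```python
IDLE = "idle"  # assumed encoding: 1 = task runs, 0 = other task runs


def period_lower_bound(projection):
    """Return 0.5 * |g| for the longest gap g between two executions."""
    longest_gap = 0
    last_execution = None
    for x, symbol in enumerate(projection):
        if symbol == 1:
            if last_execution is not None:
                # Slots strictly between the two executions of the task.
                longest_gap = max(longest_gap, x - last_execution - 1)
            last_execution = x
    return 0.5 * longest_gap


# Hypothetical projection of a task with period 5 released at 1, 6, 11, 16.
projection = [IDLE, 1, 0, 0, IDLE, 0, 0, 1, IDLE, IDLE,
              IDLE, 1, IDLE, IDLE, IDLE, IDLE, 0, 1, IDLE]
print(period_lower_bound(projection))  # 2.5, i.e., the period exceeds 2.5
```

A projection with fewer than two executions yields 0, which matches the trivial lower bound used when no information is available.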

Improving the accuracy for quaternary projections
Key idea. Quaternary projections contain information about the intervals during which the lower-priority tasks were occupying the resource. Under a fixed-priority scheduling policy, we know that if a lower-priority task is executing, then the task under analysis must have been completed (otherwise, the assumption about the scheduling policy will be violated). As a result, we can treat the moments/intervals during which a lower-priority task has occupied the resource as "idle instants" when calculating the upper bound on the period.
More formally, when obtaining the upper bound on the period of a task, it is possible to create an augmented ternary projection $P^{T'}_i$ from a quaternary projection $P^Q_i$ using a filter f that converts the 'low' symbols in the quaternary projection to 'idle' symbols in the augmented ternary projection, defined as follows:
Definition 4 An augmented ternary projection $P^{T'}_i$ derived from a quaternary projection $P^Q_i$ for task $\tau_i$ is defined as
$$P^{T'}_i = \langle f(p_x) \mid p_x \in P^Q_i \rangle, \qquad f(p_x) = \begin{cases} \text{idle} & \text{if } p_x = \text{low}, \\ p_x & \text{otherwise.} \end{cases}$$
To use quaternary projections to derive an upper bound on the period, we need the following assumptions: (A1) the scheduling policy is work-conserving, (A2) the task under analysis does not skip a job (the BCET of the task is not zero) or self-suspend, and (A3) the system is scheduled by a preemptive fixed-priority scheduling policy.
Lemma 1 Under the assumptions A1, A2, and A3 (Sect. 4.2), at any time slot at which the processor is idle or a task with a lower priority than $\tau_i$ has occupied the resource, the task $\tau_i$ cannot have a pending job.
Proof The proof is a direct conclusion of scheduling the task set with a work-conserving preemptive fixed-priority scheduling policy and the fact that the task under analysis does not suspend itself and does not skip a job. Namely, whenever it is released, no other low-priority task can occupy the resource. Hence, if a low-priority task has occupied the resource, the task under analysis must not have a job in the ready queue (a job that has been released but has not completed). ◻
Lemma 1 allows us to treat augmented ternary projections (Definition 4) as a normal ternary projection when deriving the effective intervals. Figures 5d and 6d show the effective points obtained from the augmented ternary projections and their impact on tightening the upper bound on the period. Later in Sect. 5.7 we will empirically compare the bounds obtained from ternary and quaternary projections.
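The filter that turns a quaternary projection into an augmented ternary projection can be sketched as follows. The symbol encoding (1 for the task itself, 0 for other non-lower-priority tasks, and the strings "low" and "idle") and the function name are assumptions of this illustration.

```python
IDLE = "idle"
LOW = "low"  # assumed symbol for slots occupied by a lower-priority task


def augment(quaternary_projection):
    """Replace every 'low' symbol by 'idle'; keep all other symbols."""
    return [IDLE if symbol == LOW else symbol
            for symbol in quaternary_projection]


quaternary = [IDLE, 1, 0, LOW, LOW, 0, 1, LOW, IDLE]
print(augment(quaternary))
# ['idle', 1, 0, 'idle', 'idle', 0, 1, 'idle', 'idle']
```

The augmented projection typically contains more idle slots than the plain ternary projection, so it can expose additional effective points and hence tighter upper bounds.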

Empirical results
We performed a set of experiments to answer the following questions: (i) Does our framework improve the accuracy w.r.t. the state of the art? (ii) How do various families of RBML methods compare against each other? (iii) How robust is our solution against uncertainties and non-deterministic events? (iv) What are the tradeoffs between the accuracy, runtime, and the memory requirements of various RBML methods? and (v) How well does our solution generalize to systems that are widely different from those on which it was trained? Questions (i) and (ii) are addressed throughout the evaluation section. Question (iii) is answered in Sect. 5.4, and finally, Sect. 5.6 focuses on questions (iv) and (v). We divided our task systems into three groups: periodic task systems where every task is periodic but tasks might have release jitter or execution time variation (Sects. 5.3 and 5.6), non-periodic task systems, where the task under analysis is periodic but the rest of the system might not be periodic (Sect. 5.4), and case studies from actual systems (Sect. 5.5). The source code and our evaluation framework for these experiments are both available on GitHub (Vădineanu 2020).

Experimental setup
For the experiments in Sects. 5.3, 5.4, and 5.6, we considered two types of task sets: automotive benchmark application and synthetic task sets. For the automotive benchmark applications, we adopted the model proposed by Kramer et al. (2015) for task sets used in automotive industry, where task periods are chosen randomly from {1, 2, 5, 10, 20, 50, 100, 200, 1000}ms with a non-uniform distribution provided by Kramer et al. (2015). For simplicity, we refer to the traces of these task sets as automotive traces.
Our synthetic task sets are comprised of non-harmonic periods. In order to ensure that the chosen periods cover evenly all magnitudes, we used a log-uniform distribution as suggested and described by Emberson et al. (2010). The periods are thereby generated for the range [100, 10000] with a base period of 100ms. For simplicity, we refer to the traces of these task sets as log-uniform traces. We use Stafford's Randfixedsum algorithm which is also used by Emberson et al. (2010) to generate random utilization values for the tasks and then use the utilization and the period to calculate the WCET of each task.
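The period and WCET generation can be sketched as follows. The sketch assumes the commonly used form of the log-uniform generator attributed to Emberson et al. (2010), in which a value is drawn uniformly in the logarithmic domain and rounded down to a multiple of the base period; the function names are hypothetical, and the utilization values are taken as given instead of being produced by the Randfixedsum algorithm.

```python
import math
import random


def log_uniform_periods(n, t_min=100, t_max=10000, base=100, rng=random):
    """Draw n periods log-uniformly in [t_min, t_max], multiples of base."""
    periods = []
    for _ in range(n):
        r = rng.uniform(math.log(t_min), math.log(t_max + base))
        period = int(math.exp(r) // base) * base
        # Clamp to guard against floating-point effects at the interval ends.
        periods.append(min(max(period, t_min), t_max))
    return periods


def wcets(utilizations, periods):
    """WCET of each task is its utilization multiplied by its period."""
    return [u * t for u, t in zip(utilizations, periods)]


rng = random.Random(42)  # fixed seed only to make the example repeatable
periods = log_uniform_periods(4, rng=rng)
print(periods, wcets([0.1, 0.05, 0.1, 0.05], periods))
```

Drawing in the logarithmic domain gives each order of magnitude roughly the same share of periods, which is the "evenly covering all magnitudes" property mentioned above.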
To generate the traces, we use Simso (Chéramy et al. 2014), an open source and flexible simulation tool that generates schedules under various scheduling policies and setups.
Evaluation strategy. The data set used for training the regression models is composed of the projections from 2000 traces (we saw no benefit in increasing the data set size in our preliminary experiment). The length of a trace is set to either six hyperperiods (traces without random variations) or ten hyperperiods (when there is execution time variation or release jitter) for the experiments in Sects. 5.3 to 5.5. The same trace lengths are used for the testing to capture enough random behavior. In Sect. 5.6, we specifically investigate the impact of trace length on the accuracy of testing.
Metric. The metric we use to evaluate the accuracy is the average error, which is the mean of the individual errors a method makes for every period in a test set unless it is explicitly stated that the error has been obtained for only one task in the task set. Furthermore, we calculate the error of one experiment (that includes 2000 task sets) by using fivefold cross-validation. Namely, we divide the data set into five randomly chosen subsets of equal size. Out of the five subsets, four are used for training and one is used for testing. We measure the error of the testing and repeat the process until all five subsets have been used once for testing.
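The fivefold cross-validation procedure can be sketched as follows. The function name, the `fit` and `predict` callables, and the use of the absolute difference between the estimated and the true period as the individual error are assumptions of this illustration; the error definition used in the experiments may differ.

```python
import random


def cross_validated_error(samples, fit, predict, k=5, seed=0):
    """Average test error over k folds; samples are (features, period)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k subsets of similar size
    fold_errors = []
    for i, test_fold in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = fit(training)
        errors = [abs(predict(model, x) - y) for x, y in test_fold]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


# Toy model: always predict the mean period seen during training.
def fit_mean(training):
    return sum(y for _, y in training) / len(training)


def predict_mean(model, features):
    return model


data = [(i, 100) for i in range(10)]
print(cross_validated_error(data, fit_mean, predict_mean))  # 0.0
```

Every sample is used exactly once for testing, so the reported error is not biased by a particular choice of the test set.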
Baselines. We considered three baselines: (i) PeTaMi, a mining algorithm for periodic tasks (Iegorov et al. 2017), (ii) periodogram (Schuster 1898), and (iii) autocorrelation (Gubner 2006). PeTaMi represents the state of the art on period inference in the real-time systems community, while the other two represent widely used solutions from the signal-processing literature. These two were chosen to evaluate the improvements made by our RPM and RPMPA over solutions that are (only) based on signal-processing techniques.
We compare the RBML methods mentioned in Table 1, denoted by cubist (Quinlan 2014), gbm (Friedman 2002), avNNet (Ripley 2007), extraTrees (Geurts et al. 2006), bartMachine (Chipman et al. 2010), and svr (Cortes and Vapnik 1995). Each of these methods is defined by a set of hyperparameters that require tuning to improve the model's fit on the data. Hence, we performed an additional tuning phase using random search on the parameter space. This step was integrated in the cross-validation process such that every training set, composed of four subsets, is further split into a training and a validation set. The parameters are varied while the model is trained on the training set, and the model's performance is estimated on the validation set. The purpose of doing one more split is to avoid bias by not involving the test set in the parameter choice.
To be able to focus on the accuracy of the RBML methods, we only show the results of the RPM method in Figs. 7, 8a to o, 12, and 15. We compare the accuracy of RPM with RPMPA in Figs. 8p to r and 9; Tables 2 and 4. We performed our evaluation on a Dutch supercomputer based in the cloud. We used thin nodes with 2 × 16-core 2.6 GHz Intel Xeon E5-2697A v4 (Broadwell) and 64 GB of memory.

Parameter tuning
Before evaluating our solutions, we need to determine their parameters, i.e., the number of features for RPM and the number of candidates for RPMPA, since they impact the solution's accuracy. The evaluation from Fig. 7a was performed on an aggregated data set, containing automotive traces with four levels of utilization (0.3, 0.5, 0.7, and 0.9). Similarly, for the second experiment from Fig. 7b, we used data sets incorporating the four utilization values and we also kept 20% execution time variation for the tasks in both data sets. We picked extraTrees, since it is a representative member of tree-based algorithms and is less affected by the increase in the number of its features in terms of runtime. The experiments were conducted by generating 20 random splits of the data set into training and testing sets (for every parameter value). The model would then be fit on the training data and the average error measured on the test data.
Fig. 7 The impact of the number of features (for RPM) and the number of candidates (for RPMPA) on the solution's accuracy. Note that the shade around the curves represents the confidence intervals for 0.95 confidence level
Figure 7a shows how the error for extraTrees decreases when we include more features. The gain in accuracy becomes insignificant after adding more than three features from periodogram and autocorrelation. Thus, we kept three features from each of the periodogram and autocorrelation (i.e., six in total). We further analyzed the impact of the number of candidates for RPMPA method on accuracy. As shown in Fig. 7b, a relatively small number of candidates is required in order to achieve a low error until it reaches saturation.

Assessing accuracy in periodic systems
Impact of system utilization. Figure 8a and b show the average error as a function of the total utilization for task sets with 8 tasks. The error of the regression models increases with the utilization (which in turn increases the number of preemptions). Furthermore, we observe a dramatic reduction in PeTaMi's accuracy when it is applied to log-uniform traces. This decrease is a result of having nonharmonic periods in log-uniform traces. In contrast, the accuracy of our regression-based solutions is not negatively affected by nonharmonic periods.
Impact of the number of tasks. Figure 8c and d show that, for some of the tree-based solutions such as gbm, the error reduces when there are more tasks in log-uniform traces. This is due to the decrease in the individual task utilization. Thus, although the system is equally congested, the individual projection of a task contains larger idle intervals and shorter execution times that are unlikely to be preempted much. This enables the periodogram to extract more meaningful features. However, in automotive task sets, the algorithms are rather unaffected by the number of tasks in the trace since they already perform well even for a lower number of tasks.
Impact of execution time variations. From Fig. 8e and f, we observe that the regression-based methods are more robust to runtime execution-time variations than the baselines, showing a similar trend for both types of traces to the case with constant execution time. Also, with an increase in the execution time variation in Fig. 8g and h, we notice that most of the RBML methods are robust (w.r.t. this variation) for automotive traces, while for log-uniform traces, the error decreases with the increase in the execution-time variation. This behavior is due to the reduction in the average execution time of individual tasks. Since the execution time of a job is drawn from a uniform distribution in the range [(1 − variation factor) × WCET, WCET], the wider the interval becomes, the lower the average execution time is. A smaller execution time is associated with a lower system utilization, and we previously observed that the methods perform better at lower utilization values.
Fig. 9 Space-pruning for traces with jitter
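The link between the execution-time variation factor and the resulting utilization can be made concrete with a short sketch. It only uses the fact that the mean of a uniform distribution is the midpoint of its range; the function names are hypothetical, and the utilization is assumed to scale linearly with the mean execution time because the periods are unchanged.

```python
def mean_execution_time(wcet, variation):
    """Mean of a uniform distribution on [(1 - variation) * wcet, wcet]."""
    if not 0.0 <= variation <= 1.0:
        raise ValueError("variation must lie in [0, 1]")
    low = (1.0 - variation) * wcet
    return (low + wcet) / 2.0


def effective_utilization(nominal_utilization, variation):
    """Average utilization when every job runs for its mean execution time."""
    if not 0.0 <= variation <= 1.0:
        raise ValueError("variation must lie in [0, 1]")
    return nominal_utilization * (1.0 - variation / 2.0)
```

For example, `effective_utilization(1.0, 0.5)` evaluates to 0.75, i.e., a nominal utilization of 100% with a variation factor of 0.5 corresponds to an average utilization of about 75%.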
Impact of release jitter. Figure 8i and j show that the release jitter has a much bigger impact on the error than the execution-time variation. One possible explanation is that the periodogram, which provides most of the information to the algorithms, is negatively impacted by jitter, thus, it produces less useful features for training. However, we observe that some regression algorithms such as extraTrees and cubist are still able to keep a low error even for this challenging scenario.
Impact of candidate adjustment method (RPMPA). Figure 8p and q show that RPMPA has about 50% less error than RPM for most RBML methods when used for cases with execution time variation and, implicitly, on ideal traces too (i.e., traces that do not have variations in the execution time or release time of the tasks). However, when the signal-processing techniques (periodogram and autocorrelation) are disturbed, as is the case for release jitter, the period adjustment step has a negative impact on the accuracy. Later in Table 4, we see a similar pattern for systems scheduled by non-preemptive scheduling policies.
Impact of space-pruning method (SPM). By analyzing Fig. 8r we observe that the inclusion of an upper and a lower bound for SPM contributes to reducing the error even further, showing that the regression is still prone to mistakes even when choosing candidates. However, this solution is expected to show little benefit for systems with large utilization, where fewer idle intervals are present. The reason is that, in that case, SPM provides upper bounds that are so large that they do not contribute much to filtering infeasible candidates. As we will show in Sect. 5.7, using quaternary projections can significantly reduce these period bounds for the SPM method.
While conducting this experiment, we noticed that under specific setups there can be cases where no candidates are left after the pruning phase. This situation occurs most frequently when the release time of the jobs is affected by jitter so much that the signal-processing techniques only generate useless candidates that fall outside of the valid period bounds. As a consequence, when analyzing the effect of release jitter, we defined two criteria to provide a period estimate when no candidate is available. Namely, we either select the output of the regression (SPM-R) or we select the upper bound (SPM-UB). Figure 9 shows the results for SPM on traces with jitter. In all cases, both versions of SPM succeed in reducing the error of RPMPA by 45 percentage points. Also, SPM-UB achieves an average error below that of RPM for cubist, gbm, and bartMachine, while for extraTrees, although it has a larger error, it presents a much narrower confidence interval. Thus, we can expect the estimate of SPM-UB based on extraTrees to be more reliable than that of the corresponding RPM.
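The pruning step and its two fallback criteria can be sketched as follows. This is only an illustration: the rule for choosing among the surviving candidates (here, the one closest to the regression output) is an assumption, and the function and parameter names are hypothetical.

```python
def spm_estimate(candidates, regression_estimate, lower_bound, upper_bound,
                 fallback="regression"):
    """Prune period candidates with the bounds, then pick an estimate.

    fallback="regression"  mimics SPM-R  (use the regression output).
    fallback="upper_bound" mimics SPM-UB (use the upper bound).
    """
    feasible = [c for c in candidates if lower_bound <= c <= upper_bound]
    if feasible:
        # Assumed selection rule: the feasible candidate nearest to the
        # regression output wins.
        return min(feasible, key=lambda c: abs(c - regression_estimate))
    if fallback == "regression":
        return regression_estimate
    if fallback == "upper_bound":
        return upper_bound
    raise ValueError("fallback must be 'regression' or 'upper_bound'")
```

With bounds [8, 12] and a regression output of 10.3, the candidates [5.0, 9.8, 20.0] yield 9.8, whereas the candidates [3.0, 40.0] are all pruned and the result depends on the chosen fallback.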

Assessing robustness
The next question is how robust our solution is w.r.t. uncertainties in the underlying system that may drastically influence the traces generated from it. In the rest of this section, we evaluate the robustness of our solution in the presence of (i) higher-priority aperiodic tasks (Sect. 5.4.1), (ii) dropped or discarded jobs (Sect. 5.4.2), (iii) overloads (Sect. 5.4.3), and (iv) initial offsets (Sect. 5.4.4) in the system.

Robustness w.r.t. the presence of higher-priority aperiodic tasks
For this experiment, we considered a configuration consisting of 12 automotive tasks (6 periodic and 6 sporadic tasks) scheduled by the Rate Monotonic scheduling policy and interfered with by high-priority aperiodic tasks that arrive according to a Poisson process with a rate λ = 0.0005 events/ns (namely, roughly 5 arrivals in every 10 µs). Furthermore, we focused on analyzing one periodic task in scenarios where it has a high, medium, or low priority. For each of the three priority scenarios, the task's priority was chosen randomly in the ranges [1, 3] for high, [4, 7] for medium, and [8, 12] for low priority. Figure 8m to o show the error as a function of utilization for the tree-based algorithms, the periodogram, and PeTaMi. The periodogram is affected significantly when the priority changes from high to low (comparing Fig. 8m and o) due to the increase in the number of preemptions in low-priority tasks, which in turn causes more noise in the periodogram. In contrast, the error of the RPM algorithms increases only slightly at large utilization values. We also see that the error of gbm and bartMachine at low utilization values is smaller when the task under analysis has a low priority. This is because these two algorithms may not be able to generalize well when the periodogram has a low error. Having a low error for the periodogram means having less significant (shorter) peaks, which in turn do not provide enough information for these algorithms to excel.
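As an illustration of the aperiodic interference, arrival times of a Poisson process can be generated by accumulating exponentially distributed inter-arrival times. The sketch below is not the trace generator used in the experiments; the function name and the seed handling are hypothetical.

```python
import random


def poisson_arrivals(rate_per_ns, horizon_ns, seed=None):
    """Arrival times (in ns) of a Poisson process within [0, horizon_ns)."""
    rng = random.Random(seed)
    arrivals = []
    # Inter-arrival times of a Poisson process are exponential with the
    # same rate parameter.
    t = rng.expovariate(rate_per_ns)
    while t < horizon_ns:
        arrivals.append(t)
        t += rng.expovariate(rate_per_ns)
    return arrivals
```

With `rate_per_ns = 0.0005`, the expected number of arrivals in a window of 10 µs (10,000 ns) is 0.0005 × 10,000 = 5, which matches the figure quoted above.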

Robustness w.r.t. dropping jobs
Next, we explore the impact of having missed (dropped) jobs in the input projections on the accuracy of our solutions. The setup includes 10 automotive tasks. We consider two scenarios: (i) the task under analysis has dropped jobs (with a 15% probability), and (ii) all the other tasks have dropped jobs. Figure 8k and l show that all algorithms exhibit a relatively higher error when there are dropped jobs and the utilization is higher, in comparison with the experiments with no dropped jobs (e.g., comparing Fig. 8a with k or l). This increase is due to the fact that projections are imperfect and can even be misleading when some jobs are dropped. Moreover, the periodogram is affected notably when the task under analysis drops jobs. The RBML methods, however, show little variation from one case to the other, and they are still able to retain meaningful information from their features even when the task under analysis has a low utilization.

Robustness w.r.t. permanent overloads
For this case, we experimented with task sets whose total utilization exceeded 100%. A consequence of such an overload is that lower-priority tasks will experience starvation (i.e., these tasks will not get any opportunity to be executed). Since the projections of starved tasks do not include any information about their periodicity, we excluded such tasks from both the training and testing phases. Figure 10 illustrates the effect of tardiness on the performance of the four tree-based regression algorithms and of the two signal-processing techniques as a function of the execution time variation. In Fig. 10a we observe a decrease in the error with respect to the execution-time variation factor when the total utilization of the system is 100%. This trend is due to the decrease in the average utilization as the variation factor increases. For instance, for a variation factor of 0.5, the execution time of the tasks is uniformly drawn from [0.5 × WCET, WCET], which implies an average execution time of 0.75 × WCET; hence, the total utilization will be around 75% rather than 100%.
However, in Fig. 10c we observe the opposite trend for 140% utilization. In this case, the smaller error values when there is no execution time variation are due to the elimination of the tasks suffering from starvation (as mentioned earlier, they disappear from the trace).
On the other hand, we see an increase in the error for larger values of the variation factor. This is due to the large execution time variations, which allow some of the previously starved low-priority tasks to find chances to be executed. However, since these chances appear randomly and infrequently, the resulting projections are not consistent enough to provide meaningful data for the regression algorithms during either the training or the testing phase. Figure 10b reflects the combination of the observations for the previous two cases. Until 30% execution time variation, the total utilization does not fall under 100%; hence, Fig. 10b shows a similar trend to Fig. 10c. However, when the variation increases further, the curve becomes similar to Fig. 10a, since the system now allows more systematic running intervals for the lower-priority tasks.

Robustness w.r.t. offsets
So far, the majority of our experiments were focused on synchronous tasks (namely, the initial offset of the tasks was zero). However, there are many scenarios where the presence of offsets is an essential element of the system, e.g., when the trace has been gathered from messages that transfer over a controller area network (CAN) and are generated by unsynchronized nodes.
We performed an experiment similar to the one for periodic systems (Sect. 5.3) where we varied the utilization of the task set. For this experiment, we used automotive task sets with 10 periodic tasks, 50% execution time variation, and 10% release jitter. The offsets were randomly selected from the range [0, H/2] with a uniform distribution, where H is the hyperperiod of the tasks. The results are presented in Fig. 11.
The first observation in Fig. 11 is that most of the RBML methods have an almost constant error regardless of the utilization. This behavior is due to the decrease in the average utilization: a variation of 50% in execution time implies an average execution time of 0.75 × WCET, which in turn makes the actual utilization values small enough for the RBML methods to perform similarly. Also, we notice that both cubist and extraTrees have an error below 5% for all utilization values, while PeTaMi shows errors reaching 1000%. We have seen a similarly poor performance from PeTaMi in Fig. 8i for task sets with release jitter (without offsets). It seems that the addition of offsets and execution time variation exacerbates the impact of release jitter when considering different utilization values.
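The offset assignment used in this experiment (offsets drawn uniformly from [0, H/2], where H is the hyperperiod) can be sketched as follows. The sketch assumes integer periods expressed in a common time unit; the function names and the period values in the usage note are illustrative only.

```python
import random
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of a list of positive integer periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def random_offsets(periods, seed=None):
    """One offset per task, drawn uniformly from [0, H/2]."""
    rng = random.Random(seed)
    half_h = hyperperiod(periods) / 2
    return [rng.uniform(0, half_h) for _ in periods]
```

For the illustrative period set [4, 6, 10], the hyperperiod is 60, so every offset falls in [0, 30].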

Case study
In this section, we validate our period-inference methods on two case studies. We use two data sets consisting of traces coming from controller area networks (CANs), denoted by CAN 1 (Lee et al. 2017; HCRL 2010) and CAN 2 (Seo et al. 2018; HCRL 2010) in Table 2. The first data set consists of 988,987 messages with 27 tasks and the second one of 2,369,868 messages with 45 tasks. To generate our test data, we split the projections from the messages into smaller projections of 100 jobs. As for our training data, we synthetically generated traces that would provide a good proxy for real data; namely, we created a training set of 6000 automotive traces with 20 tasks scheduled by the non-preemptive rate monotonic scheduling policy, with 50% utilization and 5% jitter. The results in Table 2 show that our methods successfully estimated the periods of the messages on the actual use case, with errors below 2% for both data sets.
Figure 12a shows an increase in the number of rules generated by cubist as the utilization grows for log-uniform traces with 12 tasks. This behavior is expected since the projections become more complex as the number of preemptions increases at higher utilization values. For example, the average number of rules stored at 30% utilization is 16, but it rises to 65 at 90% utilization. Figure 12b and c illustrate how the mismatch between the number of tasks within the traces used for training and the traces used for testing affects the accuracy of the tree-based algorithms. The experiment is conducted on log-uniform traces with 70% utilization and without uncertainties (since we want to capture a large range of periods and also isolate the effect of the discrepancy between the number of tasks). Figure 12b shows that training on traces with fewer tasks than the target system leads to a 2.9% higher error on average than the opposite scenario (Fig. 12c). Moreover, in both cases, we observe that when the gap between the number of tasks in the training and testing traces is smaller, the error is smaller too. We also see that cubist has the best generalization capability; it has less than 2.5% error as long as it is trained with task sets that have at least 8 tasks.

Learning robustness: training and testing on different task set types
For this experiment, we train the models on an aggregated set containing 2000 log-uniform traces with 12 tasks for every utilization value in {0.3, 0.5, 0.7, 0.9}. Afterwards, the evaluation is carried out on 2000 automotive traces with 12 tasks for each of the aforementioned utilization values individually.
The results, summarized in Fig. 12f, show that cubist has the smallest error (i.e., below 8% for utilization values lower than 90%). Comparing Figs. 12f and 8a (where the training and testing were done on the same task set types), we see only a slight difference in the error of the RBML methods, which shows that they generalize rather well.
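To illustrate what distinguishes the log-uniform training traces, the sketch below samples periods whose logarithm is uniformly distributed, so that every order of magnitude is equally likely and the resulting periods are generally not harmonic. The range bounds in the usage note are assumptions for illustration, and the function name is hypothetical.

```python
import math
import random


def log_uniform_periods(n_tasks, min_period, max_period, seed=None):
    """Sample n_tasks periods with log(period) uniform on the given range."""
    rng = random.Random(seed)
    lo, hi = math.log(min_period), math.log(max_period)
    return [math.exp(rng.uniform(lo, hi)) for _ in range(n_tasks)]
```

For instance, `log_uniform_periods(12, 10.0, 1000.0, seed=3)` returns 12 periods between 10 and 1000 time units, about half of which are expected to lie below 100 (the geometric midpoint of the range).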

Learning robustness: training and testing on different projection lengths
The goal of this experiment is to see how the error of the RBML methods is affected by the length of their input trace during the inference phase (testing). This experiment is performed on log-uniform traces with 70% utilization and 10 tasks, with the particularity that, when testing, we limit the length of the projections to a certain multiple of the task's period (shown on the horizontal axis of Fig. 12e). As can be seen in Fig. 12e, both cubist and extraTrees are able to estimate the true period with less than 4% error even when only two jobs of the target task appear in the trace. As expected, the error reduces gradually as the length of the trace increases. In contrast, bartMachine has the largest fluctuations in error, making it less reliable when the projection's length is lower than 20 times the period of the target task.

Learning robustness: training and testing on task sets scheduled by different scheduling policies
Another aspect we take into account is the robustness of our method when the scheduling policy used during the training phase may differ from the one used during the testing phase. In this experiment, we train the RBML methods on traces of task sets coming from both rate monotonic (RM) and earliest-deadline-first (EDF) scheduling policies. Afterwards, we test the RBML models on separate test sets, each containing traces only from RM and EDF, respectively. Also, since the periods coming from a log-uniform distribution are not harmonic, we chose log-uniform traces for this experiment.
Since for harmonic periods (as in automotive traces) RM generates an almost identical schedule to EDF, it is not useful to consider those task sets in this experiment. For this experiment, task sets were generated following the same approach explained in Sect. 5.3. Each task set has 10 tasks with log-uniform periods and 50% execution time variation. Table 3 shows that all considered regression methods have a very similar performance regardless of the utilization.

Quaternary projections
As mentioned in Sect. 4.2, the information about the execution of lower-priority tasks in quaternary projections can be used to reduce the upper bound on the period. To evaluate the impact of this extra information in reducing the period bounds, we perform an experiment to compare the improvement resulting from using bounds obtained from quaternary rather than ternary projections. Our metric is the percentage of improvement in the bounds (the smaller the upper bound, the better it is). We report (B_T − B_Q)/B_T, where B_Q is the upper bound obtained from quaternary projections and B_T is the upper bound obtained from ternary projections (see Sects. 4.1 and 4.2). A higher improvement value indicates that the upper bound obtained from the quaternary projection is smaller (tighter) than the upper bound from the ternary projection. The experiment configuration is similar to those described in Sect. 5.1. We consider automotive task sets with 10 tasks, each having a 50% execution time variation and no release jitter. Figure 13a shows the percentage of the traces that have an improvement in their bounds when quaternary projections are used, i.e., when B_Q < B_T. Figure 13b shows the average improvement (left vertical axis) and maximum improvement (right vertical axis). These average and maximum improvement values are obtained over the projections for which using quaternary projections improves the upper bound.
Fig. 13 The effect of quaternary projections. a The percentage of the traces that had an improvement in their bounds when quaternary projections were used; b the average improvement (left vertical axis) and maximum improvement (right vertical axis). Please note the difference in the scale of these two vertical axes
Figure 13 shows that with the increase in the utilization, the improvement in the bounds also increases (both the percentage of task sets that have an improvement in their bound and the amount of improvement). This is expected since at larger utilization values, the idle intervals become scarcer and, therefore, less informative. However, the intervals during which lower-priority tasks execute either do not change or increase (like those of all other tasks when a system has higher utilization). Consequently, the quaternary projections become richer and richer in terms of the information they contain.
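The quantities plotted in Fig. 13 can be computed from paired upper bounds as in the following sketch. It assumes that both bounds are available for every projection as plain numbers; the function name and the convention of returning zeros when nothing improves are hypothetical.

```python
def bound_improvement_stats(ternary_bounds, quaternary_bounds):
    """Share of improved projections, plus mean and max of (B_T - B_Q) / B_T.

    The mean and the maximum are taken only over projections with B_Q < B_T.
    """
    improvements = [
        (b_t - b_q) / b_t
        for b_t, b_q in zip(ternary_bounds, quaternary_bounds)
        if b_q < b_t
    ]
    share_improved = len(improvements) / len(ternary_bounds)
    if not improvements:
        return share_improved, 0.0, 0.0
    average = sum(improvements) / len(improvements)
    return share_improved, average, max(improvements)
```

For ternary bounds [100, 80, 50, 40] and quaternary bounds [50, 80, 25, 30], three of the four projections improve, with individual improvements of 0.5, 0.5, and 0.25.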

New upper bound
In this experiment, we evaluate the improvement of the new upper bound (Theorem 1 in this paper) over the upper bound in our prior work [Theorem 1 in Vădineanu and Nasri (2020)]. We report this improvement as (B_old − B_new)/B_old, where B_new is the upper bound from Theorem 1 (in this paper) and B_old is the upper bound from Theorem 1 in Vădineanu and Nasri (2020).
We performed the experiment as explained in Sect. 5.1. We varied the utilization and, for each utilization value, the experiment was performed on 2000 automotive traces with 10 tasks each, without any uncertainties in their timing parameters (i.e., no release jitter). Figure 14 shows the average and maximum improvements for all projections generated in the experiment. We observe that with the increase in the utilization, the new bound becomes tighter than the old bound (the improvement increases). As we discussed earlier, with the increase in the utilization, the chance to find idle times in a ternary projection reduces and hence it becomes more important to use the remaining opportunities (resulting from the few remaining idle slots) more efficiently. The new bound uses these opportunities more efficiently by having a tighter estimation of the arrival time of the second effective interval in every two consecutive effective intervals. This can also be seen when comparing Eq. (12) in this paper and Eq. (10) in Vădineanu and Nasri (2020).

Non-preemptive scheduling
To evaluate the performance of our solution for systems that are scheduled by a non-preemptive scheduling policy, we performed a similar set of experiments as in Sect. 5.3, using automotive task sets for the utilization values {0.15, 0.3, 0.45, 0.6}. We scheduled these task sets by the non-preemptive fixed-priority scheduling policy (following rate monotonic priorities). We could not include higher utilization values because of the large number of deadline misses that appear in the traces. Since we wanted to focus on the impact of the non-preemptive scheduling policy in the experiments (and not the impact of deadline misses), we excluded higher utilization values from this experiment.
We excluded svr and avNNet from the diagrams because they performed poorly (similar to Fig. 8) for the non-preemptive systems as well. We also added one more baseline, called the naive solution, which calculates the mean of the inter-arrival intervals within the projection as an estimate of the period of the task. Figure 15a shows how the average error is influenced by the change in utilization. We notice a slight increase in error for every tree-based algorithm with the increase in utilization (this is similar to the preemptive systems). Also, the errors of all RBML methods stay below 1%, with cubist, extraTrees, and PeTaMi producing very similar results.
Impact of the number of tasks. Figure 15b shows that the error increases with the number of tasks, which is the opposite of the trend we saw in Fig. 8c. This increase is caused by the increase in the blocking of higher-priority tasks. In a preemptive system, higher-priority tasks are typically scheduled as soon as they are released, but in a non-preemptive system, they might be blocked by lower-priority tasks. With the increase in the number of tasks, the chance that a higher-priority task is blocked by a lower-priority one increases.
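The naive baseline introduced for the non-preemptive experiments can be sketched as follows. The sketch assumes that the start time of each job of the task has already been extracted from the projection (how this extraction is done is not covered here), and the function name is hypothetical.

```python
def naive_period_estimate(job_start_times):
    """Mean gap between consecutive job start times of one task."""
    if len(job_start_times) < 2:
        raise ValueError("at least two jobs are needed to form an interval")
    gaps = [
        later - earlier
        for earlier, later in zip(job_start_times, job_start_times[1:])
    ]
    return sum(gaps) / len(gaps)
```

Because the gaps telescope, the estimate equals (last start − first start) / (number of jobs − 1); it therefore depends only on how far the first and last observed jobs are displaced from their nominal releases, e.g., by blocking or jitter.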
Impact of execution-time variation. Figure 15c shows that adding even a small amount of execution-time variation significantly increases the error of PeTaMi. We see a slight increase (of 5%) in the error of gbm, while cubist, extraTrees, and bartMachine are largely unaffected. Moreover, we observe from Fig. 15d that increasing the execution time variation (the horizontal axis) results in a decrease of the average error. This behavior is similar to what we saw in preemptive systems (shown in Fig. 8g and explained in Sect. 5.3).
Impact of release jitter. Similar to the preemptive case (see the part on release jitter in Sect. 5.3), having release jitter in the system increases the error for all methods, in particular for PeTaMi. As we expected, when there are inherent uncertainties in the arrival times (caused by release jitter), the accuracy of PeTaMi decreases drastically, while our solutions have a low error (below 7%).
Impact of initial task offsets. To assess the impact of initial offsets on the accuracy of our solutions for non-preemptive systems, we performed another experiment in which we assigned a random offset with a uniform distribution in the range [0, H/2] to each task, where H is the hyperperiod of the task set. Figure 15f shows the impact of random offsets (its offset-free counterpart is Fig. 15e). We observe that adding offsets hardly changes the behavior of most of the methods since Fig. 15f looks very similar to Fig. 15e. Interestingly, adding offsets improved PeTaMi's accuracy (by about 20%). One explanation might be that when a set of almost harmonic tasks have non-identical offsets, the chance that they arrive at a time when the resource is idle is higher. Consequently, when they arrive, they are neither interfered with nor blocked by other higher- or lower-priority tasks. Therefore, the uncertainty about the execution window of the tasks, and hence the error of PeTaMi, reduces.
Comparing RPMPA and SPM with RPM. This experiment quantifies the degree of improvement introduced by the period adjustment method (RPMPA) and by the space-pruning method (SPM) in systems that consist of non-preemptive tasks. For this experiment, we aggregated the task sets generated for Fig. 15 into three categories: (i) task sets with no execution-time variation and no jitter (including all task sets used for Fig. 15a and b), (ii) task sets with execution-time variation (including all task sets used for Fig. 15c and d), and (iii) task sets with release jitter (including all task sets used for Fig. 15e and f). These categories are represented in the second column of Table 4 with the labels no uncertainty, execution var., and release jitter. We then ran SPM and RPMPA on each of the categories and compared their performance with RPM.
To evaluate the impact of RPMPA and SPM w.r.t. RPM, we used a metric called improvement %, shown in the third and fourth columns of Table 4. This metric is calculated as (E_RPM − E_X)/E_RPM, where E_RPM is the error of RPM and E_X is the error of method X, with X being either RPMPA or SPM. This metric shows by what percentage RPMPA and SPM could further reduce the error of the RPM method. The higher the value of the improvement, the larger the positive impact of the method. When the improvement is negative, the method being considered has increased the error in comparison to the original RPM. This case happened, for example, for RPMPA when task sets have release jitter (as can be seen in the third column of Table 4 for the cubist and extraTrees methods). Table 4 shows that applying period bounds (i.e., the SPM method) significantly reduces the error in comparison to the original RPM. For task sets that do not have uncertainties or have execution-time variation, the average improvement is about 85% (see the upper part of the fourth column of Table 4). For task sets with release jitter, the benefit is a bit smaller, but it is still larger than that of RPMPA alone (comparing the third and fourth columns). On average, SPM has an 80% improvement in the accuracy (for all task sets in Fig. 15 combined).
In particular, for systems with release jitter, using SPM always reduces the errors, while this might not be the case for RPMPA. As discussed earlier (in Sect. 5.3), release jitter negatively impacts the signal-processing techniques (periodogram and autocorrelation) and hence the candidates they produce may be farther from the true period.

Discussions on memory consumption and runtime. Figure 12d addresses both the memory requirements and the runtime of the six considered RBML methods. We notice that cubist has a considerably low memory consumption compared to the rest. It is also almost the fastest solution during both the training and testing phases among the well-performing algorithms. On the other end of the spectrum, bartMachine has the slowest training and testing phases, and avNNet has the largest memory consumption among the considered algorithms.
Non-preemptive scheduling. By conducting a set of experiments for systems without preemptions (see Fig. 15), we observed that the behavior of all tree-based solutions was highly similar to the preemptive case. This demonstrates the applicability of our solutions to both scheduling paradigms.
Offsets. We noticed that the addition of offsets hardly resulted in a change of behavior for the RBML methods, presenting an almost identical pattern to the scenario with simultaneous releases.
Comparing RPM, RPMPA, and SPM. Our results (see Fig. 9; Tables 2 and 4) show that both RPMPA and SPM are able to further reduce the error resulting from the regression methods (RPM method). We also observed a significant reduction of error when using period bounds (SPM method) in comparison with RPM.

Conclusions
In this paper, we introduced the first regression-based machine learning (RBML) solution for the problem of inferring a task's period from its binary projections. We investigated the six most successful families of RBML methods for this problem and provided comprehensive evaluations and discussions about their accuracy and robustness under various scenarios. We proposed further steps for improving the accuracy by creating period-adjustment and space-pruning methods that use the properties of a work-conserving scheduler to prune the space of valid periods of a task. Our solutions proved to be robust and highly accurate. The average observed error of our (best) solution was under 1% in most scenarios, including those with a mixture of periodic, aperiodic, and sporadic tasks, execution time variation, and release jitter, while the existing work has errors that are two to three orders of magnitude higher. On the case studies from actual systems, the error of our best solution was 1.7%. In the future, we would like to explore RBML methods to infer the timing properties of parallel applications running on multiprocessor platforms and under partial observations.