1 Introduction

Integrated Circuits (ICs) must operate at their expected frequencies with respect to the timing constraints specified in their designs, which is checked by Computer-Aided Design (CAD) tools. The key challenge is to satisfy these constraints in an optimal way, so that the area occupied by a design on a chip and its power consumption are minimized. This leads to a particular class of optimization problems. The development of techniques for such design verification forms a separate field of study, Timing Analysis (Sapatnekar 2004).

With the decrease of feature sizes, the impact of variations that occur in manufacturing processes, the process variations, increases. These variations lead to uncertainties in the parameters of transistors and interconnects, which affect the delays and hence the overall performance of a circuit. For example, the performance of the same design can differ from chip to chip (inter-die or global variations). At the same time, a single design can exhibit varying delays in different parts of a die (intra-die or local variations). The most reliable way to take these variations into account in order to predict the yield (i.e., the fraction of functional chips among all those fabricated) is to run Monte Carlo (MC) simulations. For modern Very Large Scale Integration (VLSI) designs, this is very expensive, which in turn increases the cost of chips.

A less computationally expensive approach is Static Timing Analysis (STA), which is still the most common way to take into account systematic (global) process variations (Sapatnekar 2004; Vygen 2006; Bhasker and Chadha 2009). This approach treats variations as single, deterministic values (so-called corner values). The delay values computed in this way are too pessimistic (Blaauw et al. 2008), which results in an increase in the cost of chips when these delays are mitigated (Visweswariah 2003). An additional challenge is that variations exhibit substantially non-Gaussian behaviour and are often strongly correlated. In modern ultra-VLSI circuits (5 nm and below), the impact of random correlated processes and fluctuations has become even more important. Thus, an alternative approach has been developed.

Statistical Static Timing Analysis (SSTA) addresses randomness in a natural way, treating delays in a system as Random Variables (RVs) from the very beginning. The analysis then allows us to determine the mean value of the delay across selected paths. The maximum delay corresponds to the critical path. Current industrial realizations of SSTA allow one to determine moments of delay distributions and/or their quantiles (Chang and Sapatnekar 2003, 2005; Chang et al. 2005). In principle, SSTA can give the delay distribution of the whole circuit. This makes SSTA comparable to MC simulations in terms of accuracy. At the same time, SSTA algorithms are much less resource-demanding than MC, although they have higher computational complexity than deterministic STA.

This work is motivated by Freeley et al. (2018) and Mishagli and Blokhina (2020), where an approach was proposed to deal with non-Gaussian distributions of gate delays without loss of information while keeping the complexity low. The authors based their method on the exact solution to the problem of finding the distribution of a logic gate delay under the assumption that all input distributions are Gaussian. The resulting non-Gaussian distribution is then decomposed into a mixture of Radial Basis Functions with fixed shape parameters and locations, but unknown mixing coefficients (the weights). Such a mixture, referred to as a Gaussian comb, can be obtained by means of Linear Programming. This is the main advantage of the proposed model.

The aim of this paper is to investigate the classical SSTA problem from the mathematical optimization point of view, which is, to the best of our knowledge, done here for the first time. For simplicity, we represent actual distributions with histograms, which inevitably introduces a loss of accuracy. We make this choice deliberately, as our focus is on proving the concept rather than on developing a highly accurate approach. We give two formulations of the SSTA problem as an optimization problem via (i) Binary–Integer Programming and (ii) Geometric Programming. Finding efficient solutions will be the subject of a separate study, where we combine the approach presented in this work with the Gaussian comb model.

The paper is organized as follows. In Sect. 2, we give the necessary mathematical preliminaries and terminology, and discuss related work. Section 3 discusses standard, straightforward (i.e., non-optimization) approaches to SSTA and gives the statement of the problem. We give our solutions via optimization techniques in Sects. 4 and 5. Section 4 discusses a formal mathematical solution by means of Binary–Integer Programming. Section 5 shows that the scalability can be improved by utilizing Geometric Programming. We conclude the paper in Sect. 6 with an overall discussion of the obtained results.

2 Background and related work

In this Section, we start with some mathematical preliminaries, then introduce the terminology, and finally briefly describe the key results in Statistical Static Timing Analysis.

2.1 Mathematical preliminaries

Here we present only the necessary definitions and refer to Boyd et al. (2007) for a very accessible overview of Geometric Programming. For further details, we refer the reader to the literature cited there.

A monomial is a function \(f: \mathbb {R}^n_{++} \rightarrow \mathbb {R}_{++}\) of the form

$$\begin{aligned} f(x_{1},x_{2},\ldots ,x_{n}) = c\, x_{1}^{a_{1}} \cdots x_{n}^{a_{n}}, \end{aligned}$$
(1)

where \(c > 0\) and \({ a_i \in \mathbb {R} }\). We call c the coefficient of the monomial and \(a_{i}\) the exponents of the monomial. We refer to a sum of monomials as a posynomial, that is, a function of the form

$$\begin{aligned} f(x_{1},x_{2},\ldots ,x_{n}) = \sum _{k=1}^{K} c_{k}\, x_{1}^{a_{1k}} \cdots x_{n}^{a_{nk}}. \end{aligned}$$
(2)

It is easy to see from the definitions that the set A of all monomials and the set B of all posynomials satisfy \({ A \subseteq B }\). One should also note that posynomials are closed under addition, multiplication, and positive scaling, and that a posynomial divided by a monomial is again a posynomial.

A Geometric Program (GP) is an optimization problem of the form

$$\begin{aligned}&\text {minimize} \qquad f_0(\textbf{x}) \nonumber \\&\text {subject to} \quad f_i(\textbf{x}) \le 1, \quad i = 1,\ldots , n, \nonumber \\&\qquad \qquad \qquad g_i(\textbf{x}) = 1, \quad i = 1,\ldots , p, \end{aligned}$$
(3)

where \(x_i\) are the optimization variables, \(f_i\) are posynomial functions, and \(g_i\) are monomials. We call (3) a GP in standard form.
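To make the standard form (3) concrete, the following is a minimal sketch of a GP solved with CVXPY (the package we use for prototyping in Sect. 5.3); the particular objective and constraints are arbitrary illustrative choices, not part of the SSTA formulation.

```python
import cvxpy as cp

# A toy GP in the standard form (3): posynomial objective,
# one posynomial inequality, and one monomial equality.
x = cp.Variable(pos=True)
y = cp.Variable(pos=True)

objective = cp.Minimize(x * y + x / y)  # posynomial f_0
constraints = [
    2 * x * y <= 1,                     # f_1(x, y) <= 1
    x * y**0.5 == 1,                    # g_1(x, y) = 1
]
problem = cp.Problem(objective, constraints)
problem.solve(gp=True)                  # solved via the log-log transformation
print(x.value, y.value, problem.value)
```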

An Integer Linear Programming (ILP) problem is written in its general form as follows (see, e.g., Wolsey and Nemhauser (1999)):

$$\begin{aligned}&\text {maximize} \quad \textbf{c}^T \textbf{x} \nonumber \\&\text {subject to} \quad \textbf{A} \textbf{x} \le \textbf{b}, \nonumber \\&\qquad \qquad \qquad \textbf{x} \ge \textbf{0}, \nonumber \\&\qquad \qquad \qquad \textbf{x} \in \mathbb {Z}^n. \end{aligned}$$
(4)

If the feasible region of (4) is further restricted to the vertices of the unit hypercube, i.e., \(\textbf{x} \in \{0, 1\}^n\), we speak of Binary–Integer Programming (BIP) problems. This can be specified using the constraints

$$\begin{aligned} \begin{aligned} \textbf{0} \le \textbf{x} \le \textbf{1},\\ \textbf{x} \in \mathbb {Z}^n.\\ \end{aligned} \end{aligned}$$
(5)
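For illustration, a tiny BIP instance of the form (4)–(5) can be solved with SciPy's milp routine; the data below are arbitrary.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# maximize c^T x  subject to  A x <= b,  x in {0, 1}^n,  cf. (4)-(5)
c = np.array([3.0, 2.0, 4.0])
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 0.0, 1.0]])
b = np.array([2.0, 2.0])

result = milp(
    c=-c,                                     # milp minimizes, hence the sign
    constraints=LinearConstraint(A, -np.inf, b),
    integrality=np.ones_like(c),              # all variables are integer
    bounds=Bounds(0, 1),                      # 0 <= x <= 1, hence binary
)
print(result.x, -result.fun)
```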

2.2 Definitions

Throughout the paper, we will use the following commonly accepted terminology (Blaauw et al. 2008). A logic circuit can be represented as a timing graph \(G(E,V)\), where the graph and its paths are defined as follows.

Definition 1

A timing graph \(G(E,V)\) is an acyclic directed graph, where E and V are the sets of edges and vertices, respectively. The vertices correspond to the logic gates of a circuit. The timing graph always has one source and one sink. The edges are characterised by weights \(d_i\) that describe delays. Within SSTA, the timing graph is called a statistical timing graph when the edges of the graph are described by RVs.

The task then is to determine the critical (longest) path.

Definition 2

Let \(p_i\) \((i = 1,\ldots ,N)\) be a path of ordered edges from the source to the sink in a timing graph G and let \(D_i\) be the length of \(p_i\). Then the problem of finding \(D_{\text {max}} = \max (D_1,\ldots ,D_N)\) is called the SSTA problem of a circuit.

The gates have an internal structure formed by a corresponding combination of transistors, which results in a characteristic time needed for a gate to operate. This is one of the sources of delays in a circuit. Due to delays, input signals can have different arrival times; therefore, the delay of a gate is determined by the maximum of the input delays. On the other hand, the operation time of a gate itself can have a significant impact on the circuit delay, in addition to the arrival times. In such a case, the delay of the gate should be added to the result of the \(\max \) function:

$$\begin{aligned} d_{\text {gate}} = \max (d_1,d_2) + d_0 + d_{\text {int}} + \ldots , \end{aligned}$$
(6)

where \(d_1\), \(d_2\) are the delays of the input signals, \(d_0\) is the gate delay (due to its operation time), and \(d_{\text {int}}\) is an interconnect delay.

The calculation is straightforward in the case of deterministic timing analysis, but this is no longer so when uncertainty arises. As we have already mentioned, within SSTA, the arrival and gate operation times are described by RVs given by the corresponding distributions. Therefore, the delay (6) can be written as

$$\begin{aligned} \zeta _{\text {gate}} = \max (\xi _1,\xi _2) + \xi _0 + \xi _{\text {int}} + \ldots , \end{aligned}$$
(7)

where \(\xi _1\) and \(\xi _2\) are RVs that describe the arrival times of the input signals, and \(\xi _0\) and \(\xi _{\text {int}}\) are RVs related to the gate operation time and the interconnect delay, respectively. The whole gate delay, formerly \(d_{\text {gate}}\), is now an RV itself and is denoted by \(\zeta _{\text {gate}}\). In principle, \(\xi _0\) and \(\xi _{\text {int}}\) can be combined into a single RV; thus, the latter will be omitted in the following discussion.

Therefore, as we can see from (7), two operations fully describe delay propagation at the gate level: (i) the maximum of the delays entering a gate and (ii) the summation of the latter with the delay of the gate. These operations are often called the atomic operations of SSTA (see, e.g., works by Cheng et al. 2007, 2012). In the language of distributions, the summation in Eq. (7) corresponds to a convolution of the probability density functions of the RVs \(\max (\xi _1,\xi _2)\) and \(\xi _0\). In this work, we consider a histogram approximation to the problem, which will be discussed in the next sections.
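For intuition, both atomic operations are straightforward to realize by Monte Carlo sampling. The following NumPy sketch evaluates (7) for a single gate with hypothetical Gaussian parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

xi_1 = rng.normal(1.0, 0.2, n_samples)  # arrival time of input 1 (assumed)
xi_2 = rng.normal(1.2, 0.3, n_samples)  # arrival time of input 2 (assumed)
xi_0 = rng.normal(0.5, 0.1, n_samples)  # gate operation time (assumed)

# atomic operations of SSTA: maximum of the inputs, then summation, cf. (7)
zeta_gate = np.maximum(xi_1, xi_2) + xi_0
print(zeta_gate.mean(), zeta_gate.std())
```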

2.3 Related work

Excellent reviews of the work done in the early stage of the SSTA era, 2001–2009, were given by Blaauw et al. (2008) and Forzan and Pandini (2009). A good overview is also provided by Beece et al. (2014), where a transistor sizing problem was addressed by means of optimization techniques. We summarise the key ideas of SSTA research in this subsection.

The research at that stage was based on variants of the idea first presented by Clark (1961) within the block-based approach: actual distributions can be approximated by Gaussians by matching the first two moments (mean and variance). Thus, in Chang and Sapatnekar (2003, 2005), Clark's algorithms were complemented by the handling of spatial correlations using principal component analysis (PCA). To propagate a delay through the timing graph, the linear canonical model of a delay was proposed (Visweswariah et al. 2004; Chang et al. 2005; Visweswariah et al. 2006). The delay is described as a linear function of parameter variations:

$$\begin{aligned} D = a_0 + \sum \limits _{i=1}^n a_i \varDelta X_i + a_{n+1} \varDelta R, \end{aligned}$$
(8)

where \(a_0\) is the mean or nominal value, \(\varDelta X_i\) represents the variation of the n global sources of variation \(X_i\) from their nominal values, \(a_i\) are the sensitivities to each of these RVs, and \(\varDelta R\) is the variation of the independent RV R. The mean and variance of a delay were then represented using the concept of tightness probability (or binding probability in Jess et al. (2003, 2006)), \(T_A=P(A>B)\), which is the probability that the arrival time A is greater than B. A linear approximation for \(\max (A,B)\) was proposed.

Various extensions to this approach were proposed, mainly based on adding non-linear terms to (8). For example, a quadratic term was introduced in Zhan et al. (2005, 2006). Chang et al. (2005) proposed to use numerically computed tables to describe the non-linear part of the canonical form. Khandelwal and Srivastava (2005) considered gate delays and arrival times using their Taylor-series expansions. The paper (Ramprasath et al. 2016) discusses another modification of the canonical form (8), based on the addition of a quadratic term and the use of skew-normal distributions. Correlations were considered in Chang and Sapatnekar (2005), as mentioned above, within the PCA method. Later, Singh and Sapatnekar (2008) proposed to transform the set of correlated non-Gaussian variables into an uncorrelated set via independent component analysis (ICA). The described canonical delay model suffers from a significant disadvantage: it requires an approximation of the maximum operation, which is a source of errors that we want to mitigate.

More broadly, optimisation problems appear in various aspects of CAD for VLSI (Korte and Vygen 2008; Brenner and Vygen 2009; Held et al. 2011). For example, the gate sizing problem has received attention in the community for more than 30 years (Berkelaar and Jess 1990; Sapatnekar et al. 1993; Jacobs and Berkelaar 2000; Rakai and Farshidi 2015). One should point out the GP formulations of this optimization problem (Boyd et al. 2005, 2007; Joshi and Boyd 2008; Naidu 2021), as GP plays a significant role in the present study; however, we shall not discuss these works in detail, as gate sizing is outside the scope of the present paper. One should note that in these papers, as well as in the above-mentioned work (Beece et al. 2014), timing analysis was performed using the canonical delay model.

3 Statistical static timing analysis: setting up the problem

This Section revisits the problem of calculating the maximum delay in the SSTA histogram approximation, as captured in Algorithm 1. In particular, we will present the exact computation of the maximum and the convolution, without claiming novelty of the presented material.

Algorithm 1: General SSTA algorithm

Throughout this paper, distributions are represented by the histogram approximation, where we assume that all histograms share the same bin edges. This assumption is made for the sake of clarity and can be removed at the cost of somewhat more complicated notation.

Let us find the histogram approximation of (i) the distribution of the maximum \(\zeta \) of two independent random variables \(\eta \) and \(\xi \), and (ii) the convolution of two histograms. We assume a set of bins \(B=\{0,~1,~\ldots ,~n-1\}\) and that the histogram samples \(\eta _i\) and \(\xi _i\) take values in the interval [a, b], \(\forall {i}\). Given the interval [a, b] of the edges, we partition it into n intervals of size \(|b - a| / n\) with points \(e_1, e_2, \ldots , e_{n+1}\) such that \(e_i\) and \(e_{i+1}\) are the start and end of the \(i^{\text {th}}\) interval, respectively. The midpoints of the intervals, \(m_1, m_2, \ldots , m_n\), are also given.
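A minimal NumPy sketch of this shared grid, with an assumed support [a, b] and bin count n, reads:

```python
import numpy as np

a, b, n = 0.0, 5.0, 32                      # assumed support and bin count
edges = np.linspace(a, b, n + 1)            # e_1, ..., e_{n+1}
midpoints = 0.5 * (edges[:-1] + edges[1:])  # m_1, ..., m_n

# histogram approximation of an RV on the shared grid (bin probabilities)
samples = np.random.default_rng(1).normal(2.0, 0.4, 10_000)
counts, _ = np.histogram(samples, bins=edges)
h = counts / counts.sum()                   # normalize counts to probabilities
```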

3.1 Maximum operation

The maximum of two RVs, \(\zeta =\max (\eta ,\xi )\), which is an RV itself, is defined as

$$\begin{aligned} \zeta = \left\{ \begin{array}{ll} \eta , & \quad \text {if } \eta \ge \xi ,\\ \xi , & \quad \text {if } \eta < \xi . \end{array}\right. \end{aligned}$$
(9)

Using the law of total probability and taking into account the independence of \(\xi \) and \(\eta \), the probability P of a realization \(\zeta =z\) can be written as

$$\begin{aligned} P(\zeta = z) = P(\eta = z) \cdot P(\xi \le z) + P(\xi = z) \cdot P(\eta < z). \end{aligned}$$
(10)

It is easy to see that, in the discrete (histogram) formulation, the histogram estimates \(h_{\zeta }\), \(h_{\eta }\), and \(h_{\xi }\) of the RVs \(\zeta \), \(\eta \), and \(\xi \) satisfy

$$\begin{aligned} h_{\zeta }[i] = h_{\eta }[i] \cdot \sum _{k = 1}^{i}{h_{\xi }[k]} + h_{\xi }[i] \cdot \sum _{k = 1}^{i-1}{h_{\eta }[k]}. \end{aligned}$$
(11)

Note that the upper bound of the second sum is \(i-1\). This is due to the strict inequality, \(\eta < \xi \), in (9). The procedure is summarized as Algorithm 2.

Algorithm 2: Maximum of two histograms
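A vectorized sketch of Algorithm 2, i.e., of formula (11), can be written with cumulative sums in O(n):

```python
import numpy as np

def hist_max(h_eta, h_xi):
    """Histogram of max(eta, xi) for independent RVs on a shared grid, cf. (11)."""
    cdf_xi = np.cumsum(h_xi)                   # P(xi <= bin i): sum up to i
    cdf_eta_strict = np.cumsum(h_eta) - h_eta  # P(eta < bin i): sum up to i-1
    return h_eta * cdf_xi + h_xi * cdf_eta_strict
```

Note that the result remains normalized whenever \(h_{\eta }\) and \(h_{\xi }\) are, since the two terms add up to \(P(\eta \ge \xi ) + P(\xi > \eta ) = 1\).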

3.2 Convolution operation

The convolution of two discrete-valued functions, f and g, is defined as:

$$\begin{aligned} (f*g)[z] = \sum _{k=-\infty }^{\infty }{f[k] \cdot g[z-k]}. \end{aligned}$$
(12)

The time complexity of the naïve implementation of the convolution is \(\mathcal O(N^2)\), as can be seen in Algorithm 3. Formula (12) also implies that the values of the edges must be changed: the value of the first edge has to be added to all other edges. This can be done in several ways, which are discussed in the following.
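A direct sketch of the naïve convolution (12) for two bin-probability arrays on the shared grid (equivalent to numpy.convolve) is:

```python
import numpy as np

def hist_convolve(h_alpha, h_beta):
    """Naive O(N^2) discrete convolution, cf. (12)."""
    n = len(h_alpha)
    h_zeta = np.zeros(2 * n - 1)
    for z in range(2 * n - 1):                       # output bin index
        for k in range(max(0, z - n + 1), min(z, n - 1) + 1):
            h_zeta[z] += h_alpha[k] * h_beta[z - k]  # f[k] * g[z - k]
    return h_zeta
```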

One can add the first value to all edges and unite them during the SSTA algorithm whenever the edges of the second histogram differ. The recipe is simple: find a new array of edges \(\textbf{e}\) and modify the PDFs of the histogram approximations \(f_{\alpha }\), \(f_{\beta }\) with the new, changed edges. The array of edges of \(f_{\alpha }\) is given as \(\textbf{e}^{\alpha }\); similarly, \(\textbf{e}^{\beta }\) denotes the array of edges of \(f_{\beta }\).

Similar to the rv_histogram class of the scipy.stats library, one can find a distribution function (PDF) that fits the given histogram as follows. Let F be the fitted cumulative distribution function (CDF) of \(f_{\alpha }\). Then the PDF of a realization z falling into the \(i^{\text {th}}\) bin of the new histogram \(f_{\alpha '}\) with the desired edges \(\textbf{e}\) is

$$\begin{aligned} f_{\alpha '}(z) = F(e_{i+1}) - F(e_{i}), \end{aligned}$$
(13)

or, recalling a definition of CDF,

$$\begin{aligned} f_{\alpha '}(z) = \int _{e_i}^{e_{i+1}} f_{\alpha }(x) \,dx. \end{aligned}$$
(14)

The same is done for \(f_{\beta '}\). The solution (14) gives more precise results and bypasses the problem of fitting functions, which is relatively computationally demanding, all at the cost of slightly longer code. The exact integration can be performed in O(N) for the whole histogram (Fig. 1).
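Since the PDF of a histogram is piecewise constant, its CDF is piecewise linear, and the exact integration (14) admits a simple O(N) sketch via linear interpolation of the cumulative sums:

```python
import numpy as np

def rebin(h_src, edges_src, edges_new):
    """Re-bin the bin probabilities h_src onto new edges by exact
    integration of the piecewise-constant PDF, cf. (14)."""
    cdf = np.concatenate(([0.0], np.cumsum(h_src)))  # CDF at the source edges
    # the CDF is piecewise linear, so np.interp evaluates it exactly
    F = np.interp(edges_new, edges_src, cdf, left=0.0, right=cdf[-1])
    return np.diff(F)                                # F(e_{i+1}) - F(e_i)
```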

Fig. 1: Sketch demonstrating integration over the fitted function (13) (yellow surface) and the exact integration (14) (orange shaded area to the left). The blue curve represents the fitted PDF. (Color figure online)

When one looks at the SSTA Algorithm 1, a problem with such a union of edges becomes evident. After the convolutions of the input gates, every time a maximum and then a convolution are computed, the edges will differ and have to be united. Taking into account the two inputs of each gate, the function (13) or (14) is called twice per gate. Moreover, further complications arise when one tries to formulate the convolution as an optimization problem.

A different and more straightforward solution is presented in the next subsection.

3.3 Convolution with shifting a histogram

A second solution to the problem of adding a value to the edges is simple shifting. We can shift the whole histogram to the left or right by a number of bins determined by the value to be added to all edges, divided by the length of the bins and floored. Shifting the bin values by this number simulates the addition of the first value to all edges. We assume n bins with the set of bin indices \(B = \{0, 1,\ldots , n - 1\}\); the identical edges of the histograms, given as \(\textbf{e} \in \mathbb {R}^{(n+1)\times 1}\); two RVs \(\alpha \), \(\beta \); their convolution \(\zeta \) and its shifted version \(\zeta '\); and their histogram approximations \(h_{\alpha }\), \(h_{\beta }\), \(h_{\zeta }\), \(h_{\zeta '}\). The shift s can be computed as

$$\begin{aligned} s = \left\lfloor \frac{\mid e_{0}\mid }{e_{1} - e_{0}} \right\rfloor , \end{aligned}$$
(15)

where \(\lfloor \cdot \rfloor \) denotes the flooring operation. In the case of \(e_{0} > 0\), the new, shifted histogram \(h_{\zeta '}\) at a point \(x \in B\) is given by

$$\begin{aligned} h_{\zeta '}[x + s] = h_{\zeta }[x] \end{aligned}$$
(16)

In the case of \(e_{0} < 0\), the shift is similar:

$$\begin{aligned} h_{\zeta '}[x] = h_{\zeta }[x + s] \end{aligned}$$
(17)

When shifting to the right, there are s unoccupied positions on the left; these are set to zero. The same is done when shifting to the left. Provided the starting interval is set correctly, this does not have any effect on precision, as the first and last bins should always be zero. If the interval is small, the accuracy increases, since the bins can encode more information within a smaller interval. However, if it is too small, then the shift cuts off some information. Furthermore, the more bins we add, the more precise this shift becomes.
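A sketch of the shift (15)–(17), under the assumption that the shift s does not exceed the number of bins, reads:

```python
import numpy as np

def shift_histogram(h, edges):
    """Shift bin values to emulate adding e_0 to all edges, cf. (15)-(17).
    Assumes the shift s is smaller than the number of bins."""
    s = int(np.floor(abs(edges[0]) / (edges[1] - edges[0])))  # shift (15)
    h_shifted = np.zeros_like(h)
    if edges[0] > 0:                   # shift to the right, cf. (16)
        h_shifted[s:] = h[:len(h) - s]
    else:                              # shift to the left, cf. (17)
        h_shifted[:len(h) - s] = h[s:]
    return h_shifted
```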

Algorithm 3: Convolution

The shifting method gives exactly the same solutions as the union method. It is straightforward to implement, can be done in linear time, and can be used almost without any change in the optimization problem. Therefore, we use this method in preference to the union method.

In summary, we have reviewed the two key (atomic) operations of SSTA algorithms, maximum and convolution, and the way these operations can be performed on histograms. In the following sections, we give a formulation of SSTA as (i) a Binary–Integer Programming problem and (ii) a Geometric Programming problem. These formulations, given for histograms, constitute the original contribution of this work.

4 SSTA via Binary–Integer programming

Here, we formulate the SSTA algorithm as a Binary–Integer Programming (BIP) problem. For this purpose, we introduce a unary encoding of counts in binary variables. Then, we show how to perform the atomic operations (\(\max \) and convolution) on histograms via BIP. Finally, we compare the approach against Monte Carlo simulations and discuss its scalability.

4.1 Statement of the problem

To write the SSTA problem of a circuit, \(D_{\max } = \max (D_1,\ldots ,D_N)\), as a BIP problem (4)–(5), one should (i) propose a risk measure for the objective function \(\textbf{c}^T \textbf{x}\) and (ii) formulate the corresponding constraints. Let us discuss the constraints first.

From Algorithms 2 and 3, one can see that a multiplication of two non-negative real numbers occurs in both of them. Implementing this multiplication naively, we obtain a bilinear function, which is not jointly convex in both arguments and so cannot be used as a constraint or as an objective function. One solution could be to use McCormick envelopes; this requires setting lower and upper bounds on the factors, and the only way to compute these bounds is the exact computation of the problem using the methods shown in Sect. 3. Another option is to formulate the problem in unary notation, which is the subject of the present Section.

Below, we discuss how the atomic operations of SSTA can be performed on histograms in the unary encoding. It will be shown that this leads to corresponding mixed-integer linear programs.

4.2 Atomic operations on histograms in unary representation

As we have discussed above, multiplication is the key operation for both \(\max \) and convolution. This Section discusses the multiplication in unary representation.

4.2.1 Multiplication

Let us first make a note on the vectorization of the multiplication operation. Consider two natural numbers, \(\alpha \) and \(\beta \). By definition, the multiplication of two numbers is equal to repeated addition:

$$\begin{aligned} \alpha \cdot \beta = \underbrace{\beta + \beta + \ldots + \beta }_{{\alpha }\text { times }}. \end{aligned}$$
(18a)

It is easy to see that in unary notation this can be represented by a matrix–vector multiplication. Indeed, having written \(\alpha \) and \(\beta \) as column vectors of ones, \(\textbf{a}\) and \(\textbf{b}\), of sizes \(\alpha \times 1\) and \(\beta \times 1\), respectively, we obtain

$$\begin{aligned} \alpha \cdot \beta = \underset{1\times \alpha }{\textbf{1}^T} \; \underset{\alpha \times \beta }{ \begin{bmatrix} \textbf{a}&\textbf{a}&\ldots&\textbf{a} \end{bmatrix} } \; \underset{\beta \times 1}{\textbf{b}}. \end{aligned}$$
(18b)

This representation allows one to perform the multiplication of numbers written in the unary encoding efficiently. Now, we shall discuss the multiplication of binary variables by means of integer programming.

Consider two binary variables, \(x,y\in \{0,1\}\). Multiplication of these variables via BIP requires the introduction of an auxiliary variable, \(s\in \{0,1\}\), and the setting of corresponding constraints. The constraints establish the obvious relationship, \(s = x \cdot y\), between this variable and the multiplicands x and y:

$$\begin{aligned} \left. \begin{aligned} s&\le x,\\ s&\le y,\\ s&\ge x + y - 1.\quad \end{aligned} \right\} \end{aligned}$$
(19)

The process of formulating the constraints (19) for the convolution of two histograms in binary form is summarized as Algorithm 4. Note that the two outer loops, over z and k, correspond to those of Algorithm 3, and the inner loops, over i and j, are due to the unary encoding of the counts. However, these inner loops can be removed if the computation is performed via efficient vector operations, as shown in (18b).
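As a sketch of a single such product (using the PuLP modeling library for brevity, rather than the Mosek API of our actual implementation), the constraints (19) for \(s = x \cdot y\) read:

```python
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable

prob = LpProblem("product_of_binaries", LpMinimize)
x = LpVariable("x", cat=LpBinary)
y = LpVariable("y", cat=LpBinary)
s = LpVariable("s", cat=LpBinary)  # auxiliary variable representing s = x*y

# constraints (19); Algorithm 4 emits one such triple for every pair of
# unary digits contributing to a bin of the convolution
prob += s <= x
prob += s <= y
prob += s >= x + y - 1
```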

The BIP constraints for the \(\max \) operation can be introduced in a similar manner and, hence, are not discussed here. The procedure would repeat the two loops of Algorithm 2 and would have two internal loops as in Algorithm 4. Therefore, further in this Section we discuss only the convolution operation, noting that the same argument can be carried out for the maximum of two histograms.

Algorithm 4: Generation of BIP constraints for the convolution operation

4.2.2 Forming a histogram in the unary encoding

Consider two histograms in the unary encoding, \(\textbf{H}_X\) and \(\textbf{H}_Y\), of size \(n\times m\), where n is the number of bins and m is the number of binary variables in a bin. For these histograms, Algorithm 4 returns an array \(\textbf{u}\) in which the auxiliary variables s appear as variables; their values are obtained during the solution of the BIP problem.

One can see that the length of the array \(\textbf{u}\) equals the number of bins, n. Each element of \(\textbf{u}\) contains a sum of the auxiliary variables (see Algorithm 4, lines 6–7). The meaning of each element of \(\textbf{u}\) (and, hence, of each sum of s variables) is easy to see: they correspond to the values of the bins of a histogram \(\textbf{H}_{XY}\), which is the convolution of \(\textbf{H}_X\) and \(\textbf{H}_Y\).

In order to map the elements of \(\textbf{u}\) onto the bin values of the histogram \(\textbf{H}_{XY}\), we (i) generate the matrix \(\textbf{H}_{XY} \in \{0, 1\}^{n\times m}\) with new variables, and (ii) supplement these variables with the following constraints

$$\begin{aligned} \left. \begin{aligned} \textbf{1}^T \textbf{H}_{XY}[1,:]&\le {\textbf{u}[1]} \frac{1}{d} + 0.5 \\ \vdots \\ \textbf{1}^T \textbf{H}_{XY}[n,:]&\le {\textbf{u}[n]} \frac{1}{d} + 0.5 \end{aligned} \quad \right\} \end{aligned}$$
(20)

Here \(\textbf{1}^T \textbf{H}_{XY}[i,:]\) denotes summation over the \(i^{\text {th}}\) row of the histogram \(\textbf{H}_{XY}\), i.e. \(\textbf{1}^T \textbf{H}_{XY}[i,:] = \sum _{k=1}^m \textbf{H}_{XY}[i,k]\), and \(\textbf{u}[i]\) is the \(i^{\text {th}}\) element of the array \(\textbf{u}\). Recall that the histogram \(\textbf{H}_{XY}\) has n rows and m columns, where rows correspond to the bins and columns give the bin counts in unary encoding. The parameter d is a normalization factor that reads

$$\begin{aligned} d = \frac{ \max \{ {\textbf{u}[1], \ldots , \textbf{u}[n]}\} }{m}. \end{aligned}$$
(21)

This choice of the normalization parameter ensures that the r.h.s. of the inequalities (20) does not exceed the number of binary variables, m. The additive constant 0.5 implements rounding of the r.h.s. to the nearest integer.

In this form, the value of the normalization factor d could be obtained self-consistently during the optimization procedure. However, this would give a non-convex optimization problem due to the division in (21), which we wish to avoid. Therefore, in our implementation we fix the normalization factor to a constant of our choice. We can check post hoc whether the solution has reached the bound on the number of unary digits in any of the bins of the histogram. If this is the case, the optimality has been affected, and we can increase the bound and re-run the procedure. If this is not the case, the normalization does not affect the optimality.

Let us now discuss how the problem can be simplified by tightening.

4.2.3 Problem tightening

The convergence of the BIP solver can be sped up by introducing constraints that tighten the relaxation. For example, one can separate the zeroes from the ones in the histogram matrix \(\textbf{H}\) with symmetry-breaking constraints:

$$\begin{aligned} H_{i, 1} \ge H_{i, 2} \ge H_{i, 3} \ge \ldots \ge H_{i, m-1} \ge H_{i, m}. \end{aligned}$$
(22)

These constraints do not have any effect on the correctness of the solution but decrease the size of the branch-and-bound tree of the solver.

Moreover, as we discussed in Sect. 2.2 (see expressions (6) and (7)), the maximum and the convolution together describe the operation of a basic logic gate. Thus, it is natural to evaluate these operations simultaneously for each gate. Similarly to (19), we can write for three binary variables \(x,y,z\in \{0,1\}\):

$$\begin{aligned} \left. \begin{aligned} s&\le x,\\ s&\le y,\\ s&\le z,\\ s&\ge x + y + z - 2. \end{aligned} \right\} \end{aligned}$$
(23)

These constraints describe the product arising from the \(\max \) of two binary variables, x and y, and the subsequent convolution of the result with a third binary variable, z. Having discussed the atomic SSTA operations in the unary encoding, we can proceed to the formulation of the SSTA problem as a BIP problem, where we perform the same operations bit-wise on the unary-encoded counts.

4.3 Implementation and validation

The calculation of the circuit delay requires traversing the timing graph \(G(E,V)\). Since the histogram \(\textbf{H}_{\text {sink}}\) corresponding to the sink contains all the binary variables obtained in the previous steps, it is natural to write the objective function of the BIP problem (4)–(5) as the sum of all the variables due to the atomic operations in the graph, \(\textbf{1}^T \textbf{H}_{\text {sink}} \textbf{1}\), subject to the constraints discussed above. Doing so, we obtain the SSTA problem of a circuit via BIP as follows:

$$\begin{aligned} \begin{array}{lll} \text {minimize} &{\text {risk}(\textbf{H}_{\text {sink}})} &\\ \text {subject to} &\textbf{H}_g[i,1] \ge \ldots \ge \textbf{H}_g[i,m] &\quad \forall g \in G:\; i=1,\ldots ,n,\\ &\textbf{1}^T \textbf{H}_g[i,:] \le \textbf{u}[i] \frac{1}{d} + 0.5, &\\ &s_g \le \{x,y,z\}, &\\ &s_g \ge x + y + z - 2, &\\ &\textbf{S}_g \le \textbf{G}_g, & \end{array} \end{aligned}$$
(24)

where \(\text {risk}\) is a MILP representation of a risk measure. Notice that in many practical scenarios, one may consider the conditional value at risk (CVaR) (Rockafellar and Uryasev 2000) of the histogram-approximated random variable \(\textbf{H}_{\text {sink}}\) as the risk measure \(\textrm{risk}(\textbf{H}_\text {sink})\), in an effort to “shift the probability mass left”, loosely speaking. In general, this would be implemented by summing up the counts in some number of “rightmost” bins of the histogram approximation. For testing purposes, we have utilized the expression \(\textbf{1}^T \textbf{H}_{\text {sink}} \textbf{1}\), which sums the counts across the whole histogram and implements CVaR at the 0% confidence level.
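For illustration, such a tail-sum risk measure for a histogram could be sketched as follows; the bin count n_tail is a hypothetical parameter.

```python
import numpy as np

def tail_risk(h_sink, n_tail):
    """Probability mass (or counts) in the n_tail rightmost bins: a simple
    CVaR-like tail measure. n_tail = len(h_sink) recovers the plain sum
    1^T H 1 used for testing in the text."""
    return float(np.sum(h_sink[-n_tail:]))
```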

Fig. 2: Comparison of three BIP formulations for the SSTA. The methods are tested on a “ladder” of maxima with \(n=12\) bins, each bin's count bounded from above by \(m=12\). The blue line indicates the method with only the constraints (19) (ORIG), the orange line the method with the symmetry-breaking constraints (SBC) (22), and the green line the method with the symmetry-breaking constraints and the \(3\)-term multiplication model (SBC+TTM) (23). In the first subplot, the orange line overlaps the green line; in the second, the blue line overlaps the orange line

This BIP formulation uses the symmetry-breaking constraints (22) and the \(3\)-term multiplication model (23). Note that the constraints (23) imply taking the \(\max \) and convolution operations together for a gate g, whereas the constraints (19) correspond only to a single multiplication and, thus, would have to be added after each atomic operation, both \(\max \) and convolution. The matrices \(\textbf{G}_g\) in the BIP are unary representations of the delay of each gate, bounding the variables \(\textbf{S}_g\) from above elementwise.

Fig. 3: Scalability of the model with the first relaxation (SBC) (22), tested on a “ladder” of maxima with \(n=25\) bins, \(m=20\) unary variables, and no time limit. The subplots show: (a) the growth in the number of non-zeros (blue line), variables (orange line), and constraints (green line); (b) the MIP gap at the root node in percent, with the MIP gap tolerance set to 1%; (c) the time in seconds; (d) the relative error of the standard deviation (orange line) and mean (blue line) compared to Monte Carlo. (Color figure online)

To validate our BIP formulation of the SSTA, we have implemented (24) in the Mosek 10.0 matrix-oriented API. As a test bench, we have used the toy circuit of Mishagli and Blokhina (2020, Fig. 7), which gives a “ladder” sequence of logic gates. In terms of atomic operations, this sequence for the Nth gate reads

$$\begin{aligned} \max \{\underbrace{\ldots \max [\max (\xi _1,\xi _2) + \xi _0, \xi _3] + \xi _0\ldots }_{N-1\text { times}}, \xi _{N+1}\} + \xi _0. \end{aligned}$$
(25)

Here the RVs \(\xi _i\) are drawn from normal distributions, \(\xi _i\sim \mathcal N(\mu _i,\sigma _i)\). The delay due to the gate operation time (the gate delay) is given by \(\xi _0\), and \(\xi _i\) (\(i=1,\ldots ,N+1\)) are the inputs' delays. The gate delay was assumed to follow the normal distribution \(\xi _0\sim \mathcal N(1,0)\); the mean values and standard deviations of the inputs were drawn from the uniform distribution, following Mishagli and Blokhina (2020).
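A Monte Carlo sketch of the ladder (25) under these assumptions (with the spread of the gate delay \(\xi _0\) set to a nonzero hypothetical value, and hypothetical ranges for the uniform draws) reads:

```python
import numpy as np

rng = np.random.default_rng(42)
N, n_samples = 5, 100_000          # ladder depth and MC sample size (assumed)

def gate_delay():
    return rng.normal(1.0, 0.1, n_samples)  # xi_0 with an assumed spread

mus = rng.uniform(0.5, 1.5, N + 1)           # input means (assumed range)
sigmas = rng.uniform(0.1, 0.3, N + 1)        # input deviations (assumed range)

def xi(i):
    return rng.normal(mus[i], sigmas[i], n_samples)  # input arrival time

delay = np.maximum(xi(0), xi(1)) + gate_delay()      # first gate, cf. (25)
for i in range(2, N + 1):                            # remaining rungs
    delay = np.maximum(delay, xi(i)) + gate_delay()
print(delay.mean(), delay.std())
```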

This sequence was simulated (i) using Monte Carlo (MC) and (ii) by solving the BIP (24). For each gate \(g \in G(E,V)\), a new matrix \(\textbf{H}_g\) containing binary variables was created as described above, until the sink was reached. Then, the final BIP problem was passed to the Mosek solver. The numerical experiments were run on a machine equipped with an Intel(R) Core(TM) i9-9880H (8 cores at 2.3 GHz and a total of 16 threads) with 16 GB RAM.

The results are summarized in Figs. 2 and 3. The correctness of the BIP formulation can be seen from Fig. 3, where the mean and standard deviation of the sink delay distribution are compared against MC. Although the numbers of constraints and variables scale linearly with the number of gates, these numbers are huge; thus, this approach does not scale. For example, the numbers of variables and constraints quickly reached the order of \(\sim 10^6\) for only 5 gates (Fig. 3). In the next Section, we consider the GP formulation of SSTA, which scales much better.

5 SSTA via geometric programming

Next, we present a practical formulation of the SSTA problem. First, we give the formulation by means of Geometric Programming, which is a restriction of the exact formulation, but one where the error can be made arbitrarily small. Then, we show its reformulation and study its scalability.

5.1 Statement of the problem

Naturally, we can treat the probability of each bin as a positive number in the range \([\epsilon , 1]\), where \(\epsilon \) is a very small number. The multiplication of two bins leads to a monomial function of two variables with a unit coefficient. Both the convolution and the maximum are then sums of such multiplications, and thus posynomials. Therefore, the general Algorithms 2 and 3 can be utilized within the Geometric Programming (GP) framework in a straightforward manner, and no additional constraints are needed for the atomic operations.

In the following, we again consider a timing graph \(G(E,V)\) consisting of N gates; the number of bins used for the histogram approximation of the distributions is n. For each input gate g, we have a constant vector \(\textbf{e}_g \in \mathbb {R}_{++}^{n\times 1}\) of bin probabilities generated from a Gaussian distribution, and a vector \(\textbf{z}_g \in \mathbb {R}_{++}^{n\times 1}\) of non-negative variables representing the bin probabilities. Similarly to Sect. 4, the geometric program starts with

$$\begin{aligned} \begin{aligned}&\text {minimize} \quad {\text {risk}(\textbf{z}_{\text {sink}})} \\&\text {subject to} \quad \textbf{e}_g \le \textbf{z}_g \le \textbf{1}, \quad g = 1,\ldots ,N, \end{aligned} \end{aligned}$$
(26)

where \({\text {risk}(\textbf{z}_{\text {sink}})}\) is a posynomially-representable risk measure of the delay \(\textbf{z}_{\text {sink}}\) at the sink of the graph, and the bounds on the variables are standard GP-compatible inequalities. Subsequently, the construction of the geometric program follows Algorithm 1 (“General SSTA algorithm”). In particular, for each maximum \(z_{\zeta }\) of two histograms \(z_{\eta }\) and \(z_{\xi }\), we constrain \(z_{\zeta }\) as in Algorithm 2:

$$\begin{aligned} z_{\zeta }[i] = z_{\eta }[i] \cdot \sum _{k = 1}^{i}{z_{\xi }[k]} + z_{\xi }[i] \cdot \sum _{k = 1}^{i-1}{z_{\eta }[k]} \quad \forall i = 1 \ldots n. \end{aligned}$$
(27)

See Sect. 3.1 for a discussion. For each convolution \(z_{\zeta }\) of two histograms \(z_{\eta }\) and \(z_{\xi }\), we constrain the elements of \(\textbf{z}_{\zeta }\) as in Algorithm 3:

$$\begin{aligned} z_{\zeta }[i] = \sum _{k=1}^{i-1} z_{\eta }[k] \cdot z_{\xi }[i-k] \quad \forall i = 1 \ldots n, \end{aligned}$$
(28)

up to the shifting. See Sect. 3.2 for further details.

5.2 Reformulation and relaxation

The posynomial formulation in (26) is correct as it stands. Still, to improve its scalability, we may wish to consider a reformulation. In the following, we concentrate only on the convolution; the procedure is the same for the maximum.

After the first convolution, we have in the last bin a posynomial with \(1\cdot n\) terms (monomials with two variables). After the second convolution, we have in the last bin a posynomial with \(n\cdot n\) monomials, each with three variables, and for the \(N^{\text {th}}\) gate we have \(n^{N-1}\) monomials in the last bin, each with N variables. This clearly leads to an exponential growth in the number of monomials after each convolution and maximum for a constant number of bins.

For each monomial in a posynomial, we need to introduce an exponential cone, two continuous variables, and two constraints. Thus, for a constant number of bins, the number of variables, cones, and constraints grows exponentially with the number of gates. For a constant number of gates \(N+1\), the number of variables, cones, and constraints grows with the number of bins as the sum over all bins, \(\sum _{i=1}^n i^N\), which, in turn, can be expressed as a polynomial of degree \(N+1\) by Faulhaber's formula (Knuth 1993).

Fig. 4: Scalability of the GP (26) on a “ladder” of maxima with \(n=60\) bins and a varying number of gates, N. The subplots show: (a) the growth in the number of cones (blue line), variables (orange line), and constraints (green line, overlapping the blue line) as the number of gates N increases; (b) the time in seconds; (c) the relative error of the standard deviation (orange line) and mean (blue line) compared to Monte Carlo as the number of gates N increases. (Color figure online)

Fig. 5: Scalability of the reformulated GP (26), tested on a “ladder” of maxima with fixed \(N=4\) gates and varying numbers of bins, n, per gate. The subplots show: (a) the growth in the number of cones (blue line), variables (orange line), and constraints (green line, overlapping the blue line) as the number of bins n increases; (b) the time in seconds as the number of bins increases; (c) the relative error of the standard deviation (orange line) and mean (blue line) compared to Monte Carlo as the number of bins n increases. (Color figure online)

We can reduce this growth with the following simple trick: by introducing n new positive variables (monomials) and setting appropriate constraints. At the beginning of the traversal, we initiate two empty lists of vectors to store the successors and predecessors. After the convolution at the \(i^{\text {th}}\) gate, the resulting vector of posynomials \(\textbf{z}_i\) is appended to the list of predecessors. A new vector of one-variable monomials is created, written into the successors' list, and constrained to be no less than its predecessor. This new vector represents a histogram and is propagated further. The last vector \(\textbf{z}_{\text {sink}}\) that appears in the successors' list corresponds to the histogram of the delay at the sink node.

Subsequently, the formulation continues with the inequalities based on the equalities in Algorithms 2 and 3, with the auxiliary variables added. In particular, for each convolution, we introduce \((n/2)(1 + n)\) new exponential cones and thus \(n(1 + n) + n\) auxiliary variables. This reformulation gives exactly the same solution as the original GP; we have merely decreased the exponential growth of the numbers of variables, cones, and constraints with the number of gates to a linear one, and the high-degree polynomial growth with the number of bins to a quadratic one. The exact counts are slightly different for the maximum, but the asymptotics before and after the reformulation are identical.
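The following CVXPY sketch illustrates the successor/predecessor trick for a single convolution step, with toy sizes, bounds, and a stand-in objective of our own choosing; the actual implementation builds such constraints along the whole timing graph.

```python
import cvxpy as cp

n, eps = 8, 1e-6                   # toy bin count and lower bound (assumed)

z_eta = cp.Variable(n, pos=True)   # predecessor histograms
z_xi = cp.Variable(n, pos=True)
z_new = cp.Variable(n, pos=True)   # fresh one-variable monomials (successor)

constraints = [z_eta >= eps, z_xi >= eps, z_eta <= 1, z_xi <= 1, z_new <= 1]
for i in range(n):
    # posynomial <= monomial is a valid GP inequality, cf. Algorithm 3
    conv_i = cp.sum(cp.hstack([z_eta[k] * z_xi[i - k] for k in range(i + 1)]))
    constraints.append(conv_i <= z_new[i])

problem = cp.Problem(cp.Minimize(cp.sum(z_new)), constraints)
problem.solve(gp=True)             # z_new now upper-bounds the convolution
```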

Notice that (26) can be transformed into standard GP-compatible inequalities. Thus, the reformulation of (26) remains a generalized geometric program.

5.3 Implementation and validation

We have prototyped the formulations in CVXPY (Agrawal et al. 2018; Diamond and Boyd 2016). For benchmarking purposes, we have passed the instances to MOSEK 10.0, which ran on a laptop equipped with an Intel(R) Core(TM) i9-9880H (8 cores at 2.3 GHz and a total of 16 threads) with 16 GB RAM. The same toy circuit was used as in Sect. 4.3.

The scalability of the reformulated GP model is demonstrated in Figs. 4 and 5. The results on a ladder of maxima, parameterized by the depth of the ladder, are shown in Fig. 4. Notice that the numbers of cones (blue line) and variables (orange line) are linear in the depth of the ladder. At the same time, the relative error increases.

Figure 5 demonstrates the scalability of the reformulated GP (26) on a ladder of maxima and convolutions, parameterized by the number of bins per gate. Notice that the numbers of cones (blue line) and variables (orange line) are quadratic in the number of bins. At the same time, the relative error decreases with the number of bins, as expected.

6 Discussion and conclusions

In this paper, the problem of calculating the maximum delay in a digital circuit under uncertainty (also known as SSTA) has been studied from the mathematical optimization point of view for the first time. Using a histogram representation of the delay distributions for simplicity, we have presented two formulations of SSTA as an optimization problem. Section 4 presents the Binary–Integer Programming (BIP) approach, which is a formal formulation that does not scale. Section 5 gives a more practical formulation of SSTA as a Geometric Programming (GP) problem.

For the reformulation of the GP, we have demonstrated linear scaling with the number of gates and quadratic scaling with the number of bins. The SSTA has been successfully computed using 30 bins for a circuit with 400 gates in 440 s on an 8-processor machine equipped with Intel Xeon Scalable Platinum 8160 (192 cores at 2.1 GHz and 384 hardware threads) with 1536 GB RAM.

The histogram approximation, which has been previously studied (Liou et al. 2001) as a replacement for Monte Carlo simulations, is used in this work for optimization purposes. However, this approach has clear disadvantages: (i) as we increase the number of gates, we have to increase the size of the interval and, with that, the number of bins; (ii) we assume that the support of the delay distribution is known; and (iii) the correlations between the delays were not taken into account.

On the other hand, this approach has allowed us to (i) perform robust optimization over the delay distributions, unlike other statistical approaches, where only the statistical moments are taken into account, and (ii) perform computations in polynomial time up to any fixed precision using GP. Last but not least, the histogram formulation of the SSTA makes the results transparent and easy to understand.

Some of these challenges could be addressed. Similarly to the approximation algorithm of Liou et al. (2001, Section 3.3), one could address the scalability issue, at the cost of some error, by decomposing the circuit into “supergates”, possibly hierarchically, and, at the further cost of estimating only the tail of the delay distribution, by “filtering out unnecessary stems”, i.e., discarding sample paths which are guaranteed not to influence the tail of the distribution; see Liou et al. (2001, Section 2.3) for suggestions on how this could be performed. The same approach of Liou et al. (2001, Section 2.3) could also be used to address challenge (ii) above, as it can be used to estimate the range of the arrival times of events. Challenge (iii) above, phrased in terms of addressing the correlations, seems to be inherently difficult, albeit perhaps less important. This inherent difficulty extends to measuring the correlations between more than two gates' delays, especially when conditioned on external factors such as a change of temperature. Indeed, many sources (e.g., Jyu et al. 1993, p. 131) claim that “cell library designers agree that it is reasonable to expect the delays for components on a single chip to track each other.”

Table 1: An overview of the approaches that approximate probability distributions with various parametric classes of distributions, with references to their uses as replacements of Monte Carlo in simulation, and our suggestions as to the classes of optimization problems obtained using the technique of Sect. 4

Let us now chart some potential avenues for further research. One could easily replace the uniform distribution centered at the midpoint of each bin with a triangular distribution centered at the midpoint of each bin. In his pioneering paper, Naidu (2002) used such “impulse train” distributions as a replacement of Monte Carlo for simulation purposes. Our binary-integer optimization formulation of Sect. 4 should be easy to extend to triangular distributions, and it would remain mixed-integer linear. Whether the geometric-programming approach of Sect. 5 would be as easy to extend remains to be investigated. Either way, this would be an interesting extension.

Further, our work could plausibly be extended to the Gaussian comb model of Mishagli and Blokhina (2020), where one would replace the uniform distribution centered at the midpoint of each bin with a Gaussian kernel function. As such, the Gaussian comb model is a very special case of the Gaussian mixture model with predefined, uniformly spaced expectations of the components. There, one optimizes over the mixture coefficients rather than the counts in a histogram, and the objective function has a more complicated form (which expresses the integral in closed form). We conjecture that the resulting mixed-integer non-linear optimization problems are “tame” in their continuous part, i.e., the continuous part is definable in an o-minimal structure. This is a fast-growing area of optimization, due to the applications in deep learning (see, e.g., Bolte and Pauwels (2021)), but mixed-integer extensions do not seem to have been studied yet, and there certainly are no off-the-shelf solvers.

More broadly, one could consider Gaussian mixture models or infinitely smooth radial basis functions (RBFs). Infinitely smooth RBFs, such as the multiquadric or the inverse quadratic RBF, are real-analytic (and hence \(C^{\infty }(\mathbb {R})\)), so one can formulate mixed-integer analytic optimization problems over them. While optimization over real-analytic functions is becoming better understood (Absil and Kurdyka 2006; Kurdyka et al. 2000), these optimization problems seem very challenging. While the famous result of Kurdyka et al. (2000) shows the finite length of the gradient flows, local minima are not necessarily stable equilibria of the gradient-descent system, and vice versa (Absil and Kurdyka 2006, Proposition 2). That is, local minimality is neither necessary nor sufficient for stability. Integer analytic optimization hence seems difficult to work with, other than via spatial branch-and-bound (Smith and Pantelides 1999), which may not be sufficiently scalable.

Future work may also involve an extension of the maximum computation and of the gate sizing program to the case of correlated random variables, as well as to related problems of circuit design.