Process-Variation and Temperature Aware SoC Test Scheduling Technique
DOI: 10.1007/s10836-013-5374-z
- Cite this article as:
- Aghaee, N., Peng, Z. & Eles, P. J Electron Test (2013) 29: 499. doi:10.1007/s10836-013-5374-z
Abstract
High temperature and process variation are undesirable phenomena affecting modern Systems-on-Chip (SoC). High temperature is a well-known issue, in particular during test, and must be addressed in the test process. Modern SoCs are affected by large process variation and therefore experience large and time-variant temperature deviations. A traditional test schedule which ignores these deviations will be suboptimal in terms of speed or thermal safety. This paper presents an adaptive test scheduling method which acts in response to the temperature deviations in order to improve the test speed and thermal safety. The method consists of an offline phase and an online phase. In the offline phase a schedule tree is constructed, and in the online phase the appropriate path in the schedule tree is traversed based on temperature sensor readings. The proposed technique is designed to keep the online phase very simple by shifting the complexity into the offline phase. In order to efficiently produce high-quality schedules, an optimization heuristic which utilizes a dedicated thermal simulation is developed. Experiments are performed on a number of SoCs, including the ITC’02 benchmarks, and the experimental results demonstrate that the proposed technique significantly reduces the test cost in comparison with the best existing test scheduling method.
Keywords
SoC test scheduling · Adaptive test · Temperature awareness · Process variation · Thermal simulation

1 Introduction
Deep submicron integration causes high power densities and large process variation [7, 18]. The power density for a System-on-Chip (SoC) during test is considerably higher compared to the normal operation [4, 5, 27, 28] and so is the risk of overheating. Therefore, the temperature of the chip under test should be taken into account when designing the test process [5, 27]. Traditionally, thermal-aware test schedules are optimized to achieve minimum test application time while avoiding any overheating. Usually, this is done offline by introducing cooling cycles into the test schedule based on the thermal simulation data [10–12, 26].
A partitioning based approach is introduced in [26]. Selection of the start times for partitions is formulated as an optimization problem and solved with list scheduling [26]. The temperature is initially simulated using HotSpot [14] and is constrained to avoid overheating [26]. During the optimization, the list scheduler utilizes a superposition-based method to assess the temperatures [26].
A partitioning and interleaving approach is introduced in [12]. The suggested method in [12] formulates the number of test partitions and the length of cooling intervals into an optimization problem and then uses constrained logic programming to solve it. The temperature is simulated using HotSpot [14] and is constrained to avoid overheating. Since constrained logic programming is too slow to handle long test sets, a heuristic is proposed for test scheduling [12].
A partitioning and interleaving approach is introduced in [11] which determines the partition lengths, the cooling times, and the schedule on-the-fly, without iterating. The proposed heuristic schedules the partitions based on the cores’ temperatures and remaining test sizes [11]. A partition ends when its temperature approaches a high temperature threshold, and then a cooling interval starts. The cooling interval may end when the core’s temperature is lower than a low temperature threshold [11]. The optimal value for the low temperature threshold is found experimentally so that the test application time is minimized. The temperature is simulated using ISAC [23] and is constrained to avoid overheating [11].
The focus of this paper is on thermal safety issues in the SoC test process, but it is worth mentioning that there are other thermal issues concerning the SoC test. For example, there are defects which appear only in a special temperature range, and therefore the test to detect them should be carried out in that temperature range. The details of this problem and solutions to it can be found in [9, 25]. Besides the thermal safety issues, modern SoCs are affected by large process variation; this issue is discussed next.
The temperature at a certain test cycle is traditionally assumed to be identical for identical chips. However, nominally identical chips manufactured with state-of-the-art technologies are likely to have different temperatures at a certain test cycle because of process variation. Process variation can be described as geometrical variations in the chips’ layouts and variations in material properties. These phenomena induce variations in the electrical characteristics of the chip (e.g., threshold voltages and leakage currents).
The variations may also affect the thermal characteristics of the chip (e.g., thermal resistances and capacitances). The variation of the electrical characteristics causes power variations and thus temperature variations. The variation of the thermal characteristics causes variation in the device temperatures even if the power is identical for different chips. The temperature variation as an effect of process variation is studied in [8] for FinFET devices.
In order to isolate the temperature uncertainty from other thermal phenomena, temperature error is defined as the difference between the expected temperature that is estimated by simulation and the actual temperature that is measured by sensors. Temperature error captures uncertainties due to different temperature related deviations, including ambient temperature fluctuations, voltage variations, and in particular process variation.
For traditional technologies, temperature error is small enough to be neglected or to allow worst case designs with negligible overhead [10–12, 26]. But, for new technologies the general trend of increase in power densities and process variation will eventually lead to a situation where the temperature error can no longer be ignored. Therefore, the negative effect of process variation (i.e., large temperature error) should be taken into account in developing efficient test solutions.
In [3], two process variation aware methods are proposed to maximize the test throughput. These methods consider thermal safety as a part of the optimization objective. However, the first method proposed in [3] does not react to temperature deviations, and the other method does not handle intra-chip process variation effects and time-variant temperature deviations. Based on [1, 2], in this paper an adaptive test scheduling method is introduced to address the intra-chip process variation. The proposed method adapts the tests’ partitions, cooling intervals, and test schedule to the current temperature situation in order to lower the cost of test. The adaptive test schedule is achieved by selectively reading the on-chip temperature sensors.
Integration of temperature sensors in a chip and their use during both test and normal functionality are already practical. For example, the Power5 processor, manufactured in 2004, is reported to have 24 temperature sensors [6]. A variety of mechanisms to access the sensors during test have also been proposed [15, 24]. A thermal-aware test scheduling technique using on-chip temperature sensors is proposed in [24] which is based on a static schedule. Two heuristics are suggested for generation of the static schedule, one of which takes the temperature constraint into account. It assumes that each individual test runs to completion [24]. This means that a large cooling time is required before the start of each test in order to guarantee thermal safety. This approach leads to very long cooling times and does not handle long and power-intensive tests, which will lead to overheating if applied to completion [24].
The technique proposed in this paper is based on partitioning and interleaving, as proposed in [11], and efficiently utilizes the cooling times in order to decrease the overall test application time. It also handles the long and power intensive tests which are not thermally safe, per se. The proposed technique generates a near optimal schedule tree in an offline phase. During testing (online phase), a chip traverses the schedule tree, starting from the tree’s root and ending at one of the tree’s leaves, depending on the actual temperatures.
To our knowledge, we are the first to propose an approach which incorporates on-chip temperature sensor data during test in order to adapt to the temperature deviations caused by process variation and to achieve superior test performance [1, 2]. The methods proposed in [1, 2] are enhanced to improve the test schedule quality, to run faster, and to handle a larger number of cores. In order to achieve these improvements, in addition to the methods we proposed in [1, 2], this paper presents a modified optimization approach and a fast thermal simulation algorithm which is customized for the test scheduling technique.
The rest of the paper is organized as follows. Section 2 presents a motivational example. Section 3 provides the problem formulation and introduces the cost function. Section 4 describes the temperature error model. Section 5 presents the main body of the proposed method. Section 6 explains the thermal simulation approach. Section 7 presents the experimental results and Section 8 presents the conclusion. A quick reference guide (Section 9) including abbreviations and notations is given at the end of the paper.
2 Motivational Example
Under a certain temperature error condition, due to process variation, the thermal responses of the different chips to the same test sequence will be different. Now, assume that chip C_{w} is warmer than expected, while chip C_{n} behaves normally. As illustrated in Fig. 1b, C_{w} will overheat. To prevent this, a more conservative offline schedule S_{2} has to be designed based on the thermal profile of C_{w}, for both chips, as illustrated in Fig. 1c.
This new schedule will avoid overheating, but will lead to longer test application time TAT_{2} compared with TAT_{1}. For chip C_{n}, this test application time is unnecessarily long, since the original schedule, S_{1}, in Fig. 1a is a safe schedule for this particular chip. For a set of manufactured chips with large temperature variations, in order to generate a globally conservative offline schedule, the hottest chip will be used to determine the test schedule. This test schedule will introduce too long cooling periods for most of the chips, leading to an inefficient test process.
In [3], we have proposed a technique to address the above problem with the help of a chip classification scheme. This scheme consists of several test schedules for different temperature error ranges. After applying a short test sequence, the actual temperature of the chip under test is measured using a sensor and depending on its value, the proper test schedule is selected. Therefore, the hotter chips will use a test schedule with more cooling, while the colder chips will have less cooling. The overheating issue is solved and the test application time will not be made unnecessarily long. This approach works fine under the assumption that the thermal behavior of the chips is time invariant.
However, in the case of large process variation, the thermal behavior is time variant and the technique presented in [3] will not be able to achieve high quality schedules. The variation of thermal response with time is illustrated in Fig. 1d. In this case, the temperature of chip C_{w} gradually lifts up compared to chip C_{n}, and C_{w} eventually overheats. A scheduling method capable of capturing temporal deviations is therefore required to deal with this new situation.
For the rest of the test, after i_{4}, two dedicated schedules, S_{2} and S_{3}, are generated in the offline phase for C_{n} and C_{w}, respectively. Therefore, in the online phase the test of C_{n} continues with schedule S_{2}, as in Fig. 2a, and the test of C_{w} continues with schedule S_{3}, as in Fig. 2b.
In this illustrative example, at the end of S_{1}, the schedule does a branching to either S_{2} or S_{3} based on the actual temperature. This information and the branching condition can be captured in a branching table, B_{1} in Fig. 2. As shown in Fig. 2a, C_{n} is tested initially with S_{1} and then with S_{2}, while, as shown in Fig. 2b, C_{w} is initially tested with S_{1} and then with a more conservative schedule, S_{3}. The segments of the schedule which are executed sequentially without branching are called linear schedules. An adaptive test schedule consists therefore of a number of branching tables in addition to multiple linear schedule tables.
The focus of this paper is on multi-core SoCs, although the above illustrative example concerned a single-core design. It is assumed that, due to intra-chip process variation, each core has its own thermal behavior, similar to what is described above for a chip. Moreover, multi-core designs are usually affected by lateral heat dissipation among the cores and also by the limited test bus width which is shared by different cores.
3 Problem Formulation
Our goal, as discussed in the previous section, is to generate an efficient adaptive test schedule offline. This is formulated as an optimization problem. The input consists of a SoC design with its set of cores and their corresponding test sequences. The floor plan, the thermal parameters, the static power parameters, and the dynamic power parameters for the chip are given as inputs. The statistical data that models the temperature deviations are also given as input. The adaptive test schedule should be generated to minimize the test application time and the probability of overheating. These objectives are captured by a cost function which expresses the overall efficiency of the generated test schedule, as discussed in the following.
The test schedule should be generated under two constraints. The first constraint is the available test bus width. The test bus width limits the number of cores that can be tested in parallel. The second constraint is the available Automatic Test Equipment (ATE) memory which limits the size and the number of the linear schedule tables and branching tables. It is assumed that the available memory after loading the test patterns will be utilized for storing the schedule and therefore the amount of memory dedicated to the schedule will not introduce new costs.
The first term in the cost function is related to the test facility operation cost, which is defined as the operational Cost of the Test Facility per time unit (CTF) divided by the Test Throughput (TT). The cost of the test facility operation per time unit depends on the cost of the ATE machines, their maintenance costs, and other operation costs. The test throughput is explained later.
The second term of the cost function is related to the cost of the overheated chips, which is the product of the Price of One Chip (POC) and the expected number of overheated chips. The expected number of overheated chips is calculated based on the Test Overheating Probability (TOP) which represents the number of overheated chips per number of chips entering the test facility.
In (1) the test overheating probability, TOP, is divided by (1 − TOP) in order to give the expected number of overheated chips per number of non-overheated chips. The cost of the test facility per time unit, CTF, and the price of one chip, POC, depend on the particular manufacturing and test facility and on the particular SoC. To have a simple model for the test throughput, TT, assume that the given test facility is characterized by its overall Effective Test Time per Second (ETTpS) and Test Handling Time (THT).
The normalized test throughput, NTT, is proportional to the applied test size divided by the test application time. It is also proportional to the percentage of the chips that have completed the test without overheating. Therefore large test application time and large test overheating probability will result in small test throughput and consequently the cost component related to the test facility operation will be higher.
The balancing coefficient, BC, is in direct proportion to the price of one chip, POC, and in inverse proportion to the cost of the test facility per time unit, CTF. The first term in the above equation (normalized cost function) represents the test facility operation cost and shows what volume of test could be applied by the test facility per time unit. The second term represents the balanced cost of the overheated chips and is proportional to the test overheating probability, TOP, and the balancing coefficient. The balancing coefficient balances the cost of the overheated chips against the cost of the test facility operation. Expensive chips will result in a larger balancing coefficient, and an expensive test facility will result in a smaller balancing coefficient.
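Since Eqs. (1) and (4) are not reproduced in this excerpt, the cost structure described above can only be sketched. The snippet below follows the text: the facility term is CTF divided by throughput, the overheating term weights TOP against the balancing coefficient BC = POC/CTF, and the normalized throughput is taken as proportional to (ATS/TAT)·(1 − TOP). The exact functional form is an assumption, not the paper's equation.

```python
def normalized_cost(ctf, poc, tat, ats, top):
    """Hedged sketch of the normalized cost function described in Section 3.

    ctf: cost of the test facility per time unit (CTF)
    poc: price of one chip (POC)
    tat: test application time (TAT)
    ats: applied test size (ATS)
    top: test overheating probability (TOP), 0 <= top < 1
    """
    bc = poc / ctf                    # balancing coefficient: direct in POC, inverse in CTF
    ntt = (ats / tat) * (1.0 - top)   # normalized test throughput (assumed form)
    # facility-operation term (inverse throughput) + balanced overheating term
    return 1.0 / ntt + bc * top
```

With this form, a larger TAT or a larger TOP increases the cost, and a more expensive chip (larger POC) makes overheating weigh more heavily, matching the qualitative behavior described in the text.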
4 Temperature Error Model
In order to distinguish between the effects of process variation and other undesirable thermal effects, four different temperatures are defined. The first one is the expected temperature, that is, the temperature of a normal chip which is not affected by undesirable thermal effects (including process variation). The expected temperature is an abstract concept and its exact value cannot be acquired. The second one is the simulated temperature, that is, the temperature computed by simulation. The aim of simulation is to compute the expected temperature and therefore, ideally, the simulated temperature is equal to the expected temperature. The third one is the actual temperature, that is, the actual real-world temperature. The actual temperature is physical, but its exact value is usually impossible to acquire due to measurement errors. The fourth and last one is the measured temperature, that is, the temperature measured using temperature sensors.
Based on the above definitions, three different temperature errors can be defined. The first one is simulator error, that is the difference between the expected temperature and the simulated temperature. The inaccuracies in the simulation model and algorithms contribute to this error. The second one is measurement error, that is the difference between the actual temperature and the measured temperature. The inaccuracies in the sensor technologies contribute to it. The third and last one is variation error, that is defined as the difference between the actual temperature and the expected temperature. This error has various sources including process variation, ambient temperature fluctuations, and voltage variations.
Process variation mainly contributes to the variation error, and therefore this paper focuses on the variation error. In the rest of this paper, the temperature error is considered to be the difference between the expected temperature, which is estimated by simulation, and the actual temperature, which is measured by sensors.
Temperature error is discussed above based on its origins, but there is another important point of view: the way it affects the cores. Therefore, temperature error is further categorized into spatial temperature error and temporal temperature error. Spatial temperature error expresses that different cores have different temperature errors, while temporal temperature error expresses that the same core has different errors at different times. A temperature error model gives the probabilities of the temperature errors for every core in every test cycle. The spatial error model gives the initial error distribution, and then the temporal error model is used to recursively estimate the error distribution for the next cycle.
For example, a spatial temperature error model which consists of a discrete distribution shows that at the very beginning of the test the probability of an error equal to −2.3 °C in core 1 is 0.001, while the probability of the same error in core 2 is 0.02. The spatial error model, in this paper, is specified using a lookup table which is assumed to be given as one of the inputs. Assuming that the error for a SoC design may range from −20 °C to +20 °C with a resolution of 0.5 °C, the number of lookup table entries (M) would be 80 for a core and M × C for a SoC with C cores.
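A minimal sketch of such a lookup-table-based spatial model, using the −20 °C to +20 °C range and 0.5 °C resolution from the example. The uniform probabilities are placeholders, not data from the paper, which assumes the table is given as an input:

```python
def make_spatial_model(num_cores, lo=-20.0, hi=20.0, step=0.5):
    """Build a placeholder spatial temperature error model:
    a list of M discrete error values and an M x C probability table
    (one row of probabilities per core, each row summing to 1)."""
    m = int((hi - lo) / step)                    # M = 80 for the example above
    errors = [lo + i * step for i in range(m)]   # discrete error values
    uniform = [1.0 / m] * m                      # placeholder distribution
    table = [list(uniform) for _ in range(num_cores)]
    return errors, table
```

In practice each core's row would come from measured statistics rather than a uniform distribution; only the table shape (M entries per core, C cores) follows the text.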
Our temporal temperature error model is assumed to be a discrete-time model which means that the temperature error is fixed during a period and then it changes discretely from one period to the next. Therefore, the temporal temperature error model specification has two parts, the period which is called temporal error period and a table of error change probabilities. The temporal temperature error table gives the probability of a particular change in error.
The size of the temperature error data set, given as input, might be quite large. In such a case it is necessary to extract a smaller set of data which is representative of the original data in accordance with the accuracy and speed requirements. This is done by clustering the errors into error clusters. The error clusters are characterized by temperature Error-clusters Borders (EB). The temperature error range, resolution, and error clusters are assumed to be identical for all cores, in this paper.
The error clustering is assumed to be uniform, and the error-clusters borders, EB_{l} (0 ≤ l ≤ L), are identical for all cores. Assuming L error clusters, the size of the original data set reduces to L × C. Error clustering divides the C-dimensional error space into error cells indexed using a Cartesian system (i.e., [l_{0}, l_{1}, …, l_{C−1}]). For example, assume that for a SoC with two cores, each core has two error clusters. The 2-dimensional error space is divided into four error cells, indexed with [0,0], [0,1], [1,0], and [1,1]. While the original size of the error space is M^{C}, the number of error cells is L^{C}. Assuming M = 80 and C = 2, the original size is 80^{2} = 6,400, while the size of the clustered error space, with L = 3, is 3^{2} = 9.
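The growth of the error space, and the reduction achieved by clustering, follow directly from the counting argument above:

```python
def error_space_size(values_per_core, num_cores):
    """Number of points in the C-dimensional error space when each core
    can take one of `values_per_core` error values (M^C before
    clustering, L^C after clustering into L clusters per core)."""
    return values_per_core ** num_cores
```

This reproduces the example from the text: with M = 80 and C = 2 the raw space has 6,400 points, while clustering with L = 3 leaves only 9 error cells.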
5 Adaptive Test Scheduling
The proposed adaptive method is based on the on-chip temperature sensors implemented on each core. During test, the actual temperatures of selected cores are read at certain selected moments and the gap between sensor readouts is filled with thermal simulation. A group of chips with similar thermal behavior which are tested with the same schedule is called a chip cluster. During the test, chips are dynamically classified into one of the chip clusters and are tested using the corresponding schedule. The chip clusters vary during the test, and at every adaptation moment (time moment corresponding to a certain branching table) the chip clusters change into a new scheme which is suitable for the new situation.
The parameters that affect the efficiency of the adaptive method are the moments when branching/adaptation happens, the number of edges (i.e., linear schedule tables), and the branching conditions (i.e., chip clustering). For the example in Fig. 2, adaptation happens at i_{4}, the number of edges is two (two linear schedule tables, S_{2} and S_{3}), and the branching condition is a comparison with the Threshold.
In order to construct the schedule tree, two questions should be answered:
1. How many chip clusters are suitable at each possible node in the schedule tree? The special case of one edge implies no branching, no sensor reading, and no extra effort.
2. What is the proper chip clustering into the given number of chip clusters? The number of chip clusters is known from the answer to the previous question. Depending on the chip clustering, some cores may not need sensor readout.
The second question is only relevant when the answer to the first question is larger than one. The two questions are then formulated in two different forms: the first as the choice of a tree topology and the second as the chip clustering for the nodes of that tree topology.
A candidate schedule tree is generated by combining a possible tree topology with a possible chip clustering. The number of candidate tree topologies and the number of alternative chip clusterings grow very fast with parameters like temporal error resolution and the number of cores. Since the number of candidate trees is the product of the tree topology alternatives and the chip clustering alternatives, the search space is so huge that ordinary search approaches would not work fast enough. Therefore a constructive method is suggested to deal with this high complexity.
The schedule tree is constructed by adding small partial trees to its leaves. These small partial trees, which are the building blocks of the schedule tree, are called sub-trees. A sub-tree consists of a small number of linear schedule and branching tables, which makes it small enough to be clustered and optimized (scheduled) at once. The tree that is under construction, with unfinished tests, is called an unfinished tree.
The algorithm starts with an initialization phase, as shown in Fig. 6. Here, the unfinished tree, sub-tree topologies, temperature error model, and thermal simulator are initialized. Then it proceeds with constructing the schedule tree as will be explained in Section 5.1. The procedure of constructing the schedule tree out of the sub-trees is presented in Section 5.1. The linear schedule tables are discussed in Section 5.2. The sub-tree evaluation is explained in Section 5.3. The sub-tree scheduling which is based on an optimization heuristic is explained in Section 5.4. Some remaining remarks are given in Section 5.5.
5.1 Tree Construction
The schedule tree construction starts with a root node and in each iteration an unfinished tree extends and multiplies by adding possible combinations of sub-trees to its active leaf nodes, as shown in Fig. 5. Then, a small number of promising under-construction trees are selected as unfinished trees from the offspring list for the next iteration. For example, an unfinished tree list (similar to Fig. 5a) will be selected from the offspring list (partially shown in Fig. 5c). The algorithm, as shown in Fig. 6, ends when all the unfinished trees have completed the test.
The normalized cost function, NCF (Eq. 4), is scaled by the tree’s number of nodes plus an adjusting offset. Now, adding nodes to the tree is only beneficial if it gives a reasonable cost reduction, otherwise a smaller tree may get a lower scaled cost and manage to survive to the next iteration, while bigger trees are discarded. The effect of the number of nodes is adjusted by the adjusting offset. A small adjusting offset promotes having fewer edges compared to a large adjusting offset which promotes having more edges.
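The selection step can be sketched as follows, assuming the scaled cost is simply the NCF multiplied by the tree's number of nodes plus the adjusting offset (the paper's exact scaling may differ); candidate trees are represented here as hypothetical (ncf, num_nodes, tree) tuples:

```python
def select_unfinished(offspring, keep, adjusting_offset):
    """Keep the most promising under-construction trees.

    offspring: list of (ncf, num_nodes, tree) candidates
    keep: number of unfinished trees to carry to the next iteration
    adjusting_offset: controls how strongly extra nodes are penalized
    """
    # Scale the normalized cost by (number of nodes + offset), so adding
    # nodes must buy a real cost reduction to be worthwhile.
    scored = sorted(offspring, key=lambda c: c[0] * (c[1] + adjusting_offset))
    return scored[:keep]
```

With a small offset, node count dominates the scaled cost and smaller trees tend to survive; with a large offset, the raw NCF dominates and larger trees with more edges are tolerated, matching the behavior described above.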
To satisfy the memory constraint, when an unfinished tree is selected based on its scaled cost function, it is scheduled for the rest of the test using only linear schedule tables, which means no further branching. During this scheduling, the linear scheduling aborts as soon as the memory limit is violated. If the linear scheduling succeeds in respecting the memory limit, the iterations continue. Otherwise, the currently chosen unfinished tree is dropped and the next candidate with a larger or equal scaled cost is tested for its compliance with the memory limit. The scheduling fails if no candidate can meet the memory limit, meaning that the limit is too tight even for a linear schedule.
5.2 Linear Schedule Tables
A linear schedule table, as discussed in Section 2, captures a schedule without branching. The linear schedule table entries (start/stop times for each and all cores) are optimized in the offline phase to reduce the probability of overheating. The temperatures are checked frequently in order to keep the overheating probability small.
The start/stop states in the linear schedule tables are generated using the heuristic proposed in [11]. According to this heuristic, the test of the cores with lower temperature and higher remaining test size will be started or resumed earlier. Activating the cores with lower temperature is desirable because it provides longer testing states and therefore reduces the number of test partitions and their corresponding overheads.
By choosing the colder cores, while the effect of adjacent cores is taken into account by thermal simulation, the algorithm in fact activates the cores which are far from the currently active cores. This saves the newly activated cores from the heat accumulated in their neighbors, and furthermore, by not activating the adjacent cores, the newly deactivated cores experience faster cooling. The heuristic also favors the cores with longer remaining tests, thus maximizing the interleaving opportunities and avoiding the situation in which a long test sequence leads to a long total test application time.
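The activation rule described above (lower temperature first, larger remaining test size as a tie-breaker, subject to the shared test bus width) can be sketched as follows; the core fields temp, remaining, and tam are hypothetical names, not identifiers from the paper or from [11]:

```python
def next_active_cores(cores, bus_width):
    """Sketch of the activation heuristic: rank cores by (lower
    temperature, larger remaining test size) and activate them greedily
    until the shared test bus width is exhausted.

    cores: list of dicts with hypothetical keys
           'temp' (current temperature), 'remaining' (remaining test
           size), and 'tam' (TAM width used when the core is active)
    """
    ranked = sorted(cores, key=lambda c: (c["temp"], -c["remaining"]))
    active, used = [], 0
    for c in ranked:
        if c["remaining"] > 0 and used + c["tam"] <= bus_width:
            active.append(c)
            used += c["tam"]
    return active
```

The greedy pass is only an illustration of the ordering criterion; the real heuristic in [11] additionally interacts with partitioning, cooling intervals, and thermal simulation.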
As mentioned before, each chip cluster is tested with a dedicated linear schedule. Every chip cluster is represented by an error value which will be used to estimate the actual temperature based on the simulated temperature; this error value is called representative temperature error. The estimated temperature is updated periodically by correcting the cores’ simulated temperatures with the representative temperature error. The estimated temperature is then used to compute the static power and to determine the ‘state’ of the cores.
For example, assume that there are two chips {D_{0}, D_{1}} in a certain chip cluster and that the chips consist of only one core each. Therefore, at a certain moment in time, there are two error values {E_{0}, E_{1}} corresponding to the two chips. But the linear scheduling heuristic works with one error value per chip cluster. Therefore, the representative temperature error, r, which is a real number (r ∈ \( \mathbb{R} \)), is defined as a value which represents the chips’ error values, {E_{0}, E_{1}}.
The representative temperature error is updated periodically with the temporal error period (see Section 4) while the estimated temperature, static power, and state of the cores are updated more frequently. After updating the state of the cores, the dynamic power sequence is computed. The initial temperatures are available as the results of the previous thermal simulation. Having dynamic and static power sequences in addition to the initial temperatures, the next thermal simulation is performed.
The representative temperature error for a chip cluster is viewed as a safety margin in [3] and its optimal value is experimentally computed for a number of examples. These experiments suggest that the optimal value for a representative temperature error is equal to the border between the chip cluster and the adjacent chip cluster that has larger error (i.e., hottest possible chip in the chip cluster). This is true for all chip clusters except the last one that has the largest error. For example, for a chip cluster stretching from EB_{i} to EB_{j} (EB_{i} < EB_{j}; j < L), EB_{j} would be a good choice to be the representative temperature error for this chip cluster. The representative temperature errors are assigned in a similar way in this paper.
To have an example from a different point of view, assume that in total there are four chips {D_{0}, D_{1}, D_{2}, D_{3}} and that the chips consist of only one core each. Therefore, at a certain moment in time, there are four error values {E_{0}, E_{1}, E_{2}, E_{3}} corresponding to the four chips. Assume that E_{0} < E_{1} < E_{2} < E_{3}. Assume that the chip-clustering algorithm (explained in Section 5.4) has generated two chip clusters, {D_{0}, D_{1}} and {D_{2}, D_{3}}. The representative temperature error for the chip cluster that has smaller errors (i.e., {D_{0}, D_{1}}) is r^{0} = E_{1}, and the representative temperature error for the last chip cluster, r^{1}, is formulated as an optimization variable along with the chip-clusters borders in the chip-clustering algorithm.
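The representative-error assignment described above can be sketched as follows: every cluster but the last takes its upper border (the hottest possible chip in the cluster) as its representative error, while the last cluster's value is left to the optimizer. The list layout of the borders is an assumption:

```python
def representative_errors(cluster_borders, r_last):
    """Assign representative temperature errors to chip clusters.

    cluster_borders: [EB_0, EB_1, ..., EB_L], sorted ascending, so that
                     cluster l spans EB_l .. EB_(l+1) (L clusters total)
    r_last: representative error for the last (hottest) cluster,
            an optimization variable per Section 5.4
    """
    # clusters 0 .. L-2 take their upper border; the last takes r_last
    return cluster_borders[1:-1] + [r_last]
```

For example, with borders [−20, −5, 5, 20] °C (three clusters), the representatives are −5 °C, 5 °C, and whatever value the optimizer chooses for the hottest cluster.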
A new sub-tree optimization method is proposed in this paper that encodes the problem in terms of the chip-cluster borders. The representative temperature errors are defined by the chip-cluster borders for all chip clusters but the last one. For the last chip cluster (the one with the largest errors), the representative temperature error is encoded along with the chip-cluster borders as a sub-tree optimization variable. This will be explained in Section 5.4.
The optimization problem for a linear schedule table is to minimize the partial normalized cost function by finding the proper start/stop times. This is done based on the heuristic proposed in [11]. The utilized test bus width is the sum of the Test Access Mechanism (TAM) widths of the active cores for tests which utilize the TAM. The schedule size is the product of the number of the linear schedule table entries and the record size. The schedule tree is equivalent to a number of linear schedule tables (edges) in addition to a number of branching tables (nodes), as shown in Fig. 5a. The linear schedule table is explained above and the rest of the construction process will be explained in the following sections.
5.3 Sub-Tree Evaluation
The schedule tree is constructed by attaching sub-trees to the leaves of the unfinished trees (see Fig. 5). For this purpose, the proper schedule for a sub-tree topology should be found. In a sense, a sub-tree is a tree, so the cost function introduced in Section 3 could, in principle, be used. However, there is a subtle difference between their objectives. For the schedule tree, the objective is its own cost; for a sub-tree, the objective is the cost of the schedule tree that is to be constructed from it. Therefore, the cost of the final schedule tree should be estimated assuming that this particular sub-tree is used in its construction. This makes the cost evaluation different for sub-trees.
To find the near optimal schedule for a sub-tree topology, the partial cost function must be evaluated for different sub-tree clustering alternatives. For the evaluation of the cost function (i.e., NCF in (4)), the expected values of the test application time, TAT, the applied test size, ATS, and the test overheating probability, TOP, (denoted by ETAT, EATS, and ETOP, respectively) should be computed by utilizing the temperature error statistics.
This is particularly important for the nodes at which branching takes place. Branching at a node is, in fact, a clustering of the chips into a number of groups, such that each chip cluster corresponds to an exclusive edge branching out of that node. Chips are identified by their cores’ errors, and a chip clustering is therefore a partitioning of the C-dimensional error space into a number of chip clusters.
This means that a chip cluster is a combination of specific error intervals of the cores. A candidate ‘sub-tree clustering’ is a set of chip-clustering alternatives for the nodes of a sub-tree topology. An error cell is a cell in the C-dimensional error space delimited by the cores’ error-cluster borders; its projection onto a core’s error axis is therefore an error cluster for that core. A node clustering can thus be seen as assigning error cells to chip clusters or, equivalently, labeling error cells with chip clusters. An example of the labeling of the error cells is shown in Fig. 7. There are two cores (C = 2) in the figure, and the numbers inside the rectangular error cells are the labels.
A candidate sub-tree topology will have a number of candidate clustering alternatives which label the nodes’ error cells with the relevant chip clusters. Each chip cluster of a node corresponds to an edge branching out of that node and thus to a linear schedule table. Each node has its own dedicated Error-Cell Labeling, ECL_{n,l_1,l_2,…,l_C} (1 ≤ n ≤ N; 1 ≤ l_i ≤ L). Seen from a branching node, each succeeding node corresponds to a chip cluster and therefore receives a Node’s Cluster Label, NCL_{n} (1 ≤ n ≤ N), to represent that chip cluster (or, equivalently, the preceding edge and the corresponding linear schedule). This label indicates which of the branching node’s chip clusters leads to a given succeeding node.
The probabilities of the error cells for different nodes, and consequently the probabilities of those nodes, are computed based on the temperature error model and on the chip clusterings of the preceding nodes. In order to speed up the computation of the Error-Cell Probabilities (ECP), the Error Cell Change Probabilities (ECCP) are pre-computed, as shown in (10). The error cell change probabilities are, in fact, the concentrated effect of the temporal error model, which is used repeatedly to compute the error-cell and node probabilities.
Edges are scheduled by determining the linear schedule tables as explained in Section 5.2. Then the candidate sub-tree clustering is evaluated using the partial cost function which is based on the expected applied test size, the expected test application time, and the predicted test overheating probability. The first two are already introduced in (7–8), and the last one is explained below.
Evaluation of a partial tree is, in fact, an attempt to predict the cost of the completed schedule tree based on the current situation of that partial tree. For this purpose, it is assumed that the final schedule tree will be composed of a number of similar partial trees (building blocks of the final schedule tree). These partial trees are assumed to have expected applied test sizes, expected test application times, and expected test overheating probabilities similar to those of the partial tree being evaluated.
Therefore, a good prediction of the test application time and the applied test size is their current expected values multiplied by the predicted total number of partial trees. Since only the ratio of the predicted test application time to the predicted applied test size matters in the cost function (the first term in Eq. 4), a good choice for the predicted values of these variables is simply their expected values. The situation for the Predicted Test Overheating Probability (PTOP) is different, however, since its value does not scale linearly when a number of similar partial trees (building blocks) are put one after the other (unlike EATS and ETAT).
Let λ be the total number of partial trees (building blocks) that are assumed to be similar to the current partial tree and that will together constitute the final schedule tree. ETOP is computed for the partial tree as expressed in Eq. 19, and the predicted test overheating probability is then computed by assuming that each of these λ partial trees has an overheating rate equal to that of the current partial tree. λ is computed based on the expected Number of Partial Trees (NPT), which is defined as the total test size divided by the expected applied test size, EATS, of the current partial tree.
A naïve algorithm would use NPT instead of λ in (20). However, because of localities in the schedule tree, partial trees (building blocks) with a large amount of cooling may exist. For these partial trees, the expected applied test size is small, and consequently the expected number of partial trees, NPT, is estimated pessimistically. This unrealistic estimate may result in exceedingly large predicted test overheating probabilities, PTOP; as a consequence, a long schedule tree with too much cooling may receive a low cost and be selected.
A larger Knee promotes lower overheating, since the maximal value of λ increases and since the limiter’s amplification increases for NPT values below the Knee. A smaller Knee, on the other hand, results in schedules with shorter test application times. Other types of limiters may be used, provided that they do not hamper the convergence of the Particle Swarm Optimization (PSO) by introducing abrupt changes in the cost, and consequently in the velocities, as particles travel through the search space. PSO is explained in more detail in Section 5.4.
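The limiter and the PTOP computation can be sketched as follows. The paper's Eqs. (19)–(21) are not reproduced here, so this is only a plausible reading, not the authors' exact formulas: the reciprocal form `knee² / NPT` is an assumption that merely matches the stated properties (output equals input at the Knee, amplification below the Knee, attenuation above it), and the independent-composition formula for PTOP is our interpretation of "λ partial trees with equal overheating rates".

```python
def reciprocal_limiter(npt, knee):
    # Assumed reciprocal limiter: gain is 1 exactly at npt == knee,
    # greater than 1 below the knee, and less than 1 above it.
    return knee * knee / npt

def predicted_overheating(etop, npt, knee):
    # PTOP under the assumption that lam similar building blocks each
    # overheat independently with probability ETOP.
    lam = reciprocal_limiter(npt, knee)
    return 1.0 - (1.0 - etop) ** lam

print(reciprocal_limiter(10.0, 10.0))  # 10.0: output equals input at the knee
print(reciprocal_limiter(5.0, 10.0))   # 20.0: amplified below the knee
```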
At this point, the introduction to the computation of the expected test application time, the expected applied test size, the expected test overheating probability, and the predicted test overheating probability is complete, and the expected cost for a sub-tree can be computed from them. Therefore, the clustering alternatives for a sub-tree topology can be evaluated using the scaled cost (Eq. 6). The clustering alternatives are explored by Particle Swarm Optimization (PSO), and the best scheduled sub-tree is selected at the end. This optimization is further discussed in Section 5.4.
5.4 Sub-Tree Scheduling
A solution encoding scheme is suggested in [1, 2] which labels the error cells with chip clusters. The number of decision variables grows exponentially with the number of cores, and the computational complexity is therefore very high. In this paper, we suggest a solution encoding scheme which encodes the chip-cluster borders instead of the error cells. For a node with S succeeding chip clusters, the number of decision variables is S × C: for S chip clusters there are (S − 1) chip-cluster borders, and the S-th value is the representative temperature error of the last cluster. The number of decision variables thus grows in proportion to the number of cores, and the computational complexity is much smaller than with the scheme suggested in [1, 2].
The possible solutions are then explored using particle swarm optimization, a meta-heuristic. A candidate solution is called a particle and is represented by its location and its velocity. The locations are the encoded solutions, and the velocities are used to determine the next candidate solutions. Each particle remembers its previous best location, and the swarm remembers the global best location that any of its particles has ever visited. The previous bests and the global best are then used to bias the random velocities.
There are two admissibility conditions that ensure that a particle is a valid solution: the values must lie in the valid range, and all required chip clusters must be present. For example, assume that errors range from −10 to +10, so that smaller or larger errors never occur in practice. If one element of the next particle’s location is +11, that particle is out of range. An example of a required chip cluster not being present is the following: assume that a certain node has three outgoing edges, so that three chip clusters are necessary for that node. It may happen that, in the next particle’s location, the first and the second chip-cluster borders are assigned identical values, so that the second chip cluster is missing.
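The two admissibility conditions can be checked as in the sketch below. The function name is ours, and for simplicity only the border elements of a particle's location are checked (the representative error of the last cluster would need only the range check).

```python
def is_admissible(borders, error_min, error_max):
    """Check the two admissibility conditions for a particle's border values:
    (1) every chip-cluster border lies in the valid error range, and
    (2) all required chip clusters are present, i.e. no two borders
        coincide (identical borders make one cluster empty)."""
    in_range = all(error_min <= b <= error_max for b in borders)
    distinct = len(set(borders)) == len(borders)
    return in_range and distinct

print(is_admissible([-4.0, 2.0], -10, 10))   # True
print(is_admissible([-4.0, 11.0], -10, 10))  # False: +11 is out of range
print(is_admissible([2.0, 2.0], -10, 10))    # False: a cluster is missing
```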
The proposed solution encoding based on chip-cluster borders works well with particle swarm optimization, since location and velocity in particle-swarm terms correspond directly to the location and the velocity of the chip-cluster borders. In the beginning, a typical particle is far from good and experiences a high velocity towards better locations, since the difference between the best location and the current location is typically large at that stage. Therefore, a rapid convergence towards the preferred value of a chip-cluster border takes place.
Later on, a typical particle is close to the optimal location and, according to (22), moves more slowly, thus pinpointing the preferred value of the chip-cluster border. Experiments on chip-clustering optimization for sub-trees using particle swarm optimization are reported in [2]. These experiments show that particle swarm optimization performs well for this purpose; therefore, it is used in this paper as a part of the proposed SoC test scheduling technique.
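The velocity and location updates behind this behavior can be sketched with the canonical particle swarm update. The paper's Eq. (22) is not reproduced here, so the inertia and acceleration coefficients below are generic textbook values, not the authors' settings.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """Canonical particle swarm update (generic sketch; the paper's Eq. (22)
    may differ in its coefficients). x, v, pbest, gbest are equal-length
    lists of chip-cluster-border values."""
    new_v = [w * vi
             + c1 * random.random() * (pb - xi)   # pull towards personal best
             + c2 * random.random() * (gb - xi)   # pull towards global best
             for xi, vi, pb, gb in zip(x, v, pbest, gbest)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

Note that when a particle sits exactly at both its personal best and the global best with zero velocity, the update leaves it in place, which matches the slowing-down behavior described above.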
5.5 Remarks
Even though thermal-simulator errors and sensor-measurement errors are not addressed explicitly in this paper, in practice a large part of these errors is also covered when the temperature error model is tuned empirically. Small residual errors that are not captured by the temperature error model may still remain. These residual errors are addressed by introducing a small safety margin (a slightly lower temperature limit is used in practice). The effect of this small safety margin on the cost is negligible, as shown in [3].
In this paper it is assumed that different cores may have different errors and every core has a temperature sensor. The sensor readings may take place only on multiples of the temporal error period. The proposed approach takes the overhead of sensor access and the overhead of reacting to the readout value (i.e., jumping to a new schedule table) into account and, therefore, the number of sensor accesses is automatically kept within a reasonable size.
The proposed optimization technique is structured so that it enables parallel implementations with different granularities. The alternative sub-tree topologies (AL_{i}.StT_{j} in Fig. 6) as described in Section 5.1, could be optimized in parallel. For example, assuming one unfinished tree with two leaf nodes and three sub-tree topologies, there will be 3^{2} = 9 combinations to optimize in parallel.
Furthermore, at the lower level of sub-tree scheduling, each alternative chip clustering in the particle swarm optimization (particle_{i} in Fig. 8), as described in Section 5.4, could be generated (with its corresponding edges scheduled) in parallel with the other alternative schedules. The scheduling of the edges (i.e., optimizing the linear schedule tables) is the part that requires thermal simulation (dashed-line blocks in Fig. 8). Therefore, the most computationally expensive parts can be parallelized at two different nested levels. The main structure of this work has now been presented; what remains is the proposed thermal simulation approach, which enables efficient temperature simulation.
The adaptive approach proposed in this paper combines the benefits of online scheduling with the benefits of offline scheduling while avoiding their shortcomings. A fully online schedule introduces very large overheads associated with sensor readouts, the decision-making process, and pausing/resuming the tests: reading the temperature sensors of all cores as often as necessary, and making the corresponding decisions on the acquired data, places a very large load on the test access mechanism and introduces long delays into the schedule. An offline schedule, on the other hand, has no run-time overheads but is incapable of reacting to variations. Our approach uses temperature simulations offline as much as possible and accesses carefully selected cores’ sensors at carefully selected times during the test.
There is one schedule tree for a chip, and it addresses all cores individually. For example, a linear schedule table corresponding to an edge may state that at time t_{1} cores C_{0} and C_{2} are being tested while cores C_{1} and C_{3} are cooling, and that at another time, t_{5}, cores C_{1} and C_{2} are being tested while cores C_{0} and C_{3} are cooling. The linear schedule table is similar to S_{1} in Fig. 2, except that, instead of the single state column in Fig. 2, there are as many state columns as there are cores. There is one branching table per node in the schedule tree (similar to B_{1} in Fig. 2), but every row of it contains conditions that involve at least one core and at most all cores.
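The per-core schedule and branching tables described above can be sketched as simple data structures. This is our own illustration of the example in the text; the class names, field names, and condition representation are assumptions, not the paper's format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TESTING, COOLING = "testing", "cooling"

@dataclass
class LinearScheduleRow:
    time: int                  # start time of this schedule entry
    core_states: List[str]     # one state column per core (unlike Fig. 2's single column)

@dataclass
class BranchingRow:
    # Conditions on the sensed errors of a subset of the cores:
    # core index -> (low, high) error interval that must hold.
    conditions: Dict[int, Tuple[float, float]]
    next_table: int            # index of the linear schedule table to jump to

# The example from the text: C0 and C2 tested at t1 while C1 and C3 cool;
# at t5, C1 and C2 tested while C0 and C3 cool.
table = [
    LinearScheduleRow(time=1, core_states=[TESTING, COOLING, TESTING, COOLING]),
    LinearScheduleRow(time=5, core_states=[COOLING, TESTING, TESTING, COOLING]),
]
branch = BranchingRow(conditions={0: (-2.0, 2.0)}, next_table=1)
```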
6 Thermal Simulation Approach
In order to evaluate the candidates, the test application time and the test overheating probability must be computed, as explained in Section 5. Computing these requires the temperatures of the cores. Therefore, thermal simulation must be performed for every candidate schedule examined by the meta-heuristic (particle_{i} in Fig. 8). The thermal simulation is in the main loop of Fig. 8, which itself is in the main loop of Fig. 6; it is thus repeated a very large number of times.
At the same time, the thermal simulation is the slowest step in the iterative part of the algorithm, and it is therefore the bottleneck for the number of SoC cores that the proposed method can handle. Slow thermal simulation is also a bottleneck for the quality of the schedules: with a time-consuming step inside the main loop of the optimization meta-heuristic, the time required to achieve a high-quality schedule becomes excessively large and impractical, and quality has to be sacrificed by terminating the optimization prematurely. It is, therefore, important to use a fast thermal simulation approach. In the following, the generic thermal simulation scheme is reviewed and then a fast solution is proposed.
The input power consists of the static power and the dynamic power. The static power depends on the chip and on the temperature, while the dynamic power depends on the chip and on the input test sequence. Both the static and the dynamic power are time-variant but, for practicality, the power is assumed to be constant during a simulation cycle (a discrete-time model is assumed). Therefore, in the following we focus on a single simulation cycle in which the input power is constant. The input power is updated with new static and dynamic power values based on the results of the previous simulation cycle, and then the simulation for the next cycle is performed.
The properties of the thermal model are encapsulated in two W × W matrices, A and B; Θ and P are the W × 1 temperature and power vectors. The model used in this paper is similar to the one described here, and its characteristic equation, Eq. (24), is identical to HotSpot’s [14]. The mathematical representation of this model, Eq. (24), is a system of linear constant-coefficient differential equations; it is therefore a Linear Time-Invariant (LTI) system [19]. In fact, the thermal model is a linear time-invariant lumped-element model whose thermal elements are linear and time-invariant.
θ_{0} and θ_{1} are the core temperatures of interest; P_{0} and P_{1} are the powers applied to the cores.
The second part of the thermal simulation is to solve the model in order to find its response to the input power. Usually, the simulation time is divided into small intervals within which the power can be assumed fixed; Eq. (24) is then solved iteratively for each interval.
There are two distinct approaches to solving (24): numerical approximation and the closed-form solution. Numerical approximations usually proceed in very small intermediate steps, and as a result the complete temperature curve for the interval is constructed. HotSpot uses the Runge–Kutta method, which is a numerical approximation [14]. Although only the temperature at the end of each interval is registered, many points along the temperature curve are calculated. Since such a detailed temperature curve is not needed here, and only the temperatures at the ends of the intervals are of interest, the equation is solved analytically in order to obtain the temperature at the end of each interval in closed form.
In addition to the granularity of the temperature curve, another important factor affecting the simulation speed is how frequently Eq. (24) must be solved. The scheduling technique presented in this paper requires a large number of simulations. When the system is LTI and the only variation in the inputs (within the simulation intervals) is a scaling, the differential equation needs to be solved only once, at the very beginning. The responses to scaled versions of previous inputs are then obtained by scaling the previous outputs by the same factors. Since the computational cost of scaling is lower than that of solving the equation from scratch, a method which exploits the LTI properties (i.e., scaling and superposition [19]) is faster than the Runge–Kutta method when numerous simulations are required.
Equation (28) can be understood intuitively by regarding the system as LTI. According to the superposition principle, the effects of the initial value and of the input power add up, hence the plus sign between the two terms. The scaling property is also easily verified: scaling one of the inputs, θ^{0} or P, by a certain factor scales its contribution by the same factor.
The thermal simulation is done in two phases: an initialization phase and then the operational phase. In the initialization phase, the model is invoked and, based on it, α and β are computed (this is shown in Fig. 6 in relation to the overall scheduling method). The operational phase is the iterative computation of the temperatures at different times using (28). Since the thermal model is time-invariant, the initialization is done only once, at the very beginning of the design process; throughout the offline scheduling phase, only the iterative computations are performed.
In the closed-form solution, the most computationally expensive part is the matrix exponential for α, which is part of (26). The matrix exponential can be computed using numerical methods such as the Padé approximation [13]. Thus the initialization phase of the closed-form solution, which includes calculating (26), is very time-consuming, while the operational phase, which only includes computing (28), is fast.
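The two phases can be sketched as follows. Since Eqs. (24)–(28) are not reproduced in this excerpt, the model form A·dΘ/dt + B·Θ = P is our reading, and the matrix values are hypothetical; under that reading, the step over an interval Δt is Θ⁺ = αΘ + βP with α = e^(−A⁻¹BΔt) and β = (I − α)B⁻¹.

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential (Pade-approximation based)

def init_phase(A, B, dt):
    """Expensive one-time initialization: compute alpha and beta."""
    W = A.shape[0]
    alpha = expm(-np.linalg.inv(A) @ B * dt)
    beta = (np.eye(W) - alpha) @ np.linalg.inv(B)
    return alpha, beta

def operational_step(alpha, beta, theta, P):
    """Cheap per-cycle step: next temperatures from current ones and power."""
    return alpha @ theta + beta @ P

# Two-node example with hypothetical values; the iteration converges
# to the steady state B^-1 P.
A = np.diag([1.0, 2.0])     # thermal capacitances
B = np.diag([0.5, 0.25])    # thermal conductances
alpha, beta = init_phase(A, B, dt=0.1)
theta = np.zeros(2)
P = np.array([1.0, 2.0])
for _ in range(5000):
    theta = operational_step(alpha, beta, theta, P)
print(theta)                # approaches the steady state [2., 8.]
```

The design point of the paper is visible here: `init_phase` is costly (a matrix exponential) but runs once, while `operational_step` is two matrix-vector products per simulation cycle.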
For the Runge–Kutta approach [21], on the other hand, the initialization is fast, since no computations as heavy as (26) are needed, but the operational phase is slow, since the equation must be solved in many fine steps through a large number of intermediate time instances. The conclusion is that the Runge–Kutta method is faster for a limited number of simulations, while the closed-form method is faster for a large number of simulations. The experiments in Section 7.1 support this statement.
7 Experimental Results
Two distinct contributions of this paper, the thermal simulation approach and the scheduling technique, are experimentally examined in this section. All experiments are performed on a desktop computer with an Intel® Xeon® W3520 processor and 8 GB of memory. The experiments for the thermal simulation are presented first.
7.1 Experiments for the Suggested Thermal Simulation Approach
Numerical approximation approaches, such as the one used by HotSpot, are faster than the suggested approach for small numbers of simulation cycles, as detailed in Section 6. For simulations longer than 1,700 cycles, however, the proposed approach is faster than HotSpot, as shown in Fig. 12a. The difference grows at a rate close to 0.011 s per cycle and reaches a CPU-time difference of 100 s at 10,000 simulation cycles. This matters because, for every edge in every candidate schedule tree, thermal simulation is performed for the number of test cycles plus the cooling cycles.
Thermal simulations are performed using the proposed approach and using HotSpot [14] for 10,000 simulation cycles and different numbers of cores; the CPU times are plotted in Fig. 12b. The CPU-time difference increases rapidly with the number of cores and reaches 100 s at 100 cores. This is also important, since with the slower approach, achieving a good schedule in reasonable time becomes infeasible already for a small increase in the number of cores.
7.2 Experiments for the Proposed Test Scheduling Technique
The proposed SoC test scheduling technique is experimentally evaluated in this section. The first set of experiments is performed on SoCs with different numbers of cores, and the CPU times are reported. Then, experiments are performed on the ITC’02 [17] benchmark chips with random test switching activities generated using a Markov chain, similar to [24]. Finally, an experiment is performed on the d695 benchmark chip from ITC’02 with real switching activities based on real test data from [22]. The costs of the test schedules and the test schedule sizes are reported for the last two sets of experiments. The experimental setup is briefly introduced first, and then the results are presented.
The static power is computed using the temperature-dependent model given in [16]. The temperature simulations are performed using the approach proposed in Section 6. The spatial temperature error is assumed to have a normal distribution (0, 10), ranging from −20 °C to +20 °C with a resolution of 0.5 °C (M = 80). The temporal temperature error is also assumed to have a normal distribution (0, 10), ranging from −40 °C to +40 °C with a resolution of 0.5 °C. It is assumed that there are 20 temperature error clusters (L = 20).
The balancing coefficient is assumed to be ten (BC = 10). Each entry in a linear schedule table is assumed to occupy 64 bits, and each entry in a branching table occupies 32 bits per core per edge. For example, a node with two succeeding edges for an SoC with two cores occupies 2 × 2 × 32 = 128 bits.
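The schedule-size accounting above can be written as a small helper. This is a sketch of the stated assumptions only (64 bits per linear entry, 32 bits per core per edge at each branching node); the function name and signature are ours.

```python
def schedule_memory_bits(num_linear_entries, branching_nodes):
    """Schedule size in bits under the stated assumptions.

    num_linear_entries: total entries across all linear schedule tables.
    branching_nodes: list of (num_succeeding_edges, num_cores) pairs,
                     one per branching node.
    """
    linear_bits = 64 * num_linear_entries
    branching_bits = sum(32 * edges * cores for edges, cores in branching_nodes)
    return linear_bits + branching_bits

# The example from the text: one node, two succeeding edges, two cores.
print(schedule_memory_bits(0, [(2, 2)]))   # 128
```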
Table 1 CPU times for SoCs with different numbers of cores

Number of cores | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50
---|---|---|---|---|---|---|---|---|---|---
CPU time [s] | 9 | 46 | 52 | 132 | 208 | 308 | 590 | 762 | 1141 | 1367
Table 2 Test cost for the traditional and the proposed test scheduling techniques

ITC’02 chip | Cost (Offline) | Cost (Hybrid) | Cost (Proposed) | Hybrid vs. Offline [% reduction] | Proposed vs. Offline [% reduction] | Proposed vs. Hybrid [% reduction]
---|---|---|---|---|---|---
a586710 | 1.44 | 0.56 | 0.54 | 61 | 62 | 4
d281 | 0.69 | 0.45 | 0.03 | 35 | 96 | 93
d695 | 0.50 | 0.12 | 0.06 | 76 | 88 | 50
f2126 | 2.71 | 1.39 | 0.51 | 49 | 81 | 63
g1023 | 5.09 | 4.27 | 1.99 | 16 | 61 | 53
h953 | 0.46 | 0.14 | 0.11 | 70 | 76 | 21
p22810 | 1.22 | 0.70 | 0.69 | 43 | 43 | 1
p34392 | 0.75 | 0.72 | 0.06 | 4 | 92 | 92
p93791 | 1.02 | 0.13 | 0.08 | 87 | 92 | 38
q12710 | 1.32 | 0.40 | 0.23 | 70 | 83 | 42
t512505 | 0.48 | 0.23 | 0.13 | 52 | 73 | 43
u226 | 1.05 | 0.43 | 0.37 | 59 | 65 | 14
Average | | | | 52 | 76 | 43
The cost improvements reported in [1, 2] are at most 20 % over the Offline method and at most 7 % over the Hybrid method, while in this paper the adaptive method reduces the cost, on average, by 76 % compared with the Offline method and by 43 % compared with the Hybrid method. For comparison, the cost reduction achieved by the Hybrid method over the Offline method is 52 %. This difference demonstrates the advantage of the proposed adaptive method.
Table 3 ATE memory utilized only for the schedule, in the traditional and the proposed techniques

ITC’02 chip | Memory, Offline [bit] | Memory, Hybrid [bit] | Memory, Proposed [bit] | Hybrid vs. Offline [% increase] | Proposed vs. Offline [% increase] | Proposed vs. Hybrid [% increase]
---|---|---|---|---|---|---
a586710 | 1216 | 1888 | 4768 | 55 | 292 | 152
d281 | 1088 | 1280 | 2624 | 18 | 141 | 105
d695 | 1280 | 2176 | 3392 | 70 | 165 | 54
f2126 | 704 | 960 | 2368 | 36 | 236 | 147
g1023 | 576 | 1088 | 4480 | 89 | 678 | 312
h953 | 576 | 1088 | 1472 | 89 | 155 | 35
p22810 | 704 | 1888 | 5568 | 168 | 691 | 195
p34392 | 832 | 1472 | 2688 | 77 | 223 | 83
p93791 | 704 | 1920 | 3136 | 173 | 345 | 63
q12710 | 640 | 1024 | 1664 | 60 | 160 | 62
t512505 | 1152 | 2336 | 3712 | 103 | 222 | 59
u226 | 320 | 672 | 1568 | 110 | 390 | 133
Average | | | | 87 | 308 | 117
The increase in ATE memory usage (as given in Table 3) refers only to the memory space used to store the schedule. This is usually small compared with the memory space used to store the test patterns. Therefore, a large increase in the schedule size is very likely to translate into only a small increase in the usage of the ATE memory as a whole.
The proposed scheduling method utilizes the available ATE memory even if only a very small cost reduction (e.g., from 0.70 to 0.69 for p22810 in Table 2) is achieved. Since the number of nodes contributes to the scaled cost function (Eq. 6), a larger schedule (e.g., 195 % larger than the hybrid solution for p22810 in Table 3) is not generated unless it reduces the cost compared with a smaller schedule.
The ATE memory constraint affects the quality of the adaptive test schedules. When the available memory is too small, the proposed algorithm cannot generate even an offline schedule. As the available ATE memory increases, first an offline schedule and then a hybrid schedule can be generated. With further increases of the memory constraint, better schedules with lower costs are generated. This trend continues until the cost reaches a minimum beyond which further cost reduction is impossible. The minimum cost is usually dictated by the branching overheads (the time to read the sensors and react accordingly). Such a trend is experimentally demonstrated in [1].
Table 4 Cost and ATE memory utilized for the schedule for d695 with real test data

Metric | Offline | Hybrid | Proposed | Hybrid vs. Offline [% change] | Proposed vs. Offline [% change] | Proposed vs. Hybrid [% change]
---|---|---|---|---|---|---
Cost | 20.84 | 8.53 | 5.93 | −59 | −71 | −30
Utilized memory for schedule [bit] | 2688 | 4992 | 7232 | +86 | +169 | +49
The test sequences are stored elsewhere and are not affected by the schedule. The test size is much larger than the schedule size, and therefore the effect of an increased schedule size on the total consumed ATE memory is small. Consider, for example, the experiments on the d695 chip with real switching activities: the schedule size for the adaptive solution is approximately 7 Kbit, while the test size is approximately 1,324 Kbit. The increase in total utilized ATE memory from the offline solution to the adaptive solution is therefore only 0.34 %. In other words, the adaptive method achieves a 71 % cost reduction relative to the offline method at the small expense of a 0.34 % increase in occupied ATE memory.
8 Conclusion
This paper presents an adaptive SoC test scheduling technique that deals with the spatial and temporal temperature deviations caused by process variation in deep submicron technologies. The main contribution is an algorithm that generates a set of efficient test schedules, each corresponding to a different thermal behavior of the cores during test. On-chip temperature sensors are used to monitor the actual temperatures of the cores and to guide the selection of the corresponding test schedules during the test. In this way, the overall test efficiency is improved considerably.
The proposed technique consists of two distinct algorithms: the test scheduler and the thermal simulator. The thermal-aware test scheduler is a constructive algorithm which generates tree-based test schedules by putting optimized sub-trees together. Sub-tree optimization is essentially a chip-clustering algorithm which involves a linear test scheduling algorithm; a new sub-tree scheduling algorithm is proposed in this paper. The linear scheduling algorithm requires a thermal simulator in its main loop, and a fast thermal simulation approach is proposed in order to speed up the thermal-aware test scheduling algorithm.
The proposed adaptive test scheduling technique generates process-variation and temperature aware test schedules for SoCs with large numbers of cores. The algorithm has a relatively short run-time and generates high-quality test schedules. It has been experimentally evaluated in a number of experiments, including the ITC’02 benchmark SoCs, and it outperforms the Offline and the Hybrid methods on average by 76 % and 43 %, respectively.
9 Quick Reference
Notation | Description | Equation |
A | Capacitances vector in the thermal model | |
ATS | Applied Test Size | |
B | Resistances vector in the thermal model | |
BC | Balancing Coefficient | |
Branching table | The table that determines with which linear schedule table a specific chip should be tested. (See the example in Section 2) | |
C | Number of cores | |
Chip cluster | A group of chips with similar thermal behavior that are tested with the same Linear schedule table. A chip cluster corresponds to an edge in the schedule tree. | |
Chip-cluster border | The border line between two Chip clusters. For two adjacent Chip clusters the border is a set of natural numbers, each corresponding to an individual core. A border represents a particular error value. (See Section 5.4) | |
Chip clustering | In short, it is finding the optimal partitioning of the C-dimensional error space into an already known number of Chip clusters for the nodes of a tree. (See full explanation in Section 5.4) | |
CTF | Cost of the Test Facility per time unit | |
EATS | Expected Applied Test Size | |
EB | temperature Error-clusters Borders | |
ECCP | Error Cell Change Probabilities | |
ECCP^{before normalization} | ECCP before being normalized | |
ECL | Error-Cell Labeling | |
ECP | Error-Cells Probabilities | |
ECP^{after branching} | ECP just after branching | |
ECP^{after overheating} | ECP just after overheating | |
ECP^{after temporal changes} | ECP after temporal changes (according to temperature error model) | |
Error cluster | A range of error values which are to be treated as one single error value. Error clusters are separated by Error-clusters Borders, EB. | |
ETAT | Expected Test Application Time | |
ETOP | Expected Test Overheating Probability | |
ETTpS | Effective Test Time per Second | |
HTT | High Temperature Threshold | |
I | Identity matrix | |
Knee | The point, in the proposed reciprocal limiter, at which the output is equal to the input and not equal to one. | |
L | Number of temperature error clusters | |
Linear schedule table | A schedule that specifies stop/start times for the test of all cores individually. This will correspond to an edge or to a single Chip cluster. (See the example in Section 2) | |
LOP | Leaves’ Overheating Probabilities | |
M | Number of temperature error values | |
N | Number of nodes in a tree | |
NATS | Nodes’ Applied Test Sizes | |
NCF | Normalized Cost Function | |
NCL | Node’s Cluster Label | |
NnOP | Node’s not Overheating Probability | |
Node | A node in the schedule tree that corresponds to the end of a Linear schedule table (i.e., a point where branching is possible). | |
NP | Nodes’ Probabilities | |
NPT | expected Number of Partial Trees, similar to the current partial tree, that are required to construct the complete schedule tree | |
NTAT | Nodes’ Test Application Times | |
NTT | Normalized Test Throughput | |
P | Power vector | |
Partial cost function | NCF evaluated for a part of the schedule tree (e.g., a sub-tree). | |
POC | Price of One Chip | |
PSO | Particle Swarm Optimization | |
PTOP | Predicted Test Overheating Probability | |
Q | Number of leaf nodes | |
S | Number of succeeding edges for a node | |
SCF | Scaled Cost Function that is used to select the unfinished trees out of a group of offspring trees. | |
ST | Simulated Temperature | |
STEP | Spatial Temperature Error Probabilities | |
TAM | Test Access Mechanism | |
TAT | Test Application Time | |
Temporal error period | The period for the discrete-time temperature error model. The error values are updated regularly with a frequency equal to 1/Temporal_error_period. (See Section 4) | |
TEV | Temperature Error Values | |
THT | Test Handling Time | |
TOP | Test Overheating Probability | |
TT | Test Throughput | |
TTEP | Temporal Temperature Error Probability | |
W | Number of nodes in the thermal model | |
α | Transfer matrix for initial temperatures | |
β | Transfer matrix for power values | |
Θ | Temperatures vector in thermal model | |
θ^{0} | Initial temperatures | |
θ^{t} | Temperatures at the end of the interval of size t; also, temperatures at the t-th time sample (in Section 2) | |
θ_{w} | Temperature of w-th thermal node | |
λ | The output of the proposed limiter, applied to the expected number of partial trees, NPT. λ ≤ (knee + 1) |
Acknowledgments
The authors would like to thank Soheil Samii for providing cycle-accurate test switching-activities data for SoC d695.