Test Flow Selection for Stacked Integrated Circuits

Integrated circuits (ICs) with a single chip (die) are typically tested with a test flow consisting of two test instances: (1) wafer sort for the bare chip and (2) package test for the packaged IC. For ICs with stacked chips, known as 3D Stacked ICs, there are many possible test instances, even more test flows, and no commonly used test flow. In this paper, we propose a test flow selection algorithm (TFSA) that finds a test flow for a given 3D Stacked IC such that the expected total test time to produce each good package is minimized. We implemented the TFSA, three straightforward test flow schemes and an exhaustive search, and experimentally compared the test flow schemes on three different test architecture design approaches. The results demonstrate the importance of having methods both to select the test flow and to design the test architecture.


Introduction
The constant development in semiconductor technologies enables increasingly advanced integrated circuits (ICs). Today, it is possible to manufacture wafers where each individual chip (die) contains billions of transistors. After manufacturing, the chips are first cut from the wafer and then wire-bonded to connect the chip to the package. Finally, the chips are packaged. The most recent advancement in semiconductor technologies is to stack several chips on top of each other and package them in one IC, a 3D Stacked IC. The chips in such a 3D Stacked IC are connected by through-silicon vias (TSVs) [8].
IC manufacturing is extremely complex, which increases the risk of defects. To detect manufacturing defects, each and every IC is carefully tested. ICs with a single chip are commonly tested with a test flow consisting of two test instances: wafer sort and package test. The bare chip is tested at wafer sort to avoid packaging defective chips. If no defects are found at wafer sort, the chip is wire-bonded, packaged, and re-tested during package test, as defects may be introduced during wire-bonding and packaging. For 3D Stacked ICs there are many more test instances. It is possible to test each individual chip at wafer sort instances, to test the partially complete stacks at intermediate test instances, and to test the complete 3D Stacked IC at the package test instance. For a 3D Stacked IC with $N$ chips there are $2N$ instances at which a test may be performed: $N$ during wafer sort, $N - 1$ for intermediate stacks and 1 for package test. Hence, there are $2^{2N}$ possible variations of test flows [8].
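The counting argument above can be checked with a short enumeration. This is only an illustrative sketch: the function names and the `(kind, index)` labels are assumptions made here, not notation from the paper.

```python
from itertools import product


def list_instances(n_chips):
    """Return the 2N test instances of an N-chip stack as (kind, index) pairs."""
    wafer_sort = [("wafer_sort", i) for i in range(1, n_chips + 1)]
    # Intermediate tests cover stacks of 2..N chips before packaging (N - 1 instances).
    intermediate = [("intermediate", i) for i in range(2, n_chips + 1)]
    package = [("package", n_chips)]
    return wafer_sort + intermediate + package


def all_flows(n_chips):
    """Enumerate every test / no-test choice over the 2N instances."""
    return list(product((0, 1), repeat=len(list_instances(n_chips))))


instances = list_instances(3)
flows = all_flows(3)
print(len(instances), len(flows))
```

For a three-chip stack this lists 6 instances and 64 candidate flows, which illustrates how quickly the search space grows with the stack height.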
The test cost of 3D Stacked ICs depends on a large number of factors, such as the hardware manufacturing cost (which includes wafer fabrication, stacking and packaging), DfT (design for test), fault coverage, test resources and test equipment, test time and yield. In this paper we reduce the test cost per good 3D Stacked IC produced by selecting a test flow that optimizes two of the major contributing factors: test time and yield. Specifically, we address minimization of the expected test time for each good 3D Stacked IC produced, and implement it along with the test architecture optimization described in previous articles [12,13].
The expected test time may vary with the choice of test flow as well as with the applied test schedule. The time spent on testing defective ICs also contributes to the test cost. Hence, the manufacturing yield needs to be taken into account. For 3D Stacked ICs, where it is possible to stack a number of different chips, it is of interest to know how many chips of each type are needed in the manufacturing process to obtain a fixed number of good 3D Stacked IC packages.
The test cost can be reduced by improving the yield and/or by reducing the time spent on testing. Improving yield implies reducing defects in the manufacturing process, for example, by upgrading production technologies. Yield improvement is not in the scope of this paper. In this paper, we focus on minimizing the time spent on testing.
In this paper, we assume a 3D Stacked IC where the test time and yield at each test instance are known. With the goal of minimizing the expected test time per good 3D Stacked IC package, we propose a method to compute the effective yield, the number of chips that need to be tested at each test instance, and the expected test time, depending on the selected test flow. To find the most suitable test flow, we propose the Test Flow Selection Algorithm (TFSA). We performed experiments on several 3D Stacked ICs to compare the test flow obtained using TFSA against test flows obtained by exhaustive search and against three straightforward test flows: wafer sort of each individual chip followed by package test (WSPT), test at all possible instances (TA), and test performed only at package test (PT). The results demonstrate that (1) TFSA produces results that are better than the three straightforward test flow schemes, (2) TFSA produces the optimal test flow in most cases, and (3) WSPT gives the best result among the three straightforward test flow schemes.
We also integrated test flow selection with test architecture design, adopting the test planning scheme proposed in [12]. While [12] optimizes the test architecture for a single test flow that consists of wafer sort of the individual chips followed by package test of the complete 3D Stacked IC, in this paper we generalize the approach to any given test flow. The experimental results confirm that it is important to have methods to find the test flow as well as methods to design the test architecture. The test flow model and TFSA comply with all die orientations (face-to-face, face-to-back and back-to-back) as well as all wafer bonding technologies (wafer-to-wafer, die-to-die and die-to-wafer).
The rest of the paper is organized as follows. In Section 2 we discuss related research. The test architecture is elaborated in Section 3. In Section 4, we illustrate the need for finding a suitable test flow with an example at three different yield sets. In Section 5, we introduce notations and formulae, while in Section 6 we present the TFSA. In Section 7 we report the results from the experiments. The paper is concluded in Section 8.

Related Work
Several works have addressed test planning for core-based ICs having a single chip with the aim of optimizing the test cost [2,6,7]. Design and optimization of test architecture for non-stacked ICs with IEEE 1500 is described in [4,5,11,15]. In [5], Iyengar et al. address optimization of test access mechanisms (TAMs) for System-on-Chips (SoCs) to reduce core-test time by balancing core scan chains. Mullane et al. in [11] propose a hybrid scan for non-stacked ICs provided with IEEE 1500 core wrappers, by combining the serial and the parallel ports of the wrapper, resulting in an efficient test access that reduces the test time. However, for 3D Stacked ICs, test architecture optimized for each chip in the stack during wafer sort may not lead to an optimized test architecture when all the chips are tested jointly during package test.
We have proposed methods to reduce the test time for core-based 3D Stacked ICs by optimizing the test architecture and the test plan [12,13]. We used a straightforward test flow, where each individual chip is tested at wafer sort and the complete stack at package test. The test planning approaches were not adapted for arbitrary test flows.
Taouil et al. propose test cost models to predict the impact of test flows on the product quality and overall stack cost at an early design stage, which is important for a trade-off between quality and cost [14]. They present a model that predicts the product quality, defined in terms of Defective Parts Per Million (DPPM), for different test flows. A framework is provided for covering different test flows and cost models to identify the most cost-effective test flow [3]. Simulation results show that test flows that include wafer sort generally reduce the overall cost and that the most cost-effective test flow strongly depends on the stack yield. They concluded, after experiments on different test flows, that adapting the tests according to the stack yield is a good approach. The paper analyzes several test flows for given 3D Stacked ICs, but does not provide a method for selecting the most suitable test flow. Agrawal et al. [1] have proposed a low-complexity test flow selection scheme for 3D Stacked ICs to achieve a low test cost. It is shown that the test flow selection method takes considerably less computation time than exhaustive methods that completely explore the exponentially growing search space for 3D Stacked ICs. However, an estimate of how much the test cost increases when using the proposed method instead of exhaustive search is lacking. Both [1] and [14] are built on the compromise between yield and test cost. However, the trade-off between two major contributing factors to the test cost, namely the test time and the manufacturing cost of each component, has been overlooked. The work is not verified against any test architecture design and test planning scheme. To the best of our knowledge, no previous work has proposed a test flow selection algorithm and verified it against a test architecture and test planning scheme.
A new test standard, IEEE P1838, is being developed to enable efficient modular testing of SICs [10]. The standard involves a die-level wrapper on each chip in the stack. In addition, an IEEE 1149.1 based TAP controller is provided in the bottom die to control the WIRs of the die wrappers. The works in [12,13] address test scheduling for SICs to minimize the test cost. In [12] a scalable test architecture is assumed where each chip is provided with an IEEE 1500 based wrapper, in accordance with the developing IEEE P1838 standard. Similarly, [13] assumes an IEEE 1149.1 based test architecture for the SICs for test planning.

IEEE 1500 Based Test Architecture
In this section we discuss the test architecture for a 3D Stacked IC, where each chip of the stack is supported by an IEEE 1500 based infrastructure as proposed by [8].
A number of chips are stacked to construct a 3D Stacked IC. Figure 1 shows an example with two chips, Chip1 and Chip2. Within each chip, the TAM width, WPI, is split at the switch box into several TAM groups. Each TAM group is used to access one or more cores of the chip, connected in series. In Fig. 1, the TAM width W of Chip1 is split between Gr1, which connects core1 and core2 in series, and Gr2, which accesses core3, while for Chip2, Gr3 and Gr4 access core4 and core5, respectively. Wrapper chains are constructed by allocating one or more scan chains to a single TAM line. Figure 2 illustrates core1 of Chip1, which has three scan chains: sc1, sc2 and sc3. TAM group Gr1 is used to access core1, such that sc1 forms a wrapper chain on TAM1, while sc2 and sc3 in series form a wrapper chain on TAM2. As illustrated in Fig. 1, the WPOup, WSOup and WSCup of the lower chip, Chip1, are connected to the WPIdown, WSIdown and WSCdown of the chip on top, Chip2, respectively. The WPOup, WSOup and WSCup of the topmost chip, Chip2, are directed out via the WPOdown, WSOdown and WSCdown, respectively, of the lowermost chip in the stack, Chip1. The WPOdown, WPIdown, WSOdown, WSIdown and WSCdown of the lowermost chip, Chip1, serve as the package test interface for the 3D Stacked IC. As in [9], in this paper we assume equal width of WPIs and WPOs for each chip.
The TSV interconnect between chips may be tested using the boundary scan registers, which connect all inputs and outputs via TSVs. Boundary scan registers are implemented on both chips and are used in the TSV interconnect test. Test stimuli are applied on outgoing TSVs and test responses are captured on incoming TSVs. Since the boundary scan register is a separate register, testing of TSVs cannot be performed concurrently with core tests.
The TSV interconnect tests contribute a constant term to the overall test time and cannot be scheduled concurrently with any core tests. Therefore, the time required to perform TSV interconnect tests is disregarded when addressing the total test time in the remainder of the paper.

We compare the expected total test time required to obtain each good 3D Stacked IC, assuming TA, WSPT and PT as the test flows. Figure 3 illustrates the expected total test time required by the three test flows with the three sets of yield values. Computation details of the expected test time required by the test flows are elaborated in the following section.
The results show that for Case 1 PT has the lowest expected test time, while in Case 2 the test flow with the lowest expected test time is WSPT, and in Case 3 TA results in the lowest expected test time. Thus, it can be concluded that no single straightforward test flow provides the lowest expected test time for every 3D Stacked IC. In this paper we present a method to obtain a test flow for any given 3D Stacked IC, such that the expected total test time is minimized.

Expected Total Test Time Estimation
In this section we derive an expression to calculate the expected total test time required to produce each fault-free 3D Stacked IC for any assumed test flow. For reference, we assume the design provided in Table 1 with yield Case 1.
We elaborate the notations given in Table 2 using Figs. 4 and 5. First, the notations corresponding to given data are discussed, followed by the notations that refer to values that need to be calculated for a selected test flow. Finally, we discuss the notations that represent values that hold true for all 3D Stacked ICs considered in this paper. We assume a given 3D Stacked IC with $N$ chips in the stack, where each chip is denoted by $i$, $1 \le i \le N$. A test instance is denoted by $I_{ij}$, as illustrated in Fig. 5. Each test instance $I_{ij}$ requires test time $T_{ij}$ and has a yield $y_{ij}$.
The wafer sort instances are illustrated by the boxes in the upper row, where $j = 1$, which means $I_{11}$ to $I_{N1}$ are the instances for wafer sort. The wafer sort instance of a chip $i$ is indicated with $I_{i1}$ to the left in Fig. 4. We have $j = 2$ for intermediate tests of partial stacks with $i$ chips and for package test instances, $I_{i2}$.
A partial stack with $i$ chips is tested during the intermediate instance $I_{i2}$, which comprises components from two previous instances: a partial stack with $i - 1$ chips, and chip $i$. This is illustrated by the arrows connecting instances $I_{i-1,2} \to I_{i2}$ and $I_{i1} \to I_{i2}$ in Fig. 5.
We now discuss the values that need to be computed to obtain the desired test flow. A test flow is represented by the vector $X = (x_{11} \ldots x_{N1}), (x_{22} \ldots x_{N2}), (x_{N+1,2})$, with $2N$ elements that cover $N$ wafer sorts, $N - 1$ intermediate tests and 1 package test. Each element is a binary decision variable $x_{ij}$: for each box in Fig. 5, we set $x_{ij} = 1$ when a test is performed at instance $I_{ij}$, and $x_{ij} = 0$ when no test is performed at instance $I_{ij}$. Let us consider the example of a 3D Stacked IC with 2 chips in the stack, as illustrated in Section 4. The vectors for TA, WSPT and PT would be represented as:
TA: $X = (1, 1), (1), (1)$
WSPT: $X = (1, 1), (0), (1)$
PT: $X = (0, 0), (0), (1)$
$\tau$ is the total expected time taken by the test flow given by $X$. The total expected time depends on which tests are applied in a test flow. At each instance, the effective yield is computed depending on the tests that have been previously performed. To enable computation of test time for a test instance $I_{ij}$, we let $Q_{ij}$ denote the number of good units that need to be produced at instance $I_{ij}$, $T_{\mathrm{eff}}(ij)$ denote the expected time taken to produce each good unit at instance $I_{ij}$, and $Y_{ij}$ the effective yield. The objective of this paper is to minimize the expected total test time $\tau$.
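The three straightforward flows generalize directly from the two-chip example to a stack of any height. The sketch below is an illustration only; the helper name and the tuple layout (wafer sorts, intermediate tests, package test) are assumptions chosen to mirror the grouping of the vector X.

```python
def straightforward_flows(n_chips):
    """Return TA, WSPT and PT as (wafer sorts, intermediate tests, package test)."""
    n = n_chips
    return {
        # TA: test at all 2N instances.
        "TA": ((1,) * n, (1,) * (n - 1), (1,)),
        # WSPT: wafer sort of every chip, then package test only.
        "WSPT": ((1,) * n, (0,) * (n - 1), (1,)),
        # PT: package test only.
        "PT": ((0,) * n, (0,) * (n - 1), (1,)),
    }


flows = straightforward_flows(2)
for name, flow in flows.items():
    print(name, flow)
```

For N = 2 the three entries correspond to the flows testing everything, testing wafer sorts plus package, and testing the package only.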
Finally, for all 3D Stacked ICs, instances $I_{12}$ and $I_{N+1,1}$ do not exist. Instance $I_{12}$ corresponds to the box at the bottom left of Fig. 5, for the intermediate test of only chip 1, which does not exist because at least two stacked chips are tested at any intermediate test. Similarly, for a 3D Stacked IC comprising $N$ chips in the stack, instance $I_{N+1,1}$, corresponding to the box at the top right corner of Fig. 5, refers to wafer sort of chip $N + 1$ and also does not exist. Therefore, it may be assumed that these instances require no test time, i.e., $T_{12} = T_{N+1,1} = 0$, and have a perfect yield, $y_{12} = y_{N+1,1} = 1$, for the generic expressions.
In the following Sections 5.1, 5.2 and 5.3, we elaborate each component of the expression, namely the yield $Y_{ij}$, the quantity $Q_{ij}$ and the time $T_{\mathrm{eff}}(ij)$, respectively.

Yield
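The effective yield at each test instance is presented here as a function of the given yield values of the preceding test instances, depending on whether a test was performed at those instances. We discuss the effective yield first for wafer sort and then for intermediate and package tests.
Let us consider an arbitrary wafer sort test instance $I_{i1}$, which has the given yield $y_{i1}$. As there are no prior tests of the chip at wafer sort, the effective yield depends only on the yield at test instance $I_{i1}$.
For wafer sort instances, the effective yield $Y_{i1}$ at test instance $I_{i1}$ is given as:
$$Y_{i1} = \begin{cases} y_{i1}, & 1 \le i \le N \\ \text{does not exist}, & i = N + 1 \end{cases}$$
In the example, the yield of chip 2 for Case 1 at wafer sort is $y_{21} = 0.91$, which means that the effective yield $Y_{21}$ at the wafer sort test of chip 2 is:
$$Y_{21} = y_{21} = 0.91$$
For an intermediate test instance $I_{i2}$, the effective yield also depends on whether chip $i$ was tested at wafer sort. This is expressed by the factor $y_{i2} \cdot y_{i1}^{\bar{x}_{i1}}$, where $\bar{x}_{i1} = 1 - x_{i1}$ is the complement of $x_{i1}$. If $x_{i1} = 0$ we get $y_{i1}^{\bar{x}_{i1}} = y_{i1}^{1} = y_{i1}$, and $x_{i1} = 1$ gives $y_{i1}^{\bar{x}_{i1}} = y_{i1}^{0} = 1$. Therefore, the expression gives $y_{i2}$ when $x_{i1} = 1$, i.e., when a test was performed at wafer sort, and $y_{i2} \cdot y_{i1}$ when $x_{i1} = 0$, i.e., when no test was performed at wafer sort. However, the effective yield at any intermediate instance $I_{i2}$ also depends on the yield of the preceding intermediate instance $I_{i-1,2}$.
For the 3D Stacked IC in Section 4 with yield Case 1 and test flow $X = (0, 1), (1), (1)$, we have $x_{11} = 0$, $x_{21} = 1$, $x_{22} = 1$ and $x_{32} = 1$. Hence we can compute $Y_{22}$ as:
$$Y_{22} = y_{22} \cdot y_{11}^{\bar{x}_{11}} \cdot y_{21}^{\bar{x}_{21}} = 0.92 \cdot 0.90^{1} \cdot 0.91^{0} = 0.92 \cdot 0.90 \cdot 1 = 0.8280 \qquad (4)$$
Similarly, the effective yield at the following intermediate or package test instance $I_{32}$ depends on all previous tests performed, and can be given as:
$$Y_{32} = y_{32} \cdot \left(y_{22} \cdot \left(y_{11}^{\bar{x}_{11}} \cdot y_{21}^{\bar{x}_{21}}\right)\right)^{\bar{x}_{22}}$$
As seen in Fig. 5, each instance $I_{i2}$ is preceded by the instances $I_{i1}$ and $I_{i-1,2}$, so the effective yield can be written recursively:
$$Y_{i2} = \begin{cases} y_{i2} \cdot y_{i1}^{\bar{x}_{i1}} \cdot Y_{i-1,2}^{\bar{x}_{i-1,2}}, & 2 < i \le N + 1 \\ y_{22} \cdot y_{11}^{\bar{x}_{11}} \cdot y_{21}^{\bar{x}_{21}}, & i = 2 \end{cases}$$
Therefore, in the case of package tests, we have:
$$Y_{N+1,2} = y_{N+1,2} \cdot y_{N+1,1}^{\bar{x}_{N+1,1}} \cdot Y_{N2}^{\bar{x}_{N2}}$$
where it is given that $y_{N+1,1} = 1$, as noted in Table 2, and we set $\bar{x}_{N+1,1} = 1$.
For the given package test instance $I_{32}$ in this example, the preceding wafer sorts were performed, while the intermediate test instance was avoided, such that $X = (1, 1), (0), (1)$.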
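The yield bookkeeping in this subsection can be sketched in a few lines. The function below is one reading of the recursion suggested by the worked example (Eq. 4) and by the statement that each stack instance depends on its preceding wafer sort and stack instances; the function name, argument layout, and the value 0.93 used for the package-test yield are assumptions for illustration, not data from the paper.

```python
def effective_stack_yields(y_ws, y_stack, x_ws, x_stack):
    """Effective yields Y_{i2} for i = 2..N+1.

    y_ws[i-1]  : given wafer sort yield y_{i1}, i = 1..N
    y_stack[k] : given yield y_{i2} for i = k + 2 (last entry is package test)
    x_ws, x_stack : 0/1 decisions in the same layout
    """
    n = len(y_ws)
    effective = {}
    previous = None
    for i in range(2, n + 2):
        y_i2 = y_stack[i - 2]
        if i == 2:
            # Untested chips 1 and 2 pass their yield loss on to the stack.
            value = y_i2 * y_ws[0] ** (1 - x_ws[0]) * y_ws[1] ** (1 - x_ws[1])
        else:
            # y_{N+1,1} = 1 because wafer sort of chip N+1 does not exist.
            y_i1 = y_ws[i - 1] if i <= n else 1.0
            xbar_i1 = 1 - x_ws[i - 1] if i <= n else 1
            xbar_prev = 1 - x_stack[i - 3]
            value = y_i2 * y_i1 ** xbar_i1 * previous ** xbar_prev
        effective[i] = value
        previous = value
    return effective


# Yield Case 1 values quoted in the text; 0.93 for package test is a placeholder.
Y = effective_stack_yields([0.90, 0.91], [0.92, 0.93], x_ws=[0, 1], x_stack=[1, 1])
print(round(Y[2], 4))
```

With the flow X = (0, 1), (1), (1) the sketch gives Y[2] = 0.828, matching Eq. 4.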

Quantity
At any test instance, the number of units tested is greater than the number of good units obtained, due to imperfect ($< 1$) yield. Therefore, we calculate the expected quantity of good units required at the end of each instance such that a fixed number of good units are obtained from a succeeding manufacturing stage. Let us start with the package test instance, illustrated by the rightmost box in Fig. 5. With a yield of $y_{N+1,2}$ ($< 1$), to obtain $Q_{N+1,2} = 1$ good package we need to test $Q_{N+1,2} / y_{N+1,2}$ packages. Therefore, at the preceding instance $I_{N2}$, we need to produce $Q_{N2} = Q_{N+1,2} / y_{N+1,2}$ good units. Now, to produce $Q_{N2}$ good units at the intermediate test instance $I_{N2}$, we need to test $Q_{N2} / (y_{N2} \cdot y_{N1}^{\bar{x}_{N1}})$ units. It is useful to note here that the number of units that need to be tested during the intermediate test instance $I_{N2}$ increases by a factor of $1 / y_{N1}$ if no test was performed at the wafer sort instance $I_{N1}$. This is due to the share of defective chips that pass on to the intermediate stack. Consequently, we need to produce $Q_{N1} = Q_{N2} / (y_{N2} \cdot y_{N1}^{\bar{x}_{N1}})$ good units and back-calculate the number of good units that need to be produced after each test instance down to $Q_{11}$. The quantity of good units required after each test instance, $Q_{ij}$, is formulated as follows.
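The first two steps of this back-calculation can be illustrated numerically. The sketch covers only the two relations stated explicitly in the paragraph above; the function name and the sample yields (0.93, 0.92, 0.91) are assumptions for illustration.

```python
def last_stage_quantities(y_pkg, y_n2, y_n1, x_n1, good_packages=1.0):
    """Return (Q_N2, units tested at I_N2) for the last two stack instances.

    y_pkg : given yield at package test, y_{N+1,2}
    y_n2  : given yield at the last intermediate test, y_{N2}
    y_n1  : given wafer sort yield of chip N, y_{N1}
    x_n1  : 1 if chip N was tested at wafer sort, else 0
    """
    # Good units needed before package test so that enough packages survive.
    q_n2 = good_packages / y_pkg
    # An untested chip N (x_n1 = 0) passes its yield loss on to the stack.
    tested_at_n2 = q_n2 / (y_n2 * y_n1 ** (1 - x_n1))
    return q_n2, tested_at_n2


q_n2, with_ws = last_stage_quantities(0.93, 0.92, 0.91, x_n1=1)
_, without_ws = last_stage_quantities(0.93, 0.92, 0.91, x_n1=0)
print(round(q_n2, 3), round(without_ws / with_ws, 4))
```

The ratio of the two results is 1 / 0.91, which is the 1 / y_{N1} increase described in the text when wafer sort of chip N is skipped.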
For wafer sort instances:
For the 3D Stacked IC in Section 4 with yield Case 1 and $X = (0, 0), (0), (1)$, we utilize the second part of Eq. 9, since $i = 1 \le 2$. Therefore, the number of units required at the end of instance $I_{11}$ is:
In the example, the quantity required for package test, when only the wafer sorts have been performed, such that $X = (1, 1), (0), (1)$, is:
It should be noted that at any intermediate test instance $I_{i2}$, for each intermediate stack comprising chips 1 to $i - 1$ obtained from instance $I_{i-1,2}$, one chip $i$ from instance $I_{i1}$ is stacked. Therefore, we need an equal number of units from the preceding instances $I_{i1}$ and $I_{i-1,2}$, giving $Q_{i1} = Q_{i-1,2}$.

Time
An expression to calculate the expected total time taken by any test flow to produce each fault-free packaged 3D Stacked IC is formulated here. The time expected at each test instance depends on the time taken to test each unit at the instance, the effective yield, and the number of units required at successive test instances to produce the desired number of good packages. Table 3 lists the expected test time at each instance for different test flows required by the 3D Stacked ICs mentioned in Table 1.
The effective test time, $T_{\mathrm{eff}}(ij)$, spent at any instance $I_{ij}$ depends on the given test time $T_{ij}$, the effective yield $Y_{ij}$ at the instance, the number of defect-free units that need to be obtained, $Q_{ij}$, and the binary decision variable $x_{ij}$, and is given by:
$$T_{\mathrm{eff}}(ij) = \frac{T_{ij}}{Y_{ij}} \cdot Q_{ij} \cdot x_{ij}$$
For instance $I_{22}$ in the example, with both wafer sorts performed ($\bar{x}_{11} = \bar{x}_{21} = 0$):
$$T_{\mathrm{eff}}(22) = \frac{30}{0.92 \cdot 0.90^{0} \cdot 0.91^{0}} \cdot 1.075 \cdot 1 = 35.06 \qquad (14)$$
The sum of the effective test times, $T_{\mathrm{eff}}(ij)$, over all instances $I_{ij}$ gives the expected total test time $\tau$ required by any selected test flow $X$ to produce each fault-free packaged 3D Stacked IC:
$$\tau = \sum_{i=1}^{N+1} \sum_{j=1}^{2} T_{\mathrm{eff}}(ij) \qquad (15)$$
The expected total test time, assuming a test flow where all instances are tested, as seen in the topmost row of Case 1, gives:
The objective is to find a suitable test flow for any given 3D Stacked IC, such that the expected total test time $\tau$ is minimized.
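The worked number in Eq. 14 can be reproduced with a short sketch. It reads the numeric pattern of Eq. 14 as test time divided by effective yield, multiplied by the required quantity and the decision variable; that reading, and the function names, are assumptions made here for illustration.

```python
def effective_time(test_time, effective_yield, quantity, x):
    """Expected time spent at one instance; zero when the test is skipped (x = 0)."""
    return test_time / effective_yield * quantity * x


def total_expected_time(instances):
    """Sum the effective times over (T, Y, Q, x) tuples, one per instance."""
    return sum(effective_time(t, y, q, x) for t, y, q, x in instances)


# Values from Eq. 14: T_22 = 30, Y_22 = 0.92 * 0.90**0 * 0.91**0, Q_22 = 1.075, x_22 = 1.
t_22 = effective_time(30, 0.92 * 0.90 ** 0 * 0.91 ** 0, 1.075, 1)
print(round(t_22, 2))
```

With the rounded quantity 1.075 the sketch gives roughly 35.05; the paper reports 35.06, which is consistent with an unrounded quantity being used there.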

Test Flow Selection Algorithm (TFSA)
In this section, we first detail the Test Flow Selection Algorithm (TFSA) and then discuss its computational complexity.
Given the test time $T_{ij}$ and yield $y_{ij}$ at all test instances $I_{ij}$, the TFSA generates a test flow $X$ by iteratively trying to reduce the expected total test time $\tau$. At each iteration, the test instance that contributes the largest reduction in $\tau$ is selected. As discussed in the previous section, we represent a test flow with the vector $X = (x_{11} \ldots x_{N1}), (x_{22} \ldots x_{N2}), (x_{N+1,2})$, where $x_{N+1,2} = 1$, since package test is always performed.
TFSA, which is detailed in Algorithm 1, takes as input in line 1 the $N$ chips, where for each test instance $I_{ij}$ ($1 \le i \le N$, $1 \le j \le 2$) the test time $T_{ij}$ and yield $y_{ij}$ are given. We use the 3D Stacked IC described in Section 4, with yield Case 3 in Table 1, for illustration. In the example, there are 2 chips ($N = 2$); hence, there are $2 \cdot N = 4$ possible test instances.
The test flow vector $X$ and the corresponding test cost $\tau$ are initialized in line 2. All binary decision variables $x_{ij}$ are initialized such that only package test is applied. For the example, we set $X = (0, 0), (0), (1)$. After initialization, the expected time is computed with Eq. 15, using the effective yield at package test, $Y_{32} = y_{32} \cdot (y_{22} \cdot (y_{11}^{\bar{x}_{11}} \cdot y_{21}^{\bar{x}_{21}}))^{\bar{x}_{22}}$. As noted in Table 2, $T_{\mathrm{eff}}(12) = 0$ and $T_{\mathrm{eff}}(31) = 0$. When only package test is applied, the actual yield at package test takes the yield at all instances into account.
A variable, Counter, is active between lines 3 and 19 to ensure $2N - 1$ iterations. In this example, Counter iterates from 1 to 3. The variables $\acute{\imath}$ and $\acute{\jmath}$ are reset for the iteration in line 4.
To scan through all $2N - 1$ test instances $I_{ij}$, the loops over $i$ and $j$ span lines 5 to 17 and lines 6 to 16, respectively, where $1 \le i \le N$ and $1 \le j \le 2$.
During an iteration, each inactive test instance with $x_{ij} = 0$ is set to $x_{ij} = 1$ in lines 7 to 8. The corresponding test cost $\tau$ of the modified test flow is computed in line 9, to evaluate whether there is a benefit in including the test instance in the test flow. In the first iteration, the test flow vector is updated to $(1, 0), (0), (1)$.
At line 9, the effective total test time $\tau$ for the current test flow $X = (1, 0), (0), (1)$ is computed, again using the effective yield at package test, $Y_{32} = y_{32} \cdot (y_{22} \cdot (y_{11}^{\bar{x}_{11}} \cdot y_{21}^{\bar{x}_{21}}))^{\bar{x}_{22}}$. Note that in this case wafer sort is applied to chip 1, which means $y_{11}$ is used at test instance $I_{11}$ and not at test instance $I_{32}$.
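The greedy structure described above can be sketched as follows. This is not Algorithm 1 itself: the flat index layout, the pluggable `expected_time` callable, the early stop when no candidate improves the cost, and the toy cost function in the usage lines are all assumptions made to keep the example self-contained.

```python
def tfsa_sketch(num_instances, expected_time):
    """Greedy test flow selection over a flat 0/1 flow vector.

    num_instances : 2N, with the last position reserved for package test
    expected_time : hypothetical callable mapping a flow tuple to its cost tau
    """
    flow = [0] * (num_instances - 1) + [1]  # start with package test only
    best_tau = expected_time(tuple(flow))
    for _ in range(num_instances - 1):  # at most 2N - 1 iterations
        chosen, chosen_tau = None, best_tau
        for idx in range(num_instances - 1):
            if flow[idx] == 1:
                continue
            flow[idx] = 1  # tentatively activate this instance
            tau = expected_time(tuple(flow))
            flow[idx] = 0
            if tau < chosen_tau:
                chosen, chosen_tau = idx, tau
        if chosen is None:
            break  # assumption: stop when no single addition reduces tau
        flow[chosen] = 1
        best_tau = chosen_tau
    return tuple(flow), best_tau


# Toy cost for N = 2 (positions: x11, x21, x22, x32); not the paper's time model.
def toy_cost(flow):
    return 100 - 30 * flow[0] + 5 * flow[1] - 10 * flow[2]


print(tfsa_sketch(4, toy_cost))
```

In practice, `expected_time` would be replaced by the evaluation of Eq. 15 for the given test times and yields; the toy function only exercises the selection loop.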

Complexity Estimation
There are two nested iterations in Algorithm 1. The outer for loop, between lines 3 and 19, iterates Counter from 1 to $2N - 1$, and the inner for loop, between lines 5 and 17, iterates $i$ from 1 to $N$. For each $i$, $j$ takes the two values 1 and 2. Thus, the complexity is the product of the number of iterations of each loop, i.e., $(2N - 1) \cdot N \cdot 2 = 4N^2 - 2N$, which is of order $O(N^2)$.
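The iteration count can be confirmed by mirroring the loop bounds directly. The function name is an assumption; the loop body is reduced to a counter, so the sketch only checks the arithmetic of the complexity estimate.

```python
def loop_iterations(n_chips):
    """Count the innermost iterations of the nested loops for an N-chip stack."""
    count = 0
    for _counter in range(1, 2 * n_chips):      # Counter = 1 .. 2N - 1
        for _i in range(1, n_chips + 1):        # i = 1 .. N
            for _j in (1, 2):                   # j = 1, 2
                count += 1
    return count


for n in (2, 5, 10):
    print(n, loop_iterations(n), 4 * n * n - 2 * n)
```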

Experiments
In this section we present two sets of experiments. First, we compare the expected total test times obtained from TFSA with those of three straightforward test flows and of the test flow obtained by exhaustive search. Next, the TFSA is integrated with test planning of core-based 3D Stacked ICs with an IEEE 1500 based test architecture, to optimize the test time.

Test Flow Selection
The objective is to compare the expected total test time obtained with TFSA against that obtained with the three straightforward test flow schemes (TA, WSPT, and PT) and with exhaustive search.
Experiments were performed on two sets of 3D Stacked IC designs with 2 to 10 chips in the stack. The 3D Stacked IC designs are detailed in Table 5. For example, in both Set 1 and Set 2, SIC2 consists of three chips in the stack: chips 1, 2 and 3. The test time and yield values of each chip in the 3D Stacked IC designs at wafer sort are given for Set 1 and Set 2 in Table 4. In Set 1, chip 1 has a test time of 1000 time units and a yield of 0.62. The test times are kept constant for all chips in the stack for both Set 1 and Set 2, to emphasize the difference among the expected total test times. However, the yield values for Set 1 and Set 2 are changed to emphasize the differences among the test flows obtained. For intermediate tests and package test, we assume an additional test time of 1000 units and a given yield of 0.70 at the test instance. For instance, the test time assumed during the package test of SIC2 is the sum of the test times of each individual chip (chips 1, 2 and 3), an additional 2 × 1000 time units for the two layers of interconnects (between chips 1 and 2, and between chips 2 and 3), and finally 1000 time units for testing the package itself (Table 5).

Table 5. 3D Stacked IC designs, with chips numbered as in Table 4. First chip is lowermost.
Design   Chips in the stack
SIC1     1, 2
SIC2     1, 2, 3
SIC3     1, 2, 3, 4
SIC4     1, 2, 3, 4, 5
SIC5     1, 2, 3, 4, 5, 6
SIC6     1, 2, 3, 4, 5, 6, 7
SIC7     1, 2, 3, 4, 5, 6, 7, 8
SIC8     1, 2, 3, 4, 5, 6, 7, 8, 9
SIC9     1, 2, 3, 4, 5, 6, 7, 8, 9, 10
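The package test time described above is simple arithmetic, sketched below for a three-chip stack. The sketch assumes 1000 time units per chip (the value quoted for chip 1 of Set 1, with test times stated to be constant across chips); the function and parameter names are illustrative.

```python
def package_test_time(chip_times, interconnect_time=1000, package_time=1000):
    """Sum of chip test times, one interconnect term per chip boundary, plus the package."""
    interconnect_layers = len(chip_times) - 1
    return sum(chip_times) + interconnect_layers * interconnect_time + package_time


# Three chips at an assumed 1000 time units each.
print(package_test_time([1000, 1000, 1000]))
```

Under these assumptions the three-chip stack needs 3000 + 2000 + 1000 = 6000 time units at package test.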

The results of the comparison between TFSA, TA, PT, WSPT and exhaustive search are collated in Table 6. Table 6 is organized as follows. The leftmost column lists the 3D Stacked IC designs. The following group of five columns lists the expected total test times required by each method. The rightmost group of four columns shows, for each method, the overhead in expected total test time compared to the optimal expected total test time obtained by exhaustive search. The most significant points that can be drawn from Table 6 are:
• TFSA generates test flows and corresponding test times very close to exhaustive search in most cases.
• PT has a low expected total test time for 3D Stacked ICs with up to three chips in the stack: for SIC1 of Set 1, the result is only 1% away from the optimum, whereas the optimum is obtained for SIC1 and SIC2 of Set 2. However, for all other cases, PT produces results that are far from the optimum. As the number of chips in the 3D Stacked IC increases, the performance of PT deteriorates.
• TA gives expected total test times that are about 40% above the optimum for Set 1 and over 80% above the optimum on average for Set 2.
• WSPT is not as efficient as the TFSA. However, it is interesting to note that WSPT produces optimal results when the number of chips is less than 4, and WSPT is only a few percent away from the optimum when the number of chips in the stack is less than eight.
Table 7 lists the test flows obtained by the exhaustive search and the TFSA, respectively. First of all, it is interesting to note that the test flows proposed by the TFSA are, in most cases, the same as the test flows obtained from exhaustive search. However, for exhaustive search the expected total test time of $2^{2N-1}$ test flows needs to be evaluated, whereas TFSA only compares $(2N - 1)^2$ test flows. Therefore, TFSA requires less computation time than exhaustive search to determine a test flow for 3D Stacked ICs with more than 2 chips in the stack. For 3D Stacked ICs with 9 chips in the stack, TFSA determined the test flow in just over 2 minutes, whereas exhaustive search required more than 2 days to arrive at the same result in Table 7. Secondly, the optimal test flow does not follow a regular pattern. For example, SIC6 and SIC7 of Set 1 differ by only one chip, but the test flows are very different. In addition, it is also observed that performing wafer sort pays off in most cases.
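The gap between the two search-space sizes quoted above can be tabulated with a few lines. The function name is an assumption; the two expressions are taken directly from the text.

```python
def flows_evaluated(n_chips):
    """Return (exhaustive, tfsa) counts of evaluated test flows for N chips."""
    exhaustive = 2 ** (2 * n_chips - 1)   # package test is always performed
    tfsa = (2 * n_chips - 1) ** 2
    return exhaustive, tfsa


for n in (2, 3, 5, 10):
    print(n, *flows_evaluated(n))
```

For N = 2 the two counts are 8 and 9, so TFSA offers no saving there, which agrees with the statement that the benefit appears for stacks with more than 2 chips; for N = 10 the counts are 524288 and 361.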

Test Architecture Design
In the second set of experiments, the goal is to compare the expected total test times obtained by integrating test architecture design and test planning schemes with different test flows. We evaluate (1) TFSA against three straightforward test flow schemes (TA, WSPT, and PT) and against an exhaustive search of all possible test flows, and (2) the test flows on three test architecture designs and test planning schemes. The objective here is to integrate test flow selection and test architecture design to obtain the minimal test cost.
In the experiments, for each design we applied the four test flows, and for each test flow we used the three test architecture design schemes. The results are collated in Table 9, which is organized as follows. There is a sub-table for each of the 9 designs (DP, DT, GP, GT, DGP, DGT, DPT, GPT, and DGPT). Each sub-table is organized in the same manner.
The results indicate that Scheme SIC is best in all cases. The margin varies: on design GP, Scheme SIC is 9% better than Scheme 2, whereas on design DGPT, Scheme SIC is 56% better than Scheme 2, with each test architecture scheme using the test flow obtained by TFSA.
The results indicate that WSPT is as good as TFSA in many cases; however, overall TFSA is close to exhaustive search. In Table 10, the test flows with the lowest test times on any test architecture scheme produced by TFSA are compared against exhaustive search on the designs. For example, the test flow for design DP using exhaustive search is $X = (1, 1), (0), (1)$, which means wafer sort of D and P and package test of DP (for details, refer to Section 5).

Conclusion
In this paper, we illustrate the importance of test flow selection in reducing the expected total test time to produce each 3D Stacked IC. We propose a test flow selection algorithm (TFSA) to find the most suitable test flow for a given 3D Stacked IC. We evaluated the test flow obtained from TFSA against three straightforward test flows and against the test flow obtained by exhaustive search. In the experiments we also compared the different test flows after integrating each with three test architecture design and test planning schemes. The experimental results demonstrate the importance of having methods both to find the test flow and to perform test architecture design and test planning. It is observed that TFSA provides the optimal test flow, identical to exhaustive search, for all benchmarks. The test time can be further reduced by using both test architecture optimization and TFSA. For the benchmarks used in this paper, straightforward test flows like WSPT may perform as well as TFSA.
The results show that (1) TFSA generates test flows and corresponding test times very close to exhaustive search; (2) TFSA combined with a test architecture optimized for 3D Stacked ICs performs best with respect to test time; and (3) WSPT provides the minimum test time among the three straightforward test flow schemes and in many cases equals that of TFSA.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.