1 Introduction

Scanning acoustic microscopy (SAM) techniques have been well known for several decades. Composites and integrated circuits (ICs) are getting smaller, their layered constructions are getting more complex, and the individual layers are getting thinner. These trends place more and more demands on non-destructive testing (NDT) of modern ICs and composites. SAM is a high-end application of ultrasonic techniques. Compared with X-ray tomography, SAM can provide more inspection information, higher detection efficiency and a safer working environment [1, 2]. More and more high-frequency point focus ultrasonic transducers (3 MHz to 2000 MHz) have been put into use to detect defects in the sub-micrometre range [3]. Alain Phommahaxay and Ingrid De Wolf used the \(V(z)\) curve model to detect defects in the TSVs of 3D-ICs [4]. Sebastian Brand applied GHz-SAM to inspect defects in 3D-integrated ICs and obtained image resolutions in the 1 µm regime in experiments [5, 6].

The acoustic lens shown in Fig. 1 is a point focus sensor. The transducer \(T\), mounted on a solid surface, generates a longitudinal wave that propagates through the lens, normal to the surface of the specimen. The lens is a sapphire lens that focuses the incident wave into a converging beam in the coupling fluid (water). \(F_{L}\) is the focal length of the acoustic lens, and \(x,y,z\) is a three-axis coordinate system in which the \(y\) axis is perpendicular to the plane of the paper. Because of refraction and interference, the beam of a point focusing acoustic lens does not converge strictly to a single focal point. Instead, there is a focal spot of finite length \(F_{Z}\), and the sound pressure of the point focusing lens is mainly concentrated in this focal spot region.

Fig. 1 Schematic of point focus ultrasonic transducer beam propagation

The sound pressure \(P\) near the focus can be expressed by Eq. (1) [7], which can be used to simulate the distribution of the sound pressure. This mathematical model has already been verified in Ref. [8].

$$\begin{aligned} P(x,y,z) & = \sum\limits_{n = 1}^{15} {\frac{{\rho_{0} c_{0} v_{0} A_{n} }}{{1 + i(B_{n} + ik_{0} a^{2} /(2F_{L} ))y/d_{r} }}\exp (ik_{0} z)} \\ & \quad \exp \left( { - \frac{{(B_{n} + ik_{0} a^{2} /(2F_{L} ))(x^{2} + z^{2} )/a^{2} }}{{1 + i(B_{n} + ik_{0} a^{2} /(2F_{L} ))y/d_{r} }}} \right) \\ \end{aligned}$$
(1)

where \(\rho_{0}\) and \(c_{0}\) are the density and sound velocity of water, respectively; \(k_{0} = 2\pi /\lambda\) is the wave number in water; \(d_{r} = 0.5k_{0} (D/2)^{2}\) is the Rayleigh distance, where \(D\) is the diameter of the acoustic lens; and \(A_{n}\) and \(B_{n}\) are 15 sets of constants [9]. The simulated sound pressure for a 30 MHz acoustic lens is shown in Fig. 2.

Fig. 2 The simulation of sound pressure for a 30 MHz acoustic lens

From Fig. 2, one can see that the focal spot length \(F_{Z}\) is very short, only 5–6 µm. Moreover, as the transducer frequency increases, \(F_{Z}\) becomes even shorter. It is therefore extremely difficult to match the focal spot to the thickness of a film in the specimen by manual operation. To solve this focusing problem, we propose a high-precision auto-focus method. The following sections explain both the theory and the implementation in detail.

2 Theory of Auto-focus Method

The purpose of the auto-focus method is to quickly place the focal plane at the depth of the layer of interest. The method can be divided into three steps: auto-focusing on the upper surface, auto-focusing on the interlayer, and improving the accuracy of the second step.

2.1 Theory of Auto-focus on Top Surface

Auto-focusing on the top surface means making the sound path \(L\) of the wave in water equal to the focal length \(F_{L}\). \(F_{L}\) is determined by the characteristics of the acoustic lens according to Eq. (2).

$$F_{L} = R/(1 - c_{0} /c_{1} )$$
(2)

where \(R\) is the radius of curvature of the acoustic lens, and \(c_{0}\) and \(c_{1}\) are the sound speeds in water and in the lens, respectively. Let \(t\) be the time at which the reflected wave arrives at the transducer:

$$t = 2L/c_{0} + 2H/c_{1}$$
(3)

The term \(2H/c_{1}\), the round-trip travel time in the lens, can be represented by the delay time of the lens \(t_{d}\). According to Ref. [10], the relation between the sound velocity in water and temperature is given by Eq. (4), where \(temp\) is the temperature of the water.

$$c_{0} = 1452(1 + 2.9376 \times 10^{ - 5} temp + 1.90574 \times 10^{ - 6} temp^{2} )$$
(4)

Setting \(t = 2F_{L} /c_{0} + t_{d}\), the quantities \(F_{L}\), \(c_{0}\) and \(t_{d}\) on the right-hand side are all known, so the arrival time that corresponds to focus is determined and the auto-focus on the upper surface can be completed.
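The computation of the target arrival time can be illustrated with a short Python sketch. It evaluates Eqs. (2) and (4) with the coefficients exactly as printed and then forms \(t = 2F_{L} /c_{0} + t_{d}\). The function names, the use of SI units (m, s, m/s), the reading of \(temp\) as degrees Celsius and the numerical values in the usage example are assumptions made for illustration only; they are not taken from the experiments.

```python
def water_sound_speed(temp_c: float) -> float:
    """Sound speed in water from Eq. (4); coefficients copied verbatim.

    Assumption: temp_c is in degrees Celsius and the result is in m/s.
    """
    return 1452.0 * (1.0 + 2.9376e-5 * temp_c + 1.90574e-6 * temp_c ** 2)


def focal_length(radius: float, c0: float, c1: float) -> float:
    """Focal length from Eq. (2): F_L = R / (1 - c0 / c1)."""
    if c1 <= c0:
        raise ValueError("Eq. (2) needs the lens to be faster than the liquid")
    return radius / (1.0 - c0 / c1)


def top_surface_echo_time(radius: float, c1: float, t_d: float,
                          temp_c: float) -> float:
    """Arrival time t = 2 F_L / c0 + t_d of the top-surface echo at focus."""
    c0 = water_sound_speed(temp_c)
    return 2.0 * focal_length(radius, c0, c1) / c0 + t_d


if __name__ == "__main__":
    # Hypothetical lens: 1 mm radius of curvature, about 11100 m/s in the
    # lens material, 2 microseconds lens delay, water at 22 degrees Celsius.
    t = top_surface_echo_time(radius=1.0e-3, c1=11100.0, t_d=2.0e-6,
                              temp_c=22.0)
    print(f"target arrival time: {t * 1e6:.3f} us")
```

In such a sketch the Z axis would be moved until the measured arrival time of the surface echo matches the returned value.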

2.2 Theory of Auto-focus on Interlayer

SAM is used to image the internal structure of a sample, so auto-focusing on the upper surface alone does not meet the application requirements. The theory of auto-focusing on the interlayer is based on the \(V(z)\) curve model. This model was proposed by Weglein, who was the first to obtain experimentally the surface acoustic wave velocity in layered structures with different thicknesses of gold film on silicon (100). Weglein found that a series of regular oscillations with period \(\Delta z\) occurs on the negative \(z\) side [11], as given in Eq. (5). This can be used to measure the characteristics of different materials. Atalar introduced the mathematical model for the \(V(z)\) curve [12].

$$\Delta z = \frac{2\pi }{{2k_{w} (1 - \cos \theta_{R} )}} = \frac{{\lambda_{w} }}{{2(1 - \cos \theta_{R} )}} \approx \frac{{v_{R}^{2} }}{{f \times v_{w} }}$$
(5)

where \(k_{w} = 2\pi /\lambda_{w}\) is the wave number of sound in the liquid, \(f\) is the frequency, and \(\theta_{R}\) is the critical angle of the Rayleigh wave, defined by \(\sin \theta_{R} = v_{w} /v_{R}\), where \(v_{w}\) is the sound velocity in the immersion liquid and \(v_{R}\) is the Rayleigh wave velocity of the specimen. The \(V(z)\) curve model is therefore a useful tool for material characterization and defect detection in multi-layered PCBs.
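As a numerical illustration of Eq. (5), the following Python sketch evaluates the oscillation period both from the critical angle and from the last form of the equation. The last form follows from the small-angle expansion \(1 - \cos \theta_{R} \approx \sin^{2} \theta_{R} /2\), so the two values agree closely only when the liquid is much slower than the Rayleigh wave. The function names and the sample velocities are illustrative assumptions.

```python
import math


def vz_period_exact(f_hz: float, v_w: float, v_r: float) -> float:
    """Period of the V(z) oscillation from the critical-angle form of Eq. (5).

    f_hz: frequency, v_w: sound velocity in the liquid,
    v_r: Rayleigh wave velocity of the specimen (same length unit as v_w).
    """
    if v_r <= v_w:
        raise ValueError("a Rayleigh critical angle needs v_r > v_w")
    wavelength = v_w / f_hz                  # lambda_w in the liquid
    theta_r = math.asin(v_w / v_r)           # sin(theta_R) = v_w / v_R
    return wavelength / (2.0 * (1.0 - math.cos(theta_r)))


def vz_period_small_angle(f_hz: float, v_w: float, v_r: float) -> float:
    """Last form of Eq. (5), valid when v_w / v_r is small."""
    return v_r ** 2 / (f_hz * v_w)


if __name__ == "__main__":
    # Hypothetical values: 230 MHz lens, water, and a 5000 m/s Rayleigh wave.
    exact = vz_period_exact(230e6, 1500.0, 5000.0)
    approx = vz_period_small_angle(230e6, 1500.0, 5000.0)
    print(f"exact: {exact * 1e6:.2f} um, small-angle: {approx * 1e6:.2f} um")
```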

As shown in Fig. 3, when the aperture angle of the acoustic lens is larger than the critical angle of the Rayleigh wave, part of the wave is leaked as a longitudinal wave. The \(V(z)\) curve can naturally be regarded as the result of interference between acoustic rays. There are two ray components. The first is the reflected wave, which is part of the leaked longitudinal waves. The second is the leaky Rayleigh wave, which is symmetrical to the incident beam. Both arrive at the transducer and produce the A-Scan signal through interference.

Fig. 3 The V(z) curve with a 230 MHz acoustic lens on layered specimen

Consequently, \(V(z)\) can also be divided into two parts: the \(V_{G} (z)\) curve caused by the reflected wave, and the \(V_{R} (z)\) curve caused by the surface Rayleigh wave, as shown in Fig. 4.

Fig. 4 The curves of \(V_{G} (z)\) (a) and \(V_{R} (z)\) (b)

In this paper, we use the \(V_{G} (z)\) curve to find the maximum amplitude of the reflected wave as the focal plane moves through different depths. The \(V_{G} (z)\) curve of interest here is not the echo reflected from the surface of the specimen, but the echo reflected from the interface at the depth of interest. The \(V_{G} (z)\) curve is almost identical to the \(V(z)\) curve when an acoustic lens with a small aperture angle is used (\(\theta_{m} < \theta_{R}\), where \(\theta_{m}\) is the semi-angle of the acoustic lens). The ideal situation is shown in Fig. 5.

Fig. 5 Interlayer focusing curve in the ideal situation

However, the A-Scan signal is contaminated by the superposition of echoes from thin internal structures. Because of this superposition and of noise, the curves obtained in experiments are not as smooth as the simulated curve. The problem is how to recover a curve like that of Fig. 5 by reducing the noise.

2.3 Resolution Improvement of \(V(z)\) Curve Model

In a multilayer specimen, the A-Scan signal is composed of multiple reflected signals, as shown in Fig. 6a. Assuming that the lens is focused on the upper surface, this lens position is taken as the origin of the z axis.

Fig. 6 The reflected signal of SAM on the ceramic substrate IC

The A-Scan signal obtained from the specimen can be written as the superposition of \(M\) reflected echoes:

$$y(t) = \sum\limits_{i = 1}^{M} {s_{i} (t)} + n(t)$$
(6)

\(n(t)\) is the noise. Let \(x_{i} (t)\) be the incident pulse and \(r_{i}\) the reflection coefficient of layer \(i\). Equation (6) can be rewritten in matrix form as:

$$y(t) = \sum\limits_{i = 1}^{M} {r_{i} x_{i} (t)} + n(t) = \varPhi r + v$$
(7)

The echoes \(s_{1} (t)\), \(s_{2} (t)\) and \(s_{3} (t)\) are reflected from the top surface, the die top and the die bottom, respectively. When the die is thicker than the acoustic wavelength, \(s_{2} (t)\) and \(s_{3} (t)\) are clearly separated in the time domain, as in Fig. 6b. When the die is thinner than the acoustic wavelength, \(s_{2} (t)\) and \(s_{3} (t)\) are superimposed, as shown in Fig. 6c. In this case it is difficult to obtain the amplitude of the reflected wave and to find the position of the reflecting interlayer.

To improve the A-Scan signal, we separate the echoes by sparse signal representation in three major steps. First, an overcomplete dictionary \(\varPhi\) is selected in which the A-Scan signal can be sparsely represented as in Eq. (7). Second, the reflected signal is decomposed over the overcomplete dictionary, and the incident pulses are separated using the sparse representation. Last, an appropriate echo is selected as the A-Scan signal for the \(V(z)\) curve.

2.3.1 Selecting an Overcomplete Dictionary

The SAM signal is not sparse in the time domain, but there may be an overcomplete dictionary in which the A-Scan signal can be sparsely represented. The dictionary \(\varPhi \in R^{N \times L}\) is an \(N \times L\) matrix whose column vectors are the generating atoms \(\{ \phi_{i} \}_{i = 1}^{L}\). \(N\) is the length of the A-Scan signal to be processed, and \(N < L\).

Because A-Scan signals are localized in both time and frequency, they can be separated in a time–frequency dictionary. Many kinds of time–frequency dictionaries have been proposed in recent years, for example wavelet packet dictionaries, cosine packet dictionaries and Gabor dictionaries. According to Refs. [13,14,15], the Gabor dictionary models the A-Scan signal best and is the most suitable dictionary.

The real Gabor dictionary is defined as follows:

$$D_{R} = \{ g_{(r,w)} :(r,w) \in \varGamma \times [0,2\pi ]\}$$
(8)

where \(r = (s,u,v)\) and the Gabor function \(g_{(r,w)}\) is written as follows:

$$g_{(r,w)} = \frac{{K_{(r,w)} }}{\sqrt s }g\left( {\frac{t - u}{s}} \right)\cos (vt + w)$$
(9)

where \(s\) is the scale of the function, \(u\) is its translation and \(v\) is its frequency modulation; \(w\) is the phase of the real Gabor vector; \(g(t) = e^{{ - \pi t^{2} }}\) is the Gaussian window function; and the factor \(K_{(r,w)} /\sqrt s\) normalizes \(g_{(r,w)}\). In practice, the signal decomposition is performed in the discrete Gabor dictionary \(D_{\alpha }\). The index set \(\varGamma_{\alpha }\) is composed of all \(r = (a^{j} ,pa^{j} \Delta u,ka^{ - j} \Delta v)\), where \(\Delta u = \Delta \xi /2\pi\) and \(\Delta u \times \Delta \xi < 2\pi\).

The Gabor dictionary appears to be the most suitable for SAM signal processing for the following reasons. The Gabor function is optimally localized in both the time and frequency domains. The atoms of the Gabor dictionary match the SAM signal very well. The discretized scale parameter \(a\) can be chosen arbitrarily. However, the degree of overcompleteness of the Gabor dictionary depends on \(a\): the smaller \(a\) is, the higher the degree of overcompleteness.
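To make the structure of such a dictionary concrete, the following NumPy sketch builds a small real Gabor dictionary from the atom of Eq. (9). It is a simplified illustration: the parameter grid is a plain Cartesian product of hypothetical scales, shifts and frequencies rather than the index set \(\varGamma_{\alpha }\), the normalizing constant is obtained numerically, and all names are assumptions.

```python
import numpy as np


def gabor_atom(t, s, u, v, w):
    """Real Gabor atom of Eq. (9), scaled to unit l2 norm on the grid t.

    s: scale, u: translation, v: angular frequency (rad/sample), w: phase.
    """
    window = np.exp(-np.pi * ((t - u) / s) ** 2)      # g((t - u) / s)
    atom = window / np.sqrt(s) * np.cos(v * t + w)
    norm = np.linalg.norm(atom)
    return atom / norm if norm > 0 else atom          # numerical K_(r,w)


def gabor_dictionary(n, scales, n_shifts, freqs, phases=(0.0, np.pi / 2)):
    """Stack atoms as columns of an n x L matrix Phi (hypothetical grid)."""
    t = np.arange(n, dtype=float)
    shifts = np.linspace(0.0, n - 1.0, n_shifts)
    atoms = [gabor_atom(t, s, u, v, w)
             for s in scales for u in shifts for v in freqs for w in phases]
    return np.column_stack(atoms)


if __name__ == "__main__":
    Phi = gabor_dictionary(64, scales=[4.0, 8.0, 16.0], n_shifts=16,
                           freqs=[0.3, 0.6, 1.2])
    # More columns than rows, so the dictionary is overcomplete (N < L).
    print(Phi.shape)
```

Adding more scales, shifts or frequencies to the grid increases the number of columns and therefore the degree of overcompleteness.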

2.3.2 Separating the Incident Signal

The observed A-Scan signal \(y\) and the overcomplete Gabor dictionary \(\varPhi\) are given. Following Eq. (7), the problem is to find the reflection coefficients \(r = \{ r_{i} \}\) subject to \(y(t) = \varPhi r + v\). The Basis Pursuit (BP) method proposed by Chen and Donoho is a convex optimization method. It is more suitable for this problem than Matching Pursuit and Orthogonal Matching Pursuit [16].

BP seeks to make the vector \(r\) as sparse as possible, that is, to minimize the number of non-zero entries in \(r\). Ignoring the noise, we therefore need to solve the following problem:

$$\mathop {\hbox{min} }\limits_{r} ||r||_{0} \,{\text{subject}}\,{\text{ to}}\,y = \varPhi r$$
(10)

However, the complexity of solving Eq. (10) grows exponentially with \(L\). We therefore replace the \(l^{0}\) norm with the \(l^{1}\) norm, so that Eq. (10) becomes:

$$\mathop {\hbox{min} }\limits_{r} ||r||_{1} \,{\text{subject}}\,{\text{ to}}\,y = \varPhi r$$
(11)

This is a convex optimization problem whose complexity remains manageable even for very large \(L\). In practical applications, there is background noise in the A-Scan signal. To simplify the problem, we treat the noise \(n(t)\) as Gaussian white noise. With noise present, the equality constraint of Eq. (11) is relaxed, and the BP method solves the \(l^{1}\)-penalized least-squares problem:

$$\mathop {\hbox{min} }\limits_{r} \left( {\frac{1}{2}||y - \varPhi r||_{2}^{2} + \lambda ||r||_{1} } \right)$$
(12)

The parameter \(\lambda \in R\) is set to \(\lambda = \sqrt {2\log (p)}\), where \(p\) is the cardinality of the dictionary. Equation (12) is closely connected with linear programming. The linear program is defined as follows:

$$\hbox{min} c^{T} x + 1/2||p||^{2} \,{\text{subject}}\,{\text{to}}\,Ax + \delta p = b$$
(13)

where \(c^{T} x\) is the objective function, \(Ax + \delta p = b\) is the collection of equality constraints, \(x \ge 0\), \(\delta = 1\) and \(A = (\varPhi , - \varPhi )\). The BP problem can thus be equivalently reformulated as a linear program.
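The link between BP and linear programming can be illustrated on the noise-free problem of Eq. (11). The sketch below writes \(r = u - v\) with \(u, v \ge 0\), stacks \(x = (u, v)\), and minimizes the sum of the entries of \(x\) subject to \((\varPhi , - \varPhi )x = y\). It uses `scipy.optimize.linprog` as a generic solver; this is an assumption made for illustration and not necessarily the solver used in the SAM software, and the function name `basis_pursuit` is hypothetical. The perturbed form of Eq. (13) is not implemented here.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(Phi, y):
    """Solve min ||r||_1 subject to Phi r = y as a linear program.

    Phi: (N, L) dictionary, y: (N,) signal. Returns r of shape (L,).
    """
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    n_atoms = Phi.shape[1]
    cost = np.ones(2 * n_atoms)              # sum(u) + sum(v) = ||r||_1
    A_eq = np.hstack([Phi, -Phi])            # A = (Phi, -Phi)
    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None),
                     method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    x = result.x
    return x[:n_atoms] - x[n_atoms:]         # r = u - v


if __name__ == "__main__":
    # Synthetic example: a random dictionary and a 2-sparse coefficient vector.
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((20, 50))
    r_true = np.zeros(50)
    r_true[[3, 17]] = [1.5, -2.0]
    r_hat = basis_pursuit(Phi, Phi @ r_true)
    print(np.flatnonzero(np.abs(r_hat) > 1e-6))
```

The returned vector reproduces the signal and has an \(l^{1}\) norm no larger than that of any other exact representation, which is what favours sparse solutions.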

2.3.3 Construction of A-Scan Signal

This step selects the proper echo to reconstruct the A-Scan signal used to produce the \(V(z)\) curve. There are three steps. First, the time–frequency windows are set according to the frequency of the transducer and the depth along the z axis. Second, among the time–frequency atoms separated from the original A-Scan signal, we select the reflection coefficient \(r_{i}\) of interest. In this way we can isolate the reflected signals \(s_{1} (t)\), \(s_{2} (t)\) and \(s_{3} (t)\) of Fig. 6 and keep the proper signal \(s_{i} (t)\), which is the product of the expected incident pulse and the reflection coefficient \(r_{i}\): \(s_{i} (t) = x_{i} \times r_{i}\). Third, the selected A-Scan signal \(s_{i} (t)\) is used to construct the \(V(z)\) curve.
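The selection step can be sketched in Python as follows. The sketch assumes that each dictionary atom is tagged with a centre time and a centre frequency, keeps only the coefficients whose atoms fall inside the chosen time–frequency window, and rebuilds the echo from them. Taking the peak absolute value of the rebuilt echo as one sample of the \(V(z)\) curve is also an assumption, as are all names.

```python
import numpy as np


def select_echo(Phi, r, atom_times, atom_freqs, t_window, f_window):
    """Rebuild one echo from the atoms inside a time-frequency window.

    Phi: (N, L) dictionary, r: (L,) coefficients from the decomposition.
    atom_times, atom_freqs: (L,) centre time and frequency of each atom.
    t_window, f_window: (min, max) pairs, inclusive.
    """
    Phi = np.asarray(Phi, dtype=float)
    r = np.asarray(r, dtype=float)
    atom_times = np.asarray(atom_times, dtype=float)
    atom_freqs = np.asarray(atom_freqs, dtype=float)
    in_time = (atom_times >= t_window[0]) & (atom_times <= t_window[1])
    in_freq = (atom_freqs >= f_window[0]) & (atom_freqs <= f_window[1])
    r_selected = np.where(in_time & in_freq, r, 0.0)  # zero the other atoms
    return Phi @ r_selected


def echo_amplitude(echo):
    """Peak absolute amplitude, used here as one sample of the V(z) curve."""
    return float(np.max(np.abs(echo)))
```

Repeating this for the A-Scan signal recorded at each lens position along the z axis would give the amplitude samples of the curve.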

We use the Heisenberg box to render the time–frequency atoms selected by the BP method. The Heisenberg box represents the time–frequency location of an atom in the time–frequency plane, as shown in Fig. 7.

Fig. 7 The concept of the Heisenberg Box

The Heisenberg box is a rectangle with width \(\sigma_{t}\) in time and height \(\sigma_{w}\) in frequency, whose centre frequency coincides with that of the signal. In the phase plane, the darkness of each rectangle increases with the energy of the corresponding echo (Fig. 8).

Fig. 8 (a) The A-Scan signal; (b) the phase plane of the A-Scan signal

3 Implementation and Experiment

The auto-focus method was integrated into the imaging software of the SAM. Both the hardware and the software of the SAM were developed in our laboratory. The SAM shown in Fig. 9 is mainly composed of a pulser-receiver, X–Y axis linear motors, a Z axis stepper motor, an industrial PC, a display, an acoustic lens and a transducer. The SAM obtains digital images by the ultrasonic reflection (pulse-echo) technique. The pulser-receiver sends an electrical pulse to the piezoelectric transducer, which transforms the electrical signal into a plane wave; the acoustic lens then focuses this plane wave into a spherical wave. The spherical wave is focused on the interlayer of the specimen through the coupling liquid (e.g. water). The acoustic signal reflected by the specimen is converted back into an electrical signal by the transducer. This signal is called the A-Scan signal of one point. A two-dimensional digital image of the interlayer inside the sample, the C-Scan image, is obtained by moving the lens in the X–Y plane.

Fig. 9 Schematic of reflection SAM

Auto-focus technology can significantly improve the resolution and contrast of C-Scan images. The process of auto-focusing at the interlayer interface can be divided into three steps. The first step is to auto-focus on the top surface; the second step is to measure the \(V(z)\) curve of the interlayer; the last step is to calculate the defocus distance \(d_{z}\) and move the acoustic lens to the corresponding position. The auto-focusing procedure is shown in Fig. 10.
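The last step can be sketched in Python as follows. Given the sampled curve, the sketch locates its maximum and returns the distance the lens still has to travel. The three-point parabolic refinement of the peak, the assumption of uniformly spaced z positions, the sign convention of \(d_{z}\) and the function name are illustrative choices, not details confirmed by the procedure of Fig. 10.

```python
import numpy as np


def defocus_distance(z_positions, amplitudes, z_current):
    """Return d_z = z_peak - z_current for a sampled V(z) curve.

    z_positions: lens positions along z (assumed uniformly spaced).
    amplitudes: echo amplitude measured at each position.
    z_current: present lens position.
    """
    z = np.asarray(z_positions, dtype=float)
    a = np.asarray(amplitudes, dtype=float)
    k = int(np.argmax(a))
    z_peak = z[k]
    if 0 < k < len(z) - 1:
        # Fit a parabola through the peak sample and its two neighbours.
        curvature = a[k - 1] - 2.0 * a[k] + a[k + 1]
        if curvature < 0:
            step = z[k + 1] - z[k]
            z_peak = z[k] + 0.5 * step * (a[k - 1] - a[k + 1]) / curvature
    return float(z_peak - z_current)


if __name__ == "__main__":
    # Synthetic curve with its maximum at z = 0.3 (arbitrary units).
    z = np.array([-1.0, 0.0, 1.0])
    v = 2.0 - (z - 0.3) ** 2
    print(defocus_distance(z, v, z_current=0.0))
```

When the maximum lies at either end of the scanned range, the sketch falls back to the sampled position, which suggests that the scan range should be extended.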

Fig. 10 Process of auto-focusing at the interlayer

The performance of the auto-focus method was tested in experiments using OLYMPUS acoustic lenses. Their parameters are listed in Table 1.

Table 1 Parameters of acoustic lens

Experiments were performed to validate the auto-focus method. Five IC chips were tested as samples.

Figure 11 shows images of a CPU (PIII TUALATIN 1.2GHZ) and a GPU (ATI X1950pro), both of which are flip-chip devices. (a) is the image of the top surface of the GPU, (b) is the image of the solder bonds in the GPU, (c) is the image of the solder bonds of the CPU, (d) is the A-Scan signal of a solder bond, and (e) is the result of the sparse signal representation obtained with BP. The thickness of the two ICs is less than 1 mm, and the solder bonds are even thicker. The sparse signal representation applied in auto-focus separates the superimposed echoes clearly and yields their phase and amplitude in (e). In addition, by comparing (c) and (d), it is found that a solder bond of the CPU has a crack (in the red circle).

Fig. 11 The images of GPU and CPU

Figure 12 shows the image of the DPU designed by the Institute of Computing Technology, Chinese Academy of Sciences. (a) is the image with the focal plane on the top surface; (b) is the image focused on the die layer; and (c) is the image focused on the wiring layer. Images (b) and (c) show a clear difference in sharpness caused by the difference in focal plane, even though the two planes are very close to each other.

Fig. 12 The image of NPU

Figure 13 shows the image of the MOS-FET 1XTK170N0P obtained with the V375 30 MHz lens. (a) is the image of the top surface; (b) is the image of the interlayer in the FET; (c) is the A-Scan signal in LabVIEW after auto-focusing. The amplitude of the A-Scan signal is large, and the contrast of image (b) is high.

Fig. 13 The image of MOS-FET 1XTK170N0P

Figure 14 shows the image of the memory chip DDR2 800. (a) is the image of the top surface, and (b) is the image of the interlayer.

Fig. 14 The image of memory chip DDR2 800

Figure 15 shows the image of the microcontroller STC 12C5A60S2. (a) is the image of the top surface, and (b) is the image of the interlayer. In this image, the die and the circuit connections can be clearly seen.

Fig. 15 The image of microcontroller STC 12C5A60S2

Auto-focus technology shows its value only when it is applied. Its advantages lie in two aspects: high precision and high speed. The amplitude of the A-Scan signal is largest at the focal point, where the image is also clearest. Focusing is required before imaging but not during the scanning process. Manual focusing can also meet the requirements at low frequencies, but it is slow, and auto-focus speeds up the focusing. The duration of focusing in each of these experiments was no more than 2 min. These experiments show that the location of the focal plane can be obtained accurately with the auto-focus method.

4 Conclusion and Discussion

Scanning acoustic microscopy systems are widely used in industry, medicine and biology, and especially in chip manufacturing, because of their unique advantages. Focusing is one of the key aspects of imaging and material testing. However, auto-focus technology has not yet been developed for SAM systems. The novel auto-focus method proposed in this paper is based on the \(V(z)\) curve model and is improved by using sparse representation of the A-Scan signal. The auto-focus technology is integrated into the SAM system in three steps: the first is to focus on the upper surface, the second is to obtain from the \(V(z)\) curve the position of the acoustic lens that gives the maximum reflected wave, and the third is to improve the resolution of the \(V(z)\) curve by sparse representation. The advantages of auto-focus lie in two aspects: the precision of focusing and the speed of focusing. In application, the accuracy of nondestructive testing of thin-film materials and electronic chips can be improved, and the detection efficiency can be enhanced. Furthermore, the auto-focus method can become a key tool for the application of SAM in industrial nondestructive testing.

In this paper, sparse representation of the A-Scan signal is applied to improve the \(V(z)\) curve for the auto-focus of SAM. However, the method is much more widely applicable. The A-Scan signals at different positions along the Z axis make up the \(V(z)\) curve, and the A-Scan signals at different positions in the X–Y plane make up a C-Scan digital image. It is therefore reasonable to expect that sparse representation can also be applied to image noise reduction and image optimization in SAM, which is our next goal.