1 Introduction

Since the first microprocessor was commercialized in 1971 [1], the semiconductor industry has focused on making smaller transistors to increase the density of integrated circuits. Roughly following Moore’s law [2, 3], the number of transistors in a single device has increased from only ~ 2300 in 1971 to more than 100 billion at present. However, further shrinking transistors according to Moore’s law approaches its physical [4, 5] and technical [6] limits as the channel length (the spacing between the source and drain electrodes) is reduced to less than 10 nm.

Instead of planar scaling, the semiconductor industry has adopted a three-dimensional (3D) manufacturing strategy [7] to increase the integration density of semiconductor devices, with the advantages of high capacity and high energy efficiency. The manufacture of 3D semiconductor devices relies on vertical layer-stacking technologies such as monolithic [8] or heterogeneous integration [9]. The layer materials used for semiconductor multilayer devices vary depending on the purpose. For example, in the gate electrode configuration known as the metal–oxide–semiconductor structure [10], an oxide layer such as SiO2 is commonly used as the insulating layer, and polysilicon or a metallic material is used as the conductive layer. Depending on the purpose, the number of layers can range from a few to hundreds. For example, 3D NAND flash memories [11,12,13] (3D NANDs) achieve higher storage capacity than conventional planar charge-trap flash by vertically stacking more than 200 layers of semiconductor materials.

The fabrication of 3D semiconductor multilayer devices has benefited from advances in existing thin film deposition techniques [14]. These techniques developed rapidly from the 1930s onward, as the need for optical thin film coatings, particularly for military applications, was substantial. The development of vacuum evaporation [15] and magnetron sputtering [16] was accelerated by the invention of the oil diffusion pump [17] in the 1930s. Chemical vapor deposition (CVD) [18] and atomic layer deposition (ALD) [19, 20] were developed in the 1960s and remain widely used to deposit very thin layers in semiconductor facilities.

Nanometer-thick multilayer devices made of semiconductor materials include 3D NANDs [11,12,13, 21], transition metal dichalcogenides (TMDs) [22,23,24,25,26,27], graphene multilayer devices [28,29,30], and dispersive mirrors [31,32,33,34,35]. Table 1 lists commonly used materials, the number of layers, layer thicknesses, potential defects, measurement methods, and applications for each type of semiconductor multilayer device. Residual stresses during thin film deposition can cause unwanted thickness variations in the finished product [36]. Accurate measurement of the layer thicknesses of these devices is therefore important for reliable electrical performance.

Table 1 Representative examples of semiconductor multilayer devices

To date, several multilayer thickness metrology methods for 3D semiconductor devices have been proposed [37, 38]. Figure 1 briefly shows commonly used measurement methods depending on the target thickness of the layer. In this paper, we focus on reviewing the measurement methods and algorithms for nanometer-scale multilayer semiconductor devices. Table 2 compares nanometer-scale semiconductor multilayer thickness measurement methods. The thickness resolution of the optical methods depends on the layer thickness, the number of layers, and the measuring method. For example, in situ measurements show higher accuracy than ex situ measurements [39], and the thickness error increases with the thickness of the layer [40]. Electron microscopy [41,42,43,44] has the advantages of high magnification and high spatial resolution of < 0.1 nm; however, because of its destructive nature, the wafer-cutting process is required to measure the cross sections of the multilayer devices. Spectrophotometry [45, 46] is a nondestructive, noncontact method that can determine the thickness with a subnanometer resolution by exploiting the reflection, absorption, and transmission properties of light in light–matter interactions. Spectroscopic ellipsometry [47,48,49] is widely used for multilayer thickness characterization by simultaneously measuring the amplitude and phase change of the polarization state of the reflected light from a target sample. Raman spectroscopy [50,51,52] can observe the chemical and structural characteristics of multilayers by detecting scattered light. Ultrasound [53, 54] can be used for multilayer thickness characterization, but because of its relatively long wavelength (typically sub-mm scale), it is not suitable for analyzing nanometer-scale multilayer devices. White-light interferometry (WLI) [55,56,57] has been used to measure relatively thick layers of > 1 μm. WLI-based thin film thickness measurement is limited to transparent materials and is not suitable for nanometer-scale thin film devices because of its low phase sensitivity [58].

Fig. 1
figure 1

Typical target layer thickness for measurement methods used for thin film thickness characterization

Table 2 Performance comparison of measurement methods used for semiconductor multilayer thickness characterization

Optical approaches are accompanied by characterization algorithms [59,60,61,62,63] to determine the thickness of a multilayer. Model-based or machine learning algorithms [64] can be used as the characterization algorithm. Model-based algorithms find unknown sample parameters (thickness or refractive index) by comparing the calculated optical responses (reflectance or transmittance) based on optical modeling with the measured ones. Machine learning, which is a data-driven algorithm, automatically learns the correlation between unknown sample parameters and optical responses without accurate multilayer structural modeling.

In Sect. 2, methods commonly used for semiconductor multilayer devices, namely, electron microscopy, spectrophotometry, spectroscopic ellipsometry, and Raman spectroscopy, are reviewed, and in Sect. 3, algorithms commonly used for semiconductor multilayer devices are reviewed.

2 Multilayer Device Measurement Methods

2.1 Electron Microscopy

The working principle of electron microscopy [65] is to detect and analyze scattered or transmitted electrons resulting from electron–matter interactions. The highest resolution of today’s electron microscopes has reached the sub-angstrom scale [66], limited by factors such as thermal noise, drift, and lens aberrations [67, 68].

The high-resolution capabilities of electron microscopy are exploited in semiconductor device manufacturing facilities to measure the nanometer-scale thickness of semiconductor multilayers [69, 70]. Electron microscopy commonly used for semiconductor metrology includes critical dimension scanning electron microscopy (CD-SEM) [41] and scanning transmission electron microscopy (STEM) [71].

SEM detects secondary and backscattered electrons produced by electron–matter interactions to determine interlayer boundaries from changes in signal intensity according to the constituent materials of the multilayer. In particular, CD-SEM [72, 73] is specialized for the dimensional metrology of semiconductor devices. To avoid damaging the sample, electrons are accelerated at a low energy of < 1 keV, and fast, accurate wafer stages are incorporated into the CD-SEM instrument. The resolution of commonly used CD-SEM is < 1 nm, determined by the electron-optical system. Unlike TEM, which requires thin sections of a specimen, CD-SEM does not require special sample preparation [74] and has the advantage of a wider field of view (FOV).

Figure 2a shows the thickness characterization results of the oxide films of 3D NAND using CD-SEM [75]. Because the oxide–nitride–blocking oxide (ONO) film thickness surrounding the vertical channel of 3D NAND is related to the data retention rate of the final product, precise thickness measurement is required. The diameter of each layer is determined from the difference in brightness between different media, and the layer thickness of ~ 10 nm is determined from the difference in diameter before and after deposition. In that study, CD-SEM measured the ONO film thickness with a precision of 0.08 nm.

Fig. 2
figure 2

Representative examples of electron microscopy used for semiconductor multilayer thickness characterization. a CD-SEM images of ONO films for 3D NAND devices. Reproduced with permission from [75]. Copyright 2018 Society of Photo-Optical Instrumentation Engineers (SPIE). b CD-SEM analysis for defect detection using machine learning. Reproduced with permission from [76]. Copyright 2021 Society of Photo-Optical Instrumentation Engineers (SPIE). c Cross-sectional STEM images of vertically stacked MoS2–WS2 multilayer devices. Reproduced from [80] with permission from Kang K. et al., Layer-by-layer assembly of two-dimensional materials into wafer-scale heterostructures, Nature, 550, 229–233, 2017, Springer Nature. d In situ STEM images showing the evolution of the a-TiOxNy layer. Reproduced from [81] with permission of Royal Society of Chemistry, from in situ TEM observation on the interface-type resistive switching by electrochemical redox reactions at a TiN/PCMO interface, Baek K. et al., 9, 582–593, 2017; permission conveyed through Copyright Clearance Center, Inc.

Machine learning is used to improve the imaging speed of CD-SEM [76]. Figure 2b shows the fast detection of defects in periodic patterned images with a 36-nm pitch with the help of machine learning, which reduces the imaging time from 20 h/mm2 with conventional CD-SEM imaging to 1.25 h/mm2. By training a model with rapidly acquired low-resolution images as input and slowly acquired high-resolution images as output, the trained model can quickly detect defects at high resolution from rapidly acquired low-resolution images alone, as sketched below.
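To make this training setup concrete, the sketch below (our illustration, not the implementation of [76]) pairs fast, low-quality image patches with slow, high-quality patches of the same field of view and trains a small convolutional network in PyTorch to map the former to the latter; the architecture, tensor shapes, and hyperparameters are placeholder assumptions.

```python
# A minimal, hypothetical sketch of the training setup described above: a small
# convolutional network learns to map rapidly acquired low-quality SEM frames to
# their slowly acquired high-quality counterparts. The architecture and data are
# illustrative placeholders only, not those used in the cited work.
import torch
import torch.nn as nn

class DenoiseSR(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

model = DenoiseSR()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder tensors standing in for paired (fast, low-quality) and
# (slow, high-quality) image patches of the same field of view.
fast_imgs = torch.rand(16, 1, 64, 64)
slow_imgs = torch.rand(16, 1, 64, 64)

for epoch in range(10):                      # training loop
    optimizer.zero_grad()
    loss = loss_fn(model(fast_imgs), slow_imgs)
    loss.backward()
    optimizer.step()
```

In practice, the paired patches would come from registered fast and slow scans of the same wafer regions, and the trained network would then be applied to fast scans only.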

STEM [77,78,79] enables subangstrom-resolution imaging by scanning a nanometer-sized electron beam point by point over a target sample. Scattered electrons transmitted through the sample are collected by an annular detector at the back focal plane of the objective lens. Because the intensity of the scattered electrons is proportional to the square of the atomic number, the material properties and thickness can be characterized. The resolution of STEM has reached ~ 0.05 nm [66], but STEM has the disadvantages of a limited FOV and the need to prepare very thin specimen sections, typically approximately 100 nm thick, to achieve high resolution.

STEM can be used to verify the deposition state of stacked layers of ultrathin semiconductor films [80,81,82]. Figure 2c shows TMD films measured with angstrom precision by STEM [80], in which an interlayer distance of 6.4 Å is resolved. In situ TEM measurements can be used to observe the real-time growth or shrinkage of the few-nanometer intermediate reaction layer of neuromorphic multilayer devices [81], as shown in Fig. 2d, where the authors directly visualized electrochemical redox processes by in situ TEM imaging.

Electron microscopy is widely used in semiconductor manufacturing processes due to its high resolution; however, it is not suitable for total inspection because it requires a wafer-cutting process. Compared to optical approaches, electron microscopy typically has a narrow FOV of 10–100 nm and requires tedious sample preparation processes, such as focused ion beam milling [83] in the case of TEM.

2.2 Spectrophotometry

Spectrophotometry [45, 46] is the quantitative measurement of transmitted or reflected light as a function of wavelength resulting from light–matter interactions over a wide wavelength range. The intensity ratio between the reference (a portion of the light source or a reference mirror) and the measured signal (light that has interacted with the thin film materials) depends on the layer thicknesses and refractive indices of the thin film device. Single or multiple photometers can be used to detect intensity changes. In general, spectrophotometers use the spectral range from ultraviolet to near-infrared light, but recently, the extreme ultraviolet range (from 10 to 124 nm) has also been used [84,85,86,87] to measure semiconductor devices with thicknesses of a few nanometers.

Spectrophotometry is noninvasive, nondestructive, and simple, and it requires no special sample preparation. However, spectrophotometry has a drawback when measuring highly absorbing media, such as metallic materials in the visible range. Compared with spectroscopic ellipsometry, spectrophotometry has better spatial resolution because it does not necessarily require the beam to be incident on the sample at an oblique angle, but its optical information is limited to the amplitude change, whereas ellipsometry obtains both the amplitude and phase change of the polarization state in a single measurement.

Thickness characterization by spectrophotometry is an indirect method that requires appropriate optical modeling and optimization algorithms to determine the layer thickness from the measured optical responses. Sample parameters such as the thickness or refractive index of each layer are determined by the optical fitting methods [88, 89], which compare the measured spectra with the simulated ones. For the optical modeling of multilayer devices, the transfer matrix method (TMM) [14] based on the continuity of electromagnetic waves passing through a thin film can be used. The electric and magnetic fields in the vertical direction (the direction in which the thin films are stacked) at the interface of the thin film system can be expressed as a characteristic matrix, as shown in the following equation [14]:

$$\left[ {\begin{array}{*{20}c} {E_{{\text{t}}} } \\ {H_{{\text{t}}} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\cos \delta } & {{\text{i}}\sin \delta /\eta_{1} } \\ {{\text{i}}\eta_{1} \sin \delta } & {\cos \delta } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {E_{{\text{b}}} } \\ {H_{{\text{b}}} } \\ \end{array} } \right]$$
(1)

where Et (Eb) and Ht (Hb) are the electric and magnetic fields at the upper (lower) boundary of the layer, respectively. The phase shift δ is expressed as 2πNdcosθ/λ, where N, d, θ, and λ are the complex refractive index, the thickness of the layer, the incident angle of light, and the wavelength of light, respectively. η1 is the optical admittance of the layer, which represents the ratio of the magnetic field to the electric field. The transfer matrix for multiple layers can be expressed as the following equation by extending Eq. (1):

$$\left[ {\begin{array}{*{20}c} B \\ C \\ \end{array} } \right] = \left\{ {\mathop \prod \limits_{r = 1}^{m} \left[ {\begin{array}{*{20}c} {\cos \delta_{r} } & {{\text{i}}\sin \delta_{r} /\eta_{r} } \\ {{\text{i}}\eta_{r} \sin \delta_{r} } & {\cos \delta_{r} } \\ \end{array} } \right]} \right\}\left[ {\begin{array}{*{20}c} 1 \\ {\eta_{{\text{s}}} } \\ \end{array} } \right]$$
(2)

The number of layers is m, and the optical admittance of the substrate is ηs. B and C are the normalized electric and magnetic fields, respectively. Thus, the reflectance (R), transmittance (T), and absorption (A) of the multilayer thin film system can be expressed as the following equations:

$$R = \left( {\frac{{\eta_{0} B - C}}{{\eta_{0} B + C}}} \right)\left( {\frac{{\eta_{0} B - C}}{{\eta_{0} B + C}}} \right)^{*}$$
(3)
$$T = \frac{{4\eta_{0} {\text{Re}} \left( {\eta_{m} } \right)}}{{\left( {\eta_{0} B + C} \right)\left( {\eta_{0} B + C} \right)^{*} }}$$
(4)
$$A = \frac{{4\eta_{0} {\text{Re}} \left( {BC^{*} - \eta_{m} } \right)}}{{\left( {\eta_{0} B + C} \right)\left( {\eta_{0} B + C} \right)^{*} }}$$
(5)

The calculated reflectance or transmittance can be compared to the measured values to characterize the optimal thickness or refractive index of each layer. The most commonly used metric for evaluating the goodness of fit is the mean squared error (MSE), as shown in the following equation:

$${\text{MSE}} = \frac{1}{2N - M}\mathop \sum \limits_{i = 1}^{N} \left[ {\left( {\frac{{R_{i}^{{{\text{cal}}}} - R_{i}^{\exp } }}{{\sigma_{R,i}^{\exp } }}} \right)^{2} } \right]$$
(6)

where σ is the error bar of the measurements, N is the number of measurements, and M is the number of fit parameters.
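As an illustration of Eqs. (1)–(3) and (6), the following Python sketch computes the normal-incidence reflectance of a multilayer stack with the characteristic-matrix product and evaluates the fitting metric; the material indices and thicknesses are nominal placeholder values, not data from any cited study.

```python
# A minimal sketch of the transfer matrix method of Eqs. (1)-(3) and the MSE of
# Eq. (6) at normal incidence. The free-space admittance factor cancels in R, so
# refractive indices are used directly as admittances. All values are nominal.
import numpy as np

def reflectance(wavelengths_nm, n_layers, d_layers_nm, n_substrate, n_ambient=1.0):
    """Reflectance spectrum of a multilayer stack at normal incidence."""
    R = np.empty_like(wavelengths_nm, dtype=float)
    for i, lam in enumerate(wavelengths_nm):
        B, C = 1.0 + 0j, complex(n_substrate)           # start from the substrate, Eq. (2)
        for n, d in zip(reversed(n_layers), reversed(d_layers_nm)):
            delta = 2.0 * np.pi * n * d / lam            # phase thickness (theta = 0)
            M = np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                          [1j * n * np.sin(delta), np.cos(delta)]])
            B, C = M @ np.array([B, C])
        r = (n_ambient * B - C) / (n_ambient * B + C)    # amplitude reflection, Eq. (3)
        R[i] = (r * np.conj(r)).real
    return R

def mse(R_cal, R_exp, sigma, n_fit_params):
    """Goodness of fit, Eq. (6)."""
    N = len(R_exp)
    return np.sum(((R_cal - R_exp) / sigma) ** 2) / (2 * N - n_fit_params)

# Example: 10-layer SiO2/Si3N4 stack on a silicon substrate (nominal indices).
lam = np.linspace(400, 800, 201)                         # wavelengths in nm
n_stack = [1.46, 2.0] * 5                                # alternating SiO2 / Si3N4
d_stack = [20.0, 25.0] * 5                               # thicknesses in nm
R = reflectance(lam, n_stack, d_stack, n_substrate=3.9)
```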

To increase the sensitivity of the measured signal from the targeted thin film, differential reflectance spectroscopy (DRS) [90,91,92] can be used. Differential reflectance (DR) can be expressed as DR = (R1 − R2)/R2, where R1 and R2 are the reflectance of the target and the reference, respectively. When a medium similar to the substrate of the target thin film is used as a reference sample, the target layer thickness can be measured with subnanometer precision. Figure 3a shows the DR spectra of thin film devices used as display panels [90]. In this study, the thickness of a ~ 10-nm-thick silicon layer is determined with 0.3-nm precision.

Fig. 3
figure 3

Representative examples of spectrophotometry. a Experimental setup and optical fitting results of differential reflectometry. Reprinted with permission from [90]. © The Optical Society. b Schematic of angle-resolved spectral reflectometry and a comparison of the measured and calculated angle-dependent reflectance. Reprinted with permission from [94]. © The Optical Society

To increase the amount of measurement information, multiangle [93, 94] or multisample [95] measurements can be used. Changing the angle of incidence substantially alters the reflectance spectra and thus provides new information. Measuring multiple samples that share the optical constants of the target sample can also provide additional information. This additional information can improve the accuracy and uniqueness of thickness characterization [49]. Figure 3b shows multiangle reflectometry [94] using a digital micromirror device. The reflectance spectrum from the target sample changes with the angle of incidence from 0 to ~ 70°. Thus, a ~ 210 to 250-nm-thick layer was characterized with a combined uncertainty of 0.43 nm.

Machine learning can be used to reduce the characterization time or replace complex optical modeling [40, 96, 97]. Figure 4a shows the use of a thin film neural network [96] for the optical design and characterization of a multilayer of 232 layers. Using a trained machine learning model greatly reduced the computation time for thickness characterization. This thin film neural network takes ~ 0.924 s to optically model 232 layers, which is ~ 73-fold faster than the conventional framework (~ 67.5 s).

Fig. 4
figure 4

Representative examples of recent spectrophotometry for semiconductor multilayer thickness characterization. a Conceptual diagram of a thin film neural network and optical fitting results of 232 layers. Reproduced from [96] under the terms of a Creative Commons license: https://creativecommons.org/licenses/. b 3D nanostructure characterization results of EUV imaging reflectometry. Reproduced from [85] under the terms of a Creative Commons license: https://creativecommons.org/licenses/

As the structures of 3D integrated circuits shrink to nanometer size, reflectometry in the extreme ultraviolet (EUV) region [84,85,86,87] can be used to measure nanoscale features. The wavelength range of EUV light generated by high harmonic generation is ~ 10 to 100 nm, which offers the advantage of nanoscale spatial resolution. In addition, EUV light can penetrate metallic materials such as copper or aluminum, which are opaque in the visible range. Figure 4b shows the surface topographic imaging and thickness characterization of SiO2-doped microstructures by EUV phase-sensitive reflectometry [85]. A SiO2 layer thickness of a few nanometers was characterized with 0.3-nm precision.

2.3 Spectroscopic Ellipsometry

One of the most widely used multilayer characterization methods, spectroscopic ellipsometry [47,48,49, 98], measures the changes in the polarization state of light reflected from or transmitted through target samples. The main advantage of ellipsometry is that it acquires two pieces of optical information simultaneously by measuring the amplitude (ψ) and phase difference (Δ) of the polarization change with a rotating polarizer or analyzer (another polarizer before the detector) unit. Because ellipsometry irradiates the target device at an oblique angle, the lateral resolution can be limited to several micrometers. Spectroscopic ellipsometry uses a mechanically rotating polarizer to obtain the change in polarization state over time; it is therefore vulnerable to mechanical vibration and takes several milliseconds to acquire a single measurement.

Similar to spectrophotometry, conventional ellipsometry also indirectly characterizes sample parameters with accurate optical modeling of the multilayer structure. The reflection coefficient of s- and p-polarized light is expressed as the following equation:

$$\frac{{r_{{\text{p}}} }}{{r_{{\text{s}}} }} = \tan \left( \psi \right){\text{e}}^{{\text{i}}\Delta }$$
(7)

where rs and rp are the Fresnel coefficients of s- and p-polarized reflections, respectively, which are functions of thickness and refractive index. ψ and Δ can be derived with s- and p-polarized reflection coefficients calculated by the TMM. The thickness and refractive index of a thin film are determined when the MSE between the measured and calculated values is minimized, as shown by the following equation:

$${\text{MSE}} = \frac{1}{2N - M}\mathop \sum \limits_{i = 1}^{N} \left[ {\left( {\frac{{\psi_{i}^{{{\text{cal}}}} - \psi_{i}^{\exp } }}{{\sigma_{\psi ,i}^{\exp } }}} \right)^{2} + \left( {\frac{{\Delta_{i}^{{{\text{cal}}}} - \Delta_{i}^{\exp } }}{{\sigma_{\Delta ,i}^{\exp } }}} \right)^{2} } \right]$$
(8)

where σ is the error bar of the measurements, N is the number of measurements, and M is the number of fit parameters.
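For a single film on a substrate, ψ and Δ of Eq. (7) can be computed directly from the Fresnel coefficients, as in the minimal sketch below (an illustrative assumption using nominal SiO2-on-Si values; a full multilayer treatment would use the TMM of Sect. 2.2).

```python
# A minimal sketch (illustrative assumption) of the ellipsometric angles psi and
# Delta of Eq. (7) for an ambient/film/substrate structure, using the
# two-interface Airy summation instead of the full TMM for brevity.
import numpy as np

def fresnel(n_i, n_t, cos_i, cos_t):
    """Fresnel reflection coefficients (rs, rp) at a single interface."""
    rs = (n_i * cos_i - n_t * cos_t) / (n_i * cos_i + n_t * cos_t)
    rp = (n_t * cos_i - n_i * cos_t) / (n_t * cos_i + n_i * cos_t)
    return rs, rp

def psi_delta(lam_nm, d_nm, n0, n1, n2, aoi_deg=70.0):
    """Ellipsometric angles (in degrees) of an n0/film(n1)/substrate(n2) stack."""
    th0 = np.deg2rad(aoi_deg)
    cos0 = np.cos(th0)
    # Snell's law for (possibly complex) film and substrate indices
    cos1 = np.sqrt(1 - (n0 * np.sin(th0) / n1) ** 2 + 0j)
    cos2 = np.sqrt(1 - (n0 * np.sin(th0) / n2) ** 2 + 0j)
    rs01, rp01 = fresnel(n0, n1, cos0, cos1)
    rs12, rp12 = fresnel(n1, n2, cos1, cos2)
    beta = 2 * np.pi * d_nm * n1 * cos1 / lam_nm          # film phase thickness
    phase = np.exp(-2j * beta)
    rs = (rs01 + rs12 * phase) / (1 + rs01 * rs12 * phase)
    rp = (rp01 + rp12 * phase) / (1 + rp01 * rp12 * phase)
    rho = rp / rs                                          # Eq. (7)
    return np.degrees(np.arctan(np.abs(rho))), np.degrees(np.angle(rho))

# Example: ~100-nm SiO2 (n ~ 1.46) on Si (n ~ 3.9) at 70 deg, 400-800 nm.
lam = np.linspace(400, 800, 101)
psi, delta = psi_delta(lam, d_nm=100.0, n0=1.0, n1=1.46, n2=3.9)
```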

Unfortunately, the ψ and Δ values measured by single-wavelength ellipsometry do not provide a unique multilayer thickness solution. Thus, ellipsometry is often accompanied by multiangle [99,100,101] or multisample [102, 103] measurements. Multiangle or multisample measurement data help improve the uniqueness of the characterization results by providing more information about the sample. Reducing the number of unknown parameters (such as layer thicknesses or refractive indices) can also improve the accuracy and uniqueness of the characterization results. For example, by introducing a tooling factor that linearly fits the multilayer thickness change tendency in the layer deposition process [104], the number of fitting parameters required for characterization can be substantially reduced (from 37 to 2). Figure 5a shows the thickness characterization results of a 37-layer SiO2–Ta2O5 stack with a relative thickness error of < 2% compared with the target design. In that study, the complex refractive indices were estimated using dispersion models [105, 106].

Fig. 5
figure 5

Representative examples of recent spectroscopic ellipsometry for semiconductor multilayer thickness characterization. a Optical fitting results of spectroscopic ellipsometric data using a tooling factor for each material. Reprinted from Surf Coat Technol, 357, Hilfiker J. N. et al., Spectroscopic ellipsometry characterization of multilayer optical coatings, 114–121, Copyright 2019, with permission from Elsevier [104]. b Experimental setup of dual-comb spectroscopic ellipsometry. Reproduced from [107] under the terms of a Creative Commons license: https://creativecommons.org/licenses/. c Representative examples of layer thickness prediction results using machine learning. Reproduced from [108] under the terms of a Creative Commons license: https://creativecommons.org/licenses/

Dual-comb ellipsometry [107] has been reported to improve the spectral resolution of spectroscopic ellipsometry, as shown in Fig. 5b. Dual-comb spectroscopy uses two optical frequency combs with slightly different repetition rates to acquire an interferogram over a wide time span without mechanical scanning. By exploiting this feature, s- and p-polarized interferograms between the signal comb and the local comb can be obtained with two independent photodetectors. In one reported experiment, the high spectral resolution of dual-comb spectroscopy provided better thickness characterization precision (3.3 nm) than conventional ellipsometry (12.1 nm). However, because the typical spectral bandwidth of an optical frequency comb is < 100 nm, dual-comb spectroscopy provides only limited optical information about thin films compared with conventional spectroscopic ellipsometry. Therefore, an optical comb source with a wide wavelength range may be required.

Machine learning can be combined with spectroscopic ellipsometry [108,109,110,111] to replace complex optical modeling. The correlation between the thickness of each layer of a multilayer and the ellipsometric data can be learned automatically by machine learning. Figure 5c shows the thickness prediction results for more than 200 layers using a machine learning model [108]. Using the ellipsometric data as the input and the high-resolution thickness information measured by TEM as the output, the thickness of each layer can be predicted with an average root-mean-square error of ~ 1.6 Å. Additionally, machine learning can be used for optical design. Without complex physical interpretation of the thin film structure, the machine learning model predicts ellipsometric spectra based on the given refractive indices and thicknesses during the optimization process [110]. Thus, the optimization process converged faster and with smaller MSE values than conventional spectroscopic ellipsometry analysis in tests on 15 thin film materials.

2.4 Raman Spectroscopy

Raman spectroscopy [50] can be used to characterize thin film devices by exploiting the change in the energy state of light scattered as it interacts with a thin film material. Most scattered photons undergo elastic scattering (Rayleigh scattering), and only very rarely do incident photons lose or gain energy (Raman scattering). The Raman shift is defined as the difference in frequency between the incident light and the scattered light. Because the Raman shift is characteristic of the vibrational modes of the material, Raman spectroscopy can be used to identify unknown materials [112, 113]. Furthermore, by carefully observing changes in the Raman spectrum (the height, width, and position of the peaks), the layer thickness or number of layers can be determined [114,115,116,117,118,119]. For example, by observing the first-order Raman peak of crystalline silicon at 520 cm−1, the thickness of the silicon material grown during the deposition process can be monitored in real time [120]. Typically, Raman spectroscopy uses a continuous-wave or pulsed laser, and the scattered light is collected by a lens and a highly sensitive photodetector array. Because most of the scattered light is Rayleigh scattered, a notch filter or an edge-pass filter should be used to filter it out.

Thin film characterization using Raman spectroscopy is an indirect method, similar to other optical analyses. By fitting the calculated Raman spectrum to the measured one, the composition of a thin film can be determined, or the change in thickness can be characterized [121, 122]. The Raman scattering intensity is proportional to the intensity I0 and the fourth power of the frequency ν of the incident light as follows [50]:

$$I_{{\text{R}}} \propto I_{0} \nu^{4} N\left( {\frac{\partial \alpha }{{\partial Q}}} \right)^{2}$$
(9)

where N is the number of scattering molecules, and α and Q are the polarizability and the amplitude of the vibrational coordinate, respectively. The peak intensity of a Raman mode varies with the layer thickness. Figure 6a shows the changes in the Raman peaks as MoS2 is stacked layer by layer on SiO2–Si multilayer devices [115]. As the ~ 0.7-nm-thick MoS2 layers are successively stacked, the spectral information of the Raman shift (the peak positions and the distance between adjacent peaks) changes continuously. With the fitting process, the MoS2 layer thickness can be determined with an accuracy of 10% of the nominal thickness. However, when the number of layers exceeds ~ 10, the sensitivity of the peak position changes is substantially degraded, so thickness characterization using Raman peaks is limited to < 10 layers. This restriction reflects one of the limitations of thickness characterization using Raman spectroscopy: the peak intensity change is not monotonic as the number of layers or the thickness increases.
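A common practical step in such analyses is extracting peak positions by line-shape fitting. The sketch below (an illustration, not the procedure of [115]) fits a Lorentzian to a synthetic spectrum near the MoS2 A1g mode; the layer number would then be inferred from the fitted peak positions or their separation.

```python
# A minimal, illustrative sketch of extracting a Raman peak position by fitting
# a Lorentzian line shape to a spectrum. The synthetic spectrum and peak
# parameters below are placeholders.
import numpy as np
from scipy.optimize import curve_fit

def lorentzian(x, amp, center, width, offset):
    """Single Lorentzian peak with a constant background."""
    return amp * width**2 / ((x - center) ** 2 + width**2) + offset

# Synthetic spectrum around the MoS2 A1g mode (~405 cm^-1) with noise.
shift = np.linspace(380, 430, 500)                        # Raman shift, cm^-1
spectrum = lorentzian(shift, 1.0, 405.3, 2.5, 0.05)
spectrum += np.random.normal(0, 0.01, shift.size)

p0 = [1.0, 404.0, 3.0, 0.0]                               # initial guess
popt, pcov = curve_fit(lorentzian, shift, spectrum, p0=p0)
peak_position = popt[1]                                   # fitted peak center, cm^-1
```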

Fig. 6
figure 6

Representative examples of Raman spectroscopy for semiconductor multilayer thickness characterization: a MoS2 layers at layer numbers from 1 to 116. Reprinted with permission from [115]. Copyright 2012 American Chemical Society. b Exfoliated hBN layers at layer numbers from 3 to 20. Reproduced from [116] with permission of © IOP Publishing Ltd, from Low frequency Raman spectroscopy of few-atomic-layer thick hBN crystals, Stenger I. et al., 4, 031003, 2017; permission conveyed through Copyright Clearance Center, Inc.

Low-frequency Raman spectroscopy [116, 123] can be used to determine the number of layers from the strong dependence of the low-frequency vibrational modes on the layer number. In the case of hBN-based 2D semiconductor devices [124], the stacking order and the number of layers of hBN are closely related to electrical characteristics. As shown in Fig. 6b, the low-frequency vibrational modes at < 100 cm−1 are strongly downshifted as the hBN thickness decreases. In contrast, the high-frequency vibrational mode near ~ 1350 cm−1 appears insensitive to thickness changes.

To increase the sensitivity of the Raman signal, interference enhancement [125, 126] by exploiting the multiple reflections occurring on the SiO2–Si substrate can be used. Typical graphene layers or TMDs are fabricated on a Si substrate with a capping layer of silicon oxide several hundred nanometers thick. The Raman signal with interference enhancement can be detected with high sensitivity with up to a 30-fold enhancement factor [125]. However, multiple reflections from the SiO2–Si stack lead to ambiguity in the peak intensity of the Raman shift as the layer thickness increases.

Raman spectroscopy is widely used to detect the structural, chemical, and electrical properties of semiconductor multilayer devices in inspection facilities. Micro-Raman spectroscopy [120, 121, 127] can be used to analyze a local target area with high resolution and a small spot size by applying a microscopic system. Information from Raman spectroscopic studies includes electron/hole doping concentration [128], material composition [112, 113], stress and strain [52, 127], crystal orientation [129], and layer thickness and number [114,115,116,117,118,119]. With these versatile applications, Raman spectroscopy is commonly used to analyze the 2D semiconductor multilayer devices of emerging next-generation semiconductor materials such as graphene, hBN, and MoS2. However, because a Raman spectrum contains insufficient information for use in the multilayer characterization of semiconductors of hundreds of layers, it is mainly used for layer characterization of a few nanometers in thickness, with < 10 layers.

3 Multilayer Device Characterization Algorithms

Determining layer thickness from multilayer device measurements requires appropriate algorithms. Model-based algorithms [59,60,61,62,63] involve optical modeling and optical parameter optimization. Machine learning can be used for hidden physics modeling, direct characterization, and image classification. The operating principles of both categories are depicted in Fig. 7.

Fig. 7
figure 7

Operation principles of thickness characterization algorithms used for semiconductor multilayer devices: a commonly used model-based algorithms and b supervised learning

Figure 7a shows the operating principle of a model-based algorithm. Optical modeling requires the thickness and refractive index of the target multilayer device to be set as initial values. The refractive index of each layer can be estimated with various dispersion models [63, 105, 130,131,132] depending on the medium. The modeled optical spectrum is compared with the measured spectrum using an optimization algorithm [133,134,135,136]. The optimization minimizes an objective function, for example, the MSE between the measured and modeled values. With repeated iterations, the updated sample parameters generate new modeled optical responses, which are again compared with the measured values. When the MSE is minimized, the optimization stops, and the optimal sample parameters are determined.

Figure 7b shows the operating principle of machine learning. Machine learning focuses only on finding the correlation between the measured optical responses and the desired sample parameters. As a representative type of machine learning, supervised learning can be exploited for thickness characterization. The operating principle is as follows. First, in the training process, the machine learning model is trained with the measured optical responses as input and the thickness information as output. “Training” means optimizing the weight parameters of the machine learning model so that it best represents the input–output relationship. Various machine learning models, such as ridge regression [137], artificial neural networks [138], convolutional neural networks [139], and support vector machines [140], can be used to train on the labeled data. Second, in the prediction process, optical responses measured for unseen multilayer devices are applied to the trained model to predict their thicknesses.

This section compares the commonly used methods in each category to help select the right one for a given problem.

3.1 Model-Based Algorithm

Determining the sample parameters of thin film devices involves estimating the “causes” (the thickness and refractive index of the target multilayer) from the “results” (the measured optical spectrum), which is called solving the optical inverse problem [14, 61]. The process of matching the values modeled from the estimated sample parameters to the measured values is called “fitting.” The accuracy and uniqueness of the solutions (sample parameters) are inevitably affected by measurement and numerical errors [49, 62]. An optical model is designed based on the initial values of the sample parameters. For example, optical responses (such as reflectance and transmittance) can be calculated by the TMM from the multilayer thickness and refractive index information. An initial thickness can be assigned based on preliminary knowledge, and an initial refractive index can be assigned using optical dispersion models. The sample parameters are updated until the calculated optical responses best fit the measured ones. To evaluate the goodness of fit, for example, the MSE between the calculated and measured values can be used. To update and evaluate the optical model, optimization algorithms such as the Levenberg–Marquardt algorithm [135], gradient descent [133], the Gauss–Newton algorithm [134], and genetic algorithms [136] are commonly used for thin film characterization.
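The fitting loop described above can be written compactly with an off-the-shelf Levenberg–Marquardt solver. The sketch below fits the thickness of a single film to a simulated reflectance spectrum; the forward model, noise level, and nominal indices are illustrative assumptions.

```python
# A minimal sketch of model-based fitting: a forward model (a single film on a
# substrate at normal incidence, for brevity) is fitted to a "measured"
# reflectance spectrum with the Levenberg-Marquardt algorithm via scipy.
# All numerical values are illustrative placeholders.
import numpy as np
from scipy.optimize import least_squares

def film_reflectance(d_nm, lam_nm, n_film=1.46, n_sub=3.9):
    """Normal-incidence reflectance of a single film on a substrate."""
    r01 = (1 - n_film) / (1 + n_film)
    r12 = (n_film - n_sub) / (n_film + n_sub)
    phase = np.exp(-4j * np.pi * n_film * d_nm / lam_nm)   # round-trip phase
    r = (r01 + r12 * phase) / (1 + r01 * r12 * phase)
    return np.abs(r) ** 2

lam = np.linspace(400, 800, 201)
# Synthetic "measurement": a 123-nm film with small measurement noise.
R_meas = film_reflectance(123.0, lam) + np.random.normal(0, 1e-3, lam.size)

def residuals(params):
    (d,) = params
    return film_reflectance(d, lam) - R_meas

# Levenberg-Marquardt optimization starting from an initial guess of 100 nm.
result = least_squares(residuals, x0=[100.0], method="lm")
d_fit = result.x[0]
```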

To estimate the refractive index, the Sellmeier equation [106], Cauchy’s equation [131], and the Tauc–Lorentz model [132] can be used. The Sellmeier and Cauchy equations are empirical relations of the refractive index with respect to wavelength and are mainly used to model the refractive index of transparent media, such as fused silica, in the visible range. The difference between them is that the Sellmeier equation fits better in the infrared wavelength range, whereas Cauchy’s equation has a simpler form. The general formulas for both are as follows:

$${\text{Sellmeier}}\,{\text{equation:}}\,n^{2} \left( \lambda \right) = 1 + \mathop \sum \limits_{i} \frac{{B_{i} \lambda^{2} }}{{\lambda^{2} - C_{i} }}$$
(10)
$${\text{Cauchy's}}\,{\text{equation:}}\,n\left( \lambda \right) = A + \frac{B}{{\lambda^{2} }}$$
(11)

where n is the refractive index, λ is the wavelength of the light, and A, B, and C are the fitting coefficients. To model the complex refractive index of absorbing media, the Tauc–Lorentz model can be used. The Tauc–Lorentz model expresses the complex relative permittivity with the following formula:

$$\varepsilon \left( E \right) = \varepsilon \left( \infty \right) + \chi \left( E \right)$$
(12)

where ε is the relative permittivity, E is the photon energy, ε(∞) is the relative permittivity at infinite energy, and χ is the electric susceptibility. The complex refractive index can be derived from the complex relative permittivity with the equations below:

$${\text{Re}} \left( \varepsilon \right) = n^{2} - \kappa^{2}$$
(13)
$${\text{Im}} \left( \varepsilon \right) = 2n\kappa$$
(14)

where n and κ are the real and imaginary parts of the refractive index, respectively. Modeling the refractive index can improve the accuracy and uniqueness of thickness characterization by reducing the unknown parameters of the optical inverse problem.
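The dispersion models of Eqs. (10)–(14) are straightforward to implement; the sketch below evaluates the Sellmeier and Cauchy equations and converts a complex permittivity to (n, κ). The Sellmeier coefficients quoted are the widely used fused-silica values and serve only as an example.

```python
# A minimal sketch of the dispersion relations of Eqs. (10)-(11) and the
# permittivity-to-index conversion of Eqs. (13)-(14).
import numpy as np

def sellmeier(lam_um, B, C):
    """Sellmeier equation, Eq. (10); wavelength in micrometers."""
    lam2 = lam_um ** 2
    n2 = 1 + sum(b * lam2 / (lam2 - c) for b, c in zip(B, C))
    return np.sqrt(n2)

def cauchy(lam_um, A, B):
    """Two-term Cauchy equation, Eq. (11)."""
    return A + B / lam_um ** 2

def eps_to_nk(eps):
    """Convert a complex relative permittivity to (n, kappa), Eqs. (13)-(14)."""
    n = np.sqrt((np.abs(eps) + eps.real) / 2)
    kappa = np.sqrt((np.abs(eps) - eps.real) / 2)
    return n, kappa

# Fused silica Sellmeier coefficients (Malitson), C given in um^2.
B_fs = (0.6961663, 0.4079426, 0.8974794)
C_fs = (0.0684043**2, 0.1162414**2, 9.896161**2)
n_633 = sellmeier(0.633, B_fs, C_fs)        # ~1.457 at 633 nm
```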

The determination of sample parameters with an optimization process can be expressed as

$$\hat{x} = \arg \mathop {\min \,}\limits_{x} F\left( x \right)$$
(15)

where \(\hat{x}\) is the optimized sample parameter, and F(x) is the objective function, such as \(F\left( x \right) = \sum \left[ {y_{{{\text{mea}}}} - y_{\bmod } \left( x \right)} \right]^{2}\), where ymea and ymod are the measured and modeled curves, respectively. Optimization algorithms differ in how they use the objective function to update the parameters in the iterative optimization process. Gradient descent moves toward a local minimum in the direction opposite to the derivative of the objective function. The Gauss–Newton algorithm can approach the local minimum faster and more accurately than gradient descent by using an approximation of the second derivative (Hessian) of the objective function in addition to the gradient, but inverting this approximate Hessian can lead to numerical instability. The Levenberg–Marquardt algorithm is a combination of gradient descent and the Gauss–Newton algorithm: it behaves like gradient descent when the parameters are far from the local minimum and like the Gauss–Newton algorithm when they approach it.
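The following numerical sketch (a toy two-parameter exponential fit, purely illustrative) makes the difference between the three update rules explicit; in practice, library implementations such as those in scipy are typically used instead.

```python
# A minimal, illustrative sketch of the three update rules for a least-squares
# objective F(x) = sum(r(x)^2), with residuals r(x) = y_mea - y_mod(x) and
# Jacobian J = d(y_mod)/dx. The toy model and all constants are placeholders.
import numpy as np

t = np.linspace(0, 1, 50)
y_mea = 2.0 * np.exp(-1.3 * t)                        # synthetic "measurement"

def model(x):                                         # y_mod(x), x = [a, b]
    return x[0] * np.exp(-x[1] * t)

def residual_and_jacobian(x):
    r = y_mea - model(x)
    J = np.column_stack([np.exp(-x[1] * t),                # d(y_mod)/da
                         -x[0] * t * np.exp(-x[1] * t)])   # d(y_mod)/db
    return r, J

x = np.array([1.0, 0.5])                              # initial guess
damping, alpha = 1e-2, 1e-2
for _ in range(100):
    r, J = residual_and_jacobian(x)
    step_gd = alpha * J.T @ r                         # gradient descent (= -alpha*grad F / 2)
    step_gn = np.linalg.solve(J.T @ J, J.T @ r)       # Gauss-Newton step
    step_lm = np.linalg.solve(J.T @ J + damping * np.eye(2), J.T @ r)  # Levenberg-Marquardt
    x = x + step_lm                                   # here we follow the LM update
```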

A genetic algorithm is a search algorithm inspired by the theory of natural selection. A genetic algorithm optimizes a set of candidate sample parameters at once in each iteration; this set is called a “population.” During the optimization process, the members of the initial population whose modeled values have the smallest error with respect to the measured values are “selected.” In the next iteration, the selected parameters are mixed with each other to create new parameter sets (a process called “crossover”). Then, some of the parameters are changed randomly with low probability, a process called “mutation.” This cycle is repeated until the updated parameters yield optimal fitting results. A genetic algorithm is more likely than gradient methods to converge to the global optimum, but it requires longer computation times because the objective function must be calculated for the entire population at every iteration.
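A minimal genetic-algorithm sketch for a one-parameter thickness fit is shown below; the toy forward model, population size, and mutation rate are arbitrary assumptions chosen for illustration.

```python
# A minimal sketch of the genetic-algorithm workflow described above (selection,
# crossover, mutation) applied to a toy one-parameter thickness fit. The forward
# model and all rates/sizes are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
lam = np.linspace(400, 800, 101)

def model(d):                                   # toy interference-like forward model
    return np.cos(4 * np.pi * 1.46 * d / lam)

y_mea = model(123.0)                            # synthetic "measurement"

def fitness(d):
    return -np.sum((model(d) - y_mea) ** 2)     # higher is better

population = rng.uniform(50, 300, size=20)      # initial population of thicknesses (nm)
for generation in range(100):
    scores = np.array([fitness(d) for d in population])
    parents = population[np.argsort(scores)[-10:]]            # selection
    children = 0.5 * (rng.permutation(parents) + parents)     # crossover (averaging)
    mutate = rng.random(children.size) < 0.2                   # mutation mask
    children[mutate] += rng.normal(0, 5.0, mutate.sum())
    population = np.concatenate([parents, children])
best_thickness = population[np.argmax([fitness(d) for d in population])]
```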

Model-based algorithms have the advantages of not requiring a large number of samples, high interpretability of results, and the ability to set bounds on sample parameters. In the optimization process of model-based algorithms, setting the initial parameters is important, and sometimes incorrect initial parameters can cause the optimization to fail completely or take a long time. Furthermore, the computational speed of a model-based method should be considered because the optical modeling and optimization time increase with the number of layers.

3.2 Machine Learning

Machine learning is a data-driven algorithm that automatically detects patterns in data for decision-making. Machine learning is widely used in science and engineering, with strengths in pattern recognition, classification, prediction, and parameter optimization [141, 142]. Particularly in the field of ultrafast optics [143], machine learning algorithms are used for various purposes such as image reconstruction [144], optical parameter characterization [145], image classification [142, 146], and system self-optimization [147, 148].

Machine learning is divided into supervised learning, unsupervised learning, and reinforcement learning, depending on the approach [64]. Among these, supervised learning aims to train a parametric function that maps inputs to desired outputs. In the simplest example, the input vector \({\varvec{x}} = \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\}\) can be the optical responses at n wavelengths, and the desired output vector \({\varvec{y}} = \left\{ {y_{1} ,y_{2} , \ldots ,y_{m} } \right\}\) can be the layer thicknesses of m layers. The predicted output vector \(\hat{\user2{y}}\) is then expressed as:

$${\hat{\varvec{y}}} = \hat{f}\left( {\varvec{x}} \right) = {\varvec{w}}^{\text{T}} {\varvec{x}} + b = \mathop \sum \limits_{{\varvec{i}}}^{{\varvec{n}}} w_{i} x_{i} + b$$
(16)

where w is the weight vector (variables updated every iteration), and b is the bias. As a representative objective function, one can evaluate the MSE:

$${\varvec{w}} = \arg \mathop {\min }\limits_{w} \frac{1}{T}\mathop \sum \limits_{t}^{T} \left( {\hat{\user2{y}} - {\varvec{y}}} \right)^{2}$$
(17)

where T is the total number of data points. Machine learning uses the backpropagation algorithm [149] to update the weight vector at every iteration. Most of the computation time is spent training the parametric function composed of weight vectors. However, because the trained machine learning model predicts the output through a simple matrix operation (matrix multiplication between the input matrix and the weight matrix) on new input data, the prediction time is very short (typically < 0.1 ms, depending on the hardware used).
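The linear model of Eq. (16) trained against the MSE objective of Eq. (17) can be written in a few lines; the sketch below uses randomly generated placeholder data in place of measured spectra and TEM-labeled thicknesses.

```python
# A minimal sketch of the supervised learning setup of Eqs. (16) and (17): a
# linear model y_hat = x W + b mapping spectra (inputs) to layer thicknesses
# (outputs), trained by gradient descent on the MSE. The data are random
# placeholders standing in for labeled training pairs.
import numpy as np

rng = np.random.default_rng(1)
n_wavelengths, n_layers, n_samples = 128, 8, 500

X = rng.random((n_samples, n_wavelengths))             # optical responses (inputs)
true_W = rng.normal(0, 0.1, (n_wavelengths, n_layers))
Y = X @ true_W + rng.normal(0, 0.01, (n_samples, n_layers))  # layer thicknesses (labels)

W = np.zeros((n_wavelengths, n_layers))                # weights, Eq. (16)
b = np.zeros(n_layers)
lr = 0.01
for epoch in range(2000):                              # minimize the MSE, Eq. (17)
    Y_hat = X @ W + b
    err = Y_hat - Y
    grad_W = 2 * X.T @ err / n_samples
    grad_b = 2 * err.mean(axis=0)
    W -= lr * grad_W
    b -= lr * grad_b

# Prediction for an unseen spectrum reduces to a single matrix multiplication.
y_pred = X[0] @ W + b
```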

With the emergence of several techniques [137, 150, 151] to improve the algorithmic performances of machine learning and the improvement in the speed of processing large amounts of data using graphical processing units (GPUs) [152], machine learning has been effectively used in various fields. The gradient vanishing problem can be suppressed by the rectified linear unit activation function [150], and the overfitting problem is greatly suppressed by several regularization methods such as dropout [151], data augmentation [153, 154], and ridge regression [137]. Batch normalization [155] greatly reduces the overall training time by resolving the covariance shift problem of latent variables.

Compared with model-based algorithms, machine learning is advantageous because the computation time does not increase in proportion to the number of layers in a multilayer device, and an accurate physical analysis of the multilayer device is not required. Because parameter optimization in machine learning is based on matrix multiplication, the computing time does not increase substantially even if the dimension of the predicted output increases. Given good-quality data (data with good reproducibility), machine learning can effectively find correlations between the inputs and the desired outputs without complex physical interpretation. With this advantage, in early studies, machine learning algorithms were used to predict the ellipsometric spectrum using neural networks [109, 111]. Thin film neural networks have recently been used for effective thin film design [96]. Previously, careful optical modeling was required to design a thin film with the desired optical response; thin film neural networks are more effective for multilayer device designs with > 200 layers. Machine learning also yields better characterization results than conventional methods in optimizing the refractive index and thickness of thin layers in various media [110]. Moreover, machine learning can be used for defect detection in inspection processes [76]. Supervised learning can be used to predict the layer thicknesses of > 200 layers from spectroscopic ellipsometric data with a high resolution similar to that of TEM [108]. Furthermore, the group delay dispersion (GDD) measured by scanning white-light interferometry [156] can be used as additional input data for supervised learning. GDD is the second derivative of the spectral phase change and is very sensitive to changes in thin film thickness. In [157], the thickness characterization performance was improved by adding GDD to the input data for supervised learning.

The limitations of machine learning are that sufficient data (typically at least ~ 100 samples) are required to train the model, and accurate prediction is difficult when the structure of the multilayer device differs from that used for training. Small-data problems can be addressed by exploiting data augmentation [153, 154] or simulation data. Noise-injection-based data augmentation [108, 158, 159] effectively prevents model overfitting. In [108], faulty devices could be detected economically by using simulation data instead of producing outlier samples that reflect all outlier cases. The no-free-lunch theorem [160] implies that a machine learning model optimized for a particular problem does not necessarily perform well on other problems. Thus, the machine learning model optimization process should always be performed according to the given problem. Therefore, in applications where the multilayer structure must be changed in various ways or the amount of data is insufficient, a model-based method can be more economical.

An optical neural network (ONN) [96, 161,162,163] refers to performing neural network training and prediction in free space by exploiting the speed of light. The high transmission speed and low heat dissipation of optical computing could allow it to replace existing electronics-based computing systems. As thin film neural networks [96] are already being applied to multilayer analysis, ONNs could potentially be applied to thin film design or characterization with ultrafast speed and high energy efficiency.

4 Conclusions

We have reviewed commonly used measurement methods and algorithms for semiconductor multilayer devices. As the semiconductor industry demands more compact and more versatile 3D semiconductor devices, nanometer-scale semiconductor multilayer stacking technology is becoming critical. To date, destructive and nondestructive methods have been used complementarily for the thickness characterization of semiconductor multilayer devices. Optical approaches employ multivariate, multisample measurements to achieve subnanometer-scale thickness characterization errors. Recently, data-driven machine learning has been used to accelerate characterization or to replace the complex physical interpretation of multilayer structures. The best combination of measurement method and algorithm depends on the semiconductor multilayer device under test. We expect that this review will be of great help in choosing an appropriate characterization method.