Attenuation Sensitivity Kernel Analysis in Viscoelastic Full-Waveform Inversion Based on the Generalized Standard Linear Solid Rheology

Obtaining accurate subsurface Q (quality factor) models using full-waveform inversion (FWI) methods remains a challenging task. The forward modeling problem of viscoelastic wave propagation can be solved by superimposing N rheological bodies of Maxwell or Zener type with generalized standard linear solid rheology. However, different approaches were proposed to calculate the attenuation sensitivity kernels in viscoelastic FWI. This study reviews and compares previous theories for constructing the viscoelastic sensitivity kernels. Furthermore, we derive the viscoelastic sensitivity kernels directly following the adjoint-state (or Lagrangian multiplier) method. Compared to previous approaches, we reveal that the Q sensitivity kernels can be calculated with adjoint memory strain variables. In the numerical experiments, different methods are used to calculate the viscoelastic sensitivity kernels for comparison. We have found that when simultaneously inverting for velocity and Q models, these methods can provide inversion results of comparable quality. However, in the event of inaccurate velocity structures, the Q sensitivity kernels calculated with memory strain variables can resolve the Q anomalies more clearly, while suffering from fewer parameter trade-offs.


Introduction
In recent decades, the technique of wave-equation-based full-waveform inversion/tomography (FWI/FWT) has been intensively investigated to obtain underground model properties with high accuracy and spatial resolution for the exploration of hydrocarbon resources and interpreting deep Earth structures (Tarantola 1984;Bunks et al. 1995;Pratt et al. 1998;Virieux and Operto 2009;Tape et al. 2009;Warner et al. 2013;Fichtner and Trampert 2011;Bozdağ et al. 2016). Seismic signals experience amplitude decay and phase distortions when propagating in the Earth due to intrinsic attenuation (Liu et al. 1976;Dahlen and Tromp 1998). Attenuation represents one important property for interpreting subsurface structures and compositions. The attenuation effects are commonly quantified with quality factor Q for a linear viscoelastic solid in geophysics community. Considering attenuation effects in seismic imaging (Innanen and Lira 2010;Dutta and Schuster 2014;Zhu 2015;Sun et al. 2015;Shen et al. 2018) and inversion (Malinowski et al. 2011;Brossier 2011;Moradi and Innanen 2016;Fabien-Ouellet et al. 2017;Yang et al. 2017;Trinh et al. 2018;Kamath et al. 2020;Keating and Innanen 2020;Chen et al. 2021) has drawn significant attention. However, constructing accurate and high-resolution subsurface Q profiles reliably using FWI approaches is still a very complex and difficult problem remaining to be solved.
The damping effects of attenuation on propagating waves can be modeled using a phenomenological model, which can be constructed by parallel connection of N standard linear elements referred to as the generalized standard linear solid (GSLS) rheology (Carcione et al. 1988;Robertsson et al. 1994;Carcione 2004). In the frequency domain, the modeling of wave propagation in attenuating medium can be implemented easily by introducing complex velocity (Hak and Mulder 2011;Kamei and Pratt 2013;Operto and Miniussi 2018;Keating and Innanen 2019). However, for (very) large-scale 3D forward and inverse simulations in exploration and earthquake seismology, time-domain approaches are preferable and more frequently used. Under the GSLS formulation, Q is not involved explicitly in the rheological bodies of time-domain visco-elastodynamic wave equation (Blanch et al. 1995;Bohlen 2002). Instead, the relaxation parameters are obtained implicitly by minimizing an objective function with linear or nonlinear optimization methods (Emmerich and Korn 1987;Martin and Komatitsch 2009;Blanc et al. 2016). This impedes the derivation and calculation of Q sensitivity kernels, which are essential for iteratively updating Q models in viscoelastic FWI . Furthermore, when simultaneously inverting for velocity and Q properties, the inverted Q models are easily damaged by trade-off (or crosstalk) artifacts caused by inaccurate velocities (Brossier 2011;Hak and Mulder 2011;Kamei and Pratt 2013;Pan and Innanen 2019;Keating and Innanen 2020). Taking attenuation mechanisms into account has become one of the most difficult and complex problems in FWI studies and applications. Different theoretical frameworks have been suggested by previous researchers to calculate the sensitivity kernels in time-domain viscoelastic FWI based on the GSLS model (Charara et al. 2000;Tromp et al. 2005;Fichtner and van Driel 2014). In Charara et al. (2000), the attenuation parameters are defined as the differences between unrelaxed and relaxed moduli. The sensitivity kernels of unrelaxed moduli and attenuation parameters in viscoelastic FWI are derived with a perturbation approach based on Born approximation. The attenuation sensitivity kernels are formulated as the cross-correlation between forward strain memory variables and adjoint strain fields. Based on this theoretical framework, Bai et al. (2017) inverted for the attenuation parameters in anisotropic and viscoelastic medium.
In Tromp et al. (2005), when deriving the sensitivity kernel for Q, the relation between shear modulus and quality factor defined in the Kolsky-Futterman model by Kolsky (1952) and Futterman (1962) is introduced. Using the Born scattering integral and the chain rule, the derived Q sensitivity kernel has the same expression as that of modulus sensitivity kernel but with a different frequency-domain adjoint source. In earthquake seismology, Zhu et al. (2013) used this approach to invert for the Q structures beneath Europe and the North Atlantic. In exploration seismology, Pan and Innanen (2019) have applied this approach to image subsurface Q profiles in a practical walk-away vertical seismic profiling survey for delineating hydrocarbon reservoirs. In this paper, the approaches given by Charara et al. (2000) and Tromp et al. (2005) are referred to as method I and II, respectively.
Based on the adjoint-state (or Lagrangian multiplier) method, sensitivity kernels in FWI can be constructed by cross-correlating the forward and adjoint wavefields, which avoids direct calculation of the Jacobian matrices for (very) large-scale inversion practices (Plessix 2006;Métivier et al. 2013). In this paper, following the adjoint-state method, we directly derive the sensitivity kernels for relaxation functions and Q in viscoelastic FWI. The moduli sensitivity kernels can be constructed by cross-correlating the forward strain fields with the adjoint stress fields (Liu and Tromp 2008). The Q sensitivity kernels can be formulated as the cross-correlation between forward strain fields and adjoint strain memory variables. The same GSLS model as that of method II by Tromp et al. (2005) is used in forward modeling and sensitivity kernel derivation within this approach, which is theoretically more complete. Furthermore, this method allows calculating the velocity and Q sensitivity kernels simultaneously without introducing additional adjoint sources.
In the numerical experiments, a simple example with separated velocity and Q anomalies in viscoelastic media is first given for sensitivity and trade-off analysis. The velocity and Q sensitivity kernels based on different methods are calculated for comparison and interparmeter trade-off quantification. Next, we give a surface acquisition survey example with topographic variations and complex velocity structures. Unstructured quadrilateral mesh is used to discretize the models while accounting for topographic effects. Surface waves (SWs) are extracted from the shorts to calculate the sensitivity kernels of S-wave velocity and quality factor ( V S and Q S ). Early arrivals of body waves (BWs) are used to calculate the sensitivity kernels of P-wave velocity and quality factor ( V P and Q P ). The influences of velocity errors on Q sensitivity kernels are evaluated based on the three methods. The velocity and Q models obtained by inversion of SWs and BWs are presented for comparison. In these numerical experiments, we have obtained a series of observations: 1. The velocity sensitivity kernels calculated using these three different methods do not show obvious differences; 2. When simultaneously inverting for velocity and Q models, these three methods can provide inversion results of comparable quality; 3. However, in the presence of velocity errors, the Q sensitivity kernels calculated using the memory strain variables with method I and the adjoint-state method suffer from fewer trade-offs. The inverted Q models are also more accurate than those obtained by method II; 4. Furthermore, method II appears to converge more slowly and is almost two times more expensive than the other methods for simultaneous velocity and Q estimation.
This paper is organized as follows. The basic principles of viscoelastic FWI and forward modeling based on the GSLS rheology are first given. Then, previous theories for constructing the attenuation sensitivity kernels are reviewed. An adjoint-state (or Lagrangian multiplier) method for deriving and calculating the viscoelastic FWI sensitivity kernels is introduced. In the numerical modeling section, two synthetic examples are provided to analyze the sensitivity kernels obtained by different methods and evaluate their performances for inversion.

Principle of Viscoelastic FWI and Forward Simulation
In a common FWI framework, underground parameters are gradually refined by minimizing the least-squared differences between modeled data u and practical seismic recordings d . However, because the data fitting procedure is highly nonlinear, the standard waveformdifference objective function suffers from cycle skipping problem seriously. The envelope E measures the instantaneous amplitude of a seismic signal s where H represents Hilbert transform. In FWI studies and applications, the envelope-based objective functions are widely used to reduce cycle skipping difficulty (Wu et al. 2014;Yuan et al. 2015;.  showed that the envelope-difference objective function exhibits stronger sensitivity to Q anomaly and can naturally balance the velocity and Q updates for effective inversion. Thus, in this study, an envelopedifference objective function is introduced to invert for the velocity and Q models in viscoelastic FWI where ‖ ⋅ ‖ 2 represents l 2 norm, x r represent the positions of seismometers, m indicate physical properties (e.g., velocity or modulus) describing subsurface media, t max is the length of recording time, V denotes the underground space, E and E obs indicate the envelopes of synthetic and observed data, respectively. Because the model sizes and data sets in the inversion applications of exploration and global seismology are usually extremely large, the local Newton optimization approach is commonly used to solve the inverse problem. At each updating loop, the search direction Δm can be obtained by solving the Newton linear system where g = ∇ m and H = ∇ 2 m are the gradient and Hessian matrix. The gradient in Eq. 3 can be derived as (Bozdağ et al. 2011; where * indicates complex conjugate transpose, E = E − E obs ∕E is relative change of the envelope, and the Fréchet derivative wavefield u∕ m measures the wavefield perturbation due to the perturbations of model parameters. Thus, the adjoint source of the envelope-difference misfit function can be obtained as Following Tromp et al. (2005), we reformulate the gradient as where K m denotes the sensitivity kernel corresponding to relative model perturbation Δm∕m . For (very) large-scale inversion applications in earthquake and exploration seismology, explicitly calculating the Hessian matrix and/or its inverse is computationally unaffordable. In this type of gradient-based optimization methods, the Hessian matrix and/ or its inverse is generally ignored. In truncated-Newton methods (Métivier et al. 2013;Yong et al. 2022), the linear system in Eq. 4 is approximately solved with conjugate-gradient algorithms. In this study, we use a quasi-Newton l-BFGS optimization method to construct the search direction Δm , wherein the inverse Hessian is approximated by the gradient and model changes within a limited number of previous iterations (Nocedal and Wright 2006). The inverse Hessian approximation is applied to precondition the gradients using a "two-loop" recursion scheme (Nocedal and Wright 2006). At the k th iteration, the underground properties are corrected by where k is the step length determined using a backtracking line search method.
In viscoelastic FWI, a larger number of forward simulations need to be performed for obtaining the synthetic data u . Seismic wave propagation in a linear viscoelastic solid is governed by the following momentum equation and constitutive law where denotes the distribution of mass density, f is the source term vector, : indicates double tensor contraction operation and, in a linear viscoelastic solid, the stress tensor is determined by the entire history of strain fields and can be formulated as the convolution between relaxation function c and time derivative of the stress tensor (or displacement gradient ∇u ) (Liu et al. 1976;Robertsson et al. 1994). The relaxation function c determines the anelastic behavior of the materials, which can be modeled using the phenomenological model represented mechanically by a combination of springs and dash spots. The system constructed by parallel connection of N standard linear elements is referred to as the GSLS rheology. The relaxation function of Zener model writes where c R is the relaxed modulus, H(t) is the Heaviside step function, and p and p indicate the strain and stress relaxation times of the pth rheological body, respectively (Blanch et al. 1995;Bohlen 2002). However, the convolution in Hooke's law impedes numerical simulation of the viscoelastic waves. To overcome this problem, we can apply time derivative to the stress tensor, leading to where p refers to the tensor of memory strain variables, which describe anelastic characteristics of the wavefields and the memory variables satisfy the following differential equation Thus, by introducing the memory strain variables, the damping effects of attenuation on propagating waves in a dissipative medium can be modeled by solving several equations numerically with the superposition of N parallel relaxation mechanisms (Blanch et al. 1995). In viscoelastic forward simulations and adjoint inversion, if different numbers of parallel relaxation mechanisms N are used, additional inversion uncertainties may be introduced. Thus, N = 3 is used for modeling the attenuation effects in all of the numerical experiments presented in this study. Considering isotropy in attenuating media, the relaxation function can be expressed with bulk modulus and shear modulus where ij is the Kronecker delta function, and i, j, k, l take on the values of x, z for 2D media.
In this study, we use a time-domain spectral element (SEM) method to simulate the viscoelastic wave propagation (Komatitsch and Tromp 1999). In SEM, the weak formulation of the momentum equation is solved on the mesh of hexahedral elements, on which the wavefield is discretized using high-degree Lagrange interpolants and integrated using the Gauss-Lobatto-Legendre (GLL) rule. Compared to the other forward modeling methods, SEM combines the flexibility of finite element methods for incorporating irregular topography with the high accuracy of pseudo-spectral methods.

Construct Q Model
Q is commonly used to quantify and describe the damping effects of intrinsic attenuation on propagating seismic waves. Smaller Q values imply stronger attenuation. In frequencydomain, Q is defined by where denotes angular frequency, ℝ and mean real and imaginary parts, respectively, and c indicates the time derivative of c in frequency-domain

3
where F means forward Fourier transform, i denotes the imaginary unit, and p describes the strength of viscoelastic attenuation Thus, reciprocal of Q becomes To fit a given quality factor Q −1 0 over a range of angular frequencies min , max , the parameters of relaxation times p and p can be obtained by minimizing the distance between Q −1 0 and Eq. 17 through a least-squares optimization process (Emmerich and Korn 1987). However, in the classical approach given in Emmerich and Korn (1987), the results of p < p may be obtained, which is not suitable both physically and mathematically (Blanc et al. 2016). Thus, in this study, according to Blanc et al. (2016), the following objective function is introduced where r = 1, ..., R indicates the index of angular frequencies that are distributed linearly on a logarithmic scale of R points (Blanc et al. 2016). In this study, we minimize the objection function J using the nonlinear SolvOpt optimization algorithm with the positivity constraint of p > p > 0 ∀p (p is integer) (Kappel and Kuntsevich 2000;Kuntsevich and Kappel 1997). This approach is more accurate than the classical linear optimization method.

Sensitivity Kernels of Different Methods for Viscoelastic FWI
Sensitivity kernels are essential for effective model updates in viscoelastic FWI. However, as described above, the attenuation effects are modeled by superimposing N standard linear damping mechanisms without explicit description of Q in the rheological bodies with the GSLS model. Instead, a finite and constant Q within seismic frequency band is approximated through an optimization procedure. Thus, it becomes problematic and complex to derive and construct the Q sensitivity kernels for viscoelastic FWI.
Based on the GSLS forward modeling engine in time-domain, different frameworks have been suggested to construct the viscoelastic FWI sensitivity kernels. However, discrepancies exist between these methods, which gives rise to the problem of determining the best approach for sensitivity kernels calculation. In the following sections, we first revisit the previous theories for calculating the viscoelastic sensitivity kernels and then introduce the adjoint-state method for deriving and calculating the sensitivity kernels. Charara et al. (2000) derived the sensitivity kernels for unrelaxed moduli and attenuation parameters with a perturbation approach based on Born approximation. At time t = 0 + , the relaxation function c reduces to the unrelaxed stiffness tensor c U corresponding to the high-frequency limit

Method I
The difference between relaxed and unrelaxed stiffness tensors is given by The stiffness difference c , which is proportional to p , measures the attenuation magnitude and is defined as the attenuation parameter that remains to be estimated in this method. The relaxation function can be expressed in terms of c U and c Within the seismic frequency band, the ratio of c U to c R can be assumed to be constant (Carcione 2004;Charara et al. 2000) Thus, the first-order derivative of relaxation function with respect to time can be expressed as Inserting Eq. 23 into the constitutive relation yields where r p is the tensor of memory strain variables. Note that r p is slightly different from p in Eq. 11, which involves the time derivative of the strain tensor.
Supposing that a viscoelastic anomaly is embedded in an infinite homogeneous background, the model perturbations are Δ , Δc U and Δ c . The wavefield perturbation can be expressed as where Δ and Δr p refer to the perturbed stress tensor and memory strain variables, respectively. Under the Born approximation (or single scattering approximation), the high-order terms in Eq. 26 can be ignored, which gives the following wave equation where ΔM is the equivalent moment tensor source. As can be seen the propagation of the perturbed seismic waves caused by model changes is described by Eq. 27. Interaction of the incident wave with model perturbations on the right side of Eq. 27a serves as the virtual scattering source. Based upon integration by parts and applying far-field approximation, the solution of Eq. 27 can be written using Green's function G with the representation theory (Charara et al. 2000;Pan et al. 2016) The isotropic and viscoelastic media can be described with the model parameters of density , unrelaxed bulk and shear moduli ( U and U ), and the corresponding moduli differences = R − U and = R − U (or attenuation parameters). Taking partial derivatives of the wavefield perturbation with respect to the perturbations of these model parameters, we can obtain the Fréchect derivative wavefields. Then, the sensitivity kernels can be derived as where u † = G ⊗ f † is the adjoint displacement field, ⊗ represents convolution, D = ∇u + (∇u) T ∕2 − (∇ ⋅ u)I∕3 (T means matrix transpose) and D † are the traceless strain deviator and its adjoint, r p and ř p are the trace and deviatoric parts of r p . For sake of compactness, we ignore the integrations over time, volume, sources and receivers in Eq. 29. In this study, we ignore the influence of density for viscoelastic FWI and thus the density kernel K is not presented. We observe that in method I, the unrelaxed moduli sensitivity kernels K U and K U are the same with the moduli sensitivity kernels in purely isotropic and elastic media. The sensitivity kernels K and K for attenuation parameters can be constructed by cross-correlating the forward memory strain variables and adjoint displacement wavefields.

Method II
In this section, we follow the approach given by Tromp et al. (2005) to derive the Q sensitivity kernels for viscoelastic FWI. With the assumption that Q is constant over a wide range of seismic frequencies, in frequency domain, the modulus W is associated the corresponding quality factor Q W by where 0 means reference angular frequency, the symbols ln and sgn indicate natural logarithm and step function, respectively (Kolsky 1952;Futterman 1962;Dahlen and Tromp 1998;Tromp et al. 2005). Variation of the modulus due to the perturbation of Q W can be derived as  Using the chain rule and frequency-domain Born scattering integral, the sensitivity kernels for moduli quality factors ( Q and Q ) can be expressed in terms of the corresponding moduli sensitivity kernels where K iso and K iso are the moduli sensitivity kernels in purely isotropic and elastic media (Tromp et al. 2005). However, a different adjoint source f † Q is needed for calculating the adjoint wavefields in Eq. 32 where f † is the regular adjoint source of envelope-difference objective function (Eq. 5) in frequency-domain (Tromp et al. 2005). The first term in adjoint source f † Q involving ln ∕ 0 controls the physical dispersion of the seismic signals, whereas the second term is simply the Hilbert transform of the regular adjoint source f † measuring dissipation of the seismic amplitudes (Bozdağ et al. 2011). Under this formulation, the moduli sensitivity kernels K and K are the same as those in purely isotropic and elastic media. It means that when simultaneously inverting for moduli and Q parameters, different adjoint sources are needed to construct the adjoint wavefields, which doubles the computational cost for sensitivity kernel calculation. Furthermore, we realize that the Q sensitivity kernels in Eq. 32 and adjoint source f † Q are derived utilizing the attenuation physical mechanism defined by Kolsky (1952) and Futterman (1962), which is different from the GSLS rheology used in the viscoelastic forward modeling. The modeling errors between two different physical models may produce unexpected uncertainties for Q inversion in this method (Keating and Innanen 2019).

Adjoint-State Method
According to Fichtner and van Driel (2014), the relaxation parameters are selected by minimizing the objective function J (Eq. 18) such that p is approximately equal to the inverse of Q. Thus, the relaxation function can be expressed in terms of Q: In this condition, the Q sensitivity kernels in viscoelastic FWI can be directly derived following the standard adjoint-state method by inserting the above constitutive relation into the equation of motion (Eq. 8). The Lagrangian formulation of the envelope-difference objective function can be written as where is Lagrangian multiplier. In isotropic media, the quality factors Q and Q can be incorporated into the constitutive relation with the enforcement of = Q −1 and = Q −1 , yielding Inserting Eq. 36 into Eq. 35 and following the derivation process in adjoint-state method with integration by parts and divergence theorem (Liu and Tromp 2006), the variation of the Lagrangian objective function due to the perturbations of the model parameters , , , Q and Q can be derived as iso where ΔE represents the perturbation of envelope due to model variation, ̃ = ∇ is the Lagrangian strain field, ̂ and ̌ mean the trace and deviatoric parts of ̃ , ̂ p and ̌ p indicate the trace and deviatoric parts of Lagrangian memory strain variables ̃ p : As described in Métivier et al. (2013), the Lagrangian is stationary with respect to ΔE in the absence of the model perturbations. Thus, the adjoint-state equation can be obtained when the coefficient of ΔE becomes zero We define the adjoint displacement field u † as time-reversed Lagrangian field t max − t . Then, the sensitivity kernels for the , , Q and Q models based on the adjoint-state method can be obtained as where † indicate the adjoint strain fields defined as the time-reversed Lagrangian strain fields ∇ t max − t , ̂ † and ̌ † are the trace and deviatoric parts of the adjoint strain fields † , ̂ † and ̌ † are the trace and deviatoric parts of the adjoint memory strain variables † . As can be seen, following the derivation process of adjoint-state (or Lagrangian multiplier) method, the Q sensitivity kernels can be calculated efficiently with the adjoint memory strain variables.

Sensitivity Kernels with Velocity-Q Model Parameterization
In the previous section, the sensitivity kernels of moduli and attenuation parameters are given for different methods. However, in practical FWI experiments, the viscoelastic media is commonly described using velocity-Q model parameterization, which is associated with density , velocity (P-wave velocity V P and S-wave velocity V S ), and quality factor (P-wave quality factor Q P and S-wave quality factor Q S ). In this section, we derive the sensitivity kernels in velocity-Q model parameterization for different methods.
In method I, Charara et al. (2000) derived the sensitivity kernels for , U , U , and . Herein, with the enforcement of = Q −1 and = Q −1 given by Fichtner and van Driel (2014), the attenuation parameters and are associated with Q and Q by Thus, the sensitivity kernels for Q and Q in method I can be obtained as According to Dahlen and Tromp (1998), Q P and Q S are related to Q and Q by where = V P ∕V S and = Q P ∕Q S . Sensitivity kernels for V P , V S , Q P and Q S in velocity-Q model parameterization can be expressed in terms of the sensitivity kernels in modulus-Q parameterizaion (Pan and Wang 2020): The relationships in Eq. 44 work for all of the three methods. Thus, in the inversion process, the velocity and Q models can be updated and reconstructed.

Numerical Experiments
In this section, we first give a synthetic example with separated velocity and Q anomalies to analyze the sensitivity kernels of different methods and evaluate their inversion performances. Next, numerical experiments based on a modified Canadian Foothill model with topographic variations and complex structures are given to invert for near-surface velocity and Q structures using SWs and early arrivals of BWs. The numerical modeling and inversion experiments presented in this section are performed using the FWI code package of SeisElastic2D ) based on SEM forward modeling method in viscoelastic medium. Figure 1 shows true velocity and Q models. A starting homogeneous V P model is designed with a constant value of 5500 m/s. The target V P structure is created by embedding one anomaly with +6% perturbation in the upper parts of the homogeneous background, as shown in Fig. 1a. The true and initial V S models are created from the true and initial V P models with V P ∕V S = 2 . The initial Q P and Q S models are homogeneous with Q P = Q S = 150 . The true Q P and Q S models are created by embedding one attenuating anomaly ( Q P = Q S = 10 ) in the lower parts of the homogeneous background, as shown in Fig. 1c, d. The true and initial models are homogeneous with a constant value of 3000 kg/m 3 . In the inversion experiments, the influence of is ignored and the model is kept unchanged. The model is 1000 m wide and 1000 m deep. A quadrilateral SEM mesh defined using 5 × 5 GLL points with 35 and 35 elements in vertical and horizontal directions is used to discretize the models. A total of 32 sources and 194 receivers are regularly distributed along the left and right sides of the model with Δx s = 60 m and Δx r = 10 m, as indicated by the red stars and white triangles in Fig. 1. A Ricker wavelet with dominant frequency of 30 Hz is used as source for forward simulation. We set all boundaries of the model with absorbing boundary conditions to suppress the artificial reflections.

Sensitivity Kernel Analysis with Separated Velocity and Q Anomalies
In the numerical experiments, to reduce the influences of P-S wave mode coupling, the P-waves are separated to calculate the sensitivity kernels of compressional velocity V P and quality factor Q P . S-waves are separated to calculate the sensitivity kernels of shear velocity V S and quality factor Q S . The sensitivity kernels are first calculated using the three different methods for comparison, as presented in Fig. 2. The velocity sensitivity kernels of different methods appear to be close. Some trade-off artifacts caused by the errors of attenuation models are observable in the lower parts of the velocity sensitivity kernels. However, the Q sensitivity kernels calculated using different methods differ obviously. In Fig. 2g, h, the Q P and Q S sensitivity kernels calculated using method II suffer from strong trade-off artifacts due to the velocity errors in the upper parts, as indicated by the arrows, whereas, in the Q P and Q S sensitivity kernels calculated using method I and the adjointstate method, the trade-off artifacts are very weak, as shown in Fig. 2c, d, k, l. These observations suggest that compared to method II, the Q sensitivity kernels calculated with the memory strain variables in method I and the adjoint-state method can naturally suppress the trade-off artifacts, as the memory strain variables directly describe the anelastic characteristics of the seismic waves. The discrepancies between GSLS and Kolsky-Futterman models for modeling attenuation effects lead to the trade-offs in method II. Synthetic inversion experiments are then performed. To evaluate the influences of velocity errors on Q inversion, we first only update the Q models based on the starting homogeneous velocity structures. The iterative inversion stops when the number of iterations reaches 12 or the relative data misfit decrease is smaller than 0.1%. The Q P and Q S models inverted using these methods are presented in Fig. 3. To test the performances of different methods further, we then carry out inversion experiments by simultaneously updating velocity and Q models, as presented in Fig. 4. The inversion stops when the number of iterations reaches 60 or the relative data misfit decrease is smaller than 0.1% . The well logs of inversion results at horizontal distance of 500 m are extracted for comparison, as plotted in Fig. 5a, b. Reductions of the normalized data misfits in the inversion experiments are plotted in Fig. 6. As can be seen, when using initial velocity models, the Q P and Q S models inverted by method II are contaminated more intensely by the trade-off artifacts of velocity errors, as indicated by the arrows in Fig. 1 a, b are the true V P and V S models; the red stars and white triangles in a, b indicate the schematic distributions of the sources and receivers, respectively. c, d are the true Q P and Q S models Fig. 3b, e. This is caused by the discrepancies between GSLS and Kolsky-Futterman models. Compared to method II, method I and the adjoint-state method allow obtaining better Q P and Q S models with stronger Q anomalies and fewer crosstalk artifacts, as shown in Fig. 3a, d, c, f. The data misfits in method I and the adjoint-state method also reduce faster, as shown in Fig. 6a. When simultaneously updating velocity and Q models, the three methods can resolve the velocity perturbations in the upper parts well. However, strong trade-off artifacts appear in the lower parts of the inverted velocity models caused by inaccurate Q models as indicated by the arrows in Fig. 5c, d. In the Q inversion results, some weak artifacts exist in the upper parts, as indicated by the arrows in Fig. 4g-l, whereas the Q anomalies in the lower parts are clearly discernable. In the simultaneous inversion experiments, data misfits of method II reduce slower than the other methods, as shown in Fig. 6b. Method II can produce comparable quality inversion results at the cost of higher computation requirements.

Canadian Foothill Model Example with Irregular Topography
In this part, we design a synthetic viscoelastic Canadian Foothill model with complex structures and irregular topography, which is closer to practical conditions in the subsurface Pan et al. 2022). The three different methods are applied to computing the viscoelastic sensitivity kernels and inverting for velocity and Q models using Fig. 2 a-d are the velocity and Q sensitivity kernels ( K V P , K V S , K Q P and K Q S ) calculated using method I; e-h are the velocity and Q sensitivity kernels calculated using method II; i-l are the velocity and Q sensitivity kernels calculated using the adjoint-state method both SWs and BWs. In Fig. 7, we present the true V S , initial V S and true Q ( Q P = Q S ) models with topographic variations. The well log data at a horizontal distance of 325 m, as indicated by the white dashed lines in Fig. 7, are extracted from the models for presentation. The topographic surface is depicted by the black solid lines. The white area is the air layer. The initial V S model, as presented in Fig. 7c, is created by smoothing the target V S model using a 2D Gaussian function. The smoothing radii in horizontal and vertical directions are 20 m. The true and starting models for V P are generated directly from the true and starting V S models with the same structure variations and a constant value of V P ∕V S = 3 . In practical conditions of the near-surface, the unconsolidated formations always have low velocity values and strong attenuation effects (small Q). Thus, we create the true Q P and Q S models by embedding strong Q anomalies ( Q P = Q S = 20 ) in the homogeneous background ( Q P = Q S = 150 ), which overlap with the low velocity zones at the near-surface, as shown in Fig. 7e. The density model is set with a constant value of 1800 kg/m 3 and kept unchanged in the inversion experiments. The model size in the horizontal direction is 450 m. The maximum depth from the topographic surface to the model bottom is 150 m. To account for the effects caused by the irregular surface, an unstructured quadrilateral mesh is generated to discretize the model.
We arrange 33 sources and 151 receivers ( Δx s = 12 m and Δx r = 3 m) regularly along the irregular surface. The source time function used in forward modeling is a Ricker wavelet with dominant frequency of 30 Hz. A free-surface boundary condition is applied on the topographic surface and absorbing boundary conditions are applied on the other sides of the model. In Fig. 8a, c, we present the observed shot gathers (black lines) of vertical (z) and horizontal (x) components calculated from the target models in comparison with the synthetic shot gathers (red lines) calculated from the initial models. Figure 8b, d shows the detailed comparison of the traces from 350 to 450 m in the horizontal distance. Figure 8e, f shows the corresponding data residuals of z (red lines) and x (gray lines) components. It can be seen that SWs in the short-offset profiles are Fig. 3 a-c are the Q P models inverted using method I, II and the adjoint-state method with the initial velocity models, respectively; d-f are the Q S models inverted by the three different methods 1 3 much stronger than the BWs. Because the initial Q P and Q S models are homogeneous and do not contain the strong attenuation anomalies at the near-surface, the synthetic traces have larger amplitudes than the observed recordings. In the numerical experiments, we design time windows to separate SWs and BWs, which helps reduce the complex trade-offs between different model parameters. SWs are used to calculate the V S and Q S sensitivity kernels and update the V S and Q S models. BWs are used to calculate the V P and Q P sensitivity kernels and update the V P and Q P models. If the SWs and BWs are not separated, the V P and Q P models will be updated slowly and suffer from trade-offs artifacts seriously.
We first calculate the Q sensitivity kernels with the true velocity models within the frequency band of [3 Hz, 25 Hz] using different methods for comparison, as shown in Fig. 9. The Q sensitivity kernels constructed with different methods are very close with only small Fig. 4 a-c are the V P models inverted using method I, II and the adjoint-state method, respectively; d-f are the V S models inverted using method I, II and the adjoint-state method; g-i are the Q P models inverted using method I, II and the adjoint-state method; j-l are the Q S models inverted using method I, II and the adjoint-state method Fig. 5 a, b are the comparison of well logs extracted from the inverted Q P and Q S models with the initial velocity models; c, d are the comparison of well logs extracted from the inverted V P and V S models in the simultaneous inversion experiments; e, f are the comparison of well logs extracted from the inverted Q P and Q S models in the simultaneous inversion experiments. The black and gray lines indicate initial and true models. The black dashed, black dash-dotted and gray dashed lines indicate method I, II and the adjointstate method, respectively Fig. 6 a shows the normalized data misfits reductions for Q inversion with the initial velocity models using method I, II and the adjoint-state method; b shows the normalized data misfits reductions when simultaneously inverting for velocity and Q models. The black dashed, black dash-dotted and gray dashed lines indicate method I, II and the adjoint-state method, respectively differences as indicated by the arrows in Fig. 9 and the attenuation anomalies at the nearsurface can be imaged clearly without obvious artifacts. Then, we calculate the sensitivity kernels using the initial models obtained by smoothing the true models. Figure 10 presents the V S and Q S sensitivity kernels obtained by different methods using SWs when the initial velocity models are used (Pan et al. 2022). Figure 11 presents the V P and Q P sensitivity kernels calculated using BWs with by different methods. The velocity sensitivity kernels constructed with different frameworks do not exhibit obvious differences. However, the Q sensitivity kernels appear to contain more trade-off artifacts caused by velocity errors compared to those (Fig. 9) obtained with true velocity models. Furthermore, it is observed that the Q S and Q P sensitivity kernels calculated by method II suffer from stronger trade-off artifacts than those calculated by the other methods, as indicated by the arrows in Fig. 10d. The imaged Q anomaly structures appear distorted.
The Q S and Q P inversion results obtained using different methods with the true and initial velocity models are given in Figs. 12 and 13 for presentation, respectively. Stopping criteria of the inversion is that when the number of iterations reaches 12 or the relative data misfit decrease is smaller than 0.1% . Figure 14 shows the comparisons of well log data at horizontal distance of 325 m extracted from the Q S and Q P models inversion results. Figure 15 presents the reduction histories of data misfits and model errors after normalization in the Q inversion experiments with true and initial velocity models. It is observed that when the velocity models are correct, the three different methods can  Fig. 10 in  resolve the Q structures at the near-surface reliably and clearly. Due to illumination limitation of the early arrivals of BWs, some artifacts appear in deeper parts of the inverted Q P models, as indicated by the arrow in Fig. 14b. Even though, the inverted Q P model by method II appears to suffer from fewer artifacts, as indicated by the arrows in Fig. 12, these methods can provide inversion results of comparable quality, as indicated by the data misfits and model errors in Fig. 15a, c. However, in the presence of velocity errors, the inverted Q models are significantly damaged by the trade-off artifacts in the deeper parts, as indicated by the arrows in Fig. 14d. Furthermore, method II suffers from the trade-off errors more seriously compared to the other methods, as indicated by the data misfits and model errors in Fig. 15b, d. This may result from the fact that different physical models are used in forward modeling and Q sensitivity kernel calculation in method II. The Q sensitivity kernels calculated directly using the memory strain variables in method I and the adjoint-state method are more accurate and less prone to the trade-offs.
Finally, we perform inversion experiments for simultaneously determining the velocity and Q models by different methods. The inversion loops were ended when the number Fig. 8 a, c show the comparison of observed data (black lines) and synthetic data (blue lines) calculated from the initial models of z and x components, respectively; b, d show the detailed comparisons of the data from 400 to 450 m in horizontal direction; e shows the corresponding data residuals of z (red lines) and x (gray lines) components; f shows the detailed data residuals of iterations reaches 60 or the relative data misfit decrease is smaller than 0.1% . In stage I, SWs are extracted from the shot gathers to update V S and Q S models simultaneously. The inversion outputs are presented in Fig. 16. In stage II, BWs are extracted to update Fig. 9 a, b are the Q S and Q P sensitivity kernels ( K Q S and K Q P ) calculated using SWs and BWs with true velocity models by method I; c, d are the Q S and Q P sensitivity kernels calculated by method II; e, f are the Q S and Q P sensitivity kernels calculated by the adjoint-state method Fig. 10 a, b are the V S and Q S sensitivity kernels ( K V S and K Q S ) calculated using SWs with initial wrong velocity models by method I; c, d are the V S and Q S sensitivity kernels calculated by method II; e, f are the V S and Q S sensitivity kernels calculated by the adjoint-state method V P and Q P models simultaneously. The inversion outputs are plotted in Fig. 17. Well logs are extracted from the inversion results for comparison, as shown in Fig. 18. Figures 19  and 20 show the reduction histories of data misfits and model errors with normalization Fig. 11 a, b are the V P and Q P sensitivity kernels ( K V P and K Q P ) calculated using BWs with initial wrong velocity models by method I; c, d are the V P and Q P sensitivity kernels calculated by method II; e, f are the V P and Q P sensitivity kernels calculated by the adjoint-state method Fig. 12 a, b are the inverted Q S and Q P models with the true velocity models by method I using SWs and BWs, respectively; c, d are the inverted Q S and Q P models by method II; e, f are the inverted Q S and Q P models by the adjoint-state method by inversion of SWs and BWs in stage I and II. Method II suffers from slower convergence rate because of the trade-off artifacts in the Q sensitivity kernels. As optimization progresses, the velocity models are iteratively improved and the trade-off artifacts in the Q models inverted are gradually suppressed. Even though, some small differences exist between the inverted models by different methods, as indicated by the arrows in Figs. 16 Fig. 13 a, b are the inverted Q S and Q P models with the initial wrong velocity models by method I using SWs and BWs, respectively; c, d are the inverted Q S and Q P models by method II; e, f are the inverted Q S and Q P models by the adjoint-state method Fig. 14 a, b show the comparisons of well logs extracted from the inverted Q S and Q P models (Fig. 12) with the true velocity models at horizontal distance of 325 m by different methods; c, d show the comparisons of well logs extracted from the inverted Q S and Q P models (Fig. 13) with the initial wrong velocity models by different methods. The black and gray solid lines indicate the true and initial models. The black dashed, black dash-dotted and gray dashed lines indicate the inverted models obtained by method I, II and the adjoint-state method, respectively and 17, the final inversion results of method II are comparable to those obtained by method I and the adjoint-state method, as indicated by the model errors in Fig. 20a, b. However, because the adjoint wavefields in velocity and Q sensitivity kernels need to be constructed using different adjoint sources, method II is almost two times more expensive than the other two methods. Finally, the modeled data generated using the adjoint-state method inversion results is compared with the observations. It can be seen that amplitudes and travel time of the synthetics are consistent with the observations (Fig. 21).

Discussion
In the first synthetic inversion example, an cross-well acquisition survey (as shown in Fig. 1) is designed for investigating the coupling effects between velocity and Q models. Ray coverage and source-receiver illumination for velocity and Q anomalies are the same. Transmission waves are mainly used for inversion. If a surface acquisition Fig. 15 a shows the reductions of normalized data misfits for Q S and Q P inversion using SWs and BWs with the true velocity models; b shows the reductions of normalized data misfits for Q S and Q P inversion using SWs and BWs with the initial velocity models; c, d show the corresponding reductions of normalized model errors for inversion of Q S and Q P models using SWs and BWs with true and initial velocity models, respectively. The black solid, black dashed and black dash-dotted lines indicate method I, II and the adjointstate method for Q S inversion with SWs. The gray solid, gray dashed and gray dash-dotted lines indicate method I, II and the adjoint-state method for Q P inversion with BWs survey, in which sources and receivers are distributed on top surface of the model, is used for numerical experiments, the Q structures in deeper parts will be more difficult to be recovered. First, because offset of the surface acquisition survey is small, reflected waves are needed to invert for Q models in deep regions suffering from higher nonlinearity. The velocity structures at shallow parts are easier to be estimated benefiting from better ray coverage and source-receiver illumination. Furthermore, according to our experience and previous studies (Zhu 2014), velocity contrasts produce stronger influences on reflected seismic waves than Q contrasts. Compared to reflections, the transmission waves traveling through the attenuating zones are more sensitive to Q perturbations. This means that it is more difficult to decouple the velocity and Q models using reflected waves in the surface acquisition survey. In the second synthetic example for imaging the near-surface velocity and Q structures, the SWs and BWs of early arrivals Fig. 16 a, b show the simultaneously inverted V S and Q S models using SWs in stage I by method I; c, d are the inverted V S and Q S models in stage I by method II; e, f are the inverted V S and Q S models in stage I by the adjoint-state method Fig. 17 a, b are the simultaneously inverted V P and Q S models using BWs in stage II by method I; c, d are the inverted V P and Q P models in stage II by method II; e, f are the inverted V P and Q P models in stage II by the adjoint-state method can be considered as transmission waves. For further studies, it is necessary to provide more numerical experiments and detailed analysis about the influences of reflection and transmission waves in different acquisition settings (e.g., cross-well and vertical seismic profile acquisitions) for decoupling velocity and Q models.
Different physical mechanisms have been suggested to model the attenuation effects on propagating waves in viscoelastic media (Kolsky 1952;Futterman 1962;Carcione et al. 1988;Robertsson et al. 1994;Wang et al. 2019). Ursin and Toverud (2002) gave a comparison of different seismic dispersion and attenuation models and concluded that the standard linear solid (SLS) model shows behavior different to that of the Kolsky-Futterman model. In method II Fig. 18 a, b show comparisons of well logs extracted from the inverted V S and Q S models by inversion of SWs using different methods; c, d show comparisons of well logs extracted from the inverted V P and Q P models by inversion of BWs using different methods. The black and gray solid lines indicate the initial and true models. The black dashed, black dash-dotted and gray dashed lines indicate method I, II and the adjoint-state method, respectively Fig. 19 Reductions of normalized data misfits for simultaneous inversion of velocity and Q models using SWs and BWs by different methods. The black solid, black dashed and black dashdotted lines indicate method I, II and the adjoint-state method for inversion of SWs, respectively. The gray solid, gray dashed and gray dash-dotted lines indicate method I, II and the adjoint-state method for inversion of BWs, respectively described in this study, the GSLS and Kolsky-Futterman models are used for forward simulations and calculating the Q sensitivity kernels, respectively. The discrepancies between GSLS model with N = 3 and the Kolsky-Futterman model increase the inversion trade-offs in method II. However, in method I and the adjoint-state method, the Q sensitivity kernels are calculated by involving memory strain variables, which characterize and capture the attenuation effects more effectively based on the GSLS model. Keating and Innanen (2019) investigated the modeling errors between SLS and Kolsky-Futterman models for viscoacoustic FWI in frequency-domain and showed that appropriate frequency band helps reduce the discrepancies between SLS and Kolsky-Futterman models. However, few researchers compare the behavior of GSLS model with different numbers of parallel relaxation mechanisms with that of the Kolsky-Futterman model. Hence, the appropriate frequency band and number of parallel relaxation mechanisms for reducing the modeling errors between GSLS and Kolsky-Futterman models remain to be determined in our future studies.

Conclusions
In this study, we revisit different methods for constructing the sensitivity kernels for time-domain viscoelastic FWI on the basis of GSLS rheology. In method I and the adjoint-state method, the Q sensitivity kernels are constructed with memory strain Fig. 20 a shows the reductions of normalized model errors for simultaneous inversion of V S and Q S models using SWs; The black solid, black dashed and black dash-dotted lines indicate method I, II and the adjointstate method for V S inversion. The gray solid, gray dashed and gray dash-dotted lines indicate method I, II and the adjoint-state method for Q S inversion. b shows the reductions of normalized model errors for simultaneous inversion of V P and Q P models using BWs. The black solid, black dashed and black dash-dotted lines indicate method I, II and the adjoint-state method for V P inversion. The gray solid, gray dashed and gray dash-dotted lines indicate method I, II and the adjoint-state method for Q P inversion variables, whereas, in method II, an additional adjoint source is introduced based on the Kolsky-Futterman model and Born scattering integral. Synthetic inversion experiments with both cross-well and surface acquisition geometries are provided. Different methods are used to calculate the sensitivity kernels and invert for the models for comparison using transmission waves including SWs and BWs of early arrivals. The observations suggest that the Q sensitivity kernels constructed with memory strain variables are more accurate, suffering from fewer trade-off artifacts. When simultaneously inverting for the velocity and Q models, these different methods can provide inversion results of comparable quality. However, method II converges more slowly and is more expensive than the other methods.
Author Contributions JS and HC contributed to conceptualization; WP and JS provided methodology; JS and WP carried out numerical experiments; ZY and HL performed data analysis and processing; JS and WP performed writing-original draft preparation; HC performed writing-review and editing; JS and WP contributed equally overall; WP, HC, and JS performed acquired funding; ML and XH provided coding.

Declarations
Financial Interest All authors certify that they have no affiliations with involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.