1 Introduction

The modern theory of singular limits of nonlinear evolutionary PDEs was first developed for the case when uniform Sobolev-space bounds can be obtained for both solutions and their first time derivative [14, 15]. Although the requirement that the first time derivative be uniformly bounded was later eliminated [7, 22], the many results that followed ([1,2,3, 6, 8, 11, 17, 20, 23] and references therein) still require that the solution and some number of its spatial derivatives be uniformly bounded. Almost all of those singular limit results involve systems in which the large terms have constant coefficients, but even the few results for systems with variable-coefficient large terms [4, 5, 9, 13, 24] concern cases for which uniform spatial estimates can be proven.

However, the derivatives of solutions to many systems of nonlinear evolutionary PDEs containing a parameter do not remain uniformly bounded as that parameter tends to its limit. As will be discussed in Sect. 6.1, in some cases this lack of uniform bounds causes the time of existence of the solutions to tend to zero, but in other cases the solution nevertheless exists for a uniform time. One case of particular interest is when the solution has a specific structure

$$\begin{aligned} u=U(t,\textbf{x},\tfrac{\mathbf {\phi }(t,\textbf{x})}{\varepsilon })+o(1) \end{aligned}$$
(1.1)

that explains the persistence of the solution despite the nonuniformity of the norms of its spatial derivatives. Solutions having such structures are common in the theory of geometric optics [10, 12, 13, 19], in which the initial data contains rapidly-varying terms of the form \(u_0(\textbf{x},\frac{\mathbf {\phi }_0(\textbf{x})}{\varepsilon })\).

In this paper we analyze equations having solutions of the form (1.1) from the point of view of singular limits, in which some terms in the PDE are of size \(O(\frac{1}{\varepsilon })\), rather than of geometric optics, in which the initial data contains bounded terms whose first derivatives are of size \(O(\frac{1}{\varepsilon })\). Although geometric optics problems can be transformed into singular limit equations by adding the variables \(\mathbf {\theta }\mathrel {:=}\tfrac{\mathbf {\phi }(t,\textbf{x})}{\varepsilon }\) [22, Sect. 5], [13, Sect. 5.1], and singular limit equations with large term \(\frac{1}{\varepsilon }C\partial _y\) can be formally transformed into geometric optics problems by a change of variables

$$\begin{aligned} y\mapsto {\widehat{y}}=\varepsilon y, \end{aligned}$$
(1.2)

these operations are not inverses of each other, so each viewpoint yields a different perspective. In particular, the fact that singular limit equations do not come equipped with phase functions or even their initial values adds to the challenge and interest. A more detailed analysis of the relation between singular limit equations and geometric optics will be presented in Sect. 6.3.

1.1 Equations

In this paper we construct appropriate phase functions for singular limit equations and systems, and use them to establish the existence and regularity of solutions for a time independent of the small parameter \(\varepsilon \) in the PDE and to determine the asymptotic form of solutions as \(\varepsilon \rightarrow 0\). Since the case of constant-coefficient large operators is mostly covered by the classical theory of singular limits mentioned above, the singular limit equations studied here will contain variable-coefficient large terms.

The phenomenon we study can be seen most simply in a PDE like \(u_t+u_x+\tfrac{\cos x}{\varepsilon }u_y=0\), whose solution having initial data \(u_0\) is \(u(t,x,y)=u_0(x-t,y-\tfrac{\sin (x)-\sin (x-t)}{\varepsilon })\). The form of that solution suggests trying the ansatz

$$\begin{aligned} u(t,x,y)\mathrel {:=}U(t,x,y-\tfrac{\mu (t,x)}{\varepsilon }) \end{aligned}$$
(1.3)

for the more general equation

$$\begin{aligned} a(t,x)u_t+b(t,x)u_x+\tfrac{c(t,x)}{\varepsilon }u_y+d(t,x,u)u_y+f(t,x,u)=0. \end{aligned}$$
(1.4)

Substituting (1.3) into (1.4) and defining

$$\begin{aligned} z\mathrel {:=}y-\tfrac{\mu (t,x)}{\varepsilon }\end{aligned}$$
(1.5)

yields

$$\begin{aligned} \tfrac{ c-a\mu _t-b\mu _x }{\varepsilon }U_z+ aU_t+bU_x+ dU_z+f=0. \end{aligned}$$
(1.6)

Hence if we let the “fast phase function” \(\mu \) satisfy the equation

$$\begin{aligned} a(t,x)\mu _t+b(t,x)\mu _x=c(t,x) \end{aligned}$$
(1.7)

obtained from the terms of order \(\tfrac{1}{\varepsilon }\) in (1.6) then the “profile” U should satisfy

$$\begin{aligned} a(t,x)U_t+b(t,x)U_x+ d(t,x,U)U_z+f(t,x,U)=0. \end{aligned}$$
(1.8)

Since the PDEs (1.7) and (1.8) are independent of the small parameter \(\varepsilon \), both \(\mu \) and U will exist and be bounded for a time independent of \(\varepsilon \), under suitable assumptions on the coefficients and initial data. In particular, we assume both here and later that

$$\begin{aligned} a\ge a_{\min }>0. \end{aligned}$$
(1.9)

Hence (1.3) yields the exact solution of the initial-value problem for the model Eq. (1.4), and in particular specifies precisely the dependence of the solution on \(\varepsilon \).
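As a concrete check of the preceding computation, the following short sympy script (an illustration of ours, not used in the proofs) verifies that the stated solution of the model equation \(u_t+u_x+\tfrac{\cos x}{\varepsilon }u_y=0\) is exact, and that the fast phase \(\mu =\sin (x)-\sin (x-t)\) satisfies (1.7) with \(a=b=1\) and \(c=\cos x\), which is exactly the condition that removes the \(O(\tfrac{1}{\varepsilon })\) term in (1.6).

```python
# Sketch (ours): verify the exact solution of the model equation and its fast-phase
# equation (1.7); u0 is an arbitrary smooth function of two variables.
import sympy as sp

t, x, y, eps = sp.symbols('t x y epsilon', positive=True)
u0 = sp.Function('u0')

# explicit solution u(t,x,y) = u0(x - t, y - (sin x - sin(x-t))/eps)
u = u0(x - t, y - (sp.sin(x) - sp.sin(x - t)) / eps)
residual = sp.diff(u, t) + sp.diff(u, x) + sp.cos(x) / eps * sp.diff(u, y)
print(sp.simplify(sp.expand(residual)))      # -> 0, so the solution is exact

# mu = sin(x) - sin(x-t) solves (1.7) for this model: mu_t + mu_x = cos x, mu(0,x) = 0
mu = sp.sin(x) - sp.sin(x - t)
print(sp.simplify(sp.diff(mu, t) + sp.diff(mu, x) - sp.cos(x)), mu.subs(t, 0))   # -> 0 0
```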

In this paper we consider several generalizations of (1.4) for which the ansatz (1.3), or a generalization of it, describes the leading-order behavior of solutions, although it no longer yields exact solutions. Specifically, under appropriate conditions that vary in their generality we will prove uniform existence, i.e., existence for at least a time independent of \(\varepsilon \), and describe the asymptotics of the solutions for certain equations of the following forms:

Scalar hyperbolic equation with coefficients depending on y:
$$\begin{aligned} &a(t,x,y,\varepsilon u,\varepsilon )u_t+b(t,x,y,\varepsilon u,\varepsilon )u_x+\tfrac{c(t,x,y)}{\varepsilon }u_y \\ &\quad +d(t,x,y,u,\varepsilon ) u_y+f(t,x,y,u,\varepsilon )=0 \end{aligned}$$
(1.10)
Scalar nonuniformly parabolic equation:
$$\begin{aligned} &a(t,x,\varepsilon u,\varepsilon ) u_t +b(t,x,\varepsilon u,\varepsilon ) u_x+ \tfrac{c(t,x) }{\varepsilon }u_y+d(t,x,u,\varepsilon ) u_y +f(t,x,u,\varepsilon ) \\ &\quad = \varepsilon ^2 g(t,x,u,\varepsilon ) u_{xx}+\varepsilon h(t,x,u,\varepsilon )u_{xy} + k(t,x,u,\varepsilon )u_{yy} \end{aligned}$$
(1.11)
Symmetric hyperbolic system:
$$\begin{aligned} &A(t,x,\varepsilon u,\varepsilon )u_t+B(t,x,\varepsilon u,\varepsilon )u_x+\tfrac{1}{\varepsilon }C(t,x) u_y \\ &\quad +D(t,x,u,\varepsilon )u_y+f(t,x,u,\varepsilon )=0 \end{aligned}$$
(1.12)
\(2\times 2\) symmetric hyperbolic system with coefficients depending on y:
$$\begin{aligned} &A(t,x,y,\varepsilon u,\varepsilon ) u_t+B(t,x,y,\varepsilon u,\varepsilon )u_x+\tfrac{1}{\varepsilon }C(t,x,y)u_y \\ &\quad +D(t,x,y,u,\varepsilon )u_y+f(t,x,y,u,\varepsilon )=0 \end{aligned}$$
(1.13)

The conditions under which uniform existence holds and asymptotics can be determined will be presented in detail before and in the theorems about each of those equations. In particular, we do not claim that all solutions of those equations having sufficiently smooth and bounded initial data exist for a time independent of \(\varepsilon \); to the contrary, we will present in Example 6.3 an equation of the form (1.10) and smooth bounded initial data for which uniform existence does not hold. Furthermore, our results for the system (1.12) require severe restrictions when that system contains three or more components, and our results for (1.13) require severe restrictions even though that system is assumed to be \(2\times 2\).

In all these equations, having a term \( D u_y\) is slightly more general than including dependence on \(\varepsilon u\) and \(\varepsilon \) in the term \(\frac{1}{\varepsilon }Cu_y\), because the O(1) part of \(\frac{1}{\varepsilon }C(\ldots ,\varepsilon u,\varepsilon )u_y\) would necessarily be affine-linear in u. The similarly slightly more general terms \(\varepsilon A_1(\ldots ,u,\varepsilon ) u_t+\varepsilon B_1(\ldots ,u,\varepsilon )u_x\) could be included without difficulty; such terms have been omitted for notational simplicity. As will be seen in the limit equations below, the \(O(\varepsilon )\) parts of A and B appear in the limit just like D, which is effectively the \(O(\varepsilon )\) part of C. This is a general phenomenon for fast singular limits, e.g. [22, (2.18)].
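The affine-linearity claim can be seen from a one-line Taylor expansion; the following sympy snippet (ours, with a sample polynomial C and the variables (t, x) frozen) makes it explicit.

```python
# Sketch (ours): the O(1) part of (1/eps) * C(t,x,eps*u,eps) is affine-linear in u,
# so a separate coefficient D(t,x,u,eps) is genuinely more general.
import sympy as sp

u, eps, v, w = sp.symbols('u epsilon v w')
c0, c1, c2, c3 = sp.symbols('c0 c1 c2 c3')        # stand-ins for coefficients depending on (t,x)
C = c0 + c1 * v + c2 * w + c3 * v**2              # a sample smooth C(t,x,v,w)
print(sp.expand(C.subs({v: eps * u, w: eps}) / eps))
# -> c0/epsilon + c1*u + c2 + c3*epsilon*u**2 : the O(1) part c1*u + c2 is affine in u
```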

Although (1.12) may appear to be the most difficult equation to treat since it involves systems of arbitrary size, it is in fact the only one of the four for which our results can almost be obtained from geometric optics results, specifically [13]. Even for that system our results are slightly more general in that the matrix A multiplying the time derivatives is not restricted to be the identity matrix, and we point out how the results there apply to singular limit systems beyond those obtained directly from geometric optics as considered in [13]. However, our largest contribution concerning (1.12) is a proof that is quite different, in the spirit of singular limits, and simpler than the proof for general geometric optics problems in [13]. Its key points will be pointed out during the course of the proof, in Sect. 4. A comparison of our result for (1.12) with corresponding results for geometric optics problems obtained from (1.12) via the transformation (1.2), and of the phase functions occurring in each version, will be presented in Sect. 6.3.

Geometric optics results do not apply to the parabolic Eq. (1.11) on account of the presence of the second-order terms. In fact, even the existence of solutions to that equation for a time that might depend on \(\varepsilon \) is not obvious because the second-order terms are nonlinear and non-uniformly parabolic. Nevertheless, thanks to the scaling of the parabolic terms by powers of \(\varepsilon \), the large term can be eliminated from (1.11) using the transformation (1.3) with the same fast phase function (1.7) as for (1.4). Uniform existence will then be obtained for the transformed system by applying the recent results of [25].

A characteristic feature of the singular limit systems in [13, §4] arising from geometric optics is that the coefficients appearing in the equations do not depend on the variables with respect to which derivatives are taken in the large terms. Both (1.11) and (1.12) also share this feature since the large terms involve derivatives only with respect to y and none of the coefficients depend on that variable. However, in (1.10) and (1.13) the coefficients do depend on y, which introduces new and interesting complications. First, it is no longer possible to eliminate the large term in the PDE by an appropriate choice of the fast phase function \(\mu \), because \(\mu \) must still be independent of y to avoid introducing a term of size \(\frac{1}{\varepsilon ^2}\) while the coefficients now do depend on y. The way to resolve this difficulty is to generalize the ansatz (1.3) by assuming that the solution depends on y only through the combination \(Y(t,x,y)-\frac{\mu (t,x)}{\varepsilon }\), for some appropriate function Y. The function Y is now chosen so as to eliminate the large term, while \(\mu \) is determined by the requirement that Y satisfy a functional equation ensuring that periodicity with respect to y translates into periodicity with respect to Y. That requirement introduces an averaging operator into the equation determining \(\mu \), such that when the coefficients do not actually depend on y then \(\mu \) reduces to the function satisfying (1.7) and Y(txy) reduces to y. A second complication that arises from the dependence of the coefficients of (1.10) on y is that if the only phase included in the ansatz is \(Y(t,x,y)-\frac{\mu (t,x)}{\varepsilon }\) then the coefficients of the transformed equations would depend on \(\frac{\mu (t,x)}{\varepsilon }\) and hence would have derivatives of order \(\frac{1}{\varepsilon }\). We overcome this problem by adding \(\frac{\mu }{\varepsilon }\) as an additional phase, which paradoxically introduces a new large term. However, that new large term can be rendered harmless by changing the time variable in a manner reminiscent of the change of time variable for the system (1.12), and the resulting limit equation does not retain any dependence on the extra phase.

For the Eq. (1.10) it is possible to obtain uniform existence but not the asymptotic behavior by combining \(\varepsilon \)-weighted estimates for derivatives obtained by the method of characteristics (cf. [12, Proposition 6.1.1]) with the technique of solving the PDE for the derivative appearing in the large term (cf. [21]). Details are presented in Sect. 6.4 for completeness. That approach only works for scalar hyperbolic equations because for parabolic equations solving the PDE for a first-order large term would lose a derivative, while for systems \(\varepsilon \)-weighted derivative estimates do not suffice because uniform estimates for derivatives are needed in order to obtain a uniform \(L^\infty \) bound. Hence, as for the parabolic Eq. (1.11), even the uniform existence part of our results for certain systems of the form (1.13) is new.

In all these equations, the variable x may be replaced by several variables \(x_i\), with only straightforward modifications to the results and proofs, possibly including an increase in the required smoothness of the coefficients and initial data. However, increasing the number of y variables would be more difficult except possibly in certain cases for which the approach here could be combined with methods of geometric optics.

When the system (1.12) has three or more components but the sufficient conditions given here do not hold then uniform existence is likely to fail. However, when the conditions for the system (1.13) do not hold it is plausible that uniform existence may still hold in some cases but with the asymptotics of the solutions being more complicated. We hope to consider such cases and other equations requiring a more general ansatz for their asymptotic structure in future work.

Although this paper is primarily concerned with theoretical aspects of the equations studied, it may be noted that (1.10) is a transport equation and example (6.19) of a system of the form (1.12) is the well-known system form of a wave equation, i.e., the component u of that system satisfies \(u_{tt}-b(x)^2u_{xx}-\frac{c(x)^2}{\varepsilon ^2} u_{yy}=0\). Since both transport equations and wave equations occur in a variety of physical contexts our results may also have physical relevance.

1.2 Theorems

In the following theorems, C denotes a constant independent of \(\varepsilon \) that may be different in each appearance, and \(C^s_B\) denotes the space of functions having s continuous derivatives that are bounded uniformly in the independent variables, on any compact set of the dependent variables.

Before the statement of each theorem we present definitions and notations that are used in that theorem, and possibly also in subsequent theorems. In some cases the definitions involve key ideas of the proofs, which are presented here despite their length because the definitions are needed in order to properly state the theorems. Some remarks explaining the assumptions and conclusions are also presented.

As hinted in Sect. 1.1, the asymptotic form of solutions to the scalar hyperbolic Eq. (1.10) will be \(u\sim U^0(\mu (t,x),x,Y(t,x,y)-\frac{\mu (t,x)}{\varepsilon })\). In order to state the theorem for the solutions to that PDE we must first define the equations that determine the fast part \(\mu \) and the slow part Y of the phase function, and define the coefficients that will appear in the PDE for the asymptotic profile U. We begin by defining averaging and limit operations.

For any function g that is periodic with period P in a variable w and may also depend on other variables, define

$$\begin{aligned} \left\langle {g}\right\rangle _{w}\mathrel {:=}\tfrac{1}{P}\int _0^P g\,dw. \end{aligned}$$
(1.14)

In addition, for any function g depending continuously on \(\varepsilon \), define

$$\begin{aligned} {g}_{(0)}=\lim _{\varepsilon \rightarrow 0} g. \end{aligned}$$
(1.15)

Assume that

$$\begin{aligned} \text {all the coefficients in (1.10) are periodic in } y \text { with the same period,} \end{aligned}$$
(1.16)

that (1.9) holds, and that

$$\begin{aligned} |c(t,x,y,v)|\ge c_{\min }>0. \end{aligned}$$
(1.17)

Since the change of variables \(y\mapsto -y\) replaces \(\partial _y\) with \(-\partial _y\), which in essence replaces c with \(-c\), by making that change of variables if necessary we can normalize c to satisfy

$$\begin{aligned} c(t,x,y,v)\ge c_{\min }>0. \end{aligned}$$
(1.18)

Let \(\mu (t,x)\) be the solution of the initial-value problem

$$\begin{aligned} \left\langle {\frac{{a}_{(0)}}{{{{c}_{(0)}}}}}\right\rangle _{\!\!y}\mu _t+\left\langle {\frac{{b}_{(0)}}{{{{c}_{(0)}}}}}\right\rangle _{\!\!y}\mu _x=1, \qquad \mu (0,x)\equiv 0,\end{aligned}$$
(1.19)

and define \(Y(t,x,y)\) by

$$\begin{aligned} Y_y\mathrel {:=}\frac{{a}_{(0)}}{{{{c}_{(0)}}}} \,\mu _t+ \frac{{b}_{(0)}}{{{{c}_{(0)}}}}\,\mu _x, \qquad Y(t,x,0)=0.\end{aligned}$$
(1.20)

Here and later the fast phase vanishes at time zero because fast oscillations are not present in the assumed form of the initial data. Equation (1.20) links the slow part \(Y(t,x,y)\) and the fast part \(\mu (t,x)\) of the phase function \(\varepsilon Y-\mu \). Hence it is not possible to first make a change of independent variables \(y\mapsto Y(t,x,y)\) to reduce to the case when the slow part of the phase is simply the independent variable as in (1.5), and afterwards look for the appropriate fast phase.

Let \(y(t,x,Y)\) denote the inverse of Y considered as a function of y, i.e., the function such that \(y(t,x,Y(t,x,y))\equiv y\), and let \(t(\tau ,x)\) denote the inverse with respect to t of the function

$$\begin{aligned} \tau \mathrel {:=}\mu (t,x), \end{aligned}$$
(1.21)

both of which will be shown to exist at least for small times. Then for any function g of y and other variables define

$$\begin{aligned} {\widehat{g}}(\tau ,x,Y,v)\mathrel {:=}g(t(\tau ,x),x,y(t(\tau ,x),x,Y),v). \end{aligned}$$
(1.22)

It will be shown in Lemma 2.1 that if g is periodic in y with the same period as the coefficients of (1.10) then \({\widehat{g}}\) is periodic in Y with the same period, so that \(\left\langle {{\widehat{g}}}\right\rangle _{Y}\) is well defined. It is not necessary to determine \(y(t,x,Y)\) in order to calculate \(\left\langle {{\widehat{g}}}\right\rangle _{Y}\), since the change of variables \(y=y(t,x,Y)\) transforms the integral \(\tfrac{1}{P}\int _0^P g(t,x,y(t,x,Y))\,dY\) into \(\tfrac{1}{P}\int _0^P g(t,x,y) \left[ \frac{{a}_{(0)}}{{{{c}_{(0)}}}} \,\mu _t+ \frac{{b}_{(0)}}{{{{c}_{(0)}}}}\,\mu _x\right] \,dy\).
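The following numerical sketch (ours; the particular weight and integrand are arbitrary samples) illustrates that remark: averaging \({\widehat{g}}\) over Y after numerically inverting Y gives the same value as the weighted integral in y, which needs no inversion.

```python
# Sketch (ours): compute <g_hat>_Y two ways at a frozen (t,x):
#   (i) build Y(y) from its derivative Y_y, invert it numerically, and average over Y;
#  (ii) use the change of variables, i.e. the weighted integral (1/P) int_0^P g(y) Y_y(y) dy.
import numpy as np

def trap(f, xx):                                   # simple trapezoid rule
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(xx))

P = 2 * np.pi
y = np.linspace(0.0, P, 4001)
Y_y = 1.0 + 0.3 * np.cos(y)                        # sample weight from (1.20); mean 1 by (1.19)
g = 2.0 + np.sin(y) + 0.5 * np.cos(2 * y)          # sample periodic coefficient

# (i) direct average over Y
Y = np.concatenate(([0.0], np.cumsum(0.5 * (Y_y[1:] + Y_y[:-1]) * np.diff(y))))
Ygrid = np.linspace(0.0, Y[-1], 4001)
y_of_Y = np.interp(Ygrid, Y, y)                    # numerical inverse y(t,x,Y)
avg_direct = trap(np.interp(y_of_Y, y, g), Ygrid) / Y[-1]

# (ii) weighted integral in y, no inversion needed
avg_weighted = trap(g * Y_y, y) / P

print(avg_direct, avg_weighted)                    # both are approximately 2.0
```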

In addition, define

$$\begin{aligned} A(t,x,y,v,\varepsilon )\mathrel {:=}a(t,x,y,v,\varepsilon )\mu _t(t,x)+b(t,x,y,v,\varepsilon )\mu _x(t,x) \end{aligned}$$
(1.23)

and

$$\begin{aligned} D(t,x,y,u,\varepsilon )&\mathrel {:=}Y_y(t,x,y)\, d(t,x,y,u,\varepsilon )+a(t,x,\varepsilon u,\varepsilon )Y_t(t,x,y) \\ &\quad +b(t,x,\varepsilon u,\varepsilon )Y_x(t,x,y)- \frac{(a-{a}_{(0)})\mu _t+(b-{b}_{(0)})\mu _x}{\varepsilon }. \end{aligned}$$
(1.24)

Finally, the norms that will appear in the theorems are

$$\begin{aligned} \Vert u\Vert _{X^s_{\varepsilon ,T}}\mathrel {:=}\sup _{0\le t\le T}\, \sum _{0\le |\alpha |\le s} \left\| (\varepsilon \partial _x)^{\alpha _1}(\partial _y)^{\alpha _2}(\varepsilon \partial _t)^{\alpha _3} u\right\| _{X^0} \end{aligned}$$
(1.25)

with \(X=C\) or \(X=H\), where T is a positive number, \(\alpha =(\alpha _1,\alpha _2,\alpha _3)\) is a multi-index with nonnegative components, s is a positive integer, and \(\varepsilon \) is the small parameter appearing in the PDE. The reason that y-derivatives are not multiplied by powers of \(\varepsilon \) in the norm (1.25) is that there is no dependence on y in the fast phase(s) \(\mu (t,x)\) or \(\mu ^{(j)}(t,x)\) that appear multiplied by \(\frac{1}{\varepsilon }\) in the asymptotic forms of the solutions.
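To make the role of the \(\varepsilon \)-weights concrete, here is a rough discrete stand-in (ours) for the C-version of (1.25) with \(s=1\); the toy profile has a fast phase in (t, x) only, so its weighted norm stays O(1) even though its unweighted x- and t-derivatives are \(O(\tfrac{1}{\varepsilon })\).

```python
# Sketch (ours): a grid-based stand-in for the norm (1.25) with X = C and s = 1.
# Only y-derivatives enter unweighted; x- and t-derivatives carry a factor of eps.
import numpy as np

def weighted_C1_norm(u, dt, dx, dy, eps):
    """max over the grid of |u|, eps|u_t|, eps|u_x|, |u_y| (finite differences)."""
    terms = [np.abs(u),
             eps * np.abs(np.diff(u, axis=0)) / dt,   # eps * u_t
             eps * np.abs(np.diff(u, axis=1)) / dx,   # eps * u_x
             np.abs(np.diff(u, axis=2)) / dy]         # u_y, unweighted
    return max(T.max() for T in terms)

eps = 0.05                                            # grid must resolve the O(eps) oscillation
dt, dx, dy = 1e-3, 1e-3, 1e-2
t = np.arange(0.0, 0.1, dt)[:, None, None]
x = np.arange(0.0, 0.2, dx)[None, :, None]
y = np.arange(0.0, 1.0, dy)[None, None, :]
u = np.sin(2 * np.pi * (y - np.sin(x + t) / eps))     # toy profile U(t,x,y - mu(t,x)/eps)
print(weighted_C1_norm(u, dt, dx, dy, eps))           # O(1), roughly 2*pi
```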

A collection \(u^{(\varepsilon )}\) of functions depending on \(\varepsilon \) is asymptotic in \(X^s_{\varepsilon ,T,\text {loc}}\) to a profile \(u^{(0)}\) possibly depending on \(\varepsilon \) in a specified manner if for every \(C^\infty \) function \(\phi \) with compact support in the spatial variables, not depending on \(\varepsilon \), the \(X^s_{\varepsilon ,T}\) norm of \(\phi \, (u^{(\varepsilon )}-u^{(0)})\) converges to zero as \(\varepsilon \rightarrow 0\).

Theorem 1.1

Assume that (1.9), (1.16), and (1.17) hold, and normalize c as described after (1.17) so that (1.18) holds.

  1.

    Assume that for some \(s\ge 1\) the coefficients a, b, and c belong to \(C^{s+1}_B\) and the coefficients d and f and the initial data \(u_0\) belong to \(C^s_B\). Then for any positive \(\varepsilon _0\) there exists a time \(T>0\) such that for \(0<\varepsilon \le \varepsilon _0\) the solution of (1.10) having initial data \(u_0\) exists and belongs to \(C^{s}_B\) for \(0\le t\le T\), and satisfies

    $$\begin{aligned} \sup _{0<\varepsilon \le \varepsilon _0} \Vert u\Vert _{C^s_{\varepsilon ,T}} \le K. \end{aligned}$$
    (1.26)

    Moreover, there is a \({\widetilde{T}}>0\) such that as \(\varepsilon \rightarrow 0\) the solution u is asymptotic in \(C^{s-1}_{\varepsilon ,{\widetilde{T}},\text {loc}}\) (and hence in particular in \(C^0_\text {loc}\)) to

    $$\begin{aligned} U^{(0)}(\mu (t,x),x,Y(t,x,y)-\tfrac{\mu (t,x)}{\varepsilon }), \end{aligned}$$

    where \(U^{(0)}(\tau ,x,z)\) is the unique solution of the limit profile equation

    $$\begin{aligned} &U^{(0)}_\tau +\left\langle {\frac{{\widehat{b}}_{(0)}}{{{\widehat{A}}}_{(0)}}}\right\rangle _{\!\!Y}\!\!(\tau ,x)\,U^{(0)}_x +\left\langle {\frac{{{\widehat{D}}}_{(0)}}{{\widehat{A}}_{(0)}}}\right\rangle _{\!\!Y}\!\!(\tau ,x,U^{(0)})\,U^{(0)}_z \\ &\quad +\left\langle {\frac{{{\widehat{f}}}_{(0)}}{{\widehat{A}}_{(0)}}}\right\rangle _{\!\!Y}\!\!(\tau ,x,U^{(0)})=0 \end{aligned}$$
    (1.27)

    satisfying

    $$\begin{aligned} U^{(0)}(0,x,z)=u^0(x,z), \end{aligned}$$
    (1.28)

    where the notations \(\widehat{}\), \({}_{(0)}\), and \(\left\langle {}\right\rangle _{Y}\) are defined in (1.22), (1.15), and (1.14), respectively, A is defined in (1.23), and D is defined in (1.24).

  2.

    Now assume that \(s\ge 2\), and let \(\tau ^{(0)}\) be any time such that the limit profile \(U^{(0)}\) exists and has a finite \(C^1\) norm for \(0\le \tau \le \tau ^{(0)}\). Let

    $$\begin{aligned} T^{(0)}\mathrel {:=}\mathop {\textrm{argmin}}\limits _t\left\{ \inf _x \partial _t \mu (t,x)=0 \quad \text {or}\quad \sup _x \mu (t,x)=\tau ^{(0)}\right\} \end{aligned}$$
    (1.29)

    be the smallest time t at which \(\mu \) either stops being increasing in t or takes on the value \(\tau ^{(0)}\), possibly at infinity. Then there exists a positive \(\varepsilon _1\) such that for \(0<\varepsilon \le \varepsilon _1\) the solution u exists and has finite \(C^1\) norm for \(0\le t\le T^{(0)}\), and there is a finite K such that for \(0<\varepsilon \le \varepsilon _1\)

    $$\begin{aligned} \Vert u(t,x,y)-U^{(0)}(\mu (t,x),x,Y(t,x,y)-\tfrac{\mu (t,x)}{\varepsilon })\Vert _{C^{s-2}_{\varepsilon ,T^{(0)}}} \le K\varepsilon . \end{aligned}$$
    (1.30)

    In particular, \(\Vert u-U^{(0)}\Vert _{C^0}\le K\varepsilon \). Moreover, there exists a function \(U(\tau ,x,z,\eta ;\varepsilon )\) such that

    $$\begin{aligned} u(t,x,y)\equiv U(\mu (t,x),x,Y(t,x,y)-\tfrac{\mu (t,x)}{\varepsilon },\tfrac{\mu (t,x)}{\varepsilon };\varepsilon ), \end{aligned}$$
    (1.31)

    and

    $$\begin{aligned} \max _{0\le \tau \le \tau ^{(0)}}\Vert U(\tau ,x,z,\eta ;\varepsilon )-U^{(0)}(\tau ,x,z)\Vert _{C^{s-2}}\le K\varepsilon . \end{aligned}$$
    (1.32)

Remark 1.2

  1.

    Besides being used in Lemma 2.1, which is only needed when c depends on y, there is an additional reason why c is assumed to be bounded away from zero in Theorem 1.1, which will be explained in Sect. 6.2.

  2.

    In [13, Sect. 5.1] the time variable \(\tau \) of the transformed equation is assumed to equal the original time variable t, which avoids the somewhat involved condition in (1.29) that defines \(T^{(0)}\). For simplicity, in the subsequent theorems that involve a change of the time variable we will just prove existence and asymptotics for some time independent of \(\varepsilon \), although a more precise version similar to (1.29) could also be proven for those theorems.

  3.

    Although it is more natural to estimate \(u-U^{(0)}\) in the original variables (txy), as in (1.30), the estimate (1.32) in the transformed variables \((\tau ,x,z,\eta )\) is stronger since none of the spatial derivatives are weighted by \(\varepsilon \). The latter estimate therefore makes clearer that u has the specific asymptotic form \(U^{(0)}(\mu (t,x),x,Y(t,x,y)-\tfrac{\mu (t,x)}{\varepsilon })\).

We next define some notations that will be used in the statement of the theorem concerning (1.11). Recalling the notation \({}_{(0)}\) defined in (1.15), let \(\mu \) be the unique solution of

$$\begin{aligned} {a}_{(0)}(t,x)\mu _t+{b}_{(0)}(t,x)\mu _x= c(t,x),\qquad \mu (0,x)\equiv 0, \end{aligned}$$
(1.33)

which is the special case of (1.19) in which a, b, and c are independent of y. Define

$$\begin{aligned} D(t,x,U,\varepsilon )&\mathrel {:=}d(t,x,U,\varepsilon ) -\tfrac{a(t,x,\varepsilon U,\varepsilon )-{a}_{(0)}(t,x)}{\varepsilon }\mu _t -\tfrac{b(t,x,\varepsilon U,\varepsilon )-{b}_{(0)}(t,x)}{\varepsilon }\mu _x \\ &\quad +\varepsilon \mu _{xx}g(t,x,U,\varepsilon ), \end{aligned}$$
(1.34)
$$\begin{aligned}&K(t,x,U,\varepsilon )\mathrel {:=}k(t,x,U,\varepsilon )-h(t,x,U,\varepsilon )\mu _x+g(t,x,U,\varepsilon )\mu _x^2. \end{aligned}$$
(1.35)
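A remark worth recording (our own observation, not stated explicitly here) is that K in (1.35) equals the quadratic form of the parabolicity assumption (1.36) below evaluated at \((\alpha ,\beta )=(-\mu _x,1)\), so (1.36) makes K nonnegative; the check is immediate in sympy.

```python
# Sketch (ours): K of (1.35) equals the quadratic form of (1.36) at (alpha, beta) = (-mu_x, 1).
import sympy as sp

g, h, k, mu_x, alpha, beta = sp.symbols('g h k mu_x alpha beta')
quadratic_form = alpha**2 * g + alpha * beta * h + beta**2 * k    # left-hand side of (1.36)
K = k - h * mu_x + g * mu_x**2                                    # definition (1.35)
print(sp.simplify(quadratic_form.subs({alpha: -mu_x, beta: 1}) - K))   # -> 0
```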

Theorem 1.3

Assume that (1.9) holds and that the coefficients g, h, and k satisfy the nonuniform parabolicity condition

$$\begin{aligned} \begin{aligned}&\alpha ^2 g(t,x,u,\varepsilon )+\alpha \beta h(t,x,u,\varepsilon )+\beta ^2 k(t,x,u,\varepsilon )\ge 0 \\ {}&\text {for all real}\, \alpha \,\text {and}\, \beta \,\text {and all}\, (t,x,u,\varepsilon ). \end{aligned} \end{aligned}$$
(1.36)

Suppose that, for some integer \(s\ge 4\), the coefficients a, b, c from (1.11) belong to \(C^{s+2}_B\), the coefficients d, f, g, h, and k there and also \(\int _0^1 \tfrac{\partial f}{\partial u}(t,x,ru)\,dr\) belong to \(C^s_B\), and the initial data \(u_0(x,y)\) belongs to \(H^{2s}\). Then

  1.

    The solution of that PDE having initial data \(u_0\) exists and belongs to \(H^{s}\) for at least a positive time T independent of \(\varepsilon \).

  2.

    Now assume that \(s\ge 6\). Let \(U^{(0)}(t,x,z)\) be the unique solution, which exists and belongs to \(H^s\) on some time interval \([0,T^{(0)}]\), of

    $$\begin{aligned} \begin{aligned}&{a}_{(0)}(t,x)U^{(0)}_t+{b}_{(0)}(t,x)U^{(0)}_x+{D}_{(0)}(t,x,U^{(0)})U^{(0)}_z +{f}_{(0)}(t,x,U^{(0)})\\ {}&\quad ={K}_{(0)}(t,x,U^{(0)})U^{(0)}_{zz}, \\ {}&U^{(0)}(0,x,z)=u_0(x,z). \end{aligned}\end{aligned}$$
    (1.37)

    Then on the time interval \([0,\min (T,T^{(0)})]\)

    $$\begin{aligned} \Vert u(t,x,y)-U^{(0)}(t,x,y-\tfrac{\mu (t,x)}{\varepsilon })\Vert _{H^{s-2}_{\varepsilon ,\min (T,T^{(0)})}}\le C\varepsilon . \end{aligned}$$
    (1.38)

    Moreover, there exists a function \(U(t,x,z,\varepsilon )\) such that

    $$\begin{aligned} u(t,x,y)\equiv U(t,x,y-\tfrac{\mu (t,x)}{\varepsilon },\varepsilon ), \end{aligned}$$
    (1.39)

    and

    $$\begin{aligned} \max _{0\le t\le \min (T,T^{(0)})}\Vert U(t,x,z,\varepsilon )-U^{(0)}(t,x,z)\Vert _{H^{s-2}}\le C\varepsilon . \end{aligned}$$
    (1.40)

In particular, \(\Vert u-U^{(0)}\Vert _{C^0}\le C\varepsilon \).

Remark 1.4

  1.

    The powers of \(\varepsilon \) multiplying the diffusion terms \(u_{xx}\) and \(u_{xy}\) in (1.11) are needed to ensure that the coefficient K of the diffusion term \(U_{zz}\) in the Eq. (3.2) for U contains no powers of \(\frac{1}{\varepsilon }\). Due to the presence of those powers, the nonlinear diffusion terms in (1.11) cannot be uniformly parabolic uniformly in \(\varepsilon \). Consequently, classical results for nonlinear uniformly parabolic equations cannot be used. We shall apply instead recent results [25, Theorems 2.7, 4.1] for non-uniformly parabolic equations. The extra smoothness required in Theorem 1.3 and the other requirements there on the coefficients are conditions of those results.

  2.

    In contrast to (1.10), it does not seem possible to allow the coefficients of (1.11) to depend on y, because changing the time variable of (1.11) to \(\tau \mathrel {:=}\mu (t,x)\) as for (1.10) would yield a term involving the second derivative with respect to \(\tau \) not having a fixed sign, arising from the parabolic terms in (1.11).

We next consider system (1.12). By defining fast phases in an appropriate manner and making transformations of both independent and dependent variables it is possible under certain conditions to obtain a system for which the large terms have constant coefficients, which will ensure that solutions of the original system (1.12) exist for a time independent of \(\varepsilon \). The most important condition is (1.49), which as noted below corresponds to a special case of the coherence assumption of [13]. While this correspondence motivates assumption (1.49), we do not use any results from [13] in the proof of the theorem for system (1.12), but instead show by direct calculation that (1.49) ensures that the large terms of the transformed system have constant coefficients.

Phases. We will let the fast phases \(\mu ^{(j)}\) be the solutions of the generalization

$$\begin{aligned} 0=\det \left( -\mu _t {A}_{(0)}(t,x)-\mu _x {B}_{(0)}(t,x)+ C(t,x)\right) , \qquad \mu (0,x)\equiv 0 \end{aligned}$$
(1.41)

of condition (1.7) that held for the prototypical equation (1.4). Now that the equation is a system it is no longer possible to make the large terms vanish entirely, but (1.41) ensures that a generalization of that condition will hold, namely that the matrices \({\widehat{C}}^{(j)}\) defined below in (1.45) will have determinant zero.

For any matrix M of size n, let \(\{\lambda _j(M)\}_{j=1}^n\) denote the set of its eigenvalues, whose order will be chosen according to some rule when needed. Since the vanishing of \(\mu \) at time zero implies that \(\mu _x(0,x)\) also vanishes, and we will assume that A is positive definite, (1.41) implies that

$$\begin{aligned} \mu ^{(j)}_t(0,x)=\lambda _j\!\!\left( ({A}_{(0)})^{-\frac{1}{2}} C({A}_{(0)})^{-\frac{1}{2}}{\big |_{t=0}}\right) \end{aligned}$$
(1.42)

for some ordering of those \(\mu ^{(j)}\) and \(\lambda _j\). We will assume that for some choice of the eigenvalues \(\lambda _1\) and \(\lambda _2\),

$$\begin{aligned} \lambda _1\!\!\left( ({A}_{(0)})^{-\frac{1}{2}} C({A}_{(0)})^{-\frac{1}{2}}{\big |_{t=0}}\right) -\lambda _2\!\left( ({A}_{(0)})^{-\frac{1}{2}} C({A}_{(0)})^{-\frac{1}{2}}{\big |_{t=0}}\right) \ge c>0, \end{aligned}$$
(1.43)

and let \(\mu ^{(1)}\) and \(\mu ^{(2)}\) be the corresponding solutions of (1.41) satisfying (1.42).
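Numerically, the initial slopes (1.42) of the fast phases are simply the eigenvalues of \(({A}_{(0)})^{-\frac{1}{2}} C({A}_{(0)})^{-\frac{1}{2}}\) at \(t=0\); the following numpy sketch (ours, with arbitrary sample matrices at a single point x) computes them and the gap appearing in (1.43).

```python
# Sketch (ours): compute mu^{(j)}_t(0,x) via (1.42) for sample symmetric matrices.
import numpy as np

def inv_sqrt_spd(A):
    """A^{-1/2} for a symmetric positive definite A, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w ** -0.5) @ V.T

def mu_t_at_time_zero(A0, C):
    """Eigenvalues lambda_j of A0^{-1/2} C A0^{-1/2}, cf. (1.42), in increasing order."""
    S = inv_sqrt_spd(A0)
    return np.linalg.eigvalsh(S @ C @ S)

A0 = np.array([[2.0, 0.3], [0.3, 1.0]])     # sample A_(0)(0,x), symmetric positive definite
C  = np.array([[1.0, 0.5], [0.5, -1.0]])    # sample C(0,x), symmetric
lam = mu_t_at_time_zero(A0, C)
print(lam, lam[-1] - lam[0])                # the gap plays the role of c in (1.43)
```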

Transformations. Define the new time variable by

$$\begin{aligned} \tau (t,x)\mathrel {:=}\mu ^{(1)}(t,x)-\mu ^{(2)}(t,x). \end{aligned}$$
(1.44)

As will be shown, (1.43) implies that \(\tau (t,x)\) can be inverted with respect to its first variable, at least on some time interval. Let \(t(\tau ,x)\) denote the inverse function, and define

$$\begin{aligned} {\widehat{A}}(\tau ,x,v,\varepsilon )&\mathrel {:=}(\mu ^{(1)}_t(t,x)-\mu ^{(2)}_t(t,x))A(t,x,v,\varepsilon ) \\ &\qquad + (\mu ^{(1)}_x(t,x)-\mu ^{(2)}_x(t,x))B(t,x,v,\varepsilon ){\big |_{t=t(\tau ,x)}}, \\ {\widehat{C}}^{(j)}(\tau ,x)&\mathrel {:=}C(t,x)-\mu ^{(j)}_t(t,x) {A}_{(0)}(t,x)-\mu ^{(j)}_x(t,x) {B}_{(0)}(t,x){\big |_{t=t(\tau ,x)}}, \\ {\widehat{D}}^{(j)}(\tau ,x,v,\varepsilon )&\mathrel {:=}D(t,x,v,\varepsilon ) \\ &\qquad -\tfrac{ \mu ^{(j)}_t(t,x) [A(t,x,\varepsilon v,\varepsilon )-{A}_{(0)}(t,x)]+\mu ^{(j)}_x(t,x) [B(t,x,\varepsilon v,\varepsilon )-{B}_{(0)}(t,x)]}{\varepsilon }{\big |_{t=t(\tau ,x)}}, \\ {\widehat{M}}(\tau ,x,v,\varepsilon )&\mathrel {:=}M(t,x,v,\varepsilon ){\big |_{t=t(\tau ,x)}} \qquad \text{ for }\ M\in \{B,f\}. \end{aligned}$$
(1.45)

We will show that \({{\widehat{A}}}_{(0)}\) is positive definite. Using that matrix, define, for any matrix-valued function \({\widehat{M}}\) of \((\tau ,x,\varepsilon U,\varepsilon )\) or \((\tau ,x, U,\varepsilon )\),

$$\begin{aligned} \widetilde{M}(\tau ,x,w,\varepsilon )\mathrel {:=}({{\widehat{A}}}_{(0)})^{-1/2}{\widehat{M}}(\tau ,x,({\widehat{A}}_{(0)})^{-1/2}w,\varepsilon )({{\widehat{A}}}_{(0)})^{-1/2}. \end{aligned}$$
(1.46)

To begin the final set of transformations, let \(r^{(j)}(\tau ,x)\), \(j=1,\cdots , n\), where n is the length of the vector u in (1.12), be the normalized eigenvectors of the matrix \({{{\widetilde{C}}}}^{(1)}(\tau ,x)\), chosen to be orthogonal if they correspond to a repeated eigenvalue, and let \(R(\tau ,x)\) be the matrix whose columns are the \(r^{(j)}\). For any matrix-valued function \({\widetilde{M}}\) define

$$\begin{aligned} {\mathcal {M}}(\tau ,x,w,\varepsilon )\mathrel {:=}R^T(\tau ,x){\widetilde{M}}(\tau ,x,R^Tw,\varepsilon ) R(\tau ,x), \end{aligned}$$
(1.47)

where any arguments not present in \({\widetilde{M}}\) are omitted from \({\mathcal {M}}\) as well, and define

$$\begin{aligned} {\mathcal {F}}(\tau ,x,w,\varepsilon )&\mathrel {:=}R^T({\widehat{A}}_{(0)})^{-\frac{1}{2}}{\widehat{f}}(\tau ,x,({\widehat{A}}_{(0)})^{-\frac{1}{2}}R^Tw,\varepsilon ) \\ &\quad +R^T({{\widehat{A}}}_{(0)})^{-\frac{1}{2}}\! \big \{{\widehat{A}}(\tau ,x,\varepsilon ({{\widehat{A}}}_{(0)})^{-\frac{1}{2}}R^T w,\varepsilon )\partial _{\tau }[({{\widehat{A}}}_{(0)})^{-\frac{1}{2}}R ] \\ &\quad +{\widehat{B}}(\tau ,x,\varepsilon ({{\widehat{A}}}_{(0)})^{-\frac{1}{2}}R^T w,\varepsilon )\partial _x [({{\widehat{A}}}_{(0)})^{-1/2}R ] \big \}w. \end{aligned}$$
(1.48)
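The point of the conjugations (1.46)–(1.48) is that they preserve symmetry and diagonalize \({{\widetilde{C}}}^{(1)}\); a small numpy illustration (ours, with arbitrary sample matrices at one point \((\tau ,x)\)) follows.

```python
# Sketch (ours): conjugation by A_hat_(0)^{-1/2} preserves symmetry, and conjugation by the
# orthogonal eigenvector matrix R of C_tilde^{(1)} diagonalizes that matrix, as in (1.46)-(1.47).
import numpy as np

def inv_sqrt_spd(A):
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w ** -0.5) @ V.T

A_hat0 = np.array([[3.0, 0.5], [0.5, 2.0]])     # stand-in for A_hat_(0), symmetric positive definite
C_hat1 = np.array([[1.0, -0.7], [-0.7, 0.2]])   # stand-in for C_hat^{(1)}, symmetric

S = inv_sqrt_spd(A_hat0)
C_tilde1 = S @ C_hat1 @ S                       # (1.46): still symmetric
_, R = np.linalg.eigh(C_tilde1)                 # columns are the normalized eigenvectors r^{(j)}
print(np.round(R.T @ C_tilde1 @ R, 12))         # (1.47) applied to C_tilde^{(1)}: diagonal
```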

Large operator. In order to obtain a constant-coefficient large operator, we will assume that all solutions \(\mu ^{(j)}(t,x)\) of (1.41) satisfy

$$\begin{aligned} \mu ^{(j)}(t,x)\equiv \alpha ^{(j)}\mu ^{(1)}(t,x)+(1-\alpha ^{(j)})\mu ^{(2)}(t,x)\quad \text {for some constants}\, \alpha ^{(j)}, \end{aligned}$$
(1.49)

where

$$\begin{aligned} \begin{aligned}&\mu ^{(1)} \,\text {and}\, \mu ^{(2)}\, \text {are the solutions of} (1.41) \text {satisfying} (1.42) \\ {}&\text {for}\, j=1,2, \,\text {with}\, \lambda _1 \,\text {and}\, \lambda _2 \,\text {being eigenvalues satisfying} (1.43). \end{aligned} \end{aligned}$$
(1.50)

Then define

$$\begin{aligned} \begin{aligned} {\mathcal {L}}&\mathrel {:=}\mathop {\textrm{diag}}\limits ((\alpha ^{(j)}-1)\partial _{z_1}+\alpha ^{(j)}\partial _{z_2}), \\ \mathbb {P}&\mathrel {:=}L^2\text {-orthogonal projection onto null space of}\, {\mathcal {L}}, \end{aligned} \end{aligned}$$
(1.51)

where \(\mathop {\textrm{diag}}\limits (s^{(j)})\) denotes the diagonal matrix or operator with diagonal entries \(s^{(j)}\). Finally, as usual let \(e^{(j)}\) denote the vector whose \(j^{\hbox {th}}\) component is one and whose other components are zero.
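For periodic profiles the projection \(\mathbb {P}\) in (1.51) acts componentwise by keeping exactly the Fourier modes annihilated by \((\alpha ^{(j)}-1)\partial _{z_1}+\alpha ^{(j)}\partial _{z_2}\); the following sketch (ours) implements one component with FFTs and recovers the averaging operators that appear explicitly later in (1.77).

```python
# Sketch (ours): the projection of one component onto ker((alpha-1) d_{z1} + alpha d_{z2})
# keeps the Fourier modes (k1, k2) with (alpha-1)*k1 + alpha*k2 = 0.
import numpy as np

def project_component(F, alpha, tol=1e-12):
    n1, n2 = F.shape
    k1 = np.fft.fftfreq(n1, d=1.0 / n1)[:, None]     # integer frequencies in z1
    k2 = np.fft.fftfreq(n2, d=1.0 / n2)[None, :]     # integer frequencies in z2
    keep = np.abs((alpha - 1.0) * k1 + alpha * k2) < tol
    return np.real(np.fft.ifft2(np.fft.fft2(F) * keep))

z1 = np.linspace(0, 2 * np.pi, 64, endpoint=False)[:, None]
z2 = np.linspace(0, 2 * np.pi, 64, endpoint=False)[None, :]
F = np.cos(z1) + np.sin(z2) + np.cos(z1 + z2)

print(np.allclose(project_component(F, 1.0), np.cos(z1) + 0 * z2))   # alpha=1: average over z2
print(np.allclose(project_component(F, 0.0), np.sin(z2) + 0 * z1))   # alpha=0: average over z1
print(np.allclose(project_component(F, 0.5), np.cos(z1 + z2)))       # alpha=1/2: keep k1 = k2
```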

Theorem 1.5

Assume that the matrices A, B, C, and D are symmetric, that D and \(f(t,x,u,\varepsilon )\) belong to \(C^s_B\) for some \(s\ge 4\), that A, B, and C belong to \(C^{s+2}_B\), and that the initial data \(u_0\) belongs to \(H^s\). Assume also that A satisfies the generalization

$$\begin{aligned} {A}_{(0)}(t,x)\ge cI \qquad \hbox { with}\ c>0 \end{aligned}$$
(1.52)

of (1.9). In addition, assume that there exist eigenvalues \(\lambda _1\) and \(\lambda _2\) of

$$\begin{aligned} ({A}_{(0)})^{-\frac{1}{2}}{C}_{(0)}({A}_{(0)})^{-\frac{1}{2}} \end{aligned}$$

satisfying (1.43), and that (1.49)–(1.50) hold. Then there exists a positive time \(T_{\min {}}\) such that the solution u(txy) of (1.12) with the initial data \(u_0\) exists for \(0\le t\le T_{\min {}}\) and satisfies

$$\begin{aligned} \big \Vert u(t,x,y) -[({{\widehat{A}}}_{(0)})^{-1/2}R]{\big |_{\tau =\tau (t,x)}} \mathcal {U}^{(0)}(\tau (t,x),x,y-\tfrac{\mu ^{(1)}(t,x)}{\varepsilon },y-\tfrac{\mu ^{(2)}(t,x)}{\varepsilon })\big \Vert _{H^{s-1}_{\varepsilon ,T_{\min {}}}} \le C\varepsilon , \end{aligned}$$
(1.53)

where \(\tau (t,x)\) is defined in (1.44) and \(\mathcal {U}^{(0)}\) is the unique solution of

$$\begin{aligned} &\mathbb {P}\big [ {\mathcal {U}}^{(0)}_\tau + {\mathcal {B}}_{(0)}{\mathcal {U}}^{(0)}_x +{{\mathcal {D}}}_{(0)}^{(1)}\mathcal {U}^{(0)}_{z_1}+{{\mathcal {D}}}_{(0)}^{(2)}\mathcal {U}^{(0)}_{z_2}+{{\mathcal {F}}}_{(0)}\big ]=0, \\ &{\mathcal {L}} {\mathcal {U}}^{(0)}=0, \\ &{\mathcal {U}}^{(0)}(0,x,z_1,z_2)={\mathcal {U}}_0(x,z_1,z_2) \mathrel {:=}\sum _{j=1}^n \big [ e^{(j)}\cdot \big [ R^T({\widehat{A}}_{(0)})^{1/2}\big ]{\big |_{\tau =0}}\, u_0(x,\alpha ^{(j)} z_1+(1-\alpha ^{(j)})z_2)\big ] e^{(j)}. \end{aligned}$$
(1.54)

Moreover, there exists a function \({\mathcal {U}}(\tau ,x,z_1,z_2,\varepsilon )\) and a positive \(\tau _{\min }\) such that

$$\begin{aligned} u(t,x,y)\equiv [ ({\widehat{A}}_{(0)})^{-1/2}R]{\big |_{\tau =\tau (t,x)}} \mathcal {U}(\tau (t,x),x,y-\tfrac{\mu ^{(1)}(t,x)}{\varepsilon },y-\tfrac{\mu ^{(2)}(t,x)}{\varepsilon },\varepsilon ) \end{aligned}$$
(1.55)

and

$$\begin{aligned} \max _{0\le \tau \le \tau _{\min {}}}\Vert \mathcal {U}(\tau ,x,z_1,z_2,\varepsilon )- \mathcal {U}^{(0)}(\tau ,x,z_1,z_2)\Vert _{H^{s-1}}\le K\varepsilon . \end{aligned}$$
(1.56)

In particular,

$$\begin{aligned} \big \Vert u(t,x,y) - [({{\widehat{A}}}_{(0)})^{-1/2}R]{\big |_{\tau =\tau (t,x)}} \mathcal {U}^{(0)}(\tau (t,x),x,y-\tfrac{\mu ^{(1)}(t,x)}{\varepsilon },y-\tfrac{\mu ^{(2)}(t,x)}{\varepsilon })\big \Vert _{C^0} \le K\varepsilon . \end{aligned}$$

Remark 1.6

  1.

    The second equation of (1.54) says that the \(j^{\hbox {th}}\) component of \({\mathcal {U}}^{(0)}\) depends on the variables \(z_1\) and \(z_2\) only through the linear combination \(\alpha ^{(j)} z_1+(1-\alpha ^{(j)})z_2\). In other words, each component of \({\mathcal {U}}\) depends in the limit on its own phase \(y-\frac{\alpha ^{(j)}\mu ^{(1)}+(1-\alpha ^{(j)})\mu ^{(2)}}{\varepsilon }\) but not on the phases of the other components.

  2.

    Condition (1.49) always holds for \(j=1,2\), with \(\alpha ^{(1)}=1\) and \(\alpha ^{(2)}=0\), so that assumption only restricts systems that are \(3\times 3\) or larger. For those systems that restriction is quite severe, as will be illustrated in Example 6.6. In terms of the geometric-optics phases \(\phi ^{(j)}\mathrel {:=}{\widehat{y}}-\mu ^{(j)}(t,x)\), where \(\widehat{y}\mathrel {:=}\varepsilon y\), condition (1.49) ensures that the \(\phi ^{(j)}\) satisfy the coherence condition [13, Definition 2.1.1].

  3.

    The equation \(\mathbb {P}V=0\) does not necessarily imply the existence of a function \({\mathcal {W}}\) such that \(V={\mathcal {L}}{\mathcal {W}}\) because of the problem of small divisors. Cf. [22, pp. 486–487]. The results of [20, 21] on the existence of such a \({\mathcal {W}}\) do not apply here, because \(\widehat{\mathcal {L}}({\mathbf {\xi }})=\xi _1 \mathop {\textrm{diag}}\limits (\alpha ^{(j)}-1)+\xi _2\mathop {\textrm{diag}}\limits (\alpha ^{(j)})\) does not satisfy the assumption there that the dimension of the null space of \(\widehat{{\mathcal {L}}}\) be independent of \(\mathbf \xi \) for all \({\mathbf \xi }\in \mathbb {R}{\setminus }\{0\}\).

  4.

    Theorems 1.5 and 1.7 below require that the matrices A, B, and C have one more derivative than was needed for Theorem 1.1, because the derivatives of \(({{\widehat{A}}}_{(0)})^{-\frac{1}{2}}\) appearing in (1.48) contain second derivatives of the \(\mu ^{(j)}\), which are as smooth as those matrices.

Some of the definitions for our final theorem are similar to those for Theorem 1.5, others are generalizations of definitions for Theorem 1.1, and a few are unique to Eq. (1.13). The key new condition, which unfortunately is quite restrictive, is (1.82) below, which ensures that the dependence of the coefficients on y does not produce any large term in the transformed system.

Phases. In the theorem for system (1.13) we will assume not only that A is positive definite, but also that C is invertible. Define the slow parts \(Y^{(k)}\) and the fast parts \(\mu ^{(k)}\) of the phases \(\varepsilon Y^{(k)}-\mu ^{(k)}\) by

$$\begin{aligned} Y^{(k)}_y(t,x,y)\mathrel {:=}\mu ^{(k)}_t\lambda _k\big (C^{-1}A_{(0)}+\tfrac{\mu ^{(k)}_x}{\mu ^{(k)}_t}C^{-1}B_{(0)}\big ),\qquad Y^{(k)}(t,x,0)\equiv 0, \end{aligned}$$
(1.57)

and

$$\begin{aligned} 1\equiv \mu ^{(k)}_t\left\langle {\lambda _k\big (C^{-1}A_{(0)}+\tfrac{\mu ^{(k)}_x}{\mu ^{(k)}_t}C^{-1}B_{(0)}\big )}\right\rangle _{\!y},\qquad \mu ^{(k)}(0,x)\equiv 0. \end{aligned}$$
(1.58)

Alternatively, the initial condition on the \(Y^{(k)}\) in (1.57) can be replaced by the normalization

$$\begin{aligned} \int _{-\frac{P}{2}}^{\frac{P}{2}} Y^{(k)}(t,x,y)\,dy=0, \end{aligned}$$
(1.59)

where P is the period of the coefficients with respect to the variable y, which ensures that the Fourier series in y of \(Y^{(k)}(t,x,y)-y\) will have no constant term.

Since a condition \(0=\det (w M-N)\) is equivalent to requiring that w be an eigenvalue of \(M^{-1}N\), (1.57)–(1.58) generalize both (1.19)–(1.20) and (1.41), and will ensure that the matrices \({\widehat{C}}^{(k)}\) defined below in (1.72) will have determinant zero. The reason for using \(\mu ^{(k)}_t\lambda _k(C^{-1}A_{(0)}+\tfrac{\mu ^{(k)}_x}{\mu ^{(k)}_t}C^{-1}B_{(0)})\) rather than the simpler expression \(\lambda _k(\mu ^{(k)}_t C^{-1}A_{(0)}+\mu ^{(k)}_xC^{-1}B_{(0)})\) is to avoid complications arising from the fact that \(\lambda _k(c M)\) might equal \(c \lambda _{3-k}(M)\) rather than \(c\lambda _k(M)\) when \(c<0\), depending on the way the ordering of the eigenvalues is determined.
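The ordering subtlety mentioned above is easy to see numerically (our illustration): with eigenvalues listed in increasing order, multiplying a symmetric matrix by a negative scalar swaps the labels.

```python
# Sketch (ours): lambda_k(c*M) = c*lambda_{3-k}(M), not c*lambda_k(M), when c < 0 and the
# eigenvalues are ordered increasingly.
import numpy as np

M = np.array([[2.0, 0.4], [0.4, -1.0]])
print(np.linalg.eigvalsh(M))     # [lambda_1(M), lambda_2(M)]
print(np.linalg.eigvalsh(-M))    # [-lambda_2(M), -lambda_1(M)]: the labels are swapped
```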

The assumptions on A and C ensure that the eigenvalues \(\lambda _k(C^{-1}A{\big |_{t=0}})\) are nonzero, although not necessarily positive, which will imply that both \(\lambda _k\big (C^{-1}A_{(0)}+\tfrac{\mu ^{(k)}_x}{\mu ^{(k)}_t}C^{-1}B_{(0)}\big )\) and \(\mu ^{(k)}_t\) are nonzero for sufficiently small times. In addition, we will assume that

$$\begin{aligned} \left| \lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big ) -\lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )\right| \;\bigg |_{{ t=0}}\ge c>0. \end{aligned}$$
(1.60)

By exchanging the labels 1 and 2 if necessary we can then arrange that

$$\begin{aligned} \frac{\lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )-\lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}{ \lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )\lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )} \;\Bigg |_{{t=0}}\ge c>0. \end{aligned}$$
(1.61)

Transformations. Define the new time and spatial variables by

$$\begin{aligned} \begin{aligned} \tau (t,x,y,\varepsilon )&\mathrel {:=}\varepsilon (z_2(t,x,y,\tfrac{1}{\varepsilon })-z_1(t,x,y,\tfrac{1}{\varepsilon })), \\ z_j(t,x,y,\tfrac{1}{\varepsilon })&\mathrel {:=}Y^{(j)}(t,x,y)-\tfrac{\mu ^{(j)}(t,x)}{\varepsilon }, \end{aligned} \end{aligned}$$
(1.62)

which generalize (1.44) and the formula \(z=y-\frac{\mu (t,x)}{\varepsilon }\) used implicitly in (1.31)–(1.32). In system (1.12) the coefficients did not depend on y, so there was no need to express y in terms of the new spatial variables \(z_1\) and \(z_2\). For Eq. (1.10) the formula for y took the simple form \(y=z+\eta \). However, for the system (1.13) the relationship between y and \((z_1,z_2)\) is more complicated, and in particular is intertwined with the relationship between the old and new time variables t and \(\tau \). In order that terms of size \(O(\frac{1}{\varepsilon })\) like those appearing in the formula for the \(z_k\) in (1.62) will not appear in the formula for y in terms of \((z_1,z_2)\), that formula will be derived using the functions

$$\begin{aligned} G(t,x,z_1,z_2)\mathrel {:=}\tfrac{\mu ^{(1)}(t,x)}{t}z_2-\tfrac{\mu ^{(2)}(t,x)}{t}z_1 \end{aligned}$$
(1.63)

and

$$\begin{aligned} Q(t,x,y)\mathrel {:=}\tfrac{\mu ^{(1)}(t,x)}{t}Y^{(2)}(t,x,y)-\tfrac{\mu ^{(2)}(t,x)}{t}Y^{(1)}(t,x,y). \end{aligned}$$
(1.64)

The point is that the formula for G together with (1.62) implies that

$$\begin{aligned} G(t,x,z_1(t,x,y,\tfrac{1}{\varepsilon }),z_2(t,x,y,\tfrac{1}{\varepsilon }))\equiv Q(t,x,y) \end{aligned}$$
(1.65)

on account of the cancellation of the \(O(\tfrac{1}{\varepsilon })\) term \(\frac{\mu ^{(1)}\mu ^{(2)}}{\varepsilon t}\).
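The cancellation can be confirmed by a two-line sympy computation (ours; the symbols mu1, mu2, Y1, Y2 stand for \(\mu ^{(j)}(t,x)\) and \(Y^{(j)}(t,x,y)\)).

```python
# Sketch (ours): substituting z_j = Y^{(j)} - mu^{(j)}/eps from (1.62) into G of (1.63)
# reproduces Q of (1.64); the O(1/eps) cross terms cancel.
import sympy as sp

t, eps = sp.symbols('t epsilon', positive=True)
mu1, mu2, Y1, Y2 = sp.symbols('mu1 mu2 Y1 Y2')

z1 = Y1 - mu1 / eps
z2 = Y2 - mu2 / eps
G = (mu1 / t) * z2 - (mu2 / t) * z1     # (1.63)
Q = (mu1 / t) * Y2 - (mu2 / t) * Y1     # (1.64)
print(sp.simplify(G - Q))               # -> 0
```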

It will be shown that Q is invertible with respect to y. Let \(Q^{-1}\) denote the corresponding inverse function, i.e., the function such that

$$\begin{aligned} y\equiv Q^{-1}(t,x,Q(t,x,y)) \quad \text {and} \quad Y\equiv Q(t,x,Q^{-1}(t,x,Y)). \end{aligned}$$
(1.66)

Applying \(Q^{-1}(t,x,\cdot )\) to both sides of (1.65) yields

$$\begin{aligned} Q^{-1}(t,x,G(t,x,z_1(t,x,y,\tfrac{1}{\varepsilon }),z_2(t,x,y,\tfrac{1}{\varepsilon })))\equiv Q^{-1}(t,x,Q(t,x,y))\equiv y. \end{aligned}$$
(1.67)

We therefore define

$$\begin{aligned} {\mathcal {Y}}(t,x,z_1,z_2)\mathrel {:=}Q^{-1}(t,x,G(t,x,z_1,z_2)). \end{aligned}$$
(1.68)

Using \({\mathcal {Y}}\), define next

$$\begin{aligned} {\widehat{\tau }}(t,x,z_1,z_2,\varepsilon )&\mathrel {:=}\tau (t,x,\mathcal {Y}(t,x,z_1,z_2),\varepsilon ) =\left( \mu ^{(1)}(t,x)-\mu ^{(2)}(t,x)\right) \\ &\quad +\varepsilon \left( Y^{(2)}(t,x,{\mathcal {Y}}(t,x,z_1,z_2))-Y^{(1)}(t,x,{\mathcal {Y}}(t,x,z_1,z_2))\right) . \end{aligned}$$
(1.69)

We will also show that \({\widehat{\tau }}\) is invertible with respect to its first variable. Let \({\widehat{t}}\) denote the corresponding inverse function, i.e., the function such that

$$\begin{aligned} t\equiv {\widehat{t}}( {{\widehat{\tau }}}(t,x,z_1,z_2,\varepsilon ),x,z_1,z_2,\varepsilon )\quad \text {and}\quad \tau \equiv {{\widehat{\tau }}}(\widehat{t}(\tau ,x,z_1,z_2,\varepsilon ),x,z_1,z_2,\varepsilon ). \end{aligned}$$
(1.70)

We then define

$$\begin{aligned} \widehat{{\mathcal {Y}}}(\tau ,x,z_1,z_2,\varepsilon )\mathrel {:=}{\mathcal {Y}}({\widehat{t}}(\tau ,x,z_1,z_2,\varepsilon ),x,z_1,z_2). \end{aligned}$$
(1.71)

We are now ready to define the first form of the transformed coefficients, generalizing (1.45):

(1.72)

where the term \(\frac{{\widehat{C}}^{(k)}-{\widehat{C}}_{(0)}^{(k)}}{\varepsilon }\) was included in \({\widehat{D}}\) because we will use \({{\widehat{C}}}_{(0)}^{(k)}\) rather than \({\widehat{C}}^{(k)}\) in what follows. We now proceed in similar fashion to (1.46)–(1.47): Define, for any matrix function \({\widehat{M}}\),

$$\begin{aligned} \widetilde{M}(\tau ,x,z_1,z_2,w,\varepsilon )\mathrel {:=}({{\widehat{A}}}_{(0)})^{-1/2} \widehat{M}(\tau ,x,z_1,z_2,({\widehat{A}}_{(0)})^{-1/2}w,\varepsilon )({{\widehat{A}}}_{(0)})^{-1/2}, \end{aligned}$$
(1.73)

where any arguments absent in \({\widehat{M}}\) are also omitted from \({\widetilde{M}}\). Next, we let \(\{r^{(j)}(\tau ,x,z_1,z_2)\}_{j=1}^2\) be the normalized eigenvectors of \({{\widetilde{C}}_{(0)}}^{(1)}(\tau ,x,z_1,z_2)\) and let \(R(\tau ,x,z_1,z_2)\) be the matrix whose columns are the \(r^{(j)}\), and for any matrix-valued function \({\widetilde{M}}\) define

$$\begin{aligned} {\mathcal {M}}(\tau ,x,w,\varepsilon )\mathrel {:=}R^T(\tau ,x,z_1,z_2){\widetilde{M}}(\tau ,x,z_1,z_2,R^Tw,\varepsilon ) R(\tau ,x,z_1,z_2), \end{aligned}$$
(1.74)

where any arguments not present in \({\widetilde{M}}\) are omitted from \({\mathcal {M}}\) as well. In slightly different fashion than (1.48), define

$$\begin{aligned} {\mathcal {F}}(\tau ,x,w,\varepsilon )&\mathrel {:=}\tfrac{1}{\rho (\tau ,x,z_1,z_2,\varepsilon )}R^T({\widehat{A}}_{(0)})^{-\frac{1}{2}} \Big \{{\widehat{f}}(\tau ,x,({{\widehat{A}}}_{(0)})^{-\frac{1}{2}}R^Tw,\varepsilon ) \\&\quad +\Big ({\widehat{A}}(\tau ,x,\varepsilon ({{\widehat{A}}}_{(0)})^{-\frac{1}{2}}R^T w,\varepsilon )\partial _{\tau }[\rho (\tau ,x,z_1,z_2,\varepsilon )({{\widehat{A}}}_{(0)})^{-\frac{1}{2}}R ] \\&\quad +{\widehat{B}}(\tau ,x,\varepsilon ({{\widehat{A}}}_{(0)})^{-\frac{1}{2}}R^T w,\varepsilon ) \partial _x [\rho (\tau ,x,z_1,z_2,\varepsilon )({{\widehat{A}}}_{(0)})^{-1/2}R ] \Big ) w \Big \}, \end{aligned}$$
(1.75)

where \(\rho (\tau ,x,z_1,z_2,\varepsilon )\) is a scalar function to be chosen later. Its purpose is to make \(\rho ({\widehat{A}}_{(0)})^{-\frac{1}{2}}R\) independent of \((z_1,z_2)\), because otherwise we would need to add to \({\mathcal {F}}\) the terms

$$\begin{aligned} \frac{1}{\varepsilon }\tfrac{1}{\rho }R^T({{\widehat{A}}}_{(0)})^{-\frac{1}{2}} {\mathcal {C}}^{(k)}_{(0)}\partial _{z_k}[\rho ({\widehat{A}}_{(0)})^{-\frac{1}{2}}R]w, \end{aligned}$$
(1.76)

which would make the estimates non-uniform in \(\varepsilon \).

Large operator. Define

$$\begin{aligned} \begin{aligned} {\mathcal {L}}&\mathrel {:=}\left( {\begin{matrix}0&{}0\\ 0&{}-1\end{matrix}}\right) \partial _{z_1} +\left( {\begin{matrix}1&{}0\\ 0&{}0\end{matrix}}\right) \partial _{z_2}, \\ \mathbb {P}&\mathrel {:=}L^2\text {-orthogonal projection onto null space of}\, {\mathcal {L}} =\left( {\begin{matrix}\left\langle {\cdot }\right\rangle _{\!z_2}&{}0 \\ 0&{} \left\langle {\cdot }\right\rangle _{\!z_1} \end{matrix}}\right) . \end{aligned} \end{aligned}$$
(1.77)

Theorem 1.7

Assume that A, B, C, and D are symmetric \(2\times 2\) matrices, that D and f belong to \(C^s\) for some \(s\ge 4\), that A, B, and C belong to \(C^{s+2}\), and that the initial data \(u_0\) belongs to \(H^s\). Assume also that

$$\begin{aligned} A(t,x,y,\varepsilon u,\varepsilon )\ge cI \qquad \text {for some positive constant}~c \end{aligned}$$
(1.78)

and

$$\begin{aligned} |\det C(t,x,y)|\ge c_{\min }>0, \end{aligned}$$
(1.79)

that (1.60)–(1.61) hold, and that

$$\begin{aligned} \left. \tfrac{\lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}{\left\langle {\lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}\right\rangle _{\!y}} -\tfrac{\lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}{\left\langle {\lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}\right\rangle _{\!y}} \;\right| _{{t=0}}\equiv 0. \end{aligned}$$
(1.80)

Finally, assume that there exists a scalar function \(\rho (\tau ,x,z_1,z_2)\) satisfying

$$\begin{aligned} 0<c_1\le \rho (\tau ,x,z_1,z_2)\le c_2 \end{aligned}$$
(1.81)

such that

$$\begin{aligned} \partial _{z_k}[\rho (\tau ,x,z_1,z_2) ({\widehat{A}}_{(0)}(\tau ,x,z_1,z_2))^{-\frac{1}{2}}R(\tau ,x,z_1,z_2) ]\equiv 0, \quad k=1,2. \end{aligned}$$
(1.82)

Then there exists a positive time \(T_{\min {}}\) such that the solution u(txy) of (1.13) with the initial data \(u_0\) exists for \(0\le t\le T_{\min {}}\) and satisfies

$$\begin{aligned} \bigg \Vert u(t,x,y) -\left\{ [\rho ({{\widehat{A}}}_{(0)})^{-1/2}R]\, {\mathcal {U}}^{(0)}(\tau ,x,z_1,z_2)\right\} _{\Big |\begin{array}{c} \tau = \tau (t,x,y,0)\\ z_k=Y^{(k)}(t,x,y)-\frac{\mu ^{(k)}(t,x)}{\varepsilon } \end{array}} \bigg \Vert _{H^{s-1}_{\varepsilon ,T_{\min {}}}} \le C\varepsilon , \end{aligned}$$
(1.83)

where \(\tau \), \(\mu ^{(k)}\), and \(Y^{(k)}\) are defined in (1.62), (1.58), and (1.57), respectively, and the asymptotic profile \(\mathcal {U}^{(0)}(\tau ,x,z_1,z_2)\) is the unique solution of

$$\begin{aligned} &\mathbb {P}\big [ {\mathcal {U}}^{(0)}_\tau + {\mathcal {B}}_{(0)}{\mathcal {U}}^{(0)}_x +{{\mathcal {D}}}_{(0)}^{(1)}\mathcal {U}^{(0)}_{z_1}+{{\mathcal {D}}}_{(0)}^{(2)}\mathcal {U}^{(0)}_{z_2}+{{\mathcal {F}}}_{(0)}\big ]=0, \\&{\mathcal {L}} {\mathcal {U}}^{(0)}=0, \\&{\mathcal {U}}^{(0)}(0,x,z_1,z_2)={\mathcal {U}}_0(x,z_1,z_2) \mathrel {:=}\begin{pmatrix} e^{(1)}\cdot \big [ \tfrac{1}{\rho }R^T({{\widehat{A}}}_{(0)})^{1/2}\big ]{\big |_{\tau =0}}\,u_0(x,z_1)\\ e^{(2)}\cdot \big [ \tfrac{1}{\rho }R^T({\widehat{A}}_{(0)})^{1/2}\big ]{\big |_{\tau =0}}\,u_0(x,z_2) \end{pmatrix}. \end{aligned}$$
(1.84)

Moreover, there exists a function \({\mathcal {U}}(\tau ,x,z_1,z_2,\varepsilon )\) and a positive \(\tau _{\min }\) such that

$$\begin{aligned} u(t,x,y)\equiv \left\{ \rho ({\widehat{A}}_{(0)})^{-1/2}R\, \mathcal {U}(\tau ,x,z_1,z_2,\varepsilon )\right\} _{ \Big |\begin{array}{l} \tau = \tau (t,x,y,\varepsilon ) \\ z_k=Y^{(k)}(t,x,y)-\frac{\mu ^{(k)}(t,x)}{\varepsilon }\\ \end{array}} \end{aligned}$$
(1.85)

and

$$\begin{aligned} \max _{0\le \tau \le \tau _{\min {}}}\Vert \mathcal {U}(\tau ,x,z_1,z_2,\varepsilon )- \mathcal {U}^{(0)}(\tau ,x,z_1,z_2)\Vert _{H^{s-1}}\le K\varepsilon . \end{aligned}$$
(1.86)

In particular,

$$\begin{aligned} \Vert u(t,x,y)-\left\{ \rho ({{\widehat{A}}}_{(0)})^{-1/2}R\, \mathcal {U}^{(0)}(\tau ,x,z_1,z_2)\right\} _{\Big |\begin{array}{c} \tau = \tau (t,x,y,0)\\ z_k=Y^{(k)}(t,x,y)-\frac{\mu ^{(k)}(t,x)}{\varepsilon } \end{array}}\Vert _{C^0}\le K\varepsilon . \end{aligned}$$

Remark 1.8

  1.

    As will be shown in the proof of Theorem 1.7, condition (1.80) ensures that the new time variable vanishes identically when the original time variable does. Condition (1.80) is therefore analogous to [13, Assumption 2.3.1] that requires that some linear combination of the phases vanish identically at time zero.

  2.

    As in Theorem 1.5, it would be possible to allow systems of size greater than two in Theorem 1.7, provided that condition (1.49) plus a similar condition on the slow parts \(Y^{(j)}\) of the phases hold, with the same coefficients \(\alpha ^{(j)}\) in both. However, in addition to the restrictions those conditions impose, condition (1.82) would become much more restrictive as the size of the system increases.

  3.

    In all the theorems, the time of existence depends on the initial data only through the norm in which it is assumed to be bounded.

The results for the PDEs (1.10), (1.11), (1.12), and (1.13) will be proven in Sects. 2, 3, 4, and 5, respectively.

2 Scalar PDE with y-periodic coefficients

We begin by showing some properties of the transformation from y to Y. So that the result will also be applicable to system (1.13), we assume (1.17) rather than its normalized version (1.18).

Lemma 2.1

Assume that the coefficients a, b, and c in (1.10) are periodic with period P in y and belong to \(C^r_B\) for some \(r\ge 1\), and that a and |c| satisfy the positivity conditions (1.9) and (1.17), respectively. Then there exists a positive time T such that the following hold for \(0\le t\le T\):

  1.

    The transformation of the independent variables \(y\mapsto Y\) defined by (1.20) is \(C^{r}\), has a \(C^{r}\) inverse \(Y\mapsto y(t,x,Y)\), and satisfies

    $$\begin{aligned} Y(t,x,y+P)\equiv Y(t,x,y)+P,\qquad y(t,x,Y+P)\equiv y(t,x,Y)+P. \end{aligned}$$
    (2.1)
  2. 2.

    For any function \(g(t,x,y,v)\) that is periodic with period P in y, the function \({\widehat{g}}\) defined in (1.22) satisfies

    $$\begin{aligned} \begin{aligned} {\widehat{g}}(\tau ,x,Y+P,v) \equiv {\widehat{g}}(\tau ,x,Y,v), \end{aligned} \end{aligned}$$
    (2.2)

    i.e., \({\widehat{g}}\) is periodic in Y with period P.

Proof

The claimed smoothness of \(Y(t,x,y)\) follows from its definition and the assumed smoothness of the coefficients a, b, and c together with the assumed positivity of |c|. The definitions (1.20) of Y and (1.19) of \(\mu \) together with the assumed positivity of a and |c| ensure that \(Y_y(0,x,y)\ge k>0\), because (1.19) implies that \(\mu _t\) has the same sign as c at time zero. The PDE satisfied by \(\mu \) together with the assumed smoothness and boundedness of the coefficients a, b, and c then ensure that there exists a positive T such that \(Y_y(t,x,y)\ge \frac{k}{2}\) for \(0\le t\le T\). Hence \(Y(t,x,y)\) is invertible with respect to y in that time interval, and the smoothness of its inverse \(y(t,x,Y)\) follows from the smoothness of \(Y(t,x,y)\) together with the positivity of \(Y_y\).

Integrating the equation for \(Y_y\) in (1.20) from y to \(y+P\) and using the PDE (1.19) for \(\mu \) yields

$$\begin{aligned} Y(t,x,y+P)-Y(t,x,y)=P \left( \left\langle {\frac{{a}_{(0)}}{{{{c}_{(0)}}}}}\right\rangle _{\!\!y}\mu _t +\left\langle {\frac{{b}_{(0)}}{{{{c}_{(0)}}}}}\right\rangle _{\!\!y}\mu _x \right) =P, \end{aligned}$$
(2.3)

which is the first identity in (2.1). Applying \(y(t,x,\cdot )\) to both sides of the first identity in (2.1) yields \( y(t,x,Y(t,x,y)+P)=y(t,x,Y(t,x,y+P)) =y+P\), and substituting \(y=y(t,x,Y)\) into the far right and far left of that result shows that the second identity in (2.1) also holds. In particular, for any function \(g(t,x,y,v)\) that is periodic with period P in y, the function \(\widehat{g}(\tau ,x,Y,v)\) defined in (1.22) satisfies

$$\begin{aligned} {\hat{g}}(\tau ,x,Y+P,v)= & {} g(\tau ,x,y(t(\tau ,x),x,Y+P),v) =g(\tau ,x,y(t(\tau ,x),x,Y)+P,v)\\= & {} g(\tau ,x,y(t(\tau ,x),x,Y),v)={\hat{g}}(\tau ,x,Y,v) \end{aligned}$$

which shows that (2.2) holds, since the transformation from t to \(\tau \) only involves the variables t and x and so does not affect periodicity with respect to Y. \(\square \)

Proof of Theorem 1.1

Differentiating the first identity in (2.1) with respect to x shows that \(\frac{\partial Y}{\partial x}_{|y\mapsto y+P}=\frac{\partial Y}{\partial x}\), i.e., \(\frac{\partial Y}{\partial x}\) is periodic in y. Hence \(\widehat{\tfrac{\partial Y}{\partial x}}\) defined as in (1.22) is periodic in Y by (2.2), and the same holds for \(\tfrac{\partial Y}{\partial t}\) and \(\tfrac{\partial Y}{\partial y}\).

Next, the positivity of a and the \(C^0_B\) bound for c imply a positive lower bound for the coefficient \(\left\langle {\frac{{a}_{(0)}}{{{{c}_{(0)}}}}}\right\rangle _{\!\!y}\) of \(\mu _t\) in (1.19). Hence the PDE and initial condition there imply that for sufficiently small times the transformation from t and x to \(\tau \mathrel {:=}\mu (t,x)\) and x is invertible, and in view of the assumed smoothness of the coefficients both \(\mu (t,x)\) and its inverse function \(t(\tau ,x)\) belong to \(C^{s+1}\).

We now calculate that the function u defined by (1.31) will satisfy (1.10) provided that \(U(\tau ,x,z,\eta )\) satisfies the PDE

$$\begin{aligned} \begin{aligned} A(t(\tau ,x),x,&y(t,x,z+\eta ),\varepsilon U,\varepsilon )\left[ U_\tau +\tfrac{1}{\varepsilon }U_\eta \right] +b(t,x,y(t,x,z+\eta ),\varepsilon U,\varepsilon )U_x \\ {}&+D(t,x,y(t,x,z+\eta ),U,\varepsilon )U_z+f(t,x,y(t,x,z+\eta ),U,\varepsilon )=0, \end{aligned}\nonumber \\ \end{aligned}$$
(2.4)

where A and D are defined in (1.23)–(1.24). The initial data for U is

$$\begin{aligned} U(0,x,z,\eta )=u^0(x,y(0,x,z)). \end{aligned}$$
(2.5)

The positivity of a, together with the PDE and initial condition satisfied by \(\mu \) and the fact that a depends on U only through \(\varepsilon U\) imply that A is positive up to some positive time T independent of \(\varepsilon \). Hence we can divide (2.4) by A to obtain a PDE in which both the time derivative and the large term have constant coefficients. Standard \(C^0\) estimates along characteristics, similar to those in [12, Sect. 6.2], then show that U and its spatial derivatives through order s are uniformly bounded up to some time independent of \(\varepsilon \). Moreover, since the initial data is independent of \(\eta \), \(U_\tau \) is uniformly bounded initially, and then similar estimates show that \(U_\tau \) is uniformly bounded in \(C^{s-1}\). Standard results for singular limits [22, Theorem 2.1, Theorem 2.3, Corollary 2.4], adapted to \(C^s\) spaces rather than the standard \(H^s\) spaces, therefore yield the convergence of U to a limit that is independent of \(\eta \) and satisfies the limit PDE obtained by averaging over \(\eta \) the PDE obtained by dividing (2.4) by A, thereby eliminating the large term, and taking the limit of the result as \(\varepsilon \rightarrow 0\). Since shifting the independent variable preserves averages over that variable, for any function Q

$$\begin{aligned} \left\langle {Q(z+\eta )}\right\rangle _{\eta }=\left\langle {Q(Y)}\right\rangle _{Y}. \end{aligned}$$
(2.6)
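
For definiteness, here is a one-line verification of (2.6), assuming (as is the case here by Lemma 2.1) that Q is periodic with period P and that \(\left\langle {\cdot }\right\rangle \) denotes the mean over one period:

$$\begin{aligned} \left\langle {Q(z+\eta )}\right\rangle _{\eta }=\frac{1}{P}\int _0^P Q(z+\eta )\,d\eta =\frac{1}{P}\int _z^{z+P} Q(Y)\,dY =\frac{1}{P}\int _0^P Q(Y)\,dY =\left\langle {Q(Y)}\right\rangle _{Y}, \end{aligned}$$

where the third equality uses the P-periodicity of Q.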

In addition, the standard results just cited show that under the extra smoothness assumption the solution exists for as long as the limit exists and converges at the rate \(O(\varepsilon )\) to that limit. Translating those results back from U to u yields the conclusions of the theorem. \(\square \)

3 Scalar parabolic PDE

Define

$$\begin{aligned} G(t,x,U,\varepsilon )\mathrel {:=}g(t,x,U,\varepsilon ),\quad H(t,x,U,\varepsilon )\mathrel {:=}h(t,x,U,\varepsilon )-2g(t,x,U,\varepsilon )\mu _x.\nonumber \\ \end{aligned}$$
(3.1)

We look for a solution u having the form (1.39), where \(\mu \) is the unique solution of (1.33) and \(U(t,x,z,\varepsilon )\) will be the solution of

$$\begin{aligned} \begin{aligned} a(t,x,\varepsilon U)U_t&+ b(t,x,\varepsilon U)U_x +D(t,x,U,\varepsilon )U_z+f(t,x,U) \\ {}&=\varepsilon ^2G(t,x,U,\varepsilon )U_{xx}+\varepsilon H(t,x,U,\varepsilon )U_{xz}+K(t,x,U,\varepsilon )U_{zz} \end{aligned} \end{aligned}$$
(3.2)

satisfying \(U(0,x,z,\varepsilon )=u_0(x,z)\), where D and K were defined in (1.34), (1.35).

Proof of Theorem 1.3

Plugging (1.39) into (1.11) yields

$$\begin{aligned} 0&=a U _t+b U _x+d U _z +f-\varepsilon ^2 g U _{xx}-\varepsilon hU_{xz}- kU_{zz} \\&\qquad -\frac{U_z}{\varepsilon }\left( a\mu _t+b\mu _x-c\right) -g\mu _x^2U_{zz}+h\mu _xU_{zz}+\varepsilon g\left[ \mu _{xx}U_z+2\mu _xU_{xz}\right] , \end{aligned}$$

which by the definitions of \(\mu \) in (1.33), of D in (1.34), of G and H in (3.1), and of K in (1.35) reduces to (3.2). This shows that if \(\mu \) satisfies (1.33) and U satisfies (3.2) then the function u defined by (1.39) satisfies (1.11).
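
For the reader's convenience, the plugging-in above uses the chain-rule identities (a brief sketch, assuming, as the notation indicates, that the ansatz (1.39) is \(u(t,x,y)=U(t,x,z,\varepsilon )\) with \(z=y-\tfrac{\mu (t,x)}{\varepsilon }\)):

$$\begin{aligned} u_t=U_t-\tfrac{\mu _t}{\varepsilon }U_z,\qquad u_x=U_x-\tfrac{\mu _x}{\varepsilon }U_z,\qquad u_y=U_z, \end{aligned}$$

$$\begin{aligned} u_{xx}=U_{xx}-\tfrac{2\mu _x}{\varepsilon }U_{xz}+\tfrac{\mu _x^2}{\varepsilon ^2}U_{zz}-\tfrac{\mu _{xx}}{\varepsilon }U_z,\qquad u_{xy}=U_{xz}-\tfrac{\mu _x}{\varepsilon }U_{zz},\qquad u_{yy}=U_{zz}. \end{aligned}$$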

Since equation (1.33) is linear, the solution \(\mu \) belongs to \(C^{s+2}_B\) for all time. Note also that \(\mu \) does not depend on \(\varepsilon \). Hence the coefficients of (3.2) belong to \(C^{s}_B\). Moreover, equation (3.2) for U is nonuniformly parabolic in the sense used in [25], i.e., for all real \(\alpha \) and \(\beta \) and all values of the arguments \((t,x,U,\varepsilon )\),

$$\begin{aligned}&\varepsilon ^2\alpha ^2 G(t,x,U,\varepsilon )+\varepsilon \alpha \beta H(t,x,U,\varepsilon )+\beta ^2 K(t,x,U,\varepsilon ) \\&\quad = \left( \varepsilon \alpha -\beta \mu _x\right) ^2g(t,x,U,\varepsilon )+\left( \varepsilon \alpha -\beta \mu _x\right) \beta h(t,x,U,\varepsilon )+\beta ^2 k(t,x,U,\varepsilon ) \ge 0 \end{aligned}$$

where the inequality on the right-hand side follows from the parabolicity assumption (1.36) with \({\tilde{\alpha }}=\varepsilon \alpha -\beta \mu _x\) and \({\tilde{\beta }}=\beta \). In particular, taking \(\alpha \) equal to zero shows that

$$\begin{aligned} K(t,x,U)\ge 0. \end{aligned}$$
(3.3)
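
The quadratic-form identity displayed above can be checked by expanding its right-hand side and collecting powers of \(\varepsilon \alpha \) and \(\beta \):

$$\begin{aligned} (\varepsilon \alpha -\beta \mu _x)^2g+(\varepsilon \alpha -\beta \mu _x)\beta h+\beta ^2 k =\varepsilon ^2\alpha ^2 g+\varepsilon \alpha \beta (h-2g\mu _x)+\beta ^2(k-h\mu _x+g\mu _x^2), \end{aligned}$$

in which the coefficients of \(\varepsilon ^2\alpha ^2\) and \(\varepsilon \alpha \beta \) are G and H from (3.1); the coefficient of \(\beta ^2\) should likewise agree with the definition of K in (1.35), which is not restated here.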

The assumptions of Theorem 1.3 imply that the conditions of [25, Theorem 4.1] hold, so by that theorem there exists a unique solution U of (3.2) having an \(H^s\) bound on some time interval [0, T]. Since the coefficients in (3.2) depend smoothly on \(\varepsilon \), that bound and time of existence are independent of \(\varepsilon \) for \(0<\varepsilon \le \varepsilon _0\).

Hence the right-hand side of (1.39) exists for at least a time independent of \(\varepsilon \) and satisfies (1.11) with initial data \(u_0\). Since [25, Theorem 4.1] can also be applied to (1.11) for fixed \(\varepsilon \) and that theorem includes a uniqueness result, the identity (1.39) must hold.

Similarly, in view of (3.3), the PDE (1.37) also satisfies the assumptions of [25, Theorem 4.1], so under the additional assumptions of the theorem the solution \(U^{(0)}\) of that equation exists and belongs to \(H^s\) on some time interval \([0,T^{(0)}]\).

To prove the error estimate, note that the corrector \(U^{(1)}=\frac{U-U^{(0)}}{\varepsilon }\) satisfies

$$\begin{aligned} \begin{aligned}&a(t,x,\varepsilon U,\varepsilon )U^{(1)}_t\!+\! b(t,x,\varepsilon U,\varepsilon )U^{(1)}_x\!+\!D(t,x,U,\varepsilon )U^{(1)}_z \!+\!\tfrac{f(t,x,U,\varepsilon )-{f}_{(0)}(t,x,U^{(0)})}{\varepsilon }\\&\hspace{1cm}- \varepsilon ^2 G(t,x,U,\varepsilon )U^{(1)}_{xx}-\varepsilon H(t,x,U,\varepsilon )U^{(1)}_{xz} -K(t,x,U,\varepsilon )U^{(1)}_{zz} \\ {}&\quad =R\mathrel {:=}-\delta a\,U^{(0)}_t- \delta b\,U^{(0)}_x-\delta D\,U^{(0)}_z- \delta f + \varepsilon G \,U^{(0)}_{xx}+ H\,U^{(0)}_{xz} +\delta K\,U^{(0)}_{zz}, \end{aligned}\nonumber \\ \end{aligned}$$
(3.4)

where for any function M the coefficient \(\delta M\) is defined to be \(\frac{M-{M}_{(0)}}{\varepsilon }\). The coefficients D, f, and K involve U without a factor of \(\varepsilon \), since \(\varepsilon \) multiplies neither the argument U nor the function as a whole. Hence for \(M\in \{D,f,K\}\) the difference \(\delta M\) includes terms of the form \(\frac{M(t,x,U,\varepsilon )-M(t,x,U^{(0)},0)}{\varepsilon }\), which in view of the fact that \(U=U^{(0)}+\varepsilon U^{(1)}\) can be written as

$$\begin{aligned} \begin{aligned}&\tfrac{M(t,x,U,\varepsilon )-M(t,x,U^{(0)},0)}{\varepsilon }=\tfrac{M(t,x,U,\varepsilon )-M(t,x,U,0)}{\varepsilon }+\tfrac{M(t,x,U,0)-M(t,x,U^{(0)},0)}{\varepsilon }\\ {}&=\tfrac{M(t,x,U,\varepsilon )-M(t,x,U,0)}{\varepsilon }+\tfrac{1}{\varepsilon }\int _0^1 \tfrac{d}{ds} M(t,x,U^{(0)}+\varepsilon s U^{(1)},0)\,ds \\ {}&=\tfrac{M(t,x,U,\varepsilon )-M(t,x,U,0)}{\varepsilon }+\left[ \int _0^1 M_U(t,x,U^{(0)}+\varepsilon s U^{(1)},0)\,ds\right] U^{(1)} \\ {}&=\tfrac{M(t,x,U,\varepsilon )-M(t,x,U,0)}{\varepsilon }+\left[ \int _0^1 M_U(t,x,(1-s) U^{(0)}+ s U,0)\,ds\right] U^{(1)}. \end{aligned} \end{aligned}$$
(3.5)

After subtracting \(\frac{M(t,x,U,0)-M(t,x,U^{(0)},0)}{\varepsilon }\) from R on the right side of (3.4) and subtracting the equivalent expression \(\left[ \int _0^1 M_U(t,x,(1-s) U^{(0)}+ s U,0)\,ds\right] U^{(1)}\) from the left side of that equation, the left side of the modified equation is linear in \(U^{(1)}\) and the modified R contains no explicit dependence on \(U^{(1)}\) but only on U and \(U^{(0)}\). Moreover, the assumed smoothness of the coefficients together with the fact that U and \(U^{(0)}\) belong to \(H^s\), plus estimates like that in (3.5) but without needing to express U in terms of \(U^{(0)}\) and \(U^{(1)}\), ensure that the modified R is bounded in \(H^{s-2}\) by a constant independent of \(\varepsilon \). Since \(U^{(1)}\) equals \(\frac{U-U^{(0)}}{\varepsilon }\) it certainly exists and belongs to \(H^s\) on \([0,\min (T,T^{(0)})]\), and the coefficients of the equation it satisfies belong to \(C^{s-2}\), while the inhomogeneous term belongs to \(H^{s-2}\). Hence we can apply [25, Theorem 2.7] with s replaced by \(s-2\) to obtain a uniform bound for \(U^{(1)}\). Moreover, since (3.4) is linear in \(U^{(1)}\), that bound holds on \([0,\min (T,T^{(0)})]\), which yields (1.40) and hence also (1.38). \(\square \)

4 Symmetric hyperbolic system

Proof of Theorem 1.5

The conditions on the fast phases \(\mu ^{(j)}\) in (1.41) and (1.50) together with assumption (1.43) and the boundedness of the coefficients of (1.12) ensure that the variable \(\tau \) defined by (1.44) satisfies

$$\begin{aligned} \tau (0,x)\equiv 0,\qquad 0< c_-\le \tau _t(0,x)\le c_+<\infty . \end{aligned}$$
(4.1)

Hence there is a positive \(T_1\) such that the change of variables \((t,x)\mapsto (\tau ,x)\) is one to one for \(0\le t\le T_1\) and a positive \({\widetilde{T}}_1\) such that its inverse \(t(\tau ,x)\) is defined and one to one for \(0\le \tau \le {\widetilde{T}}_1\). Moreover, (1.42)–(1.43) plus (1.52) ensure that \({{\widehat{A}}}_{(0)}{\big |_{t=0}}\ge cI\) for some positive c, which by the smoothness of A ensures that that condition continues to hold up to some positive time with a possibly smaller yet still positive constant.

We look for solutions having the form

$$\begin{aligned} u(t,x,\varepsilon )=U(\tau (t,x),x,y-\tfrac{\mu ^{(1)}(t,x)}{\varepsilon },y-\tfrac{\mu ^{(2)}(t,x)}{\varepsilon },\varepsilon ). \end{aligned}$$
(4.2)

Substituting (4.2) into (1.12) shows that u will satisfy (1.12) provided that \(U(\tau ,x,z_1,z_2,\varepsilon )\) satisfies

$$\begin{aligned} \begin{aligned} {\widehat{A}}(\tau ,x,\varepsilon U,\varepsilon ) U_\tau&+{\widehat{B}}(\tau ,x,\varepsilon U,\varepsilon )U_x +\tfrac{1}{\varepsilon }\left( {\widehat{C}}^{(1)}(\tau ,x) U_{z_1} + {\widehat{C}}^{(2)}(\tau ,x) U_{z_2}\right) \\ {}&+{\widehat{D}}^{(1)}(\tau ,x, U,\varepsilon ) U_{z_1}+{\widehat{D}}^{(2)}(\tau ,x, U,\varepsilon ) U_{z_2} +{\widehat{f}}(\tau ,x,U,\varepsilon )=0, \end{aligned} \end{aligned}$$
(4.3)

where the \(\;\widehat{}\;\) coefficients are defined in (1.45).

The first simple yet key observation is that by construction

$$\begin{aligned} {\widehat{C}}^{(2)}-{\widehat{C}}^{(1)}\equiv {{\widehat{A}}}_{(0)}. \end{aligned}$$
(4.4)

Applying \(({{\widehat{A}}}_{(0)})^{-1/2}\) on both the left and right of both sides of (4.4) yields

$$\begin{aligned} {\widetilde{C}^{(2)}} - {\widetilde{C}^{(1)}}=I, \end{aligned}$$
(4.5)

where the \(\;\widetilde{}\;\) coefficients are defined in (1.46). Equation (4.5) implies that the eigenvectors \(r^{(j)}(\tau ,x)\) of \({\widetilde{C}}^{(1)}\) defined after (1.46) are also eigenvectors of \({\widetilde{C}}^{(2)}\). Also, since the matrix \({\widetilde{C}}^{(1)}\) is symmetric, the set of its orthonormal eigenvectors \(r^{(j)}(\tau ,x)\) forms a basis, and hence the matrix R whose columns equal those eigenvectors is an orthogonal matrix.
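Explicitly, the first of these claims is a one-line computation: if \({\widetilde{C}}^{(1)}r=\lambda r\), then (4.5) gives

$$\begin{aligned} {\widetilde{C}}^{(2)}r=\big ({\widetilde{C}}^{(1)}+I\big )r=(\lambda +1)r. \end{aligned}$$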

The second simple key observation is that not only does \(\det {{\widetilde{C}}}^{(j)}=0\) for \(j=1,2\) hold on account of the definition of the \(\mu ^{(j)}\), but more generally the assumption (1.49) implies that

$$\begin{aligned} \det (\alpha ^{(j)} {\widetilde{C}}^{(1)}+(1-\alpha ^{(j)}){{\widetilde{C}}}^{(2)})=0, \qquad \text {for all}\, j. \end{aligned}$$
(4.6)

After relabeling if necessary, and allowing repeated solutions \(\mu ^{(j)}\) of (1.41) and \(\alpha ^{(j)}\) of (1.49), the eigenvectors \(r^{(j)}\) are the null eigenvectors of the matrices appearing inside the determinant in (4.6). Combining the resulting equation

$$\begin{aligned} \left[ \alpha ^{(j)} {\widetilde{C}}^{(1)}+(1-\alpha ^{(j)}){{\widetilde{C}}}^{(2)}\right] r^{(j)}=0 \end{aligned}$$
(4.7)

with (4.5) shows that

$$\begin{aligned} {{\widetilde{C}}}^{(1)} r^{(j)}=(\alpha ^{(j)}-1)r^{(j)}, \qquad {{\widetilde{C}}}^{(2)} r^{(j)}=\alpha ^{(j)}r^{(j)}. \end{aligned}$$
(4.8)
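
To spell out the step from (4.7) to (4.8), substitute \({\widetilde{C}}^{(2)}={\widetilde{C}}^{(1)}+I\) from (4.5) into (4.7):

$$\begin{aligned} 0=\left[ \alpha ^{(j)} {\widetilde{C}}^{(1)}+(1-\alpha ^{(j)})\big ({\widetilde{C}}^{(1)}+I\big )\right] r^{(j)} =\left[ {\widetilde{C}}^{(1)}+(1-\alpha ^{(j)})I\right] r^{(j)}, \end{aligned}$$

which yields the first identity in (4.8); the second then follows by applying (4.5) once more.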

The identities (4.8) together with the constancy of the \(\alpha ^{(j)}\) ensure that the multiplicity of each eigenvalue of the \({{\widetilde{C}}}^{(j)}\) is constant for sufficiently small times, and hence that the \(r^{(j)}(\tau ,x)\) can be chosen to be as smooth as the matrices \({{\widetilde{C}}}^{(j)}\). Hence the matrices \(\mathcal {A}\), \({\mathcal {B}}\), \({\mathcal {C}}^{(j)}\), and \({\mathcal {D}}\), and the vector \({\mathcal {F}}\) defined in (1.47)–(1.48) also belong to \(C^s\). Moreover, the identities (4.8) then imply that

$$\begin{aligned} {\mathcal {C}}^{(1)}=\mathop {\textrm{diag}}\limits (\alpha ^{(j)}-1),\qquad {\mathcal {C}}^{(2)}=\mathop {\textrm{diag}}\limits (\alpha ^{(j)}). \end{aligned}$$
(4.9)

Hence, upon substituting

$$\begin{aligned} U=({{\widehat{A}}}_{(0)})^{-1/2}R\,{\mathcal {U}} \end{aligned}$$
(4.10)

into (4.3) and multiplying the result by \(R^T({{\widehat{A}}}_{(0)})^{-1/2}\) we obtain that \({\mathcal {U}}\) satisfies

$$\begin{aligned} {\mathcal {A}} {\mathcal {U}}_\tau +{\mathcal {B}}\mathcal {U}_x +\tfrac{1}{\varepsilon }{\mathcal {L}}{\mathcal {U}} +{\mathcal {D}}^{(1)}\mathcal {U}_{z_1}+{\mathcal {D}}^{(2)}{\mathcal {U}}_{z_2}+{\mathcal {F}}=0, \end{aligned}$$
(4.11)

where

$$\begin{aligned} {\mathcal {L}}\mathrel {:=}{\mathcal {C}}^{(1)} \partial _{z_1}+{\mathcal {C}}^{(2)}\partial _{z_2}. \end{aligned}$$
(4.12)

Formulas (1.46) and (1.47) together with the orthogonality of the matrix R ensure that

$$\begin{aligned} {{\mathcal {A}}}_{(0)}\equiv I, \end{aligned}$$
(4.13)

while substituting (4.9) into (4.12) shows that definition (4.12) agrees with the previous definition (1.51) of \({\mathcal {L}}\); in particular, \({\mathcal {L}}\) has constant coefficients.

We now discuss the initial data \({\mathcal {U}}_0\) for the system (4.11). The condition that \({\mathcal {U}}_0\) correspond to the initial data \(u_0(x,y)\) of the original system (1.12) is

$$\begin{aligned} {\mathcal {U}}_0(x,y,y)=\big [ R^T({\widehat{A}}_{(0)})^{1/2}\big ]{\big |_{\tau =0}} \; u_0(x,y). \end{aligned}$$
(4.14)

The third and final key observation is that the non-uniqueness of \({\mathcal {U}}_0\) arising from the two occurrences of y on the left side of (4.14) can be utilized to make \({\mathcal {U}}_0\) satisfy

$$\begin{aligned} {{\mathcal {L}}}{\mathcal {U}}_0(x,z_1,z_2)=0, \end{aligned}$$
(4.15)

which will ensure that

$$\begin{aligned} \begin{aligned} \Vert {\mathcal {U}}_\tau (0,x,z_1,z_2)\Vert _{H^{s-1}}\le C \qquad \text{ with } C \text{ independent } \text{ of } \varepsilon . \end{aligned} \end{aligned}$$
(4.16)

Formula (1.51) implies that in order to obtain (4.15) the first component of \({\mathcal {U}}\) should be independent of \(z_2\), its second component should not depend on \(z_1\), and in general component j should depend on \(z_1\) and \(z_2\) only via the combination \(\alpha ^{(j)} z_1+(1-\alpha ^{(j)} )z_2\). Hence, by construction, the formula for \({\mathcal {U}}_0\) in (1.54) satisfies both (4.14) and (4.15).
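
A one-line check of the last statement: if the j-th component of \({\mathcal {U}}_0\) has the form \(F_j\big (\alpha ^{(j)}z_1+(1-\alpha ^{(j)})z_2\big )\), then by (1.51) (equivalently (4.9))

$$\begin{aligned} (\alpha ^{(j)}-1)\partial _{z_1}F_j+\alpha ^{(j)}\partial _{z_2}F_j =\left[ (\alpha ^{(j)}-1)\alpha ^{(j)}+\alpha ^{(j)}(1-\alpha ^{(j)})\right] F_j'=0. \end{aligned}$$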

Although the matrix \({\mathcal {A}}\) multiplying the time derivatives in (4.11) depends in general on \(\tau \) and x as well as \(\varepsilon {\mathcal {U}}\), the fact that \({{\mathcal {A}}}_{(0)}=I\) together with the smoothness of the dependence of \({\mathcal {A}}\) on its arguments ensures that \({\mathcal {A}}_\tau \) and \({\mathcal {A}}_x\) are \(O(\varepsilon )\), just as \(\nabla _{\!{\mathcal {U}}}{\mathcal {A}}=O(\varepsilon )\). Together with the fact that the large operator has constant coefficients, this implies that for some positive time \(T_2\) standard energy estimates for the system (4.11) yield a bound for \(\Vert {\mathcal {U}}\Vert _{H^s}\) that is uniform in \(\varepsilon \), as in the classical theorem [18, Theorem 2.3] for singular limits that assumes that \({\mathcal {A}}\) depends only on \(\varepsilon \mathcal {U}\). Furthermore, the uniform bound (4.16) on the initial value of \({\mathcal {U}}_\tau \) implies, again as in the classical case, a uniform bound for the \(H^{s-1}\) norm of \(\mathcal {U}_\tau \) on the same time interval. The proof of the convergence result for the classical case [18, Theorem 2.3 and comments in its proof], [21, Theorem 2] remains valid when the limit is given in the form (1.54) even without the assumption [21, (2.7)] on the rank of the large operator. This shows that as \(\varepsilon \rightarrow 0\), \({\mathcal {U}}\) converges in \(H^r\) for \(r<s\) to the unique solution of the initial-value problem (1.54). As in the proof of Theorem 1.1, [22, Corollary 2.4] shows that the rate of convergence is \(O(\varepsilon )\), i.e., (1.56) holds. Transforming back to the original variables yields (1.53). \(\square \)

5 Special \(2\times 2\) systems with y-dependence

Proof of Theorem 1.7

The uniform positivity of \(A_{(0)}\) together with the uniform invertibility of C ensures that the eigenvalues of \(A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\) are bounded away from zero. The continuity of those eigenvalues together with (1.60) therefore ensures that

$$\begin{aligned} \begin{aligned}&\lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big ){\big |_{t=0}}, \qquad \lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big ){\big |_{t=0}}, \\ {}&\text {and}\qquad \left. \left[ \lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )- \lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )\right] \right| _{{t=0}} \end{aligned} \end{aligned}$$

each have a fixed sign. In turn, this ensures that the averages over y of those eigenvalues and of their difference have the same fixed signs as before averaging, and are bounded away from zero. We can therefore replace some or all of the expressions in (1.61) by their averages and that result will still hold. Using this fact we will prove the claims made during the presentation of the definitions used in the theorem, namely that \(\tau \) is invertible with respect to the variable t, that Q is invertible with respect to y, and that \({{\widehat{A}}}_{(0)}\) is positive definite at time zero. In addition, we will show that \(\tau \) is identically zero when \(t=0\). In particular, we will show that up to some positive time

$$\begin{aligned} \mu ^{(1)}_t(t,x)-\mu ^{(2)}_t(t,x)\ge C_1,\qquad \Vert \partial _{t,x}\mu ^{(k)}, \partial _{t,x,y} Y^{(k)},\partial _{t,x,z_1,z_2}{\mathcal {Y}}\Vert _{C^{s-1}}\le C_2,\nonumber \\ \end{aligned}$$
(5.1)

which by the definition (1.62) of \(\tau \) implies the desired invertibility of \(\tau \).

The null initial condition for \(\mu ^{(k)}\) in (1.58) together with the smoothness of the coefficients in the PDE for \(\mu \) there ensures that \(\frac{\mu ^{(k)}}{t}\) remains bounded as \(t\rightarrow 0\) and satisfies

$$\begin{aligned} \lim _{t\rightarrow 0} \tfrac{\mu ^{(k)}}{t}=\mu ^{(k)}_t(0,x)=\frac{1}{\left\langle {\lambda _k(C^{-1}A_{(0)}{\big |_{t=0}})}\right\rangle _{\!y}} =\frac{1}{\left\langle {\lambda _k\big ( A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}{\big |_{t=0}}\big )}\right\rangle _{\!y}},\nonumber \\ \end{aligned}$$
(5.2)

where we have used the fact that matrices that are similar have the same eigenvalues. In particular, the eigenvalues of \(C^{-1}A_{(0)}{\big |_{t=0}}\) are real, bounded, and bounded away from zero, and by (1.60) and the discussion above their difference and the difference of their averages are each at least a fixed constant. Consequently, \(\mu ^{(k)}_t(0,x)\) is bounded and bounded away from zero. Since real eigenvalues of a \(2\times 2\) matrix with real coefficients can become complex only after they coalesce, condition (1.60) then ensures that up to some time independent of \(\varepsilon \) the eigenvalues of the matrix appearing on the right sides of (1.57)–(1.58) are distinct and real, even though that matrix is not necessarily conjugate to a symmetric matrix for \(t\ne 0\). The smoothness of the coefficient matrices together with (5.2) then ensures that the bounds on \(\mu \) and Y in the second inequality in (5.1) hold. Moreover, once \(Q_y\) is shown to be bounded and bounded away from zero the bounds in (5.1) for \({\mathcal {Y}}\) will also hold. Finally, by (1.60) plus the discussion above and the smoothness of \(\mu \), (5.2) implies that the first inequality in (5.1) holds.

Substituting (5.2) into (1.57) and using once more the initial condition satisfied by \(\mu ^{(k)}_x\) shows that

$$\begin{aligned} Y^{(k)}_y(0,x,y)=\left. \tfrac{\lambda _k\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}{\left\langle {\lambda _k\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}\right\rangle _{\!y}} \;\right| _{{t=0}}. \end{aligned}$$
(5.3)

Substituting (5.2)–(5.3) into the definition (1.64) of Q and using the assumptions (1.52), (1.79), (1.60), and the \(C^0_B\) bounds on the coefficient matrices A and C shows that Q remains bounded as \(t\rightarrow 0\) and

$$\begin{aligned} \begin{aligned} \lim _{t\rightarrow 0} Q_y(t,x,y)&=\mu ^{(1)}_t(0,x)\mu ^{(2)}_t(0,x)\left[ \lambda _2(C^{-1}A_{(0)}{\big |_{t=0}})-\lambda _1(C^{-1}A_{(0)}{\big |_{t=0}})\right] \\ {}&= \left. \frac{\lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )-\lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}{ \left\langle {\lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}\right\rangle _{\!y}\left\langle {\lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}\right\rangle _{\!y}} \;\right| _{{t=0}} \end{aligned} \end{aligned}$$
(5.4)

which by (1.61) plus the discussion above is bounded from below by a positive constant. The smoothness of \(\mu \) and of the coefficient matrices A and C together with (5.2) implies that \(Q_y\) is \(C^s\); since its limit as \(t\rightarrow 0\) is positive, \(Q_y\) remains positive, and hence Q is invertible with respect to y, at least up to some positive time, and \(Q^{-1}\) is \(C^s\) up to that time. As noted above, this implies the estimates for \({\mathcal {Y}}\) in (5.1).

To show that \(\tau (0,x,y,\varepsilon )\equiv 0\), subtract (5.3) with \(k=1\) from the same equation with \(k=2\) to obtain

$$\begin{aligned} Y^{(2)}_y(0,x,y)-Y^{(1)}_y(0,x,y)= \left. \tfrac{\lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}{\left\langle {\lambda _2\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}\right\rangle _{\!y}} -\tfrac{\lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}{\left\langle {\lambda _1\big (A_{(0)}^{\frac{1}{2}}C^{-1}A_{(0)}^{\frac{1}{2}}\big )}\right\rangle _{\!y}} \;\right| _{{t=0}}, \nonumber \\ \end{aligned}$$
(5.5)

which by assumption (1.80) is identically zero. Integrating (5.5) with respect to y and using the second condition in (1.57) or its alternative (1.59) then yields \(Y^{(1)}(0,x,y)-Y^{(2)}(0,x,y)\equiv 0\). Hence, in view of the initial condition \(\mu ^{(k)}(0,x)\equiv 0\) from (1.58), and the definition (1.62) of \(\tau \), \(\tau (0,x,y,\varepsilon )\) is indeed identically zero. In addition, the first estimate in (5.1) together with the definition of \(\tau \) implies that \(\tau >0\) when \(t>0\).

We next show that \({\widehat{A}}_{(0)}\) is positive definite at time zero, and hence also up to some positive time. Using once more the fact that the initial condition for \(\mu \) from (1.58) implies that \(\mu ^{(k)}_x(0,x)\equiv 0\) together with the formulas (5.2)–(5.3) for the initial values of \(\mu ^{(k)}_t\) and \(Y^{(k)}_y\) yields the formula

$$\begin{aligned} \begin{aligned} {\widehat{A}}_{(0)}(0&,x,z_1,z_2,\varepsilon )= \left[ \tfrac{1}{\left\langle {\lambda _1}\right\rangle _{y}}-\tfrac{1}{\left\langle {\lambda _2}\right\rangle _{y}}\right] A_{(0)}+ \left[ \tfrac{\lambda _2}{\left\langle {\lambda _2}\right\rangle _{\!y}} -\tfrac{\lambda _1}{\left\langle {\lambda _1}\right\rangle _{\!y}}\right] C = A_{(0)}^{\tfrac{1}{2}} M A_{(0)}^{\tfrac{1}{2}}, \\ {}&\text {where}\quad M\mathrel {:=}\left[ \tfrac{1}{\left\langle {\lambda _1}\right\rangle _{y}}-\tfrac{1}{\left\langle {\lambda _2}\right\rangle _{y}}\right] + \left[ \tfrac{\lambda _2}{\left\langle {\lambda _2}\right\rangle _{\!y}} -\tfrac{\lambda _1}{\left\langle {\lambda _1}\right\rangle _{\!y}}\right] A_{0}^{-\frac{1}{2}}C A_{0}^{-\frac{1}{2}}, \end{aligned}\nonumber \\ \end{aligned}$$
(5.6)

in which \(\lambda _k\) denotes \(\lambda _k\big (A_{(0)}^{\frac{1}{2}}C^{-1}A^{\frac{1}{2}}_{(0)}\big )\). Since the matrix \(A_{0}^{-\frac{1}{2}}C A_{0}^{-\frac{1}{2}}\) appearing in the last line of (5.6) is the inverse of \(A_{(0)}^{\frac{1}{2}}C^{-1}A^{\frac{1}{2}}_{(0)}\), its eigenvalues are the reciprocals \(\frac{1}{\lambda _k}\), and hence the eigenvalues of the matrix M appearing in (5.6) are

$$\begin{aligned} \left[ \tfrac{1}{\left\langle {\lambda _1}\right\rangle _{y}}-\tfrac{1}{\left\langle {\lambda _2}\right\rangle _{y}}\right] +\frac{\tfrac{\lambda _2}{\left\langle {\lambda _2}\right\rangle _{\!y}} -\tfrac{\lambda _1}{\left\langle {\lambda _1}\right\rangle _{\!y}}}{\lambda _k}, \quad k=1,2, \end{aligned}$$

which reduce to

$$\begin{aligned} \frac{\lambda _2-\lambda _1}{\lambda _1\left\langle {\lambda _2}\right\rangle _{y}}\qquad \text {and}\qquad \frac{\lambda _2-\lambda _1}{\left\langle {\lambda _1}\right\rangle _{y}\lambda _2}. \end{aligned}$$
(5.7)
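
For instance, for \(k=1\),

$$\begin{aligned} \left[ \tfrac{1}{\left\langle {\lambda _1}\right\rangle _{y}}-\tfrac{1}{\left\langle {\lambda _2}\right\rangle _{y}}\right] +\frac{\tfrac{\lambda _2}{\left\langle {\lambda _2}\right\rangle _{y}} -\tfrac{\lambda _1}{\left\langle {\lambda _1}\right\rangle _{y}}}{\lambda _1} =\tfrac{1}{\left\langle {\lambda _1}\right\rangle _{y}}-\tfrac{1}{\left\langle {\lambda _2}\right\rangle _{y}} +\tfrac{\lambda _2}{\lambda _1\left\langle {\lambda _2}\right\rangle _{y}}-\tfrac{1}{\left\langle {\lambda _1}\right\rangle _{y}} =\frac{\lambda _2-\lambda _1}{\lambda _1\left\langle {\lambda _2}\right\rangle _{y}}, \end{aligned}$$

and the case \(k=2\) is analogous.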

The normalization (1.61), together with the fact shown above that any of the expressions in that inequality may be replaced by their averages, ensures that both eigenvalues of the symmetric matrix M are positive, and hence that matrix is positive definite. Since \(A_{(0)}\) is also positive definite by assumption (1.52), (5.6) shows that \({\widehat{A}}_{(0)}\) is positive definite at time zero, and hence also up to some positive time.

We now turn to analyzing the matrices \({{\widetilde{C}}}_{(0)}^{(k)}\) and \({{\mathcal {C}}}_{(0)}^{(k)}\). Substituting (1.57) into the definition of \({\widehat{C}}^{(k)}\) in (1.72), using definition (1.73), and pulling out a factor of \(\mu ^{(k)}_tC\) from the result yields

$$\begin{aligned} {{\widehat{C}}}_{(0)}^{(k)}=\mu ^{(k)}_t C \left\{ \lambda _k\big (C^{-1}A_{(0)}+\tfrac{\mu ^{(k)}_x}{\mu ^{(k)}_t}C^{-1}B_{(0)}\big ) - \big [C^{-1}A_{(0)}+\tfrac{\mu ^{(k)}_x}{\mu ^{(k)}_t}C^{-1}B_{(0)}\big ]\right\} ,\nonumber \\ \end{aligned}$$
(5.8)

which shows that the matrices \({{\widehat{C}}}_{(0)}^{(k)}\), and hence also the matrices \({{\widetilde{C}}}_{(0)}^{(k)}\), are singular, i.e., there exist vectors \(r^{(k)}(\tau ,x,z_1,z_2)\) such that

$$\begin{aligned} { {\widetilde{C}}}_{(0)}^{(k)}r^{(k)}=0. \end{aligned}$$
(5.9)

In similar fashion to (4.4)–(4.5), the identities

$$\begin{aligned} {{\widehat{C}}}_{(0)}^{(2)}- {\widehat{C}}_{(0)}^{(1)}={\widehat{A}}_{(0)}\qquad \text {and}\qquad \widetilde{C}^{(2)}_{(0)}- {\widetilde{C}}^{(1)}_{(0)}=I \end{aligned}$$
(5.10)

hold by construction. Multiplying the second identity in (5.10) by \(r^{(k)}\) on the right and using (5.9) yields

$$\begin{aligned} {\widetilde{C}}^{(2)}_{(0)}r^{(1)}=r^{(1)},\qquad {\widetilde{C}}^{(1)}_{(0)}r^{(2)}=-r^{(2)}. \end{aligned}$$
(5.11)

Hence both \(r^{(k)}\) are eigenvectors of \({\widetilde{C}}_{(0)}^{(1)}\), so their definition here is consistent with their definition before the statement of the theorem. Moreover, (5.9) and (5.11) show that the eigenvalues of \({{\widetilde{C}}}_{(0)}^{(1)}\) are distinct, so the \(r^{(j)}\) are as smooth as \({{\widetilde{C}}}_{(0)}^{(1)}\), namely \(C^s\). Since \({{\widetilde{C}}}_{(0)}^{(1)}\) is symmetric the vectors are orthogonal, so after normalizing them to have length one the matrix R whose columns are the \(r^{(j)}\) is orthogonal, i.e.,

$$\begin{aligned} R^TR=I. \end{aligned}$$
(5.12)

Moreover, (5.9)–(5.12) imply that

$$\begin{aligned} \begin{aligned} {\mathcal {C}}_{(0)}^{(1)}=\left( {\begin{matrix}0&{}0\\ 0&{}-1\end{matrix}}\right) ,\quad {\mathcal {C}}_{(0)}^{(2)}=\left( {\begin{matrix}1&{}0\\ 0&{}0\end{matrix}}\right) ,\quad {{\mathcal {A}}}_{(0)}=I. \end{aligned} \end{aligned}$$
(5.13)

These are the analogues of (4.9) and (4.13).

We look for solutions of (1.13) having the form

$$\begin{aligned} u(t,x,y)=U(\tau (t,x,y,\varepsilon ),x,Y^{(1)}(t,x,y)-\tfrac{\mu ^{(1)}(t,x)}{\varepsilon },Y^{(2)}(t,x,y)-\tfrac{\mu ^{(2)}(t,x)}{\varepsilon },\varepsilon ). \nonumber \\ \end{aligned}$$
(5.14)

Substituting (5.14) into (1.13) and using the definitions (1.62) and (1.72) shows that u defined by (5.14) will satisfy (1.13) provided that U satisfies

$$\begin{aligned} \begin{aligned} {\widehat{A}}(\tau ,&x,z_1,z_2,\varepsilon U,\varepsilon ) U_\tau +\widehat{B}(\tau ,x,z_1,z_2,\varepsilon U,\varepsilon )U_x \\ {}&+\tfrac{1}{\varepsilon }\left[ {\widehat{C}}^{(1)}_{(0)}(\tau ,x,z_1,z_2)U_{z_1}+{\widehat{C}}^{(2)}_{(0)}(\tau ,x,z_1,z_2)U_{z_2}\right] \\ {}&+{\widehat{D}}^{(1)}(\tau ,x,z_1,z_2,U,\varepsilon )U_{z_1}+{\widehat{D}}^{(2)}(\tau ,x,z_1,z_2,U,\varepsilon )U_{z_2}\\ {}&+{\widehat{f}}(\tau ,x,z_1,z_2,U,\varepsilon )=0. \end{aligned} \end{aligned}$$
(5.15)

Making the change of variables

$$\begin{aligned} U=\rho ({\widehat{A}}_{(0)})^{-1/2}R\,{\mathcal {U}}, \end{aligned}$$
(5.16)

multiplying the resulting equation by \(\frac{1}{\rho }R^T({\widehat{A}}_{(0)})^{-\frac{1}{2}}\), and using assumption (1.82) yields the system (4.11), where (4.12) and (4.13) again hold, but now the coefficients \({\mathcal {A}}\), \({\mathcal {B}}\), \({\mathcal {D}}\) and \({\mathcal {F}}\) depend on \(z_1\) and \(z_2\) in addition to \(\tau \) and x, and, by (5.13),

$$\begin{aligned} {\mathcal {L}}=\left( {\begin{matrix}0&{}0\\ 0&{}-1\end{matrix}}\right) \partial _{z_1}+ \left( {\begin{matrix}1&{}0\\ 0&{}0\end{matrix}}\right) \partial _{z_2}. \end{aligned}$$
(5.17)

Since assumption (1.82) has eliminated the bad terms (1.76) that would otherwise be present in (4.11) for system (1.13), the remainder of the proof is the same as for the case of coefficients independent of y given in Sect. 4, with one minor exception: Since the function \(\tau (t,x,y,\varepsilon )=\mu ^{(1)}(t,x)-\mu ^{(2)}(t,x)+\varepsilon [Y^{(2)}(t,x,y)-Y^{(1)}(t,x,y)]\) defined in (1.62) depends smoothly on \(\varepsilon \) and so differs from \(\tau (t,x,y,0)\) by \(O(\varepsilon )\), we have replaced \(\tau (t,x,y,\varepsilon )\) by \(\tau (t,x,y,0)\) in the estimates (1.83) and (1.56) so as to make the only dependence on \(\varepsilon \) in the profile appear in the phases, because the error induced by that replacement has the same size \(O(\varepsilon )\) as the error of those estimates. \(\square \)

6 Counterexamples and examples

6.1 Nonuniform existence of solutions to (1.10)

Since the coefficient a of \(u_t\) is always assumed to be positive, it is always possible to divide the scalar PDE (1.10) by that coefficient, which replaces the coefficients b and c by \(\tfrac{b}{a}\) and \(\tfrac{c}{a}\). When looking for counterexamples it therefore suffices to consider the coefficients b and c. We begin with two preliminary examples for equations with stronger dependence on the dependent variable than is allowed in (1.10); the first example is classical.

Example 6.1

(c depends on u) In the standard example \(u_t+\frac{c(u)}{\varepsilon }u_y=0\) with initial data \(u_0\) for which \(c'(u_0)u_0'\) takes O(1) negative values, the y-derivative of the solution blows up in finite time, and more specifically at a time \(O(\varepsilon )\) since the small parameter can be scaled into the time.
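
For completeness, here is a sketch of the standard computation behind that claim: differentiating the PDE with respect to y and evaluating along a characteristic, on which u is constant, gives the Riccati equation

$$\begin{aligned} \tfrac{d}{dt}u_y=-\tfrac{c'(u)}{\varepsilon }u_y^2, \qquad \text {so}\qquad u_y(t)=\frac{(u_0)_y}{1+\tfrac{c'(u_0)(u_0)_y}{\varepsilon }\,t}, \end{aligned}$$

which becomes infinite at the time \(t=-\tfrac{\varepsilon }{c'(u_0)(u_0)_y}=O(\varepsilon )\) at points where \(c'(u_0)(u_0)_y<0\).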

Example 6.2

(b depends on u) Since b is not multiplied by \(\frac{1}{\varepsilon }\) in (1.4), the scaling argument of Example 6.1 is not applicable. Moreover, classical singular limit theory obtains uniform existence when b depends on u assuming that c is independent of x. We therefore let b be a bounded positive function of u whose first derivative is nonzero at some point and let c be a bounded function of x whose first derivative is nonzero at some point. In similar fashion to [16, (3.5)], solving the characteristic equations for the PDE

$$\begin{aligned} u_t+b(u)u_x+\tfrac{c(x)}{\varepsilon }u_y=0 \end{aligned}$$
(6.1)

shows that solutions of the initial-value problem for that PDE can be written in the implicit form

$$\begin{aligned} u=u_0(x-b(u)t,y-\tfrac{C(x)-C(x-b(u)t)}{\varepsilon b(u)}), \qquad \text {where}\, C'=c. \end{aligned}$$
(6.2)

Proceeding as in that reference by taking the y derivative of (6.2) and solving the result for \(u_y\) yields the formula

$$\begin{aligned} u_y=\frac{(u_0)_y}{1-t b'(u)(u_0)_x-\frac{b'(u)(u_0)_y}{\varepsilon b(u)^2}[C(x)-C(x-b(u)t)-c(x-b(u)t)b(u)t] }.\nonumber \\ \end{aligned}$$
(6.3)

The Taylor expansion formula for C(x) around the point \(x-b(u)t\) shows that the expression \(C(x)-C(x-b(u)t)-c(x-b(u)t)b(u)t\) equals \(\frac{1}{2} c'(x-\theta b(u)t)b(u)^2t^2\) for some \(\theta \in (0,1)\). Hence if \(b'(u_0(x,y))(u_0(x,y))_yc'(x)>0\) at some point then (6.3) implies that \(|u_y|\) will become infinite at some point at a time of order \(O(\sqrt{\varepsilon })\). Note that the more subtle mechanism of blow up as compared to Example 6.1 manifests itself in the blowup being less rapid than in that example. A similar calculation shows that \(u_x\) will also become infinite.
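
In more detail, inserting the Taylor expansion just described into (6.3) shows that the denominator there equals

$$\begin{aligned} 1-t\,b'(u)(u_0)_x-\frac{b'(u)(u_0)_y\,c'(x-\theta b(u)t)}{2\varepsilon }\,t^2, \end{aligned}$$

so at a point where \(b'(u_0)(u_0)_yc'(x)>0\) the last term reaches size one, and the denominator can vanish, once \(t^2\) is comparable to \(\varepsilon \), i.e., at a time of order \(O(\sqrt{\varepsilon })\).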

Moreover, estimates for u, \(u_y\), and \(\sqrt{\varepsilon }u_x\) along characteristics, with time rescaled by \(t=\sqrt{\varepsilon }\tau \), show that solutions of the equation obtained from the prototypical PDE (1.4) by replacing b there with b(txu) exist for at least a time \(O(\sqrt{\varepsilon })\). Similar estimates are presented in detail in Example 6.3. Hence the blowup time obtained for equations in which b depends on u is sharp. Note that the scaling used to obtain that lower bound on the existence time is different than the scaling from [12, Proposition 6.1.1] mentioned in the introduction.

Example 6.3

(b depends on y, and c vanishes somewhere) Consider the PDE

$$\begin{aligned} u_t+b(y)u_x+\tfrac{c(x)}{\varepsilon }u_y+d(u)u_y=0, \end{aligned}$$
(6.4)

which is a special case of (1.10). Of course, in order for Theorem 1.1 to not apply, the coefficients must fail to satisfy at least one of the hypotheses of that theorem. The construction here depends on c vanishing at some point, which violates assumption (1.17). The fast blowup of solutions will now require that three coefficients depend on particular variables, and the blowup will be even less rapid, albeit only slightly so, than in Example 6.2. For Eq. (6.4) it is not possible to solve the characteristic ODEs even to obtain an implicit formula for the solution like (6.2), except in the special case when \(b(y)=k_1y\) and \(c(x)=k_2x\). Nevertheless, it is still possible to derive ODEs for \(u_x\) and \(u_y\) and use them to prove that under certain conditions the time of breakdown tends to zero with \(\varepsilon \), in similar fashion to the calculations in [16, pp. 35–36] for the case of an equation in one spatial dimension not containing a parameter \(\varepsilon \). Taking the x or y derivative of (6.4) and defining the directional derivative along characteristics \(D_t v\mathrel {:=}v_t+bv_x+\tfrac{c}{\varepsilon }v_y+dv_y\) yields the system

$$\begin{aligned} D_t u_x=-\tfrac{c'(x)}{\varepsilon }u_y-d'(u)u_xu_y,\qquad D_t u_y=-b'(y)u_x-d'(u)u_y^2. \end{aligned}$$
(6.5)

We consider here the case when there exist \(x_*\), \(y_*\), and \(u_*\) such that \(c(x_*)=0\), \(b(y_*)=0\), and \(d(u_*)=0\) while \(c'(x_*)<0\), \(b'(y_*)<0\) and \(d'(u_*)<0\), and take smooth bounded initial data \(u_0\) such that \(u_0(x_*,y_*)=u_*\) and \((u_0)_y(x_*,y_*)>0\). Then the characteristic through the point \((x_*,y_*,u_*)\) remains at that point for all time. Define

$$\begin{aligned} P(t)\mathrel {:=}\sqrt{\varepsilon }u_x(t,x_*,y_*)\qquad \text {and}\qquad Q(t)\mathrel {:=}u_y(t,x_*,y_*); \end{aligned}$$
(6.6)

then on the characteristic through \((x_*,y_*,u_*)\) (6.5) becomes

$$\begin{aligned} \frac{d}{dt}P=-\tfrac{c'(x_*)}{\sqrt{\varepsilon }}Q-d'(u_*)PQ, \qquad \frac{d}{dt}Q=-\tfrac{b'(y_*)}{\sqrt{\varepsilon }}P-d'(u_*)Q^2. \end{aligned}$$
(6.7)

In terms of the eigenmodes

$$\begin{aligned} R_\pm \mathrel {:=}Q\pm \sqrt{\tfrac{b'(y_*)}{c'(x_*)}}P, \end{aligned}$$
(6.8)

of the linear part of the system (6.7), that system becomes

$$\begin{aligned} \frac{d}{dt}R_\pm =\pm \tfrac{\sqrt{b'(y_*)c'(x_*)}}{\sqrt{\varepsilon }}R_\pm + \tfrac{[-d'(u_*)](R_++R_-)}{2} R_\pm . \end{aligned}$$
(6.9)

Since \(Q(0)>0\) by assumption and \(P(0)=O(\sqrt{\varepsilon })\), for sufficiently small \(\varepsilon \), we have \(R_\pm (0)>0\). Hence the ODEs (6.9) imply that

$$\begin{aligned} \frac{d}{dt}R_+>\tfrac{1}{2} \tfrac{\sqrt{b'(y_*)c'(x_*)}}{\sqrt{\varepsilon }}R_+\qquad \text {and} \qquad \frac{d}{dt}R_-<-\tfrac{1}{2} \tfrac{\sqrt{b'(y_*)c'(x_*)}}{\sqrt{\varepsilon }}R_- \nonumber \\ \end{aligned}$$
(6.10)

at least until \(R_+\ge \frac{\delta }{\sqrt{\varepsilon }}\) for a certain positive \(\delta \), which happens after a time \(T=O(\sqrt{\varepsilon }\ln \tfrac{1}{\varepsilon })\). The differential inequality for \(R_-\) in (6.10) implies that \(R_-=O(\sqrt{\varepsilon })\) at that time. This implies via (6.8) that both P and Q are positive and at least \(\frac{\delta }{2\sqrt{\varepsilon }}\) at time T. The ODEs (6.7) then imply that both P and Q remain positive at later times, so the ODE for Q there implies that

$$\begin{aligned} \frac{d}{dt}Q>[-d'(u_*)]Q^2. \end{aligned}$$
(6.11)

Since \(Q(T)\ge \frac{\delta }{2\sqrt{\varepsilon }}\), (6.11) implies that Q becomes infinite at a time \(T_*=T+O(\sqrt{\varepsilon })=O(\sqrt{\varepsilon }\ln \tfrac{1}{\varepsilon })\).
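
Explicitly, comparing Q with the solution of the corresponding Riccati equation obtained by replacing the inequality in (6.11) by equality gives, for \(t\ge T\),

$$\begin{aligned} Q(t)\ge \frac{Q(T)}{1-[-d'(u_*)]\,Q(T)\,(t-T)}, \end{aligned}$$

so Q becomes infinite no later than \(t=T+\tfrac{1}{[-d'(u_*)]\,Q(T)}\le T+\tfrac{2\sqrt{\varepsilon }}{[-d'(u_*)]\,\delta }=T+O(\sqrt{\varepsilon })\).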

6.2 Boundary layers in solution to (1.10) when \(c=0\)

Example 6.4

(Boundary layer in solution to (1.10)) The assumption (1.17) for Theorem 1.1 requires that the coefficient c of the large term in (1.10) be bounded away from zero. Example 6.3 showed that when that assumption does not hold and in addition the coefficient b depends on y then the time of existence of the solution may tend to zero with \(\varepsilon \). We now show that even when the coefficients a, b, and c are independent of y the vanishing of the coefficient c may change the asymptotics of the solution from those given in Theorem 1.1. Specifically, boundary layers can appear.

We first note that uniform existence can be proven for solutions of the special case

$$\begin{aligned} a(t,x,\varepsilon u)u_t+b(t,x,\varepsilon u)u_x+\tfrac{1}{\varepsilon }c(t,x)u_y+d(t,x,y,u)u_y+f(t,x,y,u)=0\nonumber \\ \end{aligned}$$
(6.12)

of (1.10), assuming only that the coefficients belong to \(C^1\) and a satisfies (1.9). This can be shown via a slight extension of the method of [12, Propositions 6.1.1–6.1.3], using the method of characteristics and the \(\varepsilon \)-weighted \(C^1\) norm \(\Vert u\Vert _{C^0}+\Vert u_y\Vert _{C^0}+\varepsilon \Vert u_x\Vert _{C^0}\), because dynamic estimates for those norms are obtained with coefficients that are bounded in \(\varepsilon \). Similar estimates are presented in detail in Sect. 6.4 for the case when the coefficients do depend on y but c does not vanish, except that the estimate for \(u_y\) there is obtained by solving the PDE for that expression rather than dynamically.

Now specialize further to the equation

$$\begin{aligned} u_t +\tfrac{c(x)}{\varepsilon }u_y =k+f(y), \end{aligned}$$
(6.13)

where f is assumed to be periodic in y, as in Theorem 1.1, and can be normalized to have mean zero by adjusting k. When c(x) is bounded away from zero as required by (1.17) then Theorem 1.1 shows that the asymptotics of the solution having initial data \(u_0(x,y)\) are

$$\begin{aligned} u(t,x,y)=U^{(0)}(c(x)t,x,y-\tfrac{c(x)t}{\varepsilon })+O(\varepsilon ), \end{aligned}$$
(6.14)

where \(U^{(0)}\) satisfies

$$\begin{aligned} c(x)U^{(0)}_\tau =k,\qquad U^{(0)}(0,x,z)=u_0(x,z). \end{aligned}$$
(6.15)

Dividing (6.15) by c(x) and solving the result yields

$$\begin{aligned} U^{(0)}(\tau ,x,z)=u_0(x,z)+k\tfrac{\tau }{c(x)}, \end{aligned}$$
(6.16)

and substituting this back into (6.14) yields

$$\begin{aligned} u(t,x,y)=u_0(x,y-\tfrac{c(x)t}{\varepsilon })+kt+O(\varepsilon ). \end{aligned}$$
(6.17)

We now compare the asymptotic solution (6.17) of (6.13) predicted by Theorem 1.1 with the exact solution of that equation obtained by the method of characteristics, namely

$$\begin{aligned} \begin{aligned} u(t,x,y)\!=\! u_0(x,y\!-\!\tfrac{c(x)t}{\varepsilon })\!+\!kt\!+\!\varepsilon \tfrac{F(y)-F(y-\frac{c(x)t}{\varepsilon })}{c(x)}, \;\;\text{ where } F(y)\mathrel {:=}\int _0^y f(s)\,ds. \end{aligned}\nonumber \\ \end{aligned}$$
(6.18)
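
For the record, here is the brief characteristics computation behind (6.18) (at points where \(c(x)\ne 0\)): along the characteristics of (6.13),

$$\begin{aligned} \dot{x}=0,\qquad \dot{y}=\tfrac{c(x)}{\varepsilon },\qquad \dot{u}=k+f(y), \end{aligned}$$

so \(y(t)=y_0+\tfrac{c(x)t}{\varepsilon }\) and

$$\begin{aligned} u=u_0(x,y_0)+kt+\int _0^t f\big (y_0+\tfrac{c(x)s}{\varepsilon }\big )\,ds =u_0(x,y_0)+kt+\varepsilon \tfrac{F\big (y_0+\frac{c(x)t}{\varepsilon }\big )-F(y_0)}{c(x)}, \end{aligned}$$

which is (6.18) after substituting \(y_0=y-\tfrac{c(x)t}{\varepsilon }\).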

When c(x) is bounded away from zero then the final term on the right side of the equation in (6.18) is of size \(O(\varepsilon )\) everywhere, in accordance with (6.17). However, the identity

$$\begin{aligned} \varepsilon \tfrac{F(y)-F(y-\frac{c(x)t}{\varepsilon })}{c(x)}=-\tfrac{\varepsilon }{c(x)}\int _0^1 \frac{d}{ds}F(y-s\tfrac{c(x)t}{\varepsilon })\,ds =t\int _0^1 f(y-st \tfrac{c(x)}{\varepsilon })\,ds \end{aligned}$$

shows that if c(x) vanishes at one or more points then for \(t>0\) the final term on the right side of the equation in (6.18) will be of order one in regions where \(c(x)=O(\varepsilon )\), and hence the leading asymptotics will be different than (6.17). In other words, a boundary layer appears.

6.3 Examples and comparison to geometric optics

Example 6.5

(\(2\times 2\) system and a \(3\times 3\) variant) Consider the system

$$\begin{aligned} \begin{pmatrix}u\\ v\end{pmatrix}_t+\begin{pmatrix}0&{}b(x)\\ b(x)&{}0\end{pmatrix}\begin{pmatrix}u\\ v\end{pmatrix}_x +\tfrac{1}{\varepsilon }\begin{pmatrix}c(x)&{}0\\ 0&{}-c(x)\end{pmatrix} \begin{pmatrix}u\\ v\end{pmatrix}_y=0, \end{aligned}$$
(6.19)

where \(c(x)>0\) and b(x) is not identically zero. For system (6.19), Eq. (1.41) for the fast phases \(\mu \) becomes

$$\begin{aligned} 0=\det \begin{pmatrix}-\mu _t+c(x)&{}-b(x)\mu _x\\ -b(x)\mu _x&{}-\mu _t-c(x) \end{pmatrix}, \qquad \mu (0,x)\equiv 0 \end{aligned}$$
(6.20)

which yields the equations

$$\begin{aligned} \mu _t=\pm \sqrt{c(x)^2+\mu _x^2 b(x)^2}, \qquad \mu (0,x)\equiv 0 \end{aligned}$$
(6.21)

for the fast phases \(\mu \). Although the solutions of (6.21) generally cannot be determined explicitly, if \(\mu \) is a solution then so is \(-\mu \). Hence formula (1.44) becomes \(\tau =2\mu (t,x)\), and the ansatz (4.2) takes the form

$$\begin{aligned} \begin{pmatrix}u\\ v\end{pmatrix}= \begin{pmatrix}U(2\mu (t,x),x,y-\tfrac{\mu (t,x)}{\varepsilon },y+\tfrac{\mu (t,x)}{\varepsilon })\\ V(2\mu (t,x),x,y-\tfrac{\mu (t,x)}{\varepsilon },y+\tfrac{\mu (t,x)}{\varepsilon })\\ \end{pmatrix}. \end{aligned}$$
(6.22)

Substituting (6.22) into (6.19) and defining \(z_1=y-\frac{\mu (t,x)}{\varepsilon }\) and \(z_2=y+\frac{\mu (t,x)}{\varepsilon }\) yields the system

$$\begin{aligned} \begin{aligned}&\begin{pmatrix} 2\mu _t&{}{}2b(x)\mu _x\\ 2b(x)\mu _x&{}{}2\mu _t \end{pmatrix} \begin{pmatrix} U\\ V \end{pmatrix}_{\!\tau }+ \begin{pmatrix} 0&{}{}b(x)\\ b(x)&{}{}0 \end{pmatrix} \begin{pmatrix} U\\ V \end{pmatrix}_{\!x} \\ {}&+\tfrac{1}{\varepsilon }\left[ \begin{pmatrix} c(x)-\mu _t&{}{}-b(x)\mu _x\\ -b(x)\mu _x&{}{}-c(x)-\mu _t \end{pmatrix} \begin{pmatrix} U\\ V \end{pmatrix}_{\!z_1}+ \begin{pmatrix} c(x)+\mu _t&{}{}b(x)\mu _x\\ b(x)\mu _x&{}{}-c(x)+\mu _t \end{pmatrix} \begin{pmatrix} U\\ V \end{pmatrix}_{\!z_2}\right] =\begin{pmatrix}0\\ 0\end{pmatrix}. \end{aligned}\nonumber \\\end{aligned}$$
(6.23)
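
For the reader's convenience, the substitution leading to (6.23) uses the chain-rule identities

$$\begin{aligned} u_t=2\mu _tU_\tau -\tfrac{\mu _t}{\varepsilon }\big (U_{z_1}-U_{z_2}\big ),\qquad u_x=2\mu _xU_\tau +U_x-\tfrac{\mu _x}{\varepsilon }\big (U_{z_1}-U_{z_2}\big ),\qquad u_y=U_{z_1}+U_{z_2}, \end{aligned}$$

together with the analogous formulas for V.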

Using the formulas

$$\begin{aligned} \begin{aligned}&a+c>0, \quad ac-b^2>0\quad \implies \\ \begin{pmatrix}a&{}b\\ b&{}c\end{pmatrix}^{\frac{1}{2}}&=\begin{pmatrix} \frac{\sqrt{a c-b^2}+a}{\sqrt{2 \sqrt{a c-b^2}+a+c}} &{} \frac{b}{\sqrt{2 \sqrt{a c-b^2}+a+c}} \\ \frac{b}{\sqrt{2 \sqrt{a c-b^2}+a+c}} &{} \frac{\sqrt{a c-b^2}+c}{\sqrt{2 \sqrt{a c-b^2}+a+c}} \\ \end{pmatrix}, \\ \begin{pmatrix}a&{}b\\ b&{}c\end{pmatrix}^{-\frac{1}{2}}&=\begin{pmatrix} \frac{\left( \sqrt{a c-b^2}+c\right) \sqrt{2 \sqrt{a c-b^2}+a+c}}{a \left( \sqrt{a c-b^2}+2 c\right) +c \sqrt{a c-b^2}-2 b^2} &{} -\frac{b}{\sqrt{a c-b^2} \sqrt{2 \sqrt{a c-b^2}+a+c}} \\ -\frac{b}{\sqrt{a c-b^2} \sqrt{2 \sqrt{a c-b^2}+a+c}} &{} \frac{\left( \sqrt{a c-b^2}+a\right) \sqrt{2 \sqrt{a c-b^2}+a+c}}{a \left( \sqrt{a c-b^2}+2 c\right) +c \sqrt{a c-b^2}-2 b^2} \\ \end{pmatrix} \end{aligned} \end{aligned}$$
(6.24)

yields

$$\begin{aligned} \begin{pmatrix}2\mu _t&{}2b(x)\mu _x\\ 2b(x)\mu _x&{}2\mu _t \end{pmatrix}^{-\frac{1}{2}}= \begin{pmatrix} \alpha &{}\beta \\ \beta &{}\alpha \end{pmatrix} \end{aligned}$$
(6.25)

with

$$\begin{aligned} \begin{aligned} \alpha&\mathrel {:=}\tfrac{\left( \mu _t+\sqrt{\mu _t^2-b(x)^2 \mu _x^2}\right) ^{3/2}}{2 \mu _t \left( \mu _t+\sqrt{\mu _t^2-b(x)^2 \mu _x^2}\right) -2 b(x)^2 \mu _x^2} =\tfrac{\sqrt{\mu _t+c(x)}}{2 c(x)}, \\ \beta&\mathrel {:=}-\tfrac{b(x) \mu _x}{2 \sqrt{\mu _t^2-b(x)^2 \mu _x^2} \sqrt{\sqrt{\mu _t^2-b(x)^2 \mu _x^2}+\mu _t}} =-\tfrac{b(x) \mu _x}{2 c(x) \sqrt{\mu _t+c(x)}}, \end{aligned} \end{aligned}$$
(6.26)

where the second forms of \(\alpha \) and \(\beta \) are obtained by using (6.21). Calculations show that

$$\begin{aligned} \begin{pmatrix} \alpha &{}\beta \\ \beta &{}\alpha \end{pmatrix} \begin{pmatrix} c(x)-\mu _t&{}-b(x)\mu _x\\ -b(x)\mu _x&{}-c(x)-\mu _t \end{pmatrix} \begin{pmatrix} \alpha &{}\beta \\ \beta &{}\alpha \end{pmatrix}= \begin{pmatrix} 0&{}0\\ 0&{}-1 \end{pmatrix} \end{aligned}$$

and

$$\begin{aligned} \begin{pmatrix} \alpha &{}\beta \\ \beta &{}\alpha \end{pmatrix} \begin{pmatrix} c(x)+\mu _t&{}b(x)\mu _x\\ b(x)\mu _x&{}-c(x)+\mu _t \end{pmatrix} \begin{pmatrix} \alpha &{}\beta \\ \beta &{}\alpha \end{pmatrix}= \begin{pmatrix} 1&{}0\\ 0&{}0 \end{pmatrix}. \end{aligned}$$
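
These two identities reflect the structure established in Sect. 4: by (6.25) the matrix \(\left( {\begin{matrix} \alpha &{}\beta \\ \beta &{}\alpha \end{matrix}}\right) \) is the inverse square root of the coefficient matrix of the \(\tau \)-derivatives in (6.23), the two matrices being conjugated differ by that coefficient matrix, in accordance with (4.4), and each of them is singular as a consequence of (6.21); for instance,

$$\begin{aligned} \det \begin{pmatrix} c(x)-\mu _t&{}-b(x)\mu _x\\ -b(x)\mu _x&{}-c(x)-\mu _t \end{pmatrix} =\mu _t^2-c(x)^2-b(x)^2\mu _x^2=0. \end{aligned}$$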

Hence the matrix R of eigenvectors is simply the identity matrix for this system. The change of variables (4.10) is therefore \(\left( {\begin{matrix}U\\ V\end{matrix}}\right) = \left( {\begin{matrix} \alpha &{}\beta \\ \beta &{}\alpha \end{matrix}}\right) \left( {\begin{matrix}{\mathcal {U}}\\ \mathcal V\end{matrix}}\right) \), and making that substitution yields the system

$$\begin{aligned} \begin{aligned} \begin{pmatrix} {\mathcal {U}}\\ {\mathcal {V}} \end{pmatrix}_\tau &+\begin{pmatrix} -\frac{b(x)^2 \mu _x}{2 c(x)^2} &{} \frac{b(x) \mu _t}{2 c(x)^2} \\ \frac{b(x) \mu _t}{2 c(x)^2} &{} -\frac{b(x)^2 \mu _x}{2 c(x)^2} \\ \end{pmatrix}\begin{pmatrix} {\mathcal {U}}\\ {\mathcal {V}} \end{pmatrix}_x \\ {}&+\tfrac{1}{\varepsilon }\left[ \begin{pmatrix} 0&{}0\\ 0&{}-1 \end{pmatrix}\begin{pmatrix} {\mathcal {U}}\\ {\mathcal {V}} \end{pmatrix}_{z_1}+ \begin{pmatrix} 1&{}0\\ 0&{}0 \end{pmatrix}\begin{pmatrix} {\mathcal {U}}\\ {\mathcal {V}} \end{pmatrix}_{z_2}\right] + \begin{pmatrix} \gamma &{}\delta \\ \delta &{}\gamma \end{pmatrix}\begin{pmatrix} {\mathcal {U}}\\ {\mathcal {V}} \end{pmatrix}= \begin{pmatrix} 0\\ 0 \end{pmatrix}, \end{aligned}\nonumber \\ \end{aligned}$$
(6.27)

where

$$\begin{aligned} \begin{aligned} \gamma&\mathrel {:=}\tfrac{b(x) \left( \mu _x \left( 2 b(x) c'(x)-c(x) b'(x)\right) -b(x) c(x) \mu _{xx}\right) }{4 c(x)^3} \\ \delta&\mathrel {:=}\tfrac{b(x)}{8 c(x)^3 \left( \mu _t+c(x)\right) ^2}\big [ c(x)^3(\mu _{tx}-c'(x))-2c'(x)\mu _t(\mu _t^2+b(x)^2\mu _x^2) \\ {}&\qquad \qquad \qquad \qquad +2 c(x)^2(b(x)b'(x)\mu _x+b(x)^2\mu _x\mu _{xx}-2c'(x)\mu _t+\mu _t\mu _{tx}) \\ {}&\qquad \qquad \qquad \qquad -c(x) (3b(x)^2c'(x)\mu _x^2-2b(x)b'(x)\mu _x^2\mu _t-2b(x)^2\mu _x\mu _{xx}\mu _t \\ {}&\qquad \qquad \qquad \qquad +5c'(x)\mu _t^2+b(x)^2\mu _x^2\mu _{tx}-\mu _t^2\mu _{tx}) \big ] \end{aligned}\nonumber \\ \end{aligned}$$
(6.28)

We take the initial conditions to be

$$\begin{aligned} \begin{aligned} \begin{pmatrix} {\mathcal {U}}(0,x,z_1,z_2)\\ {\mathcal {V}}(0,x,z_1,z_2) \end{pmatrix} =\begin{pmatrix}{\mathcal {U}}_0(x,z_1)\\ {\mathcal {V}}_0(x,z_2) \end{pmatrix} \mathrel {:=}(2\mu _t(0,x))^{\frac{1}{2}}\begin{pmatrix}u_0(x,z_1)\\ v_0(x,z_2) \end{pmatrix}, \end{aligned}\end{aligned}$$
(6.29)

which satisfy

$$\begin{aligned} \left[ \begin{pmatrix} 0&{}0\\ 0&{}-1 \end{pmatrix}\begin{pmatrix} {\mathcal {U}}_0\\ {\mathcal {V}}_0 \end{pmatrix}_{z_1}+ \begin{pmatrix} 1&{}0\\ 0&{}0 \end{pmatrix}\begin{pmatrix} {\mathcal {U}}_0\\ {\mathcal {V}}_0 \end{pmatrix}_{z_2}\right] =\begin{pmatrix}0\\ 0\end{pmatrix} \end{aligned}$$

in accordance with (4.15). Specializing (1.54) to system (6.19) shows that as \(\varepsilon \rightarrow 0\) the solutions of (6.27) tend to the unique solution of

$$\begin{aligned} \begin{aligned} {\mathcal {U}}^{(0)}_{z_2}&=0={\mathcal {V}}^{(0)}_{z_1}, \\ {\mathcal {U}}^{(0)}_\tau&-\tfrac{b(x)^2 \mu _x}{2 c(x)^2}\, \mathcal {U}^{(0)}_x+\tfrac{b(x) \mu _t}{2 c(x)^2}\left\langle {\mathcal V^{(0)}_x}\right\rangle _{\!z_2} +\gamma \, {\mathcal {U}}^{(0)}+\delta \left\langle {\mathcal V^{(0)}}\right\rangle _{\!z_2}=0, \\ {\mathcal {V}}^{(0)}_\tau&+\tfrac{b(x) \mu _t}{2 c(x)^2}\left\langle {{\mathcal {U}}^{(0)}_x}\right\rangle _{\!z_1}-\tfrac{b(x)^2 \mu _x}{2 c(x)^2}\, {\mathcal {V}}^{(0)}_x +\delta \left\langle {\mathcal {U}^{(0)}}\right\rangle _{\!z_1}+\gamma \, {\mathcal {V}}^{(0)}=0, \\ {\mathcal {U}}^{(0)}{\big |_{t=0}}&={\mathcal {U}}_0,\quad \mathcal V^{(0)}{\big |_{t=0}}={\mathcal {V}}_0, \end{aligned}\nonumber \\ \end{aligned}$$
(6.30)

and the solution of the original system (6.19) is asymptotic to \(\left( {\begin{matrix}\alpha &{}\beta \\ \beta &{}\alpha \end{matrix}}\right) \) times that limit, with

$$\begin{aligned} (\tau ,z_1,z_2) \,\text {evaluated at}\, (2\mu (t,x),y-\tfrac{\mu (t,x)}{\varepsilon },y+\tfrac{\mu (t,x)}{\varepsilon }). \end{aligned}$$
(6.31)

The connection between the results here for systems of the form (1.12) and the geometric optics theorem of [13] can be seen by making the change of variables \(y=\frac{{\widehat{y}}}{\varepsilon }\) in (6.19). This transforms (6.19) into

$$\begin{aligned} \begin{pmatrix}u\\ v\end{pmatrix}_t+\begin{pmatrix}0&{}b(x)\\ b(x)&{}0\end{pmatrix}\begin{pmatrix}u\\ v\end{pmatrix}_x + \begin{pmatrix}c(x)&{}0\\ 0&{}-c(x)\end{pmatrix} \begin{pmatrix}u\\ v\end{pmatrix}_{{\widehat{y}}}=0, \end{aligned}$$
(6.32)

whose initial data now has the form \(\left( {\begin{matrix}u\\ v \end{matrix}}\right) {\big |_{t=0}} = \Big ({\begin{matrix}u_0(x,\frac{{\widehat{y}}}{\varepsilon })\\ v_0(x,\frac{{\widehat{y}}}{\varepsilon })\end{matrix}} \Big )\). In accordance with [13, Remark 2.3.3], the geometric optics phases are \({{\widehat{y}}}{\mp } \mu (t,x)\). Since system (6.32) together with its initial data is equivalent to the original system and initial data, the asymptotics of the solution are the same except that y is replaced by \(\frac{{\widehat{y}}}{\varepsilon }\) in (6.31). The same transformation and resulting translation of the asymptotics to the geometric optics framework holds for all systems of the form (1.12). As noted in the introduction, this yields an alternative proof of a special case of the results of [13], with the slight generalization that the matrix multiplying the time derivatives is not required to be the identity. Note that even though the order one y derivative term \(Du_y\) is transformed to \(\varepsilon Du_{{\widehat{y}}}\), that term still appears in the asymptotics of the solution, as do the order \(\varepsilon \) parts of A and B.

An example of a system larger than \(2\times 2\) for which condition (1.49) holds can be obtained by appending to system (6.19) a scalar equation having no large term, which may be coupled to (6.19) by symmetric terms involving x derivatives, symmetric terms involving order one y derivatives, and undifferentiated terms. Since the third fast phase \(\mu ^{(3)}\equiv 0\) is a constant-coefficient combination \(\frac{1}{2} \mu +\frac{1}{2} (-\mu )\) of the two fast phases \(\pm \mu \), Theorem 1.5 applies with \(\alpha ^{(3)}=\frac{1}{2}\).

Example 6.6

(Phases and necessity of condition (1.49)) One might think that the phase functions for singular limit equations are simply the \(\mu ^{(j)}\), because those functions appear multiplied by \(\frac{1}{\varepsilon }\) in the ansatz formulas (1.3), (1.31), (1.39), (4.2), and (5.14). Actually, the slow part y for equations (1.11) and (1.12) or \(Y(t,x,y)\) for (1.10) and (1.13) should be included, so that the full phase function is \(\varepsilon y-\mu ^{(j)}\) or \(\varepsilon Y(t,x,y)-\mu ^{(j)}\).

The correct definition of the phase functions can be seen from the necessity of condition (1.49). That condition is an analogue of the coherence condition [13, Definition 2.2.1 and Remark 2.2.3] for geometric optics, which says that all phases must be constant-coefficient linear combinations of a basis set. However, (1.49) requires that the fast phase functions \(\mu ^{(j)}\) for system (1.12) be convex, not arbitrary, linear combinations of the first two fast phases, i.e., must have coefficients that sum to one. The reason for that convexity requirement only becomes clear when we consider the full phase functions \(\varepsilon y-\mu ^{(j)}(t,x)\) for that system. If the phase functions were just \(\mu ^{(j)}\) then linear combinations \(\mu ^{(j)}(t,x)=\alpha \mu ^{(1)}(t,x)+\beta \mu ^{(2)}(t,x)\) would be allowed, but for the full phases the condition \(\varepsilon y-\mu ^{(j)}(t,x)=\alpha (\varepsilon y-\mu ^{(1)}(t,x)) + \beta (\varepsilon y-\mu ^{(2)}(t,x) )\) automatically implies that \(\beta \) must equal \(1-\alpha \), as required in (1.49).

The fact that the full phases for system (1.12) are \(\varepsilon y-\mu ^{(j)}(t,x)\) also explains why precisely two independent phases are possible for that system. On account of the initial condition \(\mu ^{(j)}(0,x)\equiv 0\) for the fast phase functions, all full phases \(\varepsilon y-\mu ^{(j)}(t,x)\) reduce at time zero to the same function \(\varepsilon y\), and so span a space of dimension one. By [13, Lemma 2.3.2] the dimension of the space of phases is at most one more than the dimension of the space they span at time zero, so only two independent phases are allowed.

To see the necessity of condition (1.49), consider the system

$$\begin{aligned} \begin{aligned} \begin{pmatrix} u\\ v\\ w \end{pmatrix}_t&+ \begin{pmatrix} 1&{}\quad 0&{}\quad 0\\ 0&{}\quad -1&{}\quad 0\\ 0&{}\quad 0&{}\quad 0 \end{pmatrix}\begin{pmatrix} u\\ v\\ w \end{pmatrix}_x+ \frac{1}{\varepsilon }\begin{pmatrix} c_1(t,x)&{}\quad 0&{}\quad 0\\ 0&{}\quad c_2(t,x)&{}\quad 0\\ 0&{}\quad 0&{}\quad c_3(t,x) \end{pmatrix}\begin{pmatrix} u\\ v\\ w \end{pmatrix}_y \\ {}&+ M(t,x) \begin{pmatrix} u\\ v\\ w \end{pmatrix}=0, \end{aligned}\nonumber \\ \end{aligned}$$
(6.33)

where M(tx) is a matrix that couples the components of the system. Assume that the fast phases \(\mu ^{(1)}(t,x)\) and \(\mu ^{(2)}(t,x)\), which by (1.41) are determined by

$$\begin{aligned} \mu ^{(1)}_t+\mu ^{(1)}_x=c_1(t,x), \quad \mu ^{(2)}_t-\mu ^{(2)}_x=c_2(t,x),\quad \mu ^{(j)}(0,x)\equiv 0, \end{aligned}$$
(6.34)

satisfy \(\mu ^{(1)}_t(0,x)-\mu ^{(2)}_t(0,x)\ge c>0\) in accordance with (1.42)–(1.43). The remaining solution \(\mu ^{(3)}\) of (1.41) satisfies \(\mu ^{(3)}_t=c_3(t,x), \mu ^{(3)}(0,x)\equiv 0\). Calculating the matrices \({\mathcal {C}}^{(j)}\) of the transformed system (4.11)–(4.12) for (6.33) yields

$$\begin{aligned} {\mathcal {C}}^{(1)}= \left( {\begin{matrix} 0&{}\quad 0&{}\quad 0\\ 0&{}\quad -1&{}\quad 0\\ 0&{}\quad 0&{}\quad \frac{\mu ^{(3)}_t-\mu ^{(1)}_t}{\mu ^{(1)}_t-\mu ^{(2)}_t} \end{matrix}}\right) ,\qquad {\mathcal {C}}^{(2)}= \left( {\begin{matrix} 1&{}\quad 0&{}\quad 0\\ 0&{}\quad 0&{}\quad 0\\ 0&{}\quad 0&{}\quad \frac{\mu ^{(3)}_t-\mu ^{(2)}_t}{\mu ^{(1)}_t-\mu ^{(2)}_t} \end{matrix}}\right) . \end{aligned}$$
(6.35)

Since \({\mathcal {C}}^{(2)}-{\mathcal {C}}^{(1)}=I\) in accordance with (4.5), it suffices to determine when \(\mathcal C^{(2)}\) is a constant matrix. That holds iff \(\frac{\mu ^{(3)}_t-\mu ^{(2)}_t}{\mu ^{(1)}_t-\mu ^{(2)}_t}=\alpha \) for some constant \(\alpha \), which may be solved for \(\mu ^{(3)}_t\) to obtain \(\mu ^{(3)}_t=\alpha \mu ^{(1)}_t+(1-\alpha )\mu ^{(2)}_t\). Since all the \(\mu ^{(j)}\) are identically zero at time zero, integrating with respect to t shows that (1.49) is indeed a necessary condition for the large terms of the transformed equation to have constant coefficients.
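The algebra behind this criterion is easily machine-checked. As a quick sanity check (ours, not part of the argument), the following sympy sketch confirms that the matrices in (6.35) satisfy \({\mathcal {C}}^{(2)}-{\mathcal {C}}^{(1)}=I\) and that constancy of the remaining entry of \({\mathcal {C}}^{(2)}\) forces \(\mu ^{(3)}_t=\alpha \mu ^{(1)}_t+(1-\alpha )\mu ^{(2)}_t\).

import sympy as sp

# Symbols standing for the time derivatives mu^(j)_t at a fixed point (t, x)
m1, m2, m3, alpha = sp.symbols('m1 m2 m3 alpha')

C1 = sp.diag(0, -1, (m3 - m1) / (m1 - m2))   # C^(1) from (6.35)
C2 = sp.diag(1,  0, (m3 - m2) / (m1 - m2))   # C^(2) from (6.35)

# C^(2) - C^(1) is the identity, in accordance with (4.5)
assert sp.simplify(C2 - C1 - sp.eye(3)) == sp.zeros(3)

# Constancy of the (3,3) entry of C^(2) forces the convex-combination form (1.49)
sol = sp.solve(sp.Eq(C2[2, 2], alpha), m3)
assert sp.simplify(sol[0] - (alpha*m1 + (1 - alpha)*m2)) == 0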

Example 6.7

(\(2\times 2\) system with y-dependent coefficients) Define

$$\begin{aligned} A\mathrel {:=}\left( {\begin{matrix} 2+\cos (x+y)&{}0\\ 0&{} 4+2\cos (x+y)+\sin t\,\sin (x-y) \end{matrix}} \right) \end{aligned}$$

and consider the system

$$\begin{aligned} \begin{aligned}&A \begin{pmatrix} u\\ v \end{pmatrix}_t + \begin{pmatrix} 0&{}b(t,x,y)\\ b(t,x,y)&{}0 \end{pmatrix} \begin{pmatrix} u\\ v \end{pmatrix}_x +\frac{1}{\varepsilon }\begin{pmatrix} 2&{}0\\ 0&{}1 \end{pmatrix} \begin{pmatrix} u\\ v \end{pmatrix}_y= \begin{pmatrix} 0\\ 0 \end{pmatrix} \end{aligned} \end{aligned}$$
(6.36)

Calculations show that the eigenvalues of the matrix \( A^{\frac{1}{2}}\left( {\begin{matrix} 2&{}0\\ 0&{}1 \end{matrix}}\right) ^{-1} A^{\frac{1}{2}}\) are \(\frac{2+\cos (x+y)}{2}\) and \(4+2\cos (x+y)+\sin t\sin (x-y)\), whose y-averages are one and four, respectively. At time zero,

$$\begin{aligned} \begin{aligned} \frac{\lambda _1}{\left\langle {\lambda _1}\right\rangle _{y}}&=\frac{\frac{2+\cos (x+y)}{2}}{1}= 1+\tfrac{1}{2} \cos (x+y) \\ {}&= \frac{4+2\cos (x+y)+\sin 0\,\cdot \sin (x-y)}{4} =\frac{\lambda _2}{\left\langle {\lambda _2}\right\rangle _{y}}, \end{aligned} \end{aligned}$$

i.e., condition (1.80) is satisfied. Since the averages of the eigenvalues are independent of x as well as of y, the solution of (1.58) is \(\mu ^{(1)}=t\), \(\mu ^{(2)}=\frac{t}{4}\). Substituting those functions into the differential equation in (1.57) and solving using the alternative normalization (1.59) then yields \(Y^{(1)}=y+\frac{\sin (x+y)}{2}\), \(Y^{(2)}=y+\frac{\sin (x+y)}{2}+\frac{\sin t\,\cos (x-y)}{4}\). Using these, we calculate from (1.62) that

$$\begin{aligned} \tau= & {} t-\tfrac{t}{4}+\varepsilon [(y+\tfrac{\sin (x+y)}{2}+\tfrac{\sin t\,\cos (x-y)}{4})-(y+\tfrac{\sin (x+y)}{2})]\nonumber \\ {}= & {} \tfrac{3}{4} t+\varepsilon \tfrac{\cos (x-y)}{4}\sin t, \end{aligned}$$
(6.37)

which indeed vanishes identically at time zero as desired. Also, using (1.64) we obtain

$$\begin{aligned} \begin{aligned} Q&=\tfrac{t}{t}\left( y+\tfrac{\sin (x+y)}{2}+\tfrac{\sin t\,\cos (x-y)}{4}\right) -\tfrac{\frac{t}{4}}{t}\left( y+\tfrac{\sin (x+y)}{2}\right) \\ {}&=\tfrac{6y+3\sin (x+y)+2\sin t\,\cos (x-y)}{8}, \end{aligned} \end{aligned}$$
(6.38)

whose derivative with respect to y is indeed strictly positive, in accordance with the discussion after (5.4).
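The computations in this example are mechanical, so a symbolic verification may be helpful. The following sympy sketch (an independent check, with the formulas simply transcribed from above) recomputes the eigenvalues and their y-averages, checks condition (1.80) at time zero, and confirms the simplified forms of \(\tau \) and Q in (6.37)–(6.38) together with the expression for \(Q_y\).

import sympy as sp

t, x, y, eps = sp.symbols('t x y varepsilon', real=True)

# Eigenvalues of A^{1/2} diag(2,1)^{-1} A^{1/2} for the diagonal matrix A of this example
lam1 = (2 + sp.cos(x + y)) / 2
lam2 = 4 + 2*sp.cos(x + y) + sp.sin(t)*sp.sin(x - y)

# Their y-averages over one period are one and four
avg1 = sp.simplify(sp.integrate(lam1, (y, 0, 2*sp.pi)) / (2*sp.pi))
avg2 = sp.simplify(sp.integrate(lam2, (y, 0, 2*sp.pi)) / (2*sp.pi))
assert (avg1, avg2) == (1, 4)

# Condition (1.80): the normalized eigenvalues agree at time zero
assert sp.simplify((lam1/avg1 - lam2/avg2).subs(t, 0)) == 0

# Phases and slow variables quoted from the solutions of (1.58), (1.57), (1.59)
mu1, mu2 = t, t/4
Y1 = y + sp.sin(x + y)/2
Y2 = y + sp.sin(x + y)/2 + sp.sin(t)*sp.cos(x - y)/4

# tau from (1.62) agrees with (6.37) and vanishes at t = 0
tau = mu1 - mu2 + eps*(Y2 - Y1)
assert sp.simplify(tau - (sp.Rational(3, 4)*t + eps*sp.cos(x - y)*sp.sin(t)/4)) == 0
assert tau.subs(t, 0) == 0

# Q from (1.64) agrees with (6.38), and Q_y = (6 + 3 cos(x+y) + 2 sin t sin(x-y))/8 >= 1/8
Q = (mu1/t)*Y2 - (mu2/t)*Y1
assert sp.simplify(Q - (6*y + 3*sp.sin(x + y) + 2*sp.sin(t)*sp.cos(x - y))/8) == 0
Qy = sp.simplify(sp.diff(Q, y))
assert sp.simplify(Qy - (6 + 3*sp.cos(x + y) + 2*sp.sin(t)*sp.sin(x - y))/8) == 0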

Since the inverses of \(\tau \) with respect to t and of Q with respect to y cannot be calculated explicitly, we will write the transformed coefficient matrices in terms of the original variables \((t,x,y)\). Using the formulas (1.72) and (1.73) we calculate that

$$\begin{aligned} {\widehat{A}}&= \left( \begin{array}{cc} \frac{6+3\cos (x+y)+2\sin t\,\sin (x-y)}{4}&{}0 \\ 0&{}\frac{6+3\cos (x+y)+2\sin t\,\sin (x-y)}{2} \end{array}\right) , \nonumber \\ {\widehat{B}}&= B=\left( \begin{array}{cc} 0 &{} b(t,x,y) \\ b(t,x,y) &{} 0\end{array}\right) \nonumber \\ {\widehat{C}}^{(1)}&=\left( \begin{array}{cc} 0\; &{}0 \\ 0 \; &{} - \frac{6+3\cos (x+y)+2\sin t\,\sin (x-y)}{2} \end{array}\right) , \quad {\widehat{C}}^{(2)}= \left( \begin{array}{cc} \frac{6+3\cos (x+y) +2\sin t\,\sin (x-y)}{4} &{}0 \\ 0 &{}0 \end{array}\right) , \nonumber \\ \end{aligned}$$
(6.39)

and

$$\begin{aligned} {\widetilde{C}}^{(1)}=\left( {\begin{matrix} 0\; &{}0 \\ 0\; &{} -1 \end{matrix}}\right) , \quad {\widetilde{C}}^{(2)}= \left( {\begin{matrix}1 &{}0 \\ 0 &{} 0 \end{matrix}}\right) , \quad {\widetilde{B}}=\tfrac{2\sqrt{2}\, b(t,x,y)}{6+3\cos (x+y)+2\sin t\,\sin (x-y)}\left( {\begin{matrix} 0 &{} 1 \\ 1 &{} 0\end{matrix}}\right) . \end{aligned}$$
(6.40)

Since the \({\widetilde{C}}^{(j)}\) already have the form desired for the \({\mathcal {C}}^{(j)}\), the matrix R is the identity matrix. Upon setting \(\rho \mathrel {:=}\sqrt{6+3\cos (x+y)+2\sin t\,\sin (x-y)}\) we obtain \(\rho \widetilde{A}^{-\frac{1}{2}}R=\left( {\begin{matrix}2&{}0\\ 0&{}\sqrt{2}\end{matrix}} \right) \), which is independent of y, so that after evaluation at \(y=\widehat{{\mathcal {Y}}}\) it will be independent of \(z_1\) and \(z_2\) as desired. Since that matrix is also independent of t and x, all the terms in the formula (1.75) for the undifferentiated term of the transformed equation vanish. The fact that R is the identity matrix also implies that \({\mathcal {B}}\) equals \({\widetilde{B}}\). The transformed system and its limit are then obtained as in Example 6.5, with the matrix \({\mathcal {B}}\) here substituted for the corresponding matrix in (6.27) and with no undifferentiated term.
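For completeness, here is a short sympy check of the last computation, assuming (as the diagonal entries in (6.39) and the stated result indicate) that \(\widetilde{A}\) has diagonal entries \(\rho ^2/4\) and \(\rho ^2/2\).

import sympy as sp

# rho stands for sqrt(6 + 3*cos(x+y) + 2*sin(t)*sin(x-y)), which is positive
rho = sp.symbols('rho', positive=True)

A_tilde = sp.diag(rho**2/4, rho**2/2)   # diagonal entries read off from (6.39)
R = sp.eye(2)                           # R is the identity matrix here

# A_tilde^{-1/2}, computed entrywise since A_tilde is diagonal with positive entries
A_tilde_inv_sqrt = sp.diag(1/sp.sqrt(A_tilde[0, 0]), 1/sp.sqrt(A_tilde[1, 1]))

assert sp.simplify(rho*A_tilde_inv_sqrt*R - sp.diag(2, sp.sqrt(2))) == sp.zeros(2)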

6.4 Estimates along characteristics for Eq. (1.10)

Taking the x, y, or t derivative of (1.10) and defining the directional derivative along characteristics \( D_t v\mathrel {:=}av_t+bv_x+\tfrac{c}{\varepsilon }v_y+dv_y\) yields the equations

$$\begin{aligned} \begin{aligned} D_t u_w&=-[a_wu_t+\varepsilon a_{\varepsilon u}u_wu_t+b_w u_x+\varepsilon b_{\varepsilon u}u_wu_x+\tfrac{1}{\varepsilon }c_wu_y \\&\qquad \qquad +d_wu_y+d_u u_wu_y+f_w+f_uu_w] \end{aligned} \;\; \text {for } w\in \{x,y,t\}.\nonumber \\ \end{aligned}$$
(6.41)

Taking first \(w=t\) and then \(w=x\) in (6.41) and multiplying the result in each case by \(\varepsilon \) yields

$$\begin{aligned} \begin{aligned} D_t (\varepsilon u_t)&=-[a_t (\varepsilon u_t)+a_{\varepsilon u}(\varepsilon u_t)^2+b_t(\varepsilon u_x)+b_{\varepsilon u}(\varepsilon u_t)(\varepsilon u_x) + c_t u_y \\&\qquad \qquad +\varepsilon d_t u_y+d_u(\varepsilon u_t)u_y+\varepsilon f_t+f_u(\varepsilon u_t)], \\ D_t (\varepsilon u_x)&=-[a_x (\varepsilon u_t)+a_{\varepsilon u}(\varepsilon u_x)(\varepsilon u_t)+b_x(\varepsilon u_x)+b_{\varepsilon u}(\varepsilon u_x)^2 + c_x u_y \\&\qquad \qquad +\varepsilon d_x u_y+d_u(\varepsilon u_x)u_y+\varepsilon f_x+f_u(\varepsilon u_x)]. \end{aligned} \end{aligned}$$
(6.42)

It would not be useful to do the same for \(w=y\), either with or without multiplying the result by \(\varepsilon \), since the resulting equation would have a large term \(\frac{1}{\varepsilon }c_y\) multiplying either \(\varepsilon u_y\) or \(u_y\), respectively, and so would not yield a uniform estimate. Instead, since c is assumed to be bounded away from zero, we solve (1.10) for \(u_y\) to obtain

$$\begin{aligned} u_y=-\frac{1}{c+\varepsilon d}[\varepsilon f+a(\varepsilon u_t)+b (\varepsilon u_x)] \end{aligned}$$
(6.43)

Substituting (6.43) into each equation in (6.42) yields ODEs for \(\varepsilon u_t\) and \(\varepsilon u_x\) along characteristics that have right sides of order one when those variables are of order one. The original PDE (1.10) can be written as \(D_tu=-f\), whose right side is also of order one. Those ODEs therefore yield uniform \(C^0\) bounds for u, \(\varepsilon u_t\), and \(\varepsilon u_x\) up to some time independent of \(\varepsilon \), and (6.43) then yields a uniform \(C^0\) bound for \(u_y\) up to the same time. In particular, those bounds imply that the solution of (1.10) exists for a time independent of \(\varepsilon \).
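To make the mechanism concrete, here is a minimal numerical sketch (ours, for illustration only) of the toy linear special case \(a=b=1\), \(d=f=0\), \(c(t,x)=2+\cos x\) of (1.10), with the arbitrarily chosen initial data \(u_0=\sin x\,\cos y\). It integrates the ODEs along a single characteristic, with \(u_y\) eliminated via (6.43), and reports the maxima of \(|\varepsilon u_t|\), \(|\varepsilon u_x|\), and \(|u_y|\), which remain of order one as \(\varepsilon \rightarrow 0\).

import numpy as np
from scipy.integrate import solve_ivp

# Toy linear instance of (1.10):  u_t + u_x + (c(x)/eps) u_y = 0  with  c(x) = 2 + cos x,
# i.e. a = b = 1, d = f = 0, and c bounded away from zero.
def c(x):    return 2.0 + np.cos(x)
def c_x(x):  return -np.sin(x)

# Initial data u(0,x,y) = sin(x) cos(y)  (an arbitrary smooth choice)
def u0(x, y):    return np.sin(x) * np.cos(y)
def u0_x(x, y):  return np.cos(x) * np.cos(y)
def u0_y(x, y):  return -np.sin(x) * np.sin(y)

def rhs(eps):
    # State (x, y, u, p, q) with p = eps*u_t and q = eps*u_x along a characteristic;
    # dp/dt and dq/dt come from (6.42) in this special case (a_w = b_w = 0, d = f = 0,
    # c_t = 0), and u_y is eliminated through (6.43): u_y = -(p + q)/c.
    def f(t, s):
        x, y, u, p, q = s
        u_y = -(p + q) / c(x)
        return [1.0, c(x) / eps, 0.0, 0.0, -c_x(x) * u_y]
    return f

x0, y0 = 0.3, 0.7                         # foot of the characteristic
for eps in (1e-1, 1e-2, 1e-3):
    q0 = eps * u0_x(x0, y0)               # eps*u_x at time zero
    p0 = -q0 - c(x0) * u0_y(x0, y0)       # eps*u_t at time zero, from the PDE itself
    sol = solve_ivp(rhs(eps), (0.0, 2.0), [x0, y0, u0(x0, y0), p0, q0],
                    rtol=1e-8, atol=1e-10, max_step=0.01)
    p, q = sol.y[3], sol.y[4]
    u_y = -(p + q) / c(sol.y[0])
    print(f"eps={eps:.0e}: max|eps*u_t|={np.abs(p).max():.3f}, "
          f"max|eps*u_x|={np.abs(q).max():.3f}, max|u_y|={np.abs(u_y).max():.3f}")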