The continuous-time hidden Markov model based on discretization. Properties of estimators and applications

In this paper we consider continuous-time hidden Markov processes (CTHMM). The model considered is a two-dimensional stochastic process $(X_t, Y_t)$, with $X_t$ an unobserved (hidden) Markov chain defined by its generating matrix and $Y_t$ an observed process whose distribution law depends on $X_t$ and is called the emission function. In general, we allow the process $Y_t$ to take values in a subset of the $q$-dimensional real space, for some $q$.
The coupled process $(X_t, Y_t)$ is a continuous-time Markov chain whose generator is constructed from the generating matrix of $X$ and the emission distribution. We study the theoretical properties of this two-dimensional process using a formulation based on semi-Markov processes. Observations of the CTHMM are obtained by discretization, considering two different scenarii. In the first case we consider that observations of the process $Y$ are registered regularly in time, while in the second one, observations arrive at random. Maximum-likelihood estimators of the characteristics of the coupled process are obtained in both scenarii, and the asymptotic properties of these estimators, such as consistency and normality, are established. To illustrate the model, a real-data example and a simulation study are considered.


Introduction
Hidden Markov Models (HMM) appear in a large number of real-world estimation problems where a process with unobservable states produces observable outputs, normally referred to as signals. Mor et al. (2021) conduct a review of the work published on HMM during the last few decades, and of the different application areas of these models, e.g., pattern recognition, bioinformatics, economy and finance, network security, meteorology, reliability engineering, etc.
The problem is presented as follows. Consider a coupled process, $(X, Y)$, in discrete or continuous time, where $X$ is an unobserved (hidden) process and $Y$ is the observed process. Based on the observations, the law of the coupled process has to be estimated. In general, the hidden process $X$ is a Markov chain, and $Y$ is a process whose distribution depends on the value of $X_t$, i.e., $P(Y_t \in B \mid Y_s, X_s;\ s \le t) = P(Y_t \in B \mid X_t)$, or $P(Y_t \in B \mid Y_s, X_s;\ s \le t) = P(Y_t \in B \mid Y_{t^-}, X_t)$. In the first case we have an M1-M0 model and in the second an M1-M1 model, where M means Markov of order 1 or 0. These are the most commonly used orders, but more general orders can be considered.
In the literature, the Markov chain considered is a finite-state space process with transition probabilities $P_{ij}$ being functions of a parameter $\theta$ (in general a vector), written $P_{ij}(\theta)$. Therefore, the estimator $\hat\theta$ of the parameter vector $\theta$ gives us a plug-in estimator of the transition probability, i.e., $\hat P_{ij} := P_{ij}(\hat\theta)$.
Basic theoretical results concerning HMM in discrete time are given in Baum and Petrie (1966) and Leroux (1992), where consistency of the estimators is proven, as well as in Bickel et al. (1998), where asymptotic normality of the estimator is proven for a stationary process. These results concerning the discrete-time hidden Markov model (DTHMM) case are used in our CTHMM in order to provide asymptotic results for our estimators.
Here, a continuous-time HMM (CTHMM) is considered, where $X$ is defined by its generating matrix $A$ and $Y$ by its probability law, with conditional distribution $G(i, B)$ given $\{X_t = i\}$, and $B$ a measurable set in $\mathbb{R}^q$, i.e., $P(Y_t \in B \mid X_t = i) = G(i, B)$; so the considered model is an M1-M0.
Our results concern the consistency and asymptotic normality of the estimator of the parameter θ and applications to a real data set and simulated data.
It is assumed that the generator $A$ of $X$ is a matrix depending on the parameter $\theta$, i.e., $A = A(\theta)$, as is the probability distribution of $Y$, i.e., $G = G_\theta$. Let $g_\theta$ denote the density of $G_\theta$ with respect to some dominating measure $\mu$.
The methodology to tackle the problem is based on results in DTHMM, as in Bickel et al. (1998), Baum and Petrie (1966), Leroux (1992), and also on our results in Gamiz et al. (2023). In the present paper the CTHMM is approached by discretization, so that previous results can be applied here, and the parameter $\theta$ can be estimated by the EM algorithm. From the estimator $\hat\theta$ we obtain an estimator of the generating matrix $A$ and prove the asymptotic properties of this estimator, such as consistency and normality.
We consider two different scenarii.

Scenario 1: Regular inspections in time
The true state of the system at time $t$, $X_t$, is non-observable. However, at regular intervals of length $h$, $0 < h < T$, certain information related to the system is observed. Let us show an example of the type of problems that can be treated with this approach. Let us consider a continuous-time Markov chain (CTMC) $\{X_t, t > 0\}$ with generating matrix given by
$$A = \begin{pmatrix} -2\mu & 2\mu & 0 \\ \lambda & -(\lambda + \mu) & \mu \\ 0 & 2\lambda & -2\lambda \end{pmatrix}.$$
This model is a typical description of the functioning of a system with two identical and independent units. The units fail at a constant rate equal to $\lambda$. Once a unit fails, it is sent for repair. The repair facility has capacity for the two units and the repair rate is also constant and equal to $\mu$. The state of the system is expressed as the number of units functioning, so the state space is $E = \{0, 1, 2\}$. For assessing the system performance, we focus on certain functionals $\phi(A, t)$, such as the reliability of the system (the probability that the system has not suffered a failure before a given time), the availability function (the probability that the system is operative at a given time) or the expected hitting times (the expected time at which a particular class of states is reached by the system), among others. The usual procedure is to take a sample of identical systems that work under similar conditions and to estimate the unknown parameters $\lambda$ and $\mu$ from the data. The main problem here is that we consider that the system is not directly observable, so we are not able to register information regarding the number of units that are functioning in the system at any given time. In contrast, at some regular times, $t_0 = 0, t_1 = h, t_2 = 2h, \ldots, t_n = nh, \ldots$, we have access to certain indicators that provide useful, though partial, information about the state of the system. Let us denote by $Y_n$ the accessible information about the system that we can observe at time $t_n$, for $n > 0$. We assume that $\{Y_0, Y_1, \ldots, Y_n\}$ are (conditionally) independent and that, for each $n$, the distribution of the random variable $Y_n$ is determined by a parameter that depends on the current state of the system, that is, the state at time $t_n = nh$.
In this situation, $\{Y_0, Y_1, \ldots, Y_n\}$ can be seen as a sample from a DTHMM whose parameters can be written as a vector $\theta$ that includes $\lambda$, $\mu$ and some other parameters that determine the distribution of $Y_n$ given $\hat X_n = X_{nh} = i$, for $i \in E$.
We estimate $\hat\theta_n$ from observations of the hidden Markov model at times $\{t_n = nh, n \ge 0\}$. Leroux (1992) showed the strong consistency of this estimator, that is, $\hat\theta_n \to \theta$, and in Bickel et al. (1998) the weak convergence of the estimator to a normal law is proven, that is, $\sqrt{n}(\hat\theta_n - \theta) \to N(0, \Sigma_0)$, in the case of HMM. Subsequently, for the plug-in estimator of the transition matrix $Q_h$, that is, $\hat Q_{h,n} = Q_h(\hat\theta_n)$, strong consistency and asymptotic normality are proven. At the same time, the strong consistency and asymptotic normality of the plug-in estimator of $A_h$, that is, $\hat A_{h,n} = A(\hat Q_{h,n})$, are also demonstrated in this paper.
As a consequence, for any functional within the class $H(t) = \phi(A, t)$, we define $H_h(t) = \phi(A_h, t)$. Then the plug-in estimator of $H_h(t)$, $\hat H_{h,n}$, is shown to be strongly consistent and asymptotically normal. As an application of this result, the transition matrix function, $H(t) = P(t) = e^{At}$, can be considered, as well as other functions such as reliability, availability, etc.
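The functionals above can be evaluated directly from the generator via the semigroup $P(t) = e^{At}$. The following sketch does this for the two-unit repairable system of the example, using purely illustrative rate values and a hand-rolled matrix exponential; the state ordering $\{0, 1, 2\}$ (number of working units) and the initial law are assumptions for the illustration.

```python
import numpy as np

def expm(M, squarings=12, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    M = M / (2.0 ** squarings)
    E = np.eye(M.shape[0])
    P = np.eye(M.shape[0])
    for k in range(1, terms):
        P = P @ M / k
        E = E + P
    for _ in range(squarings):
        E = E @ E
    return E

# Illustrative rates for the two-unit system (states 0, 1, 2 = units working).
lam, mu = 0.5, 2.0
A = np.array([[-2 * mu,        2 * mu,       0.0],
              [    lam, -(lam + mu),          mu],
              [    0.0,     2 * lam,    -2 * lam]])

def availability(t, alpha=np.array([0.0, 0.0, 1.0])):
    """P(system operative at time t), starting with both units working."""
    p_t = alpha @ expm(A * t)      # row of the semigroup P(t) = e^{At}
    return p_t[1] + p_t[2]         # up states: at least one unit working

print(round(availability(0.0), 6))  # 1.0 at t = 0
```

Other functionals of the class $\phi(A, t)$ (reliability, hitting times) would be computed analogously from restrictions of $A$.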

Scenario 2: Random inspections in time
For a given interval length $T$, the true state of the system at time $t$, $X_t$, is non-observable. Observations related to the underlying state are received at random times following a homogeneous Poisson process (HPP), $N(t)$, with intensity $\lambda$, unknown and estimated from the data. We consider the embedded Markov chain obtained from the states visited by $X$ at the successive arrival times of the HPP. We denote by $Q$ its transition probability matrix and by $A$ the generating matrix of the CTMC. We assume that $\lambda \ge \max\{a_i, i \in E\}$. Using the uniformization method, see, e.g., Kulkarni (2011), we define $Q = (1/\lambda)A + I$. Using the HMM, an estimator $\hat Q$ of $Q$ is obtained, and subsequently the corresponding estimator of the generating matrix $A$ is obtained as $\hat A = \lambda(\hat Q - I)$. It is demonstrated that these estimators $\hat Q$ and $\hat A$ are strongly consistent and asymptotically normal.
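The uniformization relation and its inversion can be sketched in a few lines; the generator values below are illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative generator of the hidden chain (rows sum to zero).
A = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])

rates = -np.diag(A)           # a_i = -a_ii, the exit rates
lam = 1.5 * rates.max()       # any lambda >= max_i a_i is admissible
Q = np.eye(2) + A / lam       # uniformized transition matrix: Q = (1/lambda) A + I

# Q is a proper stochastic matrix; the generator is recovered by inverting
# the relation (in practice Q and lambda are replaced by their estimates):
A_back = lam * (Q - np.eye(2))
```

The admissibility condition $\lambda \ge \max_i a_i$ is exactly what guarantees that all entries of $Q$ are nonnegative.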
The model discussed in this scenario should not be confused with the so-called Markov-modulated Poisson process, which consists of a Poisson process whose rate is controlled by a non-observable continuous-time Markov chain. Freed and Shepp (1982) consider a switched Poisson process, i.e., with only two states for the hidden chain. The authors assume that the rate of one of the states is zero and derive a simple formula for the asymptotic likelihood ratio that allows one to estimate the state at any time from a stream of past events. In our case, a Poisson process is assumed to govern the number of signals registered by an external observer, but it is not responsible for the number of signals emitted by the hidden source nor for the nature of such signals, as is the case for the Poisson process involved in a Markov-modulated Poisson process.
The results summarized here can be applied in different areas, such as system reliability, biology, etc. For example, in Wei et al. (2002) a CTHMM is proposed for evaluating network performance considering discrete-time observations. Zhou et al. (2020) also proposed a CTHMM where the observations can be collected regularly, irregularly or continuously and the number of states is unknown; they apply this model to bladder cancer data. In Verma et al. (2018) the authors develop a CTHMM under a generalized linear modeling framework to model the evolution of chronic obstructive pulmonary disease (COPD); again, the model considers discrete-time, although irregularly-spaced, observations. Similarly, Hulme et al. (2021) proposed a CTHMM to estimate the health conditions of patients monitored by wearable and mobile technology, where the observations are irregularly spaced as well. Finally, Liu et al. (2016) also consider the use of CTHMM for modeling disease progression.
This paper is organized as follows. In Sect. 2 the model and its main characteristics are defined. The generator of the two-dimensional process is defined and obtained in terms of the generating matrix $A$ and the emission distribution $G$. The basic properties of the model are obtained using a formulation based on semi-Markov processes. In Sect. 3, maximum-likelihood estimators for the parameters of the model are obtained, based on data observed under two different time-discretization schemes: scenario 1, where observations arrive at regularly pre-specified times; and scenario 2, where observations arrive randomly according to a homogeneous Poisson process. Some applications and particular cases of the model are described in Sect. 3.6.1. Some numerical examples are presented in Sect. 5 and the conclusions are given in Sect. 6.

The model
We follow the formulation of Bickel et al. (1998) and consider the "usual parametrization" of the model. Specifically, let $\{X_t; t \ge 0\}$ be a continuous-time Markov chain with finite state space $E = \{1, \ldots, d\}$ and generating matrix $A = (a_{ij};\ i, j \in E)$. Let $\{Y_t; t \ge 0\}$ be a $\mathcal{Y}$-valued process such that, given $X_t = i$, $Y_t$ is conditionally distributed with density $g(i, y)$ with respect to some $\sigma$-finite measure $\mu$ on $\mathcal{Y}$, for fixed $i \in E$.
All processes are defined on a complete probability space $(\Omega, \mathcal{F}, P)$, are right-continuous and hence progressively measurable. Both the generating matrix $A$ and the family of densities $\{g(i, \cdot); i \in E\}$ depend on a vector of parameters $\theta$, that is, $A(i, j) = a_{ij}(\theta)$ and $g(i, \cdot) = g_\theta(i, \cdot)$, for all $i, j \in E$; this also implies a family of probability measures $\{P_\theta\}$. The set of possible values of the vector $\theta$ is denoted $\Theta \subset \mathbb{R}^k$, and $\theta$ has to be estimated from a set of observations of the process $\{Y_t\}$.
The vector $\theta$ usually includes the transition rates and also some parameters characterizing the densities $g$. As a particular case, it can be assumed that $g(i, y)$ belongs to a concrete parametric family of distributions with parameters $\beta = (\beta_1, \beta_2, \ldots, \beta_s)$; then we can write $\theta = (A^*, \beta)$, with $A^*$ the matrix $A$ without its principal diagonal, since the diagonal entries are functions of the off-diagonal entries. In our previous example of a 3-state MC, if $g(i, \cdot)$ is the normal density $N(\kappa_i, \sigma_i^2)$, $i \in E$, the parameter vector will be $\theta = (\lambda, \mu, \kappa_1, \sigma_1^2, \kappa_2, \sigma_2^2, \kappa_3, \sigma_3^2)$. In general, $\mathcal{Y}$ can be seen as a subset of $\mathbb{R}^q$ for some $q$. If $Y_t$ has density function $g(i, \cdot)$, given that $X_t = i$, then $P(Y_t \in B \mid X_t = i) = \int_B g(i, y)\,\mu(dy)$, for $B \subset \mathcal{Y}$, is the probability that the process $Y_t$ takes values in a subset $B$ given $X_t = i$, for $i \in E$. We also denote $G(i, B) = P(Y_t \in B \mid X_t = i)$.
At some points of the paper we will discuss the simpler case in which $Y_t$ takes values in a finite set, that is, $\mathcal{Y} = \{y_1, \ldots, y_s\}$; then the emission function will be a $d \times s$-dimensional matrix $G$, with elements $G(i, y) = G(i, \{y\}) = P(Y_t = y \mid X_t = i)$, for all $i \in E$ and $y \in \mathcal{Y}$, and all $t > 0$ (see Gamiz et al. 2023).
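For the finite-signal case, the pair $(X, Y)$ can be simulated in a few lines: the hidden chain is generated from its rates and, at each inspection time, a signal is drawn from the corresponding row of $G$. All numerical values below are hypothetical, chosen only to illustrate the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative generator (d = 2 hidden states) and emission matrix (s = 3 signals).
A = np.array([[-1.0,  1.0],
              [ 0.5, -0.5]])
G = np.array([[0.80, 0.15, 0.05],   # G(i, y) = P(Y_t = y | X_t = i)
              [0.10, 0.30, 0.60]])

def state_at_times(A, times, x0=0):
    """Simulate the CTMC X with generator A; return X_t at each requested time."""
    d = A.shape[0]
    times = np.asarray(times, dtype=float)
    t, x, out, k = 0.0, x0, [], 0
    while k < len(times):
        rate = -A[x, x]
        t_next = t + rng.exponential(1.0 / rate)   # exponential sojourn in state x
        while k < len(times) and times[k] < t_next:
            out.append(x)                          # no jump yet: state unchanged
            k += 1
        p = np.clip(A[x], 0.0, None)               # jump law: a_xj / a_x, j != x
        p[x] = 0.0
        x = rng.choice(d, p=p / p.sum())
        t = t_next
    return np.array(out)

h, n = 0.5, 10
X_skel = state_at_times(A, h * np.arange(n))                 # skeleton X_k = X_{kh}
Y_skel = np.array([rng.choice(3, p=G[i]) for i in X_skel])   # observed signals
```

Only `Y_skel` would be available to the statistician; `X_skel` is shown here to make the hidden layer explicit.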
For the rest of the paper we will consider the process (X , Y ) defined as follows.
Definition 1 Let $X = \{X_t; t \ge 0\}$ be an irreducible homogeneous Markov process on a finite set $E$ with generating matrix $A = (a_{ij})_{i,j \in E}$, and let $Y = \{Y_t; t \ge 0\}$ be a homogeneous process on a general set $\mathcal{Y} \subset \mathbb{R}^q$, with $q \ge 1$, such that, for $t > 0$, the distribution of $Y_t$ is determined by $G(i, \cdot)$ on the event $X_t = i$, for $i \in E$. Then we say that $(X, Y)$ is a two-dimensional process with dependence structure M1-M0. That is, if $\mathcal{B}_\mathcal{Y}$ denotes the set of Borel subsets of $\mathcal{Y}$, then for any $i, j \in E$, $y \in \mathcal{Y}$ and $B \in \mathcal{B}_\mathcal{Y}$, and for all $s, t > 0$,
$$P(X_{s+t} = j, Y_{s+t} \in B \mid X_s = i, Y_s = y) = P(X_t = j, Y_t \in B \mid X_0 = i, Y_0 = y) = P_{ij}(t)\, G(j, B), \qquad (1)$$
where we also use the homogeneity of the processes $X$ and $Y$. Moreover, for $t = 0$, $P(X_t = j, Y_t \in B \mid X_0 = i, Y_0 = y) = 1$ if $i = j$ and $y \in B$; and $0$ otherwise.

Generators
Let us focus on the two-dimensional process {(X t , Y t ); t ≥ 0}, where, as above, X is a continuous-time Markov chain taking values in the set E = {1, 2, . . . , d}, with generating matrix A and initial law α; and Y is a stochastic process taking values in a set Y ⊂ R q whose distribution depends on X t .
First we recall the concept of generator operator for a two-dimensional process with a general state space.
Proposition 1 Let $(X, Y)$ be a stochastic process where $X$ is a CTMC with finite state space $E$ and generating matrix $A = (a_{ij};\ i, j \in E)$, and let the transition probabilities of the process $(X, Y)$ be given by (1). The generator of the two-dimensional MC $(X_t, Y_t)$ can be written, for a bounded measurable function $f$ on $E \times \mathcal{Y}$, as
$$\tilde{A} f(i, y) = \sum_{j \in E} a_{ij} \int_{\mathcal{Y}} f(j, u)\, G(j, du), \qquad (2)$$
for all $i \in E$ and $y \in \mathcal{Y}$.
Proof From Definition 1 we have, for $i \in E$, $y \in \mathcal{Y}$, and a bounded measurable function $f$,
$$\tilde{A} f(i, y) = \frac{d}{dt}\Big|_{t=0} \sum_{j \in E} P_{ij}(t) \int_{\mathcal{Y}} f(j, u)\, G(j, du) = \sum_{j \in E} a_{ij} \int_{\mathcal{Y}} f(j, u)\, G(j, du),$$
since $P'_{ij}(0) = a_{ij}$. In particular, when $\mathcal{Y}$ is a finite set, the generator is given by a matrix $\tilde{A}$ whose elements are
$$\tilde{a}((i, y'), (j, y)) = a_{ij}\, G(j, y), \quad i, j \in E,\ y', y \in \mathcal{Y}.$$
Example 1 For illustrative purposes, let us construct the two-dimensional generator for a simple case where $E = \{1, 2\}$ and $\mathcal{Y} = \{y_1, y_2, y_3\}$. Then the state space of the coupled process is $\tilde{E} = \{(1, y_1), (1, y_2), (1, y_3), (2, y_1), (2, y_2), (2, y_3)\}$, and the corresponding generating matrix is
$$\tilde{A} = \begin{pmatrix}
a_{11}G(1,y_1) & a_{11}G(1,y_2) & a_{11}G(1,y_3) & a_{12}G(2,y_1) & a_{12}G(2,y_2) & a_{12}G(2,y_3)\\
a_{11}G(1,y_1) & a_{11}G(1,y_2) & a_{11}G(1,y_3) & a_{12}G(2,y_1) & a_{12}G(2,y_2) & a_{12}G(2,y_3)\\
a_{11}G(1,y_1) & a_{11}G(1,y_2) & a_{11}G(1,y_3) & a_{12}G(2,y_1) & a_{12}G(2,y_2) & a_{12}G(2,y_3)\\
a_{21}G(1,y_1) & a_{21}G(1,y_2) & a_{21}G(1,y_3) & a_{22}G(2,y_1) & a_{22}G(2,y_2) & a_{22}G(2,y_3)\\
a_{21}G(1,y_1) & a_{21}G(1,y_2) & a_{21}G(1,y_3) & a_{22}G(2,y_1) & a_{22}G(2,y_2) & a_{22}G(2,y_3)\\
a_{21}G(1,y_1) & a_{21}G(1,y_2) & a_{21}G(1,y_3) & a_{22}G(2,y_1) & a_{22}G(2,y_2) & a_{22}G(2,y_3)
\end{pmatrix},$$
where $a_{\cdot\cdot}$ denote the elements of the generating matrix $A$ of $X$, $G(\cdot, \cdot)$ denote the emission probabilities, and the rows, indexed by $(i, y')$, do not depend on the current signal $y'$.
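A short numerical check of the construction in Example 1: assuming, as the example's transition probabilities suggest, that the coupled generating matrix has entries $a_{ij}G(j, y)$ (independent of the current signal $y'$), its rows sum to zero because the rows of $A$ do and the rows of $G$ sum to one. The matrices below are hypothetical.

```python
import numpy as np

# Hypothetical values for Example 1: E = {1, 2}, Y = {y1, y2, y3}.
A = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])           # generating matrix of X
G = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])        # emission probabilities G(i, y)

d, s = A.shape[0], G.shape[1]
A2 = np.zeros((d * s, d * s))
for i in range(d):                     # current hidden state i
    for yp in range(s):                # current signal y' (the entry ignores it)
        for j in range(d):             # next hidden state j
            for y in range(s):         # next signal y
                A2[i * s + yp, j * s + y] = A[i, j] * G[j, y]

print(np.allclose(A2.sum(axis=1), 0.0))   # rows sum to zero, as for a generator
```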
From the generator in Proposition 1 we obtain the semigroup
$$P_t = e^{\tilde{A} t}, \quad t \ge 0. \qquad (3)$$

Markov renewal equation for Markov processes
The above semigroup is difficult to handle numerically. For this reason we will consider another, equivalent formulation via semi-Markov processes (see, for example, Limnios and Oprisan 2001). In fact, Markov processes are particular cases of semi-Markov processes, and we may use the Markov renewal equation in order to compute the above formulation (3) through a much easier numerical scheme.
Let us define the semi-Markov kernel of the Markov process $X_t$ in the following way. Let the process $(X, Y)$ be defined as in Definition 1. For $i, j \in E$ and $B \in \mathcal{B}_\mathcal{Y}$, we define the following functions:
$$P(i, j, B, t) = P(X_t = j, Y_t \in B \mid X_0 = i) = P_{ij}(t)\, G(j, B), \qquad (4)$$
$$M(i, j, B, t) = \delta_{ij}\, e^{-a_i t}\, G(i, B), \qquad L(i, j, t) = \frac{a_{ij}}{a_i}\left(1 - e^{-a_i t}\right),\ i \ne j,$$
for $t \ge 0$, where $a_i = -a_{ii}$. Note that the function $L(i, j, t)$ is the semi-Markov kernel of the Markov process $\{X_t\}$.
In particular, when $Y_t$ takes values in a finite set of size $s$, we can define the matrix of transition functions of the process $(X, Y)$, $P_t$, with dimension $(d \cdot s) \times (d \cdot s)$ and elements $P_t((i, y'), (j, y)) = P_{ij}(t)\, G(j, y)$. In the same way we can construct matrices $M_t$ and $L_t$ whose elements are given above. Given a measurable function $\varphi$ defined on $E \times \mathbb{R}_+$, we define its convolution by $L$ as follows:
$$(L * \varphi)(i, t) = \sum_{k \in E} \int_0^t L(i, k, ds)\, \varphi(k, t - s). \qquad (5)$$
Proposition 2 (Markov renewal equation) The function $P(i, j, B, t)$ verifies the following Markov renewal equation:
$$P(i, j, B, t) = M(i, j, B, t) + (L * P)(i, j, B, t), \qquad (6)$$
where $i, j \in E$, $B \in \mathcal{B}_\mathcal{Y}$, and $t \ge 0$.
Proof We have that
$$P(i, j, B, t) = P(X_t = j, Y_t \in B, T_1 > t \mid X_0 = i) + P(X_t = j, Y_t \in B, T_1 \le t \mid X_0 = i),$$
where $(J_1, T_1)$ describes the first jump of the Markov renewal process $\{(J_n, T_n), n > 0\}$ associated to the process $X$, that is, $J_1 = X_{T_1}$. Then, conditioning on $(J_1, T_1)$,
$$P(i, j, B, t) = \delta_{ij}\, e^{-a_i t}\, G(i, B) + \sum_{k \in E} \int_0^t L(i, k, ds)\, P(k, j, B, t - s).$$
Hence $P(i, j, B, t) = M(i, j, B, t) + (L * P)(i, j, B, t)$, where $*$ denotes the convolution operation defined in (5).
For the case in which $Y_t$ takes values in a finite set, we can write Eq. (6) in matrix notation as $P_t = M_t + (L * P)_t$. Then the solution of Eq. (6) is given by $P(i, j, B, t) = (\Psi * M)(i, j, B, t)$, where $\Psi = \sum_{n \ge 0} L^{(n)}$ is the Markov renewal function and $L^{(n)}$ denotes the $n$-fold convolution of $L$; see for example Limnios (2012).
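The renewal formulation indeed lends itself to a simple numerical scheme. The sketch below discretizes the convolution in Eq. (6) on a time grid with a plain rectangle rule (the quadrature rule is our assumption, not the paper's scheme) and takes $B = \mathcal{Y}$, so that the emission factor equals one and the iteration should recover the transition function of $X$, i.e., $e^{At}$.

```python
import numpy as np

# Illustrative generator of X; the renewal scheme approximates P(t) = e^{At}.
A = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
a = -np.diag(A)                         # sojourn rates a_i
d = A.shape[0]
off = A - np.diag(np.diag(A))           # off-diagonal rates a_{ij}, j != i

dt, n = 0.0025, 400                     # grid 0, dt, ..., (n - 1) dt
P = np.zeros((n, d, d))
P[0] = np.eye(d)
for k in range(1, n):
    t = k * dt
    M = np.diag(np.exp(-a * t))         # M term: no jump in (0, t]
    conv = np.zeros((d, d))
    for u in range(k):                  # rectangle rule for (L * P)(t)
        dens = off * np.exp(-a * u * dt)[:, None]   # L(i, j, ds)/ds = a_ij e^{-a_i s}
        conv += dens @ P[k - 1 - u] * dt
    P[k] = M + conv

# For the coupled process, P(i, j, B, t) is then P[k][i, j] * G(j, B).
```

For this two-state generator the exact value is $P_{00}(t) = (2 + e^{-3t})/3$, which the grid solution approaches as $dt \to 0$.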
Using the Markov renewal theorem, see Shurenkov (1984), the stationary distribution of the process $(X, Y)$ is obtained in the following proposition.

Proposition 3 (Stationarity) Under the conditions above and if $X$ is irreducible, we have
$$\lim_{t \to \infty} P(i, j, B, t) = \pi_j\, G(j, B),$$
for any $i, j \in E$ and $B \in \mathcal{B}_\mathcal{Y}$, where $\pi$ is the stationary distribution of $X$.
So we have that the probability distribution $\tilde\pi$ whose elements are $\tilde\pi(j, B) = \pi_j\, G(j, B)$, for $j \in E$ and $B \in \mathcal{B}_\mathcal{Y}$, is the limit distribution of the process $(X, Y)$.
Proof For the first part of the proposition, let $\rho$ denote the stationary distribution of the embedded Markov chain $J_n$ associated to the MC $X_t$, with transition probabilities $\rho_{ij} = a_{ij}/a_i$, for $i \ne j$. Let $\pi$ be the row vector of stationary probabilities of the CTMC $X_t$, that is, verifying $\pi A = 0$. It is satisfied that $\pi_i = m_i \rho_i / m$, with $m_i = 1/a_i$ the mean sojourn time in state $i$ and $m = \sum_{j \in E} m_j \rho_j$.
Then, by the irreducibility of $X$ and applying the Markov renewal theorem to the solution $P = \Psi * M$, as $t \to \infty$,
$$P(i, j, B, t) \to \frac{1}{m} \sum_{k \in E} \rho_k \int_0^\infty M(k, j, B, s)\, ds = \frac{\rho_j m_j}{m}\, G(j, B) = \pi_j\, G(j, B).$$
Then we define $\tilde\pi(j, B) = \pi_j\, G(j, B)$, for $j \in E$ and $B \in \mathcal{B}_\mathcal{Y}$, the stationary distribution of the coupled process $(X, Y)$.
For the second part, we can write, for the coupled generator $\tilde{A}$ and a bounded measurable $f$,
$$\sum_{i \in E} \int_{\mathcal{Y}} \tilde\pi(i, dy)\, \tilde{A} f(i, y) = \sum_{i \in E} \pi_i \sum_{j \in E} a_{ij} \int_{\mathcal{Y}} f(j, u)\, G(j, du) = \sum_{j \in E} \Big(\sum_{i \in E} \pi_i a_{ij}\Big) \int_{\mathcal{Y}} f(j, u)\, G(j, du) = 0,$$
where the last equality is deduced from the fact that $\pi A = 0$. Since $G(j, B)$ does not depend on time, we can also use relation (4) to derive directly the limit in the first part of Proposition 3.
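Computing the stationary law is a small linear-algebra exercise: solve $\pi A = 0$ with the normalization $\sum_j \pi_j = 1$, then form $\pi_j G(j, B)$. A sketch with a hypothetical generator and a finite emission matrix:

```python
import numpy as np

# Hypothetical generator of X and finite emission matrix G.
A = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
G = np.array([[0.8, 0.2],
              [0.3, 0.7]])

d = A.shape[0]
# Solve pi A = 0, sum(pi) = 1: replace one balance equation by the normalization.
M = np.vstack([A.T[:-1], np.ones(d)])
rhs = np.append(np.zeros(d - 1), 1.0)
pi = np.linalg.solve(M, rhs)

# Limit law of the coupled process: pi(j, {y}) = pi_j G(j, y).
pi_XY = pi[:, None] * G
```

For this generator the balance equation gives $\pi = (2/3, 1/3)$, and `pi_XY` sums to one over all pairs $(j, y)$.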

Estimation
As described in Sect. 2, the two-dimensional process $(X, Y)$ is fully specified by the generating matrix $A$ of the continuous-time MC $X$ and the emission distribution $G$. Let us assume that $A = A(\theta)$ and $G = G_\theta$, that is, they both depend on a vector of unknown parameters $\theta$. The problem here is to obtain estimators of the parameter from data. We consider that the MC $X$ is unobservable, while information about its state can be registered through the process $Y$.
In other words, consider that our data come from a continuous-time HMM (X t , Y t ), where X t is the hidden process and we aim at estimating θ from a sample of observations of the process Y t in an interval of time [0, T ].

Scenario 1. Regular inspections in time
In this section, observations emitted by a CTHMM are recorded at regular time points. Let us consider a stochastic system evolving according to a continuous-time Markov chain $\{X_t; t \ge 0\}$ with state space $E = \{1, 2, \ldots, d\}$ as above and generating matrix $A$, so that $P(t) = e^{At}$, for all $t > 0$.
The true state of the system $X_t$ is non-observable at any time $t \in (0, T]$. However, at $n$ regular pre-specified times denoted by $0 < h < 2h < \cdots < nh = T$, observations of a random process $\{Y_t; t \ge 0\}$ are available, providing certain information about the true state of the system. Let us denote by $\hat Y_k = Y_{kh}$ the $k$-th observed signal, which is recorded at time $kh$, for $k \in \{0, 1, \ldots, n\}$. Associated to (or embedded in) the continuous-time process we can consider the following discrete-time Markov chain, $\hat X_k = X_{kh}$, with transition matrix $Q_h = e^{Ah}$, whose $(i, j)$ element is given by $Q_h(i, j) = P(X_h = j \mid X_0 = i)$. The discrete Markov chain $\hat X_k$ is called the discrete skeleton (at scale $h$) of $X_t$ (see Kingman 1963 for ergodicity properties of Markov chains based on discrete skeletons).
Consider the sequence of times $0 < h < 2h < \cdots < nh$; then $\hat Y_0, \hat Y_1, \ldots, \hat Y_n$ can be seen as a realization of the embedded discrete hidden Markov model that we denote by $(\hat X, \hat Y)$. We can use the results in Bickel et al. (1998) to estimate the transition matrix of this embedded hidden Markov chain $\{\hat X_n\}$, that is, we obtain $\hat Q_h$, as well as the emission functions $G(i, dy) = P(\hat Y_k \in dy \mid \hat X_k = i)$, for $i \in E$.
For an interval of length $h$, we have $Q_h = P(h) = e^{Ah}$, where $A = (a_{ij};\ i, j \in E)$ is the generating matrix of the hidden continuous-time Markov chain, $P(h)$ is the corresponding transition function matrix, and $Q_h$ is the transition matrix of the Markov chain $\hat X_k$. Then, using a Taylor expansion, we approximate
$$Q_h(i, j) = \delta_{ij} + a_{ij}\, h + o(h),$$
where $o(h)/h \to 0$ as $h \to 0$, and $\delta_{ij}$ is equal to 1 if $i = j$ and 0 otherwise. Then, we define the estimator of the generating matrix as
$$\hat A_h = \frac{1}{h}\left(\hat Q_h - I\right), \qquad (7)$$
where $\hat Q_h$ is the maximum likelihood estimator of the transition matrix corresponding to the discrete-time chain $\hat X$, obtained as in Gamiz et al. (2023), and $I$ is the identity matrix. Note that, since we write $A = A(\theta)$, we also have that $Q$ is a function of the parameter vector $\theta$, that is, $Q = Q(\theta)$.
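A quick numerical check of the $h^{-1}(Q_h - I)$ approximation, taking the exact $Q_h = e^{Ah}$ in place of the EM-based estimate $\hat Q_h$ (an illustrative shortcut, with a hypothetical generator): the error behaves like $O(h)$, consistent with the Taylor expansion.

```python
import numpy as np

def expm(M, squarings=12, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    M = M / (2.0 ** squarings)
    E = np.eye(M.shape[0])
    P = np.eye(M.shape[0])
    for k in range(1, terms):
        P = P @ M / k
        E = E + P
    for _ in range(squarings):
        E = E @ E
    return E

A = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])      # hypothetical generator

errs = []
for h in (0.5, 0.1, 0.01):
    Q_h = expm(A * h)             # stands in for the estimated skeleton matrix
    A_h = (Q_h - np.eye(2)) / h   # plug-in estimator of the generator, Eq. (7)
    errs.append(np.abs(A_h - A).max())
    print(h, round(errs[-1], 4))  # error shrinks roughly like |A^2| h / 2
```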

Scenario 2. Random inspections in time
In this case, we consider that the observations of the process {Y t } are received at random times following a Poisson process N (t), independent of X t and Y t , with intensity λ, which is unknown and can be estimated from the data.
The number of observations in the interval $(0, T]$ is finite, random and equal to $N(T)$, where $\{N(t), t \ge 0\}$ is an HPP with intensity $\lambda \ge \max\{a_i\}$, and $a_i = -a_{ii} > 0$, for all $i = 1, \ldots, d$.
Let us denote by $\hat Y_0, \hat Y_1, \ldots, \hat Y_n$ the observations registered in an interval $(0, T]$. The states visited by the Markov process $X$ at the successive arrival times of the Poisson process can be seen as an embedded discrete-time hidden Markov chain $Z_n$ with transition probability matrix denoted by $Q$. The generating matrix of the hidden process is denoted by $A$, already given before; then, using the uniformization method (see, e.g., Kulkarni 2011, or Girardin and Limnios 2018), we have that
$$Q = I + \frac{1}{\lambda} A,$$
where $I$ is the identity matrix. Using the HMM model we estimate again the transition matrix of the embedded Markov chain, $\hat Q$. So we can define the following estimator:
$$\hat A = \hat\lambda\left(\hat Q - I\right), \qquad (8)$$
where $\lambda$ can be estimated by $\hat\lambda = N(T)/T$.
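A minimal end-to-end sketch of this scenario under simplifying assumptions: the embedded chain at the Poisson inspection times is simulated directly and $Q$ is estimated by transition counts, standing in for the EM estimate that would be computed from the signals $Y$ alone; all rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])            # hypothetical generator of X
lam_true, T = 4.0, 2000.0               # inspection intensity (>= max a_i), window

N = rng.poisson(lam_true * T)           # number of inspections in (0, T]
lam_hat = N / T                         # estimator of the intensity

Q = np.eye(2) + A / lam_true            # embedded chain at Poisson arrival times
Z = np.zeros(N, dtype=int)              # simulate the embedded chain Z_n
for k in range(1, N):
    Z[k] = rng.choice(2, p=Q[Z[k - 1]])

C = np.zeros((2, 2))                    # transition counts (stand-in for EM step)
for k in range(1, N):
    C[Z[k - 1], Z[k]] += 1
Q_hat = C / C.sum(axis=1, keepdims=True)

A_hat = lam_hat * (Q_hat - np.eye(2))   # plug-in estimator of the generator
```

With a long window the plug-in estimator lands close to the true generator, and its rows sum to zero by construction.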

Assumptions
In order to establish consistency and asymptotic normality of the estimators, we need the following assumptions. Let us define $G_\theta(i, dy) := P_\theta(Y_t \in dy \mid X_t = i) = g_\theta(i, y)\,\mu(dy)$, for $i \in E$ and $\theta \in \Theta \subset \mathbb{R}^k$, with $\Theta$ an open set, and where $\mu$ is some reference measure dominating all $G_\theta(i, \cdot)$.

Asymptotic properties
Let (X , Y ) be the two-dimensional process as defined in Definition 1 in Sect. 2. Let A = (a i j ) denote the generating matrix of X t , t ≥ 0, with state space E = {1, 2, . . . , d}.
Consider the related Markov chains. Scenario 1: $\hat X_n = X_{nh}$, with transition probability matrix $Q_h = e^{Ah}$. Scenario 2: $Z_n = X_{T_n}$, where $T_n$, $n \ge 1$, are the arrival times of an $HPP(\lambda)$, with $\lambda \ge \max\{a_i, i = 1, \ldots, d\}$ and $T_0 = 0$, where $a_i := -a_{ii}$. Its transition probability matrix is $Q = I + \lambda^{-1} A$.

Lemma 1 (i) If the Markov process $X$ is irreducible, then the above Markov chains, $\hat X$ and $Z$, are irreducible and aperiodic, i.e., ergodic. (ii) The Markov process $X$ and the Markov chains $\hat X$ and $Z$ have the same stationary probability, say $\pi$. (iii) If the Markov process $X$ is stationary, then the Markov chains $\hat X$ and $Z$ are also stationary, and the sequence $\hat Y$ is stationary with stationary distribution $\pi G(\cdot) = \sum_{j \in E} \pi_j G(j, \cdot)$ in both scenarii. (iv) In our M1-M0 case, if $\{\hat X_n\}$ is ergodic, the Markov chain $(\hat X_n, \hat Y_n)$ is ergodic with stationary probability $\tilde\pi(j, B) = \pi_j G(j, B)$, for $j \in E$ and $B \in \mathcal{B}_\mathcal{Y}$, the Borel sets in $\mathcal{Y}$.
Proof (i) By construction, $\hat X$ and $Z$ are irreducible, as is $X$. Moreover, they are aperiodic since $Q_h(i, i) > 0$ and $Q(i, i) > 0$. (ii) The stationary probability $\pi$ of the MP $X$ is given by the solution of the equation $\pi e^{Ah} = \pi$, which is the same as for $\hat X$. Moreover, this is also the same for $Z$ (see, e.g., Ross 1996). (iii) If the MP $X$ is stationary, it means that $P(X_t = j) = \pi_j$, for all $t \ge 0$ and all $j \in E$, and we have $P(X_{kh} = j) = P(\hat X_k = j) = \pi_j$, so that $\hat X$ is stationary too. The same applies to $Z$.
(iv) We showed in Sect. 2.2 that the transition probabilities of the process $(X_t, Y_t)$ are given by $P(X_t = j, Y_t \in B \mid X_0 = i, Y_0 = y) = P_{ij}(t)\, G(j, B)$, for $i, j \in E$, $B \in \mathcal{B}_\mathcal{Y}$ and $y \in \mathcal{Y}$. We also proved in Sect. 2.2 that the limit distribution is $\tilde\pi(j, B) = \pi_j G(j, B)$, which gives the ergodicity of $(\hat X_n, \hat Y_n)$.

Scenario 1. Regular inspections in time
In this section we establish asymptotic properties of the estimators. Let $h > 0$, and let $\hat Y_0, \hat Y_1, \ldots, \hat Y_n$ be observations of a HMM as described in Sect. 3.2 for Scenario 1. We extend the sequence $\{\hat Y_n, n \ge 0\}$ to the sequence $\{\hat Y_n, n \in \mathbb{Z}\}$.

Asymptotic Fisher information matrix
The asymptotic Fisher information matrix is nonsingular provided that there exists an integer $n \in \mathbb{N}$ such that $I_n(\theta_0)$ is nonsingular.
Now for the processes X t , Y t in continuous time, we get the following results.

Proposition 4 (Consistency)
Let $h > 0$ be fixed. Under assumptions A1-A4, given a sample of observations $\{\hat Y_0, \hat Y_1, \ldots, \hat Y_n\}$, the estimator given in (7) is uniformly strongly consistent, that is, $\hat A_{h,n} \to A_h$, almost surely, as $n \to \infty$.

Proof Uniform consistency of $\hat\theta_n$ implies uniform consistency of $\hat Q_{h,n}$, since $\hat Q_{h,n} = Q(\hat\theta_n)$ is a continuous transformation of $\hat\theta_n$, and so is $\hat A_{h,n}$, since we define $\hat A_{h,n} = A(\hat\theta_n) = h^{-1}\left(Q(\hat\theta_n) - I\right)$; then $\hat A_{h,n}$ is a uniformly consistent estimator of $A_h$.

Proposition 5 (Asymptotic normality) Under the same conditions, $\sqrt{n}\,(\hat A_{h,n} - A_h)$ converges weakly to a centered normal distribution.

Proof For a fixed $h > 0$ and $T = nh$, we have, as $T \to \infty$, $n \to \infty$, and we can use Theorem 1 of Bickel et al. (1998).
Let $A$ be the generating matrix of $\{X_t\}$, with $A = A(\theta)$; then, for $h > 0$ fixed, define $A_h = h^{-1}(Q_h - I)$ and, again using the Delta method, we obtain the asymptotic variance-covariance matrix $\Sigma_A = h^{-2}\, \Sigma_{Q_h} = \nabla A'\, \Sigma_0\, \nabla A$, which does not depend on $h$.
In this scenario, it is important to check the accuracy, in terms of the value of $h$, of the approximation of $A$ by $A_h$. The following proposition proves that $\hat A_{h,n}$ converges uniformly to $A$ when $h$ goes to 0 and $n$ goes to $\infty$.
Proposition 6 Under assumptions A1-A4, given a sample of observations $\{\hat Y_0, \hat Y_1, \ldots, \hat Y_n\}$, it holds that: (a) $\hat A_{h,n}$ is a uniformly strongly consistent estimator of $A$; and (b) for $h_n = (n \ln n)^{-1/2}$, $\sqrt{n}\,(\hat A_{h_n,n} - A)$ converges weakly to the same normal limit as in Proposition 5.

Proof As in Proposition 4, strong consistency is deduced because $\hat A_{h,n}$ is a continuous transformation of $\hat\theta$, which gives us (a). For (b), we write
$$\sqrt{n}\,(\hat A_{h_n,n} - A) = \sqrt{n}\,(\hat A_{h_n,n} - A_{h_n}) + \sqrt{n}\,(A_{h_n} - A), \qquad (9)$$
as $n \to +\infty$ and $T \to +\infty$. We can take $h_n = (n \ln n)^{-1/2}$ so that $T = n h_n \to +\infty$ as $n \to +\infty$, and we can apply Proposition 5 to the first term of the right-hand side of (9). On the other hand, $Q_h = e^{Ah}$ and, by the definition of $A_h$, we have that $A_{h_n} - A = \frac{A^2}{2} h_n + o(h_n)$; then, for the second term in the right-hand side of (9), we have $\sqrt{n}\,(A_{h_n} - A) = \frac{A^2}{2} \frac{1}{\sqrt{\ln n}} + o(1/\sqrt{\ln n})$, and the conclusion follows from the uniqueness of the limit.

Applications
Proposition 6 allows us to prove the almost sure convergence of the plug-in estimators of functionals of the type $H(t) = \phi(A, t)$. That is, let $h > 0$; then we can define $H_h(t) = \phi(A_h, t)$ as $H$ but based on $A_h$. If a sample of size $n$ of the HMM is available, we can define the plug-in estimator of $H_h(t)$ and deduce the properties of this estimator as above. Moreover, we have that $\hat H_{h,n}(t) \to H(t)$, almost surely, as $n \to \infty$ and $h \to 0$. As an example, let us consider the reliability or survival function. Let $A_0$ and $\alpha_0$ be the restrictions of $A$ and $\alpha$ to the up states. Then we can write the reliability formula
$$R_h(t) = \alpha_0\, Q_{h,0}^{\,t/h}\, \mathbf{1},$$
where $Q_{h,0}$ is the restriction of $Q_h$ to the up states, and $\mathbf{1}$ is a column vector of 1s of the appropriate dimension.
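The reliability functional can be sketched for the two-unit system of the introduction, restricting the generator to the up states; a plug-in version would replace $A$ by its estimate $\hat A_{h,n}$. The rates, the up-state set and the initial law below are illustrative assumptions.

```python
import numpy as np

def expm(M, squarings=12, terms=20):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    M = M / (2.0 ** squarings)
    E = np.eye(M.shape[0])
    P = np.eye(M.shape[0])
    for k in range(1, terms):
        P = P @ M / k
        E = E + P
    for _ in range(squarings):
        E = E @ E
    return E

lam, mu = 0.5, 2.0                       # hypothetical failure / repair rates
A = np.array([[-2 * mu,       2 * mu,      0.0],
              [    lam, -(lam + mu),        mu],
              [    0.0,    2 * lam,  -2 * lam]])   # states {0, 1, 2}
up = [1, 2]                              # up states: at least one unit working
A_up = A[np.ix_(up, up)]                 # restriction A_0 of A to the up states
alpha_up = np.array([0.0, 1.0])          # start with both units working

def reliability(t):
    """R(t) = alpha_0 exp(A_0 t) 1: no visit to the down state before t."""
    return float(alpha_up @ expm(A_up * t) @ np.ones(len(up)))
```

The discrete-skeleton form with $Q_{h,0}^{t/h}$ approaches this continuous-time expression as $h \to 0$, since $Q_{h,0} = e^{A_0 h} + o(h)$.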
Proposition 7 For any arbitrary but fixed $t > 0$, such that $t = nh$, we have $\hat R_h(t) \to R(t)$, almost surely, as $n \to \infty$ (and $h = t/n \to 0$).

Scenario 2. Random inspections in time
In this section we establish asymptotic properties as $T \to +\infty$. First, $\hat\lambda = N(T)/T \to \lambda$, almost surely, as $T \to +\infty$ (see, e.g., Ross 1996). Also, from the results in Gamiz et al. (2023), we have that $\hat Q_T \to Q$. Then, using Slutsky's theorem (Gut 2013), we get that $\hat A = \hat\lambda(\hat Q - I) \to \lambda(Q - I) = A$, almost surely, and we can write the exact decomposition
$$\sqrt{T}\,(\hat A - A) = \sqrt{T}\,\hat\lambda\,(\hat Q - Q) + \sqrt{T}\,(\hat\lambda - \lambda)(Q - I). \qquad (10)$$
To check the convergence to a normal distribution, we consider the two terms of expression (10) separately. First, we have $\sqrt{T}\,\hat\lambda\,(\hat Q - Q) = \sqrt{N(T)/T}\,\sqrt{N(T)}\,(\hat Q - Q)$, where we use that $\hat\lambda = N(T)/T$. From the results in Bickel et al. (1998) and the fact that $N(T) \to \infty$ (a.s.) as $T \to \infty$, we have that $\sqrt{N(T)}\,(\hat Q - Q) \to N(0, \Sigma_Q)$, with variance-covariance matrix $\Sigma_Q$ obtained by the Delta method applied to the function $Q(\theta)$, similarly to Proposition 5.
Using that $N(T)/T \to \lambda$, almost surely, as $T \to +\infty$, we get that the first term of the sum converges to a normal distribution, that is,
$$\sqrt{T}\,\hat\lambda\,(\hat Q - Q) \to N(0, \Sigma_1), \quad \text{with } \Sigma_1 = \lambda\,\Sigma_Q.$$
On the other hand, $N(T)$ has a Poisson distribution with mean $\lambda T$, which can be approximated by a normal distribution with both mean and variance equal to $\lambda T$, that is,
$$\sqrt{T}\,(\hat\lambda - \lambda) = \frac{N(T) - \lambda T}{\sqrt{T}} \to N(0, \lambda).$$
Then the second term in the sum is also asymptotically normal, with variance-covariance matrix $\Sigma_2 = \lambda\,\overrightarrow{B}\,\overrightarrow{B}'$, where $B = Q - I$. Specifically, $\overrightarrow{B}$ is a $d^2$-dimensional vector obtained by stacking the columns of $B$ into a column vector, so that $\Sigma_2$ is also a matrix of dimension $d^2 \times d^2$, and $\Sigma = \Sigma_1 + \Sigma_2$ is well defined.
Then $\{(X_t, Y_t); t > 0\}$ is a two-dimensional continuous-time Markov chain with state space $\tilde E = E \times \mathcal{Y}$ and transition matrix $\tilde P_t$ with elements $\tilde P_t((i, y'), (j, y)) = P_{ij}(t)\, G(j, y)$, for $i, j \in E$ and $y', y \in \mathcal{Y}$. In matrix form, we have the generator $\tilde A$, with elements $\tilde a((i, y'), (j, y))$ as detailed in (2). Following similar arguments as in Gamiz et al. (2023), we consider that the state space of the process $X$ is split into two subsets: $U := \{1, \ldots, r\}$, the working states, and $D := \{r + 1, \ldots, d\}$, the down states. Additionally, the system up states can be defined not only by $U \subset E$ but also by some subset of $\mathcal{Y}$. Then we also consider a partition of the set of possible observations, $\mathcal{Y} = \mathcal{Y}_1 \cup \mathcal{Y}_2$, where $\mathcal{Y}_1$ contains indicators of good performance of the system and $\mathcal{Y}_2$ contains indicators warning of some serious problem in the system.
Let τ denote the first time the system visits the set of down states D, i.e., the hitting time of the set D. The reliability of the system can then be defined as R(t) = P(τ > t), for t ≥ 0. Conditioning on the initial state (i, y) ∈ U × Y_1, we can write, for t > 0, using matrix notation,
R(t) = α_U exp(A_U t) 1,    (12)
where α_U is the initial distribution restricted to U × Y_1, 1 is a vector of ones, and A_U denotes the sub-matrix of A with all transition rates among the states of U × Y_1.
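When both E and Y are finite, the reliability above is a plain matrix-exponential computation. A minimal sketch, with a hypothetical 2-state sub-generator A_U and initial distribution (values chosen only for illustration):

```python
import numpy as np
from scipy.linalg import expm

# Toy sub-generator restricted to the up states (hypothetical values; in the
# paper A_U collects the transition rates among the states of U x Y_1).
A_U = np.array([[-0.5,  0.3],
                [ 0.2, -0.8]])

alpha_U = np.array([0.7, 0.3])   # initial distribution restricted to the up states (assumed)

def reliability(t):
    """R(t) = alpha_U . exp(A_U t) . 1 : probability of no visit to D by time t."""
    return alpha_U @ expm(A_U * t) @ np.ones(A_U.shape[0])

print(reliability(0.0))   # 1.0: the system starts in an up state
print(reliability(5.0))   # strictly between 0 and 1, decreasing in t
```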

Reliability for CTHMM with a general state space
When the Y process takes values in a finite set, formula (12) can be applied because the generator A_U is a matrix; but when Y takes values in a subset of R^q, A_U is an operator and we cannot make use of formula (12). In that case we propose in this section to work with Markov renewal equations (in the semi-Markov framework). Let us define H(i, t) = e^{-a_i t} G(i, Y_1), for i ∈ U, and consider the sub-semi-Markov kernel L_U(i, j, t) = (a_ij / a_i)(1 - e^{-a_i t}), for i, j ∈ U. The following result can be proved similarly to Proposition 2: the conditional reliability satisfies the Markov renewal equation
R(i, t) = H(i, t) + Σ_{k∈U} ∫_0^t R(k, t - s) L_U(i, k, ds),
and is given by the only solution of this equation, expressed through the Markov renewal function associated with L_U.
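A Markov renewal equation of this Volterra type can be solved numerically by time discretization and forward substitution. The sketch below uses hypothetical rates and emission masses for two up states (all values are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Hypothetical data for two up states U = {0, 1}:
# a[i, k]   = transition rate i -> k within U,
# a_exit[i] = total exit rate from i (including rates into the down states D),
# G_Y1[i]   = G(i, Y_1), the emission mass on the "good performance" set Y_1.
a = np.array([[0.0, 0.3],
              [0.2, 0.0]])
a_exit = np.array([0.5, 0.8])
G_Y1 = np.array([0.9, 0.8])

dt, n = 0.01, 1000                 # time grid on [0, 10)
t = np.arange(n) * dt

# H(i, t) = exp(-a_i t) G(i, Y_1)
H = np.exp(-np.outer(a_exit, t)) * G_Y1[:, None]

# Markov renewal (Volterra) equation, solved by forward substitution:
# R(i, t) = H(i, t) + sum_k  int_0^t a_ik exp(-a_i s) R(k, t - s) ds
R = np.zeros_like(H)
R[:, 0] = H[:, 0]
for m in range(1, n):
    w = np.exp(-a_exit[:, None] * t[1:m + 1]) * dt   # kernel weights over s = dt..m*dt
    past = R[:, m - 1::-1]                           # R(k, t - s) for s = dt..m*dt
    for i in range(2):
        R[i, m] = H[i, m] + sum(a[i, k] * (w[i] * past[k]).sum() for k in range(2))

print(R[:, 0])    # equals G(i, Y_1): conditional reliability at t = 0
print(R[:, -1])   # conditional reliability near t = 10, smaller than at t = 0
```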

Numerical examples
In this section we illustrate the two scenarios discussed in the rest of the paper. In the first example a real dataset is considered, whereas in the second a simulation study is carried out. In both cases we discretize the continuous-time problem, estimate the discrete model using the algorithms in Gamiz et al. (2023), and finally establish the estimator properties in continuous time, our initial problem.

Scenario 1. Regular inspections in time
As an illustration we consider a comparative study of the suicide rate in the US and Japan during the period 1985-2015. The data have been taken from the data platform https://www.kaggle.com/. We focus on the following variables: the suicide rate, measured as the number of suicides per 100,000 population, and the Gross Domestic Product per capita (GDP_per_capita).

Preliminary
Although the data registry provides information from a total of 94 countries we limit ourselves to the US and Japan. We have chosen two of the most developed countries in the world with the aim of comparing them with respect to the number of suicides per year.
In Fig. 1 we give a graphical description of the situation: the top panels describe the case of the US, while the bottom panels represent the information from Japan.
Reading the figure from left to right, the plots on the left display the suicide rate per year for the US (top) and Japan (bottom) from 1995 to 2015. The suicide rate has been calculated as the number of suicides registered in the country each year per 100,000 inhabitants. Each graph shows the yearly values together with a smoothed curve obtained from them, which helps to visualize the tendency of the suicide rate over the observation period. As the curves show, not only do suicides occur in Japan at a significantly higher rate than in the US, but the suicide rate in Japan is also remarkably variable over the years, which is not the case for the US, where the tendency is steadier.
The right-hand panels show scatterplots of the GDP per capita from 1995 until 2015 in the US (top) and Japan (bottom). What catches our attention is that, again, the situation in the US seems more stable and suggests an increasing trend, with an almost constant slope, in the standard of living; we can see just one abrupt decay, coinciding with the economic crisis around 2008. The GDP curve for Japan, although it suggests a slightly increasing tendency from 1995 onwards, shows high variability during the same period in which the highest suicide rates are seen in the corresponding plot (left panel). Figure 2 shows smooth density estimates of the suicide rate over the whole observation period for both countries, the US on the left panel and Japan on the right. In both cases the curves seem to suggest two different hidden (or latent) "regimes" in the country. These regimes could result from the interaction of the country's wealth (measured in terms of the GDP) and possibly many other intrinsic factors, which together can explain, to a certain extent, the observed suicide rate at any particular time.
With this in mind, we propose to study this phenomenon using a CTHMM with the following specifications.
• {X(t); t > 0} is the CTMC that represents the internal regime, which is not directly observable. We assume that X takes values in E = {1, 2}.
• {Y(t); t > 0} is the observable process. We define Y(t) as the suicide rate at time t.
Considering the plots in Fig. 2, we assume that Y(t) conditioned on the event {X(t) = i} follows a Normal distribution with mean μ_i and standard deviation σ_i, for i ∈ {1, 2}. We do not have a continuous-time follow-up of the process; rather, we have annual observations, so we use the methodology presented in this paper and estimate the model under Scenario 1, where observations arrive regularly in time, specifically with h = 1. Regarding the DTHMM for each case study (US and Japan), the parameters to be estimated include the transition probabilities Q_h(i, 1) = P(X_h = 1 | X_0 = i), for i ∈ {1, 2}. Then an estimator of A is obtained from Q̂_h as explained in Sect. 3.1.
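The paper obtains the estimator of A as in Sect. 3.1; as a hedged illustration of the general idea (one common route from a one-step transition matrix to a generator, not necessarily the exact formula of Sect. 3.1), a generator candidate can be recovered through the matrix logarithm, Â = (1/h) log Q̂_h. The Q̂_h values below are made up:

```python
import numpy as np
from scipy.linalg import logm

h = 1.0
# Hypothetical estimated one-step (h = 1 year) transition matrix of the hidden chain
Q_h = np.array([[0.92, 0.08],
                [0.10, 0.90]])

A_hat = logm(Q_h).real / h   # generator candidate: A = (1/h) log Q_h

print(A_hat)
# Rows of a generator sum to 0; off-diagonal entries are nonnegative
print(A_hat.sum(axis=1))
```

For an embeddable Q̂_h (eigenvalues on the positive real axis, as here) the result is a valid generating matrix; in general the matrix logarithm may fail to be a generator, which is one reason dedicated estimators are needed.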

Results
We use the EM algorithm. The function Q(θ | θ^(m)) = E_{θ^(m)}[ln f(X, Ŷ | θ)] gives, by successive iterations, an approximation of the estimate of θ, where ln f(X, Ŷ | θ) is the log-likelihood corresponding to the complete dataset, which would be obtained from the bivariate process (X, Ŷ).

• M-Step
Update θ^(m) to θ^(m+1). The maximization step is carried out directly. Considering that the emission function follows a Normal law, g_θ(i, y) = (σ_i √(2π))^{-1} exp(-(y - μ_i)² / (2σ_i²)), for i = 1, 2, the optimal value of the second term Q_2((φ_i; i ∈ E) | θ^(m)) is obtained in closed form. We have estimated the model in both cases, i.e., with data from the US as well as from Japan. The estimated values reported in Table 1 are the following: the transition matrix Q_h; the parameters of the emission law, {G(i, ·) ≡ N(μ_i, σ_i); i = 1, 2}; the initial distribution α; the generating matrix A; and the stationary distribution π of the continuous-time MC. Figure 3 represents the transition probability functions between hidden states.
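For a Normal emission law, the closed-form M-step updates are the standard responsibility-weighted moments. A minimal sketch, where gamma stands for the posterior state probabilities the E-step would produce (the numbers below are purely illustrative):

```python
import numpy as np

# Toy observed sequence and posterior state probabilities gamma[t, i]
# (in practice gamma comes from the E-step; these values are illustrative).
y = np.array([10.2, 11.0, 25.3, 24.1, 10.8, 26.0])
gamma = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.1, 0.9],
                  [0.2, 0.8],
                  [0.9, 0.1],
                  [0.1, 0.9]])

w = gamma.sum(axis=0)                       # effective counts per hidden state
mu = (gamma * y[:, None]).sum(axis=0) / w   # responsibility-weighted means
var = (gamma * (y[:, None] - mu) ** 2).sum(axis=0) / w
sigma = np.sqrt(var)

print(mu)     # mean near the low cluster for state 1, near the high cluster for state 2
print(sigma)
```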

Scenario 2. Random inspections in time
To illustrate this approach we present a simulation study. In particular, we consider a system with four possible states, E = {1, 2, 3, 4}, where {1, 2} are the functioning states and {3, 4} are the down states. The observed output takes values in the set Y = {y_1, y_2, y_3, y_4}. Besides, we assume that the system is inspected at times that follow a Poisson process with intensity λ. Let us assume that X_0 = 1 and Y_0 = y_1. We have simulated a total of 500 samples of size N = 100 each, according to the following algorithm.
1. Generate a sample trajectory (X_1, T_1), ..., (X_N, T_N) of the MC X from the generating matrix A, where X_n is the nth state visited by the process and T_n is the sojourn time of the system in the (n-1)th state visited, for n ∈ {1, ..., N}. Denote by Z_n = Σ_{k=1}^n T_k the successive jump times of the MC.
2. Generate a sample Y_1, ..., Y_N of the process Y, following the rule P(Y = y | X = i) = G_{i,y}, with i ∈ {1, 2, 3, 4} and y ∈ Y.
3. Generate a sample trajectory of the Poisson process with rate λ, that is, S_1 < S_2 < ... < S_{N_0}. Simulate arrival times until S_{N_0} ≥ Z_N. We denote by N_0^k the total number of observations obtained at the kth repetition of the experiment.
4. Put Ŷ_0 = Y_0. For each r ∈ {1, ..., N_0}, find n > 0 such that Z_n ≤ S_r < Z_{n+1} and define Ŷ_r = Y_n.
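The four steps above can be sketched as follows; the generating matrix A, the emission matrix G, and the rate λ are hypothetical values for illustration, not those of the reported study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 4-state generating matrix (states {1,2} up, {3,4} down)
A = np.array([[-3.0,  1.5,  1.0,  0.5],
              [ 1.0, -4.0,  2.0,  1.0],
              [ 0.5,  1.0, -2.5,  1.0],
              [ 1.0,  1.0,  2.0, -4.0]])
G = np.array([[0.70, 0.20, 0.05, 0.05],   # G[i, y]: emission law per hidden state
              [0.20, 0.60, 0.10, 0.10],
              [0.05, 0.10, 0.70, 0.15],
              [0.05, 0.10, 0.15, 0.70]])
lam, N = 6.5, 100                          # inspection rate and trajectory length

# Step 1: trajectory of the hidden chain (embedded chain + exponential sojourns)
states, sojourns = [0], []
for n in range(N):
    i = states[-1]
    sojourns.append(rng.exponential(1.0 / -A[i, i]))
    p = np.where(np.arange(4) == i, 0.0, A[i]) / -A[i, i]
    states.append(rng.choice(4, p=p))
Z = np.cumsum(sojourns)                    # jump times Z_1 < ... < Z_N

# Step 2: emissions Y_n ~ G(X_n, .)
Y = np.array([rng.choice(4, p=G[i]) for i in states[:-1]])

# Step 3: Poisson inspection times S_1 < S_2 < ... up to Z_N
S = []
t = rng.exponential(1.0 / lam)
while t < Z[-1]:
    S.append(t)
    t += rng.exponential(1.0 / lam)
S = np.array(S)

# Step 4: observed sequence: Y_hat_r is the emission in force at inspection time S_r
idx = np.searchsorted(Z, S, side="right")  # n with Z_n <= S_r < Z_{n+1}
Y_hat = Y[idx]

print(len(S), "inspections for", N, "jumps")
```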
We have considered three cases, λ ∈ {6.5, 8, 9}. Again we use the EM algorithm, as in Gamiz et al. (2023), to fit the discrete HMM (X, Ŷ) and estimate the corresponding parameters Q and G. Also, for each repetition of the experiment, we estimate λ̂ = N_0 / S_{N_0}. Then we obtain an estimate of the generating matrix A as explained in Sect. 3.1. Using the estimate of the emission matrix G, we construct the generating matrix of the two-dimensional process (X, Y) and, following Sect. 4.1, obtain the estimate of the reliability function, which is shown in Fig. 4.
Notice that N_0 ≥ N on average, since we choose λ > max{-a_ii; i ∈ E}. In our case we have obtained the statistics displayed in Table 2.

Conclusion
Continuous-time hidden Markov processes can be seen as a two-dimensional continuous-time Markov process (X_t, Y_t), where the first component X_t is an unobservable continuous-time Markov chain and the second component Y_t is an observable process whose distribution law depends on X_t through a function called the emission function. In this paper we have defined the generator of the coupled process in terms of the generating matrix of X_t and the emission function. The theoretical properties of this type of process have been obtained using a semi-Markov formulation of the model. To estimate the characteristics of the process we have considered two different discretization schemes under which observations can arrive: on the one hand, observations that arrive regularly in time and, on the other hand, observations that arrive at random. In both cases, maximum-likelihood estimators of the parameters of the CTHMM have been obtained and their asymptotic properties have been proven.
To our knowledge, the approach to HMMs considered in this paper, based on Markov renewal theory, is completely new and provides a powerful tool for gaining further insight in this context.
With respect to the estimation problem, as pointed out in Liu et al. (2017), when the discretization is too coarse, many transitions of the hidden states can occur between two consecutive observations and the dynamics of the hidden process might be poorly captured by the model. This situation may occur in our first scenario when the time span between observations (h) is not fine enough. More flexible models are then required, and a continuous follow-up of the process becomes suitable; this will be considered in future work.
In the applications discussed in this paper we have assumed that the set of hidden states is finite and that the number of states is pre-specified. A generalization of our work is to select the optimal number of hidden states, as in Lin and Song (2022), where the number of states is unknown and determined by the data. Another approach, to be considered in future work, consists of defining the state space of the hidden chain as a general measurable set; see, for example, Dorea and Zhao (2002), where kernel density estimation is discussed for the observed process in the context of DTHMMs.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.