Appendix
Appendix 1: derivatives of the forward-backward recursions
First of all we have that
$$\begin{aligned} \frac{\partial \log l^{(1)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}&= \frac{\partial \log \pi (u|\varvec{x}^{(1)})}{\partial \varvec{\theta }}\\&+\,\,\frac{\partial \log \phi ^{(1)}(\varvec{y}^{(1)}|u,\varvec{w}^{(1)})}{\partial \varvec{\theta }} \end{aligned}$$
and
$$\begin{aligned}&\frac{\partial \log l^{(t)}({\bar{u}},u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}= \frac{\partial \log l^{(t-1)}({\bar{u}},\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}\\&\quad +\,\,\frac{\partial \log \pi ^{(t)}(u|{\bar{u}},\varvec{x}^{(t)})}{\partial \varvec{\theta }} +\frac{\partial \log \phi ^{(t)}(\varvec{y}^{(t)}|u,\varvec{w}^{(t)})}{\partial \varvec{\theta }}. \end{aligned}$$
Now considering Eq. (8) we have that
$$\begin{aligned} \frac{\partial \log l^{(t)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }} = \sum _{{\bar{u}}=1}^k\frac{ l^{(t)}({\bar{u}},u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{l^{(t)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}\frac{\partial \log l^{(t)}({\bar{u}},u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}, \end{aligned}$$
which may be computed recursively for \(t=2,\ldots ,T\), also taking into account the results in Appendix 2 and the fact that
$$\begin{aligned} \frac{\partial \log \phi ^{(t)}(\varvec{y}|u,\varvec{w})}{\partial \varvec{\alpha }}= \sum _{j=1}^r\frac{\partial \log \phi ^{(t)}_j(y_j|u,\varvec{w})}{\partial \varvec{\alpha }}. \end{aligned}$$
In the end we obtain
$$\begin{aligned} \frac{\partial \log f(\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}= \sum _{u=1}^k \frac{l^{(T)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{f(\tilde{\varvec{y}}|\tilde{\varvec{z}})} \frac{\partial \log l^{(T)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}. \end{aligned}$$
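The forward recursion for the derivatives can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy model (two latent states, binary outcomes, no covariates, and a single scalar parameter \(\theta\) entering every component through a logit link), the function names `bern`, `forward_score` and `toy_model`, and the 0-based state labels are all assumptions made for illustration. No rescaling against numerical underflow is attempted, so the sketch is only suitable for short sequences. The score obtained from the recursion is compared with a central finite difference of \(\log f\).

```python
import math

def bern(y, eta, a):
    """Bernoulli-logit term: returns P(Y=y) and d log P(Y=y)/d theta, where a = d eta/d theta."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return (p if y == 1 else 1.0 - p), (y - p) * a

def forward_score(init, trans, emis):
    """Every entry is a pair (probability, derivative of its log w.r.t. theta).
    init[u], trans[t][ub][u] for t >= 1, emis[t][u]; returns (log f, d log f/d theta)."""
    k, T = len(init), len(emis)
    # First occasion: l(u) = pi(u) * phi(y1|u), so the log-derivatives add up.
    l = [init[u][0] * emis[0][u][0] for u in range(k)]
    dl = [init[u][1] + emis[0][u][1] for u in range(k)]
    for t in range(1, T):
        new_l, new_dl = [], []
        for u in range(k):
            # Joint terms l(ub, u) and their log-derivatives.
            joint = [l[ub] * trans[t][ub][u][0] * emis[t][u][0] for ub in range(k)]
            djoint = [dl[ub] + trans[t][ub][u][1] + emis[t][u][1] for ub in range(k)]
            tot = sum(joint)
            new_l.append(tot)
            # Weighted average of the joint log-derivatives with weights l(ub, u) / l(u).
            new_dl.append(sum(j / tot * d for j, d in zip(joint, djoint)))
        l, dl = new_l, new_dl
    f = sum(l)
    return math.log(f), sum(lu / f * d for lu, d in zip(l, dl))

def toy_model(theta, ys):
    """Hypothetical 2-state model in which one scalar theta enters every component."""
    k = 2
    init = [bern(u, theta, 1.0) for u in range(k)]
    trans_t = [[bern(u, theta + ub - 0.5, 1.0) for u in range(k)] for ub in range(k)]
    trans = [None] + [trans_t for _ in ys[1:]]  # no transition into the first occasion
    emis = [[bern(y, 0.5 * theta + u, 0.5) for u in range(k)] for y in ys]
    return init, trans, emis

ys = [1, 0, 0, 1, 1]
theta, h = 0.3, 1e-5
logf, score = forward_score(*toy_model(theta, ys))
fd = (forward_score(*toy_model(theta + h, ys))[0]
      - forward_score(*toy_model(theta - h, ys))[0]) / (2 * h)
print(score, fd)  # the two values should agree up to finite-difference error
```

With a vector-valued \(\varvec{\theta }\), the same recursion would apply with each log-derivative stored as a vector rather than a scalar.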
In a similar way we have that
$$\begin{aligned} \frac{\partial \log m^{(T)}(\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{\partial \varvec{\theta }}=0 \end{aligned}$$
and
$$\begin{aligned} \frac{\partial \log m^{(t)}(\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{\partial \varvec{\theta }} \!=\! \sum _{u=1}^k \frac{m^{(t)}(u,\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{m^{(t)}(\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})} \frac{\partial \log m^{(t)}(u,\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{\partial \varvec{\theta }} \end{aligned}$$
for \(t=2,\ldots ,T-1\), where
$$\begin{aligned} \frac{\partial \log m^{(t)}(u,\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{\partial \varvec{\theta }}&= \frac{\partial \log m^{(t+1)}(\tilde{\varvec{y}}|u,\tilde{\varvec{z}})}{\partial \varvec{\theta }}\\&+\,\, \frac{\partial \log \pi ^{(t+1)}(u|{\bar{u}},\varvec{x}^{(t+1)})}{\partial \varvec{\theta }} \\&+\,\, \frac{\partial \log \phi ^{(t+1)}(\varvec{y}^{(t+1)}|u,\varvec{w}^{(t+1)})}{\partial \varvec{\theta }}. \end{aligned}$$
Then these derivatives may be computed by a backward recursion.
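A minimal Python sketch of the backward recursion follows, under the same kind of illustrative assumptions as any toy example: two latent states, binary outcomes, no covariates, one scalar parameter \(\theta\), 0-based state labels, and hypothetical function names. The sketch runs the recursion down to the first occasion and then combines the result with the initial probabilities and the first-occasion densities through the standard identity \(f=\sum _u \pi (u)\phi ^{(1)}(y^{(1)}|u)m^{(1)}(u)\), which is not stated in this appendix, so that the outcome can be checked against a finite difference of \(\log f\).

```python
import math

def bern(y, eta, a):
    """Bernoulli-logit term: returns P(Y=y) and d log P(Y=y)/d theta, where a = d eta/d theta."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return (p if y == 1 else 1.0 - p), (y - p) * a

def backward_score(init, trans, emis):
    """Every entry is a pair (probability, derivative of its log w.r.t. theta).
    Returns (log f, d log f/d theta) computed through the backward quantities m."""
    k, T = len(init), len(emis)
    m = [1.0] * k    # m at the last occasion equals 1 ...
    dm = [0.0] * k   # ... so its log-derivative is 0
    for t in range(T - 2, -1, -1):  # 0-based occasion t uses the terms of occasion t + 1
        new_m, new_dm = [], []
        for ub in range(k):
            joint = [m[u] * trans[t + 1][ub][u][0] * emis[t + 1][u][0] for u in range(k)]
            djoint = [dm[u] + trans[t + 1][ub][u][1] + emis[t + 1][u][1] for u in range(k)]
            tot = sum(joint)
            new_m.append(tot)
            new_dm.append(sum(j / tot * d for j, d in zip(joint, djoint)))
        m, dm = new_m, new_dm
    # Combine with the first occasion (assumed identity, see the lead-in).
    terms = [init[u][0] * emis[0][u][0] * m[u] for u in range(k)]
    dterms = [init[u][1] + emis[0][u][1] + dm[u] for u in range(k)]
    f = sum(terms)
    return math.log(f), sum(x / f * d for x, d in zip(terms, dterms))

def toy_model(theta, ys):
    """Hypothetical 2-state model in which one scalar theta enters every component."""
    k = 2
    init = [bern(u, theta, 1.0) for u in range(k)]
    trans_t = [[bern(u, theta + ub - 0.5, 1.0) for u in range(k)] for ub in range(k)]
    trans = [None] + [trans_t for _ in ys[1:]]
    emis = [[bern(y, 0.5 * theta + u, 0.5) for u in range(k)] for y in ys]
    return init, trans, emis

ys = [0, 1, 1, 0]
theta, h = -0.2, 1e-5
logf_b, score_b = backward_score(*toy_model(theta, ys))
fd_b = (backward_score(*toy_model(theta + h, ys))[0]
        - backward_score(*toy_model(theta - h, ys))[0]) / (2 * h)
print(score_b, fd_b)  # the two values should agree up to finite-difference error
```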
Appendix 2: derivatives of the density and probability mass functions
In the case of a canonical GLM parametrization, and considering the general situation of multivariate outcomes, for the measurement component we have
$$\begin{aligned}&\frac{\partial \log \phi _j^{(t)}(y|u,\varvec{w})}{\partial \varvec{\alpha }}=\frac{y-\mu ^{(t)}(u,\varvec{w})}{g(\tau )} \varvec{a}^{(t)}_{ju\varvec{w}},\\&\frac{\partial ^2\log \phi _j^{(t)}(y|u,\varvec{w})}{\partial \varvec{\alpha }\partial \varvec{\alpha }^{\prime }}= -V(Y^{(t)}|U^{(t)}=u, \varvec{W}^{(t)}=\varvec{w}) \varvec{a}^{(t)}_{ju\varvec{w}}(\varvec{a}^{(t)}_{ju\varvec{w}})^{\prime }, \end{aligned}$$
where \(\tau \) denotes the dispersion parameter and \(g(\tau )\) denotes the function involving this parameter in the typical expression for an exponential family distribution (McCullagh and Nelder 1989). In the case of categorical data where a multinomial logit parametrization is adopted, we have
$$\begin{aligned} \frac{\partial \log \phi _j^{(t)}(y|u,\varvec{w})}{\partial \varvec{\alpha }}= (\varvec{A}^{(t)}_{ju\varvec{w}})^{\prime }\varvec{G}_{1c_j}^{\prime }(\varvec{e}_{c_j}(y\!+\!1)-\varvec{\phi }_j^{(t)}(u,\varvec{w})), \end{aligned}$$
where, for a generic dimension \(c\), \(\varvec{e}_c(y+1)\) is a vector of \(c\) zeros except for element \(y+1\), which equals 1 (because the first category is labelled as 0), and
$$\begin{aligned} \frac{\partial ^2\log \phi _j^{(t)}(y|u,\varvec{w})}{\partial \varvec{\alpha }\partial \varvec{\alpha }^{\prime }}\!=\! -(\varvec{A}^{(t)}_{ju\varvec{w}})^{\prime }\varvec{G}_{1c_j}^{\prime }\varvec{\Omega }\left( \varvec{\phi }_j^{(t)}(u,\varvec{w})\right) \varvec{G}_{1c_j}\varvec{A}^{(t)}_{ju\varvec{w}}, \end{aligned}$$
where, for a generic probability vector \(\varvec{f}\), we have \(\varvec{\Omega }(\varvec{f})=\mathrm{diag}(\varvec{f})-\varvec{f}\varvec{f}^{\prime }\).
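As a numerical illustration of the multinomial logit derivatives, the following Python sketch computes the gradient and the Hessian of the log-probability with respect to the logits and compares them with central finite differences. Several choices here are assumptions made for illustration only: \(\varvec{G}_{1c}\) is taken to be the identity matrix of order \(c\) without its first column (first category as reference), the design matrix is taken to be the identity so that the parameters coincide with the logits, and the function names are hypothetical.

```python
import math

def probs(eta):
    """Multinomial logit with category 0 as reference: logits (0, eta_1, ..., eta_{c-1})."""
    ex = [1.0] + [math.exp(e) for e in eta]
    total = sum(ex)
    return [v / total for v in ex]

def omega(f):
    """Omega(f) = diag(f) - f f' for a probability vector f."""
    c = len(f)
    return [[(f[i] if i == j else 0.0) - f[i] * f[j] for j in range(c)] for i in range(c)]

def grad_loglik(y, eta):
    """G'(e(y+1) - f): the difference e - f without its first (reference) element."""
    f = probs(eta)
    return [(1.0 if i == y else 0.0) - f[i] for i in range(1, len(f))]

def hess_loglik(eta):
    """-G' Omega(f) G: minus Omega(f) without its first row and column."""
    om = omega(probs(eta))
    return [[-om[i][j] for j in range(1, len(om))] for i in range(1, len(om))]

def loglik(y, eta):
    return math.log(probs(eta)[y])

def shifted(v, i, d):
    """Copy of v with d added to element i."""
    return [x + d if j == i else x for j, x in enumerate(v)]

eta, y, h = [0.4, -1.1, 0.7], 2, 1e-5
n = len(eta)
fd_grad = [(loglik(y, shifted(eta, i, h)) - loglik(y, shifted(eta, i, -h))) / (2 * h)
           for i in range(n)]
fd_hess = [[(grad_loglik(y, shifted(eta, j, h))[i]
             - grad_loglik(y, shifted(eta, j, -h))[i]) / (2 * h)
            for j in range(n)] for i in range(n)]
print(grad_loglik(y, eta), fd_grad)
print(hess_loglik(eta), fd_hess)
```

Note that the Hessian does not depend on the observed category \(y\). With a non-trivial design matrix, the gradient would be premultiplied by its transpose and the Hessian pre- and postmultiplied accordingly, by the chain rule.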
Regarding the other derivatives, we have
$$\begin{aligned}&\frac{\partial \log \pi (u|\varvec{x})}{\partial \varvec{\beta }}=\varvec{B}^{\prime }_{\varvec{x}}\varvec{G}_{1k}^{\prime }(\varvec{e}_k(u)-\varvec{\pi }(\varvec{x})),\\&\frac{\partial ^2\log \pi (u|\varvec{x})}{\partial \varvec{\beta }\partial \varvec{\beta }^{\prime }}= -\varvec{B}^{\prime }_{\varvec{x}}\varvec{G}_{1k}^{\prime }\varvec{\Omega }(\varvec{\pi }(\varvec{x}))\varvec{G}_{1k}\varvec{B}_{\varvec{x}}, \end{aligned}$$
and, finally,
$$\begin{aligned}&\frac{\partial \log \pi ^{(t)}(u|{\bar{u}},\varvec{x})}{\partial \varvec{\gamma }}=\big (\varvec{C}^{(t)}_{{\bar{u}}\varvec{x}}\big )^{\prime } \varvec{G}_{{\bar{u}}k}^{\prime }\left( \varvec{e}_k(u)-\varvec{\pi }^{(t)}({\bar{u}},\varvec{x})\right) ,\\&\frac{\partial ^2\log \pi ^{(t)}(u|{\bar{u}},\varvec{x})}{\partial \varvec{\gamma }\partial \varvec{\gamma }^{\prime }}= -\big (\varvec{C}^{(t)}_{{\bar{u}}\varvec{x}}\big )^{\prime } \varvec{G}_{{\bar{u}}k}^{\prime }\varvec{\Omega }\left( \varvec{\pi }^{(t)}({\bar{u}},\varvec{x})\right) \varvec{G}_{{\bar{u}}k}\varvec{C}^{(t)}_{{\bar{u}}\varvec{x}}, \end{aligned}$$
where \(\varvec{\pi }(\varvec{x})\) is the column vector of the initial probabilities \(\pi (u|\varvec{x})\) and \(\varvec{\pi }^{(t)}(\bar{u},\varvec{x})\) is that of the transition probabilities \(\pi ^{(t)}(u|\bar{u},\varvec{x})\), with \(u=1,\ldots ,k\).
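The role of the design matrix in the score for the initial probabilities can be illustrated with a short Python sketch. The specific design used here, \(\varvec{B}_{\varvec{x}}=\varvec{I}_{k-1}\otimes (1, x)\) with a single covariate \(x\) and \(k=3\) states, is a hypothetical choice, as are the function names, the 0-based state labels, and the reading of \(\varvec{G}_{1k}\) as the identity matrix without its first column. The analytic score \(\varvec{B}^{\prime }_{\varvec{x}}\varvec{G}_{1k}^{\prime }(\varvec{e}_k(u)-\varvec{\pi }(\varvec{x}))\) is compared with central finite differences of \(\log \pi (u|\varvec{x})\).

```python
import math

def init_probs(beta, B):
    """Initial probabilities from logits eta = B beta, with the first state as reference."""
    eta = [sum(b * v for b, v in zip(row, beta)) for row in B]
    ex = [1.0] + [math.exp(e) for e in eta]
    total = sum(ex)
    return [v / total for v in ex]

def score_beta(u, beta, B):
    """B' G' (e_k(u) - pi(x)); G' drops the entry of the reference state."""
    pi = init_probs(beta, B)
    resid = [(1.0 if i == u else 0.0) - pi[i] for i in range(1, len(pi))]
    return [sum(B[r][c] * resid[r] for r in range(len(B))) for c in range(len(beta))]

# Hypothetical design: each non-reference logit has its own intercept and slope.
x = 0.8
B = [[1.0, x, 0.0, 0.0],
     [0.0, 0.0, 1.0, x]]
beta = [0.2, -0.5, 1.0, 0.3]
u, h = 1, 1e-5  # u = 1 is the second state in the 0-based labelling of the code

def log_pi(b):
    return math.log(init_probs(b, B)[u])

fd = []
for c in range(len(beta)):
    up = [v + h if j == c else v for j, v in enumerate(beta)]
    dn = [v - h if j == c else v for j, v in enumerate(beta)]
    fd.append((log_pi(up) - log_pi(dn)) / (2 * h))

print(score_beta(u, beta, B))
print(fd)  # should agree with the analytic score up to finite-difference error
```

The transition probabilities could be handled in the same way, with the reference category for row \({\bar{u}}\) determined by \(\varvec{G}_{{\bar{u}}k}\).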