1 Introduction

The latent linear correlation (LLC) is a measure of bivariate association that is typically adopted when variables are measured at an ordinal level or when data are available in the form of frequency or contingency tables. Because LLCs are most often used in analyzing ordered categorical variables, they are also known as polychoric correlations [59]. Latent linear or polychoric correlations differ from other measures of association, such as Goodman and Kruskal’s \(\gamma \) or Kendall’s \(\tau \), in that they rest on a latent continuous parametric model for the observed categories. Given a set of J variables, LLCs are computed pairwise for each pair jk of variables by considering their joint frequencies \({\mathbf {N}}_{R\times C}^{jk} = (n_{11}^{jk},\ldots ,n_{rc}^{jk},\ldots ,n_{RC}^{jk} )\) over an \(R^{jk}\times C^{jk}\) partition space of the variables’ domain. The general idea is to map the observed counts \({\mathbf {N}}_{R\times C}^{jk}\) to the real domain of the bivariate latent density model via Muthén’s threshold-based approach [57], under the constraint that the volumes of the rectangles of the latent density equal the observed frequencies. In doing so, changing the covariance parameter of the latent model changes the probability distribution over the latent rectangles and hence the probability masses over the cells of \({\mathbf {N}}_{R\times C}^{jk}\). Although several parametric models are available for estimating LLCs (e.g., elliptical, skew-Gaussian, and copula-based models; see [47, 63, 67]), the standard formulation based on the Gaussian density with zero means and latent correlations \({\mathbf {R}}^{jk}_{R\times C}\) is general enough to be of practical use in many empirical applications (for recent studies, see [39, 56]). Some of these include inter-rater agreement [60], reliability measurement [11, 70, 81], ordinal CFA and SEM [50, 58, 78], fuzzy cluster analysis [73], and polychoric PCA for dimensionality reduction in discrete data [45].

Fuzzy frequency or contingency tables are of particular interest across several disciplines, including the social, behavioral, and health sciences. Overall, two main situations give rise to fuzzy frequencies: when precise data are classified into imprecise categories or, conversely, when fuzzy data are classified into either precise or imprecise categories. Examples of the first case may be found in studies involving socioeconomic variables (e.g., income, labor flushes, employment) [19, 77], image or scene classification [26, 38], content analysis [43], reliability analyses [23], evaluation of user-based experiences [46], multivariate analysis of qualitative data [3, 8], spatial distributional data [31], and human-based risk assessment [20]. By contrast, examples of the second case are most common in studies involving rating scale-based variables such as satisfaction, quality, attitudes, and motivation [12, 21]. What both situations have in common is that the \(R^{jk}\times C^{jk}\) space constitutes a fuzzy partition and, consequently, the observed counts in the classification grid are no longer natural numbers. A number of studies have dealt with fuzzy contingency tables and fuzzy association measures. For instance, Kahraman et al. [42] proposed some nonparametric tests generalized to the case of fuzzy data, Grzegorzewski [32] studied fuzzy hypotheses testing based on fuzzy random variables, Denœux [24] proposed a rank-sum test based on fuzzy partial ordering and introduced a model of fuzzy statistical significance testing, Hryniewicz [36] generalized Goodman and Kruskal’s \(\gamma \) measure to the case of fuzzy observations arranged into contingency tables, and Taheri et al. [69] presented the analysis of contingency tables for both the fuzzy observations/crisp categories and crisp observations/fuzzy categories cases, along with a fuzzy generalization of frequency-based association measures. Although they differ in some respects, all of these works generalize the analysis of contingency tables to the fuzzy case either by Zadeh’s extension principle or by \(\alpha \)-cut-based calculus [72]. Fuzzy statistics aside, a more recent strategy to incorporate imprecision and indeterminacy in count data is to use a neutrosophic-based generalization of the standard chi-square and F-statistics [4, 5].

Building on this research stream, this article focuses on estimating latent linear correlations from fuzzy frequency tables, which cover both the case of crisp observations/fuzzy categories and that of fuzzy observations/fuzzy or crisp categories. Unlike the aforementioned studies, we develop our results by generalizing the standard LLC problem to cope with fuzzy frequencies under the general fuzzy maximum likelihood framework [24, 62]. In particular, we first define the fuzzy frequency table \({{\widetilde{\mathbf {N}}}}_{R\times C}^{jk}\) in terms of fuzzy cardinality and generalized natural numbers, and then we extend the sample space of the LLC model to deal with fuzzy counts \({{\tilde{n}}}_{11}^{jk},\ldots ,{{\tilde{n}}}_{rc}^{jk},\ldots ,{{\tilde{n}}}_{RC}^{jk}\). In doing so, the fuzziness of the observations enters the model as a systematic and non-random component, while the model’s parameters remain crisp (i.e., the estimated latent correlation matrix \({{\widehat{\mathbf {R}}}}^{jk}_{R\times C}\) is a non-fuzzy quantity). This offers an attractive solution to the problem of estimating LLCs with fuzzy information, with the additional benefit that statistical models that use the LLC statistic as input data (e.g., CFA, PCA, SEM) do not need any further generalization to cope with fuzzy data.

The remainder of this article is structured as follows. Section 2 introduces the concept of fuzzy frequency through fuzzy cardinalities and generalized natural numbers. Section 3 describes the fuzzy LLCs model and its characteristics in terms of parameter estimation and interpretation. Section 4 reports the results of a simulation study performed to assess the finite sample properties of the fuzzy LLCs model as compared with standard defuzzification-based estimation methods. Section 5 describes the application of the proposed method to two real case studies, and Sect. 6 concludes the article by providing final remarks and suggestions for further extensions of the current findings. All materials used throughout the article, such as algorithms and datasets, are available for download at https://github.com/antcalcagni/fuzzypolychoric/.

2 Fuzzy Frequencies

2.1 Preliminaries

A fuzzy subset \({\tilde{A}}\) of a universal set \({\mathcal {A}} \subset {\mathbb {R}}\) can be defined by means of its characteristic function \(\xi _{\widetilde{A}}:{\mathcal {A}}\rightarrow [0,1]\). It can also be expressed as a collection of crisp subsets called \(\alpha \)-sets, i.e., \({\tilde{A}}_\alpha = \{x \in {{\mathcal {A}}}: ~\xi _{\widetilde{A}}(x) > \alpha \}\) with \(\alpha \in (0,1]\). If the \(\alpha \)-sets of \({\tilde{A}}\) are all convex sets, then \({\tilde{A}}\) is a convex fuzzy set. The support of \({\tilde{A}}\) is \({A}_{0} = \{x \in {{\mathcal {A}}}: ~\xi _{\widetilde{A}}(x) > 0 \}\) and the core is the set of all its maximal points \({A}_{1} = \{x \in {{\mathcal {A}}}: ~\xi _{\widetilde{A}}(x) = \max _{z \in {{\mathcal {A}}}}~ \xi _{\widetilde{A}}(z) \}\). If \(\max _{x\in {{\mathcal {A}}}} \xi _{\widetilde{A}}(x) = 1\), then \({\tilde{A}}\) is a normal fuzzy set. If \({\tilde{A}}\) is a normal and convex subset of \({\mathbb {R}}\), then \({\tilde{A}}\) is a fuzzy number (also called a fuzzy interval). The quantity \(l({{\tilde{A}}}) = \sup {A}_0 - \inf {A}_0\) is the length of the support of the fuzzy set \({{\tilde{A}}}\). The simple cardinality of a fuzzy set \({{\tilde{A}}}\) is defined as \(|{{\tilde{A}}}| = \int _{{{\mathcal {A}}}}\xi _{\widetilde{A}}(x)~\mathrm{d}x\). Given two fuzzy sets \({{\tilde{A}}}, {\tilde{B}}\), the degree of inclusion of \({\tilde{A}}\) in \({\tilde{B}}\) is \(\epsilon _{{\tilde{A}}{\tilde{B}}} = \left| \min _{x\in {{\mathcal {A}}}}\left( \xi _{\widetilde{A}}(x),\xi _{\widetilde{B}}(x)\right) \right| \big / \max (1,|{{\tilde{A}}}|)\), with \(\epsilon _{{\tilde{A}}{\tilde{B}}}\in [0,1]\). The case \(\epsilon _{{\tilde{A}}{\tilde{B}}}=1\) indicates that \({{\tilde{A}}}\) is fully included in \({{\tilde{B}}}\). The class of all normal fuzzy numbers is denoted by \({{\mathcal {F}}}({\mathbb {R}})\). Fuzzy numbers can conveniently be represented using parametric models indexed by a few scalars. These include a number of shapes such as triangular, trapezoidal, Gaussian, and exponential fuzzy sets [34]. A relevant class of parametric fuzzy numbers is that of the so-called LR-fuzzy numbers [27] and their generalizations [13, 70]. The trapezoidal fuzzy number is one of the most common fuzzy sets used in applications, and it is parameterized by four parameters as follows:

$$\begin{aligned} \xi _{\widetilde{A}}(x) = \mathbb {1}_{(c_1,c_2)}(x) + \bigg (\frac{x-x_l}{c_1-x_l}\bigg ) \mathbb {1}_{(x_l,c_1)}(x) + \bigg (\frac{x_u-x}{x_u-c_2}\bigg )\mathbb {1}_{(c_2,x_u)}(x) \end{aligned}$$
(2.1)

with \(x_l,x_u,c_1,c_2 \in {\mathbb {R}}\) being the lower bound, upper bound, and first and second modes, respectively. The symbol \(\mathbb {1}_{(a,b)}(x)\) denotes the indicator function on the interval (a, b). Interestingly, the trapezoidal fuzzy set includes the triangular (if \(c_1=c_2\)) and rectangular (if \(x_l=c_1, c_2=x_u\)) fuzzy sets as special cases. A degenerated fuzzy number \(\mathring{A}\) is a particular fuzzy set with \(\xi _{\widetilde{A}}(c)=1\) and \(\xi _{\widetilde{A}}(x)=0\) for \(x\ne c\), \(x\in {{\mathcal {A}}}\). Note that rectangular and degenerated fuzzy numbers can be adopted to represent crisp categories and crisp observations, respectively. When a probability space is defined over \({{\mathcal {A}}}\), the probability of a fuzzy set \({{\tilde{A}}}\) can be defined as \({\mathbb {P}}({\tilde{A}}) = \int _{{{\mathcal {A}}}} \xi _{\widetilde{A}}(x)d{\mathbb {P}}\) (Zadeh’s probability). Similarly, the joint probability of two fuzzy sets is \({\mathbb {P}}({\tilde{A}}{\tilde{B}}) = \int _{{{\mathcal {A}}}} \xi _{\widetilde{A}}(x)\xi _{\widetilde{B}}(x)d{\mathbb {P}}\) under the rule \(\xi _{{\tilde{A}}{\tilde{B}}}(x)= \xi _{\widetilde{A}}(x)\xi _{\widetilde{B}}(x)\) (independent fuzzy sets) [52].
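For illustration, the following R snippet (R is the language of the accompanying materials) sketches the trapezoidal membership function of Eq. (2.1) and a grid-based approximation of the inclusion degree \(\epsilon _{{\tilde{A}}{\tilde{B}}}\); the function names and the numerical grid are illustrative choices and are not part of the accompanying repository.

```r
# Minimal sketch (illustrative names): trapezoidal membership of Eq. (2.1)
# and degree of inclusion eps_AB approximated on a regular grid.
xi_trap <- function(x, xl, c1, c2, xu) {
  ifelse(x >= c1 & x <= c2, 1,
  ifelse(x > xl & x < c1, (x - xl) / (c1 - xl),
  ifelse(x > c2 & x < xu, (xu - x) / (xu - c2), 0)))
}

inclusion <- function(xiA, xiB, x) {
  dx    <- x[2] - x[1]                      # grid step
  cardA <- sum(xiA) * dx                    # simple cardinality |A|
  sum(pmin(xiA, xiB)) * dx / max(1, cardA)  # |min(xi_A, xi_B)| / max(1, |A|)
}

x <- seq(0, 10, by = 0.01)
A <- xi_trap(x, xl = 2, c1 = 3,   c2 = 4,   xu = 5)   # a fuzzy observation
G <- xi_trap(x, xl = 1, c1 = 2.5, c2 = 4.5, xu = 6)   # a fuzzy category (granule)
inclusion(A, G, x)   # degree to which A is included in G (close to 1 here)
```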

2.2 Fuzzy Granules

Let \({{\mathcal {S}}} = \{{\tilde{A}}_1,\ldots ,{{\tilde{A}}}_i,\ldots ,{\tilde{A}}_I\}\) be a sample of I fuzzy or non-precise observations, with \({\tilde{A}}_i\) being a fuzzy number as defined by Eq. (2.1). Then, the interval \({{\mathcal {R}}}({{\mathcal {S}}}) = [r_{0},r_{1}]\subset {\mathbb {R}}\) is the range of the fuzzy sample, where \(r_0 = \min \{A^\dagger _{0_1},\ldots ,A^\dagger _{0_I}\}\) and \(r_1 = \max \{A^\dagger _{0_1},\ldots ,A^\dagger _{0_I}\}\), with \(A^\dagger _{0_i}\) being the infimum of the support set \(A_{0_i}\) computed for the ith fuzzy observation. A collection \({{\mathcal {G}}} = \{{\tilde{G}}_1,\ldots ,{\tilde{G}}_c,\ldots ,{{\tilde{G}}}_C\}\) of C fuzzy sets is a fuzzy partition of \({{\mathcal {R}}}({{\mathcal {S}}})\) if the following two properties hold: (i) \(\max _{i=1,\ldots ,I} l({{\tilde{A}}}_i) \le \min _{c=1,\ldots ,C} l({{\tilde{G}}}_c)\) and (ii) \(\sum _{c=1}^{C} \xi _{\widetilde{G_c}}(x) = 1\) (Ruspini’s partition) [9, 29]. The fuzzy sets in \({{\mathcal {G}}}\) are also called granules of \({{\mathcal {R}}}({{\mathcal {S}}})\). The amount of fuzzy observations falling in a granule \({\tilde{G}}_c\) is evaluated through its cardinality (scalar or fuzzy), which can be used to compute fuzzy frequencies or counts for a partition \({{\mathcal {G}}}\) given a sample \({{\mathcal {S}}}\). Figure 1 (left-side panels) shows an example of fuzzy granulation for both fuzzy and crisp observations.

2.3 Fuzzy Counts as Generalized Natural Numbers

Let \({{\tilde{\mathbf {x}}}}_j = \{{\tilde{x}}^j_{1},\ldots ,{\tilde{x}}^j_{i},\ldots ,{\tilde{x}}^j_{I}\}\) and \({{\tilde{\mathbf {x}}}}_k = \{{\tilde{x}}^k_{1},\ldots ,{\tilde{x}}^k_{i},\ldots ,{\tilde{x}}^k_{I}\}\) be two samples of fuzzy observations, and let \({{\tilde{\mathbf {g}}}}_j = \{{{\tilde{g}}}^j_{1},\ldots ,{{\tilde{g}}}^j_{r},\ldots ,{\tilde{g}}^j_{R}\}\) and \({{\tilde{\mathbf {g}}}}_k = \{{{\tilde{g}}}^k_{1},\ldots ,{{\tilde{g}}}^k_{c},\ldots ,{\tilde{g}}^k_{C}\}\) be two fuzzy partitions of the domains \({{\mathcal {R}}}({{\tilde{\mathbf {x}}}}_j)\) and \({{\mathcal {R}}}({{\tilde{\mathbf {x}}}}_k)\), respectively. Given a pair of granules \(({{\tilde{g}}}_r, {{\tilde{g}}}_c)\), a fuzzy or imprecise count for the joint sample \(({{\tilde{\mathbf {x}}}}_j,{{\tilde{\mathbf {x}}}}_k)\) is a fuzzy set \({{\tilde{n}}}_{rc}^{jk}\) with membership function \(\xi _{\widetilde{n}_{rc}}^{jk}: {\mathbb {N}}_{0} \rightarrow [0,1]\). As it is defined over the natural numbers, a fuzzy count is a finite generalized natural number for which extended operations are available (e.g., addition, multiplication) [75]. Analogously to fuzzy intervals, the class of all fuzzy counts is denoted as \({{\mathcal {F}}}({\mathbb {N}}_{0})\). There are different choices for the computation of \(\xi _{\widetilde{n}_{rc}}^{jk}\) (e.g., see [22, 36, 69, 71, 72]). In this contribution, we will follow the findings of [9, 10], which are based on Zadeh’s fuzzy counting functions [79] and fuzzy cardinalities [14]. More precisely, let

$$\begin{aligned} {\varvec{\epsilon }}_{rc}^{jk} = \left( \epsilon _{rc_1}^{jk},\ldots ,\epsilon _{rc_i}^{jk},\ldots ,\epsilon _{rc_I}^{jk}\right) \end{aligned}$$

be the vector of joint degrees of inclusion for the rcth granule where

$$\begin{aligned}&\epsilon _{rc_i}^{jk} = \min \left( \epsilon _{r_i}^j,\epsilon _{c_i}^k \right) ,\\&\epsilon _{r_i}^{j} = \big |\min _{x\in {{\mathcal {R}}}({{\tilde{\mathbf {x}}}}_j)} (\xi _{\widetilde{x}_{ji}}(x),\xi _{\widetilde{g}_{r}}(x))~\big |~ \big / \max (1,|{{\tilde{x}}}_{i}^j|), \\&\epsilon _{c_i}^{k} = \big |\min _{x\in {{\mathcal {R}}}({{\tilde{\mathbf {x}}}}_k)} (\xi _{\widetilde{x}_{ki}}(x),\xi _{\widetilde{g}_{c}}(x))~\big |~ \big / \max (1,|{{\tilde{x}}}_{i}^k|), \end{aligned}$$

with |.| being the simple cardinality according to the definition given in Sect. 2.1. For \(n\in {\mathbb {N}}_0\), the fuzzy count is as follows:

$$\begin{aligned} \xi _{\widetilde{n}_{rc}}^{jk}(n) = \min \left( \mu _{\mathrm{FLC}}(n), \mu _{\mathrm{FGC}}(n) \right) \end{aligned}$$
(2.2)

with \(\mu _{\mathrm{FLC}}(n)\) and \(\mu _{\mathrm{FGC}}(n)\) being the outputs of Zadeh’s fuzzy counting functions [79]. The following calculus can be used for \(\mu _{\mathrm{FLC}}(n)\) and \(\mu _{\mathrm{FGC}}(n)\). First, compute the square matrix of differences \({\mathbf {Z}}_{I\times I} = \left( {\varvec{\epsilon }}_{rc}^{jk}{\mathbf {1}}_I^T - {\mathbf {1}}_I({\varvec{\epsilon }}_{rc}^{jk})^T\right) \), with \({\mathbf {1}}_I\) being an \(I\times 1\) vector of ones. Then, the vector \({\mathbf {z}}_{I\times 1}\) is computed, with \(z_i = {\mathbf {1}}_I^T{\mathcal {H}}({\mathbf {Z}}_{,i})\) for each \(i=1,\ldots ,I\) and \({\mathcal {H}}(x)\) being the Heaviside step function defined by \({\mathcal {H}}(x) := \{0 ~\text {if}~ x<0~,~ 1 ~\text {if}~ x\ge 0\}\). The vector \({\mathbf {z}} = (z_1,\ldots ,z_i,\ldots ,z_I)\) thus contains the column-wise sums of the output of the Heaviside function applied to \({\mathbf {Z}}\). Finally, for \(n=0,1,2,\ldots ,I\) Zadeh’s counting functions are as follows:

$$\begin{aligned} \mu _{\mathrm{FGC}}(n)&= \max \left( {\mathcal {H}}({\mathbf {z}}-n)\odot {\varvec{\epsilon }}_{rc}^{jk} \right) ,\nonumber \\ \mu _{\mathrm{FLC}}(n)&= 1-\max \left( {\mathcal {H}}({\mathbf {z}}-n+1)\odot {\varvec{\epsilon }}_{rc}^{jk} \right) , \end{aligned}$$
(2.3)

where \(\odot \) is the element-wise product and \({\mathcal {H}}(x)\) is the Heaviside function defined as above. Thus, the membership function of \({{\tilde{n}}}_{rc}^{jk}\) is defined as the minimum between the degree of possibility that at least n elements from \(({{\tilde{\mathbf {x}}}}_j,{{\tilde{\mathbf {x}}}}_k)\) are included in the rcth granule (FGC count) and the degree of possibility that at most n elements are included in the rcth granule (FLC count). By applying Eqs. (2.2) and (2.3) for each pair of granules \(({{\tilde{g}}}_1,{\tilde{g}}_1),\ldots ,({{\tilde{g}}}_r,{\tilde{g}}_c),\ldots ,({{\tilde{g}}}_R,{\tilde{g}}_C)\), one obtains the fuzzy frequency matrix \({{\widetilde{\mathbf {N}}}}_{R\times C}^{jk}\). Note that the resulting fuzzy set \(\xi _{\widetilde{n}_{rc}}\) may not be normal, i.e., \(\max _{n} \xi _{\widetilde{n}_{rc}}(n) < 1\) may occur, and a post hoc normalization can be applied if normal fuzzy sets are needed.
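A compact R transcription of Eqs. (2.2)–(2.3) is reported below: given the vector of joint inclusion degrees \({\varvec{\epsilon }}_{rc}^{jk}\) for one pair of granules, it returns the membership function of the fuzzy count over \(n=0,\ldots ,I\) (function names are illustrative).

```r
# Sketch of Eqs. (2.2)-(2.3): fuzzy count of one granule pair from the vector
# of joint inclusion degrees eps (one entry per observation).
fuzzy_count <- function(eps) {
  I <- length(eps)
  H <- function(x) 0 + (x >= 0)                  # Heaviside step (H(0) = 1)
  Z <- outer(eps, eps, "-")                      # Z = eps 1' - 1 eps'
  z <- colSums(H(Z))                             # column-wise sums of H(Z)
  n <- 0:I
  mu_FGC <- sapply(n, function(k) max(H(z - k) * eps))
  mu_FLC <- sapply(n, function(k) 1 - max(H(z - k + 1) * eps))
  setNames(pmin(mu_FLC, mu_FGC), n)              # Eq. (2.2)
}

# Example: five observations partially included in the rc-th granule
round(fuzzy_count(c(1, 0.8, 0.4, 0.1, 0)), 2)
```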

Finally, it is relevant to point out that Eqs. (2.2) and (2.3) are quite general and can be applied to the cases of fuzzy observations/fuzzy categories, crisp observations/fuzzy categories, and fuzzy observations/crisp categories. In this context, crisp observations and crisp categories can be realized by means of degenerated fuzzy sets and rectangular fuzzy sets, respectively. For the special case of crisp observations/crisp categories, the resulting fuzzy set \(\xi _{\widetilde{n}_{rc}}\) is degenerate. Figure 1 shows exemplary fuzzy frequencies for fuzzy observations and fuzzy categories (Fig. 1A, middle and rightmost panels) and for crisp observations and fuzzy categories (Fig. 1B, middle and rightmost panels).

Fig. 1

Examples of fuzzy granules and fuzzy counts for A fuzzy triangular observations and fuzzy trapezoidal categories and B crisp observations and fuzzy trapezoidal categories. Note that in both cases frequencies are represented as generalized natural numbers


3 LLCs for Fuzzy Frequency Tables

In this section, we describe the statistical procedure for computing latent linear correlations when observations are in the general form of fuzzy frequencies.

3.1 Model

Let \(X = (X^j_i,X^k_i)\), \(i=1,\ldots ,I\), be a collection of pairs of continuous random variables (\(j,k \in \{1,\ldots ,J\}\), \(j\ne k\)) following the bivariate Gaussian distribution centered at zero with correlation parameter \(\rho _{jk} \in [-1,1]\) and density

$$\begin{aligned} f_X({\mathbf {x}};\rho _{jk}) = \frac{1}{2\pi \sqrt{1-\rho _{jk}^2}}\exp \left( -\frac{1}{2}\left[ \frac{(x^{j})^2+(x^{k})^2-2x^{j}x^k \rho _{jk}}{1-\rho _{jk}^2} \right] \right) , \end{aligned}$$
(3.1)

for \(-\infty< x^j < \infty \) and \(-\infty< x^k < \infty \). Without loss of generality, consider the collection of fuzzy observations

$$\begin{aligned} {{\tilde{\mathbf {y}}}} = \{({{\tilde{y}}}^{j}_1,{{\tilde{y}}}^{k}_1),\ldots ,({{\tilde{y}}}^{j}_i,{{\tilde{y}}}^{k}_i),\ldots ,({{\tilde{y}}}^{j}_I,{{\tilde{y}}}^{k}_I)\}, \end{aligned}$$

which relates to the (latent) bivariate Gaussian model in Eq. (3.1) via the constraint

$$\begin{aligned} ({{\tilde{y}}}^j_i \in {{\tilde{g}}}_r^j) \wedge ({{\tilde{y}}}^k_i \in {{\tilde{g}}}_c^k) \quad \text {iff}\quad (X^j_i,X^k_i) \in (\tau ^{X^j}_{r-1}, \tau ^{X^j}_{r}] \times (\tau ^{X^k}_{c-1}, \tau ^{X^k}_{c}]\subset {\mathbb {R}}^2, \end{aligned}$$
(3.2)

where \(\in \) is intended as fuzzy membership, \(({{\tilde{g}}}_r^j,{{\tilde{g}}}_c^k)\) are observed fuzzy categories or granules, and the arrays \(\varvec{\tau }_{X^j} = (\tau ^{X_j}_0,\ldots ,\tau ^{X_j}_r,\ldots ,\tau ^{X_j}_R)\) and \(\varvec{\tau }_{X^k} = (\tau ^{X_k}_0,\ldots ,\tau ^{X_k}_c,\ldots ,\tau ^{X_k}_C)\) are thresholds on the bivariate support \({\mathbb {R}}^2\) under the conventions \(\tau ^{X_j}_0 = \tau ^{X_k}_0 = -\infty \) and \(\tau ^{X_j}_R = \tau ^{X_k}_C = \infty \). Note that since fuzzy numbers encompass crisp observations and crisp categories as special cases (i.e., degenerated and rectangular fuzzy numbers, respectively), expression (3.2) can be used for the non-fuzzy case as well. For instance, the simplest situation involving non-fuzzy observations and non-fuzzy categories is obtained by rewriting the left part of the constraint as \((\mathring{y}^j_i = r) \wedge (\mathring{y}^k_i = c)\), which indicates that crisp observations take the indices of the categories.

The parameter space for the LLCs model is

$$\begin{aligned} \varvec{\theta } = \{\rho _{jk},\varvec{\tau }_{X^j},\varvec{\tau }_{X^k}\} \in [-1,1]\times {\mathbb {R}}^{R-1}\times {\mathbb {R}}^{C-1}, \end{aligned}$$

whereas the log-likelihood function takes the following form in the case of independent and identically distributed fuzzy observations [47, 59]:

$$\begin{aligned} \ln {\mathcal {L}}(\varvec{\theta };{{\widetilde{\mathbf {N}}}})&= K + \sum _{r=1}^{R}\sum _{c=1}^{C} ~\sum _{n\in {\mathbb {N}}_0} n~\xi _{\widetilde{n}_{rc}}^{jk}(n) \ln \pi _{rc}^{jk}(\varvec{\theta }) \nonumber \\&= K + \sum _{r=1}^{R}\sum _{c=1}^{C} ~\sum _{n\in {\mathbb {N}}_0} n~\xi _{\widetilde{n}_{rc}}^{jk}(n) \ln \int _{\tau _{r-1}^{X^j}}^{\tau _{r}^{X^j}} \int _{\tau _{c-1}^{X^k}}^{\tau _{c}^{X^k}} f_X({\mathbf {x}};\rho _{jk}) ~\mathrm{d}x^j \mathrm{d}x^k, \end{aligned}$$
(3.3)

where \(f_X({\mathbf {x}};\rho _{jk})\) is the model’s density in Eq. (3.1), \(\xi _{\widetilde{n}_{rc}}^{jk}(n)\) is the rcth fuzzy count, and K is a constant term. Note that \(f_X({\mathbf {x}};\rho _{jk})\) is not fuzzy in this context and its realizations represent unobserved (latent) quantities. The evaluation of \(({{\tilde{y}}}^j_i \in {{\tilde{g}}}_r^j) \wedge ({{\tilde{y}}}^k_i \in {{\tilde{g}}}_c^k)\) gives rise to a collection of fuzzy counts \({{\tilde{n}}}_{11}^{jk},\ldots ,{{\tilde{n}}}_{rc}^{jk},\ldots ,{{\tilde{n}}}_{RC}^{jk}\) acting as possibilistic constraints on the unobserved non-fuzzy counts that would have been observed in the absence of fuzziness. As such, the expression \(\xi _{\widetilde{n}_{rc}}^{jk}(n_{rc}) \in [0,1]\) should be interpreted as the possibility that the crisp count \(n_{rc}\) occurs, with \(\xi _{\widetilde{n}_{rc}}^{jk}( n_{rc} ) = 1\) indicating that \(n_{rc}\) is fully possible. According to the epistemic viewpoint on fuzzy statistics [37], the sampling process is thought of as the consequence of a two-stage generation mechanism, the first stage being a random experiment and the second a non-random fuzzification of the realized outcome. As an example of this schema, consider the simplest case of crisp observations (e.g., income and tobacco use) that are classified by a group of raters or an automatic classification system on the basis of fuzzy categories (e.g., income levels: low, medium, high; tobacco use: none, sporadic, habitual). Stated in this way, the fuzzy frequencies associated with income and tobacco use encapsulate two sources of uncertainty, namely the random component due to the sampling process and the non-random component due to the post-sampling fuzzy classification.
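To make the structure of Eq. (3.3) concrete, the following R sketch evaluates the rectangle probabilities \(\pi _{rc}^{jk}(\varvec{\theta })\) (the double integrals in Eq. (3.3)) via mvtnorm::pmvnorm and accumulates the fuzzy log-likelihood up to the constant K; the package choice and the array layout used to store the fuzzy counts are illustrative assumptions rather than the implementation in the accompanying repository.

```r
library(mvtnorm)

# Rectangle probabilities pi_rc(theta) (the double integrals in Eq. 3.3)
# for all R x C cells, given interior thresholds tau_j, tau_k and rho.
cell_probs <- function(rho, tau_j, tau_k) {
  tj <- c(-Inf, tau_j, Inf); tk <- c(-Inf, tau_k, Inf)
  S  <- matrix(c(1, rho, rho, 1), 2, 2)
  P  <- matrix(NA_real_, length(tj) - 1, length(tk) - 1)
  for (r in seq_len(nrow(P))) for (c in seq_len(ncol(P)))
    P[r, c] <- pmvnorm(lower = c(tj[r], tk[c]),
                       upper = c(tj[r + 1], tk[c + 1]), corr = S)
  P
}

# Fuzzy log-likelihood of Eq. (3.3), up to the constant K. The fuzzy counts
# are stored in an R x C x (I + 1) array 'xi', where xi[r, c, ] holds the
# membership values of the fuzzy count over n = 0, ..., I (our own layout).
fuzzy_loglik <- function(rho, tau_j, tau_k, xi, I) {
  P  <- cell_probs(rho, tau_j, tau_k)
  n  <- 0:I
  ll <- 0
  for (r in seq_len(nrow(P))) for (c in seq_len(ncol(P)))
    ll <- ll + sum(n * xi[r, c, ]) * log(P[r, c])
  ll
}

# e.g., profile over rho given fixed thresholds:
# optimize(fuzzy_loglik, c(-0.99, 0.99), tau_j = tau_j, tau_k = tau_k,
#          xi = xi, I = I, maximum = TRUE)
```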

3.2 Parameter Estimation

To estimate \(\varvec{\theta }\), we adopt Olsson’s two-stage approach for latent linear correlations, which iteratively alternates between approximating \({{{\hat{\varvec{\tau }}}}}\) from the observed count data and maximizing Eq. (3.3) with respect to \({{\hat{\rho }}}\) given the current thresholds [59]. In the case of fuzzy data, this procedure can be implemented using a variant of the expectation–maximization algorithm generalized to the case of fuzzy observations [24]. As in the standard EM algorithm, the fuzzy EM version alternates between the E-step, which requires computing the expected complete log-likelihood given the candidate \(\varvec{\theta }' = \varvec{\theta }^{(q-1)}\), and the M-step, which maximizes the expected complete log-likelihood w.r.t. \(\varvec{\theta }^{(q)}\). More precisely, in the fuzzy EM algorithm the complete-data log-likelihood is the one that would be obtained if the matrix of counts \({\mathbf {N}}_{R\times C}^{jk}\) were precisely observed, namely:

$$\begin{aligned} \ln {\mathcal {L}}(\varvec{\theta };{\mathbf {N}}) = \ln I! + \sum _{r=1}^{R}\sum _{c=1}^{C} {n}_{rc}^{jk} \ln \pi _{rc}^{jk}(\varvec{\theta }) - \sum _{r=1}^{R}\sum _{c=1}^{C} \ln n_{rc}^{jk}!. \end{aligned}$$
(3.4)

Given the estimates \(\varvec{\theta }'\), the E-step for the (q)th iteration consists of computing the \({\mathcal {Q}}\)-function via conditional expectation on the observed fuzzy counts:

$$\begin{aligned} {\mathcal {Q}}(\varvec{\theta },\varvec{\theta }')&= {\mathbb {E}}_{{\varvec{\theta }'}}\left[ \ln {\mathcal {L}}(\varvec{\theta };{\mathbf {N}}) \Big | \mathbf {{\widetilde{N}}}\right] \nonumber \\&\quad \propto \sum _{r=1}^{R}\sum _{c=1}^{C} ~{\mathbb {E}}_{{\varvec{\theta }'}}\left[ {N}_{rc}^{jk}\Big | {{\tilde{n}}_{rc}}^{jk}\right] \ln \pi _{rc}^{jk}(\varvec{\theta }) - {\mathbb {E}}_{{\varvec{\theta }'}}\left[ \ln N_{rc}^{jk}! \Big | {{\tilde{n}}_{rc}}^{jk}\right] . \end{aligned}$$
(3.5)

The conditional expectations involve the density of a discrete random variable \(N_{rc}\) conditioned on a fuzzy event \({\tilde{n}}_{rc}\) that, under the multinomial schema for random counts, can reasonably be modeled as Binomial [1]. Thus, using the definition of fuzzy probability, the density of \(N_{rc}|{\tilde{n}}_{rc}\) is as follows:

$$\begin{aligned} p_{N_{rc}^{jk}|{\tilde{n}}_{rc}^{jk}}(n;\pi _{rc}^{jk}(\varvec{\theta }))&= \frac{{\mathbb {P}}_{\varvec{\theta }}\left( N_{rc}^{jk},{\tilde{n}}_{rc}^{jk}\right) }{{\mathbb {P}}_{\varvec{\theta }}\left( {\tilde{n}}_{rc}^{jk}\right) } = \frac{\xi _{\widetilde{n}_{rc}}^{jk}(n)~ p_{N_{rc}^{jk}}(n;\pi _{rc}^{jk}(\varvec{\theta }))}{\sum _{n\in {\mathbb {N}}_0} \xi _{\widetilde{n}_{rc}}^{jk}(n)~ p_{N_{rc}^{jk}}(n;\pi _{rc}^{jk}(\varvec{\theta }))} , \end{aligned}$$
(3.6)
$$\begin{aligned} \pi _{rc}^{jk}(\varvec{\theta })&= \int _{\tau _{r-1}^{X^j}}^{\tau _{r}^{X^j}} \int _{\tau _{c-1}^{X^k}}^{\tau _{c}^{X^k}} f_X({\mathbf {x}};\rho _{jk}) ~\mathrm{d}x^j \mathrm{d}x^k, \end{aligned}$$
(3.7)

where \(p_{N_{rc}^{jk}}= {\mathcal {B}}in(n;I,\pi _{rc}^{jk}(\varvec{\theta }))\) is the Binomial density with size I and probability \(\pi _{rc}^{jk}(\varvec{\theta })\), and \(f_X({\mathbf {x}};\rho _{jk})\) is the latent model’s density in Eq. (3.1). Note that the quantity \(I\pi _{rc}^{jk}(\varvec{\theta })\) is the reconstructed count from the bivariate latent model given the current parameters \(\varvec{\theta }'\) [66]. The linear form of the expectations in Eq. (3.5) is

$$\begin{aligned} {\mathbb {E}}_{\tiny {\varvec{\theta }'}}\left[ N_{rc}^{jk}\Big | {{\tilde{n}}_{rc}}^{jk}\right] = \sum _{n\in {\mathbb {N}}_0} n ~p_{N_{rc}^{jk}|{\tilde{n}}_{rc}^{jk}}(n;\pi _{rc}^{jk}(\varvec{\theta }')), \end{aligned}$$
(3.8)

whereas, since it is not involved in the M-step of the algorithm, the nonlinear expectation is provided in Appendix A for the sake of completeness.
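A minimal R sketch of the E-step of Eqs. (3.6) and (3.8) for a single cell is as follows; treating the Binomial size as I reflects our reading of the multinomial schema, and the function name is illustrative.

```r
# E-step for one cell (Eqs. 3.6 and 3.8): expected crisp count given the
# fuzzy count. 'xi_rc' holds the memberships over n = 0, ..., I, 'pi_rc' is
# the current cell probability; the Binomial size is taken to be I.
filtered_count <- function(xi_rc, pi_rc, I) {
  n <- 0:I
  w <- xi_rc * dbinom(n, size = I, prob = pi_rc)   # xi(n) * Bin(n; I, pi_rc)
  sum(n * w / sum(w))                              # E[N_rc | fuzzy count] (Eq. 3.8)
}
```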

Finally, the M-step for the (q)th iteration requires maximizing the functional \({\mathcal {Q}}(\varvec{\theta },\varvec{\theta }')\) with respect to \(\varvec{\theta }\). Given the filtered counts at the current iteration \(\widehat{{\mathbf {N}}}^{jk}_{R\times C}\) (see Eq. 3.8), Olsson’s two-stage estimation approach first requires estimating the thresholds from the cumulative marginals of the filtered counts:

$$\begin{aligned} {\widehat{\varvec{\tau }}}_{X^j}^{(q)}&= \varPhi ^{-1}\left( {\mathbf {A}}_{R\times R}\widehat{{\mathbf {N}}}^{jk}{\mathbf {1}}_{C} \right) , \end{aligned}$$
(3.9)
$$\begin{aligned} {\widehat{\varvec{\tau }}}_{X^k}^{(q)}&= \varPhi ^{-1}\left( {\mathbf {A}}_{C\times C}(\widehat{{\mathbf {N}}}^{jk})^T{\mathbf {1}}_{R} \right) , \end{aligned}$$
(3.10)

where \({\mathbf {A}}\) is a lower triangular matrix of ones, \({\mathbf {1}}\) is a vector of appropriate length with all entries equal to 1/I, and \(\varPhi \) is the univariate Gaussian distribution function with mean zero and unit variance. Next, conditioned on \(\{{\widehat{\varvec{\tau }}}_{X^j}^{(q)},{\widehat{\varvec{\tau }}}_{X^k}^{(q)}\}\), the remaining parameter is found by numerically solving the score equation of \({\mathcal {Q}}(\varvec{\theta },\varvec{\theta }^{(q)})\) w.r.t. \(\rho _{jk}\):

$$\begin{aligned} {\mathcal {U}}_{\rho _{jk}} = \frac{\partial {\mathcal {Q}}\left( \rho _{jk},\{{\widehat{\varvec{\tau }}}_{X^j}^{(q)},{\widehat{\varvec{\tau }}}_{X^k}^{(q)}\}\right) }{\partial {{\varvec{\pi }}^{jk}}} \frac{\partial {\varvec{\pi }}^{jk}}{\partial \rho _{jk}} = 0. \end{aligned}$$
(3.11)

The algorithm proceeds iteratively until the increase in the log-likelihood becomes negligible. Table 1 summarizes the fuzzy EM algorithm for the LLCs model.
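As an illustration of the first stage of the M-step, the following R sketch computes the threshold updates of Eqs. (3.9)–(3.10) from a filtered count matrix; the second stage (Eq. 3.11) can then be handled by any univariate root-finder or optimizer over \(\rho _{jk}\). Function names and the example table are illustrative.

```r
# M-step, stage 1 (Eqs. 3.9-3.10): thresholds from the cumulative marginals
# of the filtered counts Nhat (an R x C numeric matrix whose entries sum to I).
update_thresholds <- function(Nhat) {
  I <- sum(Nhat)
  tau_j <- qnorm(cumsum(rowSums(Nhat)) / I)   # Phi^{-1}(A Nhat 1)
  tau_k <- qnorm(cumsum(colSums(Nhat)) / I)   # Phi^{-1}(A Nhat' 1)
  # the last cumulative value maps to +Inf; keep the interior thresholds only
  list(tau_j = head(tau_j, -1), tau_k = head(tau_k, -1))
}

# Example with a crisp 4 x 4 table of counts
Nhat <- matrix(c(10,  5,  2,  1,
                  5, 20,  8,  2,
                  2,  8, 20,  5,
                  1,  2,  5, 10), 4, 4, byrow = TRUE)
update_thresholds(Nhat)
```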

Table 1 Expectation–maximization algorithm for estimating \(\varvec{\theta } = (\varvec{\tau }_{X^j},\varvec{\tau }_{X^k},\rho _{jk})\) in LLCs model with fuzzy frequency data

3.3 Remarks

About the Convergence of the Algorithm Given a candidate \(\varvec{\theta }'\), the fuzzy EM starts by constructing the surrogate \({\mathcal {Q}}(\varvec{\theta },\varvec{\theta }')\) that lower-bounds the observed-data log-likelihood \(\ln {\mathcal {L}}(\varvec{\theta };\mathbf {{\widetilde{N}}})\) (E-step). Next, it is maximized to get the current estimates \(\varvec{\theta }^{(q)}\) (M-step), which are in turn used to construct a new lower bound \({\mathcal {Q}}(\varvec{\theta },\varvec{\theta }^{(q)})\) in the next iteration to get a new estimate \(\varvec{\theta }^{(q+1)}\). The estimates in the M-step are chosen so that \({\mathcal {Q}}(\varvec{\theta },\varvec{\theta }^{(q)}) \ge {\mathcal {Q}}(\varvec{\theta },\varvec{\theta }')\), which forms the basis of the monotonicity condition \(\ln {\mathcal {L}}(\varvec{\theta }^{(q+1)};\mathbf {{\widetilde{N}}}) \ge \ln {\mathcal {L}}(\varvec{\theta }^{(q)};\mathbf {{\widetilde{N}}})\) [54]. As for the standard case, the monotonicity of the sequence \(\{\ln {\mathcal {L}}(\varvec{\theta }^{(q)};\mathbf {{\widetilde{N}}})\}_{q\in {\mathbb {N}}}\) implies convergence to a stationary value, which can be global or local depending on the characteristics of the log-likelihood function and the starting point \(\varvec{\theta }^0\). A sketch of the proof of the monotonicity of the fuzzy EM for the LLCs is provided in Appendix B, whereas the formal equivalence between EM and fuzzy EM is detailed in [24, 62].

About the Starting Values of the Algorithm Suitable starting values \(\varvec{\theta }^0\) can be obtained by first defuzzifying the observed fuzzy frequencies matrix \(\widetilde{{\mathbf {N}}}^{jk}\) to obtain non-fuzzy counts and then applying the standard Olsson’s two-stage approach [59] on defuzzified data. In general, this yields convenient starting values. In the LLCs model, defuzzification can be performed via mean or max-based procedures as follows: \({{\hat{n}}}_{rc}^{\mathrm{mean}} \overset{\sim }{=}\sum _{n\in {\mathbb {N}}_0} n \xi _{\widetilde{n}_{rc}}(n) / \left( \sum _{n\in {\mathbb {N}}_0} \xi _{\widetilde{n}_{rc}}(n) ~ \right) \), \({{\hat{n}}}_{rc}^{\mathrm{max}} = \max \{n\in {\mathbb {N}}_0: \xi _{\widetilde{n}_{rc}}(n) = \max _{z \in {\mathbb {N}}_0}~ \xi _{\widetilde{n}_{rc}}(z) \}\), \(~r=1,\ldots ,R\), \(~c=1,\ldots ,C\).
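The two defuzzification rules can be written compactly in R as follows (a sketch; the vector xi is assumed to store the membership values of a fuzzy count over \(n=0,\ldots ,I\)):

```r
# Mean- and max-based defuzzification of a fuzzy count; 'xi' holds the
# membership values over n = 0, ..., I (illustrative function names).
defuzz_mean <- function(xi) { n <- seq_along(xi) - 1; sum(n * xi) / sum(xi) }
defuzz_max  <- function(xi) { n <- seq_along(xi) - 1; max(n[xi == max(xi)]) }

xi <- c(0, 0.2, 1, 0.6, 0.1)                      # memberships for n = 0, ..., 4
c(mean = defuzz_mean(xi), max = defuzz_max(xi))   # approx. 2.32 and 2, respectively
```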

About the Term \(p_{N_{rc}|{\tilde{n}}_{rc}}(n;\pi _{rc}(\varvec{\theta }))\). The term \(p_{N_{rc}|{\tilde{n}}_{rc}}\) represents the density of a non-fuzzy random variable conditioned on fuzzy numbers and can mathematically be interpreted as the combination of two independent components, namely the random mechanism underlying the sampling process and the observer’s partial knowledge (imprecision) about the sample realizations. In this sense, as it weights each fuzzy datum by the probability that it occurs [52], \(p_{N_{rc}|{\tilde{n}}_{rc}}\) should not be confused with the mean-based defuzzification of fuzzy numbers. A nice property of this formulation is that fuzziness vanishes when precise observations are available. Indeed, the conditional density involving a degenerated fuzzy number \(\mathring{n}_{rc}\) boils down to a degenerated discrete density \(p_{N_{rc}|\mathring{n}_{rc}}\) with nonzero probability mass only for those n such that \(\xi _{\widetilde{n}_{rc}}(n)=1\). As a consequence, the fuzzy EM procedure reduces to standard Olsson’s two-stage maximum likelihood estimation. In general, there are a number of ways to plug non-stochastic components of uncertainty into \(p_{N_{rc}|{\tilde{n}}_{rc}}\), such as those involving imprecise probability [7], conditional probability [18], belief measures [76], and random fuzzy variables [30].

About the Computation of Standard Errors and Inference Standard errors for \({\hat{\rho }}_{jk}\) can be computed by following the general results of the EM algorithm [54]. In particular, a number of procedures have been suggested for this purpose. A common approach is to calculate the square root of the inverse of the empirical information matrix [40], which approximates the expected information matrix by using the observed score statistic. Similarly, another strategy has been suggested by Louis [53] and requires the computation of the expected complete and missing information matrices. Alternatively, standard errors can also be obtained by means of nonparametric or parametric bootstrap techniques, which have been demonstrated to be robust under several circumstances [55]. In the context of this article, we resorted to the nonparametric bootstrap technique to compute standard errors for the fuzzy polychoric estimator (see [55], section 4.6). A particular advantage of this procedure is that \((1-\alpha )\) confidence intervals (CIs) can also be obtained as a by-product of the bootstrap technique, for instance the bias-corrected and accelerated (BCa) CIs. More precisely, Q bootstrap samples of the fuzzy matrix \(\{\mathbf {{\widetilde{N}}}^{(jk)}\}_{q=1,\ldots ,Q}\) can be obtained by drawing from the \(\alpha \)-cuts of the rcth element of \(\mathbf {{\widetilde{N}}}^{(jk)}\) (for each \(r=1,\ldots ,R\) and \(c=1,\ldots ,C\)) and then fuzzifying back the bootstrapped sample of count data [33]. Finally, the sequence of estimates \(\{{\hat{\rho }}_{jk}^{(q)}\}_{q=1,\ldots ,Q}\) is used to compute the bootstrap covariance \({\mathbb {C}}\text {ov}\left[ R\right] _{jk} \overset{\sim }{=}\frac{1}{Q-1}\sum _{q=1}^Q \big({\hat{\rho }}_{jk}^{(q)} - \frac{1}{Q}\sum _{q=1}^Q {\hat{\rho }}_{jk}^{(q)}\big)^2\), which is in turn used for the computation of the standard errors \({\hat{\sigma }}_{\rho _{jk}} = \sqrt{{\mathbb {C}}\text {ov}\left[ R\right] _{jk}}\) and the \((1-\alpha )\) BCa confidence intervals [25].
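For illustration, a minimal (and purely schematic) R sketch of the bootstrap standard error and a simple percentile interval computed from a sequence of replicates is reported below; the BCa interval used in the article additionally requires bias and acceleration corrections.

```r
# Sketch: bootstrap SE and a percentile interval from Q replicates of rho_jk
# (placeholder values standing in for the fuzzy-EM refits on bootstrapped tables).
set.seed(1)
rho_boot <- rnorm(500, mean = 0.45, sd = 0.06)     # placeholder replicates
c(se = sd(rho_boot), quantile(rho_boot, c(0.025, 0.975)))
```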

About the Polychoric Correlation Matrix \({\mathbf {R}}_{J\times J}\) As in the standard approach used to compute polychoric correlation matrices (e.g., see [41, 59]), also in the case of fuzzy data the matrix of latent linear correlations is obtained by calculating each element \(\rho _{jk}\) of the correlation matrix pairwise. Although this approach offers a simple and effective alternative to more challenging methods (e.g., see [48, 68]), in some circumstances it may lead to non-positive definite correlation matrices. This can be problematic, especially when such matrices are used as input of other statistical models such as factor analyses or SEMs [51]. In these cases, eigenvalue decomposition-based smoothing [44], least squares [44], or Dykstra’s [35] corrections constitute workable solutions to this issue.
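A generic R sketch of an eigenvalue-based smoothing step is reported below; it is a minimal implementation of the general idea and is not meant to reproduce the exact corrections cited above.

```r
# Generic eigenvalue-based smoothing of a non-positive definite pairwise LLC
# matrix: clip negative eigenvalues and rescale to unit diagonal.
smooth_corr <- function(R, eps = 1e-8) {
  e      <- eigen(R, symmetric = TRUE)
  lambda <- pmax(e$values, eps)                    # clip negative eigenvalues
  S      <- e$vectors %*% diag(lambda) %*% t(e$vectors)
  D      <- diag(1 / sqrt(diag(S)))
  D %*% S %*% D                                    # back to a proper correlation matrix
}
```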

4 Simulation Study

The aim of this simulation study is twofold. First, we wish to evaluate the performance of the fuzzy EM algorithm in estimating the parameters of the LLCs model and, second, to investigate whether the standard Olsson’s maximum likelihood procedure performs as well as the proposed method when applied to max-based and mean-based defuzzified data. The case \(J=2\) has been considered for the sake of simplicity. The Monte Carlo study has been performed on a (remote) HPC machine based on 16 Intel Xeon E5-2630L v3 (1.80 GHz) CPUs with 16x4 GB RAM, whereas computations and analyses have been done in the R framework for statistical analysis.

Design The design of the study involved three factors, namely (i) \(I\in \{150,250,500,1000\}\), (ii) \(\rho ^0 \in \{0.15,0.50,0.85\}\), and (iii) \(R=C\in \{4,6\}\), which were varied in a complete factorial design with \(4\times 3\times 2=24\) possible combinations. The threshold parameters were held fixed under the equidistance hypothesis [41], namely \(\varvec{\tau }^0_{X^j}=\varvec{\tau }^0_{X^k}=(-2.00,-0.66,0.66,2.00)\) for the conditions with \(R=C=4\) and \(\varvec{\tau }^0_{X^j}=\varvec{\tau }^0_{X^k}=(-2.00,-1.20,-0.40,0.40,1.20,2.00)\) for \(R=C=6\). For each combination, \(B=5000\) samples were generated, yielding \(5000\times 24 = 120000\) new datasets and an equivalent number of estimates.

Data Generation and Procedure Let \(I_{a}\), \(\rho ^0_{b}\), \(R_{d}=C_d\) be distinct levels of the factors I, \(\rho ^0\), R, and C. Then, fuzzy frequency data have been generated according to the following procedure. For each \(r=1,\ldots ,R_d\) and \(c=1,\ldots ,C_d\):

(i) Set \(n_{rc} = I_a\pi _{rc}\) (see Eq. (3.7)) given \(\varvec{\tau }^0_{X^j}\), \(\varvec{\tau }^0_{X^k}\), \(\rho ^0_{b}\), and \(I_a\);

(ii) the imprecision concerning \(n_{rc}\) was generated as follows: \(m_1\sim {\mathcal {G}}amma_{\text {d}}(\alpha _{m_1},\beta _{m_1})\) where \(\alpha _{m_1}= 1+n_{rc}\beta _{m_1}\), \(\beta _{m_1}=(n_{rc}+n_{rc}^2+4s^2_1)^{\frac{1}{2}} \big / 2s^2_1\), \(s_1 \sim {\mathcal {G}}amma_{\text {d}}(\alpha _{s_1},\beta _{s_1})\), \(\alpha _{s_1}= 1+m_0\beta _{s_1}\), \(\beta _{s_1}=(m_0+m_0^2+4s^2_0)^{\frac{1}{2}} \big / 2s^2_0\), \(m_0=1\) and \(s_0=0.25\), with \({\mathcal {G}}amma_{\text {d}}\) indicating the discrete Gamma random variable with shape and rate being reparameterized in terms of mean m and variance s;

(iii) the fuzzy set associated with \({{\tilde{n}}}_{rc}\) was obtained via the following probability–possibility transformation: \({\varvec{\xi }}_{{{\tilde{n}}}_{rc}} = f_{{\mathcal {G}}_{\text {d}}}({\mathbf {n}};\alpha _{rc},\beta _{rc}) \big / \max f_{{\mathcal {G}}_{\text {d}}}({\mathbf {n}};\alpha _{rc},\beta _{rc})\), with \({\mathbf {n}}=\{0,1,\ldots ,I_{a}\}\), \(\alpha _{rc} = 1+m_1\beta _{s_1}\), \(\beta _{s_1}=1+(m_1+m_1^2+4s_1^2)^{\frac{1}{2}} / 2s_1^2 \), \(\beta _{rc}=(m_1+m_1^2+4s^2_1)^{\frac{1}{2}} \big / 2s^2_1\), and \(f_{{\mathcal {G}}_{\text {d}}}(n;\alpha _{rc},\beta _{rc})\) being the discrete Gamma density normalized to one in order to mimic the behavior of a normal fuzzy set [27]. The discrete density \(f_{{\mathcal {G}}_{\text {d}}}\) is computed as a difference of survival functions of the continuous Gamma density, \(S_{{\mathcal {G}}}(x) - S_{{\mathcal {G}}}(x+1)\) [15, 74].

Note that step (ii) is required in order to make crisp counts entirely imprecise so that \({{\tilde{n}}}_{rc}\) is no longer centered on \(n_{rc}\). Finally, parameters \(\varvec{\theta } = \{\rho ,\varvec{\tau }_{X^j},\varvec{\tau }_{X^k}\}\) were estimated from the fuzzy counts \({{\widetilde{\mathbf {N}}}}_{R_d\times C_d}\) using the fuzzy EM algorithm (fEM) and the standard Olsson’s two-stage maximum likelihood on max-based (dML-max) and mean-based (dML-mean) defuzzified counts (see Sect. 3.3).
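For reference, a small R sketch of the probability–possibility transformation used in step (iii) is reported below; the shape and rate values are placeholders rather than the \((\alpha _{rc},\beta _{rc})\) quantities derived in the simulation design.

```r
# Sketch of step (iii): the discrete Gamma density is obtained as a difference
# of survival functions and normalized by its maximum to yield a membership
# function over n = 0, ..., I. Shape/rate values below are placeholders.
dgamma_disc <- function(n, shape, rate) {
  pgamma(n, shape, rate, lower.tail = FALSE) -
    pgamma(n + 1, shape, rate, lower.tail = FALSE)   # S(n) - S(n + 1)
}

prob_poss <- function(I, shape, rate) {
  f <- dgamma_disc(0:I, shape, rate)
  f / max(f)                                         # probability-possibility transform
}

xi_n <- prob_poss(I = 150, shape = 25, rate = 0.8)   # illustrative values
which.max(xi_n) - 1                                  # most possible count
```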

Outcome Measures For each condition of the simulation design, the three methods (i.e., fEM, dML-max, dML-mean) were evaluated in terms of bias and root mean square errors. In addition, for each method the thresholds were aggregated to form a scalar statistic, namely \({\widehat{\varvec{\tau }}} = {\mathbf {1}}_{R_d}^T{\widehat{\varvec{\tau }}}_{X^j}\) and \({\widehat{\varvec{\tau }}} = {\mathbf {1}}_{C_d}^T{\widehat{\varvec{\tau }}}_{X^k}\). (Note that \(\varvec{\tau }_{X^j}\) and \(\varvec{\tau }_{X^k}\) are equal by design.) For the sake of completeness, bootstrap standard errors and 95% BCa confidence intervals were computed for the three methods, along with coverage probabilities and interval lengths.

Table 2 Simulation study: average bias and root-mean-square errors for \(\rho \) in the condition \(R=C=4\)
Table 3 Simulation study: average bias and root-mean-square errors for \(\rho \) in the condition \(R=C=6\)
Table 4 Simulation study: average bias and root-mean-square errors for the aggregated thresholds \({\widehat{\varvec{\tau }}} = {\mathbf {1}}_{R_d}^T{\widehat{\varvec{\tau }}}_{X^j}\) in the condition \(R=C=4\)

Results Tables 2, 3, 4, and 5 report the results of the simulation study with regard to \({{\hat{\rho }}}\) and \({\hat{\varvec{\tau }}}\) for both the \(R=C=4\) and \(R=C=6\) cases. We begin with the correlation parameter \(\rho \) for the case \(R=C=4\) (see Table 2). Considering \(\rho ^0=0.15\), the methods showed negligible bias in estimating \(\rho \). However, they differed in terms of RMSE, with fEM showing lower values with increasing sample size compared to dML-max and dML-mean. With larger correlations (\(\rho ^0>0.15\)), the bias of the estimates as well as the RMSE was more pronounced for dML-max and dML-mean. The same results were also observed for the case with \(R=C=6\) (see Table 3). With regard to the overall statistic \({\hat{\varvec{\tau }}}\) for the threshold parameters, all the methods achieved comparable results regardless of \(\rho ^0\). In particular, fEM showed slightly higher bias and RMSE than the dML-max and dML-mean methods across the \(R=C=4\) (see Table 4) and \(R=C=6\) (see Table 5) conditions. To further investigate these results, we studied the average bias and variance of the estimates for \({{{\hat{\varvec{\tau }}}}}_{X^j}\) (or \({{{\hat{\varvec{\tau }}}}}_{X^k}\)) as a function of the sample size I and \(\rho ^0\). We found that the leftmost and rightmost thresholds tended to be slightly larger for fEM as opposed to the innermost thresholds for both the \(R=C=4\) (see Supplementary Materials, Figure S1) and \(R=C=6\) conditions (see Supplementary Materials, Figure S2). Moreover, the variance of the estimates for the leftmost and rightmost thresholds was higher compared to the innermost thresholds (see Supplementary Materials, Table S2) but, as expected, it reduced with increasing sample size regardless of \(\rho ^0\). This is not surprising given that we implemented a standard LLCs model in which no particular constraints were applied to the threshold estimates, such as \({\mathbf {1}}_{R_d}^T{\hat{\varvec{\tau }}}_{X^j} = 0\) (e.g., see [28]). Most importantly, in line with the Gaussianity assumption underlying the LLCs model, the estimated thresholds were symmetric and equidistant with respect to the fixed point zero (see Supplementary Materials, Table S1). Overall, the results suggest that fEM should be preferred over defuzzified maximum likelihood when the interest is in estimating the latent linear association \(\rho \) among pairs of variables and fuzzy frequency statistics are available. On the contrary, for those particular cases where \(\rho \) is known and the interest is in estimating the true threshold parameters, the standard Olsson’s maximum likelihood method can directly be applied after defuzzifying the observed fuzzy frequency counts. With regard to the estimation of the standard errors, the three algorithms showed comparable results. As expected, the statistic \({\hat{\sigma }}_{\rho _{jk}}\) decreased as a function of the sample size I for both the \(R=C=4\) and \(R=C=6\) conditions (see Supplementary Materials, Tables S3 and S4). Instead, with regard to the 95% CIs, only the fEM method showed consistent results in terms of coverage probability and interval lengths over all the simulation conditions (see Supplementary Materials, Tables S3 and S4). In particular, with the exception of the condition \(\rho ^0=0.15\), the dML-max and dML-mean algorithms did not reach the nominal coverage probability. By contrast, the empirical coverage probability for the fEM algorithm was close to (or higher than) the nominal value, with interval lengths decreasing as a function of the sample size I.

Table 5 Simulation study: average bias and root-mean-square errors for the aggregated thresholds \({\widehat{\varvec{\tau }}} = {\mathbf {1}}_{R_d}^T{\widehat{\varvec{\tau }}}_{X^j}\) in the condition \(R=C=6\)

5 Applications

In this section, we describe the application of the proposed method to two case studies from the health and natural sciences, involving the assessment of a psychotherapeutic intervention (application 1) and the evaluation of meteorological characteristics for forty Turkish cities (application 2). Note that both applications are provided merely to illustrate the use of the fuzzy LLCs model when dealing with imprecise data.

5.1 Application 1: Assessing the Outcome of a Therapy

Evaluating the quality of a psychotherapy session plays a central role in evidence-based medicine. A typical approach to understanding the fundamentals of the therapeutic process consists in asking experts to assess the global quality and characteristics of the therapist–patient relationship through specialized instruments such as the PQS questionnaire [61]. The data thus collected generally consist either of ratings or of classifications of attributes made through bounded and graded scales. Because of their characteristics, these tasks often involve imprecision and vagueness that can adequately be accounted for by fuzzy statistical modeling. In this application, we consider the assessment of a psychotherapy session by means of the PQS questionnaire. Data were originally collected by [17] and refer to \(I=60\) evaluations of psychotherapy on a 9-point scale over \(J=3\) dimensions of assessment. Given the nature of the task, the three variables were originally considered to be fuzzy, each with three trapezoidal fuzzy categories. To account for the extremes of the classification scale, two more outer categories were added so that \(R=C=5\) (see Table 6). Figure 2 shows the granulation based on five fuzzy categories (\(G_0\),\(\ldots \),\(G_4\)) for each dimension of assessment along with the corresponding crisp observations. The aim is to compute the correlation matrix for the three fuzzy variables, with the hypothesis that a higher degree of association is related to a good therapeutic outcome. The first step requires computing the fuzzy frequency matrix \({{\widetilde{\mathbf {N}}}}_{5\times 5}\) for each pair of the \(J=3\) fuzzy variables given the crisp observed data. Next, the matrix of fuzzy counts is used to estimate the latent linear correlation matrix \({{\widehat{\mathbf {R}}}}_{3\times 3}\). Figure 3 shows a graphical representation of the matrix of fuzzy counts \(\mathbf {{\widetilde{N}}}_{5\times 5}\) for one pair of variables (i.e., \(X_2\),\(X_3\)). It contains fuzzy numbers with various degrees of fuzziness and includes combinations with degenerated fuzzy counts as well (i.e., \(G_0^{(2)},G_4^{(3)}\) and \(G_0^{(2)},G_0^{(3)}\)). Table 7 reports the estimates of the LLC coefficients. Overall, the results showed a low level of association among the three dimensions, which in turn indicated that the psychotherapy being assessed cannot be classified as having a good outcome.

Table 6 Application 1: Fuzzy categories for the three variables of the assessment task
Fig. 2

Application 1: Granulation for the three fuzzy variables along with crisp observations (dashed gray lines)

Fig. 3

Application 1: Fuzzy frequency matrix for the pair \(X_2\),\(X_3\). Note that each cell contains a fuzzy natural number \({{\tilde{n}}}_{rc}\) for a specific combination of the \(R\times C\) granulation space

Table 7 Application 1: Latent linear correlation matrix estimated via Olsson’s two-stage fuzzy EM algorithm (the bootstrap standard errors are reported in parentheses)

5.2 Application 2: Effect of Climatic Variables on Rainfall

Meteorological variables are generally used to assess the impact of climatic characteristics on many phenomena, including human as well as non-human activities. Although often regarded as discrete or continuous measurements, these variables can benefit from fuzzy coding in some circumstances. Examples include cases in which the variables are imprecisely coded (e.g., when data are available in terms of intervals or linguistic categories) or when they are derived from a variety of sources (e.g., samples, historical databases, experts) that need to be integrated before being used for data analysis [8, 16]. In this application, we consider the analysis of \(J=5\) meteorological variables (i.e., SUN: daily hours of sunshine; HUM: percentage of humidity; PRE: precipitations; ALT: altitude; MAX: maximum daily temperature) which were collected in 40 cities of Turkey during 2004 [2]. Data were originally coded using \(R=C=3\) fuzzy triangular categories (\(G_0\): minimum; \(G_1\): medium; \(G_2\): maximum), and the membership grades \({\varvec{\epsilon }}^{(j)}_1,{\varvec{\epsilon }}^{(j)}_2,{\varvec{\epsilon }}^{(j)}_3\), \(j=1,\ldots ,5\), constitute the input data for the subsequent analysis. The aim is to explore the effects of the climatic variables on rainfall (PRE) by means of a path analysis model. As in the first application, the first step consisted in computing the fuzzy frequency matrix \({{\widetilde{\mathbf {N}}}}_{3\times 3}\) for each pair of the five climatic variables given the observed membership degrees. Then, the LLCs matrix was estimated using the fuzzy EM algorithm. Figure 4 shows an example of fuzzy counts for the pair of variables PRE-HUM, whereas Table 8 reports the estimated correlations for the variables involved in the study. As expected, the results showed a certain level of association among the five climatic variables.

Fig. 4

Application 2: Fuzzy frequency matrix for the pair PRE, HUM. Note that each cell contains a fuzzy natural number \({{\tilde{n}}}_{rc}\) for a specific combination of the \(R\times C\) granulation space

Table 8 Application 2: Latent linear correlation matrix estimated via Olsson’s two-stage fuzzy EM algorithm (the bootstrap standard errors are reported in parentheses)

Once the LLCs matrix has been estimated, we proceeded by modeling the effects of the climatic variables on PRE via path analysis (see Fig. 5). In particular, we expected that higher humidity (HUM) increases rainfall (PRE) and that longer sunshine duration (SUN) decreases the level of precipitation (PRE). Similarly, we also expected an indirect effect of altitude (ALT) on humidity (HUM) through maximum temperature (MAX). The path model was estimated on the LLCs matrix via maximum likelihood as implemented in the R library lavaan [64]. Overall, the estimated model showed a moderate fit (\(R^2=0.20\)). The results (Table 9) highlighted that PRE increased as a function of HUM (\({\hat{\beta }}=0.1844\), \({\hat{\sigma }}^2_\beta =0.1386\)) and decreased as sunshine duration increased (\({\hat{\beta }}=-0.3406\), \({\hat{\sigma }}^2_\beta =0.1386\)). Humidity was negatively related to temperature (\({\hat{\beta }}=-0.2161\), \({\hat{\sigma }}^2_\beta =0.1544\)), which was in turn negatively associated with altitude (\({\hat{\beta }}=-0.5577\), \({\hat{\sigma }}^2_\beta =0.1312\)), as expected.
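To illustrate how the estimated LLC matrix enters the path analysis, a hypothetical lavaan sketch is reported below; the matrix values and the path syntax without residual correlations are illustrative assumptions and do not reproduce the analysis summarized in Tables 8 and 9.

```r
library(lavaan)

# Hypothetical sketch: the LLC matrix estimated by the fuzzy EM is passed to
# lavaan as the sample covariance input. The values in 'Rhat' below are
# placeholders, not the estimates reported in Table 8.
vars <- c("SUN", "HUM", "PRE", "ALT", "MAX")
Rhat <- diag(5); dimnames(Rhat) <- list(vars, vars)
Rhat["PRE", "HUM"] <- Rhat["HUM", "PRE"] <-  0.20
Rhat["PRE", "SUN"] <- Rhat["SUN", "PRE"] <- -0.35
Rhat["HUM", "MAX"] <- Rhat["MAX", "HUM"] <- -0.20
Rhat["MAX", "ALT"] <- Rhat["ALT", "MAX"] <- -0.55

model <- '
  PRE ~ HUM + SUN    # direct effects on rainfall
  HUM ~ MAX          # temperature -> humidity
  MAX ~ ALT          # altitude -> temperature
'
fit <- sem(model, sample.cov = Rhat, sample.nobs = 40)
summary(fit, rsquare = TRUE)
```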

Fig. 5

Application 2: Path model for the effect of the climatic variables on the response variable PRE. Note that straight lines represent direct effects, whereas dotted lines indicate correlations

Table 9 Application 2: Estimated coefficients \({{\hat{\varvec{\beta }}}}\) and residual variances \({{\hat{\varvec{\sigma }}}}_{\epsilon }^2\) for the path model depicted in Fig. 5 along with the standard errors \({{\hat{\varvec{\sigma }}}}_\beta ^2\) of the estimates

6 Conclusions

In this article, we described a novel approach to estimating latent linear correlations (LLCs) when data are in the form of fuzzy frequency tables. In particular, we first represented fuzzy counts in terms of generalized natural numbers, and then we generalized the sample space of the standard LLCs model to cope with fuzzy counts while retaining a non-fuzzy parameter space. The resulting model encapsulates both random and non-random/imprecision components in a unified statistical representation. Since the inferential interest is in estimating the latent correlation matrix of the observed variables, parameter estimation was performed via fuzzy maximum likelihood using the expectation–maximization algorithm. A simulation study and two real applications were developed to highlight the characteristics of the fuzzy LLCs model. Overall, the simulation results revealed that the fuzzy LLCs model was more accurate in estimating the true correlation matrix than standard methods applied to defuzzified data. The applications showed how the proposed method can be of particular value in situations involving fuzzy classification as well as fuzzy coding.

A particular advantage of the fuzzy LLCs model is its simplicity and its ability to deal with situations involving imprecise classification problems. Moreover, the proposed method works with both fuzzy observations/crisp categories and crisp observations/fuzzy categories and, as such, it includes the standard crisp observations/crisp categories setting as a special case. Again, the fuzzy LLCs model does not require an extension of its parametric representation to account for fuzzy frequency data and, consequently, parameter estimation and inference can be performed using the asymptotic properties of maximum likelihood theory. This is quite convenient and obviates the need to generalize LLCs-based statistical modeling, such as structural equation models and factor analysis, to the fuzzy case. A limitation of the proposed approach is that it is based on the simplest, but still widely used, Gaussianity assumption for LLCs. Although it has been proved that the assumption holds in several empirical contexts, there may be a need for LLCs based on more general probabilistic models (e.g., skew-Gaussian, elliptical, t, copula-based). As a result, the problems already identified by other researchers, for instance bias in estimating the asymptotic covariance matrix of the LLCs matrix [28, 56], still persist in the fuzzy case. The fuzzy bootstrap technique used to approximate the covariance matrix of the fuzzy polychoric matrix might constitute an additional limitation of the current study. Indeed, although it provides a computational solution for calculating standard errors and CIs, it might suffer from the curse of dimensionality (e.g., in the case of a larger number of variables or response categories) as well as from a larger variance in the estimates. This is a well-known issue in the fuzzy statistics literature (e.g., see [33]), and it is particularly due to the fact that fuzzy bootstrap techniques handle two sources of variability simultaneously, i.e., one related to the randomness of the estimator and the other related to the effect of the fuzziness in the data.

There are a number of further extensions to this project that can be undertaken in future research. For instance, the use of more general probabilistic models would extend the proposed method to handle situations involving violations of the Gaussianity assumption. Along this line, further investigations should be undertaken to study the problem of deriving asymptotically efficient estimators for the covariance matrix and the standard errors, for instance by obtaining a fuzzy generalization of Louis’ method [53]. As in the non-fuzzy case, this is still an open question. At the same time, building interval estimators for the fuzzy polychoric estimator, beyond the point-wise solution described in this article, might constitute a further generalization of the findings of the present study. Another aspect which might be interesting to investigate is the case where data need to be represented using more general fuzzy numbers (e.g., beta, exponential, Gaussian), which would allow the proposed method to cope with cases requiring more flexible models to represent non-random imprecision. Further, studying the properties of fuzzy LLCs-based statistical models, such as structural equation modeling or factor analysis, would also constitute a research topic to be considered in future work. Finally, neutrosophic-based generalizations of the proposed LLC statistic might also be a further research line to be investigated (e.g., see [6, 65]).