
Bayesian network structural learning from complex survey data: a resampling based approach

  • Original Paper
  • Published in Statistical Methods & Applications

Abstract

Nowadays good quality official statistics data are increasingly available. The construction of multivariate statistical models, possibly leading to the identification of causal relationships, is of great interest in this context, and Bayesian networks play an important role in it. A crucial step consists in learning the structure of a Bayesian network. One of the most widely used procedures is the PC algorithm, which carries out a sequence of independence tests on the available data set and builds a Bayesian network according to the test results. The PC algorithm rests on the essential assumption that data are independent and identically distributed. Unfortunately, official statistics data are generally collected through complex sampling designs, so this assumption is not met and the PC algorithm fails to learn the structure correctly. To avoid this, the sample selection must be taken into account in the structural learning process. In this paper a modified version of the PC algorithm, based on resampling techniques for finite populations, is proposed for inferring causal structure from complex survey data. A simulation experiment shows the good performance of the proposed algorithm and its robustness to departures from the assumptions.
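The abstract already pins down the mechanism: keep the PC search strategy, but replace the i.i.d. independence test with a design-aware one. A bare-bones illustrative skeleton of that idea follows (our sketch, not the authors' code; `ci_test` stands in for the paper's resampling-based test):

```python
from itertools import combinations

def pc_skeleton(nodes, ci_test, max_cond=2):
    """Bare-bones PC skeleton search: start from the complete graph and
    delete an edge x-y as soon as some conditioning set S makes x and y
    look independent. ci_test(x, y, S) -> bool is pluggable; the paper's
    contribution is a version of this test valid under complex sampling."""
    adj = {v: set(nodes) - {v} for v in nodes}
    for level in range(max_cond + 1):            # size of the conditioning set
        for x in nodes:
            for y in list(adj[x]):
                if y not in adj[x]:              # edge already removed symmetrically
                    continue
                for S in combinations(sorted(adj[x] - {y}), level):
                    if ci_test(x, y, set(S)):    # survey-aware test plugs in here
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
    return {tuple(sorted((x, y))) for x in nodes for y in adj[x]}
```

In the i.i.d. setting `ci_test` would be an ordinary chi-square independence test; for complex survey data the paper calibrates it by resampling from a pseudo-population, as formalized in the Appendix.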


References

  • Antal E, Tillé Y (2011) A direct bootstrap method for complex sampling designs from a finite population. J Am Stat Assoc 106:534–543

  • Ballin M, Scanu M, Vicard P (2010) Estimation of contingency tables in complex survey sampling using probabilistic expert systems. J Stat Plan Inference 140:1501–1512

  • Beaumont J-F, Patak Z (2012) On the generalized bootstrap for sample surveys with special attention to Poisson sampling. Int Stat Rev 80:127–148

  • Berger YG (2011) Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pak J Stat 27:407–426

  • Bickel PJ, Freedman DA (1981) Some asymptotic theory for the bootstrap. Ann Stat 9:1196–1217

  • Boistard H, Lopuhaä HP, Ruiz-Gazen A (2017) Functional central limit theorems for single-stage sampling designs. Ann Stat 45:1728–1758

  • Booth JG, Butler RW, Hall P (1994) Bootstrap methods for finite populations. J Am Stat Assoc 89:1282–1289

  • Chao MT, Lo S-H (1985) A bootstrap method for finite population. Sankhya Ser A 47:399–405

  • Chatterjee A (2011) Asymptotic properties of sample quantiles from a finite population. Ann Inst Stat Math 63:157–179

  • Chauvet G (2007) Méthodes de bootstrap en population finie. PhD dissertation, Laboratoire de statistique d'enquêtes, CREST-ENSAI, Université de Rennes 2

  • Conti PL (2014) On the estimation of the distribution function of a finite population under high entropy sampling designs, with applications. Sankhya B 76:234–259

  • Conti PL, Di Iorio A (2018) Analytic inference in finite populations via resampling, with applications to confidence intervals and testing for independence. arXiv:1809.08035

  • Conti PL, Di Iorio A, Guandalini A, Marella D, Vicard P, Vitale V (2020) On the estimation of the Lorenz curve under complex sampling designs. Stat Methods Appl 29:1–24

  • Conti PL, Marella D (2015) Inference for quantiles of a finite population: asymptotic versus resampling results. Scand J Stat 42:545–561

  • Conti PL, Marella D, Mecatti F, Andreis F (2019) A unified principled framework for resampling based on pseudo-populations: asymptotic theory. Bernoulli 26:1044–1069

  • Cowell RG, Dawid AP, Lauritzen SL, Spiegelhalter DJ (2007) Probabilistic networks and expert systems: exact computational methods for Bayesian networks. Springer

  • Di Zio M, Scanu M, Coppola L, Luzi O, Ponti A (2004) Bayesian networks for imputation. J R Stat Soc Ser A 167:309–322

  • Drton M, Maathuis MH (2017) Structure learning in graphical modeling. Annu Rev Stat Appl 4:365–393

  • Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26

  • Friedman N, Goldszmidt M, Wyner A (1999) Data analysis with Bayesian networks: a bootstrap approach. In: Proceedings of the 15th annual conference on uncertainty in artificial intelligence, pp 196–201

  • Grafström A (2010) Entropy of unequal probability sampling designs. Stat Methodol 7:84–97

  • Gross ST (1980) Median estimation in sample surveys. In: Proceedings of the section on survey research methods. American Statistical Association, pp 181–184

  • Hájek J (1964) Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann Math Stat 35:1491–1523

  • Holmberg A (1998) A bootstrap approach to probability proportional-to-size sampling. In: Proceedings of the ASA section on survey research methods, pp 378–383

  • Jiménez-Gamero MD, Moreno-Rebollo JL, Mayor-Gallego JA (2018) On the estimation of the characteristic function in finite populations with applications. Test 27:95–121

  • Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P (2012) Causal inference using graphical models with the R package pcalg. J Stat Softw 47:1–26

  • Lagani V, Athineou G, Farcomeni A, Tsagris M, Tsamardinos I (2017) Feature selection with the R package MXM: discovering statistically equivalent feature subsets. J Stat Softw 80:7

  • Lahiri SN (2003) Resampling methods for dependent data. Springer series in statistics. Springer, New York

  • Marella D, Pfeffermann D (2019) Matching information from two independent informative samples. J Stat Plan Inference 203:70–81

  • Marella D, Vicard P (2013) Object-oriented Bayesian networks for modeling the respondent measurement error. Commun Stat 42:3463–3477

  • Marella D, Vicard P (2015) Object-oriented Bayesian network to deal with measurement error in household surveys. In: Advances in statistical models for data analysis. Springer

  • Mashreghi Z, Haziza D, Léger C (2016) A survey of bootstrap methods in finite population sampling. Stat Surv 10:1–52

  • McCarthy PJ, Snowden CB (1985) The bootstrap and finite population sampling. Vital and health statistics 95(2):1–23. Public Health Service Publication, US Government Printing Office, Washington, DC

  • Pfeffermann D (1993) The role of sampling weights when modeling survey data. Int Stat Rev 61:317–337

  • Pfeffermann D (2011) Modelling of complex survey data: why model? Why is it a problem? How can we approach it? Surv Methodol 37:115–136

  • Ramsey J, Spirtes P, Zhang J (2006) Adjacency-faithfulness and conservative causal inference. In: Proceedings of the 22nd conference on uncertainty in artificial intelligence. AUAI Press, Oregon, pp 401–408

  • Ranalli MG, Mecatti F (2012) Comparing recent approaches for bootstrapping sample survey data: a first step towards a unified approach. In: Proceedings of the ASA section on survey research methods, pp 4088–4099

  • Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, New York

  • Rao JNK, Scott AJ (1981) The analysis of categorical data from complex sample surveys: chi-squared tests for goodness-of-fit and independence in two-way tables. J Am Stat Assoc 76:221–230

  • Rao JNK, Scott AJ (1984) On chi-squared tests for multi-way tables with cell proportions estimated from survey data. Ann Stat 12:46–60

  • Rao JNK, Wu C-FJ (1988) Resampling inference with complex survey data. J Am Stat Assoc 83:231–241

  • Serfling RJ (1980) Approximation theorems of mathematical statistics. Wiley, New York

  • Sitter RR (1992) A resampling procedure for complex survey data. J Am Stat Assoc 87:755–765

  • Skinner CJ, Holt D, Smith TMF (1989) Analysis of complex surveys. Wiley

  • Spirtes P, Glymour C, Scheines R (2000) Causation, prediction, and search, 2nd edn. MIT Press, Cambridge, MA. With additional material by D. Heckerman, C. Meek, GF Cooper and T. Richardson

  • Thibaudeau Y, Winkler WE (2002) Bayesian networks representations, generalized imputation, and synthetic micro-data satisfying analytic constraints. Research Report RRS2002/9. US Bureau of the Census

  • Tsagris M (2019) Bayesian network learning with the PC algorithm: an improved and correct variation. Appl Artif Intell 33(2):101–123

  • Tsamardinos I, Brown LE, Aliferis CF (2006) The max-min hill-climbing Bayesian network structure learning algorithm. Mach Learn 65(1):31–78

  • Uhler C, Raskutti G, Bühlmann P, Yu B (2013) Geometry of the faithfulness assumption in causal inference. Ann Stat 41:436–463

  • Verma T, Pearl J (1990) On equivalence of causal models. Technical Report R-150, Department of Computer Science, University of California, Los Angeles

  • Wilcox RR (2010) Fundamentals of modern statistical methods: substantially improving power and accuracy. Springer

  • Zhang J, Spirtes P (2008) Detection of unfaithfulness and robust causal inference. Minds Mach 18:239–271


Acknowledgements

We want to thank the anonymous referees whose comments considerably improved an earlier version of the paper.

Author information

Correspondence to Paola Vicard.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Proposition 1

Here we sketch the main lines of the argument showing how Proposition 1 follows from Proposition 1 in Conti et al. (2018).

Define the cumulative distribution functions (c.d.f.s),

$$\begin{aligned} F^{hk}=\sum _{u=1}^{h}\sum _{v=1}^{k}p^{uv}, \; h=1,\dots , H, \; \; k=1,\dots , K \end{aligned}$$

the empirical c.d.f.s

$$\begin{aligned} \widehat{F}^{hk}=\sum _{u=1}^{h}\sum _{v=1}^{k}\widehat{p}^{uv}, \; h=1,\dots , H, \; \; k=1,\dots , K \end{aligned}$$

where \(\widehat{p}^{uv}\) are estimated using the classical Hájek estimators as in (6), and the corresponding random vectors (with elements in lexicographic order)

$$\begin{aligned} \varvec{F}^{HK}=\left[ \begin{array}{c} F^{11} \\ F^{12} \\ \dots \\ F^{HK} \\ \end{array} \right] \;\;\;\;\; \widehat{\varvec{F}}^{HK}=\left[ \begin{array}{c} \widehat{F}^{11} \\ \widehat{F}^{12} \\ \dots \\ \widehat{F}^{HK} \\ \end{array} \right] \end{aligned}$$

and

$$\begin{aligned} \varvec{T}^{HK}=\sqrt{n}\left( \widehat{\varvec{F}}^{HK} - \varvec{F}^{HK} \right) . \end{aligned}$$

Note that the random vector \(\varvec{T}^{HK}\) lies on a hyperplane of dimension \(HK-1\), due to the relationship \(\widehat{F}^{HK} = F^{HK}=1\) (so that the last component of \(\varvec{T}^{HK}\) is 0).
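As a worked micro-example (ours, not in the paper), take \(H=K=2\). Cumulating cells over both indices gives

$$\begin{aligned} \varvec{F}^{22}=\left[ \begin{array}{c} F^{11} \\ F^{12} \\ F^{21} \\ F^{22} \\ \end{array} \right] =\left[ \begin{array}{c} p^{11} \\ p^{11}+p^{12} \\ p^{11}+p^{21} \\ p^{11}+p^{12}+p^{21}+p^{22} \\ \end{array} \right] =\left[ \begin{array}{c} p^{11} \\ p^{11}+p^{12} \\ p^{11}+p^{21} \\ 1 \\ \end{array} \right] \end{aligned}$$

and likewise for \(\widehat{\varvec{F}}^{22}\); the last component is identically 1, which is exactly why the last component of \(\varvec{T}^{22}\) vanishes.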

From Conti et al. (2018) it follows that \(\varvec{T}^{HK}\) tends in distribution, as \(n, N \rightarrow \infty\), to a degenerate multivariate Normal r.v. with mean vector \(\varvec{0}^{HK}\) (with HK components) and covariance matrix \(\varvec{\Omega }^{HK}\). Since the limiting distribution is degenerate (it lies in a subspace of dimension \(HK-1\)), the matrix \(\varvec{\Omega }^{HK}\) is singular. However, this affects neither its definition nor its basic properties (cf. Rao 1973, pp. 184–185). In addition, again from Conti et al. (2018), the relationship

$$\begin{aligned} \varvec{\Omega }^{HK} = \varvec{\Omega }^{HK}_1 + f\varvec{\Omega }^{HK}_2 \end{aligned}$$
(25)

holds, where \(\varvec{\Omega }^{HK}_1\) is the part of the total variability due to the sampling design, \(\varvec{\Omega }^{HK}_2\) is the part due to the superpopulation model, and f is the limiting sampling fraction.

Define now

$$\begin{aligned} \varvec{W}^{HK}=\sqrt{n} \left[ \begin{array}{c} \widehat{p}^{11} - p^{11} \\ \widehat{p}^{12} - p^{12} \\ \dots \\ \widehat{p}^{HK} - p^{HK} \\ \end{array} \right] \end{aligned}$$

From

$$\begin{aligned} p^{hk}= & {} F^{hk}-F^{h \; k-1}-F^{h-1 \; k}+F^{h-1 \; k-1} \\ \widehat{p}^{hk}= & {} \widehat{F}^{hk}-\widehat{F}^{h \; k-1}-\widehat{F}^{h-1 \; k}+\widehat{F}^{h-1 \; k-1} \end{aligned}$$

where \(h=1,\dots , H\), \(k=1,\dots , K\), it is immediate to verify that the map

$$\begin{aligned} \varvec{T}^{HK} \mapsto \varvec{W}^{HK} \end{aligned}$$
(26)

is linear, and hence continuous. From the continuous mapping theorem, \(\varvec{W}^{HK}\) tends in distribution to a degenerate multivariate Normal distribution with mean \(\varvec{0}^{HK}\) and (singular) covariance matrix \(\varvec{\Sigma }^{HK}\). In view of (25), the matrix \(\varvec{\Sigma }^{HK}\) can be decomposed as

$$\begin{aligned} \varvec{\Sigma }^{HK} = \varvec{\Sigma }^{HK}_1 + f\varvec{\Sigma }^{HK}_2. \end{aligned}$$
(27)

Next, define

$$\begin{aligned} \varvec{W}^{H}=\sqrt{n} \left[ \begin{array}{c} \widehat{p}^{1.} - p^{1.} \\ \widehat{p}^{2.} - p^{2.} \\ \dots \\ \widehat{p}^{H.} - p^{H.} \\ \end{array} \right] =\sqrt{n} \left[ \begin{array}{c} \sum _{k=1}^{K}(\widehat{p}^{1k} - p^{1k}) \\ \sum _{k=1}^{K}(\widehat{p}^{2k} - p^{2k}) \\ \dots \\ \sum _{k=1}^{K}(\widehat{p}^{Hk} - p^{Hk}) \\ \end{array} \right] . \end{aligned}$$

The map \(\varvec{W}^{HK} \mapsto \varvec{W}^{H}\) is linear, and hence continuous. From the continuous mapping theorem, it follows that \(\varvec{W}^{H}\) tends in distribution to a (degenerate) multivariate Normal distribution, with mean vector \(\varvec{0}^{H}\) and covariance matrix \(\varvec{\Sigma }^H\). From (27), it also follows that the following decomposition holds:

$$\begin{aligned} \varvec{\Sigma }^{H} = \varvec{\Sigma }^{H}_1 + f\varvec{\Sigma }^{H}_2. \end{aligned}$$

Finally, using exactly the same arguments as above, it is not difficult to see that the degenerate r.v.

$$\begin{aligned} \varvec{W}^{K}=\sqrt{n} \left[ \begin{array}{c} \widehat{p}^{.1} - p^{.1} \\ \widehat{p}^{.2} - p^{.2} \\ \dots \\ \widehat{p}^{.K} - p^{.K} \\ \end{array} \right] \end{aligned}$$

tends in distribution to a (degenerate) multivariate Normal distribution, with mean vector \(\varvec{0}^{K}\) and covariance matrix \(\varvec{\Sigma }^K\). Again, the decomposition

$$\begin{aligned} \varvec{\Sigma }^{K} = \varvec{\Sigma }^{K}_1 + f\varvec{\Sigma }^{K}_2 \end{aligned}$$

holds.\(\square\)

In order to prove Propositions 2 and 3, define the vectors \(\widetilde{{\varvec{p}}}^{HK}\) and \(\overline{{\varvec{p}}}^{HK}\) of length HK

$$\begin{aligned} \widetilde{{\varvec{p}}}^{HK}= \left[ \begin{array}{c} \widehat{p}^{1.}\widehat{p}^{.1}\\ \widehat{p}^{1.}\widehat{p}^{.2}\\ \dots \\ \widehat{p}^{1.}\widehat{p}^{.K}\\ \widehat{p}^{2.}\widehat{p}^{.1}\\ \widehat{p}^{2.}\widehat{p}^{.2}\\ \dots \\ \widehat{p}^{2.}\widehat{p}^{.K}\\ \dots \\ \widehat{p}^{H.}\widehat{p}^{.1}\\ \widehat{p}^{H.}\widehat{p}^{.2}\\ \dots \\ \widehat{p}^{H.}\widehat{p}^{.K}\\ \end{array} \right] \quad \overline{{\varvec{p}}}^{HK}=\left[ \begin{array}{c} p^{1.}p^{.1}\\ p^{1.}p^{.2}\\ \dots \\ p^{1.}p^{.K}\\ p^{2.}p^{.1}\\ p^{2.}p^{.2}\\ \dots \\ p^{2.}p^{.K}\\ \dots \\ p^{H.}p^{.1}\\ p^{H.}p^{.2}\\ \dots \\ p^{H.}p^{.K}\\ \end{array} \right] \end{aligned}$$

and the matrices (\(H \times HK\) and \(K \times HK\), respectively)

$$\begin{aligned} {\varvec{A}}= & {} \left[ {\varvec{A}}_1,{\varvec{A}}_2,\dots ,{\varvec{A}}_H\right] \\ {\varvec{B}}= & {} \left[ {\varvec{B}}_1,{\varvec{B}}_2, \dots ,{\varvec{B}}_H\right] \end{aligned}$$

where

i) \({\varvec{A}}_h\) is a matrix of size \(H \times K\) with all entries equal to 0 except those in the hth row, which are equal to 1, for \(h=1,\dots ,H\).

ii) \({\varvec{B}}_h\) is an identity matrix of order K, for \(h=1,\dots ,H\).

If we set

$$\begin{aligned} \widehat{{\varvec{p}}}^{H.}= \left[ \begin{array}{c} \widehat{p}^{1.}\\ \widehat{p}^{2.}\\ \dots \\ \widehat{p}^{H.}\\ \end{array} \right] \quad \widehat{{\varvec{p}}}^{.K}=\left[ \begin{array}{c} \widehat{p}^{.1}\\ \widehat{p}^{.2}\\ \dots \\ \widehat{p}^{.K}\\ \end{array} \right] \end{aligned}$$

then the relationships

$$\begin{aligned} \widehat{{\varvec{p}}}^{H.}= & {} \varvec{A}\widehat{{\varvec{p}}}^{HK} \\ \widehat{{\varvec{p}}}^{. K}= & {} \varvec{B}\widehat{{\varvec{p}}}^{HK} \end{aligned}$$

hold. Next, define the matrices (\(HK\times H\), \(HK\times H\) and \(HK \times K\), respectively)

$$\begin{aligned} \varvec{\Pi }= \left[ \begin{array}{c} \varvec{\Pi }_1\\ \varvec{\Pi }_2\\ \dots \\ \varvec{\Pi }_H\\ \end{array} \right] \quad \widehat{\varvec{\Pi }}=\left[ \begin{array}{c} \widehat{\varvec{\Pi }}_1\\ \widehat{\varvec{\Pi }}_2\\ \dots \\ \widehat{\varvec{\Pi }}_H\\ \end{array} \right] \quad \varvec{\Psi }=\left[ \begin{array}{c} \varvec{\Psi }_1\\ \varvec{\Psi }_2\\ \dots \\ \varvec{\Psi }_H\\ \end{array} \right] \end{aligned}$$

where

1. \(\varvec{\Pi }_h\) is a matrix of size \(K \times H\) with all entries equal to zero except those in the hth column, which are equal to \(p^{.1},p^{.2},\dots ,p^{.K}\), for \(h=1,\dots ,H\).

2. \(\widehat{\varvec{\Pi }}_h\) is a matrix of size \(K \times H\) with all entries equal to zero except those in the hth column, which are equal to \(\widehat{p}^{.1},\widehat{p}^{.2},\dots ,\widehat{p}^{.K}\), for \(h=1,\dots ,H\).

3. \(\varvec{\Psi }_h\) is a diagonal matrix of order \(K \times K\) with all main-diagonal entries equal to \(p^{h.}\), for \(h=1,\dots ,H\).

With these symbols we may write

$$\begin{aligned} \sqrt{n} \left[ \begin{array}{c} \widehat{{\varvec{p}}}^{HK} - {\varvec{p}}^{HK} \\ \widehat{{\varvec{p}}}^{H.} - {\varvec{p}}^{H.} \\ \widehat{{\varvec{p}}}^{.K} - {\varvec{p}}^{.K} \\ \end{array} \right] =\left[ \begin{array}{c} \varvec{I}^{HK}\\ {\varvec{A}}\\ {\varvec{B}}\\ \end{array} \right] \sqrt{n}( \widehat{{\varvec{p}}}^{HK}-{\varvec{p}}^{HK}) \end{aligned}$$
(28)

where \(\varvec{I}^{HK}\) is the identity matrix of size \(HK \times HK\).
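To fix ideas (an illustration of ours), for \(H=K=2\) the definitions above give

$$\begin{aligned} {\varvec{A}}=\left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \end{array} \right] \quad {\varvec{B}}=\left[ \begin{array}{cccc} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{array} \right] \end{aligned}$$

so that \({\varvec{A}}\) extracts the row margins \((\widehat{p}^{1.},\widehat{p}^{2.})^{T}\) and \({\varvec{B}}\) the column margins \((\widehat{p}^{.1},\widehat{p}^{.2})^{T}\) from \(\widehat{{\varvec{p}}}^{HK}\). In general \({\varvec{A}}=\varvec{I}^{H}\otimes \varvec{1}_{K}^{T}\) and \({\varvec{B}}=\varvec{1}_{H}^{T}\otimes \varvec{I}^{K}\), where \(\varvec{1}_{K}\) denotes the K-vector of ones.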

Lemma 1

\(\widehat{p}^{hk}-p^{hk}\) converges in probability to 0 as n, N go to infinity, for each h, k.

Proof

Immediate consequence of Proposition 1. \(\square\)

Note that Proposition 1 actually implies that \(\widehat{p}^{hk}-p^{hk}=O_{p}(n^{-1/2})\), for each h, k.

Lemma 2

\(\widehat{p}^{h.}-p^{h.}\) and \(\widehat{p}^{.k}-p^{.k}\) converge in probability to 0 as n, N go to infinity, for each h, k.

Proof

Consequence of Lemma 1. \(\square\)

Proof of Proposition 2

Proposition 2 follows from the relationship (28), Proposition 1, and the continuous mapping theorem. \(\square\)

Lemma 3

Under the independence hypothesis \({\mathcal H_{0}}\), the limiting distribution of \(\sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\widetilde{{\varvec{p}}}^{HK})\) coincides with the limiting distribution of

$$\begin{aligned} (\varvec{I}^{HK}-\varvec{\Pi } {\varvec{A}}-\varvec{\Psi } {\varvec{B}}) \left\{ \sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\overline{{\varvec{p}}}^{HK}) \right\} \end{aligned}$$

that turns out to be (degenerate) multivariate Normal with null mean vector and covariance matrix,

$$\begin{aligned} \varvec{\Gamma }^{HK}=(\varvec{I}^{HK}-\varvec{\Pi } {\varvec{A}}-\varvec{\Psi } {\varvec{B}})\varvec{\Sigma }^{HK} (\varvec{I}^{HK}-\varvec{\Pi } {\varvec{A}}-\varvec{\Psi } {\varvec{B}})^{T} \end{aligned}$$

Proof

From the relationship

$$\begin{aligned} \widehat{p}^{hk}-\widehat{p}^{h.}\widehat{p}^{.k}=(\widehat{p}^{hk}-p^{h.}p^{.k})- p^{h.}(\widehat{p}^{.k}-p^{.k})- \widehat{p}^{.k}(\widehat{p}^{h.}-p^{h.}) \end{aligned}$$

it follows that, in matrix terms,

$$\begin{aligned} \sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\widetilde{{\varvec{p}}}^{HK})=\sqrt{n} (\varvec{I}^{HK}-\widehat{\varvec{\Pi }} {\varvec{A}}-\varvec{\Psi } {\varvec{B}})(\widehat{{\varvec{p}}}^{HK}-\overline{{\varvec{p}}}^{HK}). \end{aligned}$$

Next, from Lemma 2, the matrix \(\widehat{\varvec{\Pi }}\) tends in probability to \(\varvec{\Pi }\) as n, N go to infinity. By the Slutsky theorem (Serfling 1980), this implies, in turn, that the limiting distribution of

$$\begin{aligned} \sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\widetilde{{\varvec{p}}}^{HK})= & {} (\varvec{I}^{HK}-\varvec{\Pi } {\varvec{A}}-\varvec{\Psi } {\varvec{B}}) \left\{ \sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\overline{{\varvec{p}}}^{HK}) \right\} \\- & {} (\widehat{\varvec{\Pi }}-\varvec{\Pi }){\varvec{A}}\left\{ \sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\overline{{\varvec{p}}}^{HK}) \right\} \end{aligned}$$

coincides with the limiting distribution of

$$\begin{aligned} (\varvec{I}^{HK}-\varvec{\Pi } {\varvec{A}}-\varvec{\Psi } {\varvec{B}}) \left\{ \sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\overline{{\varvec{p}}}^{HK}) \right\} \end{aligned}$$
(29)

The linearity of (29) and the continuous mapping theorem complete the proof. \(\square\)
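Since the bookkeeping above is easy to get wrong, the exact finite-sample matrix identity used in the proof can be checked numerically. Here is a minimal sketch with numpy (our illustration, with made-up variable names; any pair of probability tables in the simplex works):

```python
import numpy as np

# Check (ours, not from the paper) of the identity behind Lemma 3:
#   p_hat - p_tilde = (I - Pi_hat A - Psi B)(p_hat - p_bar),
# with cells stacked in lexicographic order (h major, k minor).
rng = np.random.default_rng(0)
H, K = 3, 4

p = rng.dirichlet(np.ones(H * K)).reshape(H, K)      # population cell probabilities p^{hk}
p_hat = rng.dirichlet(np.ones(H * K)).reshape(H, K)  # any estimate in the simplex

r, c = p.sum(axis=1), p.sum(axis=0)                  # margins p^{h.}, p^{.k}
r_hat, c_hat = p_hat.sum(axis=1), p_hat.sum(axis=0)  # estimated margins

A = np.kron(np.eye(H), np.ones((1, K)))              # H x HK: extracts row margins
B = np.kron(np.ones((1, H)), np.eye(K))              # K x HK: extracts column margins
Pi_hat = np.kron(np.eye(H), c_hat[:, None])          # HK x H: block h has hat{p}^{.k} in column h
Psi = np.kron(r[:, None], np.eye(K))                 # HK x K: block h equals p^{h.} I_K

p_tilde = np.outer(r_hat, c_hat).ravel()             # hat{p}^{h.} hat{p}^{.k}
p_bar = np.outer(r, c).ravel()                       # p^{h.} p^{.k}

lhs = p_hat.ravel() - p_tilde
rhs = (np.eye(H * K) - Pi_hat @ A - Psi @ B) @ (p_hat.ravel() - p_bar)
print(np.abs(lhs - rhs).max())                       # zero up to floating point
```

Multiplying both sides by \(\sqrt{n}\) gives precisely the display above: the identity holds exactly before any limit is taken, which is why only the convergence of \(\widehat{\varvec{\Pi }}\) to \(\varvec{\Pi }\) needs a Slutsky argument.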

For the sake of simplicity, from now on the notation

$$\begin{aligned} \varvec{C}=\varvec{I}^{HK}-\varvec{\Pi } {\varvec{A}}-\varvec{\Psi } {\varvec{B}}\end{aligned}$$

will be used.

Lemma 4

Define

$$\begin{aligned} \chi ^{2}_{2H}=n\sum _{h=1}^{H}\sum _{k=1}^{K}(\widehat{p}^{hk}-\widehat{p}^{h.}\widehat{p}^{.k})^2 \left( \frac{1}{\widehat{p}^{h.}\widehat{p}^{.k}}-\frac{1}{p^{h.}p^{.k}}\right) \end{aligned}$$
(30)

Under the null hypothesis of independence \({{\mathcal {H}}_{0}}\), \(\chi ^{2}_{2H}\) converges in probability to 0 as n, N go to infinity.

Proof

First of all, we have

$$\begin{aligned} |\chi ^{2}_{2H}|\le \max _{h,k}\left| \frac{1}{\widehat{p}^{h.}\widehat{p}^{.k}}- \frac{1}{p^{h.}p^{.k}} \right| \left\{ n \sum _{h=1}^{H}\sum _{k=1}^{K}(\widehat{p}^{hk}-\widehat{p}^{h.}\widehat{p}^{.k})^2\right\} . \end{aligned}$$

Since convergence in probability is preserved under continuous transformations, the term

$$\begin{aligned} \max _{h,k} \left| \frac{1}{\widehat{p}^{h.}\widehat{p}^{.k}}- \frac{1}{p^{h.}p^{.k}} \right| {\mathop {\rightarrow }\limits ^{p}} 0 \;\;\; \mathrm {as} \; n,N \rightarrow \infty . \end{aligned}$$
(31)

In addition, from Lemma 3 it follows that

$$\begin{aligned} n\sum _{h=1}^H\sum _{k=1}^K (\widehat{p}^{hk}-\widehat{p}^{h.}\widehat{p}^{.k})^2 & = \left\{ \sqrt{n}(\widehat{\varvec{p}}^{HK}-\widetilde{\varvec{p}}^{HK})\right\} ^T \left\{ \sqrt{n}(\widehat{\varvec{p}}^{HK}-\widetilde{\varvec{p}}^{HK})\right\} \nonumber \\&{\mathop {\rightarrow }\limits ^{d}} \varvec{X}^{T}\varvec{X} \end{aligned}$$
(32)

where \(\varvec{X}\) is a singular multivariate (HK) Normal r.v. with null mean vector and covariance matrix \(\varvec{\Gamma }^{HK}=\varvec{C}\varvec{\Sigma }^{HK}\varvec{C}^T\). The lemma follows from (31) and (32) and the continuous mapping theorem. \(\square\)

Lemma 5

Define

$$\begin{aligned} \chi ^{2}_{1H}=n\sum _{h=1}^{H}\sum _{k=1}^{K}\frac{(\widehat{p}^{hk}-\widehat{p}^{h.}\widehat{p}^{.k})^2}{\widehat{p}^{h.}\widehat{p}^{.k}}. \end{aligned}$$
(33)

Under the null hypothesis of independence \({{\mathcal {H}}_{0}}\), \(\chi ^{2}_{1H}\) tends in distribution to \(\varvec{X}^{T}{\varvec{p}}^{HK}({\varvec{p}}^{HK})^{T}\varvec{X}\) where \(\varvec{X}\) is a (singular) multivariate HK Normal r.v. with null mean vector and covariance matrix \(\varvec{\Gamma }^{HK}=\varvec{C} \varvec{\Sigma }^{HK} \varvec{C}^{T}\).

Proof

It is enough to observe that

$$\begin{aligned} \chi ^{2}_{1H}=\left\{ \sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\widetilde{{\varvec{p}}}^{HK})\right\} ^T \varvec{p}^{HK}({\varvec{p}}^{HK})^{T}\left\{ \sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\widetilde{{\varvec{p}}}^{HK})\right\} \end{aligned}$$

and apply Lemma 3 and the continuous mapping theorem. \(\square\)

Proof of Proposition 3

The statistic \(\chi ^2_{H}\) can be written as \(\chi ^2_{1H}+\chi ^2_{2H}\), where \(\chi ^2_{1H}\) and \(\chi ^2_{2H}\) are defined in (33) and (30), respectively. The proof is a simple application of Lemmas 4 and 5. \(\square\)
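As a practical complement: Propositions 1–3 say that with Hájek-estimated cell probabilities the independence statistic no longer has the classical chi-square null distribution, so in applications its null distribution must be approximated, for instance by resampling from a pseudo-population. The snippet below illustrates the general recipe under strong simplifications (our sketch, not the authors' algorithm: a Gross/Holmberg-style pseudo-population obtained by rounding the design weights, simple random redraws in place of the original complex design, and independence enforced by permuting one margin; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def hajek_joint(x, y, w, H, K):
    """Hajek-type estimate of the cell probabilities p^{hk}:
    weighted cell frequencies normalised by the total weight."""
    p = np.zeros((H, K))
    np.add.at(p, (x, y), w)
    return p / w.sum()

def chi2_stat(p_hat, n):
    """Pearson-type statistic chi^2_{1H} of equation (33)."""
    e = np.outer(p_hat.sum(axis=1), p_hat.sum(axis=0))   # hat{p}^{h.} hat{p}^{.k}
    return n * ((p_hat - e) ** 2 / e).sum()

def resampled_null(x, y, w, H, K, n_boot=2000):
    """Monte Carlo approximation of the null distribution: replicate unit i
    round(w_i) times to form a pseudo-population, impose independence by
    permuting the y-margin, then redraw samples of size n without replacement."""
    n = len(x)
    reps = np.maximum(np.rint(w).astype(int), 1)
    Xp = np.repeat(x, reps)
    Yp = rng.permutation(np.repeat(y, reps))
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.choice(len(Xp), size=n, replace=False)  # SRSWOR: a simplification
        stats[b] = chi2_stat(hajek_joint(Xp[idx], Yp[idx], np.ones(n), H, K), n)
    return stats

# Toy usage: observed statistic versus its resampled null distribution.
n, H, K = 300, 2, 3
x = rng.integers(0, H, n)
y = rng.integers(0, K, n)
w = rng.uniform(1, 5, n)        # illustrative design weights
obs = chi2_stat(hajek_joint(x, y, w, H, K), n)
print("p-value:", np.mean(resampled_null(x, y, w, H, K) >= obs))
```

In the paper the redraws follow the original sampling design rather than simple random sampling, which is what makes the calibrated test valid under complex designs.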

About this article

Cite this article

Marella, D., Vicard, P. Bayesian network structural learning from complex survey data: a resampling based approach. Stat Methods Appl 31, 981–1013 (2022). https://doi.org/10.1007/s10260-021-00618-x


  • DOI: https://doi.org/10.1007/s10260-021-00618-x
