Abstract
Good-quality official statistics data are increasingly available, making the construction of multivariate statistical models, possibly leading to the identification of causal relationships, of considerable interest. In this context Bayesian networks play an important role. A crucial step consists in learning the structure of a Bayesian network. One of the most widely used procedures is the PC algorithm, which carries out several independence tests on the available data set and builds a Bayesian network according to the test results. The PC algorithm rests on the essential assumption that data are independent and identically distributed. Unfortunately, official statistics data are generally collected through complex sampling designs, so this assumption is not met; in such a context the PC algorithm fails to learn the structure correctly. To avoid this, the sample selection must be taken into account in the structural learning process. In this paper a modified version of the PC algorithm, based on resampling techniques for finite populations, is proposed for inferring causal structure from complex survey data. A simulation experiment is carried out, showing the good performance of the proposed algorithm and its robustness with respect to departures from the assumptions.
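To make the general idea concrete, the sketch below shows a resampling-calibrated independence test of the kind a PC-style procedure would invoke at each independence query on weighted survey data. This is an illustration only: the function names are hypothetical, and the Gross-style pseudo-population (replicating each unit roughly w_i times) together with the permutation step used to mimic the independence hypothesis are simplifying assumptions, not the authors' exact algorithm.

```python
import numpy as np

def hajek_table(x, y, w, H, K):
    """Hajek-type (weighted, normalised) estimate of the cell proportions p^{hk}."""
    tab = np.zeros((H, K))
    np.add.at(tab, (x, y), w)          # accumulate weights into cells
    return tab / tab.sum()

def chi2_stat(phat, n):
    """Chi-squared-type distance between phat and its independence table."""
    exp = np.outer(phat.sum(axis=1), phat.sum(axis=0))   # p^{h.} p^{.k}
    return n * np.sum((phat - exp) ** 2 / exp)

def pc_style_independence_test(x, y, w, H, K, B=500, seed=0):
    """Resampling-calibrated test of independence (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    obs = chi2_stat(hajek_table(x, y, w, H, K), n)
    # Pseudo-population: replicate unit i about w_i times (Gross-style).
    reps = np.maximum(np.rint(w).astype(int), 1)
    px, py = np.repeat(x, reps), np.repeat(y, reps)
    null_stats = np.empty(B)
    for b in range(B):
        idx = rng.choice(len(px), size=n, replace=False)  # SRSWOR resample
        # Permute y within the resample to mimic the independence hypothesis.
        phat_b = hajek_table(px[idx], rng.permutation(py[idx]),
                             np.ones(n), H, K)
        null_stats[b] = chi2_stat(phat_b, n)
    return (1 + np.sum(null_stats >= obs)) / (B + 1)
```

The p-value is Monte Carlo calibrated, so it is bounded below by 1/(B+1); a PC-style learner would compare it with the chosen significance level at each edge-removal step.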
References
Antal E, Tillé Y (2011) A direct bootstrap method for complex sampling designs from a finite population. J Amer Statist Assoc 106:534–543
Ballin M, Scanu M, Vicard P (2010) Estimation of contingency tables in complex survey sampling using probabilistic expert systems. J Stat Plan Inference 140:1501–1512
Beaumont J-F, Patak Z (2012) On the generalized bootstrap for sample surveys with special attention to poisson sampling. Int Stat Rev 80:127–148
Berger YG (2011) Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pak J Stat 27:407–426
Bickel PJ, Freedman DA (1981) Some asymptotic theory for the bootstrap. Ann Statist 9:1196–1217
Boistard H, Lopuhaä HP, Ruiz-Gazen A (2017) Functional central limit theorems for single-stage sampling designs. Ann Stat 45:1728–1758
Booth JG, Butler RW, Hall P (1994) Bootstrap methods for finite populations. J Amer Statist Assoc 89:1282–1289
Chao MT, Lo S-H (1985) A bootstrap method for finite population. Sankhya Ser A 47:399–405
Chauvet G (2007) Méthodes de bootstrap en population finie. Ph.D. Dissertation, Laboratoire de statistique d’enquêtes, CREST-ENSAI, Université de Rennes 2
Chatterjee A (2011) Asymptotic properties of sample quantiles from a finite population. Ann Inst Statist Math 63:157–179
Conti PL (2014) On the estimation of the distribution function of a finite population under high entropy sampling designs, with applications. Sankhya B 76:234–259
Conti PL, Marella D (2015) Inference for quantiles of a finite population: asymptotic versus resampling results. Scand J Stat 42:545–561
Conti PL, Marella D, Mecatti F, Andreis F (2019) A unified principled framework for resampling based on pseudo-populations: asymptotic theory. Bernoulli 26:1044–1069
Conti PL, Di Iorio A (2018) Analytic inference in finite populations via resampling, with applications to confidence intervals and testing for independence. arXiv:1809.08035 (under review)
Conti PL, Di Iorio A, Guandalini A, Marella D, Vicard P, Vitale V (2020) On the estimation of the Lorenz curve under complex sampling designs. Stat Meth Appl 29:1–24
Cowell RG, Dawid AP, Lauritzen SL, Spiegelhalter DJ (2007) Probabilistic networks and expert systems: exact computational methods for Bayesian networks. Springer
Di Zio M, Scanu M, Coppola L, Luzi O, Ponti A (2004) Bayesian networks for imputation. J Royal Stat Soc A 167:309–322
Drton M, Maathuis MH (2017) Structure learning in graphical modeling. Annu Rev Stat Appl 4:365–393
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
Friedman N, Goldszmidt M, Wyner A (1999) Data analysis with Bayesian networks: a bootstrap approach. In: Proceedings of the 15th annual conference on uncertainty in artificial intelligence, pp 196–201
Grafström A (2010) Entropy of unequal probability sampling designs. Stat Methodol 7:84–97
Hájek J (1964) Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann Math Stat 35:1491–1523
Holmberg A (1998) A bootstrap approach to probability proportional-to-size sampling. In: Proceedings of the ASA section on survey research methods, pp 378–383
Jiménez-Gamero MD, Moreno-Rebollo JL, Mayor-Gallego JA (2018) On the estimation of the characteristic function in finite populations with applications. Test 27:95–121
Gross ST (1980) Median estimation in sample surveys. In: Proceedings of the section on survey research methods, American Statistical Association, pp 181–184
Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P (2012) Causal inference using graphical models with the R package pcalg. J Stat Softw 47:1–26
Lagani V, Athineou G, Farcomeni A, Tsagris M, Tsamardinos I (2017) Feature selection with the R package MXM: discovering statistically equivalent feature subsets. J Stat Softw 80(7)
Lahiri SN (2003) Resampling methods for dependent data. Springer series in statistics. Springer, New York
Mashreghi Z, Haziza D, Leger C (2016) A survey of bootstrap methods in finite population sampling. Stat Surv 10:1–52
Marella D, Vicard P (2013) Object-oriented bayesian networks for modeling the respondent measurement error. Commun Stat 42:3463–3477
Marella D, Vicard P (2015) Object-oriented bayesian network to deal with measurement error in household surveys. Advances in Statistical Models for Data Analysis, Springer
Marella D, Pfeffermann D (2019) Matching information from two independent informative samples. J Stat Plan Inference 203:70–81
McCarthy PJ, Snowden CB (1985) The bootstrap and finite population sampling. Vital and health statistics 95(2):1–23. Public Health Service Publication, U.S. Government Printing Office, Washington, DC
Pfeffermann D (1993) The role of sampling weights when modeling survey data. Int Stat Rev 61:317–337
Pfeffermann D (2001) Modelling of complex survey data: why model? Why is it a problem? How can we approach it? Surv Methodol 37:115–136
Ramsey J, Spirtes P, Zhang J (2006) Adjacency-faithfulness and conservative causal inference. In: Proceedings of the 22nd conference on uncertainty in artificial intelligence, AUAI Press, Oregon, pp 401–408
Ranalli MG, Mecatti F (2012) Comparing recent approaches for bootstrapping sample survey data: a first step towards a unified approach. In: Proceedings of the ASA section on survey research methods, pp 4088–4099
Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, New York
Rao JNK, Scott AJ (1981) The analysis of categorical data from complex sample surveys: chi-squared tests for goodness-of-fit and independence in two-way tables. J Am Stat Assoc 76:221–230
Rao JNK, Scott AJ (1984) On chi-squared tests for multi-way tables with cell proportions estimated from survey data. Ann Stat 12:46–60
Rao JNK, Wu C-FJ (1988) Resampling inference with complex survey data. J Amer Statist Assoc 83:231–241
Serfling RJ (1980) Approximation theory of mathematical statistics. Wiley, New York
Sitter RR (1992) A resampling procedure for complex survey data. J Amer Statist Assoc 87:755–765
Skinner CJ, Holt D, Smith MF (1989) Analysis of complex surveys. Wiley
Spirtes P, Glymour C, Scheines R (2000) Causation, prediction, and search, 2nd edn. MIT Press, Cambridge, MA. With additional material by D. Heckerman, C. Meek, G. F. Cooper and T. Richardson
Thibaudeau Y, Winkler WE (2002) Bayesian networks representations, generalized imputation, and synthetic micro-data satisfying analytic constraints. Research Report RRS2002/92002, U.S. Bureau of the Census
Tsagris M (2019) Bayesian network learning with PC algorithm: an improved and correct variation. Appl Artif Intell 33(2):101–123
Tsamardinos I, Brown LE, Aliferis CF (2006) The max-min hill-climbing Bayesian network structure learning algorithm. Mach Learn 65(1):31–78
Uhler C, Raskutti G, Bühlmann P, Yu B (2013) Geometry of the faithfulness assumption in causal inference. Ann Stat 41:436–463
Verma T, Pearl J (1990) On equivalence of causal models. Technical Report R-150, Department of Computer Science, University of California at Los Angeles
Wilcox RR (2010) Fundamentals of modern statistical methods: substantially improving power and accuracy. Springer
Zhang J, Spirtes P (2008) Detection of unfaithfulness and robust causal inference. Minds Mach 18:239–271
Acknowledgements
We want to thank the anonymous referees whose comments considerably improved an earlier version of the paper.
Appendix
Proof of Proposition 1
Here we provide the main lines of the argument showing how Proposition 1 follows from Proposition 1 in Conti et al. (2018).
Define the cumulative distribution functions (c.d.f.s),
the empirical c.d.f.s
where \(\widehat{p}^{uv}\) are estimated using the classical Hájek estimators as in (6), and the corresponding random vectors (with elements in lexicographic order)
and
Note that the random vector \(\varvec{T}^{HK}\) lies on a hyperplane of dimension \(HK-1\), due to the relationship \(\widehat{F}^{HK} = F^{HK}=1\) (so that the last component of \(\varvec{T}^{HK}\) is 0).
From Conti et al. (2018) it follows that \(\varvec{T}^{HK}\) tends in distribution, as \(n, N \rightarrow \infty\), to a degenerate multivariate Normal r.v. with mean vector \(\varvec{0}^{HK}\) (with HK components) and covariance matrix \(\varvec{\Omega }^{HK}\). Since the limiting distribution is degenerate (it lies in a subspace of dimension \(HK-1\)), the matrix \(\varvec{\Omega }^{HK}\) is singular. However, this affects neither its definition nor its basic properties (cf. Rao 1973, pp. 184–185). In addition, again from Conti et al. (2018), the relationship
holds, where \(\varvec{\Omega }^{HK}_1\) is the part of the total variability due to the sampling design, \(\varvec{\Omega }^{HK}_2\) is the part due to the superpopulation model, and f is the limiting sampling fraction.
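As a purely illustrative complement (our own toy construction, with hypothetical population parameters and a Poisson design), the consistency of the Hájek cell-proportion estimators underlying Proposition 1 can be checked by simulation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical superpopulation: H = K = 2 cells with probabilities p^{hk}.
p = np.array([[0.2, 0.1], [0.3, 0.4]])
N = 100_000                              # finite population size
cells = rng.choice(4, size=N, p=p.ravel())

# Poisson sampling: unit-level inclusion probabilities pi_i, E(n) ~ 5000.
size_var = rng.uniform(0.5, 1.5, size=N)
pi = 5_000 * size_var / size_var.sum()
sampled = rng.uniform(size=N) < pi

# Hajek estimator: Horvitz-Thompson weights 1/pi_i, normalised to sum to 1.
w = 1.0 / pi[sampled]
est = np.bincount(cells[sampled], weights=w, minlength=4)
est /= est.sum()

true = np.bincount(cells, minlength=4) / N
err = np.abs(est - true).max()           # shrinks as n, N grow
```

The maximum cell error is of order n^{-1/2}, in line with the \(\sqrt{n}\) scaling of \(\varvec{T}^{HK}\).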
Define now
From
where \(h=1,\dots , H\), \(k=1,\dots , K\), it is immediate to verify that the map
is linear, and hence continuous. From the continuous mapping theorem, \(\varvec{W}^{HK}\) tends in distribution to a degenerate multivariate Normal distribution with mean \(\varvec{0}^{HK}\) and (singular) covariance matrix \(\varvec{\Sigma }^{HK}\). In view of (25), the matrix \(\varvec{\Sigma }^{HK}\) can be decomposed as
Next, define
The map \(\varvec{W}^{HK} \mapsto \varvec{W}^{H}\) is linear, and hence continuous. From the continuous mapping theorem, it follows that \(\varvec{W}^{H}\) tends in distribution to a (degenerate) multivariate Normal distribution, with mean vector \(\varvec{0}^{H}\) and covariance matrix \(\varvec{\Sigma }^H\). From (27), it also follows that the following decomposition holds:
Finally, using exactly the same arguments as above, it is not difficult to see that the degenerate r.v.
tends in distribution to a (degenerate) multivariate Normal distribution, with mean vector \(\varvec{0}^{K}\) and covariance matrix \(\varvec{\Sigma }^K\). Again, the decomposition
holds.\(\square\)
In order to prove Propositions 2, 3, define the vectors \(\widetilde{{\varvec{p}}}^{HK}\) and \(\overline{{\varvec{p}}}^{HK}\) of length HK
and the matrices (\(H \times HK\) and \(K \times HK\), respectively)
where
-
i)
\({\varvec{A}}_h\) is a matrix of size \(H \times K\) with all entries equal to 0 but the entries of the hth row which are equal to 1, for \(h=1,\dots ,H\).
-
ii)
\({\varvec{B}}_h\) is an identity matrix of order K, for \(h=1,\dots ,H\).
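A small numpy sketch (with arbitrary dimensions H = 3, K = 2) of the block matrices described in i)–ii): stacking the \({\varvec{A}}_h\) and \({\varvec{B}}_h\) blocks side by side yields operators that extract the row margins p^{h.} and column margins p^{.k} from p^{HK} in lexicographic order. This marginal-extraction reading is our assumption about how the matrices are used.

```python
import numpy as np

H, K = 3, 2

# A = [A_1 | ... | A_H]: each A_h is H x K, zero except its h-th row of ones.
A = np.kron(np.eye(H), np.ones((1, K)))      # H x HK
# B = [B_1 | ... | B_H]: each B_h is the K x K identity.
B = np.kron(np.ones((1, H)), np.eye(K))      # K x HK

# p^{HK} in lexicographic order (h slowest, k fastest).
p = np.array([[0.10, 0.20],
              [0.15, 0.25],
              [0.05, 0.25]])
pvec = p.ravel()

row_margins = A @ pvec    # p^{h.} = (0.3, 0.4, 0.3)
col_margins = B @ pvec    # p^{.k} = (0.3, 0.7)
```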
If we set
then the relationships
hold. Next, define the matrices (\(HK\times H\), \(HK\times H\) and \(HK \times K\), respectively)
where
-
1.
\(\varvec{\Pi }_h\) is a matrix of size \(K \times H\) having all entries equal to zero but the entries in the hth column that are equal to \(p^{.1},p^{.2},\dots ,p^{.K}\), for \(h=1,...,H\).
-
2.
\(\widehat{\varvec{\Pi }}_h\) is a matrix of order \(K \times H\) having all entries equal to zero but the entries in the hth column that are equal to \(\widehat{p}^{.1},\widehat{p}^{.2},\dots ,\widehat{p}^{.K}\), for \(h=1,...,H\).
-
3.
\(\varvec{\Psi }_h\) is a diagonal matrix of order \(K \times K\), with all entries in the main diagonal equal to \(p^{h.}\), for \(h=1,...,H\).
With these symbols, we may write
where \(\varvec{I}^{HK}\) is the identity matrix of size \(HK \times HK\).
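A numerical check (our own construction, with arbitrary H = 3, K = 2) of the matrices defined in 1.–3.: applied to the appropriate margin vectors, both the stacked \(\varvec{\Pi }\) and \(\varvec{\Psi }\) blocks reproduce the independence table p^{h.} p^{.k}, consistently with their role in the linearisation used in Lemma 3.

```python
import numpy as np

H, K = 3, 2
p = np.array([[0.10, 0.20],
              [0.15, 0.25],
              [0.05, 0.25]])
ph = p.sum(axis=1)    # row margins p^{h.}
pk = p.sum(axis=0)    # column margins p^{.k}

# Pi: HK x H, block h is K x H, zero except column h = (p^{.1},...,p^{.K}).
Pi = np.vstack([np.outer(pk, np.eye(H)[h]) for h in range(H)])
# Psi: HK x K, block h is p^{h.} times the K x K identity.
Psi = np.vstack([ph[h] * np.eye(K) for h in range(H)])

# Both map the margins onto the independence table ptilde^{hk} = p^{h.} p^{.k}.
ptilde = np.outer(ph, pk).ravel()
ok_pi = np.allclose(Pi @ ph, ptilde)
ok_psi = np.allclose(Psi @ pk, ptilde)
```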
Lemma 1
\(\widehat{p}^{hk}-p^{hk}\) converges in probability to 0 as n, N go to infinity, for each h, k.
Proof
Immediate consequence of Proposition 1. \(\square\)
Note that Proposition 1 actually implies that \(\widehat{p}^{hk}-p^{hk}=O_{p}(n^{-1/2})\), for each h, k.
Lemma 2
\(\widehat{p}^{h.}-p^{h.}\) and \(\widehat{p}^{.k}-p^{.k}\) converge in probability to 0 as n, N go to infinity, for each h, k.
Proof
Consequence of Lemma 1. \(\square\)
Proof of Proposition 2
It is enough to use the relationship (28). Proposition 2 follows from (28), Proposition 1, and the continuous mapping theorem. \(\square\)
Lemma 3
Under the independence hypothesis \({\mathcal H_{0}}\), the limiting distribution of \(\sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\widetilde{{\varvec{p}}}^{HK})\) coincides with the limiting distribution of
that turns out to be (degenerate) multivariate Normal with null mean vector and covariance matrix,
Proof
From the relationship
it follows that, in matrix terms,
Next, from Lemma 2, the matrix \(\widehat{\Pi }\) tends in probability to \(\Pi\) as n, N go to infinity. Using the Slutsky theorem (Serfling 1980), this implies, in turn, that the limiting distribution of
coincides with the limiting distribution of
The linearity of (29) and the continuous mapping theorem complete the proof. \(\square\)
For the sake of simplicity, from now on the notation
will be used.
Lemma 4
Define
Under the null hypothesis of independence \({{\mathcal {H}}_{0}}\), \(\chi ^{2}_{2H}\) converges in probability to 0 as n, N go to infinity.
Proof
First of all, we have
Since convergence in probability is preserved under continuous transformations, the term
In addition, from Lemma 3 it follows that
where \(\varvec{X}\) is a singular multivariate (HK) Normal r.v. with null mean vector and covariance matrix \(\varvec{\Gamma }^{HK}=\varvec{C}\varvec{\Sigma }^{HK}\varvec{C}^T\). The lemma follows from (31) and (32) and the continuous mapping theorem. \(\square\)
Lemma 5
Define
Under the null hypothesis of independence \({{\mathcal {H}}_{0}}\), \(\chi ^{2}_{1H}\) tends in distribution to \(\varvec{X}^{T}{\varvec{p}}^{HK}({\varvec{p}}^{HK})^{T}\varvec{X}\) where \(\varvec{X}\) is a (singular) multivariate HK Normal r.v. with null mean vector and covariance matrix \(\varvec{\Gamma }^{HK}=\varvec{C} \varvec{\Sigma }^{HK} \varvec{C}^{T}\).
Proof
It is enough to observe that
and apply Lemma 3 and the continuous mapping theorem. \(\square\)
Proof of Proposition 3
The statistic \(\chi ^2_{H}\) can be written as \(\chi ^2_{1H}+\chi ^2_{2H}\), where \(\chi ^2_{1H}\) and \(\chi ^2_{2H}\) are defined in (33) and (30), respectively. The proof is a simple application of Lemmas 4 and 5. \(\square\)
Marella, D., Vicard, P. Bayesian network structural learning from complex survey data: a resampling based approach. Stat Methods Appl 31, 981–1013 (2022). https://doi.org/10.1007/s10260-021-00618-x