Abstract
Good-quality official statistics data are increasingly available, making the construction of multivariate statistical models, possibly leading to the identification of causal relationships, of considerable interest. In this context Bayesian networks play an important role. A crucial step consists in learning the structure of a Bayesian network. One of the most widely used procedures is the PC algorithm, which carries out several independence tests on the available data set and builds a Bayesian network according to the test results. The PC algorithm rests on the essential assumption that data are independent and identically distributed. Unfortunately, official statistics data are generally collected through complex sampling designs, so this assumption is not met; in such a context the PC algorithm fails to learn the structure correctly. To avoid this, the sample selection must be taken into account in the structural learning process. In this paper a modified version of the PC algorithm, based on resampling techniques for finite populations, is proposed for inferring causal structure from complex survey data. A simulation experiment is carried out, showing the good performance of the proposed algorithm and its robustness with respect to departures from the assumptions.
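To make the general idea concrete, the sketch below shows a resampling-calibrated independence test of the kind a PC-style procedure would invoke at each independence query on weighted survey data. This is an illustration only: the function names are hypothetical, and the Gross-style pseudo-population (replicating each unit roughly w_i times) together with the permutation step used to mimic the independence hypothesis are simplifying assumptions, not the authors' exact algorithm.

```python
import numpy as np

def hajek_table(x, y, w, H, K):
    """Hajek-type (weighted, normalised) estimate of the cell proportions p^{hk}."""
    tab = np.zeros((H, K))
    np.add.at(tab, (x, y), w)          # accumulate weights into cells
    return tab / tab.sum()

def chi2_stat(phat, n):
    """Chi-squared-type distance between phat and its independence table."""
    exp = np.outer(phat.sum(axis=1), phat.sum(axis=0))   # p^{h.} p^{.k}
    return n * np.sum((phat - exp) ** 2 / exp)

def pc_style_independence_test(x, y, w, H, K, B=500, seed=0):
    """Resampling-calibrated test of independence (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    obs = chi2_stat(hajek_table(x, y, w, H, K), n)
    # Pseudo-population: replicate unit i about w_i times (Gross-style).
    reps = np.maximum(np.rint(w).astype(int), 1)
    px, py = np.repeat(x, reps), np.repeat(y, reps)
    null_stats = np.empty(B)
    for b in range(B):
        idx = rng.choice(len(px), size=n, replace=False)  # SRSWOR resample
        # Permute y within the resample to mimic the independence hypothesis.
        phat_b = hajek_table(px[idx], rng.permutation(py[idx]),
                             np.ones(n), H, K)
        null_stats[b] = chi2_stat(phat_b, n)
    return (1 + np.sum(null_stats >= obs)) / (B + 1)
```

The p-value is Monte Carlo calibrated, so it is bounded below by 1/(B+1); a PC-style learner would compare it with the chosen significance level at each edge-removal step.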
References
Antal E, Tillé Y (2011) A direct bootstrap method for complex sampling designs from a finite population. J Amer Statist Assoc 106:534–543
Ballin M, Scanu M, Vicard P (2010) Estimation of contingency tables in complex survey sampling using probabilistic expert systems. J Stat Plan Inference 140:1501–1512
Beaumont J-F, Patak Z (2012) On the generalized bootstrap for sample surveys with special attention to poisson sampling. Int Stat Rev 80:127–148
Berger YG (2011) Asymptotic consistency under large entropy sampling designs with unequal probabilities. Pak J Stat 27:407–426
Bickel PJ, Freedman DA (1981) Some asymptotic theory for the bootstrap. Ann Statist 9:1196–1217
Boistard H, Lopuhaä HP, Ruiz-Gazen A (2017) Functional central limit theorems for single-stage sampling designs. Ann Stat 45:1728–1758
Booth JG, Butler RW, Hall P (1994) Bootstrap methods for finite populations. J Amer Statist Assoc 89:1282–1289
Chao MT, Lo S-H (1985) A bootstrap method for finite population. Sankhya Ser A 47:399–405
Chauvet G (2007) Méthodes de bootstrap en population finie. Ph.D. Dissertation, Laboratoire de statistique d’enquêtes, CREST-ENSAI, Université de Rennes 2
Chatterjee A (2011) Asymptotic properties of sample quantiles from a finite population. Ann Inst Statist Math 63:157–179
Conti PL (2014) On the estimation of the distribution function of a finite population under high entropy sampling designs, with applications. Sankhya B 76:234–259
Conti PL, Marella D (2015) Inference for quantiles of a finite population: asymptotic versus resampling results. Scand J Stat 42:545–561
Conti PL, Marella D, Mecatti F, Andreis F (2019) A unified principled framework for resampling based on pseudo-populations: asymptotic theory. Bernoulli 26:1044–1069
Conti PL, Di Iorio A (2018) Analytic inference in finite populations via resampling, with applications to confidence intervals and testing for independence. arXiv:1809.08035 (under review)
Conti PL, Di Iorio A, Guandalini A, Marella D, Vicard P, Vitale V (2020) On the estimation of the Lorenz curve under complex sampling designs. Stat Meth Appl 29:1–24
Cowell RG, Dawid AP, Lauritzen SL, Spiegelhalter DJ (2007) Probabilistic networks and expert systems: exact computational methods for Bayesian networks. Springer
Di Zio M, Scanu M, Coppola L, Luzi O, Ponti A (2004) Bayesian networks for imputation. J Royal Stat Soc A 167:309–322
Drton M, Maathuis MH (2017) Structure learning in graphical modeling. Annu Rev Stat Appl 4:365–393
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
Friedman N, Goldszmidt M, Wyner A (1999) Data analysis with Bayesian networks: a bootstrap approach. In: Proceedings of the 15th annual conference on uncertainty in artificial intelligence, pp 196–201
Grafström A (2010) Entropy of unequal probability sampling designs. Stat Methodol 7:84–97
Hájek J (1964) Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann Math Stat 35:1491–1523
Holmberg A (1998) A bootstrap approach to probability proportional-to-size sampling. In: Proceedings of the ASA section on survey research methods, pp 378–383
Jiménez-Gamero MD, Moreno-Rebollo JL, Mayor-Gallego JA (2018) On the estimation of the characteristic function in finite populations with applications. Test 27:95–121
Gross ST (1980) Median estimation in sample surveys. In: Proceedings of the section on survey research methods, American Statistical Association, pp 181–184
Kalisch M, Mächler M, Colombo D, Maathuis MH, Bühlmann P (2012) Causal inference using graphical models with the R package pcalg. J Stat Softw 47:1–26
Lagani V, Athineou G, Farcomeni A, Tsagris M, Tsamardinos I (2017) Feature selection with the R package MXM: discovering statistically equivalent feature subsets. J Stat Softw 80(7)
Lahiri SN (2003) Resampling methods for dependent data. Springer series in statistics. Springer, New York
Mashreghi Z, Haziza D, Leger C (2016) A survey of bootstrap methods in finite population sampling. Stat Surv 10:1–52
Marella D, Vicard P (2013) Object-oriented bayesian networks for modeling the respondent measurement error. Commun Stat 42:3463–3477
Marella D, Vicard P (2015) Object-oriented bayesian network to deal with measurement error in household surveys. Advances in Statistical Models for Data Analysis, Springer
Marella D, Pfeffermann D (2019) Matching information from two independent informative samples. J Stat Plan Inference 203:70–81
McCarthy PJ, Snowden CB (1985) The bootstrap and finite population sampling. Vital and health statistics 95(2):1–23. Public Health Service Publication, U.S. Government Printing Office, Washington, DC
Pfeffermann D (1993) The role of sampling weights when modeling survey data. Int Stat Rev 61:317–337
Pfeffermann D (2001) Modelling of complex survey data: why model? Why is it a problem? How can we approach it? Surv Methodol 37:115–136
Ramsey J, Spirtes P, Zhang J (2006) Adjacency-faithfulness and conservative causal inference. In: Proceedings of the 22nd conference on uncertainty in artificial intelligence, AUAI Press, Oregon, pp 401–408
Ranalli MG, Mecatti F (2012) Comparing recent approaches for bootstrapping sample survey data: a first step towards a unified approach. In: Proceedings of the ASA section on survey research methods, pp 4088–4099
Rao CR (1973) Linear statistical inference and its applications, 2nd edn. Wiley, New York
Rao JNK, Scott AJ (1981) The analysis of categorical data from complex sample surveys: chi-squared tests for goodness-of-fit and independence in two-way tables. J Am Stat Assoc 76:221–230
Rao JNK, Scott AJ (1984) On chi-squared tests for multi-way tables with cell proportions estimated from survey data. Ann Stat 12:46–60
Rao JNK, Wu C-FJ (1988) Resampling inference with complex survey data. J Amer Statist Assoc 83:231–241
Serfling RJ (1980) Approximation theory of mathematical statistics. Wiley, New York
Sitter RR (1992) A resampling procedure for complex survey data. J Amer Statist Assoc 87:755–765
Skinner CJ, Holt D, Smith MF (1989) Analysis of complex surveys. Wiley
Spirtes P, Glymour C, Scheines R (2000) Causation, prediction, and search, 2nd edn. MIT Press, Cambridge, MA. With additional material by D. Heckerman, C. Meek, G. F. Cooper and T. Richardson
Thibaudeau Y, Winkler WE (2002) Bayesian networks representations, generalized imputation, and synthetic micro-data satisfying analytic constraints. Research Report RRS2002/92002, U.S. Bureau of the Census
Tsagris M (2019) Bayesian network learning with PC algorithm: an improved and correct variation. Appl Artif Intell 33(2):101–123
Tsamardinos I, Brown LE, Aliferis CF (2006) The max-min hill-climbing Bayesian network structure learning algorithm. Mach Learn 65(1):31–78
Uhler C, Raskutti G, Bühlmann P, Yu B (2013) Geometry of the faithfulness assumption in causal inference. Ann Stat 41:436–463
Verma T, Pearl J (1990) On equivalence of causal models. Technical Report R-150, Department of Computer Science, University of California at Los Angeles
Wilcox RR (2010) Fundamentals of modern statistical methods: substantially improving power and accuracy. Springer
Zhang J, Spirtes P (2008) Detection of unfaithfulness and robust causal inference. Minds Mach 18:239–271
Acknowledgements
We want to thank the anonymous referees whose comments considerably improved an earlier version of the paper.
Appendix
Proof of Proposition 1
Here we provide the main lines of the argument showing how Proposition 1 follows from Proposition 1 in Conti et al. (2018).
Define the cumulative distribution functions (c.d.f.s),
the empirical c.d.f.s
where \(\widehat{p}^{uv}\) are estimated using the classical Hájek estimators as in (6), and the corresponding random vectors (with elements in lexicographic order)
and
Note that the random vector \(\varvec{T}^{HK}\) lies on a hyperplane of dimension \(HK-1\), due to the relationship \(\widehat{F}^{HK} = F^{HK}=1\) (so that the last component of \(\varvec{T}^{HK}\) is 0).
From Conti et al. (2018) it follows that \(\varvec{T}^{HK}\) tends in distribution, as \(n, N \rightarrow \infty\), to a degenerate multivariate Normal r.v. with mean vector \(\varvec{0}^{HK}\) (with HK components) and covariance matrix \(\varvec{\Omega }^{HK}\). Since the limiting distribution is degenerate (it lies in a subspace of dimension \(HK-1\)), the matrix \(\varvec{\Omega }^{HK}\) is singular. However, this affects neither its definition nor its basic properties (cf. Rao 1973, pp. 184–185). In addition, again from Conti et al. (2018), the relationship
holds, where \(\varvec{\Omega }^{HK}_1\) is the part of the total variability due to the sampling design, \(\varvec{\Omega }^{HK}_2\) is the part due to the superpopulation model, and f is the limiting sampling fraction.
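As a purely illustrative complement (our own toy construction, with hypothetical population parameters and a Poisson design), the consistency of the Hájek cell-proportion estimators underlying Proposition 1 can be checked by simulation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical superpopulation: H = K = 2 cells with probabilities p^{hk}.
p = np.array([[0.2, 0.1], [0.3, 0.4]])
N = 100_000                              # finite population size
cells = rng.choice(4, size=N, p=p.ravel())

# Poisson sampling: unit-level inclusion probabilities pi_i, E(n) ~ 5000.
size_var = rng.uniform(0.5, 1.5, size=N)
pi = 5_000 * size_var / size_var.sum()
sampled = rng.uniform(size=N) < pi

# Hajek estimator: Horvitz-Thompson weights 1/pi_i, normalised to sum to 1.
w = 1.0 / pi[sampled]
est = np.bincount(cells[sampled], weights=w, minlength=4)
est /= est.sum()

true = np.bincount(cells, minlength=4) / N
err = np.abs(est - true).max()           # shrinks as n, N grow
```

The maximum cell error is of order n^{-1/2}, in line with the \(\sqrt{n}\) scaling of \(\varvec{T}^{HK}\).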
Define now
From
where \(h=1,\dots , H\), \(k=1,\dots , K\), it is immediate to verify that the map
is linear, and hence continuous. From the continuous mapping theorem, \(\varvec{W}^{HK}\) tends in distribution to a degenerate multivariate Normal distribution with mean \(\varvec{0}^{HK}\) and (singular) covariance matrix \(\varvec{\Sigma }^{HK}\). In view of (25), the matrix \(\varvec{\Sigma }^{HK}\) can be decomposed as
Next, define
The map \(\varvec{W}^{HK} \mapsto \varvec{W}^{H}\) is linear, and hence continuous. From the continuous mapping theorem, it follows that \(\varvec{W}^{H}\) tends in distribution to a (degenerate) multivariate Normal distribution, with mean vector \(\varvec{0}^{H}\) and covariance matrix \(\varvec{\Sigma }^H\). From (27), it also follows that the following decomposition holds:
Finally, using exactly the same arguments as above, it is not difficult to see that the degenerate r.v.
tends in distribution to a (degenerate) multivariate Normal distribution, with mean vector \(\varvec{0}^{K}\) and covariance matrix \(\varvec{\Sigma }^K\). Again, the decomposition
holds.\(\square\)
In order to prove Propositions 2, 3, define the vectors \(\widetilde{{\varvec{p}}}^{HK}\) and \(\overline{{\varvec{p}}}^{HK}\) of length HK
and the matrices (\(H \times HK\) and \(K \times HK\), respectively)
where
-
i)
\({\varvec{A}}_h\) is a matrix of size \(H \times K\) with all entries equal to 0 but the entries of the hth row which are equal to 1, for \(h=1,\dots ,H\).
-
ii)
\({\varvec{B}}_h\) is an identity matrix of order K, for \(h=1,\dots ,H\).
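A small numpy sketch (with arbitrary dimensions H = 3, K = 2) of the block matrices described in i)–ii): stacking the \({\varvec{A}}_h\) and \({\varvec{B}}_h\) blocks side by side yields operators that extract the row margins p^{h.} and column margins p^{.k} from p^{HK} in lexicographic order. This marginal-extraction reading is our assumption about how the matrices are used.

```python
import numpy as np

H, K = 3, 2

# A = [A_1 | ... | A_H]: each A_h is H x K, zero except its h-th row of ones.
A = np.kron(np.eye(H), np.ones((1, K)))      # H x HK
# B = [B_1 | ... | B_H]: each B_h is the K x K identity.
B = np.kron(np.ones((1, H)), np.eye(K))      # K x HK

# p^{HK} in lexicographic order (h slowest, k fastest).
p = np.array([[0.10, 0.20],
              [0.15, 0.25],
              [0.05, 0.25]])
pvec = p.ravel()

row_margins = A @ pvec    # p^{h.} = (0.3, 0.4, 0.3)
col_margins = B @ pvec    # p^{.k} = (0.3, 0.7)
```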
If we set
then the relationships
hold. Next, define the matrices (\(HK\times H\), \(HK\times H\) and \(HK \times K\), respectively)
where
-
1.
\(\varvec{\Pi }_h\) is a matrix of size \(K \times H\) having all entries equal to zero but the entries in the hth column that are equal to \(p^{.1},p^{.2},\dots ,p^{.K}\), for \(h=1,...,H\).
-
2.
\(\widehat{\varvec{\Pi }}_h\) is a matrix of order \(K \times H\) having all entries equal to zero but the entries in the hth column that are equal to \(\widehat{p}^{.1},\widehat{p}^{.2},\dots ,\widehat{p}^{.K}\), for \(h=1,...,H\).
-
3.
\(\varvec{\Psi }_h\) is a diagonal matrix of order \(K \times K\), with all entries in the main diagonal equal to \(p^{h.}\), for \(h=1,...,H\).
With these symbols, we may write
where \(\varvec{I}^{HK}\) is the identity matrix of size \(HK \times HK\).
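A numerical check (our own construction, with arbitrary H = 3, K = 2) of the matrices defined in 1.–3.: applied to the appropriate margin vectors, both the stacked \(\varvec{\Pi }\) and \(\varvec{\Psi }\) blocks reproduce the independence table p^{h.} p^{.k}, consistently with their role in the linearisation used in Lemma 3.

```python
import numpy as np

H, K = 3, 2
p = np.array([[0.10, 0.20],
              [0.15, 0.25],
              [0.05, 0.25]])
ph = p.sum(axis=1)    # row margins p^{h.}
pk = p.sum(axis=0)    # column margins p^{.k}

# Pi: HK x H, block h is K x H, zero except column h = (p^{.1},...,p^{.K}).
Pi = np.vstack([np.outer(pk, np.eye(H)[h]) for h in range(H)])
# Psi: HK x K, block h is p^{h.} times the K x K identity.
Psi = np.vstack([ph[h] * np.eye(K) for h in range(H)])

# Both map the margins onto the independence table ptilde^{hk} = p^{h.} p^{.k}.
ptilde = np.outer(ph, pk).ravel()
ok_pi = np.allclose(Pi @ ph, ptilde)
ok_psi = np.allclose(Psi @ pk, ptilde)
```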
Lemma 1
\(\widehat{p}^{hk}-p^{hk}\) converges in probability to 0 as n, N go to infinity, for each h, k.
Proof
Immediate consequence of Proposition 1. \(\square\)
Note that Proposition 1 actually implies that \(\widehat{p}^{hk}-p^{hk}=O_{p}(n^{-1/2})\), for each h, k.
Lemma 2
\(\widehat{p}^{h.}-p^{h.}\) and \(\widehat{p}^{.k}-p^{.k}\) converge in probability to 0 as n, N go to infinity, for each h, k.
Proof
Consequence of Lemma 1. \(\square\)
Proof of Proposition 2
It is enough to use the relationship (28). Proposition 2 follows from (28), Proposition 1, and the continuous mapping theorem. \(\square\)
Lemma 3
Under the independence hypothesis \({\mathcal H_{0}}\), the limiting distribution of \(\sqrt{n}(\widehat{{\varvec{p}}}^{HK}-\widetilde{{\varvec{p}}}^{HK})\) coincides with the limiting distribution of
that turns out to be (degenerate) multivariate Normal with null mean vector and covariance matrix,
Proof
From the relationship
it follows that, in matrix terms,
Next, from Lemma 2, the matrix \(\widehat{\Pi }\) tends in probability to \(\Pi\) as n, N go to infinity. Using the Slutsky theorem (Serfling 1980), this implies, in turn, that the limiting distribution of
coincides with the limiting distribution of
The linearity of (29) and the continuous mapping theorem complete the proof. \(\square\)
For the sake of simplicity, from now on the notation
will be used.
Lemma 4
Define
Under the null hypothesis of independence \({{\mathcal {H}}_{0}}\), \(\chi ^{2}_{2H}\) converges in probability to 0 as n, N go to infinity.
Proof
First of all, we have
Since convergence in probability is preserved under continuous transformations, the term
In addition, from Lemma 3 it follows that
where \(\varvec{X}\) is a singular multivariate (HK) Normal r.v. with null mean vector and covariance matrix \(\varvec{\Gamma }^{HK}=\varvec{C}\varvec{\Sigma }^{HK}\varvec{C}^T\). The lemma follows from (31) and (32) and the continuous mapping theorem. \(\square\)
Lemma 5
Define
Under the null hypothesis of independence \({{\mathcal {H}}_{0}}\), \(\chi ^{2}_{1H}\) tends in distribution to \(\varvec{X}^{T}{\varvec{p}}^{HK}({\varvec{p}}^{HK})^{T}\varvec{X}\) where \(\varvec{X}\) is a (singular) multivariate HK Normal r.v. with null mean vector and covariance matrix \(\varvec{\Gamma }^{HK}=\varvec{C} \varvec{\Sigma }^{HK} \varvec{C}^{T}\).
Proof
It is enough to observe that
and apply Lemma 3 and the continuous mapping theorem. \(\square\)
Proof of Proposition 3
The statistic \(\chi ^2_{H}\) can be written as \(\chi ^2_{1H}+\chi ^2_{2H}\), where \(\chi ^2_{1H}\) and \(\chi ^2_{2H}\) are defined in (33) and (30), respectively. The proof is a simple application of Lemmas 4 and 5. \(\square\)
Marella, D., Vicard, P. Bayesian network structural learning from complex survey data: a resampling based approach. Stat Methods Appl 31, 981–1013 (2022). https://doi.org/10.1007/s10260-021-00618-x