Penalized estimation of directed acyclic graphs from discrete data

Published in: Statistics and Computing

Abstract

Bayesian networks, with structure given by a directed acyclic graph (DAG), are a popular class of graphical models. However, learning Bayesian networks from discrete or categorical data is particularly challenging, due to the large parameter space and the difficulty in searching for a sparse structure. In this article, we develop a maximum penalized likelihood method to tackle this problem. Instead of the commonly used multinomial distribution, we model the conditional distribution of a node given its parents by multi-logit regression, in which an edge is parameterized by a set of coefficient vectors with dummy variables encoding the levels of a node. To obtain a sparse DAG, a group norm penalty is employed, and a blockwise coordinate descent algorithm is developed to maximize the penalized likelihood subject to the acyclicity constraint of a DAG. When interventional data are available, our method constructs a causal network, in which a directed edge represents a causal relation. We apply our method to various simulated and real data sets. The results show that our method is very competitive, compared to many existing methods, in DAG estimation from both interventional and high-dimensional observational data.
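As an illustration of the node-conditional model described above, the sketch below writes down the group-penalized multi-logit objective for a single node, treating all other variables as candidate parents. It is a minimal sketch with hypothetical names (one_hot, node_penalized_nll, levels) and is not the authors' implementation; the full method maximizes such terms jointly over all nodes subject to the acyclicity constraint.

```python
import numpy as np

def one_hot(x, r):
    """Dummy-encode an integer vector x with levels 0, ..., r-1."""
    Z = np.zeros((len(x), r))
    Z[np.arange(len(x)), x] = 1.0
    return Z

def node_penalized_nll(X, j, levels, beta, intercept, lam):
    """Group-penalized negative log-likelihood of node j under a multi-logit model.

    X         : (n, p) integer data matrix, X[:, i] in {0, ..., levels[i]-1}
    beta      : dict mapping a candidate parent i to a (levels[i], levels[j])
                coefficient matrix -- the group encoding the edge i -> j
    intercept : array of shape (levels[j],)
    lam       : group-penalty tuning parameter
    """
    n = X.shape[0]
    eta = np.tile(intercept, (n, 1))                    # linear predictors
    for i, B in beta.items():
        eta += one_hot(X[:, i], levels[i]) @ B          # contribution of parent i
    eta -= eta.max(axis=1, keepdims=True)               # numerical stability
    log_prob = eta - np.log(np.exp(eta).sum(axis=1, keepdims=True))
    nll = -log_prob[np.arange(n), X[:, j]].sum()
    penalty = lam * sum(np.linalg.norm(B) for B in beta.values())  # ||beta_{j.i}||_2
    return nll + penalty
```

Minimizing such objectives by group-wise (blockwise) coordinate descent over the parent groups, while rejecting updates that would create a directed cycle, mirrors the procedure summarized in the abstract.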

Acknowledgements

This work was supported by NSF Grant IIS-1546098 (to Q.Z.).

Author information

Correspondence to Qing Zhou.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 248 KB)

Appendix: Asymptotic theory

In this appendix, we establish asymptotic theory for the DAG estimator \(\hat{\varvec{\beta }}_{\lambda }\) (12) assuming that p is fixed and \(n\rightarrow \infty \). By rearranging and relabeling individual components, we rewrite \(\varvec{\beta }\) as \(\varvec{\phi }=(\varvec{\phi }_{(1)}, \varvec{\phi }_{(2)})\), where \(\varvec{\phi }_{(1)} = \text {vec}( \varvec{\beta }_{1\cdot 1},\ldots , \varvec{\beta }_{1\cdot p}, \ldots , \varvec{\beta }_{p\cdot 1},\ldots , \varvec{\beta }_{p\cdot p})\) is the parameter vector of interest and \(\varvec{\phi }_{(2)} = \text {vec}(\varvec{\beta }_{1\cdot 0}, \ldots ,\varvec{\beta }_{p\cdot 0})\) denotes the vector of intercepts. Hereafter, we denote by \(\phi _j\) the jth group of \(\varvec{\phi }\), such that \(\phi _1 = \varvec{\beta }_{1\cdot 1}\), \(\phi _2 = \varvec{\beta }_{1\cdot 2}, \ldots , \phi _{p^2}=\varvec{\beta }_{p\cdot p}\), and so on. We say \(\varvec{\phi }\) is acyclic if the graph \(\mathcal {G}_{\varvec{\phi }}\) induced by \(\varvec{\phi }\) (or the corresponding \(\varvec{\beta }\)) is acyclic.
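The acyclicity of \(\mathcal {G}_{\varvec{\phi }}\) is determined purely by which edge groups are nonzero. As a small illustration (a sketch with an assumed dictionary layout for the groups, not the paper's code), the following builds the induced adjacency matrix and tests acyclicity by repeatedly peeling off nodes without parents (Kahn's algorithm):

```python
import numpy as np

def induced_graph(beta_groups, p, tol=0.0):
    """Adjacency of G_phi: edge i -> j is present iff the group beta_{j.i}
    is nonzero.  beta_groups: dict with keys (j, i) and array values."""
    A = np.zeros((p, p), dtype=bool)
    for (j, i), B in beta_groups.items():
        A[i, j] = np.linalg.norm(B) > tol
    return A

def is_acyclic(A):
    """Repeatedly remove nodes with no incoming edge among the remaining ones;
    a cycle exists iff at some point no such node can be found."""
    remaining = set(range(A.shape[0]))
    while remaining:
        idx = list(remaining)
        sources = [v for v in idx if not A[idx, v].any()]
        if not sources:
            return False
        remaining.difference_update(sources)
    return True
```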

Define \(\varvec{\phi }_{[k]}\) (\(k\in \{1,\ldots ,p\}\)) to be the parameter vector obtained from \(\varvec{\phi }\) by setting \(\varvec{\beta }_{k\cdot i} = \mathbf {0}\) for \(i=1, \ldots , p\). In other words, the DAG \(\mathcal {G}_{\varvec{\phi }_{[k]}}\) is obtained by deleting all edges pointing to the kth node in \(\mathcal {G}_{\varvec{\phi }}\); see (10). We assume the data set \(\mathcal {X}\) consists of \((p+1)\) blocks, denoted by \(\mathcal {X}^j\) of size \(n_j \times p\), \(j=1,\ldots ,p+1\). The node \(X_j\) is experimentally fixed in \(\mathcal {X}^j\) for the first p blocks, while the last block contains purely observational data. Let \(\mathcal {I}_j\) be the set of row indices of \(\mathcal {X}^j\). As demonstrated by (2), we can model interventional data in the kth block of the data matrix \(\mathcal {X}^k\) as i.i.d. observations from a joint distribution factorized according to \(\mathcal {G}_{\varvec{\phi }_{[k]}}\). Denote the corresponding probability mass function by \(p(\mathbf {x}|\varvec{\phi }_{[k]})\), where \(\mathbf {x}=(x_1,\ldots ,x_p)\) and \(x_j \in \{ 1, \ldots , r_j \}\) for \(j=1,\ldots ,p\). To simplify our notation, denote the parameter for the \((p+1)\)th block by \(\varvec{\phi }_{[p+1]} = \varvec{\phi }\). Then the log-likelihood of \(\mathcal {X}\) is

$$\begin{aligned} L(\varvec{\phi }) = \sum _{k=1}^{p+1} L_k(\varvec{\phi }_{[k]}) = \sum _{k=1}^{p+1} \log p(\mathcal {X}^k \mid \varvec{\phi }_{[k]}), \end{aligned}$$
(23)

where \(\log p(\mathcal {X}^k | \varvec{\phi }_{[k]})=\sum _{h \in \mathcal {I}_k} \log (p(\mathcal {X}_{h\cdot }| \varvec{\phi }_{[k]}))\) and \(\mathcal {X}_{h\cdot } =(\mathcal {X}_{h1},\ldots ,\mathcal {X}_{hp})\). The penalized log-likelihood function with a tuning parameter \(\lambda _n>0\) is

$$\begin{aligned} R(\varvec{\phi }) &= L(\varvec{\phi })-\lambda _n\sum _{j=1}^{p^2}||\phi _j||_2 \\ &= \sum _{k=1}^{p+1}L_k(\varvec{\phi }_{[k]})-\lambda _n\sum _{j=1}^{p^2}||\phi _j||_2, \end{aligned}$$
(24)

where the component group \(\phi _j\,(j=1,\ldots ,p^2)\) represents the influence of one variable on another. Let \(\Omega = \{\varvec{\phi }: \mathcal {G}_{\varvec{\phi }} \text { is a DAG}\}\) be the parameter space. A penalized estimator \(\hat{\varvec{\phi }}\) is obtained by maximizing \(R(\varvec{\phi })\) in \(\Omega \).
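A minimal sketch of (23)–(24), assuming a per-node conditional log-likelihood routine node_loglik (e.g., the multi-logit model) and a dictionary layout for \(\varvec{\phi }\); the names, layout, and zero-based block indexing are hypothetical, not the authors' implementation:

```python
import numpy as np

def penalized_loglik(blocks, phi, lam, node_loglik):
    """Penalized log-likelihood R(phi) over p interventional blocks plus one
    observational block.

    blocks      : list of p+1 data matrices; node k is experimentally fixed
                  in blocks[k] for k = 0, ..., p-1, and blocks[p] is observational
    phi         : dict with keys (j, i) for the edge group beta_{j.i} and
                  keys (j, None) for the intercept beta_{j.0}
    node_loglik : node_loglik(X, j, phi) -> conditional log-likelihood of node j
    """
    p = blocks[0].shape[1]
    total = 0.0
    for k, X in enumerate(blocks):
        # phi_[k]: zero all edge groups pointing into the intervened node k
        phi_k = {key: (np.zeros_like(val)
                       if k < p and key[0] == k and key[1] is not None
                       else val)
                 for key, val in phi.items()}
        total += sum(node_loglik(X, j, phi_k) for j in range(p))
    penalty = lam * sum(np.linalg.norm(val)
                        for (j, i), val in phi.items() if i is not None)
    return total - penalty
```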

Though interventional data help distinguish equivalent DAGs, the following notion of natural parameters is needed to completely establish identifiability of DAGs for the case where each variable has interventional data. We say that i is an ancestor of j in a DAG \(\mathcal {G}\) if there exists at least one directed path from i to j. Denote the set of ancestors of j by \(\text {an}(j)\).

Definition 1

(Natural parameters) We say that \(\varvec{\phi } \in \Omega \) is natural if \(i \in \text {an}(j) \text { in } \mathcal {G}_{\varvec{\phi }}\) implies that j is not independent of i under the joint distribution given by \(\varvec{\phi }_{[i]}\) for all \(i,j=1,\ldots ,p\).

For a causal DAG, a natural parameter implies that the effects along multiple causal paths connecting the same pair of nodes do not cancel. This is a reasonable assumption for many real-world problems and is much weaker than the faithfulness assumption, under which all conditional independence relations can be read off from d-separations in the DAG. If nodes i and j are independent under \(\varvec{\phi }_{[i]}\), then by faithfulness the nodes i and j must be d-separated by the empty set, and thus \(i \notin \text {an}(j)\) in \(\mathcal {G}_{\varvec{\phi }_{[i]}}\). This implies that \(i \notin \text {an}(j)\) in \(\mathcal {G}_{\varvec{\phi }}\) as well, by the construction of \(\mathcal {G}_{\varvec{\phi }_{[i]}}\). Hence the faithfulness assumption implies the natural parameter assumption.
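Since the natural-parameter condition is stated per ordered pair \((i,j)\) with \(i \in \text {an}(j)\), it can be convenient to enumerate the ancestor sets explicitly. A small sketch (assuming a boolean adjacency matrix A with A[i][j] true for an edge \(i \rightarrow j\); a hypothetical helper, not from the paper):

```python
def ancestors(A, j):
    """Return an(j): all nodes with a directed path to j, by reverse DFS."""
    p = len(A)
    seen, stack = set(), [j]
    while stack:
        v = stack.pop()
        for i in range(p):
            if A[i][v] and i not in seen:
                seen.add(i)
                stack.append(i)
    return seen
```

For a natural \(\varvec{\phi }\), every node \(i\) returned by this search for node \(j\) must remain dependent on \(X_j\) under the intervention distribution given by \(\varvec{\phi }_{[i]}\).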

To establish asymptotic properties of our penalized likelihood estimator, we make the following assumptions:

  1. (A1)

    The true parameter \(\varvec{\phi }^*\) is natural and an interior point of \(\Omega \).

  2. (A2)

The parameter \(\varvec{\theta }_j\) of the conditional distribution \([X_j | \Uppi _j^{\mathcal {G}}; \varvec{\theta }_j]\) is identifiable for each \(j=1,\ldots ,p\). The log-likelihood function \(\ell _j(\varvec{\theta }_j) = \log p({x}_j|\Uppi _j^{\mathcal {G}}; \varvec{\theta }_j)\) is strictly concave and three times continuously differentiable at any interior point.

Recall that the kth block of our data, \(\mathcal {X}^k\), can be regarded as an i.i.d. sample of size \(n_k\) from the distribution \(p(\mathbf {x}|\varvec{\phi }_{[k]}^*)\) for all k, while we define \(\varvec{\phi }_{[p+1]}^*=\varvec{\phi }^*\) for the last block of observational data.

Theorem 1

Assume (A1) and (A2). If \(p(\mathbf {x}|\varvec{\phi }_{[k]})=p(\mathbf {x}|\varvec{\phi }_{[k]}^*)\) for all possible \(\mathbf {x}\) and all \(k=1,\ldots ,p\), then \(\varvec{\phi }=\varvec{\phi }^*\). Furthermore, if \(n_k\gg \sqrt{n}\) for all \(k=1,\ldots ,p\), then for any \(\varvec{\phi } \ne \varvec{\phi }^*\),

$$\begin{aligned} P(L(\varvec{\phi }^*)>L(\varvec{\phi })) \rightarrow 1 \quad \text { as } n \rightarrow \infty . \end{aligned}$$
(25)

Theorem 2

Assume (A1) and (A2). If \(\lambda _n/\sqrt{n}\rightarrow 0\) and \(n_k\gg \sqrt{n}\) for all \(k=1,\ldots ,p\), then there exists a global maximizer \(\hat{\varvec{\phi }}\) of \(R(\varvec{\phi })\) such that \(||\hat{\varvec{\phi }}-\varvec{\phi }^*||_2=O_p(n^{-1/2})\).

Proofs of the two theorems are provided in the Supplementary Material. Theorem 1 confirms that the causal DAG model is identifiable from interventional data under a natural parameter. Theorem 2 implies that there is a \(\sqrt{n}\)-consistent global maximizer of \(R(\varvec{\phi })\) with the group norm penalty. Note that Assumption (A2) does not specify a particular model for the conditional distribution \([X_j | \Uppi _j^{\mathcal {G}}]\), so these theoretical results apply to a large class of DAG models for discrete data. In particular, the multi-logit regression model (4) satisfies (A2).

Remark 2

The assumption on the sample size of interventional data, \(n_k\gg \sqrt{n}\), imposes a lower bound on how fast the fraction \(\alpha _k=n_k/n\) may approach zero for \(k=1,\ldots ,p\), namely \(\alpha _k \gg n^{-1/2}\). Although this allows the observational data to dominate as \(\alpha _k\rightarrow 0\), the fractions of interventional data must exceed the typical order \(O_p(n^{-1/2})\) of statistical errors so that (25) holds and the true causal DAG parameter \(\varvec{\phi }^*\) is identifiable. This guarantees that the global maximizer \(\hat{\varvec{\phi }}\) lies in a neighborhood of \(\varvec{\phi }^*\) with high probability. Once in this neighborhood, the convergence rate of \(\hat{\varvec{\phi }}\) depends on the total sample size n, counting both interventional and observational data. Therefore, increasing the amount of observational data leads to a more accurate estimate \(\hat{\varvec{\phi }}\) as long as \(\alpha _k\gg n^{-1/2}\) for \(k=1,\ldots ,p\).
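As a concrete (hypothetical) allocation satisfying both requirements, one may take \(\alpha _k = n^{-1/3}\) for every node, so that

$$\begin{aligned} n_k = n\,\alpha _k = n^{2/3}, \qquad \frac{n_k}{\sqrt{n}} = n^{1/6} \rightarrow \infty , \qquad \alpha _k = n^{-1/3} \rightarrow 0. \end{aligned}$$

For instance, with \(n=10^6\) each interventional block has \(n_k = 10^4\) observations, well above \(\sqrt{n}=10^3\), while the observational block still accounts for most of the data when p is small.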

Remark 3

It would be interesting to generalize the above asymptotic results to the case where \(p=p_n\) grows with the sample size n, say by developing nonasymptotic bounds on the \(\ell _2\) estimation error \(\Vert \hat{\varvec{\phi }}-\varvec{\phi }^*\Vert _2\). However, in order to estimate the causal network consistently, sufficient interventional data are needed for each node, i.e., every \(n_k\) must approach infinity, and thus \(p/ n \rightarrow 0\) as \(n\rightarrow \infty \). This limits us to the low-dimensional setting with \(p<n\). Suppose instead that we have a large network with \(p\gg n\). One may first apply a regularization method to observational data to screen out independent nodes and to partition the network into small subgraphs that are disconnected from one another. Then, for each small subgraph, we can afford to generate enough interventional data for every node and apply the method in this paper to infer the causal structure. Our asymptotic theory provides useful guidance for the analysis in this second step.

For purely observational data, the theory becomes more complicated due to the existence of equivalent DAGs and parameterizations. It is left as future work to establish the consistency of a global maximizer for high-dimensional observational data.

Cite this article

Gu, J., Fu, F. & Zhou, Q. Penalized estimation of directed acyclic graphs from discrete data. Stat Comput 29, 161–176 (2019). https://doi.org/10.1007/s11222-018-9801-y
