A method of generating multivariate non-normal random numbers with desired multivariate skewness and kurtosis

Abstract

In social and behavioral sciences, data are typically not normally distributed, which can invalidate hypothesis testing and lead to unreliable results when being analyzed by methods developed for normal data. The existing methods of generating multivariate non-normal data typically create data according to specific univariate marginal measures such as the univariate skewness and kurtosis, but not multivariate measures such as Mardia’s skewness and kurtosis. In this study, we propose a new method of generating multivariate non-normal data with given multivariate skewness and kurtosis. Our approach allows researchers to better control their simulation designs in evaluating the influence of multivariate non-normality.

Introduction

In social and behavioral sciences, the normality of data is assumed in most statistical methods. Nonetheless, data are rarely normally distributed in practice. Therefore, the statistical inferences may not be valid, and the results may not be reliable any more when procedures developed for normal data are used to analyze non-normal data (Cain, Zhang, & Yuan, 2017; Micceri, 1989). Many studies in the literature investigated the consequences of the violation of the normality assumption and proposed some alternative procedures to analyze non-normal data. For instance, Bradley (1980) showed that robustness of statistical procedures such as the classical Z, t, and F tests suffered from the non-normality of data. Non-parametric tests and procedures have won appreciation of researchers because they do not rely on the data distribution and therefore, the violation of normality does not directly disqualify data analysis (Hollander and Wolfe, 2015).

In the literature, discussions on non-normality mainly focus on the univariate case; whereas the consequences of deviation from the multivariate normality are less explored. However, the analysis of multivariate data is routinely conducted in social and behavioral sciences research. Therefore, it is important to understand the influence of the multivariate non-normality on the multivariate analysis, which can be done through Monte Carlo simulations. To conduct such simulations, one needs to generate multivariate data with the control of the degree of non-normality. In the literature, most non-normal data generators are developed for univariate data, such as the third-order polynomial power method (the power method; Fleishman, 1978), the fifth-order polynomial method (Headrick, 2002), and the g-h distribution method (Field & Genton, 2012).

The existing methods typically generate multivariate data according to specific univariate marginal measures such as the univariate skewness and kurtosis, but not multivariate measures such as Mardia’s (1970) skewness and kurtosis. For example, the widely used simulation method proposed by Vale and Maurelli (1983) (VM) was built on Fleishman’s (1978) polynomial approach. In addition to generating data for each variable with specific first four moments, their method also controls the correlation matrix, which allows researchers to obtain a desired multivariate data structure. This method is very popular in moment-based modeling areas such as structural equation modeling (SEM). However, some researchers have questioned the generality of this method. Foldnes and Grønneberg (2015) derived the distribution underlying the VM approach and showed that even though the approach could generate multivariate data with user-specified marginal skewness and kurtosis, the generated data might not be truly multivariate non-normal. Astivia and Zumbo (2015) have shown that the Vale and Maurelli method has downward bias. In a later paper, Astivia and Zumbo (2018) also identified the solution multiplicity issue of Fleishman’s polynomial method: there are multiple possible solutions for the polynomial coefficients (a, b, c, and d), so analyses might differ even with the same inputs. To remedy these drawbacks, researchers have developed other methods. Mair, Satorra, and Bentler (2012), for example, introduced a multivariate approach based on copulas that could also generate data with a pre-specified variance-covariance matrix. Foldnes and Olsson (2016) presented a method using linear combinations of independent generator variables. Additionally, Lee and Kaplan (2018) developed a generator for multivariate ordinal data based on entropy procedures.

Despite their usefulness, none of these methods allows direct control of the multivariate non-normality measures. Multivariate skewness and kurtosis have been shown to directly affect statistical analysis. For example, Yuan, Bentler, and Zhang (2005) noted that a robust procedure might be necessary for reliable SEM inferences when a sample has a large multivariate kurtosis. More recently, Cain et al. (2017) conducted a meta-analysis study on the multivariate non-normality of the data used in 254 published studies and found that the type I error rates of testing the model fit were remarkably higher in factor analysis when the multivariate normality was violated. Because both multivariate skewness and kurtosis affect analysis procedures, generating multivariate non-normal data with desired multivariate measures is the first step toward understanding how the type and severity of non-normality influence them.

Generating multivariate non-normal random data requires an understanding of the definition of non-normality and of the relationship between univariate and multivariate data. Mardia (1970) introduced the measures of population multivariate skewness and kurtosis as the natural extension of the univariate ones. In the univariate case, a distribution with non-zero skewness is asymmetric. When the excess kurtosis (kurtosis minus 3) is not 0, the density function differs from that of a normal distribution. Similarly, Mardia’s multivariate kurtosis indicates whether the tails are heavy or light in comparison to those of the multivariate normal distribution (DeCarlo, 1997). Mardia’s skewness, on the other hand, is still a measure of asymmetry, but it cannot take negative values. Higher values indicate more severe asymmetry.

To date, there is no available method for researchers to directly specify both multivariate skewness and kurtosis for multivariate non-normal data generation. To fill the gap, we introduce a new method of generating multivariate non-normal data with specific multivariate measures. This approach allows researchers to better control their simulation design in evaluating the influence of multivariate non-normality. Moreover, this technique will allow for a better understanding of the relationship between multivariate non-normality and marginal univariate non-normality.

The rest of the paper is organized as follows. We first propose a new generating method and introduce an R package for the implementation of the method. We then present a simulation study and the results with various conditions. We conclude the study with a summary of our method.

Method

Data model

To generate the non-normal data, we specify the following data model. We use a vector x of p variables as,

$$ \mathbf{x}=r\mathbf{A}\boldsymbol{\xi}, $$
(1)

and each marginal xi as,

$$ x_{i}=r\sum\limits_{j=1}^{q}a_{ij}\xi_{j}, $$
(2)

where ξ = (ξ1,...,ξq) is a vector containing q independent random variables. Each of the variables ξj has the first four ordered moments \(E(\xi _{j})=0,E({\xi _{j}^{2}})=1, E({\xi _{j}^{3}})\), and \(E({\xi _{j}^{4}})\). A = (aij) is a p × q matrix of rank p (p ≤ q) with \(\mathbf{A}\mathbf{A}^{t}=\boldsymbol{\Sigma}=\text{cov}(\mathbf{x})\), and r is a random variable, which is independent of ξ, with the first four ordered moments \(E(r)\), \(E(r^{2})\), \(E(r^{3})\), and \(E(r^{4})\).

The ordered moments are a set of quantitative measures describing the shape of a distribution. When the ordered (raw) moments are centered at the mean and scaled by the standard deviation, they become the standardized moments. Skewness and kurtosis are the third and fourth standardized moments. The ordered and standardized moments can be converted to each other as long as the mean and variance are provided.
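The conversion between ordered and standardized moments can be illustrated with a minimal Python sketch. The function name `standardized_from_raw` is hypothetical and is not part of the mnonr package; the formulas are the standard expansions of the third and fourth central moments in terms of raw moments.

```python
def standardized_from_raw(m1, m2, m3, m4):
    """Convert the raw moments E(X), E(X^2), E(X^3), E(X^4) into
    (mean, variance, skewness, kurtosis).

    Hypothetical helper for illustration only.
    """
    var = m2 - m1 ** 2
    if var <= 0:
        raise ValueError("raw moments imply a non-positive variance")
    # Third and fourth central moments expressed through raw moments.
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    # Standardize by the appropriate power of the variance.
    return m1, var, mu3 / var ** 1.5, mu4 / var ** 2


# Standard normal: raw moments 0, 1, 0, 3 give skewness 0 and kurtosis 3.
print(standardized_from_raw(0, 1, 0, 3))
# Exponential(1): raw moments 1, 2, 6, 24 give skewness 2 and kurtosis 9.
print(standardized_from_raw(1, 2, 6, 24))
```

For a variable that is already standardized (mean 0, variance 1), such as each ξj in the data model, the third and fourth ordered moments coincide with the skewness and kurtosis.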

According to the definition of Mardia (1970), the population multivariate skewness (β1) and kurtosis (β2) of x based on our model are computed as,

$$ \beta_{1} =E\{[(\mathbf{x}-\boldsymbol{\mu})^{t} \boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})]^{3}\} =[E(r^{3})]^{2}\sum\limits_{j=1}^{q}[E({\xi_{j}^{3}})]^{2}, $$
(3)
$$ \begin{array}{@{}rcl@{}} \beta_{2} &=&E\{[(\mathbf{x}-\boldsymbol{\mu})^{t}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})]^{2}\}\\ &=&E(r^{4})[\sum\limits_{j=1}^{q}E({\xi_{j}^{4}})+p(p-1)]. \end{array} $$
(4)

(See Note 1 for the role of y in Eq. 3.)

Using these formulas, population multivariate skewness and kurtosis can be calculated when the univariate measures (the ordered moments of r and ξj; see Note 2) are given. The standardized multivariate kurtosis formula, centering β2 by p(p + 2), has been obtained in Yuan, Zhang, and Zhao (2017). However, the univariate measures cannot be uniquely recovered from specified multivariate measures based on these formulas.

Although the mapping is not one-to-one, many sets of univariate measures share the same multivariate measures. Since we only care about the measures at the multivariate level rather than the univariate (or marginal) level, we remedy the lack of uniqueness by imposing constraints that select one mapping from the multivariate to the univariate measures.

First, we set r to be the constant 1 for convenience because it is only a scale factor. Second, the number of variables in ξ is set to be the same as the number of variables in x, so that p = q. Additionally, ξ1 to ξp are set to be independent and identically distributed (i.i.d.). Thus, the third- and fourth-ordered moments are the same for all ξj and are denoted \(E(\xi^{3})\) and \(E(\xi^{4})\). With these constraints, the multivariate skewness and kurtosis above become,

$$ \begin{array}{@{}rcl@{}} \beta_{1}^{*} &=&E\{[(\mathbf{x}-\boldsymbol{\mu})^{t}\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})]^{3}\}=p[E(\xi^{3})]^{2}, \end{array} $$
(5)
$$ \begin{array}{@{}rcl@{}} \beta_{2}^{*} & = &E\{[(\mathbf{x}-\boldsymbol{\mu} )^{t}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\!\boldsymbol{\mu} )]^{2}\} = pE(\xi^{4}) + p(p - 1). \end{array} $$
(6)

When the multivariate measures (\(\beta _{1}^{*}\) and \(\beta _{2}^{*}\)) and the number of variables (p) are provided, \(E(\xi^{3})\) and \(E(\xi^{4})\) can be computed through Eqs. 5 and 6. Once we have the moments for all ξj (i.e., \(E(\xi)=0\), \(E(\xi^{2})=1\), \(E(\xi^{3})\), and \(E(\xi^{4})\)), we can generate random numbers of ξ and then transform them to multivariate random numbers of x by Eq. 1.
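Inverting Eqs. 5 and 6 is a one-line calculation for each moment, as the following Python sketch shows. The function name `univariate_moments` is hypothetical. Taking the non-negative square root for the third moment is an assumption of this sketch, since the multivariate skewness does not determine the sign.

```python
import math


def univariate_moments(beta1, beta2, p):
    """Invert Eqs. 5 and 6: return (E(xi^3), E(xi^4)) for i.i.d. xi_j.

    Hypothetical helper; the non-negative root is an arbitrary choice
    because beta1 = p * [E(xi^3)]^2 does not fix the sign.
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    if beta1 < 0:
        raise ValueError("Mardia's skewness cannot be negative")
    third = math.sqrt(beta1 / p)          # from beta1 = p * third^2
    fourth = (beta2 - p * (p - 1)) / p    # from beta2 = p * fourth + p(p - 1)
    return third, fourth


# Values used in the bivariate example of this paper: p = 2, beta1 = 3, beta2 = 61.
third, fourth = univariate_moments(3, 61, 2)
print(round(third, 4), fourth)  # 1.2247 29.5
```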

We acknowledge that relaxing some of the constraints would give researchers more control of the univariate measures (e.g., allowing different univariate measures or setting different scaling factors). However, based on a small-scale simulation, we found that they did not influence the behavior of the multivariate measures much (see Table ?? in the supplementary materials). Therefore, we use the constraints for convenience in this study.

For each ξj, we use a modified power method to generate random non-normal numbers. The widely used power method, proposed by Fleishman (1978), generates non-normal data through a polynomial transformation

$$ Y=a+bZ+cZ^{2}+dZ^{3}, $$
(7)

where Z comes from a standard normal distribution. With the information of the first four desired standardized moments (mean, variance, skewness γ1, and kurtosis γ2) of Y, Fleishman (1978) derived four equations to obtain the coefficients a,b,c, and d through Newton’s method.

In our study, Y is replaced by each ξj as

$$ \begin{array}{@{}rcl@{}} \xi_{j}=a+bZ+cZ^{2}+dZ^{3}. \end{array} $$
(8)

Instead of including the standardized moments (skewness and kurtosis) as used in Fleishman’s method, we use the third- and fourth-ordered moments of ξj. Therefore, the four equations of solving the coefficients are revised as

$$ \begin{array}{@{}rcl@{}} a+c &=&0, \end{array} $$
(9)
$$ \begin{array}{@{}rcl@{}} b^{2}+6bd+2c^{2}+15d^{2}-1 &=&0, \end{array} $$
(10)
$$ \begin{array}{@{}rcl@{}} 72bcd+6b^{2}c+8c^{3}+270cd^{2}-E(\xi^{3}) &=&0, \end{array} $$
(11)
$$ \begin{array}{@{}rcl@{}} 3b^{4}+60b^{2}c^{2}+60c^{4}+60b^{3}d+936bc^{2}d+630b^{2}d^{2}&&\\ +4500c^{2}d^{2}+3780bd^{3}+10395d^{4}-E(\xi^{4}) &=&0. \end{array} $$
(12, 13)
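The moment equations above can be solved numerically. The following Python sketch uses Newton's method with a forward-difference Jacobian, which is a simplification chosen for brevity. All function names are hypothetical, and the code is not the implementation used in the mnonr package. The `start` argument matters because different starting values can lead to different coefficient sets.

```python
def poly_moments(b, c, d):
    """Second, third, and fourth raw moments of
    xi = -c + b*Z + c*Z**2 + d*Z**3 for standard normal Z (a = -c)."""
    m2 = b**2 + 6*b*d + 2*c**2 + 15*d**2
    m3 = 72*b*c*d + 6*b**2*c + 8*c**3 + 270*c*d**2
    m4 = (3*b**4 + 60*b**2*c**2 + 60*c**4 + 60*b**3*d + 936*b*c**2*d
          + 630*b**2*d**2 + 4500*c**2*d**2 + 3780*b*d**3 + 10395*d**4)
    return m2, m3, m4


def residuals(x, third, fourth):
    """Left-hand sides of the variance, third-, and fourth-moment equations."""
    m2, m3, m4 = poly_moments(*x)
    return [m2 - 1.0, m3 - third, m4 - fourth]


def solve3(J, f):
    """Solve the 3x3 system J * delta = f by Gaussian elimination."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(J, f)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("singular Jacobian")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    delta = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * delta[k] for k in range(i + 1, n))
        delta[i] = s / M[i][i]
    return delta


def fleishman_coefficients(third, fourth, start=(1.0, 0.0, 0.0),
                           tol=1e-10, max_iter=100, h=1e-7):
    """Return (a, b, c, d) matching E(xi)=0, E(xi^2)=1, E(xi^3), E(xi^4)."""
    x = list(start)  # unknowns (b, c, d); a is recovered as -c
    for _ in range(max_iter):
        f = residuals(x, third, fourth)
        if max(abs(v) for v in f) < tol:
            return (-x[1], x[0], x[1], x[2])
        # Forward-difference approximation of the Jacobian.
        J = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            xh = x[:]
            xh[j] += h
            fh = residuals(xh, third, fourth)
            for i in range(3):
                J[i][j] = (fh[i] - f[i]) / h
        delta = solve3(J, f)
        x = [xi - di for xi, di in zip(x, delta)]
    raise RuntimeError("Newton iteration did not converge")
```

Targets outside the range that the power method can reach have no real solution, so a failure to converge should be expected there rather than treated as a bug.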

With the values of a, b, c, and d, we first generate random numbers from the standard normal distribution to form the sample Z with size n. Then, the sample of ξj is obtained by the polynomial transformation in Eq. 8. Repeatedly sampling Z and conducting the transformation for each ξj, one gets the multivariate data with ξ = (ξ1,...,ξp). The final step is to obtain x by applying the matrix A, which satisfies \(\mathbf{A}\mathbf{A}^{t}=\boldsymbol{\Sigma}\) for the specified covariance matrix Σ, to ξ following the data model in Eq. 1 with r = 1.

In summary, the following procedure can be used.

  1. With the user-specified multivariate skewness (\(\beta _{1}^{*}\)) and kurtosis (\(\beta _{2}^{*}\)) and the number of variables (p), calculate the third- and fourth-ordered moments of ξj (j = 1,2,⋯ ,p).

  2. Generate the standardized ξj by the modified power method to form ξ.

  3. Use the Cholesky decomposition to factor the user-specified correlation matrix (or covariance matrix) as \(\boldsymbol{\Sigma}=\mathbf{A}\mathbf{A}^{t}\) and premultiply ξ by A (x = Aξ).
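The last two steps of the procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the mnonr implementation: the polynomial coefficients (a, b, c, d) are assumed to have been solved beforehand and are passed in, and the names `cholesky` and `generate` are hypothetical.

```python
import math
import random


def cholesky(S):
    """Lower-triangular A with A A^T = S, for symmetric positive definite S."""
    p = len(S)
    A = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = S[i][j] - sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                A[i][j] = math.sqrt(s)
            else:
                A[i][j] = s / A[j][j]
    return A


def generate(n, Sigma, coeffs, rng):
    """Draw n vectors x = A xi with xi_j = a + b*Z + c*Z**2 + d*Z**3.

    coeffs = (a, b, c, d) is assumed to solve the moment equations for
    the target E(xi^3) and E(xi^4); each xi_j uses its own independent Z.
    """
    a, b, c, d = coeffs
    A = cholesky(Sigma)
    p = len(Sigma)
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        xi = [a + b * v + c * v**2 + d * v**3 for v in z]
        # A is lower triangular, so row i only uses xi[0..i].
        data.append([sum(A[i][j] * xi[j] for j in range(i + 1))
                     for i in range(p)])
    return data


# With coefficients (0, 1, 0, 0) the polynomial is the identity, so the
# output is ordinary multivariate normal data with covariance Sigma.
Sigma = [[1.0, 0.5], [0.5, 1.0]]
sample = generate(5, Sigma, (0.0, 1.0, 0.0, 0.0), random.Random(1))
print(len(sample), len(sample[0]))  # 5 2
```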

Through this process, the generated data x will have the desired multivariate skewness and kurtosis. One shortcoming of applying the Cholesky decomposition approach is that, after the linear transformations, the population marginal measures of x (e.g., γ1 and γ2) will be different from the original univariate measures of ξ. However, unlike the VM method, where the marginal measures are of interest, the focus of our method is the multivariate measures and the marginal measures are nuisance parameters. Therefore, our method does not require an intermediate correlation matrix and can apply the Cholesky decomposition directly.

Limited ranges of the skewness and kurtosis

The power method cannot cover all the possible combinations of univariate skewness and kurtosis. This is because the method does not require the distribution of Y, and thus the moments cannot be analytically derived. However, the range relationship of univariate skewness (γ1) and kurtosis (γ2) of Y has been estimated through simulation (Luo, 2011). Based on that relationship, we derived the range relationship between the univariate ξj’s third and fourth moments with our modified power method, which is

$$ E(\xi^{4})\geq1.641[E(\xi^{3})]^{2}+1.774. $$
(14)

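Substituting it into the marginal skewness and kurtosis of xi given in Note 2 (with r = 1, i.i.d. ξj, and unit variances σii = 1), so that \(E(\xi^{3})=\gamma_{1}/\sum a_{ij}^{3}\) and \(E(\xi^{4})-3=(\gamma_{2}-3)/\sum a_{ij}^{4}\), the relationship of xi’s skewness and kurtosis is,

$$ \gamma_{2}\geq 1.641\sum a_{ij}^{4}\left(\frac{\gamma_{1}}{\sum a_{ij}^{3}}\right)^{2}-1.226\sum a_{ij}^{4}+3, $$
(15)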

which is restricted compared to the theoretical relationship of the general univariate skewness and kurtosis,

$$ \gamma_{2}\geq {\gamma_{1}^{2}}+1. $$
(16)

Correspondingly, applying the inequality in Eq. 14 to the multivariate skewness and kurtosis formulas in Eqs. 5 and 6 gives the relationship between multivariate skewness and kurtosis in our method,

$$ \beta^{*}_{2} \geq1.641\beta^{*}_{1}+p(p+0.774). $$
(17)
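Eq. 17 follows from multiplying the univariate bound \(E(\xi^{4})\geq 1.641[E(\xi^{3})]^{2}+1.774\) by p and adding p(p − 1), which gives 1.641β1* + 1.774p + p(p − 1) = 1.641β1* + p(p + 0.774). A minimal Python sketch of the resulting range check is given below; the function names are hypothetical and do not reproduce the warning logic of the R package.

```python
def kurtosis_lower_bound(beta1, p):
    """Smallest multivariate kurtosis allowed by Eq. 17 for given beta1 and p."""
    return 1.641 * beta1 + p * (p + 0.774)


def is_attainable(beta1, beta2, p):
    """True if (beta1, beta2) lies inside the range covered by Eq. 17."""
    return beta1 >= 0 and beta2 >= kurtosis_lower_bound(beta1, p)


# Bivariate example from this paper: beta1 = 3, beta2 = 61, p = 2.
print(is_attainable(3, 61, 2))  # True
# The same skewness with beta2 = 10 falls below the bound of about 10.47.
print(is_attainable(3, 10, 2))  # False
```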

R package

An R package mnonr is developed based on our method to generate multivariate non-normal random numbers with user-specified multivariate skewness and kurtosis as well as the covariance matrix. If the values of the multivariate skewness and kurtosis are beyond the valid range of our method, the users will get a warning message and the allowed ranges. The package not only implements our multivariate method (function: mnonr), but also provides the Vale and Maurelli (1983) method (function: unonr). In addition, univariate and multivariate skewness and kurtosis significance tests are included (function: mardia).

Example

We now illustrate how to generate non-normal data with the mnonr package. Suppose the goal is to generate bivariate non-normal data with multivariate skewness \(\beta ^{*}_{1}=3\) and kurtosis \(\beta ^{*}_{2}=61\). Both variables have mean 0 and variance 1. The covariance between them is set to be 0.5. In total, we generate 10,000 bivariate random numbers with the desired features.

To generate the data, the R function mnonr was used in which we set n = 10,000, p = 2, ms = 3, mk = 61, and Sigma = matrix(c(1,0.5,0.5,1),2,2). The meaning of each argument is listed below:

  • n: the number of random observations to generate (the sample size);

  • p: the number of variables;

  • ms: the value of multivariate skewness;

  • mk: the value of multivariate kurtosis;

  • Sigma: the covariance matrix of variables.

For illustration, we also calculated the covariance matrix of the generated data and conducted hypothesis testing of the univariate and multivariate skewness and kurtosis through the function mardia.

The R input and output are given below.

[Listing: R input and output for the mnonr and mardia calls; not reproduced here.]

The sample data yield a multivariate skewness of 3.15 and a multivariate kurtosis of 64.11. The covariance matrix is close to the specified one. The output also shows clearly that the marginal univariate skewness and kurtosis for the two variables are different. According to the marginal model in Eq. 2 (see Note 2 for the resulting formulas), the theoretical skewness and kurtosis for the marginal variables are γ1(x1) = 1.22, γ2(x1) = 29.50, γ1(x2) = 0.95, and γ2(x2) = 19.56. The scatter-plot and marginal histograms are shown in Fig. 1. Even though both variables have leptokurtic distributions, x1 has larger kurtosis than x2, which appears in the figure as a fatter tail in the distribution of x1. This is because, when we form x, the transformation x = Aξ yields a different distribution for each xi, i = 1,⋯ ,p, even though the ξj, j = 1,⋯ ,p, are i.i.d.

Fig. 1 Scatter-plot and marginal histograms of two-variable multivariate data
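The theoretical marginal values quoted in this example can be reproduced from the formulas in Note 2 with r = 1 and i.i.d. ξj. The Python sketch below assumes that A is the lower-triangular Cholesky factor of the specified covariance matrix; the function name `marginal_measures` is hypothetical.

```python
import math


def marginal_measures(a_row, third, fourth):
    """Marginal skewness and kurtosis of x_i = sum_j a_ij * xi_j (Note 2, r = 1)."""
    var = sum(a * a for a in a_row)                       # sigma_ii
    skew = sum(a**3 for a in a_row) * third / var**1.5
    kurt = sum(a**4 for a in a_row) * (fourth - 3.0) / var**2 + 3.0
    return skew, kurt


p, beta1, beta2 = 2, 3.0, 61.0
third = math.sqrt(beta1 / p)              # E(xi^3) from Eq. 5 (non-negative root)
fourth = (beta2 - p * (p - 1)) / p        # E(xi^4) from Eq. 6
rho = 0.5
# Assumed: Cholesky factor of [[1, 0.5], [0.5, 1]].
A = [[1.0, 0.0], [rho, math.sqrt(1.0 - rho**2)]]

for i, row in enumerate(A, start=1):
    skew, kurt = marginal_measures(row, third, fourth)
    print(f"x{i}: skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
# x1: skewness = 1.22, kurtosis = 29.50
# x2: skewness = 0.95, kurtosis = 19.56
```

The first variable equals ξ1 under this factorization, so it keeps the univariate moments, while the second mixes ξ1 and ξ2 and is pulled toward normality.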

Simulation study

To evaluate the performance of our method, we conducted the following simulation study by varying the sample sizes, covariances, number of variables, and different combinations of multivariate skewness and kurtosis.

Study design

The sample sizes are set to be 100, 1000, and 10,000. We set the variances all to be 1 and varied the covariance between two variables from low to high with values 0, 0.1, 0.3, 0.5, 0.7, and 0.9. In each condition, the covariances of any two variables are set to be the same. The numbers of variables in the multivariate data are set to be 2, 4, and 6, which are also the number of ξj in ξ.

The values of multivariate skewness and kurtosis are chosen based on Cain et al. (2017). They provided a descriptive table of Mardia’s multivariate skewness and kurtosis values collected from 136 multivariate studies. We choose the minimum, first quartile, median, and third quartile values when the sample sizes are larger than 100. The values of multivariate skewness are \(\beta ^{*}_{1} =\) 0, 1, 3, and 15. The multivariate kurtosis values are \(\beta ^{*}_{2} =\) 10, 32, 61, and 91.

We deleted some conditions due to the restricted range of multivariate skewness and kurtosis given by Eq. 17. In total, 480 conditions are evaluated, and 1000 replications of data are generated under each condition.

Evaluation

We evaluated the performance of our random number generation method by comparing the statistics of the generated data with the population values used to generate the data. Specifically, the statistics included multivariate skewness \(\beta ^{*}_{1}\), multivariate kurtosis \(\beta ^{*}_{2}\), ξj’s third moment \(E(\xi^{3})\), and ξj’s fourth moment \(E(\xi^{4})\). We calculated the bias (B) and relative bias (RB) of the simulation results for the above statistics. The bias is the difference between the mean of the sample statistic values (\(\hat {\theta }\)) over the N replications and the corresponding population parameter value (\(\theta\)); the relative bias is the bias expressed as a percentage of the population value:

$$ B = \frac{{\sum}^{N}\hat{\theta}}{N}-\theta, $$
(18)
$$ RB = \frac{B}{\theta}\times 100\%. $$
(19)
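To make the evaluation concrete, the Python sketch below computes the sample versions of Mardia's skewness and kurtosis (using the covariance estimate with divisor n, as in Mardia, 1970) together with the bias and relative bias of Eqs. 18 and 19. The function names are hypothetical and the code is not the mardia function of the R package. Returning NaN for the relative bias when θ = 0 is an assumption of this sketch, since Eq. 19 is undefined in that case.

```python
def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    p = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(M)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def mardia_sample(data):
    """Sample Mardia skewness b1 and kurtosis b2 of an n x p data list."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    Y = [[row[j] - mean[j] for j in range(p)] for row in data]
    S = [[sum(y[i] * y[j] for y in Y) / n for j in range(p)] for i in range(p)]
    Sinv = invert(S)
    W = [[sum(Sinv[i][k] * y[k] for k in range(p)) for i in range(p)] for y in Y]
    # b1 averages the cubed cross products over all ordered pairs (O(n^2)).
    b1 = sum(sum(yi[k] * wj[k] for k in range(p)) ** 3
             for yi in Y for wj in W) / n**2
    # b2 averages the squared Mahalanobis distances.
    b2 = sum(sum(y[k] * w[k] for k in range(p)) ** 2
             for y, w in zip(Y, W)) / n
    return b1, b2


def bias_and_relative_bias(estimates, theta):
    """Bias (Eq. 18) and relative bias in percent (Eq. 19)."""
    b = sum(estimates) / len(estimates) - theta
    rb = b / theta * 100.0 if theta != 0 else float("nan")
    return b, rb


# Four points on the axes: symmetric, so b1 = 0, and b2 = 4.
print(mardia_sample([[1, 0], [-1, 0], [0, 1], [0, -1]]))
```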

Results

For the sake of space, we only report several representative conditions in Table 1 and the full results are available in the Supplementary Materials. They represent the small (\(\beta ^{*}_{1}=1\), \(\beta ^{*}_{2}=32\)), medium (\(\beta ^{*}_{1}=3\), \(\beta ^{*}_{2}=61\)), and large (\(\beta ^{*}_{1}=15\), \(\beta ^{*}_{2}=91\)) multivariate skewness and kurtosis combinations.

Table 1 Simulation results (partial)

When the sample size increases, the bias of both univariate and multivariate measures becomes smaller. The performance of ξj verified that the modified power method does not affect the accuracy of the power method for generating univariate non-normal data. For multivariate measures, kurtosis tended to be underestimated and skewness tended to be overestimated. Additionally, multivariate kurtosis had smaller relative bias than multivariate skewness.

When comparing simulation results with various covariance settings, we found that both multivariate skewness and kurtosis do not seem to be affected by covariances. The multivariate skewness is only related to the number of variables and the value of \(E(\xi^{3})\). Similarly, the multivariate kurtosis is influenced by the number of variables and the value of \(E(\xi^{4})\). Covariance does not play a crucial role in multivariate skewness and kurtosis under our generating method; therefore, the variance-covariance matrix only affects the marginal measures rather than the multivariate measures.

Increasing the number of variables leads to less biased skewness and kurtosis estimates holding other conditions constant.

Conclusions and future directions

In this paper, we proposed a new method for generating multivariate non-normal data. The advantage of our method is that it allows researchers to directly specify both multivariate skewness and kurtosis to better control them. With the data generating model, we established one possible solution to relate multivariate measures to univariate measures: using univariate measures to generate ξ and then applying the variance-covariance structure to produce the multivariate x. Our method can help researchers better understand the influence of multivariate non-normality.

The widely used VM method and our method are both based on Fleishman’s polynomial approach, and both can generate correlated multivariate random data. However, the two methods also have important differences. The main difference lies in their perspectives on multivariate non-normality, which can arise from the non-normality of the marginal distributions and/or of the multivariate distribution. The VM method concentrates on the univariate non-normality without specifically controlling the multivariate non-normality. In contrast, our method focuses on the multivariate non-normality but does not control the univariate marginal non-normality. For instance, for a bivariate distribution of x = (x1,x2)’, the VM method lets researchers specify the marginal univariate skewness and kurtosis of x1 and x2, but not the multivariate measures. With our method, one can directly determine the multivariate skewness and kurtosis of x, but with no control of the marginal distributions. The choice between the two methods should be based on the particular research interest.

Because of the use of the power method, our method also inherits the problems associated with it, such as the Gaussian-like property and the solution multiplicity issue. First, as discussed by Foldnes and Grønneberg (2015), researchers who evaluate the robustness of Gaussian ML estimation using multivariate data with the Gaussian-like property might get biased results, even when the marginal univariate measures show severe non-normality. Without further exploration, we could not identify the degree of the potential impact related to our method. Second, there are different sets of coefficients (a, b, c, d) in the modified power method. For example, in the limited simulation experiment in the supplementary materials (see Table ??), we found that within each parameter setting (i.e., \(\beta ^{*}_{1}, \beta ^{*}_{2}\), n, p), there were four sets of possible coefficients. Different starting values will yield different sets of coefficients, which could affect the multivariate skewness and kurtosis. We recommend that researchers try different starting values in data generation; our R package provides such an option in addition to the default value.

As shown in the simulation results, with a small sample size (n = 100), the relative bias of the multivariate measures could be very high. With the increasing number of variables (p) and sample sizes, this issue becomes less severe. When merging the data of each marginal univariate variable, a small deviation could lead to a large gap for multivariate data. This drawback is shared with other multivariate data generators relating to the reliability of multivariate measures. As a future direction, we plan to develop a sample size planning method of different multivariate skewness and kurtosis to optimize the generating process.

Since our method only used one approach to generate univariate variables, another related limitation as described in the Method section is that some combinations of skewness and kurtosis cannot be obtained. However, our procedure provides researchers a simple data model to transform multivariate measures to univariate ones. In the future, we will apply other univariate generators to our method in order to improve the empirical performance of the multivariate generator and eliminate the potential problems that are related to the current modified power method such as the solution multiplicity and Gaussian-like property.

Open practices statement

The data in this study are based on simulation. None of the data or materials are related to any experiments.

Notes

  1. According to Mardia’s definition, y is independent of x and has the same distribution as x. Since the skewness is a measure of asymmetry, the quantity inside the third power cannot be a symmetric measure (as it would be with an even power). The multivariate skewness is like the “squared version” of the univariate one.

  2. In this paper, in the multivariate setting, we refer to the moments of xi as marginal measures, and the univariate measures are the moments of r and ξj. The marginal skewness and kurtosis of xi are \(\gamma _{1}(x_{i})=E(r^{3}){\sum }_{j=1}^{q} a_{ij}^{3}E({\xi _{j}^{3}})/\sigma _{ii}^{3/2}\) and \(\gamma _{2}(x_{i})=E(r^{4})[{\sum }_{j=1}^{q} a_{ij}^{4}(E({\xi _{j}^{4}})-3)/\sigma _{ii}^{2}+3]\) (Yuan and Bentler, 1997).

References

  1. Astivia, O.O., & Zumbo, B.D. (2015). A cautionary note on the use of the Vale and Maurelli method to generate multivariate, nonnormal data for simulation purposes. Educational and Psychological Measurement, 75(4), 541–567. https://doi.org/10.1177/0013164414548894

  2. Astivia, O.O., & Zumbo, B.D. (2018). On the solution multiplicity of the Fleishman method and its impact in simulation studies. British Journal of Mathematical and Statistical Psychology, 71, 437–458. https://doi.org/10.1111/bmsp.12126

  3. Bradley, J.V. (1980). Nonrobustness in z, t, and f tests at large sample sizes. Bulletin of the Psychonomic Society, 16(5), 333–336. https://doi.org/10.3758/BF03329558

  4. Cain, M.K., Zhang, Z., & Yuan, K.H. (2017). Univariate and multivariate skewness and kurtosis for measuring nonnormality: Prevalence, influence and estimation. Behavior Research Methods, 49, 1716–1735. https://doi.org/10.3758/s13428-016-0814-1

  5. DeCarlo, L.T. (1997). On the meaning and use of kurtosis. Psychological Methods, 2(3), 292–307. https://doi.org/10.1037/1082-989X.2.3.292

  6. Field, C., & Genton, M.G. (2012). The multivariate g-and-h distribution. Technometrics, 48(1), 104–111. https://doi.org/10.1198/004017005000000562

  7. Fleishman, A. (1978). A method for simulating non-normal distributions. Psychometrika, 43(4), 521–532. https://doi.org/10.1007/BF02293811

  8. Foldnes, N., & Grønneberg, S (2015). How general is the Vale–Maurelli simulation approach? Psychometrika, 80(4), 1066–1083. https://doi.org/10.1007/s11336-014-9414-0

  9. Foldnes, N., & Olsson, U.H. (2016). A simple simulation technique for nonnormal data with prespecified skewness, kurtosis, and covariance matrix. Multivariate Behavioral Research, 51, 207–219. https://doi.org/10.1080/00273171.2015.1133274

  10. Headrick, T.C. (2002). Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions. Computational Statistics & Data Analysis, 40, 685–711. https://doi.org/10.1016/S0167-9473(02)00072-5

  11. Hollander, M., & Wolfe, D.A. (2015). Nonparametric statistical methods, 3rd edn. Wiley.

  12. Lee, Y., & Kaplan, D. (2018). Generating multivariate ordinal data via entropy principles. Psychometrika, 83(1), 156–181. https://doi.org/10.1007/s11336-018-9603-3

  13. Luo, H. (2011). Generation of non-normal data-a study of Fleishman’s power method. Dept. of Statistics Uppsala Univ.

  14. Mair, P., Satorra, A., & Bentler, P. (2012). Generating nonnormal multivariate data using copulas: Applications to SEM. Multivariate Behavioral Research, 47(4), 547–565. https://doi.org/10.1080/00273171.2012.692629

  15. Mardia, K.V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530. https://doi.org/10.1093/biomet/57.3.519

  16. Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105 (1), 156–166. https://doi.org/10.1037/0033-2909.105.1.156

  17. Vale, C.D., & Maurelli, V.A. (1983). Simulating multivariate nonnormal distributions. Psychometrika, 48(3), 465–471. https://doi.org/10.1007/BF02293687

  18. Yuan, K.H., & Bentler, P. (1997). Generating multivariate distributions with specified marginal skewness and kurtosis. In W. Bandilla, & F. Faulbaum (Eds.) SoftStat’97-advances in statistical software, (Vol. 6 pp. 385–391). Stuttgart: Lucius & Lucius.

  19. Yuan, K.H., Bentler, P.M., & Zhang, W. (2005). The effect of skewness and kurtosis on mean and covariance structure analysis: The univariate case and its multivariate implication. Sociological Methods & Research, 34(2), 240–258. https://doi.org/10.1177/0049124105280200

  20. Yuan, K.H., Zhang, Z., & Zhao, Y. (2017). Reliable and more powerful methods for power analysis in structure equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 2(3), 315–330. https://doi.org/10.1080/10705511.2016.1276836

Author information

Corresponding author

Correspondence to Wen Qu.

Cite this article

Qu, W., Liu, H. & Zhang, Z. A method of generating multivariate non-normal random numbers with desired multivariate skewness and kurtosis. Behav Res 52, 939–946 (2020). https://doi.org/10.3758/s13428-019-01291-5

Keywords

  • Multivariate non-normal data
  • Multivariate skewness
  • Multivariate kurtosis
  • Random number generation