Abstract
This chapter introduces, characterizes, and classifies the eight methods of multivariate data analysis (MVA) covered in this book. MVA considers several variables simultaneously and analyzes their relationships quantitatively, with the aim of describing and explaining these relationships or predicting future developments. Bivariate analyses, which consider just two variables at a time, are a special case of MVA; reality, however, is usually more complex and requires more than two variables. The chapter also presents the fundamentals of empirical data analysis that are relevant to all methods discussed in the book. Since most readers will be familiar with these basics, the presentation serves primarily as a refresher or as a reference for important aspects of quantitative data analysis, such as basic statistical concepts (e.g., mean, standard deviation, covariance), the difference between correlation and causality, and the basics of statistical testing. Finally, the handling of outliers and missing values is discussed, and the statistical package IBM SPSS Statistics, which is used in this book, is briefly introduced.
Notes
- 1.
Both SPSS and R use the point-biserial calculation of a correlation if one of the variables has only two calculation-relevant values.
- 2.
On www.multivariate-methods.info, the reader will also find an Excel sheet with information on the calculation of the various statistical parameters using Excel.
- 3.
In Excel, the mean of a variable can be calculated by: =AVERAGE(matrix), where (matrix) is the range of cells containing the data of the variable. For example, “=AVERAGE(C6:C55)” calculates the mean of the 50 cells C6 to C55 in column C.
- 4.
In Excel, the sample variance can be calculated by: \(s_{x}^{2}\) = VAR.S(matrix). The population variance can be calculated by: \(\sigma_{x}^{2}\) = VAR.P(matrix).
- 5.
In Excel, the sample standard deviation can be calculated by: \(s_{x}\) = STDEV.S(matrix). The population standard deviation is calculated by: \(\sigma_{x}\) = STDEV.P(matrix).
- 6.
Variance and standard deviation cannot be interpreted meaningfully for the variable “gender”. However, columns E and F are required for the calculation of covariance and correlations.
- 7.
In Excel, the covariance can be calculated as follows: \(s_{xy}\) = COVARIANCE.S(matrix1;matrix2).
- 8.
In Excel, the correlation between variables can be calculated as follows: \(r_{xy}\) = CORREL(matrix1;matrix2).
- 9.
Cf. the correlation of binary variables with metrically scaled variables in Sect. 1.1.2.2.
- 10.
For statistical testing, also see Sect. 1.3.
- 11.
The p-value may be calculated in Excel as follows: p = TDIST(ABS(t);N−2;2) or p = 1−F.DIST(F;1;N−2;1).
- 12.
The central limit theorem states that the sum or mean of n independent random variables tends toward a normal distribution if n is sufficiently large, even if the original variables themselves are not normally distributed. This is the reason why a normal distribution can be assumed for many phenomena.
- 13.
In Excel we can calculate the critical value \(t_{\alpha /2}\) for a two-tailed t-test by using the function T.INV.2T(α;df). We get: T.INV.2T(0.05;99) = 1.98. The values in the last line of the t-table are identical with those of the normal distribution. With df = 99, the t-distribution comes very close to the normal distribution.
- 14.
In Excel we can calculate the p-value by using the function T.DIST.2T(ABS(\(t_{emp}\));df). For the variable in our example we get: T.DIST.2T(ABS(−1.90);99) = 0.0603 or 6.03%.
- 15.
In Excel we can calculate the critical value \(t_{\alpha }\) for the lower tail by using the function T.INV(α;df). We get: T.INV(0.05;99) = −1.66. For the upper tail we have to switch the sign or use the function T.INV(1−α;df).
- 16.
In Excel we can calculate the p-value for the left tail by using the function T.DIST(\(t_{emp}\);df;1). We get: T.DIST(−1.90;99;1) = 0.0302 or 3%. The p-value for the right tail is obtained by the function T.DIST.RT(\(t_{emp}\);df).
- 17.
- 18.
The histogram was created with Excel by selecting “Data/Data Analysis/Histogram”. In SPSS, histograms are created by selecting “Analyze/Descriptive Statistics/Explore”.
- 19.
In SPSS we can create boxplots (just like histograms) by selecting “Analyze/Descriptive Statistics/Explore”.
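For readers working outside Excel, the calculations in notes 1 and 3 to 8 can be sketched in Python using only the standard library. The data below are hypothetical (not the book's data set), and the helper names `sample_covariance`, `pearson_r`, and `point_biserial` are illustrative assumptions. The sketch also shows that the Pearson correlation of a metric variable with a 0/1 coded binary variable coincides with the point-biserial correlation mentioned in note 1.

```python
import math
import statistics

# Hypothetical data (not from the book): a metric variable x and a 0/1 coded binary variable g.
x = [12.0, 15.0, 9.0, 20.0, 14.0, 18.0]
g = [0, 1, 0, 1, 0, 1]

def sample_covariance(a, b):
    """Sample covariance with divisor n - 1, as in Excel's COVARIANCE.S."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

def pearson_r(a, b):
    """Pearson correlation, as in Excel's CORREL."""
    return sample_covariance(a, b) / (statistics.stdev(a) * statistics.stdev(b))

def point_biserial(metric, binary):
    """Point-biserial correlation computed from the two group means (binary coded 0/1)."""
    group1 = [m for m, b in zip(metric, binary) if b == 1]
    group0 = [m for m, b in zip(metric, binary) if b == 0]
    n, n1, n0 = len(metric), len(group1), len(group0)
    mean_diff = statistics.fmean(group1) - statistics.fmean(group0)
    return mean_diff / statistics.pstdev(metric) * math.sqrt(n1 * n0 / n**2)

mean_x = statistics.fmean(x)       # AVERAGE
var_s = statistics.variance(x)     # VAR.S (divisor n - 1)
var_p = statistics.pvariance(x)    # VAR.P (divisor n)
sd_s = statistics.stdev(x)         # STDEV.S
sd_p = statistics.pstdev(x)        # STDEV.P
cov_xg = sample_covariance(x, g)   # COVARIANCE.S
r_xg = pearson_r(x, g)             # CORREL
r_pb = point_biserial(x, g)        # should agree with r_xg up to rounding

print(mean_x, var_s, var_p, sd_s, sd_p, cov_xg, r_xg, r_pb)
```

On Python 3.10 or later, `statistics.covariance` and `statistics.correlation` could replace the two hand-written helpers; they are spelled out here so that the divisors are visible.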
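The critical values and p-values from notes 13 to 16 can be reproduced approximately without Excel. The following is a minimal sketch that integrates the t-density numerically with Simpson's rule and inverts the distribution function by bisection; this is a substitute technique chosen to avoid external packages, not the method used by Excel or SPSS, and the function names are assumptions. In practice, a statistics library such as SciPy would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] combined with symmetry around zero."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area

def p_two_tailed(t_emp, df):
    """Counterpart of T.DIST.2T(ABS(t_emp);df)."""
    return 2 * (1 - t_cdf(abs(t_emp), df))

def t_quantile(prob, df, lo=-50.0, hi=50.0):
    """Counterpart of T.INV(prob;df), found by bisection.

    Assumes the quantile lies in [lo, hi], which holds for moderate df and prob.
    """
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha, df, t_emp = 0.05, 99, -1.90
t_crit_two = t_quantile(1 - alpha / 2, df)  # note 13 reports 1.98
p_two = p_two_tailed(t_emp, df)             # note 14 reports 0.0603
t_crit_lower = t_quantile(alpha, df)        # note 15 reports -1.66
p_left = t_cdf(t_emp, df)                   # note 16 reports 0.0302

print(round(t_crit_two, 2), round(p_two, 4), round(t_crit_lower, 2), round(p_left, 4))
```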
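The central limit theorem in note 12 can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: it draws from an exponential distribution (clearly right-skewed), uses a fixed seed, and measures skewness as a rough indicator of how close the distribution of sample means is to the symmetric normal shape. The sample sizes and number of draws are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so that the sketch is reproducible

def sample_means(n, draws=5000):
    """Means of `draws` samples of size n from an exponential distribution with mean 1."""
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(draws)]

def skewness(values):
    """Moment-based skewness; near 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

means_n1 = sample_means(1)    # single observations: still strongly skewed
means_n50 = sample_means(50)  # means of 50 observations: much closer to symmetric

skew_n1 = skewness(means_n1)
skew_n50 = skewness(means_n50)

print("skewness for n = 1: ", round(skew_n1, 2))
print("skewness for n = 50:", round(skew_n50, 2))
```

The skewness for n = 50 should be markedly smaller than for n = 1, in line with the statement that means of sufficiently many independent variables tend toward a normal distribution even when the underlying variable is not normally distributed.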
References
Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
du Toit, S. H. C., Steyn, A. G. W., & Stumpf, R. H. (1986). Graphical exploratory data analysis. New York: Springer.
Freedman, D. (2002). From association to causation: Some remarks on the history of statistics (Technical Report No. 521). Berkeley: University of California.
Gigerenzer, G. (2002). Calculated risks. New York: Simon & Schuster.
Green, P. E., Tull, D. S., & Albaum, G. (1988). Research for marketing decisions (5th ed.). Englewood Cliffs (NJ): Prentice Hall.
Hastie, T., Tibshirani, R., & Friedman, J. (2011). The elements of statistical learning. New York: Springer.
Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. New York: Basic Books.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680.
Tukey, J. W. (1977). Exploratory data analysis. Massachusetts: Addison-Wesley.
Watson, J., Whiting, P. F. & Brush, J. E. (2020). Interpreting a covid-19 test result. British Medical Journal, 12 May 2020, 369:m1808.
Further reading
Anderson, D. R., Sweeney, D. J., & Williams, T. A. (2007). Essentials of modern business statistics with Microsoft Excel. Mason (OH): Thomson.
George, D., & Mallery, P. (2021). IBM SPSS Statistics 27 step by step: A simple guide and reference (17th ed.). New York: Routledge.
Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. London: Sage.
Fisher, R. A. (1990). Statistical methods, experimental design, and scientific inference. Oxford: Oxford University Press.
Freedman, D., Pisani, R., & Purves, R. (2007). Statistics (4th ed.). New York: Norton & Company.
Sarstedt, M., & Mooi, E. (2019). A concise guide to market research: The process, data, and methods using IBM SPSS statistics (3rd ed.). Berlin: Springer.
Wonnacott, T. H., & Wonnacott, R. J. (1977). Introductory statistics for business and economics (2nd ed.). Santa Barbara: Wiley.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature
About this chapter
Cite this chapter
Backhaus, K., Erichson, B., Gensler, S., Weiber, R., Weiber, T. (2021). Introduction to Empirical Data Analysis. In: Multivariate Analysis. Springer Gabler, Wiesbaden. https://doi.org/10.1007/978-3-658-32589-3_1
DOI: https://doi.org/10.1007/978-3-658-32589-3_1
Publisher Name: Springer Gabler, Wiesbaden
Print ISBN: 978-3-658-32588-6
Online ISBN: 978-3-658-32589-3
eBook Packages: Business and Economics (German Language)