Introduction to Empirical Data Analysis

Multivariate Analysis

Abstract

This chapter introduces, characterizes, and classifies the eight methods of multivariate data analysis (MVA) covered in this book. MVA considers several variables simultaneously and analyzes their relationships quantitatively, with the aim of describing and explaining these relationships or predicting future developments. Bivariate analyses, which consider just two variables at a time, are a special case of MVA; reality, however, is usually more complex and requires more than two variables to be considered. The chapter also presents the fundamentals of empirical data analysis that are relevant to all methods discussed in the book. Since most readers will be familiar with these basics, this material serves primarily as a refresher or as a reference for important aspects of quantitative data analysis, such as basic statistical concepts (e.g., mean, standard deviation, covariance), the difference between correlation and causality, and the basics of statistical testing. Finally, the handling of outliers and missing values is discussed, and the statistical package IBM SPSS Statistics, which is used in this book, is briefly introduced.

Notes

  1. Both SPSS and R use the point-biserial calculation of a correlation if one of the variables has only two calculation-relevant values.

  2. On www.multivariate-methods.info, the reader will also find an Excel sheet with information on the calculation of the various statistical parameters using Excel.

  3. In Excel, the mean of a variable can be calculated by: =AVERAGE(matrix), where (matrix) is the range of cells containing the data of the variable. For example, “=AVERAGE(C6:C55)” calculates the mean of the 50 cells C6 to C55 in column C.

  4. In Excel, the sample variance can be calculated by: \(s_{x}^{2}\) = VAR.S(matrix). The population variance can be calculated by: \(\sigma_{x}^{2}\) = VAR.P(matrix).

  5. In Excel, the sample standard deviation can be calculated by: \(s_{x}\) = STDEV.S(matrix). The population standard deviation can be calculated by: \(\sigma_{x}\) = STDEV.P(matrix).

  6. Variance and standard deviation cannot be interpreted meaningfully for the variable “gender”. However, columns E and F are required for the calculation of covariance and correlations.

  7. In Excel, the covariance can be calculated as follows: \(s_{xy}\) = COVARIANCE.S(matrix1;matrix2).

  8. In Excel, the correlation between variables can be calculated as follows: \(r_{xy}\) = CORREL(matrix1;matrix2).

  9. Cf. the correlation of binary variables with metrically scaled variables in Sect. 1.1.2.2.

  10. For statistical testing, see also Sect. 1.3.

  11. The p-value may be calculated in Excel as follows: p = TDIST(ABS(t);N−2;2) or p = 1−F.DIST(F;1;N−2;1).

  12. The central limit theorem states that the sum or mean of n independent random variables tends toward a normal distribution if n is sufficiently large, even if the original variables themselves are not normally distributed. This is the reason why a normal distribution can be assumed for many phenomena.

  13. In Excel we can calculate the critical value \(t_{\alpha /2}\) for a two-tailed t-test by using the function T.INV.2T(α;df). We get: T.INV.2T(0.05;99) = 1.98. The values in the last line of the t-table are identical to those of the standard normal distribution. With df = 99, the t-distribution comes very close to the normal distribution.

  14. In Excel we can calculate the p-value by using the function T.DIST.2T(ABS(\(t_{emp}\));df). For the variable in our example we get: T.DIST.2T(ABS(−1.90);99) = 0.0603 or 6.03%.

  15. In Excel we can calculate the critical value \(t_{\alpha }\) for the lower tail by using the function T.INV(α;df). We get: T.INV(0.05;99) = −1.66. For the upper tail we have to switch the sign or use the function T.INV(1−α;df).

  16. In Excel we can calculate the p-value for the left tail by using the function T.DIST(\(t_{emp}\);df;1). We get: T.DIST(−1.90;99;1) = 0.0302 or about 3%. The p-value for the right tail is obtained by the function T.DIST.RT(\(t_{emp}\);df).

  17. Cf., e.g., Hastie et al. (2011), Pearl and Mackenzie (2018), and Gigerenzer (2002).

  18. The histogram was created with Excel by selecting “Data/Data Analysis/Histogram”. In SPSS, histograms are created by selecting “Analyze/Descriptive Statistics/Explore”.

  19. In SPSS we can create boxplots (just like histograms) by selecting “Analyze/Descriptive Statistics/Explore”.
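
For readers working outside Excel, the descriptive measures named in notes 3 to 8 can be sketched in Python using only the standard library. The data below are hypothetical and not taken from the chapter, and the helper `sample_covariance` is an illustrative name. The last lines check the statement in note 1 on a small scale: for a variable coded 0/1, the ordinary Pearson correlation and the point-biserial formula should agree.

```python
import math
import statistics

# Hypothetical illustration data (not taken from the chapter).
spending = [12.0, 15.5, 9.0, 20.0, 17.5, 11.0]  # metric variable
group = [0, 1, 0, 1, 1, 0]                      # binary variable coded 0/1
n = len(spending)

mean_x = statistics.mean(spending)      # Excel: AVERAGE
var_s = statistics.variance(spending)   # Excel: VAR.S (divides by n - 1)
var_p = statistics.pvariance(spending)  # Excel: VAR.P (divides by n)
sd_s = statistics.stdev(spending)       # Excel: STDEV.S
sd_p = statistics.pstdev(spending)      # Excel: STDEV.P


def sample_covariance(x, y):
    """Sample covariance with divisor n - 1, as in Excel's COVARIANCE.S."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


cov_xy = sample_covariance(spending, group)
# Pearson correlation (Excel: CORREL): covariance over the product of
# the two sample standard deviations.
r_xy = cov_xy / (sd_s * statistics.stdev(group))

# Point-biserial correlation computed from the two group means.
ones = [x for x, g in zip(spending, group) if g == 1]
zeros = [x for x, g in zip(spending, group) if g == 0]
r_pb = ((statistics.mean(ones) - statistics.mean(zeros)) / sd_p
        * math.sqrt(len(ones) * len(zeros) / n ** 2))

print(f"mean={mean_x:.3f} var_s={var_s:.3f} var_p={var_p:.3f}")
print(f"cov={cov_xy:.3f} r={r_xy:.4f} r_pb={r_pb:.4f}")
```

The covariance is written out by hand here so that the divisor n − 1 is visible; recent Python versions also offer `statistics.covariance` and `statistics.correlation`.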

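The tail probabilities that notes 14 and 16 obtain from Excel can be approximated without a statistics package. The sketch below is one possible approach, not the method used by Excel or SPSS: it integrates the density of Student's t distribution numerically with Simpson's rule. The function names `t_pdf` and `t_cdf` and the number of integration steps are assumptions chosen for illustration; in practice a library routine would normally be preferred.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x), using composite Simpson integration from 0 to |x|.

    steps must be even; the density is symmetric around zero, so the
    area between 0 and |x| is added to or subtracted from 0.5.
    """
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


t_emp, df = -1.90, 99  # empirical t-value and degrees of freedom from the notes

p_left = t_cdf(t_emp, df)           # Excel: T.DIST(t;df;1)
p_right = 1.0 - t_cdf(t_emp, df)    # Excel: T.DIST.RT(t;df)
p_two = 2.0 * t_cdf(-abs(t_emp), df)  # Excel: T.DIST.2T(ABS(t);df)

print(f"left tail:  {p_left:.4f}")
print(f"right tail: {p_right:.4f}")
print(f"two-tailed: {p_two:.4f}")
```

The printed left-tail and two-tailed values can be compared with the figures given in notes 16 and 14 (0.0302 and 0.0603).
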
References

  • Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

  • du Toit, S. H. C., Steyn, A. G. W., & Stumpf, R. H. (1986). Graphical exploratory data analysis. New York: Springer.

  • Freedman, D. (2002). From association to causation: Some remarks on the history of statistics (Technical Report No. 521). Berkeley: University of California.

  • Gigerenzer, G. (2002). Calculated risks. New York: Simon & Schuster.

  • Green, P. E., Tull, D. S., & Albaum, G. (1988). Research for marketing decisions (5th ed.). Englewood Cliffs (NJ): Prentice Hall.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2011). The elements of statistical learning. New York: Springer.

  • Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. New York: Basic Books.

  • Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680.

  • Tukey, J. W. (1977). Exploratory data analysis. Reading (MA): Addison-Wesley.

  • Watson, J., Whiting, P. F., & Brush, J. E. (2020). Interpreting a covid-19 test result. British Medical Journal, 369, m1808.

Further reading

  • Anderson, D. R., Sweeney, D. J., & Williams, T. A. (2007). Essentials of modern business statistics with Microsoft Excel. Mason (OH): Thomson.

  • George, D., & Mallery, P. (2021). IBM SPSS Statistics 27 step by step: A simple guide and reference (17th ed.). New York: Routledge.

  • Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. London: Sage.

  • Fisher, R. A. (1990). Statistical methods, experimental design, and scientific inference. Oxford: Oxford University Press.

  • Freedman, D., Pisani, R., & Purves, R. (2007). Statistics (4th ed.). New York: Norton & Company.

  • Sarstedt, M., & Mooi, E. (2019). A concise guide to market research: The process, data, and methods using IBM SPSS Statistics (3rd ed.). Berlin: Springer.

  • Wonnacott, T. H., & Wonnacott, R. J. (1977). Introductory statistics for business and economics (2nd ed.). Santa Barbara: Wiley.

Author information

Corresponding author

Correspondence to Klaus Backhaus.

Copyright information

© 2021 The Editor(s) and/or the Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature

About this chapter

Cite this chapter

Backhaus, K., Erichson, B., Gensler, S., Weiber, R., Weiber, T. (2021). Introduction to Empirical Data Analysis. In: Multivariate Analysis. Springer Gabler, Wiesbaden. https://doi.org/10.1007/978-3-658-32589-3_1

  • DOI: https://doi.org/10.1007/978-3-658-32589-3_1

  • Publisher Name: Springer Gabler, Wiesbaden

  • Print ISBN: 978-3-658-32588-6

  • Online ISBN: 978-3-658-32589-3

  • eBook Packages: Business and Economics (German Language)
