On the Logic of Data Analysis: Which Evaluation Strategy Fits Best to My Research Question?

Social Science Data Analysis

Abstract

When estimating regression models of any kind, the analyst has to make a number of modeling decisions: Should the independent variables enter the regression model together or in several steps? Should interaction effects be included? This chapter addresses such basic questions about the logic of data analysis by presenting three common evaluation strategies: the gross-net model, stepwise (hierarchical) regression, and moderation analysis. These modeling variants make it possible to examine different kinds of influence exerted by third variables, also called confounding variables: Why does the influence of an independent variable X on a dependent variable Y decrease or increase when a third variable Z is added to the model? How should interaction effects in regression models be interpreted?

Notes

  1.

    To compare gross and net effects, listwise deletion (listwise case exclusion) can be applied to both models, so that both are estimated on the same cases. A case is then excluded from the estimation of the regression models as soon as the dependent variable or one of the independent variables has a missing or invalid value. In multivariate models, listwise deletion can shrink the initial sample considerably, which reduces the chance of detecting relationships that actually exist. It can also bias the estimates if values are not missing purely by chance. For these reasons, imputation methods have been developed that replace missing values. An overview of these methods can be found, for example, in Allison (2005).

  2.

    In addition, education and deprivation correlate negatively with each other.

  3.

    A related term in this context is spurious correlation. The difference between a spurious correlation and complete mediation is that, in a spurious correlation, Z affects both X and Y, whereas in mediation X affects Z. With cross-sectional data, on which our example is based, the actual causal direction cannot be determined. The assumed directions of the effects must therefore be justified by theoretical argument.

  4.

    ‘Becoming smaller’ or ‘becoming weaker’ always means that the respective effect moves closer to zero. Accordingly, ‘becoming larger’ or ‘becoming stronger’ means that the effect moves away from zero in absolute terms.

  5.

    As an alternative to the procedure presented here, the first intervening variable z1 (in the example: education) can be removed from the model again before the next intervening variable z2 (in the example: deprivation) is included, and so on. Even then, it is recommended to estimate at least one full model with all independent variables (corresponding to Model 5 in Table 7.4).

  6.

    For mediation analyses, when in doubt, it is advisable to take a sheet of paper and draw a path diagram like the one in Fig. 7.3, including the signs of the effects.

  7.

    We also already know from the descriptive analysis (Table 7.3) that women socialized in East Germany are more highly educated.

  8.

    It should be noted that anomie has no effect on Y in the specification of Model 4, i.e. when controlling for deprivation and education level. The situation might be different if anomie were included in Model 2, i.e. without controlling for other variables.

  9.

    One could object that including each independent variable individually in the regression model is cumbersome and takes up a lot of space in the presentation. However, two things have to be distinguished: first, the analyses that the researcher performs and, second, the presentation of the results. In some cases it may be sufficient to present only one gross-net model in tabular form. The text should then at least explain in detail what causes the larger deviations between the gross and net effects, i.e. where mediation and suppression relationships exist. A detailed evaluation of the data, in which the independent variables are included step by step, is therefore recommended in any case.

  10.

    For further information on mediation and suppression, the following literature is recommended. Classical works on mediation analysis are Baron and Kenny (1986) and James and Brett (1984). An application-oriented introduction, using linear regression as an example, can be found in Urban and Mayerl (2018, Chap. 6). The path-analytic determination of indirect effects is explained by Bollen (1987). Technical details and extensions of mediation analysis can be found in MacKinnon (2008).

  11.

    For the sake of simplicity, the model is specified more parsimoniously here than in Table 7.4 (Model 5), where education, anomie and deprivation are additionally controlled for.

  12.

    The z-value of −3.84 corresponds to an empirical significance level of p = 0.00012 in the standard normal distribution for a two-sided alternative hypothesis. The probability of an alpha error (rejecting the null hypothesis although the indirect effect is zero in the population) is thus smaller than 0.1 %.

  13.

    Those who do not have access to specialized software such as Mplus can either calculate the test by hand or install additional modules. For Stata, the add-on module mediation is recommended (see https://econpapers.repec.org/software/bocbocode/s457294.htm), and for SPSS the macro PROCESS (see http://www.processmacro.org/download.html; both last accessed on 06.02.2021). Those who prefer to calculate by hand but want to save some work can use the online calculator for the Sobel test by Kristopher J. Preacher (http://quantpsy.org/sobel/sobel.htm; last accessed on 13.12.2020).

  14.

    In the course of this chapter, the terms mediation, suppression and moderation were introduced. This terminology comes from the English-language literature and is used relatively consistently there. In German-language sociological methods books, however, the terminology is partly inconsistent. Mediation is, for example, referred to by Diekmann (2010, p. 726) and Schnell et al. (2011, p. 227) as “explanation” or “interpretation” and by Kühnel and Krebs (2001, p. 473) as “confounding”. Moderation is called “prediction” by Schnell et al. (2011, p. 227), while Diekmann (2010, p. 730) speaks of “specification”.

  15.

    Further literature is also recommended for moderation. In addition to the fundamental work by Baron and Kenny (1986) on conceptual aspects, the technical details of calculating interaction effects are discussed at length in Aiken and West (1996), Fox (1997), Frazier et al. (2004), and Whisman and McClelland (2005). Introductions are provided by Urban and Mayerl (2018, Chap. 6) and Lohmann (2010).
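Notes 12 and 13 refer to calculating the Sobel test by hand and to converting a z-value into a two-sided significance level. The following Python sketch illustrates both steps. It assumes the common first-order form of the Sobel standard error, sqrt(b²·se_a² + a²·se_b²), where a is the effect of X on the mediator Z and b is the effect of Z on Y. The function names and the coefficients in the first example are hypothetical and are not taken from the chapter. Only the z-value −3.84 comes from Note 12.

```python
import math


def sobel_z(a, se_a, b, se_b):
    """z statistic of the indirect effect a*b (first-order Sobel formula).

    a, se_a: coefficient and standard error of X -> Z
    b, se_b: coefficient and standard error of Z -> Y (controlling for X)
    """
    indirect = a * b
    se_indirect = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return indirect / se_indirect


def two_sided_p(z):
    """Two-sided p-value of z under the standard normal distribution."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical coefficients, for illustration only (not from the chapter)
z_example = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"Sobel z = {z_example:.2f}, p = {two_sided_p(z_example):.4f}")

# z-value reported in Note 12
print(f"p for z = -3.84: {two_sided_p(-3.84):.5f}")
```

Only the standard library is used, so the sketch runs without a statistics package. For real analyses, the specialized tools named in Note 13 are preferable.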

References

  • Aiken, Leona S., and Stephen G. West. 1996. Multiple regression: Testing and interpreting interactions. Newbury Park: Sage.

  • Allison, Paul D. 2005. Missing data. Quantitative applications in the social sciences. Thousand Oaks: Sage.

  • Baron, Reuben M., and David A. Kenny. 1986. The moderator-mediator distinction in social psychological research: Conceptual, strategic and statistical considerations. Journal of Personality and Social Psychology 51:1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173.

  • Bollen, Kenneth A. 1987. Total, direct, and indirect effects in structural equation models. In Sociological methodology, ed. Clifford C. Clogg, 37–69. Washington, D.C.: American Sociological Association. https://doi.org/10.2307/271028.

  • Diekmann, Andreas. 2010. Empirische Sozialforschung. Grundlagen, Methoden, Anwendungen. Reinbek: Rowohlt.

  • Fox, John. 1997. Applied regression analysis, linear models, and related methods. Thousand Oaks: Sage.

  • Frazier, Patricia A., Andrew P. Tix, and Kenneth E. Barron. 2004. Testing moderator and mediator effects in counseling psychology research. Journal of Counseling Psychology 51:115–134. https://doi.org/10.1037/0022-0167.51.1.115.

  • Geiser, Christian. 2011. Datenanalyse mit Mplus. Eine anwendungsorientierte Einführung. Wiesbaden: Springer VS. https://doi.org/10.1007/978-3-531-93192-0.

  • James, Lawrence R., and Jeanne M. Brett. 1984. Mediators, moderators, and tests for mediation. Journal of Applied Psychology 69:307–321. https://doi.org/10.1037/0021-9010.69.2.307.

  • Kühnel, Steffen-M., and Dagmar Krebs. 2001. Statistik für die Sozialwissenschaften. Grundlagen, Methoden, Anwendungen. Reinbek: Rowohlt. https://doi.org/10.1007/s11615-003-0126-9.

  • Krampen, Günter. 1979. Eine Skala zur Messung der normativen Geschlechtsrollen-Orientierung (GRO-Skala). Zeitschrift für Soziologie 8:254–266. https://doi.org/10.1515/zfsoz-1979-0304.

  • Lohmann, Henning. 2010. Nicht-Linearität und Nicht-Additivität in der multiplen Regression: Interaktionseffekte, Polynome und Splines. In Handbuch der sozialwissenschaftlichen Datenanalyse, eds. Christoph Wolf and Henning Best, 677–707. Wiesbaden: VS Verlag. https://doi.org/10.1007/978-3-531-92038-2_26.

  • Lois, Daniel. 2020. Gender role attitudes in Germany, 1982–2016: An age-period-cohort (APC) analysis. Comparative Population Studies 45:35–64. https://doi.org/10.12765/CPoS-2020-02.

  • MacKinnon, David P. 2008. Introduction to statistical mediation analysis. Milton Park: Routledge. https://doi.org/10.4324/9780203809556.

  • Mays, Anja. 2012. Determinanten traditionell-sexistischer Einstellungen in Deutschland – Eine Analyse mit Allbus-Daten. Kölner Zeitschrift für Soziologie und Sozialpsychologie 64:277–302. https://doi.org/10.1007/s11577-012-0165-6.

  • Schnell, Rainer, Paul B. Hill, and Elke Esser. 2011. Methoden der empirischen Sozialforschung. München: Oldenbourg.

  • Sobel, Michael E. 1982. Asymptotic confidence intervals for indirect effects in structural equation models. In Sociological methodology, ed. Samuel Leinhardt, 290–312. Washington, D.C.: American Sociological Association. https://doi.org/10.2307/270723.

  • Statistisches Bundesamt. 2020. Internationale Bildungsindikatoren im Ländervergleich. Ausgabe 2020 – Tabellenband. Wiesbaden.

  • Urban, Dieter, and Jochen Mayerl. 2018. Angewandte Regressionsanalyse: Theorie, Technik und Praxis. Wiesbaden: Springer VS. https://doi.org/10.1007/978-3-658-01915-0.

  • Whisman, Mark A., and Gary H. McClelland. 2005. Designing, testing, and interpreting interactions and moderator effects in family research. Journal of Family Psychology 19:111–120. https://doi.org/10.1037/0893-3200.19.1.111.

Author information

Correspondence to Florian G. Hartmann.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature

About this chapter

Cite this chapter

Hartmann, F.G., Kopp, J., Lois, D. (2023). On the Logic of Data Analysis: Which Evaluation Strategy Fits Best to My Research Question? In: Social Science Data Analysis. Springer, Wiesbaden. https://doi.org/10.1007/978-3-658-41230-2_7

  • DOI: https://doi.org/10.1007/978-3-658-41230-2_7

  • Publisher Name: Springer, Wiesbaden

  • Print ISBN: 978-3-658-41229-6

  • Online ISBN: 978-3-658-41230-2

  • eBook Packages: Social Sciences (R0)
