
Abstract

In this chapter the different types of data that you may come across in quantitative research are explored. How data might first be described using various descriptive statistics is addressed, before looking at which statistical tool is most appropriate to use and why. The distribution of data is discussed, particularly the normal distribution curve, which guides the choice of both descriptive statistics and the inferential statistical test. The two forms of hypotheses, alternate and null, are introduced, as well as probability levels and means of establishing whether the data are normally distributed or not. Statistical Package for the Social Sciences (SPSS) printouts of some of the more commonly used tests described in the chapter are included to show how to interpret data in order to come to the correct conclusion regarding reporting of the findings.



Further Reading

  • Altman DG. Practical statistics for medical research. London: Chapman & Hall; 2018.
  • Campbell MJ. Statistics at square two: understanding modern statistical applications in medicine. 2nd ed. Oxford: Wiley-Blackwell; 2006.
  • Campbell MJ, Swinscow TDV. Statistics at square one. 11th ed. Oxford: Wiley-Blackwell; 2009.
  • Calude CS, Longo G. The deluge of spurious correlations in big data. Found Sci. 2017;22(3):595–612.
  • Miles J, Shevlin M. Applying regression and correlation: a guide for students and researchers. London: Sage; 2001.


Appendix

Independent Samples t-Test

The test below relates to emotional intelligence scores of radiography students, referred to in the printout as test scores.

The independent samples t-test printout consists of two tables. The first table describes the data. In this case the dependent variable was called test score. The next two columns contain the names of the two groups being compared and the number of subjects in each group. The mean score for each group follows. In this case we can see that the diagnostic radiographers had a higher test score (36.91) than the therapeutic radiographers (32.29). We can then see the standard deviation and standard error of the mean (SEM), which is the standard deviation divided by the square root of the sample size (SEM = σ/√n). This table tells us the difference between the groups, but not whether this difference is significant. For that we need to look at the second table.

Group statistics

| Dependent variable | Group (D/T) | N | Mean | Std. deviation | Std. error mean |
|---|---|---|---|---|---|
| Test_Score | Diagnostic | 80 | 36.91 | 4.450 | 0.498 |
| Test_Score | Therapeutic | 21 | 32.29 | 5.781 | 1.261 |
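As a quick check, the standard error values in the table can be reproduced directly from the standard deviation and group size. A minimal sketch in Python, using the summary figures from the table above:

```python
import math

# Standard deviation and sample size from the group statistics table
groups = {
    "Diagnostic":  {"sd": 4.450, "n": 80},
    "Therapeutic": {"sd": 5.781, "n": 21},
}

for name, g in groups.items():
    sem = g["sd"] / math.sqrt(g["n"])  # SEM = s / sqrt(n)
    print(f"{name}: SEM = {sem:.3f}")  # 0.498 and 1.261, matching SPSS
```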

The second table should be read as two tables in one. The first part reports the result of Levene's test, a test that tells you whether the two groups have equal variances. The null hypothesis of Levene's test is that the variances of the two groups are equal. If the test reaches significance (p ≤ 0.05), then we reject the null hypothesis that they are equal and accept that they must be different. In the example below the significance of Levene's test is 0.449, so we do not reject the null hypothesis and can assume equal variances. With that decided, we know to read the t-test result from the top row ("Equal variances assumed"). Had Levene's test been significant (p ≤ 0.05), we would instead have read from the bottom row, as equal variances could not be assumed.

Reading the test, we can see the test statistic is 3.974 and the significance value is reported as 0.000. This would be written up for publication as t = 3.974, p < 0.001. SPSS reports the significance value to only three decimal places, so we cannot report an exact value in this instance, just that it is less than 0.001.

Independent samples test

| | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean difference | Std. error difference | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|---|
| Test_Score, equal variances assumed | 0.578 | 0.449 | 3.974 | 99 | 0.000 | 4.627 | 1.164 | 2.316 | 6.937 |
| Test_Score, equal variances not assumed | | | 3.412 | 26.544 | 0.002 | 4.627 | 1.356 | 1.842 | 7.411 |

The difference between the two groups is 4.627. This can also be obtained by subtracting one mean from the other in the group statistics table (36.91 − 32.29 = 4.62; the small discrepancy arises because the displayed means are rounded).

This finding might be reported as follows: The test scores for diagnostic radiographers (36.91) were significantly higher than those of therapeutic radiographers (32.29), t = 3.974, p < 0.001.
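For readers working outside SPSS, the same two-step procedure (Levene's test, then the appropriate form of the t-test) can be sketched in Python with scipy. The score arrays below are hypothetical stand-ins, as the chapter's raw data are not reproduced here:

```python
from scipy import stats

# Hypothetical stand-ins for the raw emotional-intelligence scores;
# the chapter's actual data are not reproduced here.
diagnostic  = [37, 35, 40, 33, 38, 36, 41, 34, 39, 36]
therapeutic = [31, 34, 29, 33, 35, 30, 32, 28]

# Step 1: Levene's test for equality of variances
lev_stat, lev_p = stats.levene(diagnostic, therapeutic, center="mean")

# Step 2: if Levene's p > 0.05 we can assume equal variances
# (Student's t-test); otherwise use Welch's t-test (equal_var=False).
equal_var = lev_p > 0.05
t_stat, t_p = stats.ttest_ind(diagnostic, therapeutic, equal_var=equal_var)

print(f"Levene: statistic = {lev_stat:.3f}, p = {lev_p:.3f}")
print(f"t-test ({'equal' if equal_var else 'unequal'} variances): "
      f"t = {t_stat:.3f}, p = {t_p:.3f}")
```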

ANOVA

When using SPSS it is possible to run more than one analysis at a time; the table below contains two. Both are part of one data set, comparing diagnostic and therapeutic students' scores on a mental rotations test across the three year groups. As with the t-test, the first table examines homogeneity of variance, and since the significance values are above 0.05 we can assume equal variances. The ANOVA is quite robust to deviations from homogeneity of variance, but it is an assumption of the test. Descriptive statistics for the variables are not produced automatically with this test, as they are with the t-test, but they can be requested if needed.

Test of homogeneity of variances

| Variable | Basis | Levene statistic | df1 | df2 | Sig. |
|---|---|---|---|---|---|
| Diagnostic students' scores | Based on mean | 0.160 | 2 | 94 | 0.852 |
| | Based on median | 0.176 | 2 | 94 | 0.839 |
| | Based on median and with adjusted df | 0.176 | 2 | 93.675 | 0.839 |
| | Based on trimmed mean | 0.183 | 2 | 94 | 0.833 |
| Therapeutic students' scores | Based on mean | 0.431 | 2 | 92 | 0.651 |
| | Based on median | 0.273 | 2 | 92 | 0.762 |
| | Based on median and with adjusted df | 0.273 | 2 | 91.977 | 0.762 |
| | Based on trimmed mean | 0.461 | 2 | 92 | 0.632 |
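The four rows SPSS reports for each variable correspond to different ways of centring the data before running Levene's test. Three of them map directly onto scipy's `center` argument (the adjusted-df variant has no direct scipy equivalent); a sketch with hypothetical year-group scores:

```python
from scipy import stats

# Hypothetical mental-rotation scores for three year groups
year1 = [14, 17, 12, 19, 15, 16, 13, 18]
year2 = [11, 13, 10, 15, 12, 14, 11, 13]
year3 = [15, 14, 17, 13, 16, 15, 18, 14]

# SPSS's "based on mean / median / trimmed mean" rows correspond
# to scipy's center options.
for center in ("mean", "median", "trimmed"):
    stat, p = stats.levene(year1, year2, year3, center=center)
    print(f"Levene ({center}): statistic = {stat:.3f}, p = {p:.3f}")
```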

The second table is the ANOVA analysis. The important columns are the final two, which give the test statistic for each analysis and its significance value. The first test, comparing the three diagnostic year groups, was not significant (F = 0.349, p = 0.707). The second, comparing the therapeutic students, is significant (F = 6.014, p = 0.004). What we do not yet know is which year group differs from which. The test simply states that there is a difference somewhere: all year groups could differ from each other, or just one year from another, and the more groups you have, the more possible patterns of difference there are.

ANOVA

| Variable | Source | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|---|
| Diagnostic students' scores | Between groups | 15.927 | 2 | 7.963 | 0.349 | 0.707 |
| | Within groups | 2147.558 | 94 | 22.846 | | |
| | Total | 2163.485 | 96 | | | |
| Therapeutic students' scores | Between groups | 258.663 | 2 | 129.331 | 6.014 | 0.004 |
| | Within groups | 1978.537 | 92 | 21.506 | | |
| | Total | 2237.200 | 94 | | | |

In order to find out where the difference lies we have to undertake a post hoc test. As there was no significant difference between the diagnostic year groups, only between the therapeutic students' groups, we need a post hoc test only for the therapeutic students. There are a variety of post hoc tests that can be done, and in this instance a Tukey's HSD (honestly significant difference) test was used. In the table below the significant differences are marked with an asterisk. The table repeats each comparison in both directions, but we can see that the first year's score was significantly different from the second year's (p = 0.016), though not from the third year's (p = 0.833). The second year's score was also different from the third year's (p = 0.007).

Multiple comparisons (Tukey's HSD)

Dependent variable: therapeutic students' scores

| (I) year | (J) year | Mean difference (I–J) | Std. error | Sig. | 95% CI lower bound | 95% CI upper bound |
|---|---|---|---|---|---|---|
| 1 | 2 | 3.138* | 1.110 | 0.016 | 0.49 | 5.78 |
| 1 | 3 | −0.692 | 1.201 | 0.833 | −3.55 | 2.17 |
| 2 | 1 | −3.138* | 1.110 | 0.016 | −5.78 | −0.49 |
| 2 | 3 | −3.830* | 1.230 | 0.007 | −6.76 | −0.90 |
| 3 | 1 | 0.692 | 1.201 | 0.833 | −2.17 | 3.55 |
| 3 | 2 | 3.830* | 1.230 | 0.007 | 0.90 | 6.76 |

*The mean difference is significant at the 0.05 level.
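The omnibus ANOVA and the Tukey HSD follow-up can both be reproduced in Python using scipy and statsmodels. The year-group arrays below are hypothetical stand-ins for the therapeutic students' scores:

```python
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical therapeutic students' scores for the three year groups
year1 = [18, 21, 17, 20, 19, 22, 18, 20]
year2 = [15, 16, 14, 17, 15, 16, 13, 15]
year3 = [19, 21, 20, 18, 22, 20, 19, 21]

# Omnibus one-way ANOVA: is there a difference somewhere?
f_stat, p = stats.f_oneway(year1, year2, year3)
print(f"ANOVA: F = {f_stat:.3f}, p = {p:.3f}")

# Post hoc Tukey HSD: which year groups differ from which?
scores = year1 + year2 + year3
groups = ["1"] * len(year1) + ["2"] * len(year2) + ["3"] * len(year3)
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```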

Mann–Whitney U Test

The SPSS printout for non-parametric tests is far simpler than for parametric tests. The table below is the printout for the same data as the ANOVA test. Again, two tests were performed, and each row reports one test. The first column tells you what is being tested, the second which statistical test was performed, followed by the significance level of the test. The final column tells you whether or not to reject the null hypothesis.

[Figure a: SPSS hypothesis test summary table for the two Mann–Whitney U tests]

Double-clicking on this box brings up the further information seen below. First, we see a bar chart showing the frequency of the test scores for each of the two groups. Second, a further box gives the test statistic for the test we carried out. In this case U = 414.5, p < 0.001.

[Figure b: Mann–Whitney U detail view: frequency bar charts for each group and the test statistic]
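For comparison, a Mann–Whitney U test can be run in Python with scipy; the two arrays below are hypothetical stand-ins for the group scores:

```python
from scipy import stats

# Hypothetical stand-ins for the two groups' test scores
diagnostic  = [37, 35, 40, 33, 38, 36, 41, 34]
therapeutic = [31, 34, 29, 33, 30, 28, 32]

# Two-sided Mann-Whitney U test comparing the two groups
u_stat, p = stats.mannwhitneyu(diagnostic, therapeutic,
                               alternative="two-sided")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {p:.3f}")
```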

Kruskal–Wallis Test

The printout for the Kruskal–Wallis test is very similar to that of the Mann–Whitney test above, consisting of the same columns. As with the ANOVA, two tests were performed here, and each row reports one test. The first column tells you what is being tested, the second which statistical test was performed, followed by the significance level of the test. The final column tells you whether to reject the null hypothesis.

[Figure c: SPSS hypothesis test summary table for the two Kruskal–Wallis tests]

Again, double-clicking the box in SPSS brings up a pop-out with more information. The pop-out shown below presents each year's scores for the therapeutic students in the form of a box-and-whisker plot. The second box gives the test statistic and repeats the significance level of the test. We can now report the finding of the test: H = 9.321, p = 0.009.

[Figure d: Kruskal–Wallis detail view: box-and-whisker plots by year group and the test statistic]

Clicking on pairwise comparisons brings up the further information seen below. This is the post hoc test, which takes into account that we are performing multiple comparisons. Just as with the ANOVA, the post hoc test tells us which groups differ from which. In this instance groups 1 and 2 differ from each other, as do groups 2 and 3.

[Figure e: Kruskal–Wallis pairwise comparisons (post hoc) output]
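The Kruskal–Wallis test and a pairwise follow-up can be sketched in Python. Note that SPSS's pairwise comparisons use Dunn's procedure; the sketch below substitutes pairwise Mann–Whitney tests with a Bonferroni correction, a common and more conservative alternative. The arrays are hypothetical:

```python
from itertools import combinations
from scipy import stats

# Hypothetical therapeutic students' scores by year group
years = {
    "1": [18, 21, 17, 20, 19, 22, 18, 20],
    "2": [15, 16, 14, 17, 15, 16, 13, 15],
    "3": [19, 21, 20, 18, 22, 20, 19, 21],
}

# Omnibus Kruskal-Wallis test across the three year groups
h_stat, p = stats.kruskal(*years.values())
print(f"Kruskal-Wallis: H = {h_stat:.3f}, p = {p:.3f}")

# Pairwise follow-up, Bonferroni-adjusted for the number of comparisons
pairs = list(combinations(years, 2))
for a, b in pairs:
    u, p_raw = stats.mannwhitneyu(years[a], years[b],
                                  alternative="two-sided")
    p_adj = min(1.0, p_raw * len(pairs))
    print(f"Year {a} vs year {b}: U = {u:.1f}, adjusted p = {p_adj:.3f}")
```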

Correlation

The following data look at patient information leaflets and how readable they are to the public. The analysis compared readability score with average sentence length and with the use of the passive voice in the text. As with other tests, you can undertake more than one test at a time with correlations. Each test appears twice in the matrix, and each variable is correlated with itself, which of course gives a perfect correlation of 1. Reading horizontally across the first row, we can see that the readability score is correlated with sentence length (r = 0.532), a moderate positive correlation, and that this correlation is significant (p < 0.001). It can also be seen that readability score and passive voice have a correlation coefficient of 0.098, which is not significant (p = 0.379). Looking at the second row, the first cell repeats the sentence length and readability score result and the next is the correlation of sentence length with itself, but the third cell gives us new information: sentence length and passive voice are related to each other, with a correlation coefficient of 0.280, p = 0.010. The third row of cells is all repeat information.

[Figure f: SPSS correlation matrix for readability score, sentence length, and passive voice]

When doing a correlation it is very useful to view the result in the form of a scatterplot, which shows the relationship between the two variables. It is not usual to add a line of best fit, as this is technically a regression line and implies that a regression has been performed.

[Figure g: scatterplot of readability score against sentence length]
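A correlation matrix like the one above can be assembled in Python with scipy's `pearsonr`; the measurements below are hypothetical stand-ins for the leaflet data:

```python
from scipy import stats

# Hypothetical leaflet measurements standing in for the chapter's data
variables = {
    "readability":     [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 13.5, 16.2],
    "sentence length": [12.0, 14.5, 11.8, 17.2, 13.9, 15.0, 11.0, 18.1],
    "passive voice":   [3.0, 5.0, 2.0, 6.0, 4.0, 2.0, 3.0, 5.0],
}

# Each pair only needs testing once; the SPSS matrix shows every
# correlation twice (above and below the diagonal).
names = list(variables)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r, p = stats.pearsonr(variables[a], variables[b])
        print(f"{a} vs {b}: r = {r:.3f}, p = {p:.3f}")
```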

Regression

Regression analysis is widely used for prediction, and it can also be used, with care, to infer a causal relationship between the independent and dependent variables. In the above example it might be possible to suggest that by changing the average sentence length of a document we can change its readability score. If we undertake a regression analysis on the above data, the first column of the model summary gives the correlation coefficient, followed by the coefficient of determination (the r² value). The r² value tells us that approximately 28% of the variance in readability score can be explained by sentence length.

Model summary

| Model | R | R square | Adjusted R square | Std. error of the estimate |
|---|---|---|---|---|
| 1 | 0.532ᵃ | 0.283 | 0.275 | 0.62398 |

ᵃ Predictors: (Constant), sentence length

The ANOVA table informs us whether our regression model explains a statistically significant proportion of the variance: the F-ratio in the table below tests whether the regression model is a good fit for the data. Remember that we could add more variables to the model and build up a picture of what affects readability score. The table shows that the independent variable (sentence length) is a statistically significant predictor of the dependent variable (readability score), F = 32.035, p < 0.001.

ANOVAᵃ

| Model | Source | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 12.473 | 1 | 12.473 | 32.035 | 0.000ᵇ |
| | Residual | 31.538 | 81 | 0.389 | | |
| | Total | 44.011 | 82 | | | |

ᵃ Dependent variable: readability score
ᵇ Predictors: (Constant), sentence length

The coefficients table informs us about the values of the regression line. The column marked B tells us where the line intercepts the y-axis (11.312) and the slope of the line (0.229): the model predicts an increase of 0.229 in readability score for every increase of one word in average sentence length. The graph following the coefficients table shows these data, with the values from the table presented as a formula. The intercept is not visible on the graph, but we can put some figures into the formula: with an intercept of 11.312, the predicted value for a sentence length of 12 is 11.312 + (0.229 × 12) = 14.06, which, looking at the graph, seems about right. The final column is the significance value for each regression coefficient.

Coefficientsᵃ

| Model | Term | B (unstandardised) | Std. error | Beta (standardised) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 11.312 | 0.650 | | 17.403 | 0.000 |
| | Sentence length | 0.229 | 0.041 | 0.532 | 5.660 | 0.000 |

ᵃ Dependent variable: readability score
[Figure h: scatterplot with fitted regression line and its equation]
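A simple linear regression of readability on sentence length can be sketched with scipy's `linregress`, which returns the intercept, slope, r, and p in one call. The data below are hypothetical stand-ins, and the final lines reproduce the style of prediction worked through above:

```python
from scipy import stats

# Hypothetical stand-ins for the leaflet data
sentence_length = [12.0, 14.5, 11.8, 17.2, 13.9, 15.0, 11.0, 18.1]
readability     = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 13.5, 16.2]

# Simple linear regression: readability = intercept + slope * length
result = stats.linregress(sentence_length, readability)
print(f"intercept = {result.intercept:.3f}, slope = {result.slope:.3f}")
print(f"r = {result.rvalue:.3f}, r^2 = {result.rvalue ** 2:.3f}, "
      f"p = {result.pvalue:.3g}")

# Prediction in the style of the worked example above
# (with the chapter's coefficients: 11.312 + 0.229 * 12 = 14.06)
predicted = result.intercept + result.slope * 12
print(f"predicted readability at sentence length 12: {predicted:.2f}")
```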


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Flinton, D.M., Malamateniou, C. (2020). Quantitative Methods and Analysis. In: Ramlaul, A. (eds) Medical Imaging and Radiotherapy Research: Skills and Strategies. Springer, Cham. https://doi.org/10.1007/978-3-030-37944-5_15

