Abstract
In this chapter the different types of data that you may come across in quantitative research are explored. How data might first be described using various descriptive statistics is addressed, before looking at which statistical tool is most appropriate to use and why. The distribution of data, particularly the normal distribution curve, which guides the choice of both descriptive statistics and the inferential statistical test, is discussed. The two forms of hypotheses, alternative and null, are introduced, as well as probability levels and means of establishing whether or not data are normally distributed. Statistical Package for the Social Sciences (SPSS) printouts of some of the more commonly used tests described in the chapter are included to show how to interpret data in order to come to the correct conclusion when reporting the findings.
Further Reading
Altman DG. Practical statistics for medical research. London: Chapman & Hall; 2018.
Campbell MJ. Statistics at square two. Understanding modern statistical applications in medicine. 2nd ed. Oxford: Wiley-Blackwell; 2006.
Campbell MJ, Swinscow TDV. Statistics at square one. 11th ed. Oxford: Wiley-Blackwell; 2009.
Calude CS, Longo G. The deluge of spurious correlations in big data. Found Sci. 2017;22(3):595–612.
Miles J, Shevlin M. Applying regression and correlation: a guide for students and researchers. London: Sage; 2001.
Appendix
Independent Samples t-Test
The test below relates to emotional intelligence scores of radiography students, referred to in the printout as test scores.
The independent samples t-test printout consists of two tables. The first table describes the data. In this case the dependent variable was called test score. The next two columns contain the names of the two groups being compared and the number of subjects in each group. The mean score follows. Here we can see that the diagnostic radiographers had a higher mean test score (36.91) than the therapeutic radiographers (32.29). We can then see the standard deviation and the standard error of the mean (SEM), which is the standard deviation divided by the square root of the sample size (SEM = σ/√n). This table shows the difference between the groups, but not whether this difference is significant. For that we need to look at the second table.
Group statistics

| | D/T | N | Mean | Std. deviation | Std. error mean |
|---|---|---|---|---|---|
| Test_Score | Diagnostic | 80 | 36.91 | 4.450 | 0.498 |
| | Therapeutic | 21 | 32.29 | 5.781 | 1.261 |
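The standard errors in the table can be checked directly from the formula SEM = σ/√n; a quick sketch in Python:

```python
import math

# SEM = standard deviation / square root of the sample size
for group, sd, n in [("Diagnostic", 4.450, 80), ("Therapeutic", 5.781, 21)]:
    print(f"{group}: SEM = {sd / math.sqrt(n):.3f}")
```

The diagnostic value matches the table exactly; the therapeutic value comes out at 1.262 rather than 1.261 because the printed standard deviation is itself rounded, whereas SPSS works from the unrounded value.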
The second table should be read as two tables in one. The first part reports the result of Levene’s test, which tells us whether or not the two groups have equal variances. The null hypothesis of Levene’s test is that the variances of the two groups are equal. If the test reaches significance (p ≤ 0.05), we reject the null hypothesis and accept that the variances must differ. In the example below the significance of Levene’s test is 0.449, so we do not reject the null hypothesis and can assume equal variances. With that decided, we know to read the top row of the t-test result (‘Equal variances assumed’). Had Levene’s test been significant (p ≤ 0.05), we would have read from the bottom row (‘Equal variances not assumed’).
Reading the test we can see that the test statistic is 3.974 and the significance value is reported as 0.000. This would be written up for publication as t = 3.974, p < 0.001. The significance value is only reported to three decimal places, so we cannot report an exact value in this instance, only that it is less than 0.001.
Independent samples test

| | F (Levene’s test) | Sig. (Levene’s test) | t | df | Sig. (2-tailed) | Mean difference | Std. error difference | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 0.578 | 0.449 | 3.974 | 99 | 0.000 | 4.627 | 1.164 | 2.316 | 6.937 |
| Equal variances not assumed | | | 3.412 | 26.544 | 0.002 | 4.627 | 1.356 | 1.842 | 7.411 |
The difference between the two groups is 4.627. This can also be calculated by subtracting the means in the group statistics table.
This finding might be reported as follows: the test scores for diagnostic radiographers were significantly higher (36.91) than those of therapeutic radiographers (32.29), t = 3.974, p < 0.001.
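The result can be approximately reproduced from the summary statistics alone; a sketch in Python using SciPy's `ttest_ind_from_stats` (the t value will differ slightly from 3.974 because the printed means are rounded):

```python
from scipy.stats import ttest_ind_from_stats

# Equal-variance (pooled) t-test computed from the group statistics table
t, p = ttest_ind_from_stats(
    mean1=36.91, std1=4.450, nobs1=80,   # diagnostic radiographers
    mean2=32.29, std2=5.781, nobs2=21,   # therapeutic radiographers
    equal_var=True,                      # Levene's test was non-significant
)
print(f"t = {t:.3f}, p = {p:.5f}")
```

Setting `equal_var=False` instead would give the bottom row of the table (Welch's t-test, used when equal variances cannot be assumed).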
ANOVA
When using SPSS it is possible to run more than one analysis at a time; the table below contains two analyses. Both are part of one data set and compare the mental rotations test scores of the three year groups, separately for diagnostic and therapeutic students. The first table, as with the t-test, looks at homogeneity of variance; as the significance values are above 0.05 we can assume equal variances. The ANOVA is quite robust to deviations from homogeneity of variance, but it is an assumption of the test. Descriptive statistics for the variables are not produced automatically with this test as they are with the t-test, but they can be toggled on if required.
Test of homogeneity of variances

| | | Levene statistic | df1 | df2 | Sig. |
|---|---|---|---|---|---|
| Diagnostic students’ scores | Based on mean | 0.160 | 2 | 94 | 0.852 |
| | Based on median | 0.176 | 2 | 94 | 0.839 |
| | Based on median and with adjusted df | 0.176 | 2 | 93.675 | 0.839 |
| | Based on trimmed mean | 0.183 | 2 | 94 | 0.833 |
| Therapeutic students’ scores | Based on mean | 0.431 | 2 | 92 | 0.651 |
| | Based on median | 0.273 | 2 | 92 | 0.762 |
| | Based on median and with adjusted df | 0.273 | 2 | 91.977 | 0.762 |
| | Based on trimmed mean | 0.461 | 2 | 92 | 0.632 |
The second table is the ANOVA analysis. The important columns are the final two, which give us the test statistic for each test and the significance value. The first test, looking at the three diagnostic years, was not significant (F = 0.349, p = 0.707). The second test, looking at the therapeutic students, is significant (F = 6.014, p = 0.004). What we do not know at this point is which year group differs from which other year group. The test simply states that there is a difference; it could be that all year groups differ from each other, or just one year might differ from another. Any pattern is possible, and the more groups you have, the greater the number of possible patterns.
ANOVA

| | | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|---|
| Diagnostic students’ scores | Between groups | 15.927 | 2 | 7.963 | 0.349 | 0.707 |
| | Within groups | 2147.558 | 94 | 22.846 | | |
| | Total | 2163.485 | 96 | | | |
| Therapeutic students’ scores | Between groups | 258.663 | 2 | 129.331 | 6.014 | 0.004 |
| | Within groups | 1978.537 | 92 | 21.506 | | |
| | Total | 2237.200 | 94 | | | |
In order to find out where the difference is we have to undertake a post hoc test. As there was no significant difference between the diagnostic student year groups, only the therapeutic students, we only need to do a post hoc test for the therapeutic students. There are a variety of post hoc tests available; in this instance a Tukey’s HSD (honestly significant difference) test was used. In the table below the significant differences are marked with an asterisk. The table repeats each comparison in both directions, but we can see that the first year’s score was significantly different to the second year’s score (p = 0.016), though not to the third year’s score (p = 0.833). The second year’s score was also different to the third year’s score (p = 0.007).
Multiple comparisons: Tukey’s HSD

| Dependent variable | (I) year | (J) year | Mean difference (I–J) | Std. error | Sig. | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|
| Therapeutic students’ scores | 1 | 2 | 3.138* | 1.110 | 0.016 | 0.49 | 5.78 |
| | | 3 | −0.692 | 1.201 | 0.833 | −3.55 | 2.17 |
| | 2 | 1 | −3.138* | 1.110 | 0.016 | −5.78 | −0.49 |
| | | 3 | −3.830* | 1.230 | 0.007 | −6.76 | −0.90 |
| | 3 | 1 | 0.692 | 1.201 | 0.833 | −2.17 | 3.55 |
| | | 2 | 3.830* | 1.230 | 0.007 | 0.90 | 6.76 |

*The mean difference is significant at the 0.05 level
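The same workflow, a one-way ANOVA followed by Tukey's HSD when the overall test is significant, can be sketched with SciPy. The year-group arrays below are hypothetical, since the chapter's raw data are not reproduced here:

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(0)
# Hypothetical year-group scores standing in for the therapeutic students' data
year1 = rng.normal(14, 4.6, 32)
year2 = rng.normal(11, 4.6, 35)
year3 = rng.normal(15, 4.6, 28)

f, p = f_oneway(year1, year2, year3)
print(f"F = {f:.3f}, p = {p:.3f}")

if p <= 0.05:
    # Post hoc test: which year groups differ from which other year groups
    print(tukey_hsd(year1, year2, year3))
```

As in the SPSS output, the post hoc step is only worth running when the overall F-test is significant.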
Mann–Whitney U Test
The SPSS printout for non-parametric tests is far simpler than for parametric tests. The table below is the printout for the same data as the t-test. Again, the printout is compact and each row reports one test. The first column tells you what is being tested, the second column shows which statistical test was performed, followed by the significance level of the test. The final column tells you whether or not to reject the null hypothesis.
Double clicking on this box brings up further information seen below. First, we see a bar chart that shows us the frequency of the test scores for each of the two groups. Second, we see a further box that tells us the test statistic for the test we carried out. In this case U = 414.5, p < 0.001.
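The equivalent test can be sketched in SciPy; the score arrays below are hypothetical stand-ins for the chapter's raw data, so the U and p values will not match the printout:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
# Hypothetical emotional intelligence scores for the two groups
diagnostic = rng.normal(37, 4.5, 80)
therapeutic = rng.normal(32, 5.8, 21)

u, p = mannwhitneyu(diagnostic, therapeutic, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}")
```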
Kruskal–Wallis Test
The printout for the Kruskal–Wallis test is very similar to the Mann–Whitney test above, consisting of the same columns. As with the ANOVA, in this case two tests were performed and each row reports one test. The first column tells you what is being tested, the second column shows which statistical test was performed, followed by the significance level of the test. The final column tells you whether to reject the null hypothesis.
Again, double clicking the box in SPSS brings up a pop-out with more information. The pop-out shown below shows you each year’s score for the therapeutic students in the form of a box and whisker plot. The second box gives you the test statistic and repeats the significance level of the test. We can now report the finding of the test, H = 9.321, p = 0.009.
Clicking on pairwise comparisons brings up further information seen below. This is the post hoc test which takes into consideration that we are doing multiple tests. Just as with the ANOVA test the post hoc test is telling us which groups are different to which other groups. In this instance group 1 and 2 are different to each other as are groups 2 and 3.
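A sketch of this workflow with SciPy; the groups are hypothetical, and the pairwise follow-up uses Bonferroni-corrected Mann–Whitney tests as a simple substitute for the Dunn-style pairwise comparisons SPSS performs:

```python
import numpy as np
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(2)
# Hypothetical scores for the three therapeutic year groups
groups = {1: rng.normal(14, 4, 32), 2: rng.normal(11, 4, 35), 3: rng.normal(15, 4, 28)}

h, p = kruskal(*groups.values())
print(f"H = {h:.3f}, p = {p:.3f}")

# Pairwise comparisons with a Bonferroni correction for multiple testing
pairs = list(combinations(groups, 2))
for a, b in pairs:
    _, pu = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    print(f"year {a} vs year {b}: adjusted p = {min(pu * len(pairs), 1.0):.3f}")
```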
Correlation
The following data look at patient information leaflets and how readable they are to the public. The analysis compared readability score with average sentence length and with the use of passive voice in the text. As with other tests, you can undertake more than one test at a time with correlations. In the printout each test appears twice, and each variable is also correlated with itself, which of course gives a perfect correlation of 1. Reading horizontally across the first row, we can see that readability score is correlated with sentence length (r = 0.532), a moderate positive correlation, and that this correlation is significant (p < 0.001). It can also be seen that readability score and passive voice have a correlation coefficient of 0.098, which is not significant (p = 0.379). Looking at the second row, the first box looks at sentence length and readability score and is repeat information, the next box is the correlation of sentence length with itself, but the third box gives us new information: sentence length and passive voice are related to each other with a correlation coefficient of 0.280, p = 0.010. The third row of boxes is all repeat information.
When doing a correlation it is very useful to view the result in the form of a scatterplot. This will show the relationship between the two variables. It is not usual to put a line of best fit on as this is technically a regression line and implies that you have done a regression.
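A correlation can equally be computed outside SPSS; a sketch with SciPy on hypothetical leaflet data (the intercept and slope below are borrowed from the regression section; the noise term and sample sizes are invented):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
# Hypothetical data: 83 leaflets, as implied by the regression degrees of freedom
sentence_length = rng.normal(14, 3, 83)
readability = 11.3 + 0.23 * sentence_length + rng.normal(0, 0.62, 83)

r, p = pearsonr(sentence_length, readability)
print(f"r = {r:.3f}, p = {p:.4f}")
```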
Regression
Regression analysis is widely used for prediction and is sometimes used to suggest a causal relationship between the independent and dependent variables, although the statistics alone cannot establish causation. In the above example it might be possible to suggest that if we change the average sentence length of a document, we can change its readability score. If we undertake a regression analysis on the above data, the first column of the model summary gives the correlation coefficient, followed by the coefficient of determination (the r² value). The r² value tells us that approximately 28% of the variance in readability score can be explained by sentence length.
Model summary

| Model | R | R square | Adjusted R square | Std. error of the estimate |
|---|---|---|---|---|
| 1 | 0.532 | 0.283 | 0.275 | 0.62398 |
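With a single predictor, R square is simply the square of the correlation coefficient; a quick check:

```python
r = 0.532                          # correlation between readability and sentence length
print(f"r squared = {r ** 2:.3f}")  # matches the 0.283 in the model summary
```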
The ANOVA table informs us whether our regression model explains a statistically significant proportion of the variance. The F-ratio in the ANOVA table (see below) tests whether the regression model is a good fit for the data. Remember, we could add more variables to the model and build up a picture of what affects readability score. The table shows that the independent variable (sentence length) is statistically significant in predicting the dependent variable (readability score), F = 32.035, p < 0.001.
ANOVA

| Model | | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 12.473 | 1 | 12.473 | 32.035 | 0.000 |
| | Residual | 31.538 | 81 | 0.389 | | |
| | Total | 44.011 | 82 | | | |
The coefficients table informs us about the values of the regression line. The column marked B tells us where the line intercepts the y-axis (11.312) and the slope of the line (0.229); the model predicts an increase of 0.229 in readability score for every increase of one word in sentence length. The graph following the coefficients table shows this data and also presents the data from the above table in the form of a formula. The zero intercept is not shown on the graph, but let us put some figures into the formula. The intercept is 11.3, and if we want to know the value for a sentence length of 12, the formula becomes 11.3 + (0.23 × 12) = 14.06, which, looking at the graph, seems about right. The final column is the significance value for the regression coefficient.
Coefficients

| Model | | B (unstandardised) | Std. error | Beta (standardised) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 11.312 | 0.650 | | 17.403 | 0.000 |
| | Sentence length | 0.229 | 0.041 | 0.532 | 5.660 | 0.000 |
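The fitted line in the coefficients table can be used directly for prediction; a minimal sketch:

```python
# Fitted line from the coefficients table: readability = B0 + B1 * sentence_length
intercept, slope = 11.312, 0.229

def predict_readability(sentence_length: float) -> float:
    """Predicted readability score for a given average sentence length."""
    return intercept + slope * sentence_length

print(f"{predict_readability(12):.2f}")  # 11.312 + 0.229 * 12 = 14.06
```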
© 2020 Springer Nature Switzerland AG
Cite this chapter
Flinton DM, Malamateniou C. Quantitative methods and analysis. In: Ramlaul A, editor. Medical imaging and radiotherapy research: skills and strategies. Cham: Springer; 2020. https://doi.org/10.1007/978-3-030-37944-5_15
Print ISBN: 978-3-030-37943-8
Online ISBN: 978-3-030-37944-5