FormalPara Task 1:

Download and install R on your computer.

FormalPara Task 2:

Install the necessary packages.

  a) Install the vcd package

    > install.packages()

You will get a list of servers. Select a server that is close to your country.

You will get a list of available packages. Select vcd.

The installation process starts.

Activate the vcd package from the console: > library(vcd)

  b) Install the FactoMineR package

    > install.packages()

You will get a list of servers. Select a server that is close to your country.

You will get a list of available packages. Select FactoMineR.

The installation process starts.

Activate the FactoMineR package from the console: > library(FactoMineR)

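Beware: R is case-sensitive, so it treats upper- and lower-case letters as different symbols. The package name must be typed exactly as FactoMineR; library(factominer) will fail.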
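In a script, or when no selection dialog appears, the same steps can be written with the package names given explicitly. The following is a minimal sketch and not part of the original task; the variable names pkgs and not_installed are illustrative, and the install line is left commented out so that nothing is downloaded unintentionally.

```r
# Packages used in this exercise; names are case-sensitive
pkgs <- c("vcd", "FactoMineR")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]

if (length(not_installed) > 0) {
  message("Not yet installed: ", paste(not_installed, collapse = ", "))
  # Uncomment the next line to install the missing packages from CRAN
  # install.packages(not_installed)
} else {
  message("All packages are installed; activate them with library().")
}
```

Once the packages are installed, library(vcd) and library(FactoMineR) activate them as described above.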

FormalPara Task 3:

Get some data. We will create a table with some fictional data.

> a <- matrix(c(100,50,100,50,100,200,100,200,25,50,75,100), nrow=4,dimnames=list(c("A","B","C","D"),c("positive","negative","neutral")))

> a

  positive negative neutral
A      100      100      25
B       50      200      50
C      100      100      75
D       50      200     100

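The data are entered column by column: the first four values fill the column positive from row A to row D, the next four fill negative, and the last four fill neutral. The result is assigned to the variable a. If you need to switch the rows and columns, you can do so with the transpose function t().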
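If your data are written down row by row instead, the byrow argument of matrix() lets you type them in that order. The sketch below builds the same table both ways and checks that the results agree; the names vals_by_col, vals_by_row and a_by_row are only illustrative.

```r
# Values in column order, as in the text: positive, then negative, then neutral
vals_by_col <- c(100, 50, 100, 50, 100, 200, 100, 200, 25, 50, 75, 100)
a <- matrix(vals_by_col, nrow = 4,
            dimnames = list(c("A", "B", "C", "D"),
                            c("positive", "negative", "neutral")))

# The same table typed row by row: A, then B, then C, then D
vals_by_row <- c(100, 100,  25,
                  50, 200,  50,
                 100, 100,  75,
                  50, 200, 100)
a_by_row <- matrix(vals_by_row, nrow = 4, byrow = TRUE,
                   dimnames = dimnames(a))

identical(a, a_by_row)   # TRUE: both ways give the same matrix
a["B", "negative"]       # 200: single cells can be looked up by name
dim(t(a))                # 3 4: transposing swaps rows and columns
```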

> t(a)

           A   B   C   D
positive 100  50 100  50
negative 100 200 100 200
neutral   25  50  75 100

Now you have some data. The first thing we do is create association graphs with the function “assoc” from the vcd package.

 > assoc(a, shade=T)

The parameter shade=T asks the function to color-code each cell according to how significantly it deviates from the expected frequency.

You may also transpose rows and columns, using the function t().

> assoc(t(a), shade=T)

The red bars mark cells whose frequencies are lower than expected; the blue bars mark cells whose frequencies are higher than expected. The height of each bar is proportional to the significance of the deviation (the Pearson residual), and the width is proportional to the support, that is, how much data the cell is based on. Note that “negative” has a wider base, because its frequencies are higher (Fig. 1).

Fig. 1
2 association plots between rows and columns, with Pearson's residuals spanning from negative 4.3 to 5.4. Positive residuals and negative residuals indicate higher and lower frequencies than expected, respectively. Height and width of bars are proportional to the significance and level of support.

Association graphs, original and transposed

The graphs allow us to look for associations between rows and columns, and see if the association is higher or lower than expected if rows and columns were statistically independent.
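The numbers behind the association graph can also be inspected directly. As a minimal sketch, the base R function chisq.test() returns the frequencies expected under independence and the Pearson residuals, (observed - expected) / sqrt(expected); the variable names test, expected and pearson are illustrative.

```r
a <- matrix(c(100, 50, 100, 50, 100, 200, 100, 200, 25, 50, 75, 100),
            nrow = 4,
            dimnames = list(c("A", "B", "C", "D"),
                            c("positive", "negative", "neutral")))

test <- chisq.test(a)

# Frequencies expected if rows and columns were statistically independent
expected <- test$expected
round(expected, 1)

# Pearson residuals: positive means more than expected, negative means fewer
pearson <- test$residuals
round(pearson, 1)

# Largest and smallest residual, matching the range shown in Fig. 1
round(range(pearson), 1)   # -4.3  5.4
```

The largest residual belongs to the cell (A, positive) and the smallest to (D, positive), which are the tallest blue and red bars in the graph.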

The second function we will investigate is Correspondence Analysis. The input to this function is also a matrix, just like the association graphs. It is often good to use both association graphs and CA graphs. We simply call the CA function, after we have activated the package FactoMineR.

> CA(a)

> CA(t(a))

Part of the analysis is printed as text in the console, and a graph is also displayed, which you can save.
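If you want to work with the numbers rather than read them off the console, you can store the result in a variable. The sketch below assumes FactoMineR is installed and skips the analysis otherwise; the variable name res is illustrative, and graph = FALSE merely suppresses the plot.

```r
a <- matrix(c(100, 50, 100, 50, 100, 200, 100, 200, 25, 50, 75, 100),
            nrow = 4,
            dimnames = list(c("A", "B", "C", "D"),
                            c("positive", "negative", "neutral")))

if (requireNamespace("FactoMineR", quietly = TRUE)) {
  # Run the correspondence analysis without drawing the graph
  res <- FactoMineR::CA(a, graph = FALSE)

  print(res$eig)        # eigenvalues and percentage of variance per dimension
  print(res$row$coord)  # coordinates of the rows A to D
  print(res$col$coord)  # coordinates of the columns
} else {
  message("FactoMineR is not installed; see Task 2.")
}
```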

CA calculates a coordinate system whose dimensions best explain the variance in the data set, and the CA graph shows the rows and columns in that system. It is a nice way to present very complicated data sets with many different variables. The way to read the graph is to look at the extreme points, those most distant from the origin. These points span the dimensions. We further need to look at the line from each point to the origin. For example, B and negative are highly associated, both because they are close together in the space and because their lines to the origin point in a similar direction. We can also see this directly from the data: the value on negative for B is 200, which is much higher than its values for positive (50) or neutral (50). The real usefulness of CA comes when you have large tables or matrices, with many rows (items) and columns (typically descriptors). Such data are very difficult to grasp, but in the CA graph you can see the structure (Fig. 2).

Fig. 2
Two CA graphs, original and transposed, each plotting dimension 2 (24.83%) against dimension 1 (75.17%). The data points are A, B, C, and D, and the coordinates of each data point differ between the two graphs.

CA graphs, original and transposed. Note that the colors have been switched and the coordinate values of each data point have changed, but the variance explained is the same.
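To see where the percentages on the axes come from, the calculation can be sketched in base R. This is the standard textbook formulation of correspondence analysis as a singular value decomposition of the standardized residuals, written here as an illustration rather than as the code FactoMineR runs internally; all variable names are illustrative.

```r
a <- matrix(c(100, 50, 100, 50, 100, 200, 100, 200, 25, 50, 75, 100),
            nrow = 4,
            dimnames = list(c("A", "B", "C", "D"),
                            c("positive", "negative", "neutral")))

P <- a / sum(a)            # table of relative frequencies
row_mass <- rowSums(P)     # share of the data in each row
col_mass <- colSums(P)     # share of the data in each column

# Standardized residuals: deviation from independence, scaled by expectation
E <- outer(row_mass, col_mass)
S <- (P - E) / sqrt(E)

# Squared singular values are the inertia (variance) of each dimension
inertia <- svd(S)$d^2
pct <- 100 * inertia / sum(inertia)

# Should be close to the 75.17% and 24.83% shown on the axes of Fig. 2
round(pct[1:2], 2)

# Transposing the table leaves the explained variance unchanged
inertia_t <- svd(t(S))$d^2
all.equal(inertia, inertia_t)
```

A table with four rows and three columns has at most two dimensions with non-zero variance, which is why the two axes together account for all of it.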

FormalPara Task 4:

Find your own data sets, and see if association graphs and correspondence analysis help you understand and present the structure in your data.