# Correlation

A correlation between two random variables X and Y is a standardized measure of how strongly the two variables change together in a linear way. It is usually denoted 'r', and its values range from -1 to +1. A strong positive correlation indicates that greater values in one variable correspond to greater values in the other variable. A strong negative correlation indicates that greater values in one variable correspond to smaller values in the other variable. A correlation of 0 indicates that there is no linear relationship between X and Y.

## Requirements:

Both random variables must be at least interval scaled, and a bivariate normal distribution is required.

Illustration of a bivariate normal distribution

## Calculation:

Suppose we have two normally distributed random variables x and y, where x_i and y_i denote the values of x and y for case i.

The correlation is then defined as:
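In standard notation, with n the number of cases and $\bar{x}$, $\bar{y}$ the sample means (symbols assumed here), the Pearson product-moment correlation can be written in a deviation form and in an equivalent computational form based on raw sums:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n \sum x_i^2 - \left(\sum x_i\right)^2\right)\left(n \sum y_i^2 - \left(\sum y_i\right)^2\right)}}
$$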

If the standard deviations of x and y and the covariance between x and y are known, the correlation can be written as:
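Writing $s_{xy}$ for the covariance and $s_x$, $s_y$ for the standard deviations (assumed symbols, all computed with the same divisor), this reads:

$$
r = \frac{s_{xy}}{s_x \, s_y}
$$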

A correlation can also always be written as the average cross-product of the standardized values of x and y:
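A sketch of this form, assuming the standardized values are $z_{x_i} = (x_i - \bar{x})/s_x$ and $z_{y_i} = (y_i - \bar{y})/s_y$ with standard deviations computed using the divisor n - 1 (with the divisor n, the leading factor becomes 1/n instead):

$$
r = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} \, z_{y_i}
$$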

## Spearman's rank correlation

Spearman's rank correlation is applied if the random variables X and Y are ordinal scaled.

Spearman's rank correlation is identical to Pearson's product-moment correlation computed after the values of both X and Y have been transformed into ranks (values ranging from 1 to N).

It can be rewritten as:
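The usual shortcut form, which holds exactly only when there are no tied ranks (an assumption made here), is:

$$
r_s = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N \left(N^2 - 1\right)}
$$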

and standard error:

where d_i denotes the rank difference of observation i.
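The equivalence between Pearson's correlation on ranks and the rank-difference shortcut can be checked with a minimal sketch. The helper names `ranks`, `pearson`, and `spearman_shortcut` and the sample data are hypothetical, and the sketch assumes there are no tied values.

```python
def ranks(values):
    """Assign ranks 1..N to the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(x, y):
    """Pearson product-moment correlation in deviation form."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_shortcut(x, y):
    """Spearman's rank correlation from squared rank differences (no ties)."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Made-up illustration data without ties
x = [10.5, 3.2, 7.7, 1.0, 8.8]
y = [4.0, 2.5, 9.1, 3.3, 6.0]

r_on_ranks = pearson(ranks(x), ranks(y))
r_shortcut = spearman_shortcut(x, y)
print(r_on_ranks, r_shortcut)  # both values agree
```

With tied values, average ranks would be needed and the shortcut form no longer matches Pearson's correlation on ranks exactly.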

## Interpretation

Correlation does not imply causation.

In principle, there are four different ways to interpret a correlation between two variables X and Y, provided the correlation is not a coincidence:

- X causes Y
- Y causes X
- X causes Y and Y causes X (bidirectional causation)
- There is a third variable Z that causes both X and Y

No conclusion about the existence or the direction of a cause-and-effect relationship can be drawn solely from the fact that X and Y are correlated.

## Fisher's Z-transformation

Fisher's Z-transformation is an approximate variance-stabilizing transformation of r when the two random variables X and Y follow a bivariate normal distribution. It is used, for example, when correlation coefficients are averaged and when certain hypotheses about correlations are tested.

## Calculation:
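With r the correlation coefficient, the transformation is commonly written as:

$$
Z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right) = \operatorname{arctanh}(r)
$$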

where 'ln' is the natural logarithm function and 'arctanh' is the inverse hyperbolic tangent function.

The standard error of the Z-transformed correlation is:
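For a sample of n cases (the symbol n is assumed here), the usual large-sample approximation is:

$$
\sigma_Z = \frac{1}{\sqrt{n - 3}}
$$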

Thus Fisher's Z-transformation and its inverse, $r = \tanh(Z) = (e^{2Z} - 1)/(e^{2Z} + 1)$, can be used to calculate confidence intervals for correlation coefficients.
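A minimal sketch of such a confidence interval follows. It assumes the standard error 1/sqrt(n - 3) for the transformed value and a normal approximation. The function name `fisher_ci` and the example values r = 0.5, n = 28 are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation via Fisher's Z.

    Assumes bivariate normal data and the standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r)                      # Fisher's Z-transformation
    se = 1.0 / math.sqrt(n - 3)            # standard error of Z
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower_z, upper_z = z - crit * se, z + crit * se
    # Transform the interval limits back to the correlation scale
    return math.tanh(lower_z), math.tanh(upper_z)


lower, upper = fisher_ci(0.5, 28)
print(round(lower, 3), round(upper, 3))
```

The interval is symmetric around Z but generally not symmetric around r, because the back-transformation is nonlinear.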

## Averaging correlations

If the correlations originate from equal-sized samples, you can simply average the Z-transformed correlation coefficients and apply the inverse transformation to that average.

If sample sizes are not equal the following formula applies:
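A common form, given here as an assumption, weights each Z-value by n_j - 3, the inverse of its sampling variance. The averaged correlation is then obtained by applying the inverse transformation to the weighted mean:

$$
\bar{Z} = \frac{\sum_{j=1}^{k} (n_j - 3) \, Z_j}{\sum_{j=1}^{k} (n_j - 3)}
$$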

where Z_j are the Z-transformed correlation coefficients and n_j are the corresponding sample sizes.
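The averaging procedure can be sketched as follows. The weights n_j - 3 are an assumption (the inverse sampling variance of each Z-value), and the function name `average_correlations` and the input values are hypothetical.

```python
import math


def average_correlations(rs, ns):
    """Average correlations via Fisher's Z, weighting each Z_j by n_j - 3."""
    zs = [math.atanh(r) for r in rs]           # Z-transform each correlation
    weights = [n - 3 for n in ns]              # assumed weights
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)                    # back to the correlation scale


# Three hypothetical correlations from samples of different sizes
r_avg = average_correlations([0.30, 0.45, 0.60], [20, 50, 35])
print(round(r_avg, 3))
```

With equal sample sizes the weights cancel, so the result reduces to the inverse transformation of the simple mean of the Z-values.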

## Testing correlation hypotheses

#### Case A) testing H0: ρ=0

This is by far the most common case. Normally one is interested in whether an observed correlation r differs significantly from a hypothesized zero correlation in the population (ρ = 0).

In such a case the following t-test applies:
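With r the observed correlation and n the sample size (assumed symbols), the test statistic is usually written as:

$$
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}
$$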

The t-value has (n-2) degrees of freedom.

#### Case B) testing H0: ρ=ρ0≠0

Sometimes you want to test whether an observed correlation r differs from a well-known population correlation ρ0 that is itself different from zero.

In that case you can calculate the following z-value, which follows the standard normal distribution (CAUTION: do not confuse this z-value with Fisher's Z-values):
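Assuming the standard error 1/sqrt(n - 3) for a Z-transformed correlation from a sample of size n, the statistic takes the form:

$$
z = \frac{Z - Z_0}{\sqrt{\dfrac{1}{n - 3}}} = (Z - Z_0) \sqrt{n - 3}
$$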

Z = Fisher's Z-transformation of the given correlation

Z0 = Fisher's Z-transformation of the well known population correlation

#### Case C) testing H0: ρ1=ρ2

If you want to test if two correlation coefficients from two independent samples differ significantly, the following z-value is applicable:
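With Z_1 and Z_2 the Z-transformed correlations and n_1 and n_2 the two sample sizes (assumed symbols), the usual form is:

$$
z = \frac{Z_1 - Z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}
$$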

#### Case D) testing H0: ρ1=ρ2=...=ρk

If you want to test whether k correlation coefficients from k independent samples differ significantly, the following χ²-distributed test statistic is applicable:
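A common form of this statistic, given here as an assumption, is the weighted sum of squared deviations of the Z-values from their weighted mean U:

$$
V = \sum_{j=1}^{k} (n_j - 3) \left(Z_j - U\right)^2,
\qquad
U = \frac{\sum_{j=1}^{k} (n_j - 3) \, Z_j}{\sum_{j=1}^{k} (n_j - 3)}
$$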

The χ² value has k - 1 degrees of freedom.
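A minimal sketch of this homogeneity test statistic follows, assuming the weighted form V = Σ (n_j - 3)(Z_j - U)² with U the weighted mean of the Z-values. The function name `homogeneity_statistic` and the input values are hypothetical, and the comparison against a critical χ² value is left to a table or a statistics library.

```python
import math


def homogeneity_statistic(rs, ns):
    """Return (V, df) for testing H0: rho_1 = rho_2 = ... = rho_k.

    Assumes independent samples and weights n_j - 3 for each Z-value.
    """
    zs = [math.atanh(r) for r in rs]               # Fisher's Z-values
    weights = [n - 3 for n in ns]
    u = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    v = sum(w * (z - u) ** 2 for w, z in zip(weights, zs))
    return v, len(rs) - 1


v, df = homogeneity_statistic([0.20, 0.45, 0.60], [30, 40, 50])
print(round(v, 3), df)
```

For k = 2 this statistic equals the square of the z-value used to compare two independent correlations.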

## Example of a Correlation

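|   | x | y | x² | y² | x·y |
|---|---|---|----|----|-----|
|   | 2 | 1 | 4 | 1 | 2 |
|   | 1 | 2 | 1 | 4 | 2 |
|   | 9 | 6 | 81 | 36 | 54 |
|   | 5 | 4 | 25 | 16 | 20 |
|   | 3 | 2 | 9 | 4 | 6 |
| Σ | 20 | 15 | 120 | 61 | 84 |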

The correlation is then:
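Using the computational form of the Pearson correlation with n = 5 and the column sums from the table, the arithmetic works out as:

$$
r = \frac{5 \cdot 84 - 20 \cdot 15}{\sqrt{\left(5 \cdot 120 - 20^2\right)\left(5 \cdot 61 - 15^2\right)}}
  = \frac{120}{\sqrt{200 \cdot 80}}
  = \frac{120}{\sqrt{16000}}
  \approx 0.949
$$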

We get a positive correlation, so greater values in x correspond to greater values in y. The following figure illustrates this:

The t-value for testing if the correlation is significantly different from a zero-correlation is:
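With r ≈ 0.9487 (so that r² = 0.9) and n = 5, the usual t statistic for a correlation gives:

$$
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}
  = \frac{0.9487 \cdot \sqrt{3}}{\sqrt{1 - 0.9}}
  \approx 5.196
$$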

Because this t-value is greater than the critical t-value for a two-sided test (t(df=3, alpha=0.05)=3.182), we can say that the obtained correlation coefficient differs significantly from zero.
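The example can be reproduced with a short sketch that uses the raw-sum form of the Pearson correlation and the t statistic with n - 2 degrees of freedom. The variable names are chosen for illustration only.

```python
import math

# Data from the example table
x = [2, 1, 9, 5, 3]
y = [1, 2, 6, 4, 2]
n = len(x)

# Column sums as in the table
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Pearson correlation (computational form)
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r, 4), round(t, 3))  # prints: 0.9487 5.196
```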

BrightStat output of the correlation example

## References

Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263.

Pearson, K. (1895). Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.

Fisher, R.A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4), 507–521.

Bortz, J. (2005). Statistik für Human- und Sozialwissenschaftler (6th Edition). Heidelberg: Springer Medizin Verlag.
