A correlation is a standardized measure of how strongly two random variables X and Y change together in a linear way. The correlation coefficient is usually denoted r, and its values range from -1 to +1. A strong positive correlation indicates that greater values of one variable correspond to greater values of the other. A strong negative correlation indicates that greater values of one variable correspond to smaller values of the other. A correlation of 0 indicates that there is no linear relationship between X and Y.

Pearson's product moment correlation


Both random variables must be at least interval scaled, and a bivariate normal distribution is required.

[Figure: illustration of a bivariate normal distribution]


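Suppose we have two normally distributed random variables X and Y, observed on n cases.

Let x_i and y_i denote the values of X and Y for case i, and let \bar{x} and \bar{y} denote the sample means.

Then the correlation is defined as:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
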
If the standard deviations s_x and s_y of x and y and the covariance s_{xy} between x and y are known, the correlation can be written as:

$$ r = \frac{s_{xy}}{s_x \, s_y} $$


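A correlation can also always be written as the mean cross-product of the standardized values of x and y:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} z_{y_i}, \qquad z_{x_i} = \frac{x_i - \bar{x}}{s_x}, \quad z_{y_i} = \frac{y_i - \bar{y}}{s_y} $$

where s_x and s_y are the sample standard deviations (computed with n - 1 in the denominator).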
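The equivalence of these forms can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function names are made up for illustration, the data are the five cases from the worked example further down this page, and sample standard deviations with n - 1 in the denominator are assumed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the deviation-score definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_r_from_z(x, y):
    """Same value as the mean cross-product of standardized scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sample standard deviations (n - 1 in the denominator).
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    return sum(p * q for p, q in zip(zx, zy)) / (n - 1)

x = [2, 1, 9, 5, 3]
y = [1, 2, 6, 4, 2]
r_definition = pearson_r(x, y)
r_standardized = pearson_r_from_z(x, y)
print(round(r_definition, 4), round(r_standardized, 4))  # both 0.9487
```

Both routes give the same coefficient because the factor n - 1 cancels between the covariance and the two standard deviations.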




Spearman's rank correlation

Spearman's rank correlation is applied if the random variables X and Y are ordinal scaled.

Spearman's rank correlation is identical to Pearson's product moment correlation computed after the values of X and Y have each been transformed into ranks (from 1 to N).

If there are no tied ranks, it can be rewritten as:

$$ r_s = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N (N^2 - 1)} $$


and standard error:


[Formula: standard error of Spearman's rank correlation]


where d_i denotes the difference between the two ranks of observation i.
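To illustrate the rank-difference form, here is a small Python sketch. It assumes the usual shortcut r_s = 1 - 6 Σd_i² / (N (N² - 1)), which holds only when there are no tied values. The helper names and the data are hypothetical.

```python
import math

def ranks(values):
    """Ranks from 1 to N; assumes no ties (ties would need average ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rs(x, y):
    """Spearman's rank correlation via squared rank differences."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def pearson_r(x, y):
    """Pearson correlation, used here to check the identity on ranks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ordinal data without ties.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rs_shortcut = spearman_rs(x, y)
rs_pearson_on_ranks = pearson_r(ranks(x), ranks(y))
print(rs_shortcut, rs_pearson_on_ranks)
```

With ties present the shortcut is no longer exact, and computing Pearson's correlation on average ranks is the safer route.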



Correlation does not imply causation.

In principle there are four different ways to interpret a correlation between two variables X and Y, provided the correlation is not a coincidence:


X causes Y

Y causes X

X causes Y and Y causes X (bidirectional causation)

There is a third variable Z that causes both X and Y


[Figure: correlation and causation]


The mere fact that X and Y are correlated permits no conclusion about the existence or the direction of a cause-and-effect relationship.


Fisher's Z-transformation

Fisher's Z-transformation is an approximate variance-stabilizing transformation of r when the two random variables X and Y follow a bivariate normal distribution. It is used, for example, when correlation coefficients are averaged and when testing certain hypotheses about correlations.



$$ Z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right) = \operatorname{arctanh}(r) $$


where 'ln' is the natural logarithm function and 'arctanh' is the inverse hyperbolic tangent function.

The standard error of the Z-transformed correlation is


$$ \sigma_Z = \frac{1}{\sqrt{n - 3}} $$


So, Fisher's Z-transformation and its inverse


$$ r = \frac{e^{2Z} - 1}{e^{2Z} + 1} = \tanh(Z) $$


can be used to calculate confidence intervals for correlation coefficients.
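A confidence interval along these lines can be sketched in Python as follows. The sketch assumes Z = arctanh(r), a standard error of 1/sqrt(n - 3), and a normal approximation on the Z scale. The function names and the example values r = 0.5, n = 103 are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher's Z-transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def inverse_fisher_z(z):
    """Back-transformation from Z to r."""
    return math.tanh(z)

def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation coefficient."""
    z = fisher_z(r)
    se = 1 / math.sqrt(n - 3)                     # standard error on the Z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95 %
    return inverse_fisher_z(z - crit * se), inverse_fisher_z(z + crit * se)

# Hypothetical sample: r = 0.5 observed in n = 103 cases.
lower, upper = correlation_ci(0.5, 103)
print(lower, upper)
```

The interval is symmetric on the Z scale but not on the r scale, which is why the limits are back-transformed rather than computed directly from r.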


Averaging correlations

If the correlations originate from equal-sized samples, you can simply average the Z-transformed correlation coefficients and back-transform the mean with the inverse transformation.

If the sample sizes are not equal, the following weighted mean applies:


$$ \bar{Z} = \frac{\sum_{j=1}^{k} (n_j - 3) \, Z_j}{\sum_{j=1}^{k} (n_j - 3)} $$


where Z_j are the Z-transformed correlation coefficients and n_j are the corresponding sample sizes. The averaged correlation is the back-transformed value of \bar{Z}.
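As a sketch of this averaging procedure in Python, the code below assumes each Z_j is weighted by n_j - 3, the inverse of its sampling variance. The function name and the input values are hypothetical.

```python
import math

def average_correlation(rs, ns):
    """Average correlations via Fisher's Z, weighting each Z_j by n_j - 3."""
    weights = [n - 3 for n in ns]
    zs = [math.atanh(r) for r in rs]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)  # back-transform the mean Z to a correlation

# Hypothetical correlations from three samples of different size.
r_bar = average_correlation([0.3, 0.5, 0.7], [20, 50, 100])

# With equal sample sizes the weights cancel and a plain mean of Z remains.
r_equal = average_correlation([0.2, 0.6], [30, 30])
print(r_bar, r_equal)
```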


Testing correlation hypotheses


Case A) testing H0: ρ=0

This is by far the most common case. Usually one wants to know whether an observed sample correlation r differs significantly from a hypothesized zero correlation in the population (ρ = 0).

In such a case the following t-test applies:


$$ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} $$


The t-value has (n-2) degrees of freedom.
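A minimal Python sketch of this test statistic, assuming the usual form t = r * sqrt(n - 2) / sqrt(1 - r²). The function name and the example values are hypothetical, and the resulting t still has to be compared with a critical value from a t-table or a statistics package.

```python
import math

def t_for_zero_correlation(r, n):
    """t statistic and degrees of freedom for H0: rho = 0."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2

# Hypothetical sample: r = 0.5 observed in n = 27 cases.
t_value, df = t_for_zero_correlation(0.5, 27)
print(round(t_value, 3), df)  # 2.887 25
```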


Case B) testing H0: ρ = ρ0, with ρ0 ≠ 0

Sometimes you want to test whether an observed correlation r differs from a well-known population correlation ρ0 that is itself different from zero.

In that case you can calculate the following z-value, which follows the standard normal distribution under H0 (CAUTION: do not confuse the z-value of the standard normal distribution with Fisher's Z-values):


$$ z = (Z - Z_0) \sqrt{n - 3} $$


Z = Fisher's Z-transformation of the given correlation

Z0 = Fisher's Z-transformation of the well known population correlation
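This test can be sketched in Python as follows, assuming the statistic z = (Z - Z0) * sqrt(n - 3) and a two-sided p-value from the standard normal distribution. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_against_rho0(r, rho0, n):
    """z statistic and two-sided p-value for H0: rho = rho0."""
    z_r = math.atanh(r)        # Fisher's Z of the observed correlation
    z_0 = math.atanh(rho0)     # Fisher's Z of the hypothesized correlation
    z = (z_r - z_0) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical: r = 0.6 in n = 103 cases, tested against rho0 = 0.4.
z_value, p_value = z_test_against_rho0(0.6, 0.4, 103)
print(round(z_value, 3), p_value)
```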


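Case C) testing H0: ρ1 = ρ2

If you want to test whether two correlation coefficients from two independent samples differ significantly, the following z-value is applicable, where Z_1 and Z_2 are the Z-transformed correlations and n_1 and n_2 the corresponding sample sizes:


$$ z = \frac{Z_1 - Z_2}{\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}} $$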
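A short Python sketch of this comparison, assuming the statistic z = (Z1 - Z2) / sqrt(1/(n1 - 3) + 1/(n2 - 3)) with Fisher Z-transformed correlations. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_two_correlations(r1, n1, r2, n2):
    """z statistic and two-sided p-value for H0: rho1 = rho2 (independent samples)."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical: r1 = 0.6 and r2 = 0.4, each from 53 cases.
z_value, p_value = z_test_two_correlations(0.6, 53, 0.4, 53)
print(round(z_value, 3), p_value)
```

Note that the same pair of correlations that differs clearly from a fixed ρ0 in a large sample may not differ significantly from each other, because both coefficients carry sampling error here.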


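Case D) testing H0: ρ1 = ρ2 = ... = ρk

If you want to test whether k correlation coefficients from k independent samples differ significantly, the following χ2-distributed value is applicable as the test value:


$$ V = \sum_{j=1}^{k} (n_j - 3)(Z_j - U)^2, \qquad U = \frac{\sum_{j=1}^{k} (n_j - 3) \, Z_j}{\sum_{j=1}^{k} (n_j - 3)} $$

where Z_j are the Z-transformed correlation coefficients and n_j are the corresponding sample sizes.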


The χ2-value has k-1 degrees of freedom.
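As an illustration, the homogeneity statistic can be sketched in Python. The code assumes the form V = Σ (n_j - 3)(Z_j - U)², where U is the mean of the Z_j weighted by n_j - 3. The function name and the data are hypothetical, and the result still has to be compared with a critical χ2 value for k - 1 degrees of freedom.

```python
import math

def homogeneity_statistic(rs, ns):
    """Chi-square-distributed statistic and df for H0: rho1 = ... = rhok."""
    weights = [n - 3 for n in ns]
    zs = [math.atanh(r) for r in rs]
    u = sum(w * z for w, z in zip(weights, zs)) / sum(weights)  # weighted mean Z
    v = sum(w * (z - u) ** 2 for w, z in zip(weights, zs))
    return v, len(rs) - 1

# Hypothetical: two correlations from samples of 53 cases each.
v_value, df = homogeneity_statistic([0.6, 0.4], [53, 53])
print(round(v_value, 3), df)
```

For k = 2 the statistic equals the square of the z-value used for comparing two independent correlations.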

Example of a Correlation


      x     y    x²    y²   x·y
      2     1     4     1     2
      1     2     1     4     2
      9     6    81    36    54
      5     4    25    16    20
      3     2     9     4     6
Σ    20    15   120    61    84


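With n = 5 cases, the correlation is:

$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left(n \sum x^2 - (\sum x)^2\right)\left(n \sum y^2 - (\sum y)^2\right)}} = \frac{5 \cdot 84 - 20 \cdot 15}{\sqrt{(5 \cdot 120 - 20^2)(5 \cdot 61 - 15^2)}} = \frac{120}{\sqrt{200 \cdot 80}} \approx 0.949 $$
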

We get a positive correlation, so greater values in x correspond to greater values in y. The following figure illustrates this:


[Figure: scatterplot of x and y]


The t-value for testing whether the correlation differs significantly from a zero correlation is:


$$ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.949 \cdot \sqrt{3}}{\sqrt{1 - 0.9}} \approx 5.196 $$


Because this t-value is greater than the critical t-value for a two-sided test (t(df=3, alpha=0.05)=3.182), we can say that the obtained correlation coefficient differs significantly from zero.
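The example can be reproduced from the column sums of the table with a few lines of Python. This sketch assumes the computational (raw-score) form of Pearson's r and the t statistic with n - 2 degrees of freedom. The variable names are chosen for illustration, and the critical value 3.182 is the one quoted in the text.

```python
import math

# Column sums from the example table (n = 5 cases).
n = 5
sum_x, sum_y = 20, 15
sum_x2, sum_y2, sum_xy = 120, 61, 84

# Raw-score form of Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# t statistic for H0: rho = 0 with n - 2 = 3 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

critical_t = 3.182  # two-sided, df = 3, alpha = 0.05 (value quoted in the text)
significant = t > critical_t
print(round(r, 4), round(t, 3), significant)  # 0.9487 5.196 True
```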


BrightStat output of the correlation example






