Comparing Two Distributions – Correlation Coefficient (Statistics for Psychologists)

So far, we have looked at:

  • Central Tendency: A single value that reflects the nature of a distribution.
  • Dispersion: A number that describes the ‘spread’ of a distribution.

Notice that both of these measure properties of a single distribution.

But what if we have to compare two distributions?


Suppose one of your friends said, “Tall people tend to weigh more in my family”. It sounds intuitively reasonable, but is it true scientifically?

To check it, you and your friend collected information from your friend’s family: Dad, mom, your friend and his two siblings. You can find the average height and average weight separately, and also how ‘spread’ each set of data is (a measure of dispersion). But how will you compare the “Heights” table with the “Weights” table and draw conclusions?

For that, we have a tool called the correlation coefficient.

Correlation Coefficient

The correlation coefficient (developed by Karl Pearson) is a number that indicates how closely related two sets of data are. For n pairs of values (X, Y), it is given by:

r_{XY} = {\frac1n\sum XY - \bar X \cdot \bar Y\over \sqrt{[\frac1n\sum X^2-\bar X^2][\frac1n\sum Y^2-\bar Y^2]}}

Example 1
Calculate the correlation coefficient between the marks in Test 1 (X) and Test 2 (Y).



∑X = 33, ∑Y = 24, ∑XY = 148, ∑X² = 223, ∑Y² = 146

Calculating means,

\bar X = \frac{33}5=6.6

\bar Y = \frac{24}5=4.8

r_{XY} = {\frac{148}5 - 6.6\times 4.8\over\sqrt{[\frac{223}5-6.6^2][\frac{146}5-4.8^2]}}=-0.8218
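
The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the function name `r_from_sums` and its parameter names are made up for this post, and all it does is plug n and the five sums into the formula.

```python
from math import sqrt

def r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Correlation coefficient computed from n and the five sums in the formula."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    # Numerator: mean of the products minus the product of the means
    numerator = sum_xy / n - mean_x * mean_y
    # Each bracket in the denominator: mean of the squares minus the squared mean
    var_x = sum_x2 / n - mean_x ** 2
    var_y = sum_y2 / n - mean_y ** 2
    return numerator / sqrt(var_x * var_y)

# Sums from Example 1 (n = 5)
r = r_from_sums(5, 33, 24, 148, 223, 146)
print(round(r, 4))  # -0.8218
```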

Meaning of Correlation Coefficient

We got a value for the correlation coefficient, and it is negative. What does that mean?

A correlation has one of two directions: positive or negative. If it is positive, the two sets of values go up together. If it is negative, one goes up while the other goes down.

From MathIsFun

In example 1, the scatter plot (plotting the points (x,y) on a graph) will give us a plot like this:

Clearly, the values of Y decrease as the values of X increase. Remember: we got a correlation coefficient of -0.8218!

But if it is only about checking whether Y increases or decreases with X, can’t we just look at the data or the scatter plot? Not exactly. When the data set is bigger, or only ‘weakly’ correlated, such trends are hard to spot by eye. The correlation coefficient also puts a number on the degree of relation between the two distributions. So, it’s a very useful tool!

Simplifying Calculations

Example 2
Find the correlation coefficient of the following marks scored by 5 students in two exams:



∑X = 148, ∑Y = 154, ∑XY = 6096, ∑X² = 5742, ∑Y² = 6526

If you continue to calculate the correlation coefficient, you will get the answer as r_{XY}=0.9987

But big numbers look scary! Remember the trick we used for standard deviation? It turns out to work for correlation, too!

Table columns: X, Y, U = X – 34, V = Y – 35, UV, U², V²

∑U = 12, ∑V = 14, ∑UV = 440, ∑U² = 302, ∑V² = 646

Now it’s easier! 🙂

Continue the calculation and find r_{XY} = r_{UV} (Answer will be 0.9987)

Take a look at what happens to the scatter plot:

Notice that the trend in the scatter plot stays the same: Y (and V) increases as X (and U) increases!

The Trick

  • Choose two numbers:
    • A (preferably the median of X),
    • B (preferably the median of Y)
  • Let U = X – A and V = Y – B
  • Find r_{UV}
  • Then r_{UV} = r_{XY}
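
The reason the trick works is that subtracting a constant slides every point by the same amount: the means shift by that constant, but the spread and the way the two sets move together are untouched. Here is a small Python sketch of the steps above. The marks in `X` and `Y` are made up purely for illustration (they are not the data from Example 2), and `pearson_r` is a hypothetical helper name that just applies the formula from earlier in the post.

```python
from math import sqrt
from statistics import median

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length lists, using the formula above."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy / n - mean_x * mean_y
    denominator = sqrt((sum_x2 / n - mean_x ** 2) * (sum_y2 / n - mean_y ** 2))
    return numerator / denominator

# Made-up marks, for illustration only
X = [22, 25, 31, 34, 40]
Y = [24, 28, 30, 36, 41]

# Step 1: choose A and B (here, the medians)
A = median(X)
B = median(Y)

# Step 2: shift both sets
U = [x - A for x in X]
V = [y - B for y in Y]

# Steps 3 and 4: r_UV should match r_XY (up to floating-point rounding)
r_xy = pearson_r(X, Y)
r_uv = pearson_r(U, V)
print(r_xy, r_uv)
```

Any choice of A and B gives the same r; the medians are just convenient because they keep U and V small.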

Example 3:
Here are the heights and weights of 7 random entries from this Kaggle dataset. Find the correlation coefficient and comment on the correlation between height and weight.

Height (X): 174, 189, 185, 195, 189, 195, 192
Weight (Y): 96, 87, 110, 104, 104, 81, 101

(Try it yourself)
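
Once you have worked it out by hand, a short Python script like the one below can be used to check your answer. It is a sketch that applies the same formula to the seven pairs above; the variable names are arbitrary, and the result is deliberately not shown here so that you can compare it with your own.

```python
from math import sqrt

heights = [174, 189, 185, 195, 189, 195, 192]  # X
weights = [96, 87, 110, 104, 104, 81, 101]     # Y

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# The five sums that appear in the formula
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_x2 = sum(x * x for x in heights)
sum_y2 = sum(y * y for y in weights)

numerator = sum_xy / n - mean_x * mean_y
denominator = sqrt((sum_x2 / n - mean_x ** 2) * (sum_y2 / n - mean_y ** 2))
r = numerator / denominator

print(f"r = {r:.4f}")
```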

Some Properties of Correlation Coefficient

  • r_{XY} = r_{YX}
  • -1 ≤ r ≤ 1
  • r > 0 when Y tends to increase as X increases
  • r = 0 when X and Y have no linear relationship
  • r < 0 when Y tends to decrease as X increases
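
One caution about r = 0: the coefficient only picks up straight-line trends, so two sets of values can be clearly connected and still give r = 0. The sketch below uses a tiny made-up data set where Y is exactly X squared, and also checks the symmetry property. `pearson_r` is a hypothetical helper name that applies the formula from earlier in the post.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length lists, using the formula above."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy / n - mean_x * mean_y
    denominator = sqrt((sum_x2 / n - mean_x ** 2) * (sum_y2 / n - mean_y ** 2))
    return numerator / denominator

# Made-up values: Y is fully determined by X, but not along a straight line
X = [-2, -1, 0, 1, 2]
Y = [x * x for x in X]  # [4, 1, 0, 1, 4]

r_xy = pearson_r(X, Y)
r_yx = pearson_r(Y, X)  # swapping the roles of X and Y gives the same value
print(r_xy, r_yx)  # both 0.0
```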
