Rank Correlation and Lines of Regression (Statistics for Psychologists)

The idea of correlation in the previous notes was proposed by a great statistician, Karl Pearson, who was from England. An interesting fact about Dr. Pearson is that his first name was originally spelled “Carl”, but it was copied down wrongly by someone during his college admission, so he ended up changing it to “Karl”. He was a statistician, biostatistician and philosopher. Albert Einstein talked about how one of Pearson’s books, “The Grammar of Science”, influenced the formulation of the Theory of Relativity! Pearson was also the one who started the first Department of Statistics in the entire world!

Karl Pearson
Francis Galton
Charles Spearman

Pearson was a huge fan of another scientist named Sir Francis Galton. Sir Francis was a statistician, polymath, sociologist, psychologist, … , tropical explorer, geographer, inventor, meteorologist and psychometrician! I’ve skipped nearly 5 fields in this list, to save space! Sir Francis was Pearson’s superhero – to the extent that Pearson wrote a 3-volume biography of Galton!

Pearson wrongly predicted that Sir Francis would be known as the greatest scientist of his time, but that credit was stolen by one of Sir Francis’s cousins, Charles Darwin!

But Sir Francis had many fans, including famous mathematicians like Pearson. Charles Spearman was a military officer who later studied psychology, but he is better known for his contributions to statistics.

One of his contributions is a simpler formula for the correlation coefficient when we are comparing ranks (from 1 to n) in two different cases, for example the ranks of a class in Sem 1 and Sem 2. Since ranks are the numbers from 1 to n for n observations, he simplified the correlation coefficient formula as follows:

If (x_i, y_i), i = 1, …, n are the ranks in two events, Spearman’s Rank Correlation is given by

\rho = 1- {6\sum d_i^2\over n(n^2-1)}

where d_i = x_i – y_i, i = 1, 2, …, n. ρ is a Greek letter, read as ‘rho’.
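For readers who want to see why the simplification works, here is a short sketch of the derivation. It assumes there are no tied ranks, so each set of ranks is exactly the numbers 1 to n, and it uses the standard deviation computed by dividing by n. The symbols \bar{x}, \bar{y}, \sigma_X, \sigma_Y and r have their usual meanings (means, standard deviations and Pearson's correlation coefficient of the ranks).

```latex
\bar{x} = \bar{y} = \frac{n+1}{2}, \qquad
\sigma_X^2 = \sigma_Y^2 = \frac{n^2-1}{12}

\sum d_i^2 = \sum \left[(x_i-\bar{x}) - (y_i-\bar{y})\right]^2
           = n\sigma_X^2 + n\sigma_Y^2 - 2n\,r\,\sigma_X\sigma_Y
           = \frac{n(n^2-1)}{6}\,(1-r)

r = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = \rho
```

So under these assumptions ρ is nothing but Pearson's r computed on the ranks.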

Note that the Rank Correlation Coefficient formula can only be used on ranks, not on raw scores.
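If you would like to check your hand calculations, the formula can be sketched in a few lines of Python. The function name `spearman_rho` is just a name chosen for this illustration, and the sketch assumes both lists contain the ranks 1 to n with no ties.

```python
def spearman_rho(x_ranks, y_ranks):
    """Spearman's rank correlation for two lists of untied ranks 1..n (n >= 2)."""
    if len(x_ranks) != len(y_ranks):
        raise ValueError("both rankings must have the same length")
    n = len(x_ranks)
    # d_i = x_i - y_i, squared and added up
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Two extreme cases: identical rankings and completely reversed rankings
same_order = spearman_rho([1, 2, 3, 4], [1, 2, 3, 4])
reverse_order = spearman_rho([1, 2, 3, 4], [4, 3, 2, 1])
print(same_order, reverse_order)  # 1.0 -1.0
```

Identical rankings give ρ = 1 and completely reversed rankings give ρ = –1, which are the two extreme values ρ can take.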

Example 1
Madam Jo, Jesse Sir, Rajdeep & Phin participated in both a singing competition and a maths quiz contest. The judges’ rankings are given below. Find the rank correlation coefficient.

Participant | Madam Jo | Jesse Sir | Rajdeep | Phin
Singing (X) | 1 | 4 | 3 | 2
Maths Quiz (Y) | 4 | 2 | 1 | 3

Solution:

Participant | Madam Jo | Jesse Sir | Rajdeep | Phin
Singing (X) | 1 | 4 | 3 | 2
Maths Quiz (Y) | 4 | 2 | 1 | 3
d = x – y | -3 | 2 | 2 | -1
d² | 9 | 4 | 4 | 1

n = 4

∑d² = 18

\rho = 1- {6\sum d_i^2\over n(n^2-1)} =  1 - {6\times 18\over 4(4^2 - 1)} = -0.8

Since ρ is close to –1, one conclusion we can draw from this problem is that among these four people, those who ranked low in singing tended to rank high in the maths quiz, and vice versa. Note that this is true only among these four people, not in general.

Example 2
Ten competitors in a musical test were ranked by the three judges A, B and C in the following order. Using the rank correlation method, discuss which pair of judges has the nearest approach to common liking in music.

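Ranks by / Candidates | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10
A (X) | 1 | 6 | 5 | 10 | 3 | 2 | 4 | 9 | 7 | 8
B (Y) | 3 | 5 | 8 | 4 | 7 | 10 | 2 | 1 | 6 | 9
C (Z) | 6 | 4 | 9 | 8 | 1 | 2 | 3 | 10 | 5 | 7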

Try it yourself! Note that you will first have to find the rank correlation between Judges A and B, then between B and C, and finally between A and C. Then analyse the answers you get!
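Once you have worked it out by hand, you can check your three answers with a small Python sketch like the one below. The helper `spearman_rho` and the dictionary names are only illustrative choices, and the sketch assumes no tied ranks (which is true for this table). The pair with the largest ρ is the pair of judges whose tastes are closest.

```python
from itertools import combinations


def spearman_rho(x_ranks, y_ranks):
    """Spearman's rank correlation for two lists of untied ranks 1..n."""
    n = len(x_ranks)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Ranks given by each judge to candidates C1..C10 (from the table above)
judges = {
    "A": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "B": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "C": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

# Compare every pair of judges: AB, AC and BC
results = {}
for (name1, ranks1), (name2, ranks2) in combinations(judges.items(), 2):
    results[name1 + name2] = spearman_rho(ranks1, ranks2)

for pair, rho in results.items():
    print(pair, round(rho, 3))

# The pair with the highest rho has the nearest approach to common liking
closest = max(results, key=results.get)
print("Closest pair of judges:", closest)
```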

Example 3
The marks in Physics and Mathematics are given below. Find the rank correlation coefficient.

Student | Jesse | Naba | Wanda | Russell | Mitra
Physics | 30 | 22 | 35 | 40 | 36
Maths | 47 | 45 | 35 | 48 | 50

Note that the table gives you the marks, not the ranks. First you have to rank the students in each subject, then find the rank correlation coefficient! Try it yourself!
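The ranking step can be sketched in Python as follows, so that you can check your own working. The helper names `ranks_from_marks` and `spearman_rho` are made up for this illustration. The sketch assumes that rank 1 goes to the highest mark and that no two students have the same mark in a subject (true for this table). Ranking from the lowest mark instead would give the same ρ, as long as you do it the same way in both subjects.

```python
def ranks_from_marks(marks):
    """Rank 1 goes to the highest mark; assumes all marks are different."""
    ordered = sorted(marks, reverse=True)
    return [ordered.index(m) + 1 for m in marks]


def spearman_rho(x_ranks, y_ranks):
    """Spearman's rank correlation for two lists of untied ranks 1..n."""
    n = len(x_ranks)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Marks for Jesse, Naba, Wanda, Russell and Mitra (from the table above)
physics = [30, 22, 35, 40, 36]
maths = [47, 45, 35, 48, 50]

physics_ranks = ranks_from_marks(physics)
maths_ranks = ranks_from_marks(maths)
rho = spearman_rho(physics_ranks, maths_ranks)

print("Physics ranks:", physics_ranks)
print("Maths ranks:  ", maths_ranks)
print("rho =", round(rho, 3))
```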


Since rank correlation is just Pearson’s correlation coefficient applied to ranks, –1 ≤ ρ ≤ 1.

Tied Ranks

If two people share the same rank (a tied rank), you will have to adjust the formula to get the right answer. We will not discuss that in detail now!

Lines of Regression

In this data set, we are comparing the heights of fathers (X) with those of their sons (Y).

X | 65 | 66 | 67 | 67 | 68
Y | 67 | 68 | 65 | 72 | 72

The scatter plot for this particular problem looks like

While correlation gives us some insights, looking at a line which is closest to all the points helps us understand the relationship better.

This line is called the line of regression. It not only helps us quantify the strength of the relationship between X and Y – it also helps us predict/forecast a value of one variable from the other.

Regression involves drawing a line that’s closest to all data points. We call it ‘Linear Regression’ if it’s a straight line.

Formulas of Lines of Regression

Line of Regression of y on x: y-\bar{y} = r {\sigma_Y\over\sigma_X}(x-\bar{x})

Line of Regression of x on y: x-\bar{x} = r {\sigma_X\over\sigma_Y}(y-\bar{y})

where \sigma_X is the standard deviation of X and \sigma_Y is the standard deviation of Y. r is the correlation coefficient of X and Y.

Note that, unlike the correlation coefficient, the lines of regression are not the same for X on Y and Y on X.
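To see the two formulas in action, here is a minimal Python sketch that applies them to the fathers-and-sons data above. All variable and function names are chosen just for this illustration. The sketch assumes the standard deviations are computed by dividing by n. Dividing by n – 1 instead would give the same slopes, because only the ratio of the two standard deviations is used.

```python
from math import sqrt

fathers = [65, 66, 67, 67, 68]  # X
sons = [67, 68, 65, 72, 72]     # Y

n = len(fathers)
x_bar = sum(fathers) / n
y_bar = sum(sons) / n

# Standard deviations and covariance, dividing by n
sigma_x = sqrt(sum((x - x_bar) ** 2 for x in fathers) / n)
sigma_y = sqrt(sum((y - y_bar) ** 2 for y in sons) / n)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fathers, sons)) / n

# Pearson's correlation coefficient
r = cov_xy / (sigma_x * sigma_y)

# Slopes of the two regression lines
slope_y_on_x = r * sigma_y / sigma_x
slope_x_on_y = r * sigma_x / sigma_y


def predict_y(x):
    """Line of regression of y on x: y - y_bar = slope * (x - x_bar)."""
    return y_bar + slope_y_on_x * (x - x_bar)


def predict_x(y):
    """Line of regression of x on y: x - x_bar = slope * (y - y_bar)."""
    return x_bar + slope_x_on_y * (y - y_bar)


print("r =", round(r, 3))
print("slope of y on x =", round(slope_y_on_x, 3))
print("slope of x on y =", round(slope_x_on_y, 3))
print("predicted son's height for a father of 67:", round(predict_y(67), 2))
```

The two slopes come out different, which is why the two lines are different. Both lines pass through the point (\bar{x}, \bar{y}), and the product of the two slopes equals r squared.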
