Finally, there's a fourth definition

of similarity we might use,

which is called the Pearson correlation.

So what would happen if we had

numerical ratings rather than

just thumbs up and thumbs down?

Maybe the cosine similarity would

no longer be the best idea.

So think about a data set like this,

where we have star ratings.

This is zero if you haven't rated

the movie, and a value of 1, 2,

3, 4, or 5 if you have rated

a movie, depending on your star rating of it.

In this case, unlike thumbs up and thumbs down,

five now means you bought it and liked it,

zero means you just didn't buy it,

and one means you bought it and disliked it.

Okay, so to consider how this idea is

different from cosine similarity or

why cosine similarity might be

a bad idea to apply to this data,

think about what happens in the following scenario.

We're again given a data set where we have three users,

so a three-dimensional space

and two movies that are being rated.

Suppose we have one of our movies, Harry Potter,

and it was watched by

two of those users and they both rated it five stars.

That might correspond to a vector like 5, 5,

0 in this space,

where we take a particular row or column of that matrix.

On the other hand, then we have

another movie like, Pitch Black,

which was watched by the same users,

but they really didn't like it,

so they both gave it one star.

Okay, and what's important to note is that

these two vectors actually point in the same direction.

So if we were to take the cosine similarity between them,

the angle theta between those two movies,

well, the angle is zero degrees, right?

Those vectors point in exactly the same direction,

so the cosine similarity, which is the cosine of that angle,

would be equal to one.

That's the wrong notion.

These movies should be as opposite as possible.

The users who watched both movies

liked one and hated the other.
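This failure is easy to check numerically. Here's a short sketch (Python with NumPy; the vectors are the ones from the example above) showing that raw cosine similarity treats these two movies as pointing in the same direction:

```python
import numpy as np

# Ratings across three users: Harry Potter got two 5-star ratings,
# Pitch Black got two 1-star ratings from the same two users.
harry_potter = np.array([5.0, 5.0, 0.0])
pitch_black = np.array([1.0, 1.0, 0.0])

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# ~1.0: the vectors point in exactly the same direction,
# even though one movie was loved and the other hated.
print(cosine_similarity(harry_potter, pitch_black))
```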

So somehow, we might like to normalize that data.

Say in this case, we could subtract the average.

The average rating here would be three.

If we were to subtract the average,

we would end up with re-normalized data

that said Harry Potter took a value of 2, 2,

0 and Pitch Black took a value of -2,

-2, 0, and in this case,

the angle between them would be equal to 180 degrees.

Those movies would point in

exactly the opposite direction.

They'd be very different movies in terms of

recommending them to people, and that's what we might like.
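A quick check of that claim, a Python/NumPy sketch using the mean-centered vectors from the example:

```python
import numpy as np

# Mean-centered ratings: the average rating (3) has been subtracted
# from each observed rating; unrated entries stay at 0.
harry_potter = np.array([2.0, 2.0, 0.0])    # was [5, 5, 0]
pitch_black = np.array([-2.0, -2.0, 0.0])   # was [1, 1, 0]

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# ~-1.0: after centering, the two movies point in
# exactly opposite directions (angle of 180 degrees).
print(cosine_similarity(harry_potter, pitch_black))
```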

So this next notion of

similarity called Pearson correlation,

is going to try and perform that re-normalization or

that average subtraction to make

sure that these items would point

in opposite directions.

Okay, so the basic idea is that we don't want

one-star ratings to be parallel to five-star ratings,

rather we would like to try and figure out or subtract

the average so that we know that five stars is

positive but one star is negative.

The equation looks fairly frightening,

but all that's going on here is that

we're really just subtracting averages appropriately.

This is nothing other than a cosine similarity equation,

where we have first subtracted

averages in terms of

the ratings given by

each user or the ratings given for each item.

So we can just compare and contrast our definition

of cosine similarity to Pearson correlation.
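For reference, since the slide equation isn't reproduced in the transcript, here is one common way of writing the two definitions for a pair of users u and v, where I_u is the set of items rated by u, R_{u,i} is u's rating of item i, and R-bar_u is u's average rating (details such as which items the denominator sums over vary between presentations):

```latex
\mathrm{Sim}_{\cos}(u, v) =
  \frac{\sum_{i \in I_u \cap I_v} R_{u,i}\, R_{v,i}}
       {\sqrt{\sum_{i \in I_u} R_{u,i}^2}\;\sqrt{\sum_{i \in I_v} R_{v,i}^2}}
\qquad
\mathrm{Sim}_{\mathrm{Pearson}}(u, v) =
  \frac{\sum_{i \in I_u \cap I_v} (R_{u,i} - \bar{R}_u)(R_{v,i} - \bar{R}_v)}
       {\sqrt{\sum_{i \in I_u \cap I_v} (R_{u,i} - \bar{R}_u)^2}\;\sqrt{\sum_{i \in I_u \cap I_v} (R_{v,i} - \bar{R}_v)^2}}
```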

This equation is given here for similarity between users,

but you could equivalently

define it for the similarity between items,

just by interchanging I sub

u with U sub i whenever you see it.

So all we're doing to compute the similarity

between users, is taking the items rated by both users,

but we're first subtracting

the average rating by a particular user,

that means we're determining

whether this rating is a positive or

negative one by the standards of that individual user,

by subtracting the average.

Other than that average subtraction,

the Pearson correlation looks pretty

much equivalent to the cosine similarity.
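As a rough sketch of that computation (Python with NumPy; the ratings matrix and the function name are made up for illustration, with 0 standing for "not rated" as in the lecture, and the denominator here sums over the common items only, one of several conventions):

```python
import numpy as np

# Hypothetical user-item ratings matrix (rows = users, columns = items);
# 0 means the user hasn't rated that item. Values are illustrative.
ratings = np.array([
    [5.0, 1.0, 4.0, 0.0],
    [4.0, 2.0, 0.0, 3.0],
    [1.0, 5.0, 2.0, 4.0],
])

def pearson_user_similarity(R, u, v):
    # Items rated by both users: the intersection of I_u and I_v.
    both = (R[u] != 0) & (R[v] != 0)
    # Each user's average rating, taken over the items they rated,
    # so "positive" and "negative" are judged by that user's standards.
    mean_u = R[u][R[u] != 0].mean()
    mean_v = R[v][R[v] != 0].mean()
    # Mean-centered ratings on the common items.
    du = R[u][both] - mean_u
    dv = R[v][both] - mean_v
    # Cosine similarity of the centered vectors.
    return du @ dv / (np.linalg.norm(du) * np.linalg.norm(dv))

# Users 0 and 2 disagree on every shared item,
# so the correlation comes out negative.
print(pearson_user_similarity(ratings, 0, 2))
```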