Welcome to this module of

Data Science for Business Innovation.

When you need to introduce a product on the market,

you need to correctly estimate its price.

For instance, are we able to give

the right price to a diamond based on its features?

Let's start by plotting a Cartesian plane where we put

the price and the number of carats of diamonds on the axes.

We then position some points

representing exemplary diamonds in the space.

Here, we have some low-price diamonds of one carat.

Up here, we have some more expensive diamonds

of three carats.

Now, I have a question for you.

What is the price of a two-carat diamond?

One thousand dollars? Six hundred?

Four hundred? Think about it.

Well, I guess that you did something like this.

You eyeballed a line here,

and then you said: if it is two carats,

then it should be 600.

This line is called

the line of best fit,

and the method to determine it

is called linear regression,

an extremely useful machine learning methodology

that comes from statistics.
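
To make this concrete, here is a minimal sketch of fitting a line of best fit by least squares. The carat and price numbers are hypothetical, chosen only to echo the example above, and `np.polyfit` stands in for the fitting procedure explained below.

```python
import numpy as np

# Hypothetical carat/price points echoing the example:
# cheap one-carat diamonds and pricier three-carat ones.
carats = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])
prices = np.array([180.0, 200.0, 220.0, 980.0, 1000.0, 1020.0])

# np.polyfit with degree 1 computes the least-squares line of best fit.
slope, intercept = np.polyfit(carats, prices, 1)

# Predict the price of a two-carat diamond from the fitted line.
predicted = slope * 2.0 + intercept
print(round(predicted))  # the "eyeballed" answer of about 600
```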

Of course, computers cannot eyeball lines,

but they are excellent at

repeating simple procedures fast.

So let's put again

those diamonds in the previous Cartesian space,

and now let's try to understand how a computer can

perform linear regression to find the line of best fit.

First of all, the computer

randomly draws a line in the space.

Let's say this one.

Then it measures the error.

So there is this first error, this second error,

this third, fourth, fifth and sixth error.
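
As a sketch of this step: each error is the vertical gap between a point and the line, and the gaps are usually squared and summed into one total. The points and the starting line below are assumed values, not the lecture's actual numbers.

```python
import numpy as np

# Hypothetical points and an arbitrary first line the computer "drew".
carats = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])
prices = np.array([180.0, 200.0, 220.0, 980.0, 1000.0, 1020.0])
slope, intercept = 100.0, 0.0  # an assumed random starting line

# One error per point: the vertical distance to the line, squared so
# that gaps above and below the line both count as positive error.
errors = (prices - (slope * carats + intercept)) ** 2
print(errors.sum())  # the total error of this candidate line
```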

Then it explores the space of possible actions

that can be performed over the line to reduce the error.

It can turn the line clockwise,

anticlockwise, it can shift it up,

it can shift it down.

What it does is follow

a general procedure called gradient descent.

It will explore all the possible actions

and make a move towards reducing the error.

This is like descending from a mountain.

Let's call it Mount Errorest.

After estimating an initial error,

the computer checks which action will get it

downhill as fast as possible.

So let's put the points again,

and let's say that if we rotate the line

by 120 degrees anticlockwise, we have something like this.

Now, let's measure the error again.

The first error diminishes a lot.

The second error now is almost nothing.

The third diminishes, the fourth diminishes,

the fifth is very small and the sixth diminishes.

So we descended a bit downhill.

Now, what's the best action?

Well, probably it is to shift the line up a bit.

That will reduce all the errors

and it will bring us downhill a bit more.

When the error becomes stable and

no action can reduce the error significantly,

we have found the line of best fit.
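
The whole procedure above can be sketched roughly as follows. The data, the learning rate, and the stopping threshold are all assumed values; real implementations tune them, but the loop is the same idea: measure the error, follow the gradient downhill, and stop when no step helps significantly.

```python
import numpy as np

# Hypothetical carat/price points, as in the example.
carats = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])
prices = np.array([180.0, 200.0, 220.0, 980.0, 1000.0, 1020.0])

def total_error(slope, intercept):
    # Sum of squared vertical distances from the points to the line.
    residuals = prices - (slope * carats + intercept)
    return np.sum(residuals ** 2)

# Start from an arbitrary "randomly drawn" line.
slope, intercept = 0.0, 0.0
learning_rate = 0.01  # assumed step size for going downhill

for step in range(10000):
    # The gradient says how the error changes if we tilt or shift the line.
    residuals = prices - (slope * carats + intercept)
    grad_slope = -2.0 * np.sum(residuals * carats)
    grad_intercept = -2.0 * np.sum(residuals)
    new_slope = slope - learning_rate * grad_slope
    new_intercept = intercept - learning_rate * grad_intercept
    # Stop when no move reduces the error significantly.
    if abs(total_error(slope, intercept)
           - total_error(new_slope, new_intercept)) < 1e-9:
        break
    slope, intercept = new_slope, new_intercept

print(slope, intercept)  # close to the least-squares line of best fit
```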

So far, we have illustrated how to implement

linear regression using gradient descent.

You may think that linear regression has limited

applicability because only a few problems are linear.

But actually, linear regression

works also for non-linear problems.

For instance, let's say that we have these points here.

The best fit this time is

a quadratic trend that is

represented by a parabolic function.

You can of course try to fit a quadratic curve directly,

or you can transform an axis of the space.

Instead of putting your points in an x-y space,

you put your points in the x-squared, y space.

Hence, the trend becomes a linear problem.
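
A minimal sketch of this trick, assuming some hypothetical points that follow a quadratic trend y = 3x²:

```python
import numpy as np

# Hypothetical points on a quadratic trend: y = 3 * x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x ** 2

# In the (x, y) space the trend is curved, but in the (x^2, y) space
# it is a straight line, so ordinary linear regression applies.
slope, intercept = np.polyfit(x ** 2, y, 1)
print(slope, intercept)  # the slope recovers the quadratic coefficient
```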

This works also for more complex situations.

Let's say for instance that we

have this circle of points here.