Hello everyone. In this video,

you will learn about model validation

and improvement techniques,

specifically residual analysis.

A residual, or error of the model, is

the difference between the observed value,

y, and the predicted value, y hat.

It represents the variation

not explained by the model.
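
To make the definition concrete, here is a minimal sketch in Python. The data values are invented for illustration only, and the straight-line fit with NumPy is an assumption; the lecture does not say which tool or model is used.

```python
import numpy as np

# Hypothetical observations, invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a straight line y_hat = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# Residual = observed value minus predicted value.
residuals = y - y_hat
print(residuals)
```

For a least-squares line with an intercept, these residuals sum to zero up to rounding, so what matters in the analysis is their pattern, not their total.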

There are two possible causes for a residual.

First, the residual is

purely random noise that is

unpredictable and cannot be eliminated.

Second, the residual is the result of errors in the model.

Either the model misses some important variables

that may significantly

affect the dependent variable y,

or the model has the wrong form.

For example, the model should

be non-linear rather than linear.

If the first cause holds true,

then the model is valid.

If the second cause holds true,

then the model is not valid.

To validate the model and to find

out which cause led to the residuals,

we can conduct a residual analysis.

The key idea is that if we can

explain all patterns in the data,

then the residuals should be purely random and thus,

satisfy the following four conditions:

First, linearity,

which means that the residuals are randomly scattered around zero with no systematic pattern.

Second, independence.

That is, the residuals are

independent over all observations.

Third, normality.

The residuals are normally distributed.

And fourth, equal variance. That is,

the residuals have constant variance

over all the observations.

If all these conditions are met,

then the residuals are purely

random and the regression model is valid.

Otherwise, there might be

distinct patterns in the residuals which means

that the model either

misses some variables or has the wrong form.

Thus, the residual analysis can provide

important insight into how to improve the model.

To explain these conditions, let's look at some examples.

For linearity, the figure on

your left shows what purely random noise may look like.

The figure on your right shows a non-random pattern,

which indicates that

the hypothesized linear model is not valid,

and we may need a non-linear model.
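
As a rough sketch of the non-random pattern described above, the following Python snippet fits a straight line to data that is actually quadratic. The data is made up for illustration, and the NumPy fit is an assumption, not the lecture's own method.

```python
import numpy as np

# Hypothetical data whose true relationship is quadratic, not linear.
x = np.arange(0, 11, dtype=float)
y = x ** 2

# Fit the (wrong) linear model by least squares.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# The residuals form a U shape: positive at both ends, negative in the middle.
print(np.round(residuals, 2))
```

A systematic U shape like this is the kind of pattern that suggests the linear form is wrong.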

For independence, the figure on

the left shows that

the residuals are independent over time.

The figure on your right shows that

the residuals alternate cyclically.

That is, a positive residual is

always followed by a negative residual and vice versa,

which means that the residuals are correlated over time.
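
Beyond looking at the plot, one common way to quantify this is the lag-1 autocorrelation or the Durbin-Watson statistic. Neither is mentioned in the lecture; they are brought in here as an illustration, and the function names and the alternating data below are hypothetical.

```python
import numpy as np

def lag1_autocorrelation(residuals):
    """Correlation between each residual and the one before it."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))

def durbin_watson(residuals):
    """Near 2 suggests no lag-1 correlation; near 0 or 4 suggests correlation."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Hypothetical residuals that flip sign at every step.
alternating = np.array([1.0, -1.0] * 5)
print(lag1_autocorrelation(alternating))  # strongly negative
print(durbin_watson(alternating))         # well above 2
```

For residuals that alternate in sign like this, the lag-1 autocorrelation is strongly negative and the Durbin-Watson value sits close to 4, which matches the cyclic pattern in the figure on the right.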

Regarding normality, we can

draw a histogram of the residuals.

If the histogram looks like

the figure on your left, that is,

a bell-shaped histogram,

then the residuals are normally distributed.

Otherwise, the residuals are not normally distributed,

like this example on your right.

Please note that a moderate departure from

normality is generally not problematic.
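
Here is a minimal sketch of the histogram check in Python, with a sample skewness added as a rough numeric companion. The skewness measure is not part of the lecture, and both sets of residuals are invented for illustration.

```python
import numpy as np

def skewness(residuals):
    """Sample skewness: about 0 for a symmetric, bell-shaped histogram."""
    e = np.asarray(residuals, dtype=float)
    centered = e - e.mean()
    return float(np.mean(centered ** 3) / e.std() ** 3)

# Hypothetical residuals: one symmetric set, one right-skewed set.
symmetric = np.array([-2, -1, -1, 0, 0, 0, 1, 1, 2], dtype=float)
skewed = np.array([0, 0, 0, 0, 1, 1, 2, 5], dtype=float)

# Bin counts are the bar heights of the histogram.
counts, edges = np.histogram(symmetric, bins=5)
print(counts)               # peaks in the middle, tapers at both ends
print(skewness(symmetric))  # close to 0
print(skewness(skewed))     # clearly positive
```

A skewness near zero is consistent with a bell shape but does not prove normality, so the histogram itself is still worth looking at.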

Finally, regarding equal variance,

the figure on your left shows

a constant variability of

the residuals for different x values.

However, the figure on your right shows that

the variability of the residuals is

increasing as x increases.

Thus, the variance is not

equal and the model may not be valid.
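
One informal way to check this numerically is to compare the residual variance for small x values with the variance for large x values. This split-in-half ratio is an assumption introduced here for illustration, not a method from the lecture, and the data below is hypothetical.

```python
import numpy as np

def variance_ratio(x, residuals):
    """Variance of residuals in the upper half of x over the lower half."""
    order = np.argsort(x)
    e = np.asarray(residuals, dtype=float)[order]
    half = len(e) // 2
    return float(np.var(e[half:], ddof=1) / np.var(e[:half], ddof=1))

# Hypothetical residuals: constant spread versus spread growing with x.
x = np.arange(1, 21, dtype=float)
signs = np.array([1.0, -1.0] * 10)
constant_spread = signs
growing_spread = signs * x

print(variance_ratio(x, constant_spread))  # about 1: equal variance
print(variance_ratio(x, growing_spread))   # well above 1: unequal variance
```

A ratio far from 1 points to the same thing as the fan shape in the figure on the right, namely that the spread of the residuals changes with x.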

Now, let's come back to our case.

Here are the residuals of

the trend model on AK demand data.

Can you please comment on whether

the residuals meet the conditions of linearity,

independence, and equal variance?

Obviously, the variance of

the residuals is increasing over time.

As you can see from the dots in the red circles,

there is a general increasing trend of

variability in the residuals,

which implies that

the equal variance condition is not met.

In addition, the residuals

may have some periodic patterns,

which need to be confirmed by more analysis.

In summary, the residual analysis shows

that the variability of residuals is increasing.

So the model may be missing some important variables.

The missing variables could be

environmental variables such as price,

home sales, and so on.