Next, we're going to

look at another application that illustrates

the utility of being able to define our own functions,

and that's to write programs to work with the Gaussian distribution.

You're familiar with the Gaussian distribution.

It's a mathematical model,

the so-called bell curve that has been used successfully for

centuries to fit experimental observations in lots of contexts.

This is the function, it's called the Gaussian probability density function.

It's phi of x equals one over the square root of two pi, times e to the minus x squared over two,

and this is a plot of the function.

It's centered around zero,

that's the mean, and the standard deviation, which measures how it spreads, is one.

So, these are the scores of SAT takers in some year.

The Gaussian model pretty well matches this experimental data if we adjust the parameters.

We take a mean of 1,019 and a standard deviation of 209 from the experimental data,

then plot the Gaussian function with those parameters.

The two-parameter Gaussian is one over sigma square root of two pi,

times e to the minus x minus mu, squared, over two sigma squared.

That scales the function to match the experimental data.

With the sample mean and the sample standard deviation plugged in,

it fits the experimental observations.
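For reference, the standard density and its two-parameter version just described can be written out as:

```latex
\phi(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2}
\qquad
\phi(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2/(2\sigma^2)}
```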

So often, when writing programs that process

data we want to work with the Gaussian distribution,

the Gaussian probability density function.

There's good math behind it,

but for now, all you need is this:

that's the formula for phi of x,

with a little algebra to get the scaled version:

subtract the mean, divide by sigma,

apply phi, and then divide by sigma.

So, that's all the math that you need,

just definition and a little algebra.

So, Gaussian distribution arises everywhere and these are

just some examples from papers pulled off the web.

If you're studying polystyrene particles in glycerol or laser beam propagation or

optical tweezers or oxidase patches in the visual cortex, Gaussian distribution arises.

I don't know if I believe this one,

predicted US oil production,

that doesn't sound too Gaussian to me,

but people try to fit it anyway.

German money, well, Gauss was German,

and the curve actually appeared on German banknotes.

So, we want to work with it.

So, is it there in Java's math library?

Do I have Math.phi? The answer is no.

So, why not? It's so important, why isn't it there?

Well, maybe the answer is that you could do it yourself.

So, you can create a class Gaussian and you can define a function,

we'll call it pdf, probability density function.

All we have to do is implement that math formula,

one over square root of two Pi e to the minus X squared over two.

So, that's Math.exp(-x*x/2) divided by Math.sqrt(2 * Math.PI),

it's as simple as that.

So again, we're calling a function in another module,

as we've been doing for a while.

Then if you want the three-argument version,

we use the one-argument version appropriately.

Now in Java, two function signatures with differing numbers of arguments

can have the same name, and that overloading is useful in this case.

So all we do is call the one-argument version with

the argument x minus mu over sigma, and then divide the result by sigma.
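A minimal sketch of what such a Gaussian.java might look like, following the formulas just described (the main method and its test values are my own illustration, not from the lecture):

```java
public class Gaussian {

    // Standard Gaussian probability density function phi(x).
    public static double pdf(double x) {
        return Math.exp(-x * x / 2) / Math.sqrt(2 * Math.PI);
    }

    // Overloaded pdf with mean mu and standard deviation sigma:
    // scale the argument, apply the standard pdf, divide by sigma.
    public static double pdf(double x, double mu, double sigma) {
        return pdf((x - mu) / sigma) / sigma;
    }

    public static void main(String[] args) {
        System.out.println(pdf(0.0));               // peak of the standard bell curve, about 0.3989
        System.out.println(pdf(1019, 1019, 209));   // peak of the SAT-fitted curve
    }
}
```

Both methods are named pdf; Java picks the right one from the number of arguments.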

We'll look at some more functions in a minute,

but I think the point is that if there are any math functions that you need,

you can define libraries yourself that implement those functions.

That's an example of modular programming,

you don't want to put this math code in every program that does the Gaussian,

you just want the program to act as if the function is implemented.

So, we put this in a module Gaussian.java and now you can

call these functions with Gaussian.pdf and so forth.

In a way with libraries,

any user can extend the Java system to include

the functions that are appropriate for the application at hand.

It's a very powerful idea.

Now, there's another Gaussian function called the cumulative distribution function.

So that one is,

if you have a particular point on the x-axis,

you might ask what percentage of the total is less than or equal to that?

That's the same as what's the area of the curve to the left of a certain point?

That's called the Gaussian cumulative distribution function

and that's useful in applications.

This is what it looks like.

It's one over the square root of two pi, times the integral

from minus infinity to z of e to the minus x squared over two, dx.

That's useful to know,

it has a similar but simpler three-argument version,

where you just evaluate it at x minus mu over sigma.
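Written out, the cumulative distribution function and its scaled version are:

```latex
\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2} \, dx
\qquad
\Phi(x;\mu,\sigma) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)
```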

So, that's the math for the cumulative distribution function, and here's a typical application.

So in the year of our data,

you had to get an 820 to be a Division I athlete.

You might say, well what fraction of test takers didn't qualify?

Well, the answer is given by

the Gaussian CDF for that data with that average and that standard deviation.

In this case, about 17 percent.

So maybe, you want to write code that uses

the Gaussian cumulative distribution function

to be able to compute information like this.

So, what do we do with that?

Well, there isn't a closed form for that,

it was expressed in terms of an integral,

how are we going to implement that?

The answer is, well, there is a Taylor series that is very accurate,

and actually, this is the way that most math functions are implemented on computers.

In this case, there's math that shows that Phi of z,

the CDF, is one-half plus

the pdf times this series, whose terms get smaller and smaller pretty quickly.
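Written out, the series is:

```latex
\Phi(z) = \frac{1}{2} + \phi(z)\left( z + \frac{z^3}{3} + \frac{z^5}{3\cdot 5} + \frac{z^7}{3\cdot 5\cdot 7} + \cdots \right)
```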

So, we can just write code that implements this math formula.

If z is less than minus eight, the value is so small as to be negligible,

and if it's bigger than eight, it's so close to one that the difference is negligible.

Otherwise, we compute this sum, and this just implements

the formula, where we get each term from the previous one by multiplying by z squared and then dividing

by the next odd number, and that's what we do as long as we're getting anywhere.

If we add something and it's still the same then we can stop.

Then what we're supposed to return is a half plus what we computed times the pdf.

It turns out that this result is accurate to 15 places.
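Putting the steps just described into code, a sketch of how the cdf might look in the same Gaussian class, using the pdf described earlier in the lecture (the main method and cutoff style are my own illustration):

```java
public class Gaussian {

    // Standard Gaussian pdf, needed by the cdf below.
    public static double pdf(double x) {
        return Math.exp(-x * x / 2) / Math.sqrt(2 * Math.PI);
    }

    // Standard Gaussian cdf via the Taylor series
    // Phi(z) = 1/2 + phi(z) * (z + z^3/3 + z^5/(3*5) + z^7/(3*5*7) + ...).
    public static double cdf(double z) {
        if (z < -8.0) return 0.0;   // so small as to be negligible
        if (z >  8.0) return 1.0;   // so close to 1 that the difference is negligible
        double sum = 0.0, term = z;
        // Each term comes from the previous one: multiply by z squared,
        // divide by the next odd number. Stop when adding changes nothing.
        for (int i = 3; sum + term != sum; i += 2) {
            sum  = sum + term;
            term = term * z * z / i;
        }
        return 0.5 + sum * pdf(z);
    }

    // Overloaded cdf with mean mu and standard deviation sigma.
    public static double cdf(double x, double mu, double sigma) {
        return cdf((x - mu) / sigma);
    }

    public static void main(String[] args) {
        // Fraction of test takers scoring below 820 with mean 1019, stddev 209: about 0.17.
        System.out.println(cdf(820, 1019, 209));
    }
}
```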