In this lecture, we will try to derive a spectral representation for random processes. Let's start with the coin toss example that we have used before. Imagine that we have a machine that at each time n flips a coin and outputs 1 if the outcome is heads and minus 1 if the outcome is tails. We assume that each sample is independent and that, if the coin is fair, each sample clearly has a 50% probability of being plus 1 or minus 1. So we can run this machine, say, for 32 time indices. Every time we run the machine we obtain a different realization of the signal; each realization is an alternation of plus 1 and minus 1 in random order, and each run produces 32 new samples that are different every time. So we know the mechanism behind the signal generation, but how could we represent the spectral properties of this signal? What is the frequency-domain representation of this random sequence?

The first thing we could do is simply take the DFT of the portion of the process that we are considering. So we have 32 samples, we take the DFT, and here we're looking at the magnitude of the DFT. We don't really see much. Of course, the DFT is symmetric because the signal is real-valued, but there is no apparent pattern. And if we run the signal generation machine once again and perform the DFT, we get a completely different shape once again. So it doesn't seem like taking the DFT of a portion of a realization of a random process can highlight any particular spectral properties of the process itself. Perhaps, we might think, if we increase the length of the portion that we analyze and take in more points, the DFT will be clearer; maybe it will highlight some of the properties. So let's take a longer portion of the random process. We have been doing our plots with 32 samples; let's try with 64 samples, and nothing really seems to appear in the plot. We can increase the data length even more. Here we have 128 points, but again there is no pattern in the DFT. So clearly, taking the DFT of a single realization is not the way to go.

Okay, so when we have something that is random and we want to mitigate the effects of randomness, the standard strategy is to take averages. Now, the problem is that the coin toss signal we just defined is zero-mean, namely the expected value of each sample is 0. If the average value of the signal is 0, the average value of the DFT will be 0 as well, because of linearity. But the signal does move: it moves from minus 1 to 1 randomly, but it does move. So the energy, or the power if you want, contained in a realization is nonzero, and we should be able to see this power in the frequency domain. Let's try to dig into this intuition a bit more. In general, if we have a full realization of the coin toss signal, the energy associated with the realization is going to be infinite: if we take the sum of the squared value of each sample, that is going to be equal to the number of samples, and therefore, as the support over which we compute the energy grows to infinity, the energy itself will go to infinity. However, the power of each realization is going to be constant and equal to 1 over any interval over which we compute it. So the coin toss signal is a power signal. Let's therefore try to average not the DFT coefficients, but the squared magnitude of the DFT coefficients normalized by the length of the DFT.
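As an aside, the single-realization experiment is easy to reproduce numerically. Here is a minimal numpy sketch, assuming a fair coin; the function name coin_toss and the choice of random generator are my own, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng()

def coin_toss(N):
    """One realization of the coin toss process: N i.i.d. samples in {-1, +1}."""
    return rng.choice([-1.0, 1.0], size=N)

# Magnitude of the DFT of a single 32-point realization: a different
# random-looking pattern appears every time this is run.
x = coin_toss(32)
print(np.abs(np.fft.fft(x)))
```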
We pick an interval of length N, and we pick a number of iterations M that corresponds to the number of realizations over which we will average the DFT. We run the signal generator M times and we obtain M N-point realizations. We compute the DFT of each realization, take its squared magnitude, and compute the average over all M DFTs. So let's do this numerically and see how it looks. If we start with one single realization, we get the random pattern that we have seen before. But as we increase the number of realizations, we see that the average value of the squared DFT coefficients seems to converge to 1 for all values of the index. In this example we have taken N equal to 32, so 32 points for each realization, and 5,000 different realizations to compute the average.

Let's define P of k as the expected value of the squared magnitude of DFT coefficient number k, normalized by the length of the DFT. From the previous plot, it looks like the sequence P of k converges to 1 if we take enough repetitions of the experiment. Now, the squared magnitude of a DFT coefficient represents the energy in frequency of a signal, so the squared magnitude normalized by the length of the analysis interval represents the power distribution, or the power density, in frequency. And so the frequency-domain representation for stochastic processes seems to be the power spectral density; we will analyze this in more detail in just a second. The intuition behind the fact that P of k is equal to 1 for the coin toss signal is that the power of this stochastic process is equally distributed over all frequencies. This makes sense intuitively, because the signal is random: we could get a sequence of all ones or all minus ones, which would correspond to power allocated exclusively to the lowest frequency; or we could get a sequence of perfectly alternating plus 1s and minus 1s, which would correspond to power allocated to the highest frequency; and every combination in between. So the power spectral density of the signal seems to reflect the fact that we cannot predict the shape of the signal, but we do know that the signal alternates between plus 1 and minus 1.

A variation on the theme is to consider a filtered random process. Let's take a very, very simple filter, a two-point moving average, and use it to filter the coin toss process that we just defined. So the output of the filter will be just the sum of two consecutive input samples divided by 2. What is the power spectral density of a filtered random process? We repeat numerically the same experiment as before: we generate M realizations of the coin toss process, we take, say, a 32-point segment of each realization, and we filter it with the two-point moving average. Then we take the DFT, square the coefficients, normalize by the length of the DFT, and average the result over the different realizations. With a single realization we have a pattern that doesn't tell us much, but as we increase the number of realizations, we see that once again the plot of the averaged squared DFT coefficients converges to a clear pattern. With 5,000 realizations, like before, we have a clear curve that corresponds exactly to the squared magnitude of the frequency response of the filter, as shown in the sketch below.
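The averaging experiment, for both the raw and the filtered process, can be sketched as follows. This is a minimal reconstruction under the lecture's stated parameters (N = 32, M = 5000); for the two-point moving average h = [1/2, 1/2], the squared magnitude response works out to cos²(ω/2), which is the curve the averaged plot converges to (up to small boundary effects from filtering a finite segment):

```python
import numpy as np

rng = np.random.default_rng()
N, M = 32, 5000  # interval length and number of realizations

# Average of |DFT|^2 / N over M realizations of the raw coin toss process.
P = np.zeros(N)
for _ in range(M):
    x = rng.choice([-1.0, 1.0], size=N)
    P += np.abs(np.fft.fft(x)) ** 2 / N
P /= M
print(P)  # every entry close to 1: flat power distribution

# Same experiment after the two-point moving average y[n] = (x[n] + x[n-1]) / 2.
Pf = np.zeros(N)
for _ in range(M):
    x = rng.choice([-1.0, 1.0], size=N)
    y = np.convolve(x, [0.5, 0.5])[:N]  # keep an N-point segment of the output
    Pf += np.abs(np.fft.fft(y)) ** 2 / N
Pf /= M

# Compare with the filter's squared magnitude response |H(e^{jw})|^2 = cos^2(w/2).
w = 2 * np.pi * np.arange(N) / N
print(Pf)
print(np.cos(w / 2) ** 2)
```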
So it appears that for a filtered process, the power spectral density of the output is equal to the power spectral density of the input multiplied by the squared magnitude of the frequency response of the filter. All this is very nice, but of course we derived these results for a very specific random process and a very specific filter, and so our goal now is to show that they hold in general.

Before we do so, we have to go back to one of the earliest lectures in this DSP course and recall the difference between energy and power signals. Energy signals are signals that have a finite square norm: you take the sum of the squared absolute value of each sample over the entire support of the signal, and the result is finite. Power signals, on the other hand, have infinite energy but a finite rate of energy: if you take the energy over an interval and normalize it by the length of the interval, you always get a finite number. We have encountered many examples of energy signals: all finite-support signals are finite-energy, the sinc is a finite-energy signal, and exponential sequences with a ratio less than 1 in magnitude are also finite-energy signals. For these signals the DTFT is well defined, we have no problem computing it, and the squared magnitude of the DTFT represents the distribution of energy in frequency. Power signals are also very common: the constant signal x of n equal to 1 is a power signal, the step sequence is a power signal, and any complex exponential or trigonometric function with infinite support is a power signal too. All these signals can be brought into the DTFT formalism using the Dirac delta function; as a matter of fact, whenever you see a Dirac delta in a spectrum, you know that the underlying signal is a power signal. For these signals, however, you cannot take the squared magnitude of the DTFT and interpret it as a distribution of energy in frequency. You have to use a different representation, namely the power spectral density.

To define the power spectral density, consider the truncated DTFT sum, where the index ranges from minus N to N for some integer N. You define the power spectral density as the squared magnitude of this truncated DTFT, normalized by the length of the support over which you compute it, and then you take the limit as N goes to infinity. The result is a representation of power in frequency for a power signal. So for example, the constant signal x of n equal to a has a power spectral density equal to a squared times a Dirac delta centered at 0. The step sequence scaled by a has a power spectral density equal to a squared over 2 times a Dirac delta at 0. A complex exponential, on the other hand, will have a power spectral density that is the squared magnitude of the complex exponential times a Dirac delta centered at the frequency of the complex exponential.

Now, a wide-sense stationary process with finite variance is a power signal, so its frequency-domain representation will be a power spectral density. We just need to modify the definition we introduced before by replacing the squared magnitude of the truncated DTFT with the expected value of the squared magnitude. This takes into account the randomness of the signal and averages out the randomness of the DTFT across realizations. Okay, so let's compute this term.
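Written out, the two definitions we just gave look like this; this is my transcription of the spoken formulas into LaTeX, not notation copied verbatim from the slides:

```latex
% PSD of a deterministic power signal
P(\omega) = \lim_{N \to \infty} \frac{1}{2N+1}
            \left| \sum_{n=-N}^{N} x[n]\, e^{-j\omega n} \right|^{2}

% PSD of a wide-sense stationary process: take the expectation of the
% squared magnitude over realizations before taking the limit
P(\omega) = \lim_{N \to \infty} \frac{1}{2N+1}\,
            E\left[\, \left| \sum_{n=-N}^{N} x[n]\, e^{-j\omega n} \right|^{2} \right]
```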
We can write the expected value of the squared magnitude of the truncated DTFT as the expected value of the truncated DTFT multiplied by its conjugate. We assume the signal is real, so we only need to conjugate the complex exponential. Now we collect the sums, and we move the expectation operator inside the sums because of linearity. Then we use the wide-sense stationarity hypothesis for the process: this expectation here, the expectation of the product between x of n and x of m, where n and m are the indices of the two sums, becomes simply the autocorrelation of the process computed at time lag m minus n, and of course we always have a complex exponential alongside.

Now here we have a double sum, and we're going to use a very clever trick to transform this double sum into a single sum. Let's see how to do that. Suppose we have a double sum of this form: an outer index that goes from minus N to N, an inner index that also goes from minus N to N, and a function inside the double sum that depends only on the difference between the summation indices. The range of this argument, m minus n, given the range of the summation indices, goes from minus 2N to plus 2N. So we can rewrite the double sum as a single sum where the index k now ranges from minus 2N to 2N; the function inside the sum is computed at k, and each term f of k is multiplied by a weighting factor C of k, which takes into account the number of times that m minus n is equal to k in the original summation. C of k, therefore, is the number of ways that we can pick n and m in the interval from minus N to N so that m minus n is equal to k. If you think about it a bit, this is geometrically equivalent to the number of ways that we can fit a segment of length k over an interval that goes from minus N to N. For instance, a segment as long as the interval will fit only one way; a segment one unit shorter will fit two ways; a segment two units shorter will fit three ways; and so on and so forth. After some reflection, we can come up with a formula for the number of ways we can do so, and C of k turns out to be 2N plus 1 minus the absolute value of k; a numerical check is sketched below.

So we use this to turn the expected squared DTFT value into a single sum, and this is the current state of the equation: we have a sum that goes from minus 2N to 2N of the weighting factor, 2N plus 1 minus the absolute value of k, times the autocorrelation of the input process computed at k, times e to the minus j omega k. This starts to look very much like a DTFT, but let's move on. Remember, we're looking for the power spectral density, which is the expected value of the squared DTFT normalized by the length of the interval over which we computed the partial DTFT; that length is 2N plus 1, and so this is what we have now. This term, let's call it W N of k, is a weighting factor that multiplies the value of the autocorrelation at k and the complex exponential e to the minus j omega k, and we want to take the limit as N goes to infinity. Let's examine how this factor behaves as N grows. If we plot W N of k for all possible values of k, here over a short window from minus 40 to 40, the function looks like a triangle, and as N increases, the triangle becomes wider and wider.
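The counting argument for C of k is easy to verify numerically. Below is a small sketch of my own that compares the double sum against the weighted single sum for an arbitrary function f that depends only on the index difference:

```python
import numpy as np

# Check the identity:
#   sum_{m,n = -N}^{N} f(m - n)  ==  sum_{k = -2N}^{2N} (2N + 1 - |k|) f(k)
N = 8
rng = np.random.default_rng()
f = rng.standard_normal(4 * N + 1)  # arbitrary values f(k) for k = -2N .. 2N
fk = lambda k: f[k + 2 * N]         # shift index so k = -2N maps to position 0

double = sum(fk(m - n) for m in range(-N, N + 1) for n in range(-N, N + 1))
single = sum((2 * N + 1 - abs(k)) * fk(k) for k in range(-2 * N, 2 * N + 1))
print(np.isclose(double, single))   # True
```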
As N goes to infinity, W N of k tends to the constant 1 for all values of k. So with this in mind, we compute the limit we were looking for: the limit, as N goes to infinity, of this sum of the weighting factor times the autocorrelation times the complex exponential. As N goes to infinity the weighting factor goes to 1, and we're left with the sum, for k that goes from minus infinity to plus infinity, of the autocorrelation of the input process at k times e to the minus j omega k, which is the DTFT of the autocorrelation of the input process. So this is a remarkable result: the power spectral density of a random process is the DTFT of its autocorrelation.

As an example, let's look at the power spectral density of a white noise process. Remember, a white noise process is defined by a mean equal to 0 and an autocorrelation that is nonzero only for k equal to 0, in which case it is equal to the variance of the process. If you take the DTFT of this autocorrelation, you find that the power spectral density is the constant sigma squared, where sigma squared is the variance of the process. Now, the coin toss process, you remember, was white because every coin toss is independent, and its autocorrelation is delta of k. Therefore we have confirmed that the power spectral density of the coin toss process is indeed a constant.

Two final words on white noise, which is the ubiquitous noise in signal processing. The whiteness of the noise does not depend on the probability density function that governs the distribution of each sample; whiteness only describes the decorrelation between noise samples. So the power spectral density will always be flat for white noise, independently of the probability density function that describes the individual samples. You can have Gaussian white noise, you can have uniform white noise, and you can even have binary white noise, as we had in the case of the coin toss process. But very often a Gaussian distribution is what models experimental data best, and additive white Gaussian noise, or AWGN, is the most common noise model for communication systems.
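To illustrate this last point, here is a sketch of my own that repeats the averaging experiment for three unit-variance white processes with different sample distributions; the helper avg_psd and the parameter choices are assumptions for the demo, and in all three cases the averaged plot comes out flat at sigma squared equal to 1:

```python
import numpy as np

rng = np.random.default_rng()
N, M = 32, 5000

def avg_psd(draw):
    """Average |DFT|^2 / N over M realizations produced by `draw`."""
    P = np.zeros(N)
    for _ in range(M):
        P += np.abs(np.fft.fft(draw(N))) ** 2 / N
    return P / M

# Three white processes, all zero-mean with unit variance:
gaussian = lambda n: rng.standard_normal(n)                  # Gaussian samples
uniform  = lambda n: rng.uniform(-np.sqrt(3), np.sqrt(3), n) # uniform, var = 1
binary   = lambda n: rng.choice([-1.0, 1.0], size=n)         # coin toss

for draw in (gaussian, uniform, binary):
    print(avg_psd(draw).round(2))  # each row close to the flat line at 1
```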