So, we have seen that the transmitter sends a sequence of symbols, a[n], created by the mapper. Now we take the receiver into account. We don't yet know how, but it is safe to assume that, in the end, the receiver will obtain an estimate â[n] of the original transmitted symbol sequence. It is an estimate because, even if the channel introduces no distortion, even if nothing bad happens, there will always be a certain amount of noise that corrupts the original sequence. When the noise is very large, our estimate of the transmitted symbol will be off and we will incur a decoding error. This probability of error depends on the power of the noise with respect to the power of the signal, and it also depends on the decoding strategy that we put in place, that is, on how smart we are in circumventing the effects of the noise. One way we can maximize the probability of correctly guessing the transmitted symbol is by using suitable alphabets, and we will see in more detail what that means.

Remember the scheme for the transmitter: we have a bitstream coming in, then the scrambler, and then the mapper. At this point we have a sequence of symbols a[n]. These symbols have to be sent over the channel, and to do so we upsample, interpolate, and then transmit. Now, how do we go from bitstreams to samples in more detail? In other words, how does the mapper work? The mapper splits the incoming bitstream into chunks and assigns to each chunk a symbol a[n] from a finite alphabet; what the alphabet is composed of, we will decide later. To undo the mapping operation and recover the bitstream, the receiver performs a slicing operation. The receiver obtains a value â[n], where the hat indicates that noise has leaked into the value of the signal, and it decides which symbol from the alphabet, which is known to the receiver as well, is closest to the received value. From there it is extremely easy to piece back the original bitstream.

As an example, let's look at simple two-level signaling. This generates signals of the kind we have seen in the examples so far, alternating between two levels. The mapper works by splitting the incoming bitstream into single bits, and the output symbol sequence uses an alphabet composed of two symbols, G and -G, associating G to a bit of value one and -G to a bit of value zero. At the receiver, the slicer looks at the sign of the incoming symbol sequence, which has been corrupted by noise, and decides that the n-th bit is 1 if the sign of the n-th value is positive and 0 otherwise.

Let's look at an example. Assume G = 1, so the two-level signal alternates between plus one and minus one, and suppose we have an input bit sequence that gives rise to this signal here. After transmission, and after decoding at the receiver, the resulting symbol sequence will look like this, where each symbol has been corrupted by a varying amount of noise. If we now slice the sequence by thresholding as shown before, we recover a symbol sequence like this, where we have indicated in red the errors incurred by the slicer because of the noise.

If we want to analyze in more detail what the probability of error is, we have to make some hypotheses on the signals involved in this toy experiment. Assume that each received symbol can be modeled as the original symbol plus a noise sample.
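To make the two-level mapper and slicer concrete, here is a minimal Python (NumPy) sketch; the function names, the value of G, and the noise level are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def map_bits(bits, G=1.0):
    # Two-level mapping: bit 1 -> +G, bit 0 -> -G
    bits = np.asarray(bits)
    return np.where(bits == 1, G, -G).astype(float)

def slice_symbols(received):
    # Slicer: decide bit 1 if the received value is positive, 0 otherwise
    return (np.asarray(received) > 0).astype(int)

# Toy run: map a short bit sequence, add Gaussian noise, slice it back
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=10)
a = map_bits(bits, G=1.0)
a_hat = a + 0.4 * rng.standard_normal(a.shape)   # noisy received symbols
decoded = slice_symbols(a_hat)
print(bits, decoded, (bits != decoded).sum(), sep="\n")
```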
Assume also that the bits in the bitstream are equiprobable, so zero and one each appear with probability one half. Assume that the noise and the signal are independent, and that the noise is additive white Gaussian noise with zero mean and known variance σ₀². With these hypotheses, the probability of error can be written out as follows. First, we split the probability of error into two conditional probabilities, conditioned on whether the n-th bit is equal to 1 or to 0. In the first case, when the n-th bit is equal to 1, remember, the transmitted symbol is equal to G, so the probability of error is the probability that the noise sample is less than -G, because only in that case will the sum of the symbol plus the noise be negative. Similarly, when the n-th bit is equal to 0, we transmit a negative symbol, and the only way for the sum to change sign is if the noise sample is greater than G. Since each bit value occurs with probability one half, and because of the symmetry of the Gaussian distribution, the total probability of error is equal to the probability that a noise sample is larger than G. We can compute this as the integral from G to infinity of the probability density function of the Gaussian distribution with variance σ₀². This Gaussian tail integral cannot be computed in closed form, but it has a standard name (it is usually expressed via the complementary error function, erfc, or the Q-function) and it is available in most numerical packages under these names. The important thing to notice is that the probability of error is some function of the ratio between the amplitude of the signal and the standard deviation of the noise.

We can carry this analysis further by considering the transmitted power. We have a bilevel signal and each level occurs with probability one half, so the variance of the signal, which corresponds to its power, is equal to G² times the probability of the n-th bit being one, plus G² times the probability of the n-th bit being zero, which is equal to G². If we rewrite the probability of error, it is this error function of the ratio between the standard deviation of the transmitted signal and the standard deviation of the noise, which is equivalent to saying that it is the error function of the square root of the signal-to-noise ratio.

If we plot this as a function of the signal-to-noise ratio in dB (and I remind you that the value in dB is 10 log₁₀ of the power of the signal divided by the power of the noise), then, since both axes are logarithmic, we can see that the probability of error decays exponentially with the signal-to-noise ratio. The absolute rate of decay might change, in terms of the constants involved in the curve, but the trend stays the same even for more complex signaling schemes.

So the lesson we learn from this simple example is that, in order to reduce the probability of error, we should increase G, the amplitude of the signal. But of course increasing G also increases the power of the transmitted signal, and we know that we cannot go above the channel's power constraint. That is how the power constraint limits the reliability of transmission. The bilevel signaling scheme is very instructive, but it is also very limited, in the sense that we are sending just one bit per output symbol. To increase the throughput, that is, to increase the number of bits per second that we send over the channel, we can use multilevel signaling.
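As a sanity check on this derivation, here is a short Python sketch (NumPy and SciPy) that evaluates the Gaussian tail probability as a function of the SNR, using P_err = Q(√SNR) = ½ erfc(√(SNR/2)), and compares it with a Monte Carlo simulation of the two-level scheme; the function names and the particular SNR values are illustrative assumptions:

```python
import numpy as np
from scipy.special import erfc

def p_err_theoretical(snr_db):
    # P_err = Q(sqrt(SNR)) = 0.5 * erfc(sqrt(SNR / 2)), with SNR = G^2 / sigma_0^2
    snr = 10.0 ** (np.asarray(snr_db) / 10.0)
    return 0.5 * erfc(np.sqrt(snr / 2.0))

def p_err_simulated(snr_db, n_bits=1_000_000, G=1.0, seed=0):
    # Monte Carlo estimate for two-level signaling over an AWGN channel
    rng = np.random.default_rng(seed)
    sigma0 = G / np.sqrt(10.0 ** (snr_db / 10.0))
    bits = rng.integers(0, 2, size=n_bits)
    a = np.where(bits == 1, G, -G)
    a_hat = a + sigma0 * rng.standard_normal(n_bits)
    return np.mean((a_hat > 0).astype(int) != bits)

for snr_db in (0, 4, 8, 12):
    print(snr_db, p_err_theoretical(snr_db), p_err_simulated(snr_db))
```

Plotting p_err_theoretical on a logarithmic vertical axis against the SNR in dB reproduces the waterfall-shaped curve described above.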
There are very many ways to do so; we will just look at a few. The fundamental idea is that we now take larger chunks of bits, and therefore we use alphabets of higher cardinality. More values in the alphabet means more bits per symbol, and therefore a higher data rate. But, not to give the ending away, we will see that the power of the signal also depends on the size of the alphabet, and so, in order not to exceed a certain probability of error given the channel's power constraint, we will not be able to grow the alphabet indefinitely. We can, however, be smart in the way we build the alphabet, and we will look at some examples.
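As a small preview of such multilevel schemes, here is a minimal Python sketch of a four-level mapper that takes two bits per symbol, together with a nearest-level slicer; the alphabet {-3, -1, 1, 3} and the bit-to-level assignment are illustrative assumptions, not the lecture's choice:

```python
import numpy as np

# Illustrative 4-level (2 bits per symbol) alphabet; levels and bit-to-level
# assignment are assumptions for this sketch.
LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])

def map_bit_pairs(bits):
    # Group bits in pairs and use each pair (0..3) as an index into the alphabet
    pairs = np.asarray(bits).reshape(-1, 2)
    idx = 2 * pairs[:, 0] + pairs[:, 1]
    return LEVELS[idx]

def slice_to_nearest(received):
    # Slicer for a multilevel alphabet: pick the closest level for each sample
    received = np.asarray(received)
    idx = np.argmin(np.abs(received[:, None] - LEVELS[None, :]), axis=1)
    return LEVELS[idx]

symbols = map_bit_pairs([0, 1, 1, 1, 1, 0, 0, 0])
noisy = symbols + 0.3 * np.random.default_rng(1).standard_normal(symbols.shape)
print(symbols, slice_to_nearest(noisy), sep="\n")
```

Note that with equiprobable levels this alphabet has an average power of (9 + 1 + 1 + 9)/4 = 5, compared to G² = 1 for a two-level alphabet with the same spacing between levels, which anticipates the point above: the signal power grows with the size of the alphabet.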