As you have been learning about strings, you have been improving your algorithm to find genes in DNA. However, let's take a moment to think about what your algorithm would do on this string. It will find the start codon, ATG, at index 0, then it will find the stop codon, TAA, at index 8. It will then check whether the distance between them, which is eight, is a multiple of three. Because eight is not a multiple of three, your algorithm will conclude that this is not a valid gene. Between the ATG and the TAA there is one full codon, ATC, and two-thirds of another codon, GC.
But if you were to keep looking past this TAA, you would find another TAA at index 15. Now the distance between the ATG and the TAA is 15, which is a multiple of three, so this is a valid gene. The first TAA that we found was not actually a codon, but rather pieces of two adjacent codons: the T from GCT and the AA from AAT. Your next improvement to the gene-finding algorithm is to add this functionality: make the algorithm keep looking until it finds a stop codon that is a multiple of three away from the start codon.
Having just worked that example, let us now do step two of the seven-step process and write down what we just did. The first thing we did was find the ATG. Then we found the first occurrence of TAA after the ATG, which was at index 8. Then we checked whether the distance between them was a multiple of three. In this case, it was not. So we found the next TAA after the first one. The second one is at index 15. Then we checked whether the distance between this TAA and the start codon was a multiple of three. It was, so the substring from index 0 up to index 18 was our answer.
In this particular set of steps, we checked in two places whether the distance was a multiple of three. If this works in the general case, you can just implement this algorithm with familiar if-else statements. However, do we always need to check exactly twice?
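
The steps just written down can be traced in a few lines of code. This is only a sketch: the DNA string is a hypothetical one chosen to match the indices in the walkthrough (ATG at 0, TAA at 8 and 15), since the actual string from the slide is not reproduced here, and Python's `str.find` stands in for whatever string search you are using.

```python
# Hypothetical DNA string, chosen so the indices match the walkthrough.
dna = "ATGATCGCTAATGCTTAAGCTATG"

atg = dna.find("ATG")                         # start codon, found at index 0
first_taa = dna.find("TAA", atg + 3)          # first TAA after the ATG, index 8
first_ok = (first_taa - atg) % 3 == 0         # 8 is not a multiple of 3

second_taa = dna.find("TAA", first_taa + 1)   # next TAA, index 15
second_ok = (second_taa - atg) % 3 == 0       # 15 is a multiple of 3

# The gene runs from the ATG through the end of the valid stop codon.
gene = dna[atg:second_taa + 3]
print(gene)  # ATGATCGCTAATGCTTAA
```

Each line mirrors one step of the worked example, including the two separate multiple-of-three checks.
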
Let's look at a different DNA string. With this DNA string we would need to check three times. The first two TAAs are not a multiple of three away from the start codon, but the third one is. Would checking three times be enough? Could we have to check 4 times, 5, 10, 50 times? This raises the question of how many times we have to check in general. The answer is that there is no particular number of times that is always enough. Even if you wrote 50 if-else statements, we could come up with a DNA string that has more than 50 TAAs that are not a multiple of three away from the start codon before a valid one appears.
Instead, we need to write our algorithm so that it repeats the checking however many times it needs to. As you have seen before, repetition in your algorithm will turn into a loop when you translate the algorithm into code. To express your algorithm with repetition, you will need to make the repetitive steps the same and figure out what to loop over. Previously you have seen for loops, which iterate over the elements in some collection, such as the pixels in an image. Now you're going to learn about a new kind of loop, known as a while loop, which lets you iterate as long as some condition holds.
Before we try to generalize these steps by finding repetition, let's be a bit more precise about what we did. We found the first ATG at index zero. For the first TAA, we started looking at index three and found it at index eight. We checked whether the distance, eight, was a multiple of three. It wasn't. So we started looking at index nine for the second TAA, and we found it at index 15. We checked whether 15 was a multiple of three. It was, so everything from the ATG through that TAA was our answer.
Now, let us take these steps and generalize them. We looked for ATG here. Why was that? We always want to look for ATG because it is the start codon. What about the fact that we found it at index zero? We're not always going to find it at index zero. However, we are going to want to use that information, so let's give it a name.
When we turn this into code, this will be a variable that we assign in this line and then use later. In particular, we'll call it startIndex. What about looking for TAA? We will always want to look for TAA, since that is the stop codon. Will we always start at index three? Probably not. Why did we start at index three here? We started there because it was right after the start codon that we found. In the general case, this would be startIndex + 3. We won't always find it at index eight either, so let's give that a name too. We'll call it currIndex. Let's also be a bit more specific about the distance between them. It is currIndex minus startIndex, and it appears in both of the checking steps.
Next, you aren't always going to start looking from index nine, but why did we start at nine here? If you look back at where we worked the problem and wrote down our steps, we started at nine because the previous TAA was found at index eight. In our generalized algorithm, we named the previous location currIndex, so we can start from currIndex + 1. We should also name the location where we found the next one. Should we give it a new name, such as nextIndex? Or should we just update an existing name, such as currIndex? In this case, we want to update currIndex, since that represents where we have found the most recent TAA. If you did not realize this right away and gave it a different name, you would realize it later on as you tried to make the steps uniform so that you can express the repetition. Finally, we'll generalize the last step to just indicate that the answer is the text from startIndex to currIndex + 3.
Now these steps look repetitive. The repetition may be a bit hard to see, since it only happens twice. But if you wrote down the steps for strings with more TAAs that don't work, you would see that you do these steps over and over again. To make the repetition explicit, let's write it down like this. Notice that steps four, five, and six are what we will repeat.
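
To see why the named steps are now uniform, here is a sketch that writes the repeated steps out twice by hand, with no loop yet. The function name `find_gene_two_checks` and the example strings are assumptions made for illustration; `start_index` and `curr_index` correspond to startIndex and currIndex. The checks for -1 (a failed search) are added so the sketch is safe to run.

```python
def find_gene_two_checks(dna):
    """Hypothetical unrolled version: tries at most two TAAs, then gives up."""
    start_index = dna.find("ATG")
    if start_index == -1:                  # no start codon at all
        return ""
    curr_index = dna.find("TAA", start_index + 3)

    # Pass 1: check the distance, then either answer or move to the next TAA.
    if curr_index == -1:
        return ""
    if (curr_index - start_index) % 3 == 0:
        return dna[start_index:curr_index + 3]
    curr_index = dna.find("TAA", curr_index + 1)

    # Pass 2: exactly the same three steps, because curr_index was updated.
    if curr_index == -1:
        return ""
    if (curr_index - start_index) % 3 == 0:
        return dna[start_index:curr_index + 3]
    curr_index = dna.find("TAA", curr_index + 1)

    return ""  # ran out of written-out passes


print(find_gene_two_checks("ATGATCGCTAATGCTTAAGCTATG"))  # ATGATCGCTAATGCTTAA
```

Because the same name, `curr_index`, is updated rather than replaced by a new one, the two passes are textually identical. On a string such as `"ATGCTAACTAACTAA"`, whose first valid TAA is the third one, this version gives up and returns the empty string, which is the limitation that repetition is meant to remove.
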
We've slightly adjusted the steps from before to reflect the choice we were making in step four, the multiple-of-three check, and its two possible outcomes in steps five and six. However, we have left blank the condition under which we'll repeat these steps. How do we know when to stop repeating them? And what would you do after you stopped repeating this loop?
We would stop if we ran out of TAAs. If that happened, currIndex would be -1, which you know from having learned that you get -1 when you cannot find something in a string. If you were to encounter this case, it would mean that there is no valid gene in the string, so you should give an answer of the empty string. If you did not see this right off, what could you do to figure it out? You should work more examples until you understand the pattern.
Now your algorithm is generalized. But you'll need to learn about while loops before you can translate this into code. Thank you.
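
As a preview, the generalized algorithm might translate into code along the following lines. This is a minimal sketch under stated assumptions rather than the course's own implementation: the function name `find_gene` and the test strings are hypothetical, and the check for a missing ATG is an addition not covered in the steps above.

```python
def find_gene(dna):
    """Return the first gene starting at the first ATG, or "" if there is none."""
    start_index = dna.find("ATG")
    if start_index == -1:              # added guard: no start codon in the string
        return ""

    curr_index = dna.find("TAA", start_index + 3)
    # Repeat as long as there is still a TAA left to consider.
    while curr_index != -1:
        if (curr_index - start_index) % 3 == 0:
            # Valid stop codon: the gene includes the TAA itself.
            return dna[start_index:curr_index + 3]
        # Not a multiple of three away: look for the next TAA.
        curr_index = dna.find("TAA", curr_index + 1)

    return ""  # ran out of TAAs without finding a valid one


print(find_gene("ATGATCGCTAATGCTTAAGCTATG"))  # ATGATCGCTAATGCTTAA
print(find_gene("ATGCTAACTAACTAA"))           # ATGCTAACTAACTAA
```

The second call needs three checks before it succeeds, and the same loop would handle 50 or more without any change to the code.
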