Welcome to the last course in the Statistics with R specialization. Congratulations on getting this far. Now you get to put the skills you've acquired and the methods you've learned in the previous courses into a thorough data analysis project, using real data to answer a concrete business problem.
>> You are a consultant hired to analyze data for a real estate investment firm. Your assignment is to help provide insight into how the firm should invest, and to quantify and communicate what types of real estate properties are good investments and why. Some tips: it's generally a bad idea to buy the most expensive house on the block, and remember the real estate agent's mantra: location, location, location. Keep in mind that the goal is to make money for your investors, and hence investing in a property that is overvalued is rarely a good idea. This means that it is critical to know which properties are overvalued and which are undervalued. The regression techniques that you've learned in the prior courses can help you identify these types of properties.
>> We will give you three data sets: a subset for training, a subset for testing, and a third subset for validation. You're asked to build your model or models initially using only the training data. Then you will test your model on the testing data. And finally, you will validate it using the validation data. Could you just use all the data from the beginning? Yes. But should you? No. We are challenging you to keep your analysis experience realistic, and in a realistic scenario you would not have access to all three of these data sets at once. So why are we giving them all to you at once? Well, because we want you to be able to progress through the capstone project at your own pace, which means some of you might get to a later step in the analysis earlier than others.
>> Over the following weeks, you'll work on a series of assignments and analyses, leading to the final assignment.
First, you will complete a quiz, which will be similar to the lab quizzes you've taken in prior courses. You will load the initial data set into R and answer a series of questions ranging from straightforward data manipulation to more advanced modeling questions. These questions are designed to lead you in the right direction for your final project. You will submit your answers to these questions in a multiple-choice quiz and receive instant feedback. You should use this feedback to determine whether you are on the right track for your final project or whether you would benefit from reviewing concepts from the prior courses. This is also a great time for discussion on the course forums. We strongly encourage you to take advantage of the fact that you are not alone in the course: ask questions, answer each other, and discuss findings.
>> Next, you'll complete a peer assessment where you will do exploratory data analysis, model selection, and model evaluation. The questions on this peer assessment will be similar to the ones you worked on in the previous quiz. You will complete this assessment by writing a data analysis report with R Markdown, and you will submit your source code and your compiled report for evaluation by your peers, just like the final projects in the prior courses. Your submission will be reviewed by three of your peers, and you in turn will review the work of three of your peers using the rubrics provided.
>> Then you'll repeat this process with another quiz and then a final peer assessment. For the second round, you will need to do more advanced modeling. However, the process and logistics will be very similar to the first round of assessments. In your final write-up, you should provide a summary of the data and the problem, and discuss the objectives for your models. You're welcome to use frequentist or Bayesian approaches, as you see fit. We very strongly encourage you to try both and compare and contrast your findings.
Your interpretations should especially discuss how housing prices change with changes in the covariates. You might also want to evaluate the predictive success of your models and provide a discussion around that. Lastly, it is a good idea to include a summary of the limitations of your models and methods, as well as what you would do if you had more time, based on what you have discovered so far.
>> Along the way, you might feel like you take two steps forward and one step back. This is entirely natural, so please do not get discouraged. Data analysis is an iterative process. You might have some hypotheses about the relationships between the variables you're working with, and the only way to find out if these hypotheses are supported by your data is to explore them. Depending on what you discover during the exploration, you might decide not to include some of that work in your final project. This does not mean you wasted time. In fact, if every single visualization you made and every model you fit make it into your final project, you're probably making a mistake. You want to try a variety of approaches and then pick and choose between them to craft your final project. That being said, you probably do not want to constantly feel like you're going down the wrong path, or to have to backtrack too far in your work because of a careless mistake. To avoid this, we strongly recommend that you take cues from the answers to the quiz questions and the peer assessment feedback along the way.
>> Good luck with your analysis, and don't hesitate to ask questions and share your findings in the course forum. Keep in mind that there is no one right answer, and your justifications and conclusions are just as important as the methods and techniques you use to arrive at them.
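
The train, test, and validate workflow described in this introduction can be sketched in a few lines of base R. This is only an illustrative sketch, not the course's own solution: the data frame `homes`, its columns `area`, `age`, and `price`, and the helper `rmse()` are all hypothetical stand-ins, and the data are simulated because the course data sets are not reproduced here. The log-price linear model is one reasonable choice among many.

```r
# Simulated stand-in data (hypothetical; NOT the course data sets)
set.seed(42)
n <- 300
homes <- data.frame(
  area = runif(n, 800, 3000),             # living area, square feet
  age  = sample(0:80, n, replace = TRUE)  # age of the house, years
)
homes$price <- exp(11 + 0.0004 * homes$area - 0.004 * homes$age +
                   rnorm(n, sd = 0.15))

# Split once into three equally sized subsets
idx <- sample(rep(c("train", "test", "validation"), length.out = n))
train      <- homes[idx == "train", ]
test       <- homes[idx == "test", ]
validation <- homes[idx == "validation", ]

# Build the model using only the training data
fit <- lm(log(price) ~ area + age, data = train)

# Root mean squared error in dollars; exp() back-transforms the
# log-scale prediction (a simple back-transform with no bias correction)
rmse <- function(model, data) {
  pred <- exp(predict(model, newdata = data))
  sqrt(mean((data$price - pred)^2))
}

rmse_train <- rmse(fit, train)
rmse_test  <- rmse(fit, test)   # out-of-sample check on the testing data

# How price changes with a covariate: with a log response, a coefficient
# converts to an approximate percent change in price
pct_per_100_sqft <- (exp(100 * coef(fit)["area"]) - 1) * 100

# Log-scale residuals on the testing data: a strongly negative residual
# means the house sold for less than the model predicts, which makes it
# a candidate for "undervalued" (and a positive one for "overvalued")
test$resid  <- log(test$price) - predict(fit, newdata = test)
undervalued <- test[order(test$resid), ][1:5, ]

# Touch the validation data only once, after the model is final
rmse_validation <- rmse(fit, validation)
```

Comparing `rmse_train` with `rmse_test` gives a rough sense of overfitting, and a large residual is only a prompt for a closer look at a property, since it may equally reflect a covariate the model is missing.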