Welcome. In this video we will consider one more econometric problem, which is called endogeneity. Previously, we assumed that our independent variables x are predetermined, or non-stochastic. But in reality, when we deal with real data, they are random. In order to get best linear unbiased estimates, we have to assume exogeneity of our independent variables. This means that our independent variables x and our disturbances are independent, or at least uncorrelated. This is quite a strong assumption, and it has to be taken into account. If we write down our assumptions in terms of conditional moments, they look as follows: the conditional expectation of the disturbance term, conditional on x, is equal to 0, and the conditional variance of the disturbance term is equal to some constant value, Sigma squared. In particular, the first of these assumptions implies that x and the disturbance term are uncorrelated. The violation of this assumption is called endogeneity. Endogeneity is a big problem: if we have endogeneity, our OLS estimates are biased and inconsistent, so we definitely have to deal with it. Let's consider an example where we calculate an estimate for the regression model without a constant. In this model, if we assume that x and Epsilon are uncorrelated, then our estimates are unbiased and consistent. But suppose that x is in fact correlated with the disturbance term, so that the covariance of x and Epsilon is equal to some nonzero value, Sigma x Epsilon. The estimate Beta hat is given by our formula for the regression model without a constant. When we substitute our model, Beta x plus Epsilon, instead of y, we can write Beta hat as the sum of the true parameter and a random term. Since the covariance between x and the disturbance term is not equal to 0, this second term does not average out to 0.
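The substitution just described can be written out as a short derivation. This is a sketch in notation assumed here for illustration (observations indexed by i from 1 to n, and Epsilon written as \varepsilon); the slides of the lecture may use different symbols.

```latex
\begin{align*}
\hat{\beta}
  &= \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}
   = \frac{\sum_{i=1}^{n} x_i \,(\beta x_i + \varepsilon_i)}{\sum_{i=1}^{n} x_i^2} \\
  &= \beta + \frac{\sum_{i=1}^{n} x_i \varepsilon_i}{\sum_{i=1}^{n} x_i^2}
\end{align*}
```

The first term is the true parameter, and the second term is the random term that depends on how x and the disturbance move together.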
Our Beta hat is biased: if we calculate the expected value of Beta hat, it will not be equal to Beta. We can also show that our estimate Beta hat is inconsistent. In order to show that, we have to calculate the probability limit. If we calculate the probability limit of this formula, we can see that it is equal to Beta, the true value of the parameter, plus the probability limit of the fraction. For this fraction, we have to calculate the probability limits of the numerator and the denominator, each divided by the number of observations. The probability limit of the numerator is Sigma x Epsilon, the covariance between x and the disturbance term. The probability limit of the denominator is just Sigma squared x, the variance of x. We calculate those probability limits according to the law of large numbers. So the probability limit of Beta hat is equal to the true parameter Beta plus the ratio of Sigma x Epsilon to Sigma squared x. Since this ratio is not equal to 0, our Beta hat is inconsistent. The causes of endogeneity are the following. First, endogeneity can arise when we have an omitted variable in our regression model. Second, it can arise when we have a lagged dependent variable in our regression model and, at the same time, autocorrelation in the disturbances. Third, endogeneity can arise when there is measurement error in our independent variables. Finally, it can arise when there is simultaneity in our economic model. When we have an omitted variable, you will remember that our estimates are biased. This often happens when we cannot measure some individual characteristic. For example, if we study the relationship between wages and years of schooling, we might leave out the variable for the abilities of students because it is difficult to measure. These abilities can be correlated with the years of schooling. In this case, we have the omitted variable problem, and we have endogeneity because the omitted variable is captured by the disturbance.
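The wage and schooling example can be illustrated with a small simulation. This is only a sketch on made-up data: the function name `simulate_omitted_ability`, the coefficients (a true return to schooling of 0.08, an ability effect of 0.10) and the way schooling depends on ability are all assumptions chosen for illustration, not values from the lecture.

```python
import numpy as np


def simulate_omitted_ability(n=200_000, seed=0):
    """Compare OLS with and without the (hypothetical) ability variable."""
    rng = np.random.default_rng(seed)

    # Assumed data-generating process: ability raises both schooling and wages.
    ability = rng.normal(size=n)
    schooling = 12.0 + 2.0 * ability + rng.normal(size=n)
    true_return = 0.08
    log_wage = (1.0 + true_return * schooling + 0.10 * ability
                + rng.normal(scale=0.3, size=n))

    # Short regression: ability is omitted and ends up in the disturbance.
    X_short = np.column_stack([np.ones(n), schooling])
    b_short = np.linalg.lstsq(X_short, log_wage, rcond=None)[0]

    # Long regression: ability is included, so schooling is exogenous again.
    X_long = np.column_stack([np.ones(n), schooling, ability])
    b_long = np.linalg.lstsq(X_long, log_wage, rcond=None)[0]

    return true_return, b_short[1], b_long[1]


if __name__ == "__main__":
    true_return, short_slope, long_slope = simulate_omitted_ability()
    print(f"true return to schooling:   {true_return:.3f}")
    print(f"OLS slope, ability omitted: {short_slope:.3f}")
    print(f"OLS slope, ability included:{long_slope:.3f}")
```

Under these assumed numbers the covariance between ability and schooling is 2 and the variance of schooling is 5, so the slope with ability omitted should settle near 0.08 + 0.10 × 2/5 = 0.12 rather than 0.08, however large the sample is.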
Another example of an omitted variable arises when we study the relationship between executive compensation and firm value. We often leave out the variable for the abilities of executives, and these abilities can be correlated with both executive compensation and firm value. In this case, we also face endogeneity. Let's consider an example where we study the relationship between the dependent variable and two independent variables. If we are not able to measure the second independent variable, x_2, then we are only able to estimate the model with one independent variable, x_1. In this case, we omit the variable x_2. This omitted variable is captured by the disturbance u. If x_2 is correlated with x_1, it means that our disturbance u is correlated with the regressor x_1: if we calculate the covariance between x_1 and the disturbance term u, we can see that this covariance is not equal to 0. The second cause of endogeneity is the simultaneous presence of a lagged dependent variable and autocorrelation in the disturbances. Let's consider a regression model in which we include the lagged dependent variable. Economists are often interested in such models. If our regressors, which here are x and y t minus 1, are not correlated with the disturbance term, then our estimates are consistent. But if there is autocorrelation of any order in our disturbance term, let's say first-order autocorrelation, then we have the problem of inconsistency. In order to see the problem, let's rewrite our regression model for period t minus 1. Notice that in this model we have the disturbance term of period t minus 1, so y t minus 1 depends on it. Do not forget the autocorrelation: the disturbance in the model for period t also contains Epsilon t minus 1. As you can see, our disturbance term is correlated with our regressor y t minus 1, and hence our OLS estimates are inconsistent. Another cause of endogeneity is measurement error in the regressors x.
When we have a regression model of y on x and there is measurement error in x, we can notice that the observed values of x are distributed with a greater variance than the initial, true values of x. In this case, we can see that the slope of our regression line is shifted towards 0, so our estimate of the slope in this regression is biased. Let's consider the measurement error for a regression model of y on x. Here, our regressor and the disturbance term are uncorrelated. But we have measurement error in x, and therefore we observe not x but the variable z, which is x plus the measurement error. We denote this measurement error by u_i. If we substitute our observed value into the regression model, we will see that we have a new disturbance, w. This new disturbance w consists of the measurement error and the original disturbance term. So the new disturbance term w is correlated with the regressor that we include in our regression model, the regressor z. If we calculate the covariance between z and w, we can see that this covariance is not equal to 0 but is equal to minus beta_2 times Sigma squared u. If we go further and calculate the probability limit of beta hat, we will see that it is equal to the true parameter, beta_2, plus the ratio of Sigma z w to Sigma squared z. We can also simplify this expression, and then we can easily calculate the probability limit of beta hat. We can see that our beta hat is inconsistent, and it can be consistent only if we do not have measurement error, which means that the variance of the measurement error is equal to zero. Another cause of endogeneity is simultaneity. We face simultaneity when our dependent variable y can also affect the independent variable x. For example, if we study the relationship between ownership structure and firm performance, we can say that firm performance affects ownership structure, and the other way around.
Or if we study the relationship between capital reserves and banks' performance, we can say that banks' performance affects capital reserves, and the other way around. We often face simultaneity in economic models in which both variables are determined simultaneously. One example is the supply and demand model. In this model, we have an equilibrium, and we can show that our independent variable, price, is correlated with the disturbance term, epsilon. Our estimates are biased and inconsistent. Let's consider another example of simultaneity, the well-known Keynesian model. If we consider the consumption function, we can say that consumption depends on real income per capita, and beta_2 is the marginal propensity to consume. Income, in turn, is determined by consumption and real investment per capita, so there is simultaneity in this model. Suppose that our investments are not correlated with the disturbance term, and therefore they are exogenous. But in this model, not only consumption but also real income per capita is endogenous. That's why Y_t and epsilon t are correlated, and therefore our OLS estimates are biased and inconsistent. Let's try to derive the reduced form of this model. In the reduced form, we have the endogenous variables on the left-hand side and the exogenous variables on the right-hand side. As you can see, on the right-hand side of this system we have investments and disturbance terms. From this reduced system, we can derive the expression for the covariance between Y_t and epsilon t, which is not equal to zero. Therefore, our variable Y_t is correlated with the disturbance term epsilon t, and our estimates are biased. Let's show that the probability limit is also not equal to the true parameter beta_2. In order to find the probability limit, let's derive the variance of Y_t.
Again, we substitute the expression for Y_t from the reduced form and simplify according to the properties of the variance. We see that the probability limit of beta_2 hat is equal to the true parameter plus some term that is not equal to zero. This term is positive when beta_2 is less than one, so we can see that the true marginal propensity to consume is overestimated in this model.
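The overestimation of the marginal propensity to consume can be checked with a small simulation. This is a sketch under assumptions that go beyond what is said in the video: the model is taken to be C_t = beta_1 + beta_2 Y_t + epsilon_t together with the identity Y_t = C_t + I_t, investment is drawn independently of the disturbance, and the function name `simulate_keynesian` and all parameter values are hypothetical. Under these assumptions the reduced form gives a probability limit of beta_2 + (1 - beta_2) × Var(epsilon) / (Var(I) + Var(epsilon)), which the code compares with the OLS slope.

```python
import numpy as np


def simulate_keynesian(n=200_000, beta1=2.0, beta2=0.6,
                       sd_inv=1.0, sd_eps=1.0, seed=0):
    """Return the OLS slope of consumption on income and its assumed plim."""
    rng = np.random.default_rng(seed)

    # Exogenous investment and the consumption disturbance (independent draws).
    investment = 5.0 + rng.normal(scale=sd_inv, size=n)
    eps = rng.normal(scale=sd_eps, size=n)

    # Reduced form for income, obtained by solving the two equations for Y_t.
    income = (beta1 + investment + eps) / (1.0 - beta2)
    # Structural consumption function; Y_t = C_t + I_t holds by construction.
    consumption = beta1 + beta2 * income + eps

    # OLS slope in a regression of consumption on a constant and income.
    slope = np.cov(income, consumption)[0, 1] / np.var(income, ddof=1)

    # Probability limit implied by the reduced form under the assumptions above.
    plim = beta2 + (1.0 - beta2) * sd_eps**2 / (sd_inv**2 + sd_eps**2)
    return slope, plim


if __name__ == "__main__":
    slope, plim = simulate_keynesian()
    print(f"true marginal propensity to consume: 0.600")
    print(f"OLS estimate:                        {slope:.3f}")
    print(f"implied probability limit:           {plim:.3f}")
```

With the default values the implied probability limit is 0.6 + 0.4 × 0.5 = 0.8, so the OLS estimate stays well above the true value of 0.6 even in a very large sample.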