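Chapter 9 Heteroskedasticity Testing and Correction.ppt (第9章 异方差问题检验与修正.ppt)

Uploaded by: 本田雅阁 | Document ID: 2114195 | Upload date: 2019-02-16 | Format: PPT | Pages: 64 | Size: 2.65 MB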

Chapter 9 Heteroskedasticity: Testing and Correction

Contents
What is heteroskedasticity?
Why worry about heteroskedasticity?
How to test for heteroskedasticity?
How to correct for heteroskedasticity?

What is heteroskedasticity?
Recall that the homoskedasticity assumption states that, conditional on the explanatory variables, the variance of the unobserved error $u$ is constant:
$\mathrm{Var}(u \mid X) = \sigma^2$ (homoskedasticity)
If this is not true, that is, if the variance of $u$ differs across values of the $X$s, then the errors are heteroskedastic:
$\mathrm{Var}(u_i \mid X_i) = \sigma_i^2$ (heteroskedasticity)
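
To make the two definitions concrete, here is a minimal simulation sketch that is not part of the original slides. It assumes a regressor x drawn uniformly from [1, 10] and two hypothetical error terms, one with constant variance and one whose standard deviation is proportional to x. The names `simulate_errors` and `variance_ratio` are illustrative only.

```python
import random
import statistics

def simulate_errors(n=2000, seed=0):
    """Draw x in [1, 10] plus one homoskedastic and one heteroskedastic error."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        u_homo = rng.gauss(0.0, 1.0)        # Var(u|x) = 1 for every x
        u_hetero = x * rng.gauss(0.0, 1.0)  # Var(u|x) = x^2 grows with x
        rows.append((x, u_homo, u_hetero))
    return rows

def variance_ratio(rows, column):
    """Error variance for large x (x > 5.5) divided by that for small x."""
    low = [r[column] for r in rows if r[0] <= 5.5]
    high = [r[column] for r in rows if r[0] > 5.5]
    return statistics.variance(high) / statistics.variance(low)

rows = simulate_errors()
print("homoskedastic ratio:", round(variance_ratio(rows, 1), 2))
print("heteroskedastic ratio:", round(variance_ratio(rows, 2), 2))
```

Under these assumptions the first ratio should be close to 1, while the second should be well above 1, because the error variance rises with x.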

[Slide: Example of homoskedasticity]
[Slide: Example of heteroskedasticity]

Examples
Cross-section data are generally more prone to heteroskedasticity, because individuals differ in their characteristics.
Consider a cross-section study of family income and expenditures. It seems plausible that low-income families spend at a rather steady rate, while the spending patterns of high-income families are relatively volatile.
If we examine the sales of a cross section of firms in one industry, the error terms associated with very large firms might have larger variances than those associated with smaller firms; sales of larger firms might be more volatile than sales of smaller firms.

[Slide: Patterns of heteroskedasticity]
[Slide: The relation between R&D expenditure and sales]
[Slide: The scatter graph between R&D expenditure and sales]

Why worry about heteroskedasticity?

The consequences of heteroskedasticity
OLS estimates are still unbiased and consistent, even if we do not assume homoskedasticity. Take the simple regression as an example:
$Y = b_0 + b_1 X + u$
The OLS estimator of $b_1$ is
$\hat{b}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = b_1 + \frac{\sum (X_i - \bar{X}) u_i}{\sum (X_i - \bar{X})^2}$
Its expectation is $b_1$ whenever $E(u \mid X) = 0$, whatever the variance of $u_i$.

The $R^2$ and adjusted $R^2$ are unaffected by heteroskedasticity, because RSS and TSS are not affected by it.

The usual standard errors of the estimates are biased under heteroskedasticity.

The OLS estimates are not efficient; that is, their variances are no longer the smallest attainable.
If the standard errors are biased, we cannot use the usual t statistics or F statistics for drawing inferences. The t test, the F test, and the confidence intervals based on them are no longer valid. In a word, when heteroskedasticity is present we cannot use the t test and F test as usual, or else we will get misleading results.

Summary of the consequences of heteroskedasticity
OLS estimates are still unbiased and consistent.
The $R^2$ and adjusted $R^2$ are unaffected by heteroskedasticity.
The standard errors of the estimates are biased.
The OLS estimates are not efficient.
As a result, the t test, the F test, and the usual confidence intervals are not valid.

How to test for heteroskedasticity?

Residual plot
In OLS estimation we use the residual $e_i$ to estimate the random error term $u_i$, so we can check whether $u_i$ is heteroskedastic by examining $e_i$. We plot a scatter graph of $e_i^2$ against $X$.
If there is more than one independent variable, we should plot the squared residuals against each independent variable separately.
There is a shortcut for the residual plot when there is more than one independent variable: plot the squared residuals against the fitted value $\hat{Y}$, because $\hat{Y}$ is just a linear combination of all the $X$s.

[Slide: Residual plot: Example 9.2]

Park test
If heteroskedasticity is present, the variance of the error term $u_i$, $\sigma_i^2$, may be correlated with some of the independent variables. We can therefore test whether $\sigma_i^2$ is related to any of the explanatory variables. If it is, there is heteroskedasticity; if not, there is none. For example, for the simple regression model:
$\ln(\sigma_i^2) = b_0 + b_1 \ln(X_i) + v_i$

Procedure of the Park test
First, regress the dependent variable (Y) on the independent variables (Xs).
Obtain the residuals of the first regression, $e_i$, and compute $e_i^2$.
Then run a new regression with $\ln(e_i^2)$ as the dependent variable and the logs of the original independent variables as explanatory variables:
$\ln(e_i^2) = b_0 + b_1 \ln(X_i) + v_i$
Finally, test $H_0: b_1 = 0$ against $H_1: b_1 \neq 0$. If we cannot reject the null hypothesis, we find no evidence of heteroskedasticity, that is, the data are consistent with homoskedasticity.

Park test: Example
Take Example 9.2 as the example.
First, regress R&D expenditure (rdexp) on sales (sales):
rdexp = 192.91 + 0.0319 sales
se = (991.01) (0.0083)
n = 18, $R^2$ = 0.4783, adj-$R^2$ = 0.4457, F(1,16) = 14.67
Second, obtain the residuals ($e_i$) of this regression.
Third, regress $\ln(e_i^2)$ on ln(sales):
$\ln(e_i^2)$ = 1.216 ln(sales)
se = (0.057)
p = (0.000)
$R^2$ = 0.9637, adj-$R^2$ = 0.9615
Finally, test whether the slope of the second regression equals zero. Judging by the p-value of the parameter, at the 5% significance level we can reject the null hypothesis. Therefore, there is heteroskedasticity in the first regression.
Note: the Park test is not a good test for heteroskedasticity because of the special specification of its auxiliary regression, whose error term may itself be heteroskedastic.
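
The Park procedure can be sketched in a few lines of code. This is an illustration under stated assumptions, not the slides' own computation: it uses simulated data (not the R&D data of Example 9.2) in which the error standard deviation is proportional to x, and the helper names `simple_ols` and `park_test` are hypothetical. The auxiliary regression here includes a constant, as in the model $\ln(e_i^2) = b_0 + b_1 \ln(X_i) + v_i$.

```python
import math
import random

def simple_ols(x, y):
    """OLS of y on a constant and x; returns (b0, b1, se_b1, residuals)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (n - 2)
    return b0, b1, math.sqrt(sigma2 / sxx), resid

def park_test(x, y):
    """Regress ln(e^2) on ln(x); return the slope and its t statistic."""
    _, _, _, resid = simple_ols(x, y)
    log_e2 = [math.log(e * e) for e in resid]
    log_x = [math.log(xi) for xi in x]
    _, slope, se, _ = simple_ols(log_x, log_e2)
    return slope, slope / se

# Simulated data: Var(u|x) = x^2, so ln(sigma^2) has slope 2 in ln(x).
rng = random.Random(1)
x = [rng.uniform(1.0, 10.0) for _ in range(400)]
y = [2.0 + 0.5 * xi + xi * rng.gauss(0.0, 1.0) for xi in x]
slope, t_stat = park_test(x, y)
print(f"slope on ln(x): {slope:.2f}, t statistic: {t_stat:.2f}")
```

A large t statistic on the slope leads to rejecting $H_0: b_1 = 0$, which is the heteroskedastic case built into the simulated data.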

Glejser test
The essence of the Glejser test is the same as that of the Park test, but Glejser suggests using the following regressions to detect heteroskedasticity of $u$:
$|e_i| = b_0 + b_1 X_i + v_i$
$|e_i| = b_0 + b_1 \sqrt{X_i} + v_i$
$|e_i| = b_0 + b_1 (1/X_i) + v_i$
Again, we test $H_0: b_1 = 0$ against $H_1: b_1 \neq 0$. If we can reject the null hypothesis, there is evidence of heteroskedasticity; otherwise, the data are consistent with homoskedasticity.

Glejser test: Example 9.2
First, regress R&D expenditure (rdexp) on sales (sales):
rdexp = 192.91 + 0.0319 sales
se = (991.01) (0.0083)
n = 18, $R^2$ = 0.4783, adj-$R^2$ = 0.4457, F(1,16) = 14.67
Second, obtain the residuals ($e_i$) of this regression.
Third, regress $|e_i|$ on 1/sales:
$|e_i|$ = 2273.65 - 1992500 (1/sales)
se = (604.69) (12300000)
p = (0.002) (0.125)
Finally, test whether the slope is zero. The p-value of the slope is larger than the 5% significance level, so we cannot reject the null hypothesis: this specification gives no evidence of heteroskedasticity.

The White test
The White test is a more general test that allows for nonlinearities by using the squares and cross products of all the $X$s. For example, with k = 3:
$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + u$
$e^2 = d_0 + d_1 X_1 + d_2 X_2 + d_3 X_3 + d_4 X_1^2 + d_5 X_2^2 + d_6 X_3^2 + d_7 X_1 X_2 + d_8 X_1 X_3 + d_9 X_2 X_3 + v$
We use an F or LM statistic to test whether all the $X_j$, $X_j^2$, and $X_j X_h$ are jointly significant, that is, to test $H_0: d_1 = d_2 = \dots = d_9 = 0$ against $H_1$: $H_0$ is not true. If we can reject $H_0$, there is heteroskedasticity.

To test $H_0: d_1 = d_2 = \dots = d_9 = 0$ we can use the F test learned in Chapter 4. Let $R^2$ stand for the goodness of fit of the auxiliary regression:
$F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}$
We can also use the LM test:
$LM = n R^2 \sim \chi^2(k)$
where n is the number of observations and k is the number of restrictions.

The White test: Example 9.2
First, regress R&D expenditure (rdexp) on sales (sales) and profits (profits):
rdexp = -13.93 + 0.0126 sales + 0.2398 profits
se = (991.997) (0.018) (0.1986)
p = (0.989) (0.496) (0.246)
n = 18, $R^2$ = 0.5245, adj-$R^2$ = 0.4611, F = 8.27
Second, obtain the residuals e from the regression above.
Third, regress $e^2$ on sales, profits, sales$^2$, profits$^2$, and sales × profits:
$e^2$ = 693735.5 + 135.00 sales - 1965.7 profits - 0.0027 sales$^2$ - 0.116 profits$^2$ + 0.050 sales × profits
n = 18, $R^2$ = 0.8900, F(5, 12) = 19.42, Prob > F = 0.0000
Finally, test $H_0: d_1 = d_2 = d_3 = d_4 = d_5 = 0$. The p-value of the F test is 0.0000, so we can reject $H_0$.
$LM = n R^2 = 18 \times 0.89 = 16.02 > \chi^2_{0.05}(5) = 11.07$, so the LM test also rejects $H_0$.
So there is heteroskedasticity in the first regression.
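
The two White-test statistics can be checked numerically from the auxiliary regression's $R^2$ alone. The sketch below plugs in the figures quoted in Example 9.2 ($R^2$ = 0.89, n = 18, k = 5); the function name `white_statistics` is hypothetical, and the critical value 11.07 is taken from the example rather than computed.

```python
def white_statistics(r2, n, k):
    """Return (F, LM) for an auxiliary regression with k regressors, n observations."""
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    lm_stat = n * r2
    return f_stat, lm_stat

f_stat, lm_stat = white_statistics(r2=0.89, n=18, k=5)
chi2_crit = 11.07  # 5% critical value of chi-square(5), as quoted in Example 9.2
print(f"F = {f_stat:.2f}, LM = {lm_stat:.2f}, reject H0: {lm_stat > chi2_crit}")
```

With these inputs the formulas reproduce F(5, 12) = 19.42 and LM = 16.02, matching the values reported in the example.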

Alternate form of the White test
The full White regression can become unwieldy pretty quickly.
Consider that the fitted values from OLS, $\hat{Y}$, are a function of all the $X$s.
Thus $\hat{Y}^2$ is a function of the squares and cross products, and $\hat{Y}$ and $\hat{Y}^2$ can proxy for all of the $X_j$, $X_j^2$, and $X_j X_h$.
So regress the squared residuals on $\hat{Y}$ and $\hat{Y}^2$ and use the $R^2$ to form an F or LM statistic.
Note that we are now testing only 2 restrictions.

The procedure of the special case of the White test
Regress Y on $X_1, X_2, \dots, X_k$ and obtain the residuals $e_i$.
Calculate $\hat{Y}$ and $\hat{Y}^2$ (in Stata: predict ybar, xb and then gen ybarsq = ybar^2).
Regress $e^2$ on $\hat{Y}$ and $\hat{Y}^2$, and test the hypothesis that both coefficients are jointly zero.
Use the F statistic or the LM statistic to test the null hypothesis of homoskedasticity.

Example: White test in the wage determination equation
First, estimate the model by OLS without considering heteroskedasticity:
$\widehat{wage}$ = -2.87 + 0.599 educ + 0.022 exper + 0.139 tenure
Calculate the residuals of the regression, $e_i$, and the fitted value of wage, $\widehat{wage}$, and from them $e_i^2$ and $\widehat{wage}^2$.
Regress $e_i^2$ on $\widehat{wage}$ and $\widehat{wage}^2$:
$e_i^2$ = 7.36 - 2.86 $\widehat{wage}$ + 0.49 $\widehat{wage}^2$
se = (5.62) (1.76) (0.125)
n = 526, $R^2$ = 0.0984, F = 28.55, Prob > F = 0.000
Test $H_0: d_1 = d_2 = 0$. F test: F = 28.55, Prob > F = 0.000. LM test: $LM = n R^2 = 526 \times 0.0984 \approx 51.76 > 5.99 = \chi^2_{0.05}(2)$. Reject $H_0$.
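
The special-case procedure can be sketched as follows. This is an illustrative implementation on simulated data, not the wage data of the example: the helper names `ols_fit` and `white_special_lm`, the data-generating process, and the sample size are all assumptions made for the sketch. OLS is solved through the normal equations so that the block needs only the standard library.

```python
import random

def ols_fit(columns, y):
    """OLS of y on a constant plus the given regressor columns.
    Returns (coefficients, fitted values, R-squared)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for j in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[pivot] = A[pivot], A[j]
        for r in range(p):
            if r != j:
                factor = A[r][j] / A[j][j]
                A[r] = [a - factor * b for a, b in zip(A[r], A[j])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return beta, fitted, 1.0 - rss / tss

def white_special_lm(columns, y):
    """LM = n * R^2 from regressing e^2 on y_hat and y_hat^2 (2 restrictions)."""
    _, fitted, _ = ols_fit(columns, y)
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    fitted_sq = [f * f for f in fitted]
    _, _, r2_aux = ols_fit([fitted, fitted_sq], e2)
    return len(y) * r2_aux

# Simulated data: the error standard deviation is proportional to x1.
rng = random.Random(2)
n = 500
x1 = [rng.uniform(1.0, 10.0) for _ in range(n)]
x2 = [rng.uniform(1.0, 10.0) for _ in range(n)]
y = [1.0 + 2.0 * a + b + a * rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
lm = white_special_lm([x1, x2], y)
print(f"LM = {lm:.2f}; reject homoskedasticity at 5%: {lm > 5.99}")
```

The LM statistic is compared with 5.99, the 5% critical value of $\chi^2(2)$ used in the wage example, because only two restrictions are tested regardless of the number of regressors.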

Corrections for heteroskedasticity

Known variances: $\mathrm{Var}(u_i \mid X) = \sigma_i^2$
The original model is
$Y_i = b_0 + b_1 X_{i1} + \dots + b_k X_{ik} + u_i$
Divide both sides by $\sigma_i$. The new disturbance is $u_i^* = u_i / \sigma_i$, so $\mathrm{Var}(u_i^*) = \mathrm{Var}(u_i / \sigma_i) = \mathrm{Var}(u_i) / \sigma_i^2 = 1$.
The new model is
$Y_i / \sigma_i = b_0 / \sigma_i + b_1 X_{i1} / \sigma_i + \dots + b_k X_{ik} / \sigma_i + u_i / \sigma_i$
that is,
$Y_i^* = b_0 (1/\sigma_i) + b_1 X_{i1}^* + \dots + b_k X_{ik}^* + u_i^*$
We can estimate the new model with OLS; this is called weighted least squares (WLS).
Usually, however, we do not know the variances.

Case of form being known up to a multiplicative constant
Suppose the heteroskedasticity can be modeled as $\mathrm{Var}(u \mid X) = \sigma^2 h(X)$, where the trick is to figure out what $h(X) \equiv h_i$ looks like.
$E(u_i / \sqrt{h_i} \mid X) = 0$, because $h_i$ is only a function of $X$, and $\mathrm{Var}(u_i / \sqrt{h_i} \mid X) = \sigma^2$, because we know $\mathrm{Var}(u \mid X) = \sigma^2 h_i$.
So, if we divide the whole equation by $\sqrt{h_i}$, we obtain a model in which the error is homoskedastic.

Case 1: $h(X) = X$
The simple regression model is
$Y_i = b_0 + b_1 X_i + u_i$
Suppose $u_i$ is heteroskedastic with variance $\mathrm{Var}(u \mid X_i) = \sigma^2 h(X_i) = \sigma^2 X_i$.
Divide both sides of the original model by $\sqrt{X_i}$ to get a new model:
$Y_i / \sqrt{X_i} = b_0 / \sqrt{X_i} + b_1 X_i / \sqrt{X_i} + u_i / \sqrt{X_i}$
Rewrite it as
$Y_i / \sqrt{X_i} = b_0 / \sqrt{X_i} + b_1 \sqrt{X_i} + v_i$ (*)
$\mathrm{Var}(v_i) = \mathrm{Var}(u_i / \sqrt{X_i}) = \mathrm{Var}(u_i) / X_i = \sigma^2$, which is homoskedastic. Therefore, the new equation (*) can be estimated by OLS.

Example 9.6 (textbook 2e, p. 233)
We have shown that there is heteroskedasticity in the R&D expenditure model. Now assume that the variance of the error term changes with the independent variable sales, that is, $\mathrm{Var}(u_i) = \sigma^2 \, sales_i$.
The original model is
$rdexp_i = b_0 + b_1 \, sales_i + u_i$
The transformed model is
$rdexp_i / \sqrt{sales_i} = b_0 (1 / \sqrt{sales_i}) + b_1 \sqrt{sales_i} + v_i$, where $v_i = u_i / \sqrt{sales_i}$.
The estimate of the transformed model is
$rdexp / \sqrt{sales}$ = -246.73 $(1 / \sqrt{sales})$ + 0.0368 $\sqrt{sales}$
or, in terms of the original variables,
rdexp = -246.73 + 0.0368 sales
se = (381.16) (0.0071)
t = (-0.65) (5.17)
n = 18, $R^2$ = 0.6923, adj-$R^2$ = 0.6538, F = 18.00
WLS command in Stata: reg rdexp sales [aweight=1/sales]
The estimate of the original model is
rdexp = 192.91 + 0.0319 sales
se = (991.01) (0.0083)
t = (0.19) (3.83)
n = 18, $R^2$ = 0.4783, adj-$R^2$ = 0.4457, F(1,16) = 14.67
Compare the results of the two estimations. What do you find?

Case 2: $h(X) = X^2$
The simple regression model is
$Y_i = b_0 + b_1 X_i + u_i$
Suppose $u_i$ is heteroskedastic with variance $\mathrm{Var}(u \mid X_i) = \sigma^2 h(X_i) = \sigma^2 X_i^2$.
Divide both sides of the original model by $X_i$ to get a new model:
$Y_i / X_i = b_0 / X_i + b_1 X_i / X_i + u_i / X_i$
Rewrite it as
$Y_i / X_i = b_0 / X_i + b_1 + v_i$ (*)
$\mathrm{Var}(v_i) = \mathrm{Var}(u_i / X_i) = \mathrm{Var}(u_i) / X_i^2 = \sigma^2$, which is homoskedastic. Therefore, the new equation (*) can be estimated by OLS.

Generalized least squares
Estimating the transformed equation by OLS is an example of generalized least squares (GLS).
GLS is BLUE in this case, because the transformed equation meets the Gauss-Markov assumptions.
GLS is a weighted least squares (WLS) procedure in which each squared residual is weighted by the inverse of $\mathrm{Var}(u_i \mid x_i)$.

More on WLS
A similar weighting arises when we use per capita data at the city, county, state, or country level. If the individual-level equation satisfies the Gauss-Markov assumptions, then the error in the per capita equation has a variance proportional to one over the size of the population. Therefore, weighted least squares with weights equal to the population is appropriate.

Summary of WLS
WLS is great if we know what $\mathrm{Var}(u_i \mid x_i)$ looks like.
In most cases, we will not know the form of the heteroskedasticity.
One example where we do is when the data are aggregated but the model is at the individual level.
In that case the error variance of each aggregate observation is proportional to the inverse of the number of individuals, so each aggregate observation is weighted by the number of individuals
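
As a small illustration of Case 1, the sketch below fits a simple regression by weighted least squares with weights 1/x, which is equivalent to running OLS on the equation divided by $\sqrt{X_i}$. The data are simulated under the assumption $\mathrm{Var}(u \mid x) = \sigma^2 x$; the function name `wls_simple` and the true coefficients (5.0 and 0.5) are hypothetical choices for the demonstration.

```python
import random

def wls_simple(x, y, weights):
    """Minimise sum of w_i * (y_i - b0 - b1 * x_i)^2 and return (b0, b1)."""
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw  # weighted means
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Simulated data with Var(u|x) proportional to x (Case 1).
rng = random.Random(3)
x = [rng.uniform(1.0, 10.0) for _ in range(500)]
y = [5.0 + 0.5 * xi + (xi ** 0.5) * rng.gauss(0.0, 1.0) for xi in x]

b0_wls, b1_wls = wls_simple(x, y, [1.0 / xi for xi in x])  # weights = 1/h(x)
b0_ols, b1_ols = wls_simple(x, y, [1.0] * len(x))          # equal weights = OLS
print(f"WLS: b0 = {b0_wls:.3f}, b1 = {b1_wls:.3f}")
print(f"OLS: b0 = {b0_ols:.3f}, b1 = {b1_ols:.3f}")
```

Both estimators are unbiased here, so the two sets of coefficients should be close; the gain from WLS is in efficiency and in valid standard errors, not in the point estimates themselves. For Case 2 the same function would be called with weights $1/x_i^2$.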

Feasible GLS
More typical is the case where we do not know the form of the heteroskedasticity. In this case, we need to estimate $h(x_i)$.
Typically, we start by assuming a fairly flexible model, such as
$\mathrm{Var}(u \mid x) = \sigma^2 \exp(d_0 + d_1 x_1 + \dots + d_k x_k)$
Since we do not know the $d_j$, we must estimate them.

Feasible GLS (continued)
Our assumption implies that
$u^2 = \sigma^2 \exp(d_0 + d_1 x_1 + \dots + d_k x_k) \, v$, where $E(v \mid x) = 1$.
Taking logs gives
$\ln(u^2) = a_0 + d_1 x_1 + \dots + d_k x_k + \varepsilon$, where $E(\varepsilon) = 0$ and $\varepsilon$ is independent of $x$.
The OLS residual $e$ is an estimate of $u$, so we can estimate this equation by OLS after replacing $u$ with $e$.

Feasible GLS (continued)
An estimate of $h$ is obtained as $\hat{h} = \exp(\hat{g})$, where $\hat{g}$ is the fitted value of the log regression, and the inverse of $\hat{h}$ is our weight.
So, what did we do?
Run the original OLS model, save the residuals $e$, square them, and take the log, that is, $\ln(e^2)$.
Regress $\ln(e^2)$ on all of the independent variables and get the fitted values $\hat{g}$.
Do WLS using $1 / \exp(\hat{g})$ as the weight.
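
The three FGLS steps above can be sketched for a single regressor as follows. This is a minimal illustration on simulated data with $\mathrm{Var}(u \mid x) = \exp(0.3x)$; the function names `weighted_ols` and `fgls_simple`, and all numeric values in the data-generating process, are assumptions for the example rather than anything taken from the slides.

```python
import math
import random

def weighted_ols(x, y, w):
    """Weighted least squares of y on a constant and x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def fgls_simple(x, y):
    """Feasible GLS with Var(u|x) modelled as sigma^2 * exp(d0 + d1 * x)."""
    ones = [1.0] * len(x)
    # Step 1: original OLS, then ln(e^2) from the residuals.
    b0, b1 = weighted_ols(x, y, ones)
    log_e2 = [math.log((yi - b0 - b1 * xi) ** 2) for xi, yi in zip(x, y)]
    # Step 2: regress ln(e^2) on x and form the fitted values g_hat.
    a0, d1 = weighted_ols(x, log_e2, ones)
    g_hat = [a0 + d1 * xi for xi in x]
    # Step 3: WLS with weights 1 / exp(g_hat).
    weights = [1.0 / math.exp(g) for g in g_hat]
    b0_f, b1_f = weighted_ols(x, y, weights)
    return b0_f, b1_f, d1

rng = random.Random(4)
x = [rng.uniform(1.0, 10.0) for _ in range(500)]
y = [1.0 + 2.0 * xi + math.sqrt(math.exp(0.3 * xi)) * rng.gauss(0.0, 1.0) for xi in x]
b0_hat, b1_hat, d1_hat = fgls_simple(x, y)
print(f"FGLS: b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, estimated d1 = {d1_hat:.3f}")
```

The estimated `d1` should be positive, reflecting the variance that grows with x in the simulated data, and the FGLS slope should be near the true value of 2.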

Example of FGLS: Demand for Cigarettes (Smoke.raw)
What determines people's daily cigarette consumption?
Variables:
cigs: cigarettes smoked per day
income: annual income, $
cigpric: cigarette price per pack
age: age in years
restaurn: dummy variable, = 1 if the state has restaurant smoking restrictions
Model:
$\widehat{cigs}$ = -3.64 + 0.88 log(income) - 0.75 log(cigpric) - 0.50 educ + 0.77 age - 0.009 age$^2$ - 2.83 restaurn

Example of FGLS: Demand for Cigarettes
Use the White test to check for heteroskedasticity: obtain $e^2$ and regress $e^2$ on all the independent variables.
We get F = 13.69, p-value = 0, or LM = 807 × 0.0329 = 26.55, p-value = 0.
This indicates that there is heteroskedasticity.
Regress $\ln(e^2)$ on all the independent variables and get the fitted value $\hat{g}$.
Transform all the data by $1 / \sqrt{\exp(\hat{g})}$ and estimate the transformed equation without a constant:
$\widehat{cigs}$ = 5.63 + 1.295 log(income) - 2.94 log(cigpric) - 0.463 educ + 0.482 age - 0.0056 age$^2$ - 3.461 restaurn
The income effect is now statistically significant and larger in magnitude.
The estimates changed somewhat, but the basic story is still the same. Cigarette smoking is negatively related to schooling, has a quadratic relationship with age, and is negatively affected by restaurant smoking restrictions.

New specifications to correct heteroskedasticity
A new specification can sometimes correct the heteroskedasticity. The log-linear model is more often homoskedastic.
For example, in Example 9.2 we can use the following specification:
$\ln(rdexp_i) = b_0 + b_1 \ln(sales_i) + u_i$
The estimated model is
$\ln(rdexp_i)$ = -7.37 + 1.32 $\ln(sales_i)$
se = (1.8
