TO: Students in CJ 405

FROM: RB Taylor

DATE: 11/5/01

RE: Comments on 10/29 homeworks on error terms

POINT 1
Several of you when looking at the assumption

E(e) = 0

[ the expected value of the residuals is zero ]

thought this referred only to the entire set of error terms taken together, and that since the overall average was zero, all was settled.

But if you look carefully at point 1 at the bottom of p. 31 in Hamilton, you will see that the zero mean is expected for "every value of X." Every value makes sense when X is a categorical variable; when you have lots of X values you can reduce them to categories such as low, medium, and high, which we did in class. So this means that you would want to get the average of e at low, medium, and high X values to see if the assumption is being met. Can you see the steps you would need to follow in order to do this?
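
One possible way to carry out that check is sketched below in Python with NumPy. It is only an illustration under stated assumptions: the function name mean_residual_by_x_group is made up for this memo, the data are simulated (not from Hamilton or from class), and the groups are formed by splitting the cases into three equal-count sets after sorting on X. The simulated relationship is deliberately curved, so a straight line should leave residuals that average to zero overall but not within each X group.

```python
import numpy as np

def mean_residual_by_x_group(x, y, n_groups=3):
    """Fit y = a + b*x by least squares, then average the residuals
    within equal-count groups of x (low, medium, high when n_groups=3).
    Hypothetical helper, written only for this illustration."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)                 # slope first, then intercept
    e = y - (a + b * x)                        # residuals from the fitted line
    order = np.argsort(x)                      # case indices sorted by x
    groups = np.array_split(order, n_groups)   # low, medium, high x
    return [float(e[idx].mean()) for idx in groups]

# Simulated data (an assumption): the true relationship is curved,
# so a straight-line model is misspecified.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = 2 + 0.5 * x**2 + rng.normal(0, 1, 300)

low, medium, high = mean_residual_by_x_group(x, y)
print("mean e at low X:   ", low)
print("mean e at medium X:", medium)
print("mean e at high X:  ", high)
```

If the assumption held, all three group means would sit close to zero. Here the low and high groups should come out positive and the medium group negative, even though the three together average out to zero.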

POINT 2
The idea about the error terms being independent refers BOTH to independence from scores on the X variables, and to independence from errors of other cases. Again, see p. 31 in Hamilton. The suggestion made above will help you see if r(x1,e) = 0; if the mean of e is the same for every value (or range) of X, then you have independence from X.

Independence from other cases is a harder nut to crack. Just a couple of short points here.

1) Non-independent error terms are a serious no-no; they imply you do not have independent observations, which is a fundamental underpinning for statistical testing; in fact, your degrees of freedom assume this independence (freedom=independence, get it?).

2) You are likely to have non-independence among error terms whenever your dependent variable is based on multiple observations of the same units over time (county crime rates over years, a person's score on repeated testings), or when the individual units from whom/which you are collecting the outcome data are ecologically grouped in some way (students in different schools, residents in different neighborhoods, precincts in different police departments).
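
To make the grouping problem in point 2 concrete, here is a rough sketch in Python with NumPy. Everything in it is an assumption for illustration: the data are simulated students within schools, the function name intraclass_correlation is made up, and the one-way ANOVA intraclass correlation it computes is just one common way to gauge how much error is shared within groups; it is not a procedure taken from Hamilton. The regression leaves out a school-level effect, so residuals for students in the same school should be correlated.

```python
import numpy as np

def intraclass_correlation(e, group):
    """One-way ANOVA estimate of the intraclass correlation of residuals e.
    Assumes equal-sized groups. Values near 0 are consistent with
    independent errors; values well above 0 mean cases in the same
    group share error. Hypothetical helper for this illustration."""
    e = np.asarray(e, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)
    k = len(e) // len(labels)                  # cases per group (assumed equal)
    means = np.array([e[group == g].mean() for g in labels])
    # Between-group and within-group mean squares
    msb = k * np.sum((means - e.mean()) ** 2) / (len(labels) - 1)
    within = sum(np.sum((e[group == g] - m) ** 2)
                 for g, m in zip(labels, means))
    msw = within / (len(e) - len(labels))
    return (msb - msw) / (msb + (k - 1) * msw)

# Simulated data (an assumption): 30 schools, 20 students each.
rng = np.random.default_rng(1)
n_schools, n_students = 30, 20
school = np.repeat(np.arange(n_schools), n_students)
school_effect = rng.normal(0, 1, n_schools)[school]   # shared by schoolmates
x = rng.normal(0, 1, n_schools * n_students)
y = 1 + 2 * x + school_effect + rng.normal(0, 1, x.size)

# Ordinary regression that ignores the school grouping
b, a = np.polyfit(x, y, 1)
e = y - (a + b * x)

icc_grouped = intraclass_correlation(e, school)
icc_shuffled = intraclass_correlation(e, rng.permutation(school))
print("ICC using real schools:    ", icc_grouped)
print("ICC using shuffled schools:", icc_shuffled)
```

With the real school labels the value should be clearly above zero; with the labels shuffled at random it should be close to zero, which is what independent errors would look like.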