TO: Students in CJ 161-003
FROM: R. B. Taylor
DATE: 3/19/03
RE: M&Ms and Central Limit theorem and z scores and so on
Here is what we did
Here are the raw data for each sample and the sample averages, after we have sorted the samples from lowest to highest average
S #     1     2     3     4     5    AVERAGE
 1     10     4     4    12             7.50
 2      8    10     9     4    12       8.60
 3      4    10     9    16     8       9.40
 4      3     9    10     9    17       9.60
 5      4    16    10     8    11       9.80
 6     13    12    10     4    11      10.00
 7     11     9    10    11            10.25
 8     11     8    12                  10.33
 9     10    16    12     4            10.50
10     17     8     8    10    10      10.60
11     12    10    11    10    10      10.60
12     12    12    10    11            11.25
13     12    10    11    12            11.25
14     10    11    16    12     8      11.40
15     13     8    17     9            11.75
16     12    10    10    11    16      11.80
17     17    11     3    18    10      11.80
18     12    17    13    11     7      12.00
19     17    18     4    10            12.25
20     11    10    17                  12.67
These means are supposed to be in the shape of a normal distribution. Why? Because of the central limit theorem. I will explain more on that below. One way to see if this is true would be to put these means into a histogram and see if we get a normal shape. We would not expect the shape to match EXACTLY because a normal curve is a theoretically expected distribution, given lots and lots and lots and lots of sample means. But let's see how we do.
The figure super-imposes a normal curve. How did we do? I would say not so bad. We seem to be short one "high" sample, but hey...
We can calculate the mean and sd for the distribution of sample means. Those numbers are as follows:
10.6675   Average of the sample averages
1.2610    Standard deviation of the sample averages
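If you want to check these two numbers yourself, here is a minimal Python sketch that recomputes them from the raw data in the table. This is only an illustration, not how the numbers were originally produced, and the variable names are made up. One thing the sketch brings out: 1.2610 matches the version of the sd formula that divides by the number of samples (20). The version that divides by n - 1 (19) comes out slightly larger, at about 1.29.

```python
from statistics import mean, pstdev, stdev

# Raw counts for each of the 20 samples, copied from the table of raw data.
samples = [
    [10, 4, 4, 12], [8, 10, 9, 4, 12], [4, 10, 9, 16, 8], [3, 9, 10, 9, 17],
    [4, 16, 10, 8, 11], [13, 12, 10, 4, 11], [11, 9, 10, 11], [11, 8, 12],
    [10, 16, 12, 4], [17, 8, 8, 10, 10], [12, 10, 11, 10, 10], [12, 12, 10, 11],
    [12, 10, 11, 12], [10, 11, 16, 12, 8], [13, 8, 17, 9], [12, 10, 10, 11, 16],
    [17, 11, 3, 18, 10], [12, 17, 13, 11, 7], [17, 18, 4, 10], [11, 10, 17],
]

# One average per sample.
sample_means = [sum(s) / len(s) for s in samples]

grand_mean = mean(sample_means)       # average of the sample averages
sd_of_means = pstdev(sample_means)    # sd dividing by 20 (number of samples)
sd_n_minus_1 = stdev(sample_means)    # sd dividing by 19 (n - 1), for comparison

print(f"Average of the sample averages: {grand_mean:.4f}")
print(f"SD of the sample averages (divide by n): {sd_of_means:.4f}")
print(f"SD of the sample averages (divide by n - 1): {sd_n_minus_1:.4f}")
```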
The average of the averages is our best estimate of the POPULATION average, in the whole population of households. The standard deviation of the sampling distribution of means is the SAME THING as the STANDARD ERROR of ONE SAMPLE MEAN. Different term, same thing.
Now we can start calculating DIFFERENT RANGES in the distribution of sample means
What range?              FROM:      TO:
mean to mean + 1 sd      10.6675    11.9285
mean to mean - 1 sd      10.6675     9.4065
mean to mean + 2 sd      10.6675    13.1895
mean to mean - 2 sd      10.6675     8.1455
Now we want to start calculating how many cases fall into various RANGES on the X (guns in household, blue m&ms) variable. We are going to look here at TWO things: what would we expect THEORETICALLY if the distribution of means perfectly matched the normal distribution, and, what do we OBSERVE. Obviously, because we only have 20 samples, and because probability theory is always about what happens in the LONG run, we are not going to expect a perfect match. But how do we do?
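Here is one way the expected-versus-observed comparison could be sketched in Python. It is a rough illustration under my own assumptions, not a copy of what is on the slides: the names are hypothetical, cases sitting exactly on a boundary are counted as inside the range, and the theoretical counts are just 20 times the area under the standard normal curve for each range. The slides may group the ranges differently.

```python
from statistics import NormalDist, mean, pstdev

# The 20 sample averages from the table (31/3 and 38/3 are the unrounded
# versions of 10.33 and 12.67).
sample_means = [7.5, 8.6, 9.4, 9.6, 9.8, 10.0, 10.25, 31 / 3, 10.5, 10.6,
                10.6, 11.25, 11.25, 11.4, 11.75, 11.8, 11.8, 12.0, 12.25, 38 / 3]

m = mean(sample_means)
sd = pstdev(sample_means)      # divides by 20, matching the 1.2610 reported
n = len(sample_means)
std_normal = NormalDist()      # mean 0, sd 1


def observed_count(low, high):
    """How many of our sample means fall between low and high (inclusive)."""
    return sum(1 for x in sample_means if low <= x <= high)


def expected_count(z_low, z_high):
    """How many of n means a perfect normal curve would put in this z range."""
    return n * (std_normal.cdf(z_high) - std_normal.cdf(z_low))


# Each range is given in sd units (z scores) relative to the mean.
ranges = {
    "mean to mean + 1 sd": (0, 1),
    "mean to mean - 1 sd": (-1, 0),
    "mean to mean + 2 sd": (0, 2),
    "mean to mean - 2 sd": (-2, 0),
}

results = {}
for label, (z_low, z_high) in ranges.items():
    obs = observed_count(m + z_low * sd, m + z_high * sd)
    exp = expected_count(z_low, z_high)
    results[label] = (obs, exp)
    print(f"{label}: observed {obs}, expected {exp:.2f}")
```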
The slides show the results. In general, we are not that far off.
ANSWERS TO SOME QUESTIONS
Q: I don't understand when you use standard deviation (sd) and standard error (se)
A: The se IS the standard deviation of a certain type of distribution: a distribution of MEANS. If we have an actual distribution of means, as we did from our 20 samples, then you calculate the se like you would a normal sd. If, HOWEVER, you have only ONE sample, and you want an estimate of that ONE sample's se, you divide the sample sd by the square root of the number of cases: se = sd / (square root of n).
So say you had one sample on the number of firearms, and the mean was 3, the sd was 1, and the sample size was 16. Then the se = 1 / (square root of 16) = 1/4 = .25
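The same arithmetic as a small Python sketch, in case it helps to see it spelled out. The function name standard_error is my own, made up for illustration.

```python
from math import sqrt


def standard_error(sample_sd, n):
    """Estimate the se of ONE sample mean: sample sd / square root of n."""
    if n <= 0:
        raise ValueError("n must be a positive number of cases")
    return sample_sd / sqrt(n)


# The firearms example: sd = 1, sample size = 16.
se = standard_error(1, 16)
print(se)
```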
Q: Why is the graph always in a bell curve even if the numbers do not reflect the curve?
A: Great question! The "bell" or normal curve is a THEORETICAL distribution - that is, it tells us how lots and lots and lots and lots ....... and lots ..... and lots of sample means would distribute themselves. So it is showing us what to EXPECT. By looking at how our numbers are distributed RELATIVE to that theoretical expectation we can get an idea of how close or far off we are from that expectation. With real world data, even if we have lots of cases, we will NOT match that theoretical expectation perfectly.
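To get a feel for what "lots and lots" of sample means look like, here is a small simulation sketch in Python. Everything in it is an assumption for illustration: the population is a made-up, heavily skewed list of guns per household (NOT our class data), and the sample size of 25 and the 10,000 repetitions are arbitrary choices. The point is only that the sample means center on the population mean, and that their sd comes out close to the population sd divided by the square root of n.

```python
import random
from math import sqrt
from statistics import mean, pstdev

# HYPOTHETICAL population, invented for this illustration only:
# guns per household, heavily skewed (most households have none or few).
population = [0, 0, 0, 0, 1, 1, 2, 3, 5, 10]
pop_mean = mean(population)
pop_sd = pstdev(population)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 25                   # cases per sample (arbitrary choice)
n_samples = 10000        # "lots and lots" of samples (arbitrary choice)

# Draw each sample with replacement and keep only its mean.
sample_means = [sum(rng.choices(population, k=n)) / n for _ in range(n_samples)]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
theoretical_se = pop_sd / sqrt(n)

print(f"Population mean {pop_mean:.3f}, mean of sample means {mean_of_means:.3f}")
print(f"Theoretical se {theoretical_se:.3f}, sd of sample means {sd_of_means:.3f}")
```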
Q: Please explain more on the concept of population estimate
A: If we draw a sample using probability procedures, we can use that sample mean to get an estimate of the population average. But we need to take into account that we have introduced sampling error through the process of sampling. So our estimate of the population mean is centered on our sample mean, and we expect the population mean to be within two standard errors of it. Stated differently, we can estimate that the chances are very good that the REAL population number -- which of course we can never know -- is within +/- 2 se of the sample mean.
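As a rough sketch of the "+/- 2 se" idea, here it is in Python using the firearms example from earlier in this memo (mean 3, sd 1, sample size 16). The function name is made up for illustration.

```python
from math import sqrt


def two_se_range(sample_mean, sample_sd, n):
    """Return (low, high): the sample mean minus and plus 2 standard errors."""
    se = sample_sd / sqrt(n)   # estimated se of one sample mean
    return sample_mean - 2 * se, sample_mean + 2 * se


# Firearms example: mean = 3, sd = 1, n = 16, so se = .25.
low, high = two_se_range(3, 1, 16)
print(f"Population mean is very likely between {low} and {high}")
```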
Q: Sample means should center on the population mean, but not center exactly on it?
A: YES: take a look at the central limit theorem again; B&P, p. 228, point 1. And YES again - see above