TO: Students in CJ 161-003
FROM: R. B. Taylor
DATE: 3/19/03
RE: M&Ms and Central Limit theorem and z scores and so on

Here is what we did

  1. Each of your two households got a random number.
  2. The number of blue M&Ms in each of your households represented the number of working firearms in that household.
  3. We called out random numbers so we could randomly select households. Since we were doing random selection, each time we sampled a household, each household's chances of being sampled were equal. This makes it a probability sample.
  4. We tried to get five randomly selected households into each sample. We had some missing data (analogous to households that are not home when you call with a survey), so some samples have only four households, and two samples (8 and 20 in the table below) have only three.
  5. We calculated the AVERAGE for each sample. These SAMPLE AVERAGES make up the sampling distribution of means. In real world research you usually take only one sample, but you COULD theoretically take a bunch. That is what we did here.
  6. We then sorted the samples from the lowest to highest average.

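Here are the raw data for each sample and the sample averages, after sorting. Each row is one sample (S #); columns 1 through 5 are the households in that sample, and a blank cell is a missing household.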

S #	1	2	3	4	5	AVERAGE
1	10		4	4	12	7.50
2	8	10	9	4	12	8.60
3	4	10	9	16	8	9.40
4	3	9	10	9	17	9.60
5	4	16	10	8	11	9.80
6	13	12	10	4	11	10.00
7		11	9	10	11	10.25
8		11		8	12	10.33
9	10	16		12	4	10.50
10	17	8	8	10	10	10.60
11	12	10	11	10	10	10.60
12	12	12	10		11	11.25
13	12		10	11	12	11.25
14	10	11	16	12	8	11.40
15	13	8		17	9	11.75
16	12	10	10	11	16	11.80
17	17	11	3	18	10	11.80
18	12	17	13	11	7	12.00
19	17	18	4	10		12.25
20		11		10	17	12.67
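
If you want to check the AVERAGE column yourself, here is a minimal Python sketch. Python is not something we used in class, and the names (samples, sample_average) are made up for this illustration. The one real decision it shows is that a missing household is simply left out, so the average is taken over the households that are actually there.

```python
# Blue M&M counts per household, copied from the table above.
# None marks a missing household (nobody "home").
samples = [
    [10, None, 4, 4, 12],
    [8, 10, 9, 4, 12],
    [4, 10, 9, 16, 8],
    [3, 9, 10, 9, 17],
    [4, 16, 10, 8, 11],
    [13, 12, 10, 4, 11],
    [None, 11, 9, 10, 11],
    [None, 11, None, 8, 12],
    [10, 16, None, 12, 4],
    [17, 8, 8, 10, 10],
    [12, 10, 11, 10, 10],
    [12, 12, 10, None, 11],
    [12, None, 10, 11, 12],
    [10, 11, 16, 12, 8],
    [13, 8, None, 17, 9],
    [12, 10, 10, 11, 16],
    [17, 11, 3, 18, 10],
    [12, 17, 13, 11, 7],
    [17, 18, 4, 10, None],
    [None, 11, None, 10, 17],
]

def sample_average(households):
    """Average over the households that actually responded."""
    observed = [count for count in households if count is not None]
    return sum(observed) / len(observed)

averages = [sample_average(sample) for sample in samples]
for number, average in enumerate(averages, start=1):
    print(f"{number:2d}  {average:.2f}")
```
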
  7. These means are supposed to be in the shape of a normal distribution. Why? 
    Because of the central limit theorem (more on that below). One way to see if this is
    true is to put the means into a histogram and see if we get a normal shape. We would not 
    expect the shape to match EXACTLY because a normal curve is a theoretically expected
    distribution, given lots and lots and lots and lots of sample means. But let's see how we do.
 
The figure (a histogram of our 20 sample averages) superimposes a normal curve. How did we do? 
I would say not so bad. We seem to be short one "high" sample, but hey...
  8. We can calculate the mean and sd for the distribution of sample means. Those numbers 
    are as follows:
10.6675	Average of the sample averages		
1.2610	Standard deviation of the sample averages		
The average of the averages is our best estimate of the POPULATION average, in the whole 
population of households. The standard deviation of the sampling distribution of means is the 
SAME THING as the STANDARD ERROR of ONE SAMPLE MEAN. Different term, same
thing.
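
Here is a short Python sketch of that calculation, using the standard statistics module. The variable names are ours. One thing worth flagging: the 1.2610 reported above is what you get when you divide by the number of samples (20), which Python calls pstdev. The divide-by-(n - 1) version, stdev, comes out a little larger. Samples 8 and 20 are entered as exact fractions rather than the rounded 10.33 and 12.67.

```python
import statistics

# The 20 sample averages from the table; 31/3 and 38/3 are samples 8 and 20.
averages = [7.50, 8.60, 9.40, 9.60, 9.80, 10.00, 10.25, 31 / 3, 10.50, 10.60,
            10.60, 11.25, 11.25, 11.40, 11.75, 11.80, 11.80, 12.00, 12.25, 38 / 3]

grand_mean = statistics.mean(averages)       # average of the sample averages
sd_of_means = statistics.pstdev(averages)    # divides by n; matches the memo
sd_n_minus_1 = statistics.stdev(averages)    # divides by n - 1, for comparison

print(f"Average of the sample averages:       {grand_mean:.4f}")
print(f"SD of the sample averages (n):        {sd_of_means:.4f}")
print(f"SD of the sample averages (n - 1):    {sd_n_minus_1:.4f}")
```
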
  9. Now we can start calculating DIFFERENT RANGES in the distribution of sample means:
What range?				FROM:	TO:
mean to mean + 1 sd			10.6675	11.9285
mean to mean - 1 sd			10.6675	9.4065
mean to mean + 2 sd			10.6675	13.1895
mean to mean - 2 sd			10.6675	8.1455
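
The ranges in this table are just the mean plus or minus one or two sd. A minimal Python sketch of the arithmetic, with a helper name (sd_range) that we made up for illustration:

```python
def sd_range(mean, sd, k):
    """Return (from, to) for 'mean to mean + k sd'; negative k goes below the mean."""
    return mean, mean + k * sd

mean, sd = 10.6675, 1.2610   # values reported in this memo

for k in (1, -1, 2, -2):
    start, end = sd_range(mean, sd, k)
    print(f"mean to mean {k:+d} sd: {start:.4f} to {end:.4f}")
```
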
 
  10. Now we want to count how many of our sample means fall into various RANGES on the 
    X (guns in household, blue M&Ms) variable. We are going to look here at TWO things: 
    what we would expect THEORETICALLY if the distribution of means perfectly matched
    the normal distribution, and what we actually OBSERVE. Because we only have 20
    samples, and because probability theory is always about what happens in the LONG run, we
    do not expect a perfect match. But how do we do?
  11. The slides show the results. In general, we are not that far off.
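
The slides are not reproduced in this memo, so here is one way you could redo the comparison yourself in Python. This is a sketch under our own assumptions: it uses two-sided ranges (mean minus k sd to mean plus k sd), which may not be cut up exactly the way the slides were, and the function names are made up. The theoretical share comes from NormalDist in the standard statistics module.

```python
import statistics
from statistics import NormalDist

# The 20 sample averages from the table; 31/3 and 38/3 are samples 8 and 20.
averages = [7.50, 8.60, 9.40, 9.60, 9.80, 10.00, 10.25, 31 / 3, 10.50, 10.60,
            10.60, 11.25, 11.25, 11.40, 11.75, 11.80, 11.80, 12.00, 12.25, 38 / 3]

center = statistics.mean(averages)
spread = statistics.pstdev(averages)   # divide-by-n sd, as in the memo

def observed_count(k):
    """How many sample averages fall within k sd of the mean."""
    return sum(1 for a in averages if abs(a - center) <= k * spread)

def expected_count(k):
    """How many we would expect if the averages were perfectly normal."""
    share = NormalDist().cdf(k) - NormalDist().cdf(-k)
    return share * len(averages)

for k in (1, 2):
    print(f"within {k} sd: observed {observed_count(k)}, "
          f"expected about {expected_count(k):.1f}")
```

With these 20 averages it counts 14 within one sd and 19 within two, against roughly 13.7 and 19.1 expected, which fits the "not that far off" verdict.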
ANSWERS TO SOME QUESTIONS
Q: I don't understand when you use standard deviation (sd) and standard error (se).
A: The se IS the standard deviation of a certain type of distribution: a distribution of MEANS. If we
have an actual distribution of means, as we did from our 20 samples, then you calculate the se like
you would an ordinary sd. If, HOWEVER, you have only ONE sample, and you want an estimate of that
ONE sample's se, you divide the sample sd by the square root of the number of cases: se = sd / sqrt(n).
So say you had one sample on the number of firearms, with a mean of 3, an sd of 1, and a sample
size of 16. Then the se = 1 / sqrt(16) = 1/4 = .25
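
The one-sample formula in Python, as a minimal sketch (the function name is ours, chosen for illustration):

```python
import math

def standard_error(sample_sd, n):
    """Estimated se of ONE sample's mean: sd divided by the square root of n."""
    return sample_sd / math.sqrt(n)

# The example above: sd of 1, sample size of 16.
print(standard_error(sample_sd=1, n=16))   # 0.25
```

Notice that the sample mean (3) never enters the calculation; the se depends only on the sd and the sample size.
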
Q: Why is the graph always in a bell curve even if the numbers do not reflect the curve?
A: Great question! The "bell" or normal curve is a THEORETICAL distribution - that is, it tells us how
lots and lots and lots and lots ....... and lots ..... and lots of sample means would distribute themselves. 
So it is showing us what to EXPECT. By looking at how our numbers are distributed RELATIVE to 
that theoretical expectation we can get an idea of how close or far off we are from that expectation.
With real world data, even if we have lots of cases, we will NOT match that theoretical expectation
perfectly.
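
If you want to see the "lots and lots" idea in action, here is a small simulation sketch in Python. Everything about the population is an ASSUMPTION made up for illustration: household counts from 0 to 20, all equally likely (so the population itself is flat, not bell shaped). It draws samples of five, takes each sample's mean, and reports what share of the means fall within 1 and 2 sd of their average, first for 20 samples and then for 20,000. For a perfectly normal curve the shares would be about 68% and 95%.

```python
import random
import statistics

def simulate_sample_means(num_samples, sample_size=5, seed=0):
    """Draw num_samples random samples and return the mean of each one."""
    rng = random.Random(seed)      # fixed seed so reruns give the same numbers
    population = range(0, 21)      # ASSUMED population: counts 0..20, equally likely
    return [sum(rng.choices(population, k=sample_size)) / sample_size
            for _ in range(num_samples)]

def share_within(means, k):
    """Fraction of the means that lie within k sd of the average of the means."""
    center = statistics.mean(means)
    spread = statistics.pstdev(means)
    inside = [m for m in means if abs(m - center) <= k * spread]
    return len(inside) / len(means)

for num_samples in (20, 20000):
    means = simulate_sample_means(num_samples)
    print(num_samples, "samples:",
          f"{share_within(means, 1):.0%} within 1 sd,",
          f"{share_within(means, 2):.0%} within 2 sd")
```
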
Q: Please explain more on the concept of population estimate.
A: If we draw a sample using probability procedures, we can use that sample mean to get an estimate
of the population average. But we need to take into account that we have introduced sampling error
through the process of sampling. So our estimate of the population mean is centered on our
sample mean, and we expect the population mean to be within two standard errors of it. Stated
differently, the chances are very good (roughly 95 in 100) that the REAL population number --
which of course we can never know -- is within + / - 2 se of the sample mean.
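
As a minimal Python sketch of the + / - 2 se idea, we borrow the numbers from the se example earlier in this memo (mean of 3, sd of 1, sample size of 16). The function name is made up for illustration, and the se is estimated from the one sample as sd / sqrt(n).

```python
import math

def two_se_range(sample_mean, sample_sd, n):
    """Return (low, high): the sample mean minus and plus 2 standard errors."""
    se = sample_sd / math.sqrt(n)
    return sample_mean - 2 * se, sample_mean + 2 * se

low, high = two_se_range(sample_mean=3, sample_sd=1, n=16)
print(f"The population mean is probably between {low} and {high}")
```
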
Q: Should sample means center on the population mean, but not center exactly on it?
A: YES: take a look at the central limit theorem again; B&P, p. 228, point 1. And YES again - see above.