Central limit theorem in two minutes
My goal is to demonstrate the central limit theorem in as few words as possible.
The central limit theorem states: the distribution of the means of samples drawn at random from a population with finite variance is approximately normal, and the approximation improves as the sample size grows.
The idea is simple:
- Generate a population that is not normally distributed
- Randomly draw a large number of samples
- Compute the average of each sample
- Look at the distribution of the samples’ means
- The distribution of the samples’ means will have a mean approximately equal to the population’s mean and a standard deviation approximately equal to the population’s standard deviation divided by the square root of the sample size
Let's get our hands dirty!
1. Let's generate a uniformly distributed population between -10 and 10.
Here is the Mathematica code for that.
population = RandomVariate[UniformDistribution[{-10, 10}], 10000];
Let's plot it!
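The plotting command is not shown here, so the following is only a sketch of one way to draw the population, assuming a plain histogram is what is wanted. The bin count of 20 and the axis labels are my own choices.
(* Histogram of the population; for a uniform population the bars should be roughly level *)
Histogram[population, 20, AxesLabel -> {"value", "count"}]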
2. Let's take one random sample of size 30 from the population.
RandomSample[population, 30]
We get a list of numbers, something like this:
{-0.765169, 5.43087, -2.30363, -5.84992, 8.27449, 9.1928, 9.95873, \
-1.28949, 7.85436, -9.75674, -9.63462, -6.44642, -8.01943, -4.9619, \
-1.33589, -7.55137, 5.76924, 9.15316, 6.10032, 8.00286, 6.60086, \
5.449, -2.07678, 4.40854, -7.47191, -7.11837, 0.308717, -2.83244, \
2.19256, 5.48393}
Let's apply the Mean[] function to a sample like this one!
Mean[RandomSample[population, 30]]
Now we have a scalar: the average of the sample. For the list shown above it is 0.558879. Because this call draws a fresh random sample, the value you get will differ.
3. Let's draw many samples and apply the Mean[] function to each one.
samplesMeans=Table[Mean[RandomSample[population, 30]], 500];
Let's plot it!
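Again the plotting command is not shown, so this is a sketch rather than the code behind the picture. It assumes a histogram scaled as a probability density, overlaid with the normal curve that the central limit theorem predicts. The bin count and the plotting range of -4 to 4 are my own choices.
(* Histogram of the samples' means, scaled so the bar areas sum to 1 *)
meansHistogram = Histogram[samplesMeans, 20, "PDF"];
(* Normal curve with the population mean and populationStd/Sqrt[30] as its parameters *)
predictedCurve = Plot[PDF[NormalDistribution[Mean[population], StandardDeviation[population]/Sqrt[30]], x], {x, -4, 4}];
Show[meansHistogram, predictedCurve]
If the bars follow the curve closely, the simulation agrees with the prediction.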
4. As we can see from the picture above, the distribution of the samples’ means is approximately normal (Gaussian), with a mean approximately equal to the population mean and a standard deviation approximately equal to the population’s standard deviation divided by the square root of the sample size.
- Important notes: First, the central limit theorem also applies to populations with bimodal distributions, like the one in the picture above, not only to uniform distributions. Second, "sampling distribution of the mean" is simply another name for the distribution of the samples’ means.
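As a side illustration of the first note, here is a sketch that repeats the experiment with a bimodal population. The mixture of two normal distributions centered at -5 and 5 and the names bimodalPopulation and bimodalMeans are assumptions made for this example only; they are not used elsewhere in the article.
(* Bimodal population: an equal-weight mixture of two normal distributions *)
bimodalPopulation = RandomVariate[MixtureDistribution[{1, 1}, {NormalDistribution[-5, 1], NormalDistribution[5, 1]}], 10000];
(* Means of 500 samples of size 30, as before *)
bimodalMeans = Table[Mean[RandomSample[bimodalPopulation, 30]], 500];
Histogram[bimodalMeans, 20]
The histogram of bimodalMeans should again look roughly bell-shaped, even though the population itself has two peaks.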
The population mean in my case is -0.0733272.
populationMean = Mean[population]
The mean of the samples’ means in my case is -0.0764752, which is almost the same as the population mean!
samplesMeansMean = Mean[samplesMeans]
The population’s standard deviation in my case is 5.75503.
populationStd = StandardDeviation[population]
The standard deviation of the samples’ means in this case is 0.956219, which is close to populationStd/Sqrt[sampleSize], where sampleSize is 30.
samplesStd = StandardDeviation[samplesMeans]
populationStd/Sqrt[30] evaluates to 1.05072.
As we increase the sample size, each sample’s mean tends to get closer to the population mean, and the standard deviation of the samples’ means approaches zero.
The larger the sample size, the better!
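To see this effect, here is a small sketch that repeats the experiment for several sample sizes. The sizes 5, 30, 100 and 300 are arbitrary choices of mine. Each row lists the sample size, the observed standard deviation of 500 samples’ means, and the predicted value.
(* Columns: sample size, observed std of the samples' means, populationStd/Sqrt[n] *)
TableForm[Table[{n, StandardDeviation[Table[Mean[RandomSample[population, n]], 500]], StandardDeviation[population]/Sqrt[n]}, {n, {5, 30, 100, 300}}]]
The second and third columns should stay close to each other and both shrink as n grows.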
Every calculation in this presentation involves randomness, so don’t worry if you get values different from mine. Every time you run a calculation, you get different values. Be careful, though: the estimated parameters should not vary much between runs!
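If you want repeatable numbers, one option is to seed the random number generator before generating the population. This is a sketch; the seed value 1234 is arbitrary, and the numbers quoted in this article were not necessarily produced with it.
(* Fix the seed so that RandomVariate and RandomSample give the same results on every run *)
SeedRandom[1234];
population = RandomVariate[UniformDistribution[{-10, 10}], 10000];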
Although the central limit theorem is a rigorously proven mathematical result, you do not need the proof to see it at work. It is enough to observe how a sample statistic behaves when samples are drawn from a population many times.
In this article we saw why the normal distribution matters in real life. Regardless of the population’s distribution (as long as its variance is finite), the distribution of the samples’ means is approximately normal for sufficiently large samples.
So what do you think: why do students have difficulty understanding something so intuitive?