Introduction to the Central Limit Theorem


The central limit theorem (CLT for short) is one of the most powerful and useful ideas in all of statistics. There are two alternative forms of the theorem, and both are concerned with drawing finite samples of size n from a population with a known mean, μ, and a known standard deviation, σ. The first form says that if we collect samples of size n with a “large enough” n, calculate each sample’s mean, and create a histogram of those means, then the resulting histogram will tend to have an approximately normal bell shape. The second form says that if we again collect samples of size n that are “large enough,” calculate the sum of each sample, and create a histogram of those sums, then the resulting histogram will again tend to have a normal bell shape.

In either case, it does not matter what the distribution of the original population is, and you do not even need to know it. The important fact is that the distributions of the sample means and of the sample sums tend to follow the normal distribution.
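The two forms can be written compactly. The notation here is my own addition rather than something from the text above: X_1, ..., X_n stands for an independent random sample of size n from the population, X̄ for the sample mean, and S for the sample sum. The second argument of N is the variance, and both statements are approximations that improve as n grows.

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu,\ \frac{\sigma^2}{n}\right),
\qquad
S = \sum_{i=1}^{n} X_i \;\overset{\text{approx.}}{\sim}\; N\!\left(n\mu,\ n\sigma^2\right)
\quad \text{for large } n.
```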

The size of the sample, n, that is required in order to be “large enough” depends on the original population from which the samples are drawn (a common rule of thumb is that the sample size should be at least 30, unless the data come from a normal distribution). If the original population is far from normal, then more observations are needed for the sample means or sums to be approximately normal. Sampling is done with replacement.
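To see how “large enough” depends on the population, here is a minimal sketch that uses a strongly right-skewed population (exponential with rate 1, my own choice for illustration) and measures how lopsided the distribution of sample means still is at a few sample sizes. The helper names skewness and sample_means are hypothetical. A skewness near 0 indicates a symmetric, bell-like shape.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(n, num_samples, rng):
    """Means of `num_samples` samples of size n from an exponential population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Skewness of the sample means for a small, a moderate, and a large sample size.
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (2, 30, 200)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {skew:.2f}")
```

For this particular population the skewness of the sample mean is 2/√n in theory, so it shrinks slowly: the means are still visibly skewed at n = 30 and much closer to symmetric at n = 200.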

The CLT tells us that as we increase the sample size, the distribution of sample means approaches a normal distribution, whatever the shape of the population distribution. As a rule of thumb, the approximation works well when the sample size is greater than 30. The conclusion is that if we take a large number of samples, each of a large size, a plot of the sample means will look approximately normal.

The main importance of the normal distribution comes from the central limit theorem. The theorem states that the sampling distribution of the sample mean approaches the normal distribution as the sample size gets larger, no matter the shape of the population distribution. Let’s look at this through an example. Consider the number of tweets a person makes in a week (randomly generated data between 0 and 200). The frequency distribution of the data looks like this:

[Figure: frequency distribution of the weekly tweet counts]

This does not look like a normal bell curve.

Now let’s take 1000 random samples of size 50 from this data and calculate the mean of each sample. When we plot these means, we get an approximately normal curve. The distribution of these sample means is known as the sampling distribution.

[Figure: histogram of the 1000 sample means]

Mean of the sampling distribution = 98.78 (population mean = 98.87)
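The experiment can be reproduced along these lines. This is a sketch under assumptions, not the code behind the figures: the original data and seed are not available, so the population size of 10,000 and the seed are made up, and the printed means will not match 98.78 and 98.87 exactly.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed; the article's own seed and data are unknown

# Hypothetical stand-in for the tweet data: 10,000 people, 0-200 tweets per week.
population = [rng.randint(0, 200) for _ in range(10_000)]

# 1000 random samples of size 50, drawn with replacement, one mean per sample.
sample_means = [
    statistics.fmean(rng.choices(population, k=50))
    for _ in range(1000)
]

population_mean = statistics.fmean(population)
mean_of_means = statistics.fmean(sample_means)

print(f"population mean      = {population_mean:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
```

Plotting a histogram of sample_means with any charting tool should show the bell shape described above.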

The central limit theorem has some important properties:

  1. The mean of the sampling distribution is approximately equal to the mean of the population. We can see this in the example above, where the population mean (98.87) is approximately equal to the mean of the sampling distribution (98.78).
  2. The standard deviation of the sampling distribution, also known as the standard error, is equal to the population standard deviation divided by the square root of the sample size: standard error = σ / √n. As a result, the greater the sample size, the lower the standard error and the more accurately a sample mean estimates the population mean.
  3. The distribution of sample means is approximately normal regardless of the shape of the population distribution. Even if the original distribution is skewed, bimodal, or some other shape, the sample means still follow an approximately normal distribution once the sample size is large enough. This is what makes the central limit theorem so powerful.
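The standard error property is easy to check numerically. The sketch below again uses made-up data (a hypothetical population of 10,000 values between 0 and 200) and compares the observed spread of the sample means with the population standard deviation divided by the square root of the sample size. The function name empirical_standard_error is my own.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population, in the spirit of the tweet example.
population = [rng.randint(0, 200) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation

def empirical_standard_error(n, num_samples=2000):
    """Standard deviation of the means of `num_samples` samples of size n."""
    means = [
        statistics.fmean(rng.choices(population, k=n))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)

# For each sample size: (observed spread of the means, predicted sigma / sqrt(n)).
results = {
    n: (empirical_standard_error(n), sigma / math.sqrt(n))
    for n in (10, 50, 200)
}

for n, (observed, predicted) in results.items():
    print(f"n = {n:>3}: observed = {observed:.2f}, sigma / sqrt(n) = {predicted:.2f}")
```

The two columns should agree closely, and both should shrink as n grows.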

For the central limit theorem to hold, the sample size should be sufficiently large (generally > 30).


Suppose that we are interested in estimating the average height among all people. Collecting data for every person in the world is impractical, bordering on impossible. While we can’t obtain a height measurement from everyone in the population, we can still sample some people. The question now becomes: what can we say about the average height of the entire population given a single sample?

The Central Limit Theorem addresses this question exactly. Formally, it states that if we repeatedly sample from a population using a sufficiently large sample size, the means of those samples (whose distribution is known as the sampling distribution) will be approximately normally distributed, assuming true random sampling. What’s especially important is that this will be true regardless of the distribution of the original population.

When I first read this description I did not completely understand what it meant. However, after visualizing a few examples it became clearer. Let’s look at an example of the Central Limit Theorem in action.

Further Intuition

When I first saw an example of the Central Limit Theorem like this, I didn’t really understand why it worked. The best intuition that I have come across involves the example of flipping a coin. Suppose that we have a fair coin and we flip it 100 times. If we observed 48 heads and 52 tails we would probably not be very surprised. Similarly, if we observed 40 heads and 60 tails, we would probably still not be very surprised, though it might seem more rare than the 48/52 scenario. However, if we observed 20 heads and 80 tails we might start to question the fairness of the coin.

This is essentially what the normality of the sampling distribution represents. For the coin example, we are likely to get about half heads and half tails. Outcomes farther away from the expected 50/50 result are less likely, and thus less expected. The bell shape of the sampling distribution captures this concept.
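The coin intuition can be made concrete with exact binomial probabilities. This small sketch (the function name prob_at_most is my own) computes how likely it is to see at most 48, 40, or 20 heads in 100 flips of a fair coin.

```python
from math import comb

def prob_at_most(k, flips=100):
    """Exact probability of at most k heads in `flips` tosses of a fair coin."""
    # Each of the 2**flips outcomes is equally likely; count those with <= k heads.
    return sum(comb(flips, i) for i in range(k + 1)) / 2 ** flips

for heads in (48, 40, 20):
    print(f"P(at most {heads} heads in 100 flips) = {prob_at_most(heads):.3g}")
```

The first probability is large, the second is a few percent, and the third is vanishingly small, which matches the feeling that 20 heads out of 100 would make us doubt the coin.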

The mean of the sampling distribution will approximate the mean of the true population distribution. Additionally, the variance of the sampling distribution is a function of both the population variance and the sample size used. A larger sample size will produce a smaller sampling distribution variance. This makes intuitive sense: each sample contains more observations when the sample size is larger, and so is more likely to be representative of the population. So roughly speaking, if the sample size used is large enough, there is a good chance that the sample mean will estimate the population mean pretty well. Most sources state that for most applications n = 30 is sufficient.
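The dependence on sample size can be written out explicitly. Assuming the n observations X_1, ..., X_n in a sample are independent draws, each with variance σ² (notation I am introducing here), the variance of the sample mean X̄ works out as follows:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\sigma^2}{n^2}
  = \frac{\sigma^2}{n},
\qquad
\operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.
```

The second equality is where independence is used. Quadrupling the sample size therefore halves the standard deviation of the sample mean.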

These principles can help us to reason about samples from any population. Depending on the scenario and the information available, the way that they are applied may vary. For example, in some situations we might know the true population mean and variance, which would allow us to compute the variance of any sampling distribution. However, in other situations, such as the original problem we discussed of estimating average human height, we won’t know the true population mean and variance. Understanding the nuances of sampling distributions and the Central Limit Theorem is an essential first step toward tackling many of these problems.
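As one illustration of the unknown-variance case, here is a sketch of a common approach that goes slightly beyond what is described above: use the sample standard deviation in place of σ and form an approximate 95% interval for the population mean from a single sample. The heights are simulated, not real measurements, and the values 170 and 10 are arbitrary assumptions.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical sample: 100 simulated heights in cm. These are NOT real
# measurements; the centre of 170 and spread of 10 are arbitrary choices.
heights = [rng.gauss(170, 10) for _ in range(100)]

n = len(heights)
sample_mean = statistics.fmean(heights)
sample_sd = statistics.stdev(heights)      # stands in for the unknown sigma
standard_error = sample_sd / math.sqrt(n)  # estimated sigma / sqrt(n)

# 97.5th percentile of the standard normal distribution, about 1.96.
z = statistics.NormalDist().inv_cdf(0.975)
low = sample_mean - z * standard_error
high = sample_mean + z * standard_error

print(f"sample mean = {sample_mean:.1f} cm")
print(f"approximate 95% interval for the population mean: {low:.1f} to {high:.1f} cm")
```

The interval relies on the sampling distribution of the mean being approximately normal, which is exactly what the Central Limit Theorem provides for a large enough sample.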