Diving into Hypothesis Testing and the Maths behind it

Harsh Sinha
8 min read · Sep 16, 2022
Photo by Lukas: https://www.pexels.com/photo/graph-and-line-chart-printed-paper-590045/

Often we hear from our product analysts and managers that "we have validated the hypothesis, hence we are going with this change." Hypothesis testing in statistics is a way to test the results of a survey or experiment to see whether they are meaningful. We are essentially checking whether the results are valid by working out the odds that they happened by chance. If the results could easily have occurred by chance, the experiment won't be repeatable and so has little use. This may seem confusing, but we will break it down and look at the mathematics behind hypothesis testing. It is a very powerful tool in an analyst's toolbox. Before starting with hypothesis testing, though, there are a few fundamentals that need to be covered.

Mean: The mean is one of the measures of central tendency. It is simply the average of the data: the sum of the data values divided by the number of values.

Standard Deviation: The standard deviation (SD) measures how spread out the values in a distribution are. It is the square root of the variance, which is the average squared difference from the mean.

Density curve: This is the distribution of data points on a graph where the y-axis is the frequency and the x-axis is the data value. For example, in the diagram below, suppose we take the interval "0.50–0.52". Its frequency is 16%, which means values in that range occur 16% of the time. If we gradually shrink the intervals on the x-axis, the discrete histogram turns into a continuous density curve. In this case it becomes a normal distribution, which comes up in many places. The mean of the normal distribution is the x value in the middle with the highest frequency/probability.

Z-score: The z-score is how many SDs away from the mean a particular data point x is: z = (x − mean) / SD
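To make these definitions concrete, here is a minimal sketch in Python using NumPy; the data values are made up purely for illustration.

```python
import numpy as np

# Hypothetical data, purely for illustration.
data = np.array([4.0, 8.0, 15.0, 16.0, 23.0, 42.0])

mean = data.mean()              # sum of the values divided by how many there are
sd = data.std()                 # population SD: sqrt of the average squared deviation from the mean
z_scores = (data - mean) / sd   # how many SDs each point sits from the mean

print(mean, sd)
print(z_scores)
```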

Central Limit Theorem: The central limit theorem (CLT) states that if you take sufficiently large samples from a population, the sample means will be normally distributed, even if the population itself isn't. What does this imply? Take any distribution, draw "n" values from it, and compute their mean. If we repeat this step many times and plot all the means as a density curve, the resulting curve will be approximately normal, and as n grows it becomes closer to normal with a smaller SD. This is a very useful theorem in statistics with applications in many places. There is a good tool to visualize this.

Taken from https://onlinestatbook.com/stat_sim/sampling_dist/

In the diagram above, we have taken a random discrete density curve as the parent distribution. The last and second-to-last graphs are the sampling distributions of the sample mean of the parent population with n=5 and n=25 respectively. Let us walk through the case of n=5 thoroughly.

From the parent distribution, we draw 5 elements at a time and repeat this many times.

S1 = [2,10,20,25,12] mean = 13.8

S2 = [7,13,22,26,11] mean = 15.8

Similarly, we calculate S3, S4, … Suppose we have taken, say, 10,000 samples and plot their means as a density curve. We observe that it tends toward a normal distribution, with its mean tending to the mean of the parent population and its standard deviation tending to the SD of the parent population divided by the square root of n, i.e., 5 in this case. This can be proven with some mathematical manipulation; you can see the proof here. It can also be verified from the diagram above, where the parent population mean and the mean of the sampling distribution of sample means are almost equal, and the SD is 1/√n times the population SD.

μM (mean of the sampling distribution of the sample mean) = μ (mean of the population)

σM (SD of the sampling distribution of the sample mean) = σ/√n (SD of the population divided by √n, where n is the number of data points taken at each iteration)

As "n" grows larger, the SD of the sampling distribution decreases. This is intuitive: if we take more numbers from a population and average them, there is a greater chance that the average will be close to the population mean. The main takeaways from the CLT are the two formulas above and the fact that the sampling distribution of the sample mean is normal.
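The two formulas are easy to check numerically. The sketch below is an illustrative simulation (not from the article): it assumes an exponential parent population, which is clearly not normal, draws repeated samples of size n, and compares the mean and SD of the sample means with μ and σ/√n.

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately non-normal parent population (exponential with mean 2 and SD 2).
population = rng.exponential(scale=2.0, size=1_000_000)
mu, sigma = population.mean(), population.std()

n = 25            # sample size drawn at each iteration
repeats = 10_000  # how many sample means we collect

sample_means = np.array([rng.choice(population, size=n).mean() for _ in range(repeats)])

print("population mean:", mu, "| mean of sample means:", sample_means.mean())
print("sigma / sqrt(n):", sigma / np.sqrt(n), "| SD of sample means:", sample_means.std())
```

The two printed pairs should come out close to each other, and the agreement improves as the number of repeats grows.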

Now we have enough understanding to dive into Hypothesis testing.

The easiest way to understand this would be by taking an example.

Suppose you run an e-commerce app. The app has an add-to-cart button, and its color is red. The average number of clicks on that add-to-cart button per day per user was 20. Now you want to change the color of the button to blue, and you hypothesize that this will increase the number of clicks on the add-to-cart button. Now you have to validate that. You rolled out the change to a few users and collected 100 samples. You found the average to be 25 clicks with a standard deviation of 12.5. You are happy now, but you still have to validate this, as it might just be due to chance.

This is a common question in every business: whether to go with this change or that one, and what will be best for the users. Hypothesis testing helps us answer these questions. Let's look at the solution then.

The first thing is to state the null hypothesis (Ho) and the alternative hypothesis (Ha).

Ho = The avg number of clicks is 20 i.e. μ=20

Ha = The avg number of clicks is greater than 20 i.e. μ>20

The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

As always in mathematics, we start by assuming the null hypothesis is true. So we assume that the average number of clicks is 20, i.e., μ=20, for the distribution of add-to-cart clicks after the color is changed to blue. From the central limit theorem we know that if we take samples of size 100 from this distribution, calculate their means, and plot mean vs. frequency, we get a normal distribution. We are basically plotting the sampling distribution of the sample mean, and the mean of 25 that we collected earlier is one of those values. Let's break it down.

We assume the parent population distribution mean is μ=20

We pick samples of size N=100 from the population and plot the mean of each of them in the density curve as shown below.

As you can see from the diagram, the last graph is a normal distribution, and 25 appears as one of the sample means of a sample of 100 users.
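As a rough numerical check (not part of the original calculation), we can simulate this sampling distribution under the null hypothesis, taking μ = 20 and, as the article does later, using the sample SD of 12.5 as a stand-in for the population SD.

```python
import numpy as np

rng = np.random.default_rng(42)

mu_null, sigma_est, n = 20.0, 12.5, 100

# Under Ho, the CLT says the sample mean is roughly Normal(mu_null, sigma_est / sqrt(n)).
sample_means = rng.normal(loc=mu_null, scale=sigma_est / np.sqrt(n), size=1_000_000)

# How often does a sample mean reach 25 or more purely by chance?
print("fraction of sample means >= 25:", (sample_means >= 25).mean())
```

The printed fraction is tiny, which already hints at the p-value we will compute formally below.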

We need to know one last thing: the p-value.

A p-value is a number, calculated from a statistical test, that describes how likely you are to have found a particular set of observations if the null hypothesis were true.

Here the p-value is the probability of getting 25 or more, i.e., the area under the curve to the right of 25.

We also have to set a significance level (α). Usually, we set this to 5%, or 0.05.

The significance level, or alpha level, is the probability of rejecting the null hypothesis when it is actually true, i.e., of making the wrong decision when Ho holds.

Hypothesis testing says that if the p-value is less than the significance level (α), we reject the null hypothesis; otherwise we fail to reject it. This can be a bit confusing, so let's think about it intuitively. If we assume the null hypothesis to be true, what is the probability of getting 25 or more clicks? If this probability is greater than our significance level (α), we can argue that the 25 clicks were just by chance, and we do not reject the null hypothesis; the average clicks after changing the color of the add-to-cart button to blue would be more or less the same as before. But if the probability is less than the significance level, we can argue that the null hypothesis is unlikely to be true, and we reject it.
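In code, the decision rule itself is just a comparison; a minimal sketch, with the usual 5% level assumed as the default:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    # Reject Ho only when a result at least this extreme would occur
    # less than alpha of the time if Ho were true.
    return "reject Ho" if p_value < alpha else "fail to reject Ho"

print(decide(0.00003))  # reject Ho
print(decide(0.20))     # fail to reject Ho
```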

Coming back to our example we have to find the p-value.

First, we need to find the z-score for the observed sample mean of 25 clicks.

We know from CLT that μM = μ and σM = σ/√n

We know the population mean μ is 20, but we don't know the population SD σ. We can estimate it with the sample SD, which is 12.5 as given above. So the z-score can be calculated as follows:

z = (25 − 20) / (12.5/√100) = 5 × 10 / 12.5 = 4

Now p-value = Probability of z greater than 4

The z-score tells us how many SDs away from the mean we are, so we are trying to find the probability of getting a z greater than 4.

P(z > 4) = 0.00003 (calculated from the z-table)
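Instead of looking the value up in a z-table, the same numbers can be reproduced with SciPy; this is just a sketch of the calculation above, again using the sample SD as the estimate of σ.

```python
import numpy as np
from scipy import stats

mu_null = 20.0   # mean under the null hypothesis
x_bar = 25.0     # observed sample mean
s = 12.5         # sample SD, used here as an estimate of the population SD
n = 100          # sample size

z = (x_bar - mu_null) / (s / np.sqrt(n))  # = 4.0
p_value = stats.norm.sf(z)                # P(Z > z), the right-tail area

print(z, p_value)  # roughly 4.0 and 3.2e-05
```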

As we can see, the p-value < significance level (α).

So we reject the null hypothesis, and we can conclude that the number of clicks increased because of the change in color of the add-to-cart button.

But there can still be errors in our conclusion: a Type I error or a Type II error. A Type I error is rejecting the null hypothesis even though it is true, and a Type II error is failing to reject the null hypothesis even though it is false. You can read more about these errors here.

Conclusion

Hypothesis testing is a very useful tool. In the example above we used a one-tailed test. There can also be a two-tailed test, where the alternative hypothesis changes: instead of Ha being "the average number of clicks is greater than 20", we would take Ha to be "the average number of clicks is not equal to 20". This is intuitive: where before we checked only the right tail of the normal distribution, we would now also check the left tail and add up the probability on both sides. More about two-tailed and one-tailed tests can be found here.
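For the same example, the only change in the calculation for a two-tailed test would be counting both tails; a quick sketch with SciPy:

```python
from scipy import stats

z = 4.0
p_one_tailed = stats.norm.sf(z)           # right-tail area only
p_two_tailed = 2 * stats.norm.sf(abs(z))  # area in both tails

print(p_one_tailed, p_two_tailed)
```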

Thanks
