Null Hypothesis, P-Value and Significance Level

What is the null hypothesis?
That's easy: it is the assumption that an observation is simply due to chance. The opposite assumption, that the observation is NOT due to chance, is called the alternative hypothesis.

What is it used for?
It lets us frame a data science question from the opposite direction: could there be something beyond chance in my observation? Can I build up confidence that there is "something special" in the data (the alternative hypothesis)?
To answer this, we use a classical mathematical trick: we assume that the null hypothesis is true, i.e. that the observations are purely random. Knowing the rules of chance, we can then calculate the probability of getting a value at least as extreme as the observed one, given that the null hypothesis is true. This probability is the so-called $p$-value. If $p$ is high, the observation is well explained by chance and there is no good reason to reject the null hypothesis. If $p$ is low, the assumption is not plausible and we reject the null hypothesis; we have gained confidence that the alternative hypothesis is valid.
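A minimal sketch of this trick in Python, assuming a hypothetical example: a coin that is fair under the null hypothesis lands heads 60 times in 100 tosses. The function names are illustrative, and the $p$-value here is one-sided (probability of at least 60 heads). It is computed exactly from the binomial distribution and, as a cross-check, estimated by simulating purely random tosses.

```python
import math
import random


def binomial_tail_p_value(k_observed: int, n: int, p_null: float = 0.5) -> float:
    """One-sided p-value: probability of at least k_observed successes in n
    trials, assuming the null hypothesis (success probability p_null)."""
    return sum(
        math.comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(k_observed, n + 1)
    )


def simulated_p_value(k_observed: int, n: int, p_null: float = 0.5,
                      trials: int = 10_000, seed: int = 0) -> float:
    """Estimate the same p-value by generating purely random data under H0
    and counting how often the result is at least as extreme as observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < p_null for _ in range(n))
        if heads >= k_observed:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical observation: 60 heads in 100 tosses of a coin assumed fair.
    print("exact p-value:    ", binomial_tail_p_value(60, 100))
    print("simulated p-value:", simulated_p_value(60, 100))
```

Both numbers come out at roughly 3%, so this observation would be fairly unlikely if the coin were fair. The simulation needs no formula at all, only the "rules for chance" encoded in the random generator, which is why it makes a handy sanity check.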

When do we reject the null hypothesis?
If the $p$-value falls below a critical threshold, the so-called significance level $\alpha$. Usual values are $\alpha = 0.05$ or $\alpha = 0.025$. In that case the observation is too unlikely under the null hypothesis for us to keep it, and the deviation from it is called statistically significant.
Suppose the probability of the observation under the null hypothesis is only around 1%. In that case we are pretty confident (99%) that we should reject the null hypothesis. Therefore the $q$-value $q = 1 - p$ (here 99%) is also called the confidence. This is an informal reading, though: because $p$ is computed assuming the null hypothesis is true, $1 - p$ is not the probability that the alternative hypothesis is true.
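The decision rule and the post's informal confidence $q = 1 - p$ fit in a few lines. This is a sketch with hypothetical function names; the $p$-values in the usage loop are made up for illustration, and "falls below" is read as a strict inequality.

```python
def decide(p_value: float, alpha: float = 0.05) -> bool:
    """Return True if H0 is rejected, i.e. the p-value falls below alpha."""
    return p_value < alpha


def confidence(p_value: float) -> float:
    """The post's informal 'confidence' q = 1 - p."""
    return 1.0 - p_value


if __name__ == "__main__":
    # Hypothetical p-values, checked against both usual significance levels.
    for p in (0.01, 0.03, 0.20):
        for alpha in (0.05, 0.025):
            verdict = "reject H0" if decide(p, alpha) else "do not reject H0"
            print(f"p = {p:.2f}, alpha = {alpha}: {verdict}, q = {confidence(p):.2f}")
```

With these numbers, $p = 0.03$ is significant at $\alpha = 0.05$ but not at $\alpha = 0.025$, which shows that the verdict depends on the level chosen before looking at the data.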

If you are interested in how to calculate the $p$-value from $z$-values, check out this post about z-scores.
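As a short preview, and assuming the test statistic follows a standard normal distribution under the null hypothesis, a $z$-value converts to a $p$-value with the complementary error function from Python's standard library. The upper tail is $P(Z \ge z) = 1 - \Phi(z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$, and the two-sided value is twice the tail beyond $|z|$. The function name is hypothetical.

```python
import math


def p_value_from_z(z: float, two_sided: bool = True) -> float:
    """p-value of a z-score, assuming a standard normal null distribution."""
    if two_sided:
        # P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
        return math.erfc(abs(z) / math.sqrt(2))
    # Upper tail only: P(Z >= z) = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    for z in (1.0, 1.645, 1.96, 2.576):
        print(f"z = {z}: two-sided p = {p_value_from_z(z):.4f}, "
              f"one-sided p = {p_value_from_z(z, two_sided=False):.4f}")
```

For example, $z = 1.96$ gives a two-sided $p$-value of roughly 0.05, which is why 1.96 is the familiar cut-off for $\alpha = 0.05$.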

Modern Data Analysis

Mirko