In this lesson you will learn to:
(Video 3 mins 10 sec)
(Video 3 mins 57 sec)
| | H0 Testing | Court of Law |
|---|---|---|
| Question: | Is H0 false? | Is the defendant guilty? |
| Starting position: | H0 is presumed 'true' | Defendant is presumed innocent until proven guilty |
| Evidence: | Test is based solely upon sample data | Decision of jury based solely upon the presented evidence |
| Errors: False negative | Insufficient evidence to find H0 false | Guilty defendant is found innocent |
| Errors: False positive | H0 is incorrectly found false | An innocent defendant is found guilty |
Below is a summary of the two broad methodologies. Fisher's methodology is shown in black and Neyman and Pearson's extension in blue.
An effect is often said to be statistically significant when H0 is rejected.
In the end, the scientist uses the results together with other evidence, and estimated effect sizes, to draw a biological conclusion from the analysis.
# Fit models for the hypothesis and the null-hypothesis
m1 = lm(HEIGHT~1+SEX, data=human) # Hypothesis model
m0 = lm(HEIGHT~1, data=human) # Null-hypothesis model
# Compare these two models
anova(m0, m1)
Code to estimate effect sizes using the emmeans package.
library(emmeans) # load emmeans package
# Estimate effect of SEX on HEIGHT
m_effect = emmeans(m1, specs='SEX')
# Print effect sizes and 95% confidence intervals
confint(m_effect)
We have already seen this example when discussing statistical modelling.
Question: Is our dice biased?
The two possible answers to this question can be written as hypotheses:
Hypothesis (H1): The dice is biased.
Null-Hypothesis (H0): The dice is unbiased.
I have a dice that I have rolled 20 times, giving these results:
This is a sample of 20 observations from the population.
In this case, the population is an infinite number of rolls of this dice (we will never have complete information about this population).
We will use our measure of bias as a test statistic: $$ \text{Bias} = \sum_{i=1}^6 (p_i - 1/6)^2 $$ where $p_i$ is the observed relative frequency of the $i$th outcome.
For our data sample:
$p_1=0.3$, $p_2=0.1$, $p_3=0.25$, $p_4=0.1$, $p_5=0.1$, $p_6=0.15$
Giving $$ \text{Bias} = 0.0383 $$
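As a check on the arithmetic, the bias can be computed directly in R. This is a minimal sketch; the variable names `p` and `bias` are chosen here for illustration.

```r
# Observed relative frequencies of outcomes 1 to 6 (from the sample above)
p <- c(0.3, 0.1, 0.25, 0.1, 0.1, 0.15)

# Bias: sum of squared deviations from the fair-dice probability 1/6
bias <- sum((p - 1/6)^2)

round(bias, 4)  # 0.0383
```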
What is the probability that a fair dice, rolled 20 times, will give a bias at least as large as 0.0383? (The answer to this is the p-value.)
Below is the distribution of bias from a fair dice (this distribution is calculated from the categorical distribution).
Using this distribution, the p-value is calculated as $$ p = 0.51 $$
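One way to approximate this p-value is by simulation, which is a different route from the exact calculation based on the categorical distribution used above. The sketch below assumes the observed counts are the relative frequencies multiplied by 20 (6, 2, 5, 2, 2, 3). The seed, the number of simulations and all variable names are arbitrary choices made here. Being a Monte Carlo estimate, the result should be close to, but not exactly equal to, the value quoted above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n_rolls <- 20
# Counts implied by the observed relative frequencies (p_i * 20)
obs_counts <- c(6, 2, 5, 2, 2, 3)

# Bias statistic: sum of squared deviations of relative frequencies from 1/6
bias <- function(counts) sum((counts / sum(counts) - 1/6)^2)
obs_bias <- bias(obs_counts)

# Simulate many samples of 20 rolls from a fair dice (H0 is true)
n_sim <- 10000
sim_counts <- rmultinom(n_sim, size = n_rolls, prob = rep(1/6, 6))
sim_bias <- apply(sim_counts, 2, bias)  # one bias value per simulated sample

# p-value: proportion of fair-dice samples with a bias at least as large
# as the observed one (small tolerance guards against rounding in ties)
p_value <- mean(sim_bias >= obs_bias - 1e-12)
p_value
```

The tolerance matters because the bias can only take a limited set of values for 20 rolls, so simulated samples often tie exactly with the observed value.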
The p-value (=0.51) is greater than 5%
So we fail to reject our null-hypothesis
We have no evidence from our data that this dice is biased
This is the example discussed in the videos
Question:
Is the average height of adult men different from that of adult women?
The two possible answers to this question can be written as hypotheses:
Hypothesis (H1): The average height of adult women differs from that of adult men
Null-Hypothesis (H0): The average height of adult women is the same as that of adult men.
A general linear model is an appropriate statistical model. So we fit general linear models for the H1 and H0 hypotheses and use an F-ratio as our test statistic.
# Fit models for the hypothesis and the null-hypothesis
m1 = lm(HEIGHT~1+SEX, data=human) # Hypothesis model
m0 = lm(HEIGHT~1, data=human) # Null-hypothesis model
# Compare these two models
anova(m0, m1)
The anova command calculates the F-ratio to be 2238.
This F-ratio means that there is a large distance between the two models.
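To make the idea of a "distance between the two models" concrete, the F-ratio can be rebuilt by hand from the residual sums of squares of the two models. The sketch below uses made-up data (the `toy` data frame, its group means and its standard deviation are invented here and are not the `human` data), so the F-ratio it produces will not be 2238.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Made-up stand-in for the 'human' data set (NOT the real data)
n <- 100
toy <- data.frame(SEX = factor(rep(c("F", "M"), each = n / 2)))
toy$HEIGHT <- ifelse(toy$SEX == "M", 178, 165) + rnorm(n, sd = 7)

m1 <- lm(HEIGHT ~ 1 + SEX, data = toy)  # Hypothesis model
m0 <- lm(HEIGHT ~ 1, data = toy)        # Null-hypothesis model

# Residual sums of squares and residual degrees of freedom
rss1 <- sum(residuals(m1)^2)
rss0 <- sum(residuals(m0)^2)
df1 <- df.residual(m1)
df0 <- df.residual(m0)

# F-ratio: improvement in fit per extra parameter,
# relative to the residual variance of the larger model
f_manual <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)

# The same quantity as reported by anova()
f_anova <- anova(m0, m1)$F[2]
all.equal(f_manual, f_anova)
```

A large F-ratio therefore means that adding SEX reduces the unexplained variation by much more than would be expected from noise alone.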
Below is the distribution of F-ratios that we would expect if the null-hypothesis were true.
Notice that the smallest possible F-ratio is zero.
An F-ratio greater than 10 is exceedingly unlikely.
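Tail probabilities of an F distribution can be looked up with `pf()`. In the sketch below the numerator degrees of freedom is 1 because the two models differ by one parameter (assuming SEX has two levels). The denominator degrees of freedom, `df_resid`, is a hypothetical value chosen for illustration, since the sample size is not stated here.

```r
df_model <- 1     # models differ by one parameter (SEX assumed to have two levels)
df_resid <- 1000  # HYPOTHETICAL residual degrees of freedom, for illustration only

# Probability of an F-ratio greater than 10 if the null-hypothesis were true
p_above_10 <- pf(10, df1 = df_model, df2 = df_resid, lower.tail = FALSE)
p_above_10

# Probability of an F-ratio at least as large as 2238 under the null-hypothesis
p_above_2238 <- pf(2238, df1 = df_model, df2 = df_resid, lower.tail = FALSE)
p_above_2238 < 1e-15
```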
Using this distribution, the p-value is calculated as $$ p < 10^{-15} $$
The p-value ($p < 10^{-15}$) is well below 5%
(it is so small that the computer can only give an upper bound).
So we reject the null-hypothesis.
It is also important to state the estimated effect size and its uncertainty.
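As a complement to the estimated means for each group, the difference between groups and its confidence interval can be read directly from the fitted model with base R. The sketch below again uses made-up data (the `toy` data frame and its numbers are invented here, not taken from the `human` data), so the values it prints are illustrative only.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Made-up stand-in for the 'human' data set (NOT the real data)
n <- 100
toy <- data.frame(SEX = factor(rep(c("F", "M"), each = n / 2)))
toy$HEIGHT <- ifelse(toy$SEX == "M", 178, 165) + rnorm(n, sd = 7)

m1 <- lm(HEIGHT ~ 1 + SEX, data = toy)

# Estimated difference in mean height (level "M" minus reference level "F")
effect <- coef(m1)["SEXM"]

# 95% confidence interval for that difference
ci <- confint(m1, "SEXM", level = 0.95)

effect
ci
```

Reporting the estimate together with its interval shows how large the difference is, which the p-value alone does not.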