“What is the empirical evidence for hypothesis \(x\) relative to other possible hypotheses?”
or
“Which hypothesis/model is closest to reality?”
Null hypothesis statistical tests
In contrast, Null Hypothesis Statistical Tests are designed to answer the question:
“What is the probability of observing my data if the null hypothesis were true?”
or
“How likely are my data, given the null hypothesis?”
So NHST starts with the null hypothesis, and checks whether our field data are consistent with it
This is counter-intuitive! We would prefer to find out the strength of evidence for real (alternative) hypotheses that we’re actually interested in
A more natural approach is:
“How likely are my hypotheses, given the data?”
Information theory
The Information Theoretic approach starts with our data, and evaluates the strength of evidence in support of each hypothesis
Information Theory is based on Kullback-Leibler Information, and allows us to:
Compare our models, each of which represents a discrete hypothesis
Select the best model, if one exists, and
Average the results of all models, if there is no single best model
Akaike’s Information Criterion (AIC)
Information Theory’s fundamental statistic is Akaike’s Information Criterion (AIC)
AIC is a numerical value representing the scientific evidence for a model
Information Theory offers a simple and compelling approach:
Compute an AIC value for each model (hypothesis)
Compare models using their AIC values
The model with the smallest AIC value is the best approximation to your field data
Calculate measures of the relative strength of evidence for each hypothesis
Infer the importance of predictor variables using all models
This enables us to evaluate the likelihood of our proposed hypothesis being correct, given what we observe in the field
AIC definition
\[
AIC = -2\log\left(L(\hat{\theta} \mid \text{data})\right) + 2K
\]
Where:
\(\hat{\theta}\) = Parameter estimates from your model
\(L(\hat{\theta} \mid \text{data})\) = The likelihood of your model, given the data
\(K\) = The number of parameters in your model
The minus sign means that the AIC value decreases as the likelihood of your model increases
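As a quick sanity check on the arithmetic, the AIC formula can be sketched in a few lines (Python here for illustration; the course's models are fitted in R). The log-likelihood of −188.9175 and K = 2 are assumptions back-calculated from the null water deer model reported later in this module:

```python
def aic(log_lik, k):
    """Akaike's Information Criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2 * log_lik + 2 * k

# Two parameters assumed: one for density, one for detection
print(round(aic(-188.9175, 2), 3))  # 381.835
```

Note how a higher (less negative) log-likelihood makes the first term smaller, so better-fitting models get lower AIC values.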
How is AIC helpful?
Imagine AIC as:
The distance from the model to full reality (you want to minimise this distance), or
The amount of information lost by using that model to approximate reality
In both cases, the best model is the one:
Closest to reality (small AIC), or
That loses the smallest amount of information (small AIC)
AIC in our water deer case-study
Here’s the output from our model of constant density and detectability:
hn_Null # show distance model output
Call:
distsamp(formula = ~1 ~ 1, data = distUMF, keyfun = "halfnorm",
output = "density", unitsOut = "ha")
Density:
Estimate SE z P(>|z|)
-2.95 0.102 -28.9 9.56e-184
Detection:
Estimate SE z P(>|z|)
4.72 0.0617 76.5 0
AIC: 381.835
The AIC value is 381.835
AIC is for comparing models
This AIC value is only meaningful when compared with the AIC values of other models tested on the same data
AIC is only useful for comparing models
We cannot draw any conclusions from the AIC value of a single model in isolation, or from AIC values for models tested on different data
The value of AIC depends on the data, and so it’s only valid to compare models using AIC if they have been run on the same data
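The comparison step itself is simple arithmetic: subtract the smallest AIC in the set from each model's AIC to get ΔAIC. A minimal sketch (Python for illustration; the model names and AIC values below are hypothetical):

```python
# Hypothetical AIC values for three candidate models fitted to the SAME data
aics = {"null": 381.8, "habitat": 375.2, "habitat+season": 376.9}

best = min(aics.values())
delta = {name: round(a - best, 1) for name, a in aics.items()}
print(delta)  # {'null': 6.6, 'habitat': 0.0, 'habitat+season': 1.7}
```

The best-supported model has ΔAIC = 0 by construction; the ΔAIC values, not the raw AIC values, carry the evidence for ranking the hypotheses.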
Which model is best?
There is no simple rule to determine which model is best based on AIC values
We need to interpret the evidence for each model, and decide whether:
There is strong enough support for a single model, or
You should draw conclusions based on your entire model set
The materials in this module and the next take you through this judgement process
AIC is more objective than NHST p-values:
There are no arbitrary choices about what level of \(\alpha\) to compare your \(p\)-value with
Testing multiple hypotheses doesn’t increase your chance of getting a spurious significant result
However, the Information Theoretic approach still requires thought when selecting hypotheses to compare!
AIC and parsimony
\[
AIC = -2\log\left(L(\hat{\theta} \mid \text{data})\right) + 2K
\]
AIC decreases when you include fewer parameters in your model, because the \(+2K\) component is smaller
Parsimony - use the simplest model capable of representing the information in our data
If you have two models that explain the same amount of variation in your field data, but one is simpler (fewer parameters), you should prefer the simpler model
This is because:
It’s easier to interpret simpler models
You’re less likely to be over-fitting your model by trying to explain every fine-scale pattern in the data
The precision of your parameter estimates will be higher
Without this ‘penalty’ for each parameter, more complex models would always win, because each additional parameter reduces the residual noise (lowering the AIC), even if the extra parameters are not biologically informative
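To see the penalty in action: because each extra parameter adds 2 to the AIC, it must improve the log-likelihood by more than 1 unit to pay its way. A hypothetical sketch in Python (the log-likelihood values are invented for illustration):

```python
def aic(log_lik, k):
    # AIC = -2 * log-likelihood + 2 * number of parameters
    return -2 * log_lik + 2 * k

# The extra parameter improves the log-likelihood by only 0.4 units,
# less than the 1-unit gain needed to offset its +2 penalty
simple = aic(-190.0, 2)     # 2-parameter model: AIC = 384.0
complex_ = aic(-189.6, 3)   # 3-parameter model
print(simple < complex_)    # True: the simpler model wins
```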
AIC small sample correction (AICc)
AICc is the AIC value corrected for small sample sizes
AICc is a better estimator when the number of parameters is large compared to the sample size
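For reference, the small-sample correction (standard in the information-theoretic literature) adds a term that shrinks towards zero as the sample size \(n\) grows:

\[
AICc = AIC + \frac{2K(K+1)}{n - K - 1}
\]

A common rule of thumb is to use AICc whenever \(n/K < 40\); as \(n\) becomes large relative to \(K\), AICc converges to AIC, so it is safe to report AICc by default.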