Model selection strategy

Interpreting evidence

Interpreting the evidence from Information Theory is a value judgement

There are no hard rules about when to reject models or hypotheses if there’s not a clear best model

Job candidates analogy

💭 As an illustration, imagine the situation where you’re trying to predict which of two candidates will be awarded a job

  1. The first candidate has a Masters degree from a well-known university, and is familiar with the required ecological theory
  2. The second only has an undergraduate degree from a little-known university

Knowing this, you might be confident that the first candidate would get the post

However, if both candidates were more evenly matched, with Masters degrees and relevant field experience, it’s much harder to judge who is the strongest candidate, and to predict who will be awarded the job

The same can happen with your models:

  • Sometimes your analysis will clearly show that one model is a much closer fit to your field data than its competitors
  • In other cases, the candidate models may have similar levels of evidence in support of them, making it harder to decide which model fits best

A model selection strategy

Let’s bring together all the theory and practical knowledge we’ve gained to formulate a multi-stage process to choose between hypotheses using Information Theory

This approach works for any situation in which you use AIC to evaluate and compare statistical models, not just distance sampling

Assuming that you have constructed your candidate model set carefully, we suggest the following strategy:

  1. Assess Goodness of Fit
  2. Examine dAICc values
  3. Examine model weights
  4. Check covariate summed weights
  5. Examine coefficient estimates
  6. Compare model LogLikelihoods
  7. Compute evidence ratios

1. Assess Goodness of Fit

Assess the Goodness of Fit of the global model

Use R-squared and/or chi-squared values to assess Goodness of Fit

If none of your models fits well, Information Theory will only choose the most parsimonious of your poor models, which adds nothing to your understanding of the ecological system
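
For example, one quick screen is to compare the global model's residual deviance to its chi-squared reference distribution. The following is a minimal sketch in R, using a simulated Poisson GLM as a stand-in for your global model; the data, covariates and model here are illustrative, not a distance-sampling fit

```r
## Chi-squared goodness-of-fit screen for a global model (toy data)
set.seed(42)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- rpois(100, lambda = exp(0.5 + 0.8 * dat$x1))

global <- glm(y ~ x1 + x2, family = poisson, data = dat)

## Small p-values warn that even the global model fits the data poorly
pchisq(deviance(global), df.residual(global), lower.tail = FALSE)
```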

2. Examine dAICc values

Examine the dAICc values in your model comparison table; a hand computation is sketched after this list

  • Is there a stand-out best model, with all other models having large dAICc values (>12), or
  • Are there several/many models with small dAICc values?
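
As a concrete illustration, this sketch computes AICc and dAICc by hand for a toy candidate set of linear models on R's built-in mtcars data; in practice, packages such as AICcmodavg or MuMIn will build the comparison table for you

```r
## AICc = AIC + 2k(k + 1) / (n - k - 1): the small-sample correction
aicc <- function(model) {
  k <- attr(logLik(model), "df")   # parameters estimated (incl. sigma)
  n <- nobs(model)                 # sample size
  AIC(model) + (2 * k * (k + 1)) / (n - k - 1)
}

## A toy candidate set standing in for your own models
m1 <- lm(mpg ~ wt,             data = mtcars)
m2 <- lm(mpg ~ wt + hp,        data = mtcars)
m3 <- lm(mpg ~ wt + hp + disp, data = mtcars)

aicc_vals <- sapply(list(m1 = m1, m2 = m2, m3 = m3), aicc)
sort(aicc_vals - min(aicc_vals))   # dAICc: the best model sits at 0
```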

3. Examine model weights

Examine the Akaike weights (probabilities) for each model; a sketch of the computation follows this list

Do you have:

  • A single best model with a high model probability (>0.9),
  • Several models with high weight, perhaps all containing an important covariate, or
  • Many models with low weights?
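
The weights follow directly from the dAICc values: each model's weight is exp(-dAICc/2), normalised so that the candidate set sums to one. A minimal sketch, using made-up dAICc values

```r
## Akaike weight: the probability that each model is the
## best-supported one in the candidate set
delta <- c(m1 = 0, m2 = 1.8, m3 = 7.4)     # illustrative dAICc values
w <- exp(-delta / 2) / sum(exp(-delta / 2))
round(w, 2)                                # 0.70, 0.28, 0.02
```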

4. Check covariate summed weights

Check the summed weights for each covariate; a sketch of the computation follows this list

  • Do the covariates differ in their importance?
  • Are some covariates present in all of the best models, but omitted from the worst?
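
One way to obtain these is to sum the Akaike weights of every model that contains each covariate, as in this sketch; the weights and model structures are the illustrative values from the earlier sketches

```r
## Summed weight per covariate: total weight of the models containing it
w <- c(m1 = 0.70, m2 = 0.28, m3 = 0.02)   # Akaike weights from above
in_model <- rbind(                        # covariate membership by model
  wt = c(TRUE,  TRUE, TRUE),
  hp = c(FALSE, TRUE, TRUE)
)
## Values near 1 mark covariates present in all well-supported models
apply(in_model, 1, function(has) sum(w[has]))   # wt = 1.00, hp = 0.30
```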

5. Examine coefficient estimates

Examine the coefficient estimates for each model relative to their Standard Errors and Confidence Intervals; a sketch of these checks follows this list

  • Are the estimates close to zero, or similar for all categories?
  • Are the Standard Errors large relative to the coefficient estimates?
  • Are the Confidence Intervals wide and/or centred on zero?
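
In R these diagnostics come straight from the fitted model, as in this generic sketch; the toy lm() fit stands in for your own models, and the same calls work for glm() objects

```r
m2 <- lm(mpg ~ wt + hp, data = mtcars)

summary(m2)$coefficients   # estimates alongside their Standard Errors
confint(m2)                # 95% CIs: wide or zero-spanning intervals
                           # signal weak support for a covariate
```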

6. Compare model LogLikelihoods

Compare the LogLikelihood or deviance values of models; a sketch follows this list

  • Does adding a given covariate increase the LogLikelihood by more than 1 (equivalently, reduce the deviance by more than the penalty of 2 imposed for the extra parameter)?

Increasing the LogLikelihood by more than this suggests that the parameter is helpful in explaining patterns in your field data, rather than being a ‘pretending’ variable (Anderson 2008) which gives only a small improvement in model fit
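
To make the penalty concrete, here is a minimal sketch comparing two of the toy models from the earlier sketches; the same comparison applies to any pair of nested candidate models

```r
## AIC = -2 * logLik + 2k, so an extra parameter must raise the
## log-likelihood by more than 1 (cut the deviance by more than 2)
## before the AIC improves
m1 <- lm(mpg ~ wt,      data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

as.numeric(logLik(m2) - logLik(m1))   # gain in log-likelihood
AIC(m1) - AIC(m2)                     # positive when hp pays its way
```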

7. Compute evidence ratios

Importance of covariates

If you’re still uncertain about the value of a covariate, use evidence ratios to compare pairs of models with and without it
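
An evidence ratio is simply the ratio of two Akaike weights, or equivalently exp of half the dAICc difference between the two models. A minimal sketch with the illustrative weights used above

```r
## Evidence ratio: how many times more support one model has than another
w_with    <- 0.70    # weight of the model containing the covariate
w_without <- 0.28    # weight of the matching model without it
w_with / w_without   # ~2.5 times the support for keeping the covariate

exp((1.8 - 0) / 2)   # same ratio from the dAICc difference (~2.46;
                     # the small gap is rounding in the weights)
```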

Difficulties selecting a model

An inability to select between models suggests either that:

  • The data are inadequate to distinguish between your hypotheses, or
  • None of your hypotheses are sufficient to explain reality

If this happens, rely on your understanding of the ecology and inter-relationships between your hypotheses and covariates to make an informed decision, or acknowledge that your data are inadequate to draw clear conclusions

Ecological systems are complex, and research needs to be well designed to add to our understanding of a system

Prefer parsimony!

If you have several models with similar evidence supporting each, you may prefer the simpler model if it:

  1. Predicts well
  2. Includes covariates that directly measure the ecological process you wish to explain
  3. Captures the main patterns in your data (Anderson 2008)

Always be aware that there is uncertainty in your selection of the best model, because the AIC values rely on the particular dataset you collected during that survey

Is it possible that the weightings and ranks could change if you repeated the survey?