Assess model fit

Is our best model useful?

We’ve learned how to evaluate our models in comparison to each other using Delta AIC

However, our set of candidate models might omit vital influences on density, or only model subtle or trivial patterns in the data

What if even our best model is a poor fit - a terrible approximation of reality?! 😱

How can we tell how good our models are?  👼 or 👿 ?

We need to assess how well our model set explains the patterns in our field data:

  • Are we modelling substantial information in the data?
  • Or are we simply modelling noise well?

Over-fitting

A poor model may suffer from over-fitting or under-fitting

Over-fitting happens when you include more parameters in your model than your sample size can support

  • Overly parameterised models try to fit all of the residual variation, and so reduce our ability to generalise the model to other samples or systems
  • It can be hard to diagnose over-fit models

Under-fitting

Under-fitting occurs when your models contain too few, or the wrong, parameters

  • Information in the dataset is wasted, remaining mixed up with residual noise
  • Under-fit models can give you highly precise, but wrong, answers

Avoiding problems

Both over- and under-fitting can be avoided by:

  1. Careful development of hypotheses based on your knowledge of ecological systems
  2. Selecting predictor variables that are likely to be causal factors and therefore closely related to your response variable
  3. Ensuring you collect sufficient data to test your most complex hypothesis, or
  4. Simplifying your hypotheses so they are appropriate for the amount of data you have

Assess model fit

Before we further investigate our results and draw final conclusions, we should assess the fit of our global, or most general, model in two ways:

  1. Examine the R-squared value of our global model to check how much of the variation in our field data the model explains
  2. Run a Goodness of Fit (GoF) test, which uses the parameter estimates from the global model to simulate multiple datasets

R-squared

The R-squared value for the global model tells you how much of the variation in your field data is explained by your parameters

This helps you decide whether:

  • You’ve included parameters that are useful in modelling the real world (high R-squared)
  • You’re unable to adequately model what’s happening (low R-squared)

If you are concerned that the global model is a poor fit, include a model with little or no structure (null model)

This null model will be relatively implausible if the global model is useful
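
As a sketch of how that comparison might look in code (assuming the candidate models were fitted with unmarked’s distsamp() on a distance-sampling frame called umf — both of those details are assumptions here, not taken from this section), a null model can be fitted and ranked against the global model:

library(unmarked)

# Null model: intercept-only detection and density
# (umf is assumed to be the unmarkedFrameDS object behind the other models)
null_mod <- distsamp(~ 1 ~ 1, data = umf, keyfun = "hazard")

# Rank the null model against the global model by AIC --
# if the global model is useful, the null model should sit well below it
modSel(fitList(null = null_mod, global = hazLC_DTC))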

Goodness of Fit

GoF assesses where the real data are placed in comparison to the simulated data

  • Do the real data sit happily in the middle of the simulations?
  • Or are they an extremely rare outcome of our parameter estimates?

If our real data are at one extreme of the simulated data, and have a very low probability, this indicates that all our models are inadequate to describe our field observations
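
To make that logic concrete, here is a minimal, self-contained sketch using a toy Poisson GLM rather than our distance-sampling model — purely illustrative, but it is the same simulate-refit-compare loop that the GoF test below automates for us:

# Toy data and model (illustrative only)
set.seed(1)
x   <- runif(50)
y   <- rpois(50, exp(0.5 + x))
fit <- glm(y ~ x, family = poisson)

# A simple discrepancy statistic: sum of squared errors
sse <- function(m, obs) sum((obs - fitted(m))^2)

obs_stat  <- sse(fit, y)                        # discrepancy for the real data
sim_stats <- replicate(200, {
  y_sim   <- simulate(fit)[, 1]                 # simulate a dataset from the fitted model
  fit_sim <- glm(y_sim ~ x, family = poisson)   # refit the model to the simulated data
  sse(fit_sim, y_sim)                           # discrepancy for the simulated data
})
mean(sim_stats >= obs_stat)                     # where do the real data sit among the simulations?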

Chi-square GoF

We use a Chi-Square test to assess how far our real data lie from the simulated data:

High probabilities, indicated by large p-values, tell us the global model explains our data well (nothing to worry about) 😅

A non-significant p-value tells us that our field data don’t deviate markedly from what our model predicts, and our global model is helpful to some degree in describing the ecological system we’re investigating

Low probabilities, indicated by small p-values, tell us the model is poor at predicting our real field data (the real data are an outlier) 😓

A significant Goodness of Fit test (\(p < 0.05\)) tells us we have failed to generate an accurate detection function, or to identify the factors that determine the species’ density
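
For reference, the bootstrap function we run next computes a sum-of-squared-errors statistic by default, but it also accepts a user-supplied discrepancy measure. A commonly used chi-square-style statistic is sketched below — this is an optional alternative, and the only assumptions are that unmarked’s getY() and fitted() accessors apply to our global model hazLC_DTC:

# Chi-square-style discrepancy between observed and expected counts
chisq_stat <- function(fm) {
  observed <- getY(fm@data)   # observed counts from the unmarked frame
  expected <- fitted(fm)      # expected counts under the fitted model
  sum((observed - expected)^2 / expected, na.rm = TRUE)
}

GOF_chisq <- parboot(hazLC_DTC, statistic = chisq_stat, nsim = 100)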

Run a GoF test

Let’s run a Goodness of Fit check on our global model, hazLC_DTC

We can do this by running a parametric bootstrap using unmarked’s parboot() function:

GOF <- parboot(hazLC_DTC,   # global (most parameterised) model
               nsim = 100)  # number of simulations

Bootstrapping takes time! Be patient

Bootstrapping with a higher number of simulations gives you a more robust goodness of fit test

Here we use only 100 simulations, but you can increase this to 1000 or more if 100 simulations run in less than a minute on your computer
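
For example (the object name and seed below are ours, purely illustrative), a longer, reproducible run might look like:

set.seed(2024)                               # make the bootstrap reproducible
GOF_1000 <- parboot(hazLC_DTC, nsim = 1000)  # more simulations give a more stable p-value, but take longer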

Check the fit of the global model

Let’s view the output of our Goodness of Fit test:

GOF # view output

Call: parboot(object = hazLC_DTC, nsim = 100)

Parametric Bootstrap Statistics:
     t0 mean(t0 - t_B) StdDev(t0 - t_B) Pr(t_B > t0)
SSE 157           30.4             25.7        0.129

t_B quantiles:
     0% 2.5% 25% 50% 75% 97.5% 100%
[1,] 80   92 108 120 142   186  193

t0 = Original statistic computed from data
t_B = Vector of bootstrap samples

The important value here is the p-value, called Pr(t_B > t0)

Our p-value of 0.129 is greater than 0.05, and is therefore non-significant

This result means that our field data lie within the range of datasets simulated from our global model, rather than being an extremely unlikely event beyond the normal range of its predictions

We can conclude that our global model fits the field data sufficiently well for us to trust all the models in this analysis 🙌
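
If you would like to see that placement rather than just read the p-value, the parboot object can be plotted, and (assuming the slot layout of current unmarked versions) the simulated statistics can be pulled out of it by hand:

plot(GOF)                        # histogram of simulated statistics, with the observed value marked
mean(GOF@t.star[, 1] >= GOF@t0)  # roughly the Pr(t_B > t0) value reported above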