15 Generalized linear mixed models
15.1 Introduction
In chapter 13 on linear mixed effect models we have introduced how to analyze metric outcome variables for which a normal error distribution can be assumed (potentially after transformation), when the data have a hierarchical structure and, as a consequence, observations are not independent. In chapter 14 on generalized linear models we have introduced how to analyze outcome variables for which a normal error distribution can not be assumed, as for example binary outcomes or count data. More precisely, we have extended modelling outcomes with normal error to modelling outcomes with error distributions from the exponential family (e.g., binomial or Poisson). Generalized linear mixed models (GLMM) combine the two complexities and are used to analyze outcomes with a non-normal error distribution when the data have a hierarchical structure. In this chapter, we will show how to analyze such data. Remember, a hierarchical structure of the data means that the data are collected at different levels, for example smaller and larger spatial units, or include repeated measurements in time on a specific subject. Typically, the outcome variable is measured/observed at the lowest level but other variables may be measured at different levels. A first example is introduced in the next section.
15.1.1 Binomial Mixed Model
15.1.1.1 Background
To illustrate the binomial mixed model we use a subset of a data set used by Grüebler, Korner-Nievergelt, and Von Hirschheydt (2010) on barn swallow Hirundo rustica nestling survival (we selected a nonrandom sample to be able to fit a simple model; hence, the results do not add unbiased knowledge about the swallow biology!). For 63 swallow broods, we know the clutch size and the number of the nestlings that fledged. The broods came from 51 farms (larger unit), thus some of the farms had more than one brood. Note that each farm can harbor one or several broods, and the broods are nested within farms (as opposed to crossed, see chapter 13), i.e., each brood belongs to only one farm. There are three predictors measured at the level of the farm: colony size (the number of swallow broods on that farm), cow (whether there are cows on the farm or not), and dung heap (the number of dung heaps, piles of cow dung, within 500 m of the farm). The aim was to assess how swallows profit from insects that are attracted by livestock on the farm and by dung heaps. Broods from the same farm are not independent of each other because they belong to the same larger unit (farm), and thus share the characteristics of the farm (measured or unmeasured). Predictor variables were measured at the level of the farm, and are thus the same for all broods from a farm. In the model described and fitted below, we account for the non-independence of these clutches when building the model by including a random intercept per farm to model random variation between farms. The outcome variable is a proportion (proportion fledged from clutch) and thus consists of two values for each observation, as seen with the binomial model without random factors (Section 14.2.2): the number of chicks that fledged (successes) and the number of chicks that died (failures), i.e., the clutch size minus number that fledged.
The random factor “farm” adds a farm-specific deviation \(b_g\) to the intercept in the linear predictor. These deviations are modeled as normally distributed with mean \(0\) and standard deviation \(\sigma_g\).
\[ y_i \sim binomial\left(p_i, n_i\right)\\ logit\left(p_i\right) = \beta_0 + b_{g[i]} + \beta_1\;colonysize_i + \beta_2\;I\left(cow_i = 1\right) + \beta_3\;dungheap_i\\ b_g \sim normal\left(0, \sigma_g\right) \]
# Data on Barn Swallow (Hirundo rustica) nestling survival on farms
# (a part of the data published in Grüebler et al. 2010, J Appl Ecol 47:1340-1347)
library(blmeco)
data(swallowfarms)
#?swallowfarms # to see the documentation of the data set
dat <- swallowfarms
str(dat)
## 'data.frame': 63 obs. of 6 variables:
## $ farm : int 1001 1002 1002 1002 1004 1008 1008 1008 1010 1016 ...
## $ colsize: int 1 4 4 4 1 11 11 11 3 3 ...
## $ cow : int 1 1 1 1 1 1 1 1 0 1 ...
## $ dung : int 0 0 0 0 1 1 1 1 2 2 ...
## $ clutch : int 8 9 8 7 13 7 9 16 10 8 ...
## $ fledge : int 8 0 6 5 9 3 7 4 9 8 ...
## [1] 51
15.1.1.2 Fitting a Binomial Mixed Model in R
15.1.1.2.1 Using the glmer function
dat$colsize.z <- scale(dat$colsize) # z-transform values for better model convergence
dat$dung.z <- scale(dat$dung)
dat$die <- dat$clutch - dat$fledge
dat$farm.f <- factor(dat$farm) # for clarity we define farm as a factor
The glmer
function uses the standard way to formulate a statistical model in R, with the outcome on the left, followed by the ~
symbol, meaning “explained by”, followed by the predictors, which are separated by +
. The notation for the random factor with only a random intercept was introduced in chapter 13 and is (1|farm.f)
here.
Remember that for fitting a binomial model we have to provide the number of successful events (number of fledglings that survived) and the number of failures (those that died) within a two-column matrix that we create using the function cbind
.
15.1.1.2.2 Assessing Model Assumptions for the glmer fit
The residuals of the model look fairly normal (top left panel of Figure 15.1 with slightly wider tails. The random intercepts for the farms look perfectly normal as they should. The plot of the residuals vs. fitted values (bottom left panel) shows a slight increase in the residuals with increasing fitted values. Positive correlations between the residuals and the fitted values are common in mixed models due to the shrinkage effect (chapter 13). Due to the same reason the fitted proportions slightly overestimate the observed proportions when these are large, but underestimate them when small (bottom right panel). What is looking like a lack of fit here can be seen as preventing an overestimation of the among farm variance based on the assumption that the farms in the data are a random sample of farms belonging to the same population.
The mean of the random effects is close to zero as it should. We check that because sometimes the glmer
function fails to correctly separate the farm-specific intercepts from the overall intercept. A non-zero mean of random effects does not mean a lack of fit, but a failure of the model fitting algorithm. In such a case, we recommend using a different fitting algorithm, e.g. brm
(see below) or stan_glmer
from the rstanarm
package.
A slight overdispersion (approximated dispersion parameter >1) seems to be present, but nothing to worry about.
par(mfrow=c(2,2), mar=c(3,5,1,1))
# check normal distribution of residuals
qqnorm(resid(mod.glmer), main="qq-plot residuals")
qqline(resid(mod.glmer))
# check normal distribution of random intercepts
qqnorm(ranef(mod.glmer)$farm.f[,1], main="qq-plot, farm")
qqline(ranef(mod.glmer)$farm.f[,1])
# residuals vs fitted values to check homoscedasticity
plot(fitted(mod.glmer), resid(mod.glmer))
abline(h=0)
# plot data vs. predicted values
dat$fitted <- fitted(mod.glmer)
plot(dat$fitted,dat$fledge/dat$clutch)
abline(0,1)
## [1] -0.001690303
## [1] 1.192931
15.1.1.2.3 Using the brm function
Now we fit the same model using the function brm
from the R package brms
. This function allows fitting Bayesian generalized (non-)linear multivariate multilevel models using Stan (Betancourt 2013) for full Bayesian inference. We shortly introduce the fitting algorithm used by Stan, Hamiltonian Monte Carlo, in chapter 18. When using the function brm
there is no need to install rstan
or write the model in Stan-language. A wide range of distributions and link functions are supported, and the function offers many things more. Here we use it to fit the model as specified by the formula object above.
Note that brm requires that a binomial outcome is specified in the format successes|trials()
, which is the number of fledged nestlings out of the total clutch size in our case. In contrast, the glmer
function required to specify the number of nestlings that fledged and died (which together sum up to clutch size), in the format cbind(successes, failures)
.
The family is also called binomial
in brm
, but would be bernoulli
for a binary outcome, whereas glmer
would use binomial in both situations (Bernoulli distribution is a special case of the binomial). However, it is slightly confusing that (at the time of writing this chapter) the documentation for brmsfamily
did not mention the binomial family under Usage, where it probably went missing, but it is mentioned under Arguments for the argument family.
Prior distributions are an integral part of a Bayesian model, therefore we need to specify prior distributions. We can see what default prior distributions brm
is using by applying the get_prior
function to the model formula. The default prior for the effect sizes is a flat prior which gives a density of 1 for any value between minus and plus infinity. Because this is not a proper probability distribution it is also called an improper distribution. The intercept gets a t-distribution with mean of 0, standard deviation of 2.5 and 3 degrees of freedoms. Transforming this t-distribution to the proportion scale (using the inverse-logit function) becomes something similar to a uniform distribution between 0 and 1 that can be seen as non-informative for a probability. For the among-farm standard deviation, it uses the same t-distribution as for the intercept. However, because variance parameters such as standard deviations only can take on positive numbers, it will use only the positive half of the t-distribution (this is not seen in the output of get_prior
). When we have no prior information on any parameter, or if we would like to base the results solely on the information in the data, we specify weakly informative prior distributions that do not noticeably affect the results but they will facilitate the fitting algorithm. This is true for the priors of the intercept and among-farm standard deviation. However, for the effect sizes, we prefer specifying more narrow distributions (see chapter 10). To do so, we use the function prior
.
To apply MCMC sampling we need some more arguments: warmup
specifies the number of iterations during which we allow the algorithm to be adapted to our specific model and to converge to the posterior distribution. These iterations should be discarded (similar to the burn-in period when using, e.g., Gibbs sampling); iter
specifies the total number of iterations (including those discarded); chains
specifies the number of chains; init
specifies the starting values of the iterations. By default (init=NULL
) or by setting init="random"
the initial values are randomly chosen which is recommended because then different initial values are chosen for each chain which helps to identify non-convergence. However, sometimes random initial values cause the Markov chains to behave badly.
Then you can either use the maximum likelihood estimates of the parameters as starting values, or simply ask the algorithm to start with zeros. thin
specifies the thinning of the chain, i.e., whether all iterations should be kept (thin=1) or for example every 4th only (thin=4); cores
specifies the number of cores used for the algorithm; seed
specifies the random seed, allowing for replication of results.
library(brms)
# check which parameters need a prior
get_prior(fledge|trials(clutch) ~ colsize.z + cow + dung.z + (1|farm.f),
data=dat,
family=binomial(link="logit"))
## prior class coef group resp dpar nlpar lb ub
## (flat) b
## (flat) b colsize.z
## (flat) b cow
## (flat) b dung.z
## student_t(3, 0, 2.5) Intercept
## student_t(3, 0, 2.5) sd 0
## student_t(3, 0, 2.5) sd farm.f 0
## student_t(3, 0, 2.5) sd Intercept farm.f 0
## source
## default
## (vectorized)
## (vectorized)
## (vectorized)
## default
## default
## (vectorized)
## (vectorized)
# specify own priors
myprior <- prior(normal(0,5), class="b")
mod.brm <- brm(fledge|trials(clutch) ~ colsize.z + cow + dung.z + (1|farm.f) ,
data=dat, family=binomial(link="logit"),
prior=myprior,
warmup = 500,
iter = 2000,
chains = 2,
init = "random",
cores = 2,
seed = 123)
# note: thin=1 is default and we did not change this here.
15.1.1.2.4 Checking model convergence for the brm fit
We first check whether we find warnings in the R console about problems of the fitting algorithm. Warnings should be taken seriously. Often, we find help in the Stan online documentation (or when typing launch_shinystan(mod.brm)
into the R-console) what to change when calling the brm
function to get a fit that is running smoothly. Once, we get rid of all warnings, we need to check how well the Markov chains mixed. We can either do that by scanning through the many diagnostic plots given by launch_shinystan(mod)
or create the most important plots ourselves such as the traceplot (Figure 15.2).
15.1.1.2.5 Checking model fit by posterior predictive model checking
To assess how well the model fits to the data we do posterior predictive model checking (Chapter 16). For binomial as well as for Poisson models comparing the standard deviation of the data with those of replicated data from the model is particularly important. If the standard deviation of the real data would be much higher compared to the ones of the replicated data from the model, overdispersion would be an issue. However, here, the model is able to capture the variance in the data correctly (Figure 15.3). The fitted vs observed plot also shows a good fit.
yrep <- posterior_predict(mod.brm)
sdyrep <- apply(yrep, 1, sd)
par(mfrow=c(1,3), mar=c(3,4,1,1))
hist(yrep, freq=FALSE, main=NA, xlab="Number of fledglings")
hist(dat$fledge, add=TRUE, col=rgb(1,0,0,0.3), freq=FALSE)
legend(10, 0.15, fill=c("grey",rgb(1,0,0,0.3)), legend=c("yrep", "y"))
hist(sdyrep)
abline(v=sd(dat$fledge), col="red", lwd=2)
plot(fitted(mod.brm)[,1], dat$fledge, pch=16, cex=0.6)
abline(0,1)
After checking the diagnostic plots, the posterior predictive model checking and the general model fit, we assume that the model describes the data generating process reasonably well, so that we can proceed to drawing conclusions.
15.1.1.3 Drawing Conclusions
The generic summary
function gives us the results for the model object containing the fitted model, and works for both the model fitted with glmer
and brm
. Let’s start having a look at the summary from mod.glmer
.
The summary provides the fitting method, the model formula, statistics for the model fit including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the scaled residuals, the random effects variance and information about observations and groups, a table with coefficient estimates for the fixed effects (with standard errors and a z-test for the coefficient) and correlations between fixed effects. We recommend to always check if the number of observations and groups, i.e., 63 barn swallow nests from 51 farms here, is correct. This information shows if the glmer
function has correctly recognized the hierarchical structure in the data. Here, this is correct. To assess the associations between the predictor variables and the outcome analyzed, we need to look at the column “Estimate” in the table of fixed effects. This column contains the estimated model coefficients, and the standard error for these estimates is given in the column “Std. Error”, along with a z-test for the null hypothesis of a coefficient of zero.
In the random effects table, the among farm variance and standard deviation (square root of the variance) are given.
The function confint
shows the 95% confidence intervals for the random effects (.sig01
) and fixed effects estimates.
In the summary
output from mod.brm
we see the model formula and some information on the Markov chains after the warm-up. In the group-level effects (between group standard deviations) and population-level effects (effect sizes, model coefficients) tables some summary statistics of the posterior distribution of each parameter are given. The “Estimate” is the mean of the posterior distribution, the “Est.Error” is the standard deviation of the posterior distribution (which is the standard error of the parameter estimate). Then we see the lower and upper limit of the 95% credible interval. Also, some statistics for measuring how well the Markov chains converged are given: the “Rhat” and the effective sample size (ESS). The bulk ESS tells us how many independent samples we have to describe the posterior distribution, and the tail ESS tells us on how many samples the limits of the 95% credible interval is based on.
Because we used the logit link function, the coefficients are actually on the logit scale and are a bit difficult to interpret. What we can say is that positive coefficients indicate an increase and negative coefficients indicate a decrease in the proportion of nestlings fledged. For continuous predictors, as colsize.z and dung.z, this coefficient refers to the change in the logit of the outcome with a change of one in the predictor (e.g., for colsize.z an increase of one corresponds to an increase of a standard deviation of colsize). For categorical predictors, the coefficients represent a difference between one category and another (reference category is the one not shown in the table). To visualize the coefficients we could draw effect plots.
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: cbind(fledge, die) ~ colsize.z + cow + dung.z + (1 | farm.f)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 282.5 293.2 -136.3 272.5 58
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2071 -0.4868 0.0812 0.6210 1.8905
##
## Random effects:
## Groups Name Variance Std.Dev.
## farm.f (Intercept) 0.2058 0.4536
## Number of obs: 63, groups: farm.f, 51
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.09533 0.19068 -0.500 0.6171
## colsize.z 0.05087 0.11735 0.434 0.6646
## cow 0.39370 0.22692 1.735 0.0827 .
## dung.z -0.14236 0.10862 -1.311 0.1900
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) clsz.z cow
## colsize.z 0.129
## cow -0.828 -0.075
## dung.z 0.033 0.139 -0.091
## 2.5 % 97.5 %
## .sig01 0.16809483 0.7385238
## (Intercept) -0.48398346 0.2863200
## colsize.z -0.18428769 0.2950063
## cow -0.05360035 0.8588134
## dung.z -0.36296714 0.0733620
## Family: binomial
## Links: mu = logit
## Formula: fledge | trials(clutch) ~ colsize.z + cow + dung.z + (1 | farm.f)
## Data: dat (Number of observations: 63)
## Draws: 2 chains, each with iter = 2000; warmup = 500; thin = 1;
## total post-warmup draws = 3000
##
## Group-Level Effects:
## ~farm.f (Number of levels: 51)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.55 0.15 0.26 0.88 1.00 858 1513
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.09 0.22 -0.52 0.34 1.00 3151 2341
## colsize.z 0.05 0.13 -0.21 0.30 1.00 2951 2043
## cow 0.40 0.25 -0.09 0.89 1.00 3269 2245
## dung.z -0.15 0.12 -0.39 0.08 1.00 2882 2257
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
From the results we conclude that in farms without cows (when cow=0) and for average colony sizes (when colsize.z=0) and average number of dung heaps (when dung.z=0) the average nestling survival of Barn swallows is the inverse-logit function of the Intercept, thus, plogis(
-0.09)
= 0.48 with a 95% uncertainty interval of 0.37 - 0.58. We further see that colony size and number of dung heaps are less important than whether cows are present or not. Their estimated partial effect is small and their uncertainty interval includes only values close to zero. However, whether cows are present or not may be important for the survival of nestlings. The average nestling survival in farms with cows is plogis(
-0.09+
0.4)
= 0.58. For getting the uncertainty interval of this survival estimate, we need to do the calculation for every simulation from the posterior distribution of both parameters.
bsim <- posterior_samples(mod.brm)
# survival of nestlings on farms with cows:
survivalest <- plogis(bsim$b_Intercept + bsim$b_cow)
quantile(survivalest, probs=c(0.025, 0.975)) # 95% uncertainty interval
## 2.5% 97.5%
## 0.5078202 0.6400263
In medical research, it is standard to report the fixed-effects coefficients from GLMM with binomial or Bernoulli error as odds ratios by taking the exponent (R function exp
for \(e^{()}\)) of the coefficient on the logit-scale. For example, the coefficient for cow from mod.glmer
, 0.39 (95% CI from -0.05 to -0.05), represents an odds ratio of exp(
0.39)=1.48 (95% CI from 0.95 to 0.95). This means that the odds for fledging (vs. not fledging) from a clutch from a farm with livestock present is about 1.5 times larger than the odds for fledging if no livestock is present (relative effect).