25 Capture-mark recapture model with a mixture structure to account for missing sex-variable for parts of the individuals

25.1 Introduction

In some species the identification of the sex is not possible for all individuals without sampling DNA. For example, morphological dimorphism is absent or so weak that parts of the individuals cannot be assigned to one of the sexes. Particularly in ornithological long-term capture recapture data sets that typically are obtained by voluntary bird ringers who do normaly not have the possibilities to analyse DNA, often the sex identification is missing in parts of the individuals. For estimating survival, it would nevertheless be valuable to include data of all individuals, use the information on sex-specific effects on survival wherever possible but account for the fact that of parts of the individuals the sex is not known. We here explain how a Cormack-Jolly-Seber model can be integrated with a mixture model in oder to allow for a combined analyses of individuals with and without sex identified. An introduction to the Cormack-Jolly-Seber model we gave in Chapter 14.5 of the book Korner-Nievergelt et al. (2015). We here expand this model by a mixture structure that allows including individuals with a missing categorical predictor variable, such as sex.

25.2 Data description

## simulate data
# true parameter values
theta  <- 0.6 # proportion of males
nocc <- 15    # number of years in the data set
b0 <- matrix(NA, ncol=nocc-1, nrow=2)
b0[1,] <- rbeta((nocc-1), 3, 4) # capture probability of males
b0[2,] <- rbeta((nocc-1), 2, 4) # capture probability of females  
a0 <- matrix(NA, ncol=2, nrow=2)
a1 <- matrix(NA, ncol=2, nrow=2)
a0[1,1]<- qlogis(0.7) # average annual survival for adult males
a0[1,2]<- qlogis(0.3) # average annual survival for juveniles
a0[2,1] <- qlogis(0.55) # average annual survival for adult females
a0[2,2] <- a0[1,2]
a1[1,1] <- 0
a1[1,2] <- -0.5
a1[2,1] <- -0.8
a1[2,2] <- a1[1,2]

nindi <- 1000    # number of individuals with identified sex
nindni <- 1500   # number of individuals with non-identified sex
nind <- nindi + nindni  # total number of individuals
y <- matrix(ncol=nocc, nrow=nind)
z <- matrix(ncol=nocc, nrow=nind)
first <- sample(1:(nocc-1), nind, replace=TRUE)
sex <- sample(c(1,2), nind, prob=c(theta, 1-theta), replace=TRUE)
juvfirst <- sample(c(0,1), nind, prob=c(0.5, 0.5), replace=TRUE)
juv <- matrix(0, nrow=nind, ncol=nocc)
for(i in 1:nind) juv[i,first[i]] <- juv[i]

x <- runif(nocc-1, -2, 2)                   # a time dependent covariate covariate 
p <- b0                                     # recapture probability
phi <- array(NA, dim=c(2, 2, nocc-1))
# for ad males
phi[1,1,] <- plogis(a0[1,1]+a1[1,1]*x)
# for ad females
phi[2,1,] <- plogis(a0[2,1]+a1[2,1]*x)
# for juvs
phi[1,2,] <-  phi[2,2,] <- plogis(a0[2,2]+a1[2,2]*x)
for(i in 1:nind){
  z[i,first[i]] <- 1
  y[i, first[i]] <- 1
  for(t in (first[i]+1):nocc){
    z[i, t] <- rbinom(1, size=1, prob=z[i,t-1]*phi[sex[i],juv[i,t-1]+1, t-1])
    y[i, t] <- rbinom(1, size=1, prob=z[i,t]*p[sex[i],t-1])
    }
}
y[is.na(y)] <- 0

The mark-recapture data set consists of capture histories of 2500 individuals over 15 time periods. For each time period \(t\) and individual \(i\) the capture history matrix \(y\) contains \(y_{it}=1\) if the individual \(i\) is captured during time period \(t\), or \(y_{it}=0\) if the individual \(i\) is not captured during time period \(t\). The marking time period varies between individuals from 1 to 14. At the marking time period, the age of the individuals was classified either as juvenile or as adult. Juveniles turn into adults after one time period, thus age is known for all individuals during all time periods after marking. For 1000 individuals of the 2500 individuals, the sex is identified, whereas for 1500 individuals, the sex is unknown. The example data contain one covariate \(x\) that takes on one value for each time period.

# bundle the data for Stan
i <- 1:nindi
ni <- (nindi+1):nind
datax <- list(yi=y[i,], nindi=nindi, sex=sex[i], nocc=nocc, 
              yni=y[ni,], nindni=nindni, firsti=first[i], firstni=first[ni],
              juvi=juv[i,]+1, juvni=juv[ni,]+1, year=1:nocc, x=x)

25.3 Model description

The observations \(y_{it}\), an indicator of whether individual i was recaptured during time period \(t\) is modelled conditional on the latent true state of the individual birds \(z_{it}\) (0 = dead or permanently emigrated, 1 = alive and at the study site) as a Bernoulli variable. The probability \(P(y_{it} = 1)\) is the product of the probability that an alive individual is recaptured, \(p_{it}\), and the state of the bird \(z_{it}\) (alive = 1, dead = 0). Thus, a dead bird cannot be recaptured, whereas for a bird alive during time period \(t\), the recapture probability equals \(p_{it}\): \[y_{it} \sim Bernoulli(z_{it}p_{it})\] The latent state variable \(z_{it}\) is a Markovian variable with the state at time \(t\) being dependent on the state at time \(t-1\) and the apparent survival probability \[\phi_{it}\]: \[z_{it} \sim Bernoulli(z_{it-1}\phi_{it})\] We use the term apparent survival in order to indicate that the parameter \(\phi\) is a product of site fidelity and survival. Thus, individuals that permanently emigrated from the study area cannot be distinguished from dead individuals. In both models, the parameters \(\phi\) and \(p\) were modelled as sex-specific. However, for parts of the individuals, sex could not be identified, i.e. sex was missing. Ignoring these missing values would most likely lead to a bias because they were not missing at random. The probability that sex can be identified is increasing with age and most likely differs between sexes. Therefore, we included a mixture model for the sex: \[Sex_i \sim Categorical(q_i)\] where \(q_i\) is a vector of length 2, containing the probability of being a male and a female, respectively. In this way, the sex of the non-identified individuals was assumed to be male or female with probability \(q[1]\) and \(q[2]=1-q[1]\), respectively. This model corresponds to the finite mixture model introduced by Pledger, Pollock, and Norris (2003) in order to account for unknown classes of birds (heterogeneity). However, in our case, for parts of the individuals the class (sex) was known.

In the example model, we constrain apparent survival to be linearly dependent on a covariate x with different slopes for males, females and juveniles using the logit link function. \[logit(\phi_{it}) = a0_{sex-age-class[it]} + a1_{sex-age-class[it]}x_i\]

Annual recapture probability was modelled for each year and age and sex class independently: \[p_{it} = b0_{t,sex-age-class[it]}\] Uniform prior distributions were used for all parameters with a parameter space limited to values between 0 and 1 (probabilities) and a normal distribution with a mean of 0 and a standard deviation of 1.5 for the intercept \(a0\), and a standard deviation of 5 was used for \(a1\).

25.4 The Stan code

The trick for coding the CMR-mixture model in Stan is to formulate the model 3 times:
1. For the individuals with identified sex
2. For the males that were not identified
3. For the females that were not identified

Then for the non-identified individuals a mixture model is formulated that assigns a probability of being a female or a male to each individual.

data {
  int<lower=2> nocc;                    // number of capture events
  int<lower=0> nindi;                   // number of individuals with identified sex
  int<lower=0> nindni;                  // number of individuals with non-identified sex
  int<lower=0,upper=2> yi[nindi,nocc];         // CH[i,k]: individual i captured at k
  int<lower=0,upper=nocc-1> firsti[nindi];      // year of first capture
  int<lower=0,upper=2> yni[nindni,nocc];       // CH[i,k]: individual i captured at k
  int<lower=0,upper=nocc-1> firstni[nindni];    // year of first capture
  int<lower=1, upper=2> sex[nindi];
  int<lower=1, upper=2> juvi[nindi, nocc];
  int<lower=1, upper=2> juvni[nindni, nocc];
  int<lower=1> year[nocc];
  real x[nocc-1];                     // a covariate 
}

transformed data {
  int<lower=0,upper=nocc+1> lasti[nindi];       // last[i]:  ind i last capture
  int<lower=0,upper=nocc+1> lastni[nindni];       // last[i]:  ind i last capture
  lasti = rep_array(0,nindi); 
  lastni = rep_array(0,nindni);
  for (i in 1:nindi) {
    for (k in firsti[i]:nocc) {
      if (yi[i,k] == 1) {
        if (k > lasti[i])  lasti[i] = k;
      }
    }
  }
  for (ii in 1:nindni) {
    for (kk in firstni[ii]:nocc) {
      if (yni[ii,kk] == 1) {
        if (kk > lastni[ii])  lastni[ii] = kk;
      }
    }
  }

}


parameters {
  real<lower=0, upper=1> theta[nindni];            // probability of being male for non-identified individuals
  real<lower=0, upper=1> b0[2,nocc-1];             // intercept of p
  real a0[2,2];                  // intercept for phi 
  real a1[2,2];                  // coefficient for phi   
}

transformed parameters {
  real<lower=0,upper=1>p_male[nindni,nocc];         // capture probability
  real<lower=0,upper=1>p_female[nindni,nocc];       // capture probability
  real<lower=0,upper=1>p[nindi,nocc];               // capture probability

  real<lower=0,upper=1>phi_male[nindni,nocc-1];   // survival probability
  real<lower=0,upper=1>chi_male[nindni,nocc+1];   // probability that an individual 
                                                  // is never recaptured after its
                                                  // last capture
  real<lower=0,upper=1>phi_female[nindni,nocc-1]; // survival probability
  real<lower=0,upper=1>chi_female[nindni,nocc+1]; // probability that an individual 
                                                  // is never recaptured after its
                                                   // last capture
  real<lower=0,upper=1>phi[nindi,nocc-1];   // survival probability
  real<lower=0,upper=1>chi[nindi,nocc+1];   // probability that an individual 
                                           // is never recaptured after its
                                           // last capture

  {
    int k; 
    int kk; 
    for(ii in 1:nindi){
      if (firsti[ii]>1) {
        for (z in 1:(firsti[ii]-1)){
          phi[ii,z] = 1;
        }
      }
      for(tt in firsti[ii]:(nocc-1)) {
        // linear predictor for phi:
        phi[ii,tt] = inv_logit(a0[sex[ii], juvi[ii,tt]] + a1[sex[ii], juvi[ii,tt]]*x[tt]); 

      }
    }

    for(ii in 1:nindni){
      if (firstni[ii]>1) {
        for (z in 1:(firstni[ii]-1)){
          phi_female[ii,z] = 1;
          phi_male[ii,z] = 1;
        }
      }
      for(tt in firstni[ii]:(nocc-1)) {
        // linear predictor for phi:
        phi_male[ii,tt] = inv_logit(a0[1, juvni[ii,tt]] + a1[1, juvni[ii,tt]]*x[tt]); 
        phi_female[ii,tt] = inv_logit(a0[2, juvni[ii,tt]]+ a1[2, juvni[ii,tt]]*x[tt]);

      }
    }
    
    for(i in 1:nindi) {
      // linear predictor for p for identified individuals
      for(w in 1:firsti[i]){
        p[i,w] = 1;
      }
      for(kkk in (firsti[i]+1):nocc)
        p[i,kkk] = b0[sex[i],year[kkk-1]];  
      chi[i,nocc+1] = 1.0;              
      k = nocc;
      while (k > firsti[i]) {
        chi[i,k] = (1 - phi[i,k-1]) + phi[i,k-1] * (1 - p[i,k]) * chi[i,k+1]; 
        k = k - 1;
      }
      if (firsti[i]>1) {
        for (u in 1:(firsti[i]-1)){
          chi[i,u] = 0;
        }
      }
      chi[i,firsti[i]] = (1 - p[i,firsti[i]]) * chi[i,firsti[i]+1];
    }// close definition of transformed parameters for identified individuals

    for(i in 1:nindni) {
      // linear predictor for p for non-identified individuals
      for(w in 1:firstni[i]){
        p_male[i,w] = 1;
        p_female[i,w] = 1;
      }
      for(kkkk in (firstni[i]+1):nocc){
        p_male[i,kkkk] = b0[1,year[kkkk-1]];  
        p_female[i,kkkk] = b0[2,year[kkkk-1]];
      }
      chi_male[i,nocc+1] = 1.0; 
      chi_female[i,nocc+1] = 1.0; 
      k = nocc;
      while (k > firstni[i]) {
        chi_male[i,k] = (1 - phi_male[i,k-1]) + phi_male[i,k-1] * (1 - p_male[i,k]) * chi_male[i,k+1]; 
        chi_female[i,k] = (1 - phi_female[i,k-1]) + phi_female[i,k-1] * (1 - p_female[i,k]) * chi_female[i,k+1]; 
        k = k - 1;
      }
      if (firstni[i]>1) {
        for (u in 1:(firstni[i]-1)){
          chi_male[i,u] = 0;
          chi_female[i,u] = 0;
        }
      }
      chi_male[i,firstni[i]] = (1 - p_male[i,firstni[i]]) * chi_male[i,firstni[i]+1];
      chi_female[i,firstni[i]] = (1 - p_female[i,firstni[i]]) * chi_female[i,firstni[i]+1];
    } // close definition of transformed parameters for non-identified individuals

    
  }  // close block of transformed parameters exclusive parameter declarations
}    // close transformed parameters

model {
  // priors
  theta ~ beta(1, 1);
  for (g in 1:(nocc-1)){
    b0[1,g]~beta(1,1);
    b0[2,g]~beta(1,1);
  }
  a0[1,1]~normal(0,1.5);
  a0[1,2]~normal(0,1.5);
  a1[1,1]~normal(0,3);
  a1[1,2]~normal(0,3);

  a0[2,1]~normal(0,1.5);
  a0[2,2]~normal(a0[1,2],0.01); // for juveniles, we assume that the effect of the covariate is independet of sex
  a1[2,1]~normal(0,3);
  a1[2,2]~normal(a1[1,2],0.01);

  // likelihood for identified individuals
  for (i in 1:nindi) {
    if (lasti[i]>0) {
      for (k in firsti[i]:lasti[i]) {
        if(k>1) target+= (log(phi[i, k-1])); 
        if (yi[i,k] == 1) target+=(log(p[i,k]));   
        else target+=(log1m(p[i,k]));  
      }
    }  
    target+=(log(chi[i,lasti[i]+1]));
  }
  
  // likelihood for non-identified individuals
  for (i in 1:nindni) {
    real log_like_male = 0;
    real log_like_female = 0;

    if (lastni[i]>0) {
      for (k in firstni[i]:lastni[i]) {
        if(k>1){
          log_like_male += (log(phi_male[i, k-1]));
          log_like_female += (log(phi_female[i, k-1]));
        }
        if (yni[i,k] == 1){ 
          log_like_male+=(log(p_male[i,k]));
          log_like_female+=(log(p_female[i,k]));
        }
        else{
          log_like_male+=(log1m(p_male[i,k])); 
          log_like_female+=(log1m(p_female[i,k])); 
        }

      }
    }  
    log_like_male += (log(chi_male[i,lastni[i]+1]));
    log_like_female += (log(chi_female[i,lastni[i]+1]));
    
    target += log_mix(theta[i], log_like_male, log_like_female);
  }

}

25.5 Call Stan from R, check convergence and look at results

# Run STAN
library(rstan)
fit <- stan(file = "stanmodels/cmr_mixture_model.stan", data=datax, verbose = FALSE)
# for above simulated data (25000 individuals x 15 time periods) 
# computing time is around 48 hours on an intel corei7 laptop
# for larger data sets, we recommed moving the transformed parameters block 
# to the model block in order to avoid monitoring of p_male, p_female, 
# phi_male and phi_female producing memory problems

# launch_shinystan(fit) # diagnostic plots
summary(fit)
##                 mean      se_mean         sd        2.5%         25%
## b0[1,1]   0.60132367 0.0015709423 0.06173884  0.48042366  0.55922253
## b0[1,2]   0.70098709 0.0012519948 0.04969428  0.60382019  0.66806698
## b0[1,3]   0.50293513 0.0010904085 0.04517398  0.41491848  0.47220346
## b0[1,4]   0.28118209 0.0008809447 0.03577334  0.21440931  0.25697691
## b0[1,5]   0.34938289 0.0009901335 0.03647815  0.27819918  0.32351323
## b0[1,6]   0.13158569 0.0006914740 0.02627423  0.08664129  0.11286629
## b0[1,7]   0.61182981 0.0010463611 0.04129602  0.53187976  0.58387839
## b0[1,8]   0.48535193 0.0010845951 0.04155762  0.40559440  0.45750793
## b0[1,9]   0.52531291 0.0008790063 0.03704084  0.45247132  0.50064513
## b0[1,10]  0.87174780 0.0007565552 0.03000936  0.80818138  0.85259573
## b0[1,11]  0.80185454 0.0009425675 0.03518166  0.73173810  0.77865187
## b0[1,12]  0.33152443 0.0008564381 0.03628505  0.26380840  0.30697293
## b0[1,13]  0.42132288 0.0012174784 0.04140382  0.34062688  0.39305210
## b0[1,14]  0.65180372 0.0015151039 0.05333953  0.55349105  0.61560493
## b0[2,1]   0.34237039 0.0041467200 0.12925217  0.12002285  0.24717176
## b0[2,2]   0.18534646 0.0023431250 0.07547704  0.05924694  0.12871584
## b0[2,3]   0.61351083 0.0024140550 0.07679100  0.46647727  0.56242546
## b0[2,4]   0.37140208 0.0024464965 0.06962399  0.24693888  0.32338093
## b0[2,5]   0.19428215 0.0034618302 0.11214798  0.02800056  0.11146326
## b0[2,6]   0.27371336 0.0026553769 0.09054020  0.11827243  0.20785316
## b0[2,7]   0.18611173 0.0014387436 0.05328492  0.09122869  0.14789827
## b0[2,8]   0.25648337 0.0018258589 0.05287800  0.16255769  0.21913271
## b0[2,9]   0.20378754 0.0021367769 0.07380004  0.07777998  0.15215845
## b0[2,10]  0.52679548 0.0024625568 0.08696008  0.36214334  0.46594844
## b0[2,11]  0.47393354 0.0032593161 0.10555065  0.28843967  0.39781278
## b0[2,12]  0.22289155 0.0017082729 0.05551514  0.12576797  0.18203335
## b0[2,13]  0.26191486 0.0024159794 0.07016314  0.14106495  0.21234017
## b0[2,14]  0.65111737 0.0055743944 0.18780555  0.29279480  0.50957591
## a0[1,1]   0.95440670 0.0013771881 0.04808748  0.86301660  0.92146330
## a0[1,2]   0.01529770 0.0469699511 1.46995922 -2.82218067 -0.95533706
## a0[2,1]   0.16384995 0.0049928331 0.12634422 -0.06399631  0.07533962
## a0[2,2]   0.01535679 0.0469634175 1.47006964 -2.81864060 -0.95515751
## a1[1,1]   0.15937249 0.0028992587 0.08864790 -0.01288607  0.10017613
## a1[1,2]   0.08055953 0.1007089857 3.02148727 -5.95525636 -1.96662599
## a1[2,1]  -0.83614134 0.0074143920 0.18655882 -1.21033848 -0.95698565
## a1[2,2]   0.08071668 0.1006904255 3.02145647 -5.94617355 -1.96508733
##                  50%        75%      97.5%     n_eff     Rhat
## b0[1,1]   0.60206306  0.6431566  0.7206343 1544.5301 1.002331
## b0[1,2]   0.70165494  0.7355204  0.7946280 1575.4617 1.001482
## b0[1,3]   0.50367411  0.5330078  0.5898079 1716.3196 1.001183
## b0[1,4]   0.27997512  0.3046483  0.3544592 1649.0040 1.000760
## b0[1,5]   0.34936442  0.3751935  0.4191138 1357.3073 1.002072
## b0[1,6]   0.12987449  0.1481661  0.1873982 1443.8040 1.003676
## b0[1,7]   0.61203228  0.6397577  0.6933929 1557.5904 1.001458
## b0[1,8]   0.48513822  0.5134314  0.5672066 1468.1355 1.002511
## b0[1,9]   0.52534212  0.5501747  0.5994060 1775.7335 1.000824
## b0[1,10]  0.87324112  0.8934047  0.9258033 1573.3747 1.000719
## b0[1,11]  0.80300311  0.8261868  0.8675033 1393.1817 1.001172
## b0[1,12]  0.33044476  0.3552199  0.4052902 1794.9956 1.000566
## b0[1,13]  0.42116690  0.4492297  0.5026942 1156.5339 1.000289
## b0[1,14]  0.64956850  0.6864706  0.7607107 1239.4056 1.004061
## b0[2,1]   0.33493631  0.4251416  0.6150923  971.5524 1.004049
## b0[2,2]   0.17981663  0.2358847  0.3446097 1037.6210 1.001474
## b0[2,3]   0.61326419  0.6644156  0.7628427 1011.8737 1.005727
## b0[2,4]   0.36837778  0.4158585  0.5190457  809.8949 1.003803
## b0[2,5]   0.17910449  0.2591418  0.4533117 1049.4733 1.001499
## b0[2,6]   0.26739172  0.3299594  0.4685139 1162.6006 1.001170
## b0[2,7]   0.18254607  0.2198969  0.3003156 1371.6455 1.000878
## b0[2,8]   0.25280556  0.2895585  0.3704113  838.7174 1.005624
## b0[2,9]   0.19724053  0.2501298  0.3694806 1192.8747 1.003687
## b0[2,10]  0.52587075  0.5845730  0.7061694 1247.0027 1.002851
## b0[2,11]  0.46874445  0.5392302  0.7046892 1048.7425 0.999473
## b0[2,12]  0.21961656  0.2580782  0.3397127 1056.1081 1.000907
## b0[2,13]  0.25601959  0.3056204  0.4142888  843.3960 1.003130
## b0[2,14]  0.65824835  0.7973674  0.9698829 1135.0669 1.003838
## a0[1,1]   0.95368445  0.9862439  1.0515747 1219.2071 1.003898
## a0[1,2]   0.01633534  0.9911055  2.9717839  979.4231 1.003726
## a0[2,1]   0.15519648  0.2472483  0.4230776  640.3489 1.004625
## a0[2,2]   0.01587281  0.9898084  2.9659552  979.8429 1.003744
## a1[1,1]   0.15647489  0.2205720  0.3354845  934.8953 1.007190
## a1[1,2]   0.06683287  2.1568781  6.0295208  900.1297 1.003701
## a1[2,1]  -0.83503982 -0.7075691 -0.4814539  633.1119 1.010568
## a1[2,2]   0.06586905  2.1557247  6.0239735  900.4432 1.003704