Diagnostics for a generalized linear model

Steve Simon

2021-10-18

There are a broad range of diagnostic methods (mostly graphical) that can help you decide from among a dizzying array of alternative models for count data. The most important consideration is to start at the shallow end of the pool with the simplest approaches. Don’t jump in at the deep end and try something really complex before you’ve had the chance to look at simpler models, even if you know they are overly simplistic.

suppressMessages(suppressWarnings(library(broom)))
suppressMessages(suppressWarnings(library(dplyr)))
suppressMessages(suppressWarnings(library(ggplot2)))
suppressMessages(suppressWarnings(library(foreign)))
suppressMessages(suppressWarnings(library(magrittr)))
suppressMessages(suppressWarnings(library(pscl)))

The dataset used in this example

Francis Huang has a nice web page showing two datasets and how to analyze them using the Poisson regression model as well as several more sophisticated models. The first data set is available for anyone to download. The link I originally used no longer works.

fn <- "http://faculty.missouri.edu/huangf/data/poisson/articles.csv"
articles <- read.csv(fn)
head(articles)

Here is a different link for the same file, used at German Rodriguez’s website. I’ve placed a copy on my website in case this link breaks. Also, this dataset is available with the pscl library of R under the name, bioChemists.

fn <- "http://www.pmean.com/00files/couart2.dta"
articles <- read.dta(fn)
head(articles)
##   art   fem     mar kid5  phd ment
## 1   0   Men Married    0 2.52    7
## 2   0 Women  Single    0 2.05    6
## 3   0 Women  Single    0 3.75    6
## 4   0   Men Married    1 1.18    3
## 5   0 Women  Single    0 3.75   26
## 6   0 Women Married    2 3.59    2

The variables are

Histogram of outcome variable

Look at the possible values for your outcome variable, which is usually a count. If you have rate data, do a histogram both for counts and rates.

ggplot(articles, aes(art)) + 
  geom_histogram(binwidth=1, color="black", fill="white")

plot of chunk glm-diagnostics-04

Compare this to a bar chart of Poisson probabilities. The outcome variable from a Poisson regression model does not need to follow a single simple Poisson set of probabilities. It is more likely a mixture of different Poisson variables depending on the covariate pattern. Nevertheless, it can’t hurt to look at the Poisson probabilities, as long as you don’t get too concerned about discrepancies between the actual data and the Poisson probabilities.

data.frame(x=0:max(articles$art)) %>%
  mutate(p=dpois(x, lambda=mean(articles$art))) %>%
  ggplot(aes(factor(x), p)) +
    geom_col(color="black", fill="white")

plot of chunk glm-diagnostics-05

Here there seems to be a disparity at zero, which may (or may not) indicate the need for a zero-inflated model.

Descriptive statistics on indepdent variables

It helps to know the general descriptive statistics (means, standard deviations, percentages) for each of the independent variables.

articles %>%
  group_by(fem) %>%
  summarize(n=n())
## Error in summarize(., n = n()): argument "by" is missing, with no default
quantile(articles$ment)
##   0%  25%  50%  75% 100% 
##    0    3    6   12   77
quantile(articles$phd)
##    0%   25%   50%   75%  100% 
## 0.755 2.260 3.150 3.920 4.620
articles %>%
  group_by(mar) %>%
  summarize(n=n())
## Error in summarize(., n = n()): argument "by" is missing, with no default
articles %>%
  group_by(kid5) %>%
  summarize(n=n())

The wide ranges for ment and phd may make it difficult to interpret. A simple approach is to scale them to the interval 0 to 1.

articles %>%
  mutate(ment=(ment-min(ment))/(max(ment)-min(ment))) %>%
  mutate(phd=(phd-min(phd))/(max(phd)-min(phd))) -> articles_scaled

Bivariate plots

Although you are going to fit multiple independent variables, it does not hurt to look at the bivariate relationships first. A variable that looks important from a bivariate perspective, may be unimportant from the multivariate perspective and vice versa. So don’t read too much into these plots. It is intended to get you familiar with trends and patterns that might (or might not) show up in more complex models.

Use boxplots when the independent variable is categorical and scatterplots when the independent variable is continuous.

ggplot(articles, aes(factor(fem), art)) +
  geom_boxplot()

plot of chunk glm-diagnostics-12

ggplot(articles, aes(ment, art)) +
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

plot of chunk glm-diagnostics-13

ggplot(articles, aes(phd, art)) +
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

plot of chunk glm-diagnostics-14

ggplot(articles, aes(factor(mar), art)) +
  geom_boxplot()

plot of chunk glm-diagnostics-15

ggplot(articles, aes(factor(kid5), art)) +
  geom_boxplot()

plot of chunk glm-diagnostics-16

There aren’t any huge trends here. It looks like there might not be much going on here at all. If there are differences, they seem to show up in the values above the 75th percentile.

The one cause for early concern is that the effect of ment seems to level off after a certain point.

Fit simple models first

pois_fem <- glm(art ~ fem, family=poisson, data = articles_scaled)
summary(pois_fem)
## 
## Call:
## glm(formula = art ~ fem, family = poisson, data = articles_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9404  -1.7148  -0.4119   0.4139   7.3221  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.63265    0.03279  19.293  < 2e-16 ***
## femWomen    -0.24718    0.05187  -4.765 1.89e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1817.4  on 914  degrees of freedom
## Residual deviance: 1794.4  on 913  degrees of freedom
## AIC: 3466.1
## 
## Number of Fisher Scoring iterations: 5
pois_ment <- glm(art ~ ment, family=poisson, data = articles_scaled)
summary(pois_ment)
## 
## Call:
## glm(formula = art ~ ment, family = poisson, data = articles_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5700  -1.6316  -0.3598   0.5068   5.9483  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.25991    0.03436   7.564 3.91e-14 ***
## ment         2.00584    0.14764  13.586  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1817.4  on 914  degrees of freedom
## Residual deviance: 1669.5  on 913  degrees of freedom
## AIC: 3341.3
## 
## Number of Fisher Scoring iterations: 5
pois_phd <- glm(art ~ phd, family=poisson, data = articles_scaled)
summary(pois_phd)
## 
## Call:
## glm(formula = art ~ phd, family = poisson, data = articles_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9596  -1.7599  -0.4767   0.3878   7.8155  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.32259    0.06813   4.735 2.19e-06 ***
## phd          0.32976    0.10053   3.280  0.00104 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1817.4  on 914  degrees of freedom
## Residual deviance: 1806.6  on 913  degrees of freedom
## AIC: 3478.3
## 
## Number of Fisher Scoring iterations: 5
pois_mar <- glm(art ~ mar, family=poisson, data = articles_scaled)
summary(pois_mar)
## 
## Call:
## glm(formula = art ~ mar, family = poisson, data = articles_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8677  -1.7845  -0.5042   0.3107   7.4992  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.46514    0.04508  10.317   <2e-16 ***
## marMarried   0.09117    0.05458   1.671   0.0948 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1817.4  on 914  degrees of freedom
## Residual deviance: 1814.6  on 913  degrees of freedom
## AIC: 3486.3
## 
## Number of Fisher Scoring iterations: 5
pois_kid5 <- glm(art ~ kid5, family=poisson, data = articles_scaled)
summary(pois_kid5)
## 
## Call:
## glm(formula = art ~ kid5, family = poisson, data = articles_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8708  -1.8067  -0.5333   0.3694   7.4916  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.55960    0.02988  18.728   <2e-16 ***
## kid5        -0.06978    0.03450  -2.023   0.0431 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1817.4  on 914  degrees of freedom
## Residual deviance: 1813.2  on 913  degrees of freedom
## AIC: 3485
## 
## Number of Fisher Scoring iterations: 5

For all of these models, the deviance is much greater than the degrees of freedom, indicating a poor fit. This could be failure to include all the variables in a multivariate sense or a problem with the underlying Poisson model.

Don’t put too much stock in the p-values just yet, but several of them are small. This may change when we look at the independent variables in combination.

Fit graually more complex models

pois_all_ivs <- glm(art ~ fem + ment + phd + mar + kid5, family=poisson, data = articles_scaled)
summary(pois_all_ivs)
## 
## Call:
## glm(formula = art ~ fem + ment + phd + mar + kid5, family = poisson, 
##     data = articles_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5672  -1.5398  -0.3660   0.5722   5.4467  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.31430    0.08758   3.589 0.000332 ***
## femWomen    -0.22459    0.05461  -4.112 3.92e-05 ***
## ment         1.96679    0.15447  12.733  < 2e-16 ***
## phd          0.04956    0.10202   0.486 0.627139    
## marMarried   0.15524    0.06137   2.529 0.011424 *  
## kid5        -0.18488    0.04013  -4.607 4.08e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1817.4  on 914  degrees of freedom
## Residual deviance: 1634.4  on 909  degrees of freedom
## AIC: 3314.1
## 
## Number of Fisher Scoring iterations: 5
tidy(pois_fem) %>%
  bind_rows(tidy(pois_ment)) %>%
  bind_rows(tidy(pois_phd)) %>%
  bind_rows(tidy(pois_mar)) %>%
  bind_rows(tidy(pois_kid5)) %>%
  filter(term != "(Intercept)") %>%
  mutate(unadjusted_rate_ratio=round(exp(estimate), 2)) %>%
  select(term, unadjusted_rate_ratio) -> pois_crude
## Error in select(., term, unadjusted_rate_ratio): unused arguments (term, unadjusted_rate_ratio)
tidy(pois_all_ivs) %>%
  filter(term != "(Intercept)") %>%
  mutate(adjusted_rate_ratio=round(exp(estimate), 2)) %>%
  select(term, adjusted_rate_ratio) -> pois_adjust
## Error in select(., term, adjusted_rate_ratio): unused arguments (term, adjusted_rate_ratio)
pois_crude %>%
  full_join(pois_adjust)
## Joining, by = "term"
## # A tibble: 5 x 3
##   term       unadjusted_rate_ratio adjusted_rate_ratio
##   <chr>                      <dbl>               <dbl>
## 1 femWomen                    0.78                0.8 
## 2 ment                        7.43                7.15
## 3 phd                         1.39                1.05
## 4 marMarried                  1.1                 1.17
## 5 kid5                        0.93                0.83

Examine residuals

A good general plot is fitted (predicted) values versus residuals. Since the fitted value is a composite of all the independent variables, it is likely to show unusual behavior if there are issues involving any of the independent variables.

An ideal pattern would be a random scatter of points spread across all regions of the graph. A curvilinear pattern (e.g., high in the middle, low at either extreme) or a fanning out pattern (e.g., more variation on the right side of the graph than the left) is indication that there are unusual features in the data that are not fully captured by the current model.

r_pois <- augment(pois_all_ivs, articles_scaled, type.predict="response", type.residuals="deviance")
ggplot(r_pois, aes(.fitted, .resid)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

plot of chunk glm-diagnostics-24

Clearly there is something unusual going on with the largest fitted value.

r_pois %>%
  filter(.fitted > 7) %>%
  data.frame
##   art fem     mar kid5       phd ment  .fitted    .resid .std.resid
## 1   1 Men Married    1 0.2652005    1 9.627207 -3.567244  -3.986632
##        .hat  .sigma   .cooksd
## 1 0.1993306 1.33509 0.4006419

This appears to be the subject where the scaled value for ment equals 1. In other words, the highest possible value for ment.

You should also plot the residuals versus every independent variable, although the plots versus binary independent variables are unlikely to show anything interesting.

ggplot(r_pois, aes(factor(fem), .resid)) + 
  geom_boxplot()

plot of chunk glm-diagnostics-27

ggplot(r_pois, aes(ment, .resid)) + 
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

plot of chunk glm-diagnostics-28

ggplot(r_pois, aes(phd, .resid)) + 
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

plot of chunk glm-diagnostics-29

ggplot(r_pois, aes(factor(mar), .resid)) + 
  geom_boxplot()

plot of chunk glm-diagnostics-30

ggplot(r_pois, aes(factor(kid5), .resid)) + 
  geom_boxplot()

plot of chunk glm-diagnostics-31

Perhaps, the value of a mentor is good only to a point and anything after offers no additional help. Let’s compute a new value for ment where any value above 40 is set equal to 40.

articles %>%
  mutate(ment=pmin(40, ment)) %>% 
  mutate(ment=(ment-min(ment))/(max(ment)-min(ment))) %>%
  mutate(phd=(phd-min(phd))/(max(phd)-min(phd))) -> articles_rescaled
pois_rescaled <- 
  glm(
    art ~ fem + ment + phd + mar + kid5, 
      family=poisson, 
        data = articles_rescaled)
r_rescaled <- 
  augment(
    pois_rescaled, 
      articles_rescaled, 
        type.predict="response", 
          type.residuals="deviance") 
ggplot(r_rescaled, aes(.fitted, .resid)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

plot of chunk glm-diagnostics-32

ggplot(r_rescaled, aes(ment, .resid)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

plot of chunk glm-diagnostics-32

Negative binomial model

library(MASS)
nb1 <- glm.nb(art ~ fem + ment + phd + mar + kid5, data = articles_rescaled)
summary(pois_rescaled)
## 
## Call:
## glm(formula = art ~ fem + ment + phd + mar + kid5, family = poisson, 
##     data = articles_rescaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8559  -1.5078  -0.3423   0.5818   5.3354  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.28876    0.08797   3.283  0.00103 ** 
## femWomen    -0.22336    0.05453  -4.096  4.2e-05 ***
## ment         1.29786    0.09771  13.283  < 2e-16 ***
## phd         -0.02054    0.10361  -0.198  0.84284    
## marMarried   0.15519    0.06136   2.529  0.01143 *  
## kid5        -0.17383    0.03987  -4.360  1.3e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1817.4  on 914  degrees of freedom
## Residual deviance: 1612.3  on 909  degrees of freedom
## AIC: 3292.1
## 
## Number of Fisher Scoring iterations: 5
summary(nb1)
## 
## Call:
## glm.nb(formula = art ~ fem + ment + phd + mar + kid5, data = articles_rescaled, 
##     init.theta = 2.345482748, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2016  -1.3559  -0.2671   0.4481   3.4923  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.24464    0.11622   2.105  0.03530 *  
## femWomen    -0.21377    0.07215  -2.963  0.00305 ** 
## ment         1.34279    0.14614   9.188  < 2e-16 ***
## phd          0.02888    0.13882   0.208  0.83517    
## marMarried   0.15054    0.08156   1.846  0.06494 .  
## kid5        -0.17066    0.05240  -3.257  0.00113 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(2.3455) family taken to be 1)
## 
##     Null deviance: 1122.1  on 914  degrees of freedom
## Residual deviance: 1008.1  on 909  degrees of freedom
## AIC: 3128.3
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  2.345 
##           Std. Err.:  0.289 
## 
##  2 x log-likelihood:  -3114.311

Zero inflated model

zip1 <- 
  zeroinfl(
    art ~ fem + ment + phd + mar + kid5 | 
      1, 
        dist="poisson", 
          data = articles_rescaled)
summary(zip1)
## 
## Call:
## zeroinfl(formula = art ~ fem + ment + phd + mar + kid5 | 1, data = articles_rescaled, 
##     dist = "poisson")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4952 -0.9862 -0.2940  0.5601  7.3463 
## 
## Count model coefficients (poisson with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.52080    0.09770   5.331 9.79e-08 ***
## femWomen    -0.23166    0.05824  -3.978 6.96e-05 ***
## ment         1.13674    0.10570  10.754  < 2e-16 ***
## phd         -0.05874    0.11145  -0.527 0.598140    
## marMarried   0.13225    0.06574   2.012 0.044239 *  
## kid5        -0.16129    0.04285  -3.765 0.000167 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.7435     0.1639  -10.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 13 
## Log-likelihood: -1611 on 7 Df
zip2 <- 
  zeroinfl(
    art ~ fem + ment + phd + mar + kid5 | 
      fem + ment + phd + mar + kid5, 
        dist="poisson",
          data = articles_rescaled)
summary(zip2)
## 
## Call:
## zeroinfl(formula = art ~ fem + ment + phd + mar + kid5 | fem + ment + 
##     phd + mar + kid5, data = articles_rescaled, dist = "poisson")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9307 -0.8803 -0.2634  0.5320  7.0179 
## 
## Count model coefficients (poisson with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.62199    0.10366   6.000 1.97e-09 ***
## femWomen    -0.21558    0.06390  -3.374 0.000741 ***
## ment         0.96293    0.11653   8.263  < 2e-16 ***
## phd         -0.09387    0.12331  -0.761 0.446505    
## marMarried   0.10351    0.07184   1.441 0.149633    
## kid5        -0.13632    0.04756  -2.866 0.004156 ** 
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -0.61145    0.43313  -1.412  0.15804   
## femWomen     0.06827    0.28564   0.239  0.81110   
## ment        -4.03328    1.45417  -2.774  0.00554 **
## phd         -0.21521    0.56689  -0.380  0.70421   
## marMarried  -0.34732    0.32230  -1.078  0.28119   
## kid5         0.22108    0.19709   1.122  0.26198   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 19 
## Log-likelihood: -1599 on 12 Df
zip3 <- 
  zeroinfl(
    art ~ fem + ment + phd + mar + kid5 | 
      ment, 
        dist="poisson",
          data = articles_rescaled)
summary(zip3)
## 
## Call:
## zeroinfl(formula = art ~ fem + ment + phd + mar + kid5 | ment, data = articles_rescaled, 
##     dist = "poisson")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9318 -0.8877 -0.2517  0.5310  7.0536 
## 
## Count model coefficients (poisson with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.59978    0.09752   6.150 7.74e-10 ***
## femWomen    -0.22165    0.05874  -3.774 0.000161 ***
## ment         0.96217    0.11360   8.470  < 2e-16 ***
## phd         -0.07773    0.11213  -0.693 0.488213    
## marMarried   0.13383    0.06630   2.018 0.043546 *  
## kid5        -0.15649    0.04324  -3.619 0.000295 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.8236     0.2141  -3.847  0.00012 ***
## ment         -4.1956     1.3978  -3.002  0.00269 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 14 
## Log-likelihood: -1600 on 8 Df
zinb <- 
  zeroinfl(
    art ~ fem + ment + phd + mar + kid5 | 
      ment, 
        dist="negbin",
          data = articles_rescaled)
summary(zinb)
## 
## Call:
## zeroinfl(formula = art ~ fem + ment + phd + mar + kid5 | ment, data = articles_rescaled, 
##     dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2732 -0.7706 -0.2525  0.4796  6.4457 
## 
## Count model coefficients (negbin with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.37538    0.12149   3.090  0.00200 ** 
## femWomen    -0.21021    0.07162  -2.935  0.00333 ** 
## ment         1.14749    0.15473   7.416 1.21e-13 ***
## phd         -0.01688    0.13731  -0.123  0.90214    
## marMarried   0.14007    0.08088   1.732  0.08331 .  
## kid5        -0.16263    0.05218  -3.117  0.00183 ** 
## Log(theta)   1.01830    0.14352   7.095 1.29e-12 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -0.8630     0.3706  -2.329   0.0199 *
## ment        -25.3539    10.5780  -2.397   0.0165 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 2.7685 
## Number of iterations in BFGS optimization: 35 
## Log-likelihood: -1550 on 9 Df

Comparing competing models

The AIC statistic is useful for comparing models, but it is not produced automatically for negative binomial regression or the zero-inflated models. It requires you to pull the individual components from the

save.image(file="glm_diagnostics.RData")