Guidelines for logistic regression models

Steve Simon

1999-09-27

Categories: Blog post Tags: Logistic regression

**[StATS]: Guidelines for logistic regression models (created September 27

There are three steps in a typical logistic regression analysis.

Fit a crude model

Fit an adjusted model

Examine the predicted probabilities.

Step 1. Fit a crude model.

There are two types of models

**If the factor that you use to predict your binary outcome is itself binary

In this example

The Feeding type * Exclusive bf at discharge Crosstabulation shows us the frequencies for the four possible combinations of feeding type and breast feeding status at discharge. It also helps to request the row percentages and the RISK option.

The table above shows row percentages for the exclusive breast feeding status at discharge. Notice that a much greater fraction of the Treatment group were exclusive breast feeding at discharge (86.8% versus 41.3% for the control group).

The Risk Estimate table appears when we select the RISK option. This table provides information about the odds ratio and two different risk ratios. The odds ratio is 9.379. You should always be careful about this estimate

Notice that SPSS provides two additional estimates. These two additional estimates are risk ratios and are computed by dividing one row percentage by the other. The value of 4.461 is the ratio of 58.7% divided by 13.2%. This is the increase in the probability of not exclusively breast feeding at discharge when we compare the NG Tube group to the Bottle Fed group.
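The arithmetic behind these estimates can be checked by hand. The following is a minimal sketch in Python, not part of the SPSS output; it assumes only the rounded row percentages quoted above (86.8% and 41.3%), so the results come out close to, but not exactly equal to, the 9.379 and 4.461 that SPSS computes from the raw counts. The variable and function names are illustrative.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

# Row percentages quoted in the text, rounded to one decimal place.
p_treatment = 0.868  # exclusive breast feeding at discharge, treatment group
p_control = 0.413    # exclusive breast feeding at discharge, control group

# Odds ratio: odds of exclusive breast feeding, treatment relative to control.
odds_ratio = odds(p_treatment) / odds(p_control)

# Risk ratio for NOT exclusively breast feeding: 58.7% divided by 13.2%.
risk_ratio_not_bf = (1 - p_control) / (1 - p_treatment)

print(f"odds ratio: {odds_ratio:.2f}")
print(f"risk ratio (not exclusive bf): {risk_ratio_not_bf:.2f}")
```

Notice how much larger the odds ratio is than the risk ratio, even though both are computed from the same two percentages.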

The other estimate

The logistic regression output from SPSS is quite extensive. We will break it apart into pieces and discuss each piece individually.

The Case Processing Summary table shows you information on missing cases and unselected cases. Make sure that you are not losing data unexpectedly.

The Dependent Variable Encoding table shows you which of the categories is labeled as 0 and which is labeled as 1. If the estimates that you get later in the output go in the opposite direction from what you would expect

We will skip the tables in Step 0. These describe a null model that contains only an intercept and no independent variables. These values are more likely to be interesting if you are fitting a sequential series of logistic regression models.

The Omnibus Tests of Model Coefficients table is mostly of interest for more complex logistic regression models. It provides a test of the joint predictive ability of all the covariates in the model.

The Model Summary table in Step 1 shows three measures of how well the logistic regression model fits the data. These measures are useful when you are comparing several different logistic regression models.

The Classification Table in Step 1 is often useful for logistic regression models which involve diagnostic testing

In the Variables in the Equation table for Step 1

We can also get a confidence interval for the odds ratio by clicking on the Options button and selecting the CI for exp(B) option box.

If we were interested in the earlier odds ratio of 9.379 instead of 0.107

Let’s look at another logistic regression model

The log odds ratio is 0.157 and the p-value is 0.001. The odds ratio is 1.170. This implies that the estimated odds of successful breast feeding at discharge improve by about 17% for each additional year of the mother’s age.
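The link between the log odds ratio and the odds ratio is just exponentiation. Here is a small Python sketch of that calculation, using the coefficient of 0.157 reported above; the function name `odds_ratio_for` is made up for this illustration. Because the model is linear on the log odds scale, a change of several years multiplies the coefficient before exponentiating.

```python
import math

b_age = 0.157  # log odds ratio per year of mother's age, from the output above

def odds_ratio_for(years, log_odds_ratio=b_age):
    """Odds ratio associated with a change of `years` in the predictor."""
    return math.exp(years * log_odds_ratio)

one_year = odds_ratio_for(1)
percent_change = (one_year - 1) * 100
five_years = odds_ratio_for(5)

print(f"odds ratio per year: {one_year:.3f}")
print(f"percent change in odds per year: {percent_change:.0f}%")
print(f"odds ratio per five years: {five_years:.3f}")
```

Raising the one-year odds ratio to the fifth power gives the same five-year answer, since exp(5b) equals exp(b) to the fifth power.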

The confidence interval is 1.071 to 1.278

If you wanted to see how much the odds would change for each additional five years of age

Step 2. Fit an adjusted model.

The crude model shown in step 1

When you run this model

The Omnibus Tests of Model Coefficients table and the Model Summary table for Block 1 are identical to those in the crude model with MOM_AGE as the covariate. We wish to contrast these with the same tables for Block 2.

The Chi-square values in the Omnibus Tests of Model Coefficients table in Block 2 show some changes.

The test in the Model row shows the predictive power of all of the variables in Block 1 and Block 2. The large Chi-square value (28.242) and the small p-value (displayed as 0.000, which means it is smaller than 0.0005) show you that either feeding type or mother’s age or both are significantly associated with exclusive breast feeding at discharge.
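To see how small that p-value really is, you can compute it directly. This sketch assumes the Model test has 2 degrees of freedom (one coefficient for mother's age and one for feeding type); that figure is an assumption, not something read from the output. With exactly 2 degrees of freedom the upper tail of the chi-square distribution has the simple closed form exp(-x/2), so no statistics library is needed.

```python
import math

def chi_square_p_value_2df(statistic):
    """Upper-tail p-value for a chi-square statistic with exactly 2 df.

    For 2 degrees of freedom the survival function is exp(-x / 2).
    This shortcut does NOT apply to other degrees of freedom.
    """
    return math.exp(-statistic / 2)

model_chi_square = 28.242  # Model row of the Omnibus Tests table
p_value = chi_square_p_value_2df(model_chi_square)  # assumes df = 2

print(f"p = {p_value:.2e}")
```

The result is far below 0.0005, which is why SPSS rounds it to 0.000.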

The test in the Block row represents a test of the predictive power of all the variables in Block 2

Notice that the two R-squared measures are larger. This also tells you that feeding type helps in predicting breastfeeding outcome

The odds ratio for mother’s age is 1.1367. That tells you that for each additional year of the mother’s age

The odds ratio for feeding type is 0.1443 or

Step 3. Examine the predicted probabilities.

The logistic regression model produces estimated or predicted probabilities and we should compare these to probabilities observed in the data. A large discrepancy indicates that you should look more closely at your data and possibly consider some alternative models.

If you coded your outcome variable as 0 and 1

The Report table shows average predicted probabilities (Predicted probability column) and observed probabilities (Exclusive bf at discharge column) for mother’s age. We had to create a new variable where we created five groups of roughly equal size. The first group represented the 15 mothers with the youngest ages and the fifth group represented the 17 mothers with the oldest ages. The last column (Mother’s age column) shows the average age in each of the five groups.
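If you want to build a table like this outside of SPSS, the grouping can be sketched in a few lines of Python. Everything in this example is hypothetical: the function name `grouped_report` and the ten made-up subjects are for illustration only and are not the breast feeding data. The function sorts by age, cuts the data into groups of roughly equal size, and averages each column within a group.

```python
def grouped_report(ages, predicted, observed, n_groups=5):
    """Return (mean age, mean predicted probability, observed proportion)
    for each of n_groups groups of roughly equal size, ordered by age.

    Assumes there are at least n_groups subjects.
    """
    rows = sorted(zip(ages, predicted, observed))
    n = len(rows)
    report = []
    for g in range(n_groups):
        group = rows[g * n // n_groups:(g + 1) * n // n_groups]
        size = len(group)
        report.append((
            sum(r[0] for r in group) / size,  # average age
            sum(r[1] for r in group) / size,  # average predicted probability
            sum(r[2] for r in group) / size,  # observed proportion of 1s
        ))
    return report

# Invented data for illustration only (outcome coded 0/1).
ages = [16, 18, 20, 22, 24, 26, 28, 30, 33, 36]
predicted = [0.30, 0.35, 0.42, 0.50, 0.55, 0.62, 0.68, 0.74, 0.80, 0.86]
observed = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]

report = grouped_report(ages, predicted, observed)
for mean_age, mean_pred, obs_prop in report:
    print(f"age {mean_age:5.1f}  predicted {mean_pred:.3f}  observed {obs_prop:.3f}")
```

With a 0/1 outcome, the average of the outcome within a group is the observed probability, which is what makes the two columns directly comparable.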

The Hosmer and Lemeshow Test table provides a formal test for whether the predicted probabilities for a covariate match the observed probabilities. A large p-value indicates a good match. A small p-value indicates a poor match

The Contingency Table for Hosmer and Lemeshow Test table shows more details. This test divides your data up into approximately ten groups. These groups are defined by increasing order of estimated risk. The first group corresponds to those subjects who have the lowest predicted risk. In this model it represents the seven subjects where the mother’s age is 16

The next group corresponds to those with the next lowest risk
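The idea behind the test statistic can be sketched as follows. This is a simplified illustration of the usual Hosmer and Lemeshow calculation, not a reproduction of what SPSS does: it splits the sorted data into equal-sized groups and ignores ties, whereas SPSS keeps subjects with the same predicted risk together. The function name and the four data points are invented. Within each group, the observed counts of events and non-events are compared with the counts expected from the predicted probabilities; the statistic is commonly referred to a chi-square distribution with the number of groups minus 2 degrees of freedom.

```python
def hosmer_lemeshow_statistic(predicted, observed, n_groups=10):
    """Sum of (observed - expected)^2 / expected over events and non-events
    in groups ordered by predicted risk.

    Assumes predicted probabilities lie strictly between 0 and 1.
    """
    rows = sorted(zip(predicted, observed))
    n = len(rows)
    statistic = 0.0
    for g in range(n_groups):
        group = rows[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue  # fewer subjects than groups; skip empty slices
        size = len(group)
        expected_events = sum(p for p, _ in group)
        observed_events = sum(y for _, y in group)
        expected_non = size - expected_events
        observed_non = size - observed_events
        statistic += (observed_events - expected_events) ** 2 / expected_events
        statistic += (observed_non - expected_non) ** 2 / expected_non
    return statistic

# Invented data for illustration only: two risk groups of two subjects each.
toy_predicted = [0.2, 0.2, 0.8, 0.8]
toy_observed = [0, 1, 1, 1]
toy_statistic = hosmer_lemeshow_statistic(toy_predicted, toy_observed, n_groups=2)
print(f"statistic: {toy_statistic:.3f}")
```

A statistic near zero means the observed and expected counts agree closely in every group.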

Summary

There are three steps in a typical logistic regression analysis.

First

Second

Third

Further reading

Logistic Regression. David Garson. (Accessed on November 19

You can find an earlier version of this page on my original website.