9 Odds ratios, as Peng, So, Stage, and St. John (2002) remind us, are not odds. Their interpretation is not as
transparent as the original Tool Box assumed them to be. And while an R² statistic is presented in the logistic tables
of this study, it cannot be read the way one would interpret an R² in a linear regression, i.e., it does not indicate the
percent of variance in the dependent variable that is explained by the independent variables (Long, 1997). For that
reason, it is called a "pseudo R²" (Cabrera 1994) and is one of a number of measures of goodness-of-fit. As blocks
of variables are added to the model in the stepwise manner followed here, the pseudo R² should increase.
10 For the computation of Delta-p, I am using a shortcut recommended by Paul Allison of the University of
Pennsylvania: b × p × (1 − p), where b is the logistic coefficient and p is the probability for the dependent variable in the
model. This heuristic produces slightly higher values than the formula advanced by Petersen (1985).
11 For all technical issues, please see Appendix D.
In terms of statistical technique, both the original Tool Box and The Toolbox Revisited use simple
logistic regression, not structural equations or other path models that are common to causal
inquiries or searches for indirect effects, e.g., of discrete aspects of school or college
environments (Dey and Astin 1993). A logistic regression is focused on an event that either
happens or it doesn’t. The dependent variable is dichotomous: yes or no. The independent
variables are judged within each model by the degree to which they contribute to what happened
in relation to or controlling for all other independent variables in the model (Hair, Anderson,
Tatham, and Black 1995).
There are a number of ways of expressing this “degree.” One is by an “odds ratio,” which,
expressed in a simple way, is the ratio of the odds that X will happen after a unit of change in the
independent variable to the odds that X will happen before that change. It ultimately shows the
strength of association between the independent and dependent variables: the closer the odds
ratio is to 1, the weaker the association.⁹ This was the measure used in the original Tool
Box. Another way of expressing the value of the contribution of an independent variable is by a
“Delta-p” statistic that says every unit change in the independent variable changes the
probability that X will happen by Y percent given the values of the other variables in the model
(Petersen 1985; Cabrera 1994). The narrative of The Toolbox Revisited relies on Delta-p,¹⁰ and
the logistic model tables provide Delta-p statistics only for those parameter estimates that are
statistically significant, since there is no way to determine the statistical significance of the
Delta-p itself (Cabrera 1994).¹¹
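As a rough illustration of the two measures just described, the sketch below converts a logistic coefficient into an odds ratio and into a Delta-p. The coefficient and baseline probability are hypothetical values chosen only for the arithmetic and are not taken from the study's tables. The function names are likewise illustrative. The second Delta-p function follows the formula attributed to Petersen (1985) as it is commonly stated (shift the baseline log-odds by b, convert back to a probability, subtract the baseline), which is an assumption here rather than a quotation from this report.

```python
import math


def odds_ratio(b):
    """Odds ratio implied by a logistic coefficient b for a one-unit change."""
    return math.exp(b)


def delta_p_shortcut(b, p):
    """Shortcut Delta-p: b * p * (1 - p), with p the baseline probability."""
    return b * p * (1.0 - p)


def delta_p_petersen(b, p):
    """Delta-p in the form commonly attributed to Petersen (1985).

    Shift the baseline log-odds by b, convert back to a probability,
    and subtract the baseline probability p.
    """
    shifted_logit = math.log(p / (1.0 - p)) + b
    return 1.0 / (1.0 + math.exp(-shifted_logit)) - p


# Hypothetical inputs, for illustration only.
b = 0.5   # assumed logistic coefficient
p = 0.5   # assumed baseline probability of the outcome

print("odds ratio:        ", round(odds_ratio(b), 4))
print("Delta-p (shortcut):", round(delta_p_shortcut(b, p), 4))
print("Delta-p (Petersen):", round(delta_p_petersen(b, p), 4))
```

With these particular inputs the shortcut gives 0.125, a little above the Petersen-style value. The size and direction of the gap depend on b and p, so the comparison should be rerun for any real coefficient.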
But in this paper there is a major methodological departure from the original Tool Box study:
there are seven (and not five) steps in the model employed, all driven by the empirical history of
the NELS:88/2000 students. Following St. John, Paulsen, and Starkey (1996), the blocks of
variables in each step were entered "in a sequence that parallels the order in which students pass
through well-established stages of persistence behavior" (p. 194) on their way toward bachelor’s
degree completion (or not). Each of the seven steps, too, is cumulative. That is, variables in one
step that meet the statistical criteria for remaining in the model are carried forward to the next
step. This extended accounting, which we will call a "logistic narrative," allows "a meaningful
examination of the direct effects of variables on persistence, as well as their interactions with the
variables entered in successive steps" (St. John, Paulsen, and Starkey 1996, p. 194).
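The cumulative, block-by-block entry described above can be sketched as simple bookkeeping. The version below is a minimal illustration, not the study's procedure. The retention rule (a p-value below a placeholder threshold of 0.05), the variable names, and the stand-in `fake_fit` function are all assumptions, since the actual statistical criteria are not given in this section. In practice `fit` would wrap a real logistic regression routine.

```python
def run_blocks(blocks, fit, alpha=0.05):
    """Enter blocks of variables in order, carrying survivors forward.

    blocks: list of lists of variable names, in order of entry.
    fit:    callable taking a list of variable names and returning
            a dict {name: p_value} for the model with those variables.
    alpha:  assumed retention threshold (placeholder, not from the study).

    Returns a list of (step, entered, retained) tuples.
    """
    carried = []
    history = []
    for step, block in enumerate(blocks, start=1):
        # Variables retained so far plus the new block for this step.
        entered = carried + [v for v in block if v not in carried]
        p_values = fit(entered)
        # Keep only variables meeting the (assumed) criterion.
        carried = [v for v in entered if p_values[v] < alpha]
        history.append((step, entered, carried))
    return history


# Hypothetical p-values standing in for a real model fit.
FAKE_P = {"background": 0.01, "hs_curriculum": 0.20, "first_year_credits": 0.03}


def fake_fit(variables):
    """Stand-in for a logistic regression; returns fixed p-values."""
    return {v: FAKE_P[v] for v in variables}


steps = run_blocks(
    [["background", "hs_curriculum"], ["first_year_credits"]],
    fake_fit,
)
for step, entered, retained in steps:
    print(step, entered, retained)
```

Because each step refits the model with the carried-forward variables plus the new block, a variable retained early can still drop out later if it no longer meets the criterion once later blocks are entered.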
The reader can already tell that there is a great deal of technical material in this presentation, but
it is presented in the spirit of the U.S. Department of Education’s goal of building a culture of
evidence. The author trusts that reports such as The Toolbox Revisited will contribute to the