DECONSTRUCTING THE STATIC 99
Psychologists and psychiatrists have been struggling
with the prediction of physically violent behavior for years. Some very
modest success has been achieved by statistical or actuarial methods in
predicting physically violent offenses. Predicting sexually violent
behavior is even more difficult and more subject to error. The authors of
the Static 99 should be commended for producing a multi-variable prediction
equation in a field that is even more difficult than the prediction of
physically violent behavior. The Static 99 is a method that is designed for
sex offender recidivism assessment instead of for a broad range of
different types of offenders. It is also standardized on a large population
of sexual offenders and gives specific demographic information about the
database used for test development.
There are a variety of major problems with the method
due, no doubt, to the fact that an attempt is made to predict actual
behavior in a free and complex society rather than to identify a mental
trait or disorder. Since recidivism may be due in large part to
environmental factors that are difficult or impossible to predict in
advance, a question arises as to whether we will ever be really precise in
predicting post-release offenses. Some of the principal problems with the
method together with explanations of why the problems exist appear below.
WEAK CORRELATION WITH RECIDIVISM: The most important
weakness of the Static 99 is that the correlation of the method with actual
reoffense is low. This leads to an unknown but large error factor. Most
persons untrained in statistics or mathematics do not realize how low this
correlation is. The most commonly used method to express the correlation
coefficient in terms that are understandable to everyone is to square the
statistic and then multiply by 100.
This arithmetic manipulation yields a percentage of
the so-called “explained variance.” A percentage, of course, is a statistic
one can readily visualize. In other words, the resulting percentage
approximates how much we know about the relationship between the measure
and the actions we are attempting to estimate. In the case of the Static 99
with a correlation with recidivism of r = .33, this arithmetic manipulation
tells us that the Static 99 gives us about 10% of what we would ideally
like to know in order to make a completely confident prediction.
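The arithmetic above can be checked directly. The following is a minimal sketch in Python; the function name is illustrative only, and the single figure taken from the text is the correlation r = .33.

```python
def explained_variance_percent(r: float) -> float:
    """Square a correlation coefficient and express it as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return (r ** 2) * 100.0


static99_r = 0.33  # correlation with recidivism quoted in the text
percent = explained_variance_percent(static99_r)
print(f"r = {static99_r:.2f} explains about {percent:.1f}% of the variance")
```

Squaring .33 gives .1089, so the "about 10%" figure is a slight rounding down of roughly 11%.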
Hanson prefers to use another, less well-known method to
arrive at a percentage. This method multiplies the correlation coefficient
by 100 without squaring it first. The resulting figure is taken as the
percentage reduction in reoffense likelihood of an offender with a
favorable trait as compared with an offender without that favorable trait.
The penile plethysmograph has an r = .32 correlation with recidivism (Hanson
and Bussiere 1998).
Thus, an offender with a favorable plethysmographic
protocol would be 32% less likely to reoffend than a person without that
favorable trait or condition. The problem with this method as used to evaluate
the Static 99 is that we do not know the likelihood of reoffense of the
person without the trait. We know that the person with the favorable trait
is a 32% better bet in the community than one without it but we do not know
what risk the other offender poses, so we have nothing to add or subtract
from.
We could use the population base rate to estimate what
threat one or the other offender might pose. In the case of sexual offense,
however, we do not know what the base rate is. Probably the most important
issue in predicting either mental disorder or reoffense behavior in the
field of sexual disorder or in any other field is base rate. That is, at
what rate does the predicted phenomenon take place in a particular sexual
offender population? In particular, a low base rate can be a major problem
affecting prediction accuracy. A low base rate leads to undue numbers of
false positive mistakes in prediction. That is, if the base rate is low, we
are very likely to falsely predict that an offender will reoffend when, in
fact, he will not.
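The effect of base rate on false positives can be illustrated with a short calculation. This is only a sketch under stated assumptions: the sensitivity and specificity of .70 and the three base rates are hypothetical values chosen for illustration and are not taken from the Static 99 data.

```python
def positive_predictive_value(base_rate: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Share of 'will reoffend' predictions that turn out to be correct."""
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * (1.0 - specificity)
    flagged = true_positives + false_positives
    if flagged == 0.0:
        raise ValueError("no cases are flagged with these inputs")
    return true_positives / flagged


# Hypothetical test quality, held constant while the base rate falls.
ASSUMED_SENSITIVITY = 0.70
ASSUMED_SPECIFICITY = 0.70

for base_rate in (0.50, 0.20, 0.05):
    ppv = positive_predictive_value(base_rate,
                                    ASSUMED_SENSITIVITY,
                                    ASSUMED_SPECIFICITY)
    print(f"base rate {base_rate:.0%}: {ppv:.0%} of positive predictions "
          f"are correct, {1 - ppv:.0%} are false positives")
```

Under these assumed figures, the same test is right in 70% of its positive predictions when half the population reoffends, but in only about 11% when the base rate is 5%.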
Another method of concretely demonstrating the value
of the Static 99 would be to compare how many errors are made using the
test to an estimate of how many errors one would make just by guessing.
Without having the raw data for the Static 99 study, however, it is not
possible to directly calculate the exact number of errors made by the test.
To illustrate roughly the typical magnitude of such an error, however, I
have constructed the sample database reflected in the accompanying tables.
It is possible to construct a sample data base by simply increasing the
number of errors in prediction until the correlation coefficient is
essentially the same as was calculated by Hanson and Thornton for the
Static 99. These authors also use another measure of predictability (the
so-called ROC Curve). We want to construct a sample data base that also
approximates this measure.
Note that the sample database has a correlation with
the criterion of about r = .37 as compared with the r = .33 noted for the
Static 99. These correlation coefficients are very similar. The Area Under
the ROC Curve is the other measure of relationship used by Hanson and
Thornton. This measure is ROC Area = .78 for the sample database and ROC
Area = .71 for the Static 99. Again, the figures are close with the sample
database having slightly better predictive power.
To make the comparison simple, there are only 18 cases
in this sample database. There are 10 disordered cases in the sample
database and 8 normal cases. The ratio, therefore, of disordered to normal
cases is about even and one would anticipate about 50% accuracy in
selection simply by chance alone (using the flip of a coin, for example, to
dictate the choice). Simply by flipping a coin, one would anticipate about
50% correct choices and 50% incorrect choices.
Actually, with the correlation coefficient and the
area under the curve about equal to the Static 99, 5 mistakes were made by
using a mathematically derived cut score in the sample database to predict
whether the cases were healthy or disordered. If this figure had been 8 or
9, the measure used would be largely or totally useless and bereft of any
predictive ability. We may presume, therefore, that if we had the raw data
for the Static 99, predictability would be the same or less than when using
the sample data base. The classification efficiency of the sample database
would be better than chance, but not very much better. Referring again to
the sample database we have used instead of the Static 99, we have been able
to identify 3 or 4 cases by the use of this method that we probably would
not have identified merely by flipping a coin to make the decision.
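The kind of comparison described above can be sketched in a few lines of Python. The scores below are invented for illustration and are not the author's tables; the cut score of 3 is likewise an assumption. They were chosen only so that the group sizes (10 disordered, 8 normal) and the number of errors match the description in the text.

```python
from math import sqrt

# Hypothetical scores (NOT the author's data): 10 disordered, 8 normal cases.
disordered = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
normal = [0, 1, 1, 2, 2, 2, 4, 5]
CUT_SCORE = 3  # assumed cut: scores of 3 or more are predicted "disordered"


def pearson_r(xs, ys):
    """Pearson correlation; with 0/1 labels this is the point-biserial r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def roc_area(positives, negatives):
    """Chance that a random positive outscores a random negative (ties = 1/2)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


scores = disordered + normal
labels = [1] * len(disordered) + [0] * len(normal)

misses = sum(s < CUT_SCORE for s in disordered)    # disordered called normal
false_alarms = sum(s >= CUT_SCORE for s in normal)  # normal called disordered
errors = misses + false_alarms
chance_errors = len(scores) / 2                     # coin-flip expectation

print(f"r = {pearson_r(scores, labels):.2f}, "
      f"ROC area = {roc_area(disordered, normal):.2f}")
print(f"errors with cut score: {errors}, expected by coin flip: {chance_errors:.0f}")
```

With these made-up scores the correlation is about .41 and the ROC area about .74, in the same general range as the figures quoted above, and the cut score makes 5 errors against the 9 expected from a coin flip.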
STATIC 99 IS BASED ON MALES DIFFERENT FROM W&IC
6600 MALES: As with any prediction device, we must have good reason to
assume that the standardization population used to develop the test or
measure is similar in most important respects to the population from which
we intend to make predictions. If that is not substantially true, then even
a powerful test will not offer much help in making predictions in a group
that is dissimilar from the standardization group in important ways.
The standardization group for the Static 99 is
probably (inadvertently) pre-selected to include disproportionate numbers
of reoffense prone individuals. This is clearly reflected by the work of
Firestone in the case of the rape offenders used in the standardization
group. Firestone showed that, if one used an unselected group of rape
offenders, the recidivism rate of these offenders would be substantially
less than typically reported when the rape offenders are all selected from
maximum security institutions or hospitals as was the case for Static 99
[Firestone, Bradford, et al. 1998].
Unfortunately, I know of no studies that illustrate
this point for child molester subjects but the likelihood is high that
reoffense statistics would be much higher if one restricts the
developmental sample to those leaving maximum security institutions.
Another clear dissimilarity between the developmental
sample of the Static 99 and the population of sex offenders processed in
California under W&IC 6600 has to do with the age of the offenders. The
age of the offenders upon discharge in the Hanson/Thornton data is roughly
35 years of age on average. The age of persons processed using W&IC
6600, however, is much older than that. Age is known to be an important
factor amongst rape offenders. A 35-year-old man, everything else being
equal, is more than twice as likely to commit a rape offense as
a 45-year-old man. It seems likely that the average age of W&IC 6600
evaluees is over 45 years of age. The age factor is probably not as
important in the child molester population. Still, diminishing offenses
with increasing offender age in the child molester population would be
anticipated; we are simply not as sure of the magnitude of the effect.
Of course, American evaluees may or may not be
similar to the Canadian and British offenders primarily making up the
database of this method. This may be especially true of Afro-American
evaluees or even Hispanic American evaluees. Since some of the scoring of
the Static 99 is based on charges, arrests, sentencing dates, etc., we must
also hope that these legal practices are the same or similar across these 3
nations.
LIMITED NUMBER OF VARIABLES USED BY STATIC 99: The
data on which Static 99 predictions are based are narrow, including mainly
offense characteristics. As the name Static 99 implies, offense statistics
are unchangeable. No matter what the offender does to reduce his reoffense
potential, his prior offenses remain the same and his Static 99 score is
not likely to be favorably affected. Many changeable personal and sexual
characteristics are known to affect reoffense potential; basically none of
these are used in the Static 99.
METHOD USED IN ARRIVING AT REOFFENSE POTENTIAL: The
manner of reporting reoffense potential also implies a precision that the
method does not, in fact, offer. This would be likely to mislead especially
those individuals who are not trained in biostatistics. Typically, we base
statistical predictions on the mathematical likelihood that our predictions
are accurate. We may, for example, predict that a given individual will
reoffend but through mathematical calculations we can assert that we are
only 25% confident that he will do so. These calculations are typically
based on the variability within the development sample, the magnitude of
the correlation and other factors.
The Static 99, by contrast, uses the proportion of
recidivists receiving a particular score to estimate recidivism potential.
That is, if 40% of offenders receiving a score of 5 on the Static 99 in the
sample group reoffended, then 40% is thought to be the reoffense potential
for any offender whose score on the Static 99 is 5. This type of analysis
tends to discourage the more typical practice of offering “confidence
limits” within which the estimate might vary. To the contrary, the estimate
seems to be used with the presumption that it is very precise. To say the
least, this method of calculating “recidivism potential” places a
particularly heavy weight on the requirement that future evaluees be
similar to those on which the method was standardized.
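To show what "confidence limits" around such a proportion might look like, here is a sketch using the Wilson score interval, one common method for a binomial proportion. The group sizes of 25 and 250 are hypothetical, since the text does not say how many offenders received each score; only the 40% figure comes from the example above.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0:
        raise ValueError("group size must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical numbers of offenders who received a given score.
for group_size in (25, 250):
    recidivists = round(0.40 * group_size)  # 40% observed reoffense rate
    low, high = wilson_interval(recidivists, group_size)
    print(f"n = {group_size}: observed 40%, "
          f"95% limits roughly {low:.0%} to {high:.0%}")
```

If only 25 offenders in the development sample had received that score, an observed 40% would be compatible with a true rate anywhere from roughly 23% to 59%; with 250 offenders the limits narrow to roughly 34% to 46%. The point estimate alone conveys none of this.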
STATIC 99 IS NOT DESIGNED TO PREDICT VIOLENT SEX
OFFENSE: An additional weakness of Static 99 specifically for purposes of
W&IC 6600 is that the method predicts any sexual offense rather than
specifically a violent sexual offense as required by and described by
W&IC 6600. On balance, therefore, the method is probably best utilized
for purposes such as assigning parolees to high and low risk groups for
supervision or other ancillary administrative decision making. Whether the
method should be used as the centerpiece for decision-making leading to
actual confinement is doubtful.
DOES THE STATIC 99 UNDERESTIMATE RECIDIVISM: Hanson
emphasizes that any estimate of reoffense potential by the Static 99 (or
presumably other statistical measures) would be an underestimate. This
would be true, according to Hanson, because not all offenses are reported
and, of those that are, not all offenses lead to arrests, readmissions or
conviction. It would seem appropriate, however, to determine whether this
method predicts at all for the particular offender studied before
emphasizing that it is an underestimate.
At the very least, if this assertion of
underestimation is made, other caveats should also appear. It should be
noted, for example, that many offenders continue offending until they are
eventually arrested. In the case of these offenders, Hanson’s speculation would
be partially correct but somewhat less relevant because these offenders
would, in fact, be processed in the legal system even though not all the
offenses they had committed would be known.
It is also known that some offenders are so caught up
in their obsessions or so generally inept that they both offend others
greatly and also make it easy to identify them. Their offenses are likely
to be reported immediately and an arrest made for the first reoffense.
There is also evidence to show that many of the
unknown offenses or offenses not cleared by law enforcement may be
perpetrated by a rather small sub-segment of the sex offender population
(Abel). Until and unless these issues are clarified and specific offender
groups identified by future research, it may be better not to speculate
that the rate for a particular offender would automatically be higher than
suggested by the Static 99 or any other specific actuarial method.
The value of the actuarial method is that it replaces
speculation with more solid evidence. The weakness is that it refers to the
general group on which it is based and only indirectly to the specific
individual under study. Such general speculation, therefore, risks
confounding what value the method may have in a specific case by replacing
or diluting solid general evidence with general speculation that may be
unwarranted in a specific case.
THE FUTURE OF STATIC 99 AND SIMILAR PREDICTION
METHODS: Steadman argues that “because most existing actuarial tools are
based on a main effects regression approach, they do not adequately reflect
the contingent nature of the clinical assessment processes” [Steadman,
Silver, et al. 2000]. That is, most of the variables related to the
prediction of criminal behavior are useful only under certain conditions.
The relationship is probabilistic and, even then, not necessarily or
universally true.
A few examples may illustrate the point. We note
clinically, and with considerable study evidence to corroborate our
observations, that most pedophiles are passive or passive aggressive in
their personality orientation. We can hardly deny, however, that some
pedophiles are aggressive in their personality orientation. The middle
group who are assertive but not really aggressive may be more similar in
their functioning to normal people.
Therefore, a personality measure of passivity on the
one hand and aggression on the other might be consistent with pedophilia in
both extreme positions but not in the middle. Also, merely because one is
pathologically passive, that person is not necessarily a pedophile. Other
things must be true also and simultaneously to create this disorder. These
other measures must be part of a comprehensive prediction system.
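The statistical consequence of a trait that matters at both extremes but not in the middle can be sketched as follows. The seven trait scores and outcome flags are invented purely for illustration (negative values standing for passivity, positive for aggression); they are not drawn from any study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: -3 = very passive, 0 = assertive, +3 = very aggressive.
trait = [-3, -2, -1, 0, 1, 2, 3]
outcome = [1, 1, 0, 0, 0, 1, 1]  # 1 = disorder present (both extremes only)

# A main-effects (additive) view sees no linear relationship at all.
linear_r = pearson_r(trait, outcome)

# Distance from the middle captures the contingent relationship.
extremity_r = pearson_r([abs(t) for t in trait], outcome)


def tree_like_rule(score: int) -> bool:
    """A one-split rule on extremity, as a classification tree might learn."""
    return abs(score) >= 2


print(f"linear r = {linear_r:.2f}, extremity r = {extremity_r:.2f}")
print("rule matches every case:",
      all(tree_like_rule(t) == bool(o) for t, o in zip(trait, outcome)))
```

In this toy example the raw trait score correlates essentially zero with the outcome, so an additive equation would discard it, while a rule that splits on extremity classifies every case correctly.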
In order to accommodate the complexity of the
prediction problem, our prediction measure must take this complexity and
synergy into account. At present, this is not the case. Measures like the
Static 99 demonstrate that prediction is possible; they must be further
refined to reflect the scope and complexity of the problem before
significant further progress can be made.
Steadman and others have proposed a classification tree
method rather than the present additive method to accomplish this. They believe
that the benefits of this method are supported by empirical data from the
MacArthur Violence Risk Assessment Study [Monahan, Steadman, et al. 2000].
Reference List
1. Firestone, P., Bradford, J. M., McCoy, M.,
Greenberg, D. M., Curry, S., & Larose, M. R. (1998). Recidivism in
convicted rapists. J Am Acad Psychiatry Law, 26(2), 185-200.
2. Monahan, J., Steadman, H. J., Appelbaum, P. S.,
Robbins, P. C., Mulvey, E. P., Silver, E., Roth, L. H., & Grisso, T.
(2000). Developing a clinically useful actuarial tool for assessing
violence risk. Br J Psychiatry, 176, 312-9.
3. Steadman, H. J., Silver, E., Monahan, J.,
Appelbaum, P. S., Robbins, P. C., Mulvey, E. P., Grisso, T., Roth, L. H.,
& Banks, S. (2000). A classification tree approach to the development
of actuarial violence risk assessment tools. Law Hum Behav, 24(1), 83-100.