Should We Believe the SFST and the Cops
critical reviewer to determine whether the officer's arrest decision was based on the SFSTs alone, or on the totality of the information available to the officer, including the results of the breath test. Thus, the studies were not controlled, and there were multiple variables that affected the ultimate decision. He concluded, therefore, that these "validation" studies were scientifically unacceptable.
Finally, Dr. Cole disputed the claims of proponents of the SFSTs that the studies regarding them have been published in peer review journals. The 1977 and 1981 field studies were published in technical reports by NHTSA, but those reports excluded the "methods and results" sections because they were thought to be too lengthy. Id. at 543. Cole concluded "[i]t is difficult to see how the NHTSA could claim that the FST is accepted in the scientific community, when results of studies on the validation of the FST have never appeared in a scientific peer reviewed journal, which is a basic requirement for acceptance by the scientific community." Id. Cole concluded:
Because of its widespread use, the FST battery has been assumed to be a reliable and valid predictor of driving impairment. NHTSA has done little to dispel that assumption. Law enforcement cannot be blamed for its use of the FST battery. Training documents refer to NHTSA reports and provide what appears to be supporting evidence for the validity of the FST battery. In addition, there is little doubt that individuals who have high BAC levels will have difficulty in performing the FST battery. However, what the law enforcement community and the courts fail to realize is that the FST battery may mislead the officer on the road to incorrectly judge individuals who are not impaired. The FST battery to be valid must discriminate accurately between the impaired and non-impaired driver. NHTSA's own research on that issue . . . has not been subjected to peer review by the scientific community. In addition, a careful reading of the reports themselves provides support for the inadequacy of the FST battery. The reports include low reliability estimates for the tests, false arrest rates between 32 and 46.5 percent, and a field test of the FST that was flawed because the officers in many cases had breathalyzer results at the time of the arrest. NHTSA clearly ignored the printed recommendations of its own researchers in conducting that field study.
Id. at 546. (Emphasis in original).
Horn also introduced the affidavit of Joel P. Wiesen, Ph.D. Dr. Wiesen is an industrial psychologist with special expertise in experimental psychology, psychometrics and statistics. His experience includes more than ten years working with the Commonwealth of Massachusetts developing civil service examinations and an equal number of years as an independent consultant in the area of test development and validation. In addition, he is a published author of a mechanical aptitude test used nationwide. Although he is most familiar with written tests, he does have experience in the development of human performance tests. Def's. Reply Memo, Exh. 6 at 1.
Dr. Wiesen reviewed the NHTSA 1977 Report, the 1981 Final Report, the 1983 Field Evaluation, the 1995 Colorado Validation Study, the undated Florida Validation Study, and the NHTSA student manual for the SFSTs. He was highly critical of these studies, as the following summary illustrates:22
22 The information reported in the chart is found in Def's. Reply Memo, Exh. 6 at 1-13.
1977 Report
1. In the lab the HGN test was administered using a chin rest which facilitated making HGN observations. This was not done in the field.
2. A single set of data was used to determine criterion score and to evaluate accuracy of test, which artificially inflates estimate of
3. Tests are not age & gender neutral, and age/gender differences can affect ability to perform SFSTs.
4. In lab tests officers were monitored to insure correct performance of tests, not done in field.
5. Test results differ in statistically significant respects depending on time of day that HGN test was performed, yet test scoring did not account for time of day difference in test
6. The study was not peer reviewed, and would not have been accepted if offered.
7. Officers selected for study not representative of police officers across the board.
8. Authors reported that in field some officers forgot or ignored standardized procedure to administer

1981 Report (Lab & Field Phases)
1. Serious flaws include 20% false positive evaluations of intox.; very high error rates in reliability if using SFSTs to predict BAC.
2. HGN test affected by time of day, no adjustment in scoring.
3. Test/retest reliability rates very low.
4. Report states testing officers did not necessarily base decisions on results of SFSTs, making
5. Authors admit field test data not appropriate for statistical significance testing, and could be biased.
6. High error rates. 28.6% of subjects with "legal" BAC arrested, and 50% of subjects w/ BAC > 0.10
7. Authors acknowledge "extreme caution" needed in analyzing data collected in study.

1983 Field Evaluation
1. Report seriously flawed, does not meet professional standards of testing community.
2. Failure to monitor data collection by officers. Cannot tell if decisions based on SFSTs or prelim. breath test (PBT).
3. Arrest decisions made on PBT results as well as SFSTs. Not possible to tell reliability of SFSTs.
4. Authors fail to report the data from N.C. test site, over 25% of data for whole test.
5. No statistical tests conducted on data.
6. SFSTs not administered in standard fashion.

1995 Colo. Study
1. Report describes results of impaired driving arrests from seven Colorado law enforcement organizations. Report too incomplete to draw any validity of the test.
2. Methodology, results and data sections of report are missing.
3. Data generated by "volunteer" officers, suggesting possible bias.
4. No monitoring of data collection to verify reporting methodology. Officers merely
5. Results unclear, particularly because two different arrest standards used (one for intoxication, another for impaired)

Florida Study (undated)
1. Report too incomplete to permit meaningful evaluation.
2. Methodology not described, and data regarding methodology not provided in report.
3. Data incompletely described.
Dr. Wiesen concluded his evaluation of the SFST reports with the following observation:
the studies give only a general indication of the level of potential validity of the tests as described in the NHTSA manual . . . . Rather than the five studies supporting each other, they evaluate somewhat different combinations of test content and test scoring. The differences are large enough to change the validity and accuracy of the tests. The older studies are probably less germane, due to the changes in test content and scoring over time. The reports for the newer studies are grossly inadequate. Given this, and in light of the specific critiques above (which are not exhaustive), I can only conclude that the field sobriety tests do not meet reasonable professional and scientific standards.
Id. at 12-13.
Harold P. Brull testified on behalf of Horn and supplied an affidavit as well. Mr. Brull is a licensed psychologist with many years' experience consulting in connection with the design and implementation of procedures to measure human attributes, especially in employment settings. He has designed and evaluated tests and procedures measuring human characteristics for over twenty years. Def's. Reply Memo, Exh. 5 at 2.
Mr. Brull reviewed the NHTSA 1977 Report, the 1981 Final Report, the 1983 Field Evaluation, the 1995 Colorado Validation Study, the Florida Validation Study, and the NHTSA officer training manual. Among his general observations of these materials was the opinion that there was a complete absence of