Should We Believe the SFST and the Cops?

. . . critical reviewer to determine whether the officer's arrest decision was based on the SFSTs alone, or on the totality of the information available to the officer, including the results of the breath test. Thus, the studies were not controlled, and there were multiple variables that affected the ultimate decision. He concluded, therefore, that these "validation" studies were scientifically unacceptable.

Finally, Dr. Cole disputed the claims of proponents of the SFSTs that the studies regarding them have been published in peer-reviewed journals. The 1977 and 1981 field studies were published in technical reports by NHTSA, but those reports excluded the "methods and results" sections because they were thought to be too lengthy. Id. at 543. Cole observed that "[i]t is difficult to see how the NHTSA could claim that the FST is accepted in the scientific community, when results of studies on the validation of the FST have never appeared in a scientific peer reviewed journal, which is a basic requirement for acceptance by the scientific community." Id. Cole concluded:

Because of its widespread use, the FST battery has been assumed to be a reliable and valid predictor of driving impairment. NHTSA has done little to dispel that assumption. Law enforcement cannot be blamed for its use of the FST battery. Training documents refer to NHTSA reports and provide what appears to be supporting evidence for the validity of the FST battery. In addition, there is little doubt that individuals who have high BAC levels will have difficulty in performing the FST battery. However, what the law enforcement community and the courts fail to realize is that the FST battery may mislead the officer on the road to incorrectly judge individuals who are not impaired. The FST battery to be valid must discriminate accurately between the impaired and non-impaired driver.

NHTSA's own research on that issue . . . has not been subjected to peer review by the scientific community. In addition, a careful reading of the reports themselves provides support for the inadequacy of the FST battery. The reports include low reliability estimates for the tests, false arrest rates between 32 and 46.5 percent, and a field test of the FST that was flawed because the officers in many cases had breathalyzer results at the time of the arrest. NHTSA clearly ignored the printed recommendations of its own researchers in conducting that field study.

Horn also introduced the affidavit of Joel P. Wiesen, Ph.D. Dr. Wiesen is an industrial psychologist with special expertise in experimental psychology, psychometrics, and statistics. His experience includes more than ten years working with the Commonwealth of Massachusetts developing civil service examinations and an equal number of years as an independent consultant in test development and validation. In addition, he is the author of a published mechanical aptitude test used nationwide. Although he is most familiar with written tests, he also has experience developing human performance tests. Def.'s Reply Memo, Exh. 6 at 1.

Dr. Wiesen reviewed the NHTSA 1977 Report, the 1981 Final Report, the 1983 Field Evaluation, the 1995 Colorado Validation Study, the undated Florida Validation Study, and the NHTSA student manual for the SFSTs. He was highly critical of these studies, as the following summary illustrates:[22]

1977 Report

1. In the lab, the HGN test was administered using a chin rest, which facilitated making HGN observations. This was not done in the field.
2. A single set of data was used both to determine the criterion score and to evaluate the accuracy of the test, which artificially inflates the estimate of [accuracy].
3. Tests are not age and gender neutral, and age/gender differences can affect ability to perform SFSTs.
4. In lab tests, officers were monitored to ensure correct performance of tests; this was not done in the field.
5. Test results differ in statistically significant respects depending on the time of day that the HGN test was performed, yet test scoring did not account for time-of-day differences.
6. The study was not peer reviewed, and would not have been accepted if offered.

1981 Report (Lab & Field Phases)

1. Serious flaws include 20% false positive evaluations of intoxication; very high error rates in reliability if using SFSTs to predict BAC.
2. HGN test affected by time of day; no adjustment in scoring.
3. Test/retest reliability rates very low.
4. Report states testing officers did not necessarily base decisions on results of SFSTs, making validity suspect.
5. Authors admit field test data not appropriate for statistical significance testing, and could be biased.
6. High error rates: 28.6% of subjects with "legal" BAC arrested, and 50% of subjects with BAC > 0.10 not arrested.
7. Officers selected for study not representative of police officers across the board. Accuracy of data suspect.
8. Authors reported that in the field some officers forgot or ignored the standardized procedure to administer [the tests].

1983 Report

1. Report seriously flawed; does not meet professional standards of testing community.
2. Failure to monitor data collection by officers. Cannot tell if decisions were based on SFSTs or preliminary breath test (PBT).
3. Arrest decisions made on PBT results as well as SFSTs. Not possible to tell reliability of SFSTs.
4. Authors fail to report the data from the N.C. test site, over 25% of the data for the whole test.
5. No statistical tests conducted on data.
6. SFSTs not administered in standard fashion.
7. Authors acknowledge "extreme caution" needed in analyzing data collected in study.

1995 Colo. Study

1. Report describes results of impaired driving arrests from seven Colorado law enforcement organizations. Report too incomplete to draw any [conclusions] about the validity of the test.
2. Methodology, results, and data sections of report are missing.
3. Data generated by "volunteer" officers, suggesting possible bias.
4. No monitoring of data collection to verify reporting methodology. Officers merely reported results.
5. Results unclear, particularly because two different arrest standards used (one for intoxication, another for impairment).

Fla. Study

1. Report too incomplete to permit meaningful evaluation.
2. Methodology not described, and data regarding methodology not provided in report.
3. Data incompletely described.

Dr. Wiesen concluded his evaluation of the SFST reports with the following observation:

[T]he studies give only a general indication of the level of potential validity of the tests as described in the NHTSA manual . . . . Rather than the five studies supporting each other, they evaluate somewhat different combinations of test content and test scoring. The differences are large enough to change the validity and accuracy of the tests. The older studies are probably less germane, due to the changes in test content and scoring over time. The reports for the newer studies are grossly inadequate. Given this, and in light of the specific critiques above (which are not exhaustive), I can only conclude that the field sobriety tests do not meet reasonable professional and scientific standards.

Harold P. Brull testified on behalf of Horn and also supplied an affidavit. Mr. Brull is a licensed psychologist with many years of experience consulting on the design and implementation of procedures to measure human attributes, especially in employment settings. He has designed and evaluated tests and procedures measuring human characteristics for over twenty years. Def.'s Reply Memo, Exh. 5 at 2.

Mr. Brull reviewed the NHTSA 1977 Report, the 1981 Final Report, the 1983 Field Evaluation, the 1995 Colorado Validation Study, the Florida Validation Study, and the NHTSA officer training manual. Among his general observations of these materials was the opinion that there was a complete absence of


