Evaluating SFSTs Under the Horn and Reed Decisions

These studies are significant because state courts have repeatedly cited them, in opinions on the admissibility of SFSTs, when assessing the reliability of the SFSTs and their general acceptance within the law enforcement and traffic safety communities. They are also important in this case because they have been the subject of critical analysis by Horn's experts, who provided detailed testimony regarding the limitations of these studies and the extent to which the SFSTs are reliable and valid tests for driver intoxication or alcohol impairment.15

15 This underscores an important point. When analyzing the many state decisions regarding the admissibility of SFST evidence, care must be taken to focus on the factual basis supporting the rulings made. In many instances, the primary evidence that the court had before it regarding the reliability of SFSTs was Dr. Burns' testimony and the above-described NHTSA, Colorado and Florida studies, as well as testimony from law enforcement officers with a vested interest in the use of the SFSTs. In most, but not all, instances, the defendants in the state cases simply did not mount a challenge to the "science" underlying the SFSTs. This is not the case here, where Horn has provided a spirited and detailed attack on the tests' reliability. This highlights an inherent limitation in the process of judicial evaluation of the reliability and validity of any scientific or technical evidence: the court must, under Rule 104(a), act as the "gatekeeper" to decide whether the evidence is reliable and admissible. The court, however, is limited in its ability to do so by the quantity and quality of the evidence produced by the parties, whatever research the court itself may do, and any help it may derive from courts that have addressed the issue before it. This process unavoidably takes place on a continuum, and a court faced with the present task of deciding the admissibility of scientific evidence must exercise care to consider whether new developments or evidence require a reevaluation of the conclusions previously reached by courts that did not have the benefit of the more recent information. In short, neither science nor technology may rest on past accomplishments, and neither may the courts.

The three SFSTs developed by the research sponsored by NHTSA are summarized in the NHTSA student manual. (Gov't. Opposition Memo., Ex. 2). The manual describes the tests and evaluations conducted to develop the SFSTs, then provides detailed instructions on how to administer and score each of the three tests.

The most "scientific" or "technical" of the three is the Horizontal Gaze Nystagmus Test ("HGN Test"). Nystagmus is "the involuntary jerking of the eyes, occurring as the eyes gaze toward the side. Also, nystagmus is a natural, normal phenomenon. Alcohol and certain other drugs do not cause this phenomenon, they merely exaggerate it or magnify it." Id. at VIII-12. Horizontal gaze nystagmus "occurs as the eyes move to the side." Id. at VIII-13. The HGN SFST requires the investigating officer to look for three "clues": (1) the inability of the suspect to follow a slowly moving stimulus smoothly with his or her eyes, (2) the presence of "distinct" nystagmus when the suspect has moved his or her eyes as far to the left or right as possible (referred to as holding the eyes at "maximum deviation") and held them in this position for approximately four seconds, and (3) the presence of nystagmus before the eyes have moved 45 degrees to the left or right (which, the manual states, usually means that the subject has a BAC above 0.10). Id. at VIII-14-15. The officer is trained to look for each of the above three "clues" for each of the suspect's eyes, meaning there are six possible "clues." If the officer observes four or more clues, the manual asserts that "it is likely that the suspect's BAC is above 0.10 [and] [u]sing this criterion [one] will be able to classify correctly about 77% of [one's] suspects with respect to whether they are above 0.10." Id. at VIII-17. If the results of the HGN test are offered to establish that the suspect's BAC is above 0.10,16 it is readily apparent that much depends on the investigating officer properly performing the HGN test procedures and on his or her subjective evaluation of the presence of the "standardized clues." Indeed, the manual itself cautions with respect to each of the SFSTs:

[the tests are valid] only when . . . administered in the prescribed, standardized manner; and only when the standardized clues are used to assess the suspect's performance; and, only when the standardized criteria are employed to interpret that performance. If any one of the standardized field sobriety test elements is changed, the validity is compromised.

Id. at VIII-12 (emphasis in original).

16 At the time of Horn's arrest, Maryland law stated that, "if at the time of [taking the breathalyzer test], a person has an alcohol concentration of at least .07 but less than .10" such results would be "prima facie evidence that the defendant was driving with alcohol in the defendant's blood." Md. Code Ann., Cts. & Jud. Proc. § 10-307 (1998 Repl. Vol.). Effective September 30, 2001, a blood alcohol concentration between 0.07 and 0.08 will be prima facie evidence that the person was driving while impaired by alcohol. If the person's BAC is .08 or higher, the defendant shall be considered under the influence of alcohol per se. Md. Code Ann., Cts. & Jud. Proc. § 10-307(d), (g) (2001 Supp.).
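The HGN counting rule the manual describes (three clues checked in each eye, six in all, with four or more treated as indicating a BAC above 0.10) reduces to a simple threshold count. The Python sketch below is illustrative only: the class, field and function names are hypothetical, and the code takes the officer's observations as given, so it models none of the subjective judgment about whether each clue is actually present.

```python
from dataclasses import dataclass


# Hypothetical record of the three HGN clues for one eye, as judged by the officer.
@dataclass
class EyeObservation:
    lack_of_smooth_pursuit: bool
    distinct_nystagmus_at_max_deviation: bool
    onset_before_45_degrees: bool

    def clue_count(self) -> int:
        # Each True field counts as one "clue"; summing bools yields an int.
        return sum([
            self.lack_of_smooth_pursuit,
            self.distinct_nystagmus_at_max_deviation,
            self.onset_before_45_degrees,
        ])


# Four or more of the six possible clues, per the manual as quoted in the text.
HGN_THRESHOLD = 4


def hgn_classify(left: EyeObservation, right: EyeObservation) -> bool:
    """Return True if the manual's criterion would classify the BAC as above 0.10."""
    total = left.clue_count() + right.clue_count()
    return total >= HGN_THRESHOLD
```

For example, two clues in the left eye and one in the right total three, which falls short of the four-clue criterion, so the function returns False.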

The Walk and Turn ("WAT") test requires the suspect to place his feet in the heel-to-toe stance on a straight line. The subject then is instructed to place his right foot on the line ahead of the left foot, with the heel of the right foot against the toe of the left. The suspect also is told to keep his arms down at his side and to maintain this position until the officer instructs him to begin the test. Id. at VIII-18. Once told to start, the suspect is to take nine heel-to-toe steps down the line, turn around in a prescribed manner, and take nine heel-to-toe steps back up the line. Id. While walking, the suspect is to keep his hands at his side, watch his feet, and count his steps out loud. Id. at VIII-19. The suspect also is told that, once he has started the test, he is not to stop until it is completed.


As with the HGN test, the Manual asserts that there are standardized clues, eight in all,17 that "[r]esearch . . . has demonstrated . . . are the most likely to be observed in someone with a BAC above 0.10." Id. at VIII-19. Further, it states: "[i]f the suspect exhibits two or more distinct clues on this test or fails to complete it, classify the suspect's BAC as above 0.10. Using this criterion, you will be able to correctly classify about 68% of your suspects." Id. at VIII-21. Once again, it is the officer's subjective evaluation of the suspect that determines whether a "clue" is present, and if only two of the eight "standardized clues" are detected, NHTSA directs that the suspect's BAC be classified as above 0.10.

17 The eight clues are the inability to keep balance while listening to instructions, starting the test before the instructions are finished, stopping to steady oneself, failure to touch heel-to-toe, stepping off the line, using arms for balance, improper turning, and taking an incorrect number of steps. Id. at VIII-20.

The third SFST is the One Leg Stand ("OLS") test. In this test the suspect is told to stand with her feet together, arms at her sides, and not to start the test until told to do so. To perform the OLS test, the suspect must raise whichever leg she chooses approximately six inches from the ground, toes pointed out. Id. at VIII-23. While holding this position, the suspect must count out loud for thirty seconds by saying "one-one thousand, two-one thousand," etc. Id. The NHTSA manual identifies four "standardized clues" for the OLS test18 and instructs law enforcement officers that "[i]f an individual shows two or more clues or fails to complete the [test]. . . there is a good chance the BAC is above 0.10. Using that criterion, [one] will correctly classify about 65% of the people [one] test[s] as to whether their BACs are above or below 0.10." Id. at VIII-24.
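The WAT and OLS criteria quoted above share one form: a fixed list of standardized clues, a threshold of two, and classification above 0.10 whenever the suspect fails to complete the test. A minimal Python sketch under that reading follows; the clue labels are paraphrased from footnotes 17 and 18, and the constant and function names are hypothetical rather than anything the manual defines.

```python
from __future__ import annotations

# Clue labels paraphrased from the manual's lists as summarized in the text.
WAT_CLUES = frozenset({
    "cannot keep balance during instructions",
    "starts too soon",
    "stops to steady self",
    "misses heel-to-toe",
    "steps off line",
    "uses arms for balance",
    "improper turn",
    "incorrect number of steps",
})
OLS_CLUES = frozenset({"sways", "uses arms for balance", "hops", "puts foot down"})


def classify_above_010(observed: set[str], clue_set: frozenset[str],
                       completed: bool, threshold: int = 2) -> bool:
    """Return True if the manual's criterion classifies the BAC as above 0.10."""
    unknown = observed - clue_set
    if unknown:
        # Only the standardized clues may be counted.
        raise ValueError(f"not a standardized clue: {sorted(unknown)}")
    # Failing to complete the test triggers the classification by itself;
    # otherwise two or more observed clues are enough.
    return (not completed) or len(observed) >= threshold
```

Under this rule any two of the eight WAT clues, or any two of the four OLS clues, suffice; for instance, a suspect who sways and hops on the OLS test is classified as above 0.10.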

The NHTSA Manual advises that when the WAT and HGN tests are combined, using a decision matrix developed for NHTSA, an officer can "achieve 80% accuracy" in differentiating suspects with BACs in excess of 0.10. Id. at VIII-5. These conclusions are supported, it is claimed, by the results of research and testing done by Dr. Burns and her company that was reported in the 1981 Final Report, the 1983 Field Evaluation, the 1995 Colorado Validation Study and the Florida Validation Study.19 Id. at Exs. 4-8.

18 The four clues are swaying while balancing, using arms for balance, hopping, and putting a foot down. Id. at VIII-24.

19 The Florida Validation Study is undated. During the Rule 104(a) hearing, there was testimony from Spurgeon Cole, Ph.D., one of Horn's witnesses, that a third validation study had been done in San Diego, but it was not offered as an exhibit. Dr. Cole did testify, however, as to its conclusions and the defects in its design.

As next will be seen, Horn's experts have challenged the reliability, validity and relevance of the SFSTs to prove driver intoxication and are sharply critical of the claims of accuracy advanced in the NHTSA publications and the so-called validation studies. They have framed these objections in terms of the factors discussed in the Daubert/Kumho Tire decisions, as amplified by this Court in Samuel v. Ford Motor Co., 96 F. Supp. 2d 491 (D. Md. 2000).

3. Horn's Challenges to the Reliability/Validity of SFST Evidence

Rule 702 prohibits expert testimony if it is not the product of reliable methods or principles that have been reliably applied to the facts of the particular case. In the context of scientific or technical testing, such as may be the case with SFSTs, reliability means the ability of a test to be duplicated, producing the same or substantially the same results when successively performed under the same conditions. Daubert, 509 U.S. at 595; Samuel, 96 F. Supp. 2d at 494. Thus, if the SFSTs are reliable, it would be expected that different officers, viewing the same suspect performing the SFSTs, would reach the same conclusion regarding the level of the suspect's impairment or intoxication. Likewise, the same officer retesting the same suspect at the same BAC as when first tested would reach the same conclusion.
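Reliability in this sense is commonly quantified as a correlation between two sets of scores for the same subjects, whether from two officers (inter-rater reliability) or from two sessions (test-retest reliability). The Python sketch below computes a Pearson correlation coefficient; it assumes, without the source saying so, that coefficients of the kind reported for the SFSTs are of this type, and the score lists are invented placeholders, not study data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between paired scores for the same subjects."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two entries")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance numerator and the two spread terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("scores must vary to define a correlation")
    return cov / (sx * sy)


# Placeholder clue counts two hypothetical officers might record for six subjects.
officer_a = [0, 2, 4, 5, 1, 6]
officer_b = [1, 2, 3, 6, 0, 5]
print(round(pearson_r(officer_a, officer_b), 2))
```

A coefficient of 1.0 would mean the paired scores agree perfectly along a straight line; lower values reflect more disagreement between the two sets of scores.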


A related, though distinct, concept is the validity of a test. A test is valid if it has a logical nexus with the issue to be determined in a case. Daubert, 509 U.S. at 591; Samuel, 96 F. Supp. 2d at 494. In the context of SFSTs, the tests are valid if there is a logical nexus between what they measure and the true ability of a driver safely to operate a motor vehicle. Thus, for example, does the fact that a suspect exhibited two "clues" on the WAT test mean that the driver cannot safely drive a car, or does it simply mean that the driver has some inability to perform the test that is unrelated to his or her ability to drive? Horn has challenged both the reliability and validity of the SFSTs.

During the Rule 104(a) proceedings, Horn produced four experts, three of whom submitted affidavits and two of whom also testified: Yale Caplan, Ph.D. (former chief toxicologist for the State of Maryland and former scientific director of the Maryland Alcohol Testing Program); Spurgeon Cole, Ph.D. (Professor of Psychology, Clemson University, and author of a series of articles critical of the SFSTs); Harold P. Brull (a licensed psychologist and consultant specializing in industrial/organizational psychology, particularly the definition and measurement of human attributes in employment and related settings); and Joel Wiesen, Ph.D. (an industrial psychologist with special expertise in experimental psychology, psychometrics and statistics, who worked for more than ten years for the Massachusetts Division of Personnel Administration developing and validating civil service examinations, and who is an independent consultant in the development and validation of human performance tests).

In his testimony and published writings, Dr. Cole was highly critical of the reliability of the SFSTs if used to prove the precise level of a suspect's alcohol intoxication or impairment. His 1994 article "Field Sobriety Tests: Are They Designed for Failure?," published in the journal Perceptual and Motor Skills, analyzed the 1977 Report, the 1981 Final Report, and the 1983 Field Evaluation report published by NHTSA regarding the SFSTs. (Def's. Memo, Ex. C.)

Dr. Cole observed the following:

(1) 47% of the subjects tested in the 1977 NHTSA laboratory study who would have been arrested by the testing officers for driving while intoxicated (BAC of 0.10 or greater) actually had BACs below 0.10;

(2) in the 1981 Final Report, 32% of the participants in the lab study were incorrectly judged by the testing officers as having BACs of 0.10 or greater; and

(3) the accepted reliability coefficient for standardized clinical tests is .85 or higher, yet the reliability coefficients for the SFSTs, as reported in the NHTSA studies, ranged from .61 to .72 for the individual tests and .77 for individuals who were tested on two different occasions while dosed to the exact same BAC. More alarmingly, inter-rater reliability rates (where different officers score each subject) ranged from .34 to .60, with an overall rate of .57.

Id. at 100.

Dr. Cole theorized that the SFSTs, particularly the WAT and OLS tests, required subjects to perform unfamiliar, unpracticed motions, and noted that a very few miscues result in a conclusion that the subject failed and had a BAC in excess of 0.10. Id. His hypothesis was that individuals could be classified as intoxicated or impaired because of unfamiliarity with the test, rather than because of their actual BAC. Id. He tested this hypothesis by videotaping twenty-one completely sober individuals performing either "normal-abilities tests" (such as reciting their addresses or phone numbers or walking in a normal manner) or the WAT and OLS tests. Id. at 99-102. The study found that 46% of the officers who viewed the videotape of the sober individuals performing the SFSTs rated the subjects as having had too much to drink, compared with only 15% reaching this conclusion after seeing the videotape of the subjects performing the normal-abilities tests. Id. at 102. Dr. Cole concluded:

[The SFSTs] must be held to the same standards the scientific community would expect of any reliable and valid test of behavior. This study brings the validity of field sobriety tests into question. If law enforcement officials and the courts wish to continue to use field sobriety tests as evidence of driving impairment, then further study needs to be conducted addressing the direct relationship of performance on these and other tests with driving. To date, research has concentrated on the relationship between test performance and BAC and officers' perception of impairment. This study indicates that these perceptions may be faulty.

Id. at 103.

During his testimony at the Rule 104(a) hearing, Dr. Cole repeated his criticism of the reliability of the 1977, 1981 and 1983 studies, but also testified about the Colorado, Florida and San Diego studies performed by Dr. Burns, styled as "field validation studies." This testimony echoed the written criticisms, concerning both the SFSTs' reliability as precise predictors of the level of alcohol intoxication and their validity as a measure of driver impairment, that Dr. Cole set out in his 1994 article, co-authored with Ronald H. Nowaczyk, titled "Separating Myth from Fact: A Review of Research on the Field Sobriety Tests" and published in the Champion journal of the South Carolina Bar Association. (Def's. Reply Memo, Exh. 1.)

Dr. Cole's primary criticisms, as discussed in that article, include, first, that the 1981 Final Report published by NHTSA claims an 80% accuracy rate for users of the SFSTs. This claim is misleading, he argued, because when the actual data are examined with respect to the success rate of using the SFSTs to differentiate between drivers with BACs above 0.10 and those below it (the critical population), the officers had "a 50/50 chance of being correct just on the basis of guessing." Id. at 539.

Second, the SFSTs have a combined test-retest reliability rate of .77, while the scientific community "expects reliability coefficients to be in the upper .80s or .90 for a test to be scientifically reliable." Id. at 540. When different officers tested the same subjects at the same BAC dose level on different days, the reliability was only .59, which Dr. Cole characterized as a 41% error rate. Dr. Cole contrasted these substandard reliability coefficients with that of the BAC machine, which is .96, or 96% reliable. Id. at 540-41.

Third, Dr. Cole argued that in order for the SFSTs to be valid predictors of BAC they must "not only identify individuals above a BAC level of 0.10 as 'failing', but also identify individuals below .10 as 'passing'." Id. at 541. The data from the NHTSA 1977 Report, however, show that the validity coefficients of the HGN, OLS and WAT SFSTs were ".67, .48, and .55, respectively, with a combined validity coefficient of .67." Id. This means that use of the SFSTs results in an unacceptably high erroneous arrest rate if the officer uses the tests to make arrest decisions based on BAC levels being in excess of .10.
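Dr. Cole's requirement that a valid test both flag those above 0.10 and clear those below it corresponds to the standard notions of sensitivity and specificity. The Python sketch below computes those two measures, overall accuracy, and the share of "fail" results that are wrong (a rough proxy for erroneous arrests) from a two-by-two table. It is an illustration only: the class name is hypothetical and the counts are invented, not drawn from any NHTSA study.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    true_pos: int   # failed the SFSTs and BAC above 0.10
    false_pos: int  # failed the SFSTs but BAC at or below 0.10
    false_neg: int  # passed the SFSTs but BAC above 0.10
    true_neg: int   # passed the SFSTs and BAC at or below 0.10

    def sensitivity(self) -> float:
        # Share of drivers above 0.10 whom the tests flag as "failing".
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of drivers at or below 0.10 whom the tests clear as "passing".
        return self.true_neg / (self.true_neg + self.false_pos)

    def accuracy(self) -> float:
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return (self.true_pos + self.true_neg) / total

    def false_arrest_share(self) -> float:
        # Fraction of "fail" classifications that were wrong.
        return self.false_pos / (self.true_pos + self.false_pos)


# Invented counts for a sample dominated by drivers below 0.10.
sample = Outcomes(true_pos=10, false_pos=10, false_neg=0, true_neg=80)
```

With these made-up counts, overall accuracy is 0.90 even though half of the "fail" classifications are wrong, which is why an overall accuracy figure alone does not show how often a test misclassifies drivers below 0.10.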

Fourth, Dr. Cole was particularly critical of claims that the NHTSA SFSTs have been "validated" in a "field setting." In this regard, he stated that the 1977 and 1981 NHTSA studies were done in a laboratory setting, and that conditions in a controlled laboratory are dramatically dissimilar from the field conditions that can be expected when officers employ SFSTs at all times of day and night, in widely disparate weather and traffic conditions, and where issues of officer safety may influence how the test is performed.20 Id. at 542. Dr. Cole stated that the NHTSA 1983 Field Evaluation purported to be a field validation study, but it failed to meet the recommendations of the authors of the NHTSA 1981 Final Report that the SFSTs be validated in the field for eighteen months in locations across the country. Id. Dr. Cole also stated that Dr. Burns herself has testified that the SFSTs have not been adequately field tested.21 Id.

20 This criticism is especially significant in light of the third evaluative factor in Rule 702, which requires that the expert's opinion testimony be based on principles and methods that are themselves reliable and that also have been reliably applied to the facts of the particular case. Thus, even if the SFSTs were determined to be reliable measures of driver intoxication, an officer's testimony about their use in a particular case could not be admitted absent a showing that the officer had properly administered the tests.

21 During his testimony, Dr. Cole stated that the Colorado, Florida and San Diego "validation" studies performed by Dr. Burns with various sheriff's departments do not cure the defects contained in the original reports. The three studies involved officers who stopped drivers who were driving unsafely and evaluated them using the SFSTs, but who in many instances also had the benefit of preliminary breath analysis tests, and the studies do not permit a


