The Last Word On Nothing | What beer and running taught me about science (part 1 of 2)


I love to run. I also love to drink beer. Sometimes I run, then drink beer. My friends and I do this pretty regularly, and at some point I began to wonder if it was as harmless as we thought.

Though I make my living as a writer now, I’m still a scientist at heart, and I decided this called for an experiment. I talked Runner’s World into commissioning a study (read my story about the project in this month’s issue), and enlisted the help of my friend, Gig Leadbetter. Gig’s an exercise physiologist at my local college and he’s also a runner, college cross-country coach and home brewer.

We searched the literature, and found some research suggesting that drinking alcohol shortly after hard exercise might impair the body’s ability to replenish glycogen stores. This became the basis of our hypothesis, which we then set out to test. Our study took place over three days, and I was one of the guinea pigs.

THE PROTOCOL
Day 1:
5pm: Beer Run #1
Volunteers run on a treadmill for 45 minutes at 75% of their VO2 max, an intensity hard enough to tap into glycogen stores.

5:45pm: Run ends. Beer drinking begins.
Runners are served beer and pasta. Half get a serving of alcoholic beer (Fat Tire Amber Ale) calculated to bring their blood alcohol level to just under 0.08% — the legal limit for “driving under the influence” in Colorado. (My Runner’s World story explains why the cops showed up to the lab.) The other half are unknowingly served an equivalent volume of a non-alcoholic beer (O’Doul’s Amber). The two beers look nearly identical in our plastic serving cups, and only Gig’s student research assistant, Mike Cramer, knows who’s getting the O’Doul’s.

Day 2:
8am: Run to Exhaustion (RTE)
Volunteers run on a treadmill at 80% of their volitional max (the speed they were running when they hit their VO2 max) until they are too exhausted to continue. Every 3 minutes, runners rate how hard they’re working on an RPE (rating of perceived exertion) scale. Gig and his team also measure heart rate, oxygen use, and respiratory exchange ratios.

8:15 to 9:15 am: Breakfast
Immediately after the RTE, runners refuel with bagels, cream cheese, fruit, and orange juice.

5pm: Beer Run #2
Runners return to the lab for another 45-minute run at 75% of VO2 max, just like the first one.

5:45pm: Beer Run ends, and beer drinking ensues.
Runners are served pasta and whichever beer they weren’t served the first time.

Day 3:
8am: Repeat the RTE using same protocol as previous day.
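
For readers who think in code, the crossover schedule above can be sketched in a few lines of Python. This is only an illustration, not the procedure the lab used: the protocol does not say how runners were split between the two beers on Day 1, so the random shuffle, the function name assign_crossover, and the runner labels are all assumptions.

```python
import random


def assign_crossover(runners, seed=None):
    """Give each runner both beers, in opposite orders for each half.

    Hypothetical helper: the random split is an assumption, since the
    protocol only says half the runners got each beer on Day 1.
    """
    rng = random.Random(seed)
    shuffled = list(runners)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    schedule = {}
    for position, runner in enumerate(shuffled):
        # First half drinks the alcoholic beer on Day 1, the rest on Day 2.
        if position < half:
            schedule[runner] = ("alcoholic", "non-alcoholic")
        else:
            schedule[runner] = ("non-alcoholic", "alcoholic")
    return schedule


if __name__ == "__main__":
    # Placeholder labels, not the real participants.
    for runner, beers in sorted(assign_crossover(["A", "B", "C", "D"], seed=0).items()):
        print(runner, "Day 1:", beers[0], "| Day 2:", beers[1])
```

Because every runner gets both beers, each person serves as their own control, which is the point of serving "whichever beer they weren’t served the first time."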

RESULTS
The Runner’s World story explains exactly how the study went down, and provides a summary of the things we measured, our results and how they fit with our hypothesis.

I’ll just say here that at first, the results seemed really exciting. We had five women and five men in our study, and performance on the RTEs after the alcohol was significantly different between the sexes. Averaged together, the women did 22 percent better on the RTE after they’d drunk the alcoholic beer. Men, on the other hand, did 21 percent worse.

As a female beer aficionado, I’d love to tell you that our study proved that women perform better with beer and men don’t. Here’s to ladies-only beer tents! But the honest truth is, I’m not completely convinced. It’s not just that my legs felt strangely tingly and sore the night after I’d downed the post-run beer, or that I suspect fatigue may explain why I pooped out sooner on my RTE the morning after the Fat Tire. It’s that being a participant in this experiment taught me some important lessons about the limits of studies, especially small ones like this.

But first, a disclaimer. This was a pilot study. We never intended it to provide the definitive word on alcohol and running. The issues I’m about to discuss are not criticisms of Gig or anyone involved in the study. These are problems that hinder many, if not most studies, and I think that both scientists and journalists should keep them in mind when we assess new research.

Lesson 1: Even accepted protocols warrant scrutiny.
Our hypothesis was that drinking alcohol within 30 minutes of a hard run would impair glycogen recovery in runners. Gig selected the run to exhaustion as our measure of recovery, because that’s the test that other researchers have used. It’s an accepted test. But what I learned as a participant is that this test is a lousy way to measure recovery. It’s extremely susceptible to confounding factors.

Larry Brede’s result provides a perfect example. He ran 10 minutes and 46 seconds longer on the RTE after the non-alcoholic beer than he did the morning after the Fat Tire. After the results were tabulated, I asked him if he realized that he’d performed worse after the real beer. “Yeah,” he says. “I probably could have gone a little longer that time, but I had my daughter with me and I wanted to get done so we could go home.” He says he did feel slightly more tired on the morning after he’d drunk the alcoholic beers, but he’s not so sure it was due to the alcohol. He’d had a really long day of work that Friday, with no time to eat, and he’d had to get up early Saturday morning for our testing.

For me, the RTE seemed more like a test of mental tenacity than a measure of muscle recovery. You’re running at 80% of your max. It’s hard, but it’s a step or two down from all-out. Your legs get a little heavier and your will to continue wanes, but you can’t quite reach that totally spent feeling you get after a finish line sprint. It becomes a mental game. “I kept asking myself—am I truly exhausted, or just sick of this?” Cynthia Malleck said afterwards. We agreed that mostly we were just uncomfortable and bored. I wanted to just turn up the treadmill so I could burn the rest of my energy in one big spurt; instead, I was forced to let it trickle out, one step at a time. It was a slow, gentle torture.

The thing is, if I’d been writing about this study without having participated in it, I’m not sure I’d have given the test any thought. Gig is an accomplished researcher, and he hadn’t anticipated any problems with it either. After all, it’s a standard test. This makes me wonder about problems that might exist with other well-accepted protocols I’ve written about.

Lesson 2: Participant expectations matter.
Before the study, Gig held an orientation meeting to explain the protocol and get our written consent. During the meeting, he said something that stuck in my mind, and I later learned that it had stuck in other participants’ minds too.

He told us that the run to exhaustion wouldn’t last as long as the beer run. Most people last about 20 minutes or so on the run to exhaustion, he said. Immediately, we all decided that this was the time we needed to beat. He’d essentially given us permission to stop after 20 minutes. If he had told me to expect to last 40 minutes, I’m pretty sure I would have forced myself to gut it out at least that long.

Only three of our ten participants lasted less than 20 minutes on any of their RTEs, and these were the three that joined the study after that first orientation meeting. It’s quite possible that Gig’s instructions for the study induced a version of the Hawthorne effect, which happens when study participants alter or improve their behavior because they know they’re being monitored.

Lesson 3: It’s not just the protocol, but how it’s carried out.
All of us participants were competitive runners, and we were instructed to try to last as long as possible on the RTE tests. Once we set a time on the first test, most of us were motivated to at least match it on the second. At some point during the testing, Gig caught on to this, and he began covering up the timer on our treadmills. But by then, it was too late, at least for me. In my first trial, I could see the timer on my treadmill, so I knew when I’d surpassed my coveted 20-minute mark. On the second trial, the researchers had obscured the timer on my treadmill, but I have good eyesight and was able to eyeball my time on the clock across the lab.

Would I have lasted as long on my two trials without this visual feedback about my time? I don’t know, but I’m certain that having this feedback swayed my trials. This realization left me wondering how many other little details influence a trial’s outcome.

Lesson 4: Averages can obscure the real picture.
Our averages told a compelling story, but the raw data weren’t so clear cut. The individual numbers were all over the place. One female participant ran 74 percent longer after drinking beer, while another ran only 16 percent longer. On the men’s side, one guy ran 32 percent longer after his O’Doul’s, while another actually ran a sliver longer after the alcoholic beer. Viewed like this, I wondered: are we really seeing a pattern here, or just forcing a line through our data?

In search of some perspective, I called Rebecca Goldin, director of research at the Statistical Assessment Service at George Mason University. The problem with small studies like ours, she says, is that you can get small, random effects turning up that have nothing to do with the thing your study set out to test.

Here’s how she explains it. Imagine you have 10 coins: five blue and five red. You flip them all and half of them come up heads. Suppose you look at the reds and find that they had more heads than expected, and you see that the blues had more tails than expected. How do you tell if you have a good study saying that blue coins are different than red, or whether you just have 10 coins that landed randomly? You can’t, Goldin says. Statistical tests like the ANOVAs we did can estimate how likely it is that the result we found was due to chance, but to get a more conclusive answer, we need more studies.
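
Goldin’s coin example is easy to check by brute force. The Python sketch below is an illustration rather than anything from our analysis: it assumes every coin is fair, ignores the detail that exactly half came up heads, and uses a made-up function name (chance_of_gap) and a made-up threshold (a gap of at least two heads between the colors). It simply enumerates all 1,024 ways that 10 coins can land and counts how often red and blue look "different."

```python
from fractions import Fraction
from itertools import product


def chance_of_gap(n_red=5, n_blue=5, min_gap=2):
    """Exact chance that two groups of fair coins differ by >= min_gap heads.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    hits = 0
    total = 0
    # Enumerate every possible outcome: 0 = tails, 1 = heads.
    for flips in product((0, 1), repeat=n_red + n_blue):
        red_heads = sum(flips[:n_red])
        blue_heads = sum(flips[n_red:])
        total += 1
        if abs(red_heads - blue_heads) >= min_gap:
            hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    p = chance_of_gap()
    print(p, float(p))  # 11/32 0.34375
```

Under these assumptions, about one set of flips in three shows a two-head gap between colors even though the coins are identical, which is why a split between five women and five men is hard to interpret on its own.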

Lesson 5: Beware of enthusiasm bias.
When Gig first told me that the women in our study ran better after drinking beer, I was ecstatic. We’d worked hard to design a rigorous study, and I believed in our science. I wanted the study to turn up an interesting result. It’s human nature to want your work to succeed, and I’ll admit, I hoped that our study would prove that running and beer were a good mix. Which is a long way of saying that I was primed to believe in (and overstate) our result. My enthusiasm for our study created a credulous spirit that, left unchecked, could have easily overridden any doubts.

Which is why it was so important for us to check in with other experts. Gig consulted with Bob Pettitt, an exercise physiologist at Minnesota State University, Mankato, for help analyzing the statistics, and I ran our results by Goldin for a reality check on how to interpret them. These outsiders didn’t have any skin in the game, and they offered a more cautious take on our results.

WHAT’S NEXT
Gig has a larger follow-up study in the works. “As a scientist, I’m skeptical,” he says of the initial results. “We could have a very unrepresentative sample.” If the second study confirms that beer improves performance in women, then he’ll get truly excited.

The next study won’t just be larger, it will also have some changes to the protocol. While three of the women ran longer on their second trial, suggesting that fatigue wasn’t a major factor in the results, Gig is planning to space the two beer trials several days apart next time just to be sure. “Marty’s result caused me some concern,” says Gig. “His first run to exhaustion lasted so dang long (82 minutes) that I think he may have been too tired for the second trial.”

None of this means that our first study is worthless. This is simply the way science works. You formulate a hypothesis, then you set out to test it. Based on these preliminary results, you refine your protocols and your hypothesis and then you test it some more. My question was simple: Does drinking beer right after a run hurt my recovery? Our study provided some initial hints at an answer but the last word on nothing.

Tomorrow: Part 2—The alcohol experiment gets personal.

***

Photos by JT Thomas.

5 thoughts on “What beer and running taught me about science (part 1 of 2)”

Cameron says:

February 1, 2012 at 12:02 pm

This is fascinating! I can’t wait for the next installment. But is it really not possible to tell the difference between Fat Tire and O’Doul’s?

Christie Aschwanden says:

February 1, 2012 at 3:01 pm

Oh, yes, definitely possible! Contrary to what it says in the edited RW story, I actually did figure out which beer I was getting each time. So did Larry–he even guessed the brand and flavor of our NA beer.

I taste tested a bunch of NA beers before we did the study, and they’re all pretty awful. O’Doul’s was the best and looked the most like the Fat Tire.

Irene says:

February 1, 2012 at 9:34 pm

Interesting post–a great way to illustrate the statistical lessons. I’ve often wondered why so many charity runs, even those that start at 7 am, end with a visit to the beer tent. Now I know there just might be a medical basis ;). Too bad it’s Miller Lite and not Fat Tire that sponsors our runs.