In March, I had reason to worry about this. Just after Nature published a story that I wrote about a massive cancer genetics project, I received an email from my editor:
“Should we be worried about our cancer story?” read the subject line of his email.
Our story had covered findings from the Collaborative Oncological Gene-Environment Study (COGS), a massive examination of the genetic underpinnings of breast, prostate and ovarian cancer. The project did find dozens of new genetic risk factors for these diseases, but not enough: we still don’t know enough genetic markers to be able to predict who among us will develop these cancers.
I was a bit surprised, then, to learn that Britain’s National Health Service was planning to start offering a genetic test for cancer risk based on the study’s results. Or at least that’s how it seemed from other news coverage of the study, which my editor noted in his concerned email. He pointed me to stories that dubbed the study “the single biggest leap forward” that the field of cancer genetics had seen, and predicted that patients could soon take a £5-spit test for cancer risk at their local doctor’s office.
Our story, by comparison, seemed somewhat glum; I quoted the coordinator of the study, for instance, as saying that it was too early to use the data in such tests.
It was another episode in the continuing saga of reporting Big Data. How, I wondered, could I and my colleagues at other publications have produced such inconsistent stories?
Part of the answer came down to selection bias. COGS involved more than 1,000 scientists; I spoke to three of them, including the coordinator of the COGS group, along with five sources from outside the project. From them, I got the overall picture that even a project of COGS’ size – involving hundreds of thousands of cancer patients, an unprecedented number – wasn’t big enough to solve cancer’s complex genetic architecture.
My colleagues elsewhere, on the other hand, quoted other sources, all of whom were involved with, or had funded, the COGS study. From them, they appeared to get a different picture of the work’s significance. For a study like this, it seems, it’s hard to say what the “right” story is because people within these projects disagree amongst themselves about that.
OK, fine, but what about the facts – namely, that £5 spit test? According to an audio archive provided by Cancer Research UK, one of the COGS funders, that came from a press conference that I couldn’t attend because it took place after my deadline and in the middle of the night in my time zone. The tone of the press conference was more optimistic about the significance of the COGS study, and about the possibility that its results will soon be used in cancer screening, than the collective take I got from my reporting. During the press conference, one of the COGS investigators explained that she’s studying whether the COGS results could be used to develop a cheap genetic screen that could be offered to men with a familial history of prostate cancer.
She later explained to me in an email that her group is now “developing protocols to research the application of the spit test,” and that the results would then have to be reviewed by UK regulators before being used. In other words, there is no spit test that is ready for prime time – at least not yet. At the press conference, she speculated that if her studies go well and regulators are convinced by them, the test could be available in five years. That uncertainty didn’t make it into newspaper stories that highlighted the test.
Yet I understand why newspaper reporters focused on the “£5 spit test”. There have been so many cancer genetics studies in the past decade; why cover any of them any more unless they’re actually getting us somewhere useful?
In the end, I felt safe assuring my editor that we hadn’t missed the boat on COGS. But I still worry about getting the story right when the nature of scientific projects is changing. I think it’s going to be the rare instance that everyone involved with Big Data projects – an ever more important part of science – agrees on their significance. And it’s becoming more difficult to encapsulate the breadth of their findings into neat, 500-word stories.
Because of that, I’m thankful for a new type of resource that the editors of Nature Genetics, which published five COGS papers, have put together. It’s an explorer that pulls together links to the different COGS papers, along with commentaries on their findings, and new analyses, called “primers,” that serve as guides to the collection and to its collective results. The brains behind the COGS explorer, Nature Genetics editor Orli Bahcall, hopes that it will broaden the focus from single Big Data papers out to a more holistic view of their findings – a laudable and crucial task.
In the meantime, I must admit that this and other recent episodes – such as the flap over the ENCODE project and whether its results were overhyped – may lead me to err on the conservative side in my reporting on Big Data. And there is a risk that I’ll underplay the importance of a study because I’m bending over backwards not to overplay it. So I guess that makes me the Eeyore in the Big Data playroom.
But I’m OK with that. There are plenty of Tiggers out there already.
Image of Eeyore courtesy jdhancock/flickr.