When marine biologist Julia Lowndes started graduate school in California in 2006, she expected to spend the next several years learning about the behavior of the Humboldt squid, which had recently—and dramatically—expanded its range north along the California coast.
But before she could learn anything about the squid, she discovered, she had to learn to code.
The satellite tags that Lowndes attached to individual squid took second-by-second measurements of depth and temperature, so every day she collected thousands of data points. When she first tried to look at her raw numbers, the file was so huge that she couldn’t open it in Microsoft Excel. Her fellow biologists didn’t know how to help her, so she enrolled in a computer-science course intended for game designers and pored over the book Practical Computing for Biologists, gradually piecing together the programming skills she needed to manage and analyze her massive dataset. “I learned to code in a panic, and mostly on my own,” she says now, “and that’s how a lot of biologists still do it.”
Eventually, her lines of code revealed the story in the data: she learned that Humboldt squid off the California coast can swim 30 miles a day and dive to depths of almost a mile, and that they like to hang out below the surface, feeding on fish day and night.
Lowndes is now a marine data scientist at the National Center for Ecological Analysis and Synthesis in Santa Barbara, California, and she’s part of the Ocean Health Index, an international effort to track the overall state of the world’s oceans. The project’s early challenges were, in some ways, jumbo versions of those Lowndes faced in graduate school: the researchers wanted to turn mountains of ecological data into coherent stories that could be understood worldwide, but their usual tools weren’t up to the task. In a paper published today in Nature Ecology & Evolution, Lowndes and her co-authors describe how unwieldy email threads, vague and inconsistent Excel filenames, and other seemingly small annoyances steadily undercut the project and its goals.
So during their second global assessment, in 2013, the team members gradually transformed themselves from scientists into scientist-programmers. They learned to code in R using RStudio, track the different versions of their files with Git, and share their work on GitHub. They learned from groups like Software Carpentry and rOpenSci, which are helping environmental scientists overcome their fear of data science. Now, Lowndes says, marine biologists working in the Baltic can much more easily compare their data to those gathered in the Pacific, and successive assessments can be confidently compared over time. Such comparisons could help inform and enforce wide-ranging protections, such as the newly proposed code of conduct for marine conservation. “There’s this myth that you’re either a coder or you’re not, and that environmental scientists are definitely not,” says Lowndes. “But when people see how powerful these tools are for collaboration and communication, they get on board.”
The Ocean Health Index isn’t the only international conservation effort to suffer from too much information and not enough communication: thanks to cheap technology and the rise of citizen science, we have more data about more species and habitats in more places than ever before, but a lot of those data aren’t being used to protect what they describe. Changing that requires the political will to solve global conservation problems, of course, but it also requires a language that crosses borders. Maybe the common language of conservation is R.
Top photo by Marcus Spiske.