Ann: Please meet Adam Rogers. He wrote a story about DARPA looking for solutions to the credibility problems of social science, only where I’m saying “solutions to credibility problems,” he’s saying “bullshit detection.”
First, social science’s credibility problems. Here’s the way I said it in 2015: Start with any question involving human behavior or motivation and try to find an answer. Google it, Google Scholar it, search the PsycLIT database, read the titles and abstracts. Your question will probably be profoundly interesting: Why do siblings stake out their own territory? Why do some people in a community accept what the community offers but avoid offering anything back? How long does grief last? Every single answer you find will be one you could have figured out if you’d arrived yesterday from Mars, taken one look around, and said the first thing that came to mind.
However, that’s a rant, not an explanation of a problem. So Adam, why is a BS detector necessary in the first place? What is wrong with social science that it can’t reliably answer some of the most vexing, important questions we have?
Adam: Hello, Ann! Well, you’ve hit upon the problem right at the top, of course, which is: Why are there no good answers to the best questions? The questions are not, as I perhaps crossed a line by saying, bullshit. The answers, though? Ugh. Across disciplines—from sociology to anthropology to economics to political science—they hew to frameworks that are at least internally consistent (a good start) but don’t talk to each other. This is what the sociologist Duncan Watts called his field’s “incoherency problem.” Which I might rephrase as “WTF social science?”
Adam: What the social sciences don’t seem to have, as far as I can tell, is the overarching theory of almost everything that, say, physics has. Seriously, those guys are spending billions of dollars and tera-electron volts of energy trying to break their theory, and it keeps not breaking. And, yes, I get it, relativity breaks at quantum scales and vice versa, but still, the sociologists would kill for a problem like that, because it has numbers and things you can make predictions about and then test. Like science.
Ann: I might hedge your argument about the lack of a fundamental theory of human behavior being the reason for the WTFness of social science. Biology arguably doesn’t have a single overarching, unifying theory, not even evolution/natural selection; and it still acts like a science. So what’s wrong with social science?
Adam: Ok, just to be less of a jerk about all this, physics—and its applied version, engineering—ask questions with quantitative, predictive answers. I am building a bridge. I build it like this. Does the bridge stand? I have successfully understood the science of bridges. But the social sciences don’t know how—yet—to ask questions like that. To the extent that we need a bullshit detector (coming back around to your apposite question), we need it because a lot of people are doing research with results that, intentionally or unintentionally, are bullshit. And if we’re going to make determinations about, say, how best to deploy troops to Afghanistan based on the way people get radicalized via YouTube, it’d be nice to know how all that actually works.
So the really interesting question for us to chew on, maybe, is: If we believe that science is the single best way human beings have figured out to apprehend the universe, why doesn’t it seem to apprehend us?
Ann: I think because human beings have too many variables. Physics doesn’t even like three, and we’ve got a trillion. How long does grief last? You’d have to quantify the definition of grief, then control for the ages of the people you ask, their relationship to the person who died, the age of the person who died, the cause of death, more variables than I can think of right now, and then watch the whole mosaic long enough to see when grief ends. And how are you going to quantify grief in the first place? Hand out questionnaires with items like, “#15. I feel numb. A) Not at all, B) Rarely, C) Sometimes, D) Often, E) Always?” On 300 questionnaires, count the number of “A) not at all’s,” find out the number peaks at 4 years out, and declare victory? I could shoot that down with a bow and arrow riding a fast horse in a high wind.
But that’s arguing from anecdote and also not convincing. How does DARPA plan to solve the WTF problem?
Adam: You’ve got two different kinds of questions you can ask here—a Parmenidean problem of being versus becoming. The social sciences try to describe things as they are and physics, chemistry, biology try to predict how things will be. So, one thing you could do would be to reconfigure the questions you want to ask. Or that you can ask. You turn at least some slice of the social sciences into a more pragmatic, outcomes-based field. Engineering, not physics.
Ann: I am stunned by the phrase “a Parmenidean problem of being versus becoming,” not to mention by applying Parmenides (is that a name? do I have to look it up? I’m not going to.) to the social sciences. I admire your education, I’m not joking, I do. Never mind. I could try to figure out what kinds of social science questions would be more pragmatic and outcome-based but 1) I suspect the questions would have to be much less interesting; and 2) that’s not where your argument is going anyway.
Adam: Or! You could try to add some arrows to the quiver by your side on that fast horse. “Science is self-policing,” is what scientists tell us, and what we science journalists say, too. What we mean is, science has tools it uses for validation. Peer review is one. Statistical significance is another. There’s replication, meta-analyses, and newer stuff like impact factors for journals and expert prediction markets. So what DARPA might do—getting around to answering your question now at last—is find techniques to add to that list.
Ann: But hasn’t social science used those tools all along, and still fallen on its face — like the statistical p-hacking crisis or the replication crisis? Oh, right, I just remembered: your article says not that DARPA knows what to do, only that it’s requesting proposals for ideas about what to do. So it doesn’t know either. My friend who wrote a history of DARPA, Sharon Weinberger, says it has been trying to solve social science problems — like how insurgencies win over local folks — since forever, without notable success. So what kind of techniques might work?
Adam: Here’s a thought: Matt Salganik, a Princeton sociologist I talked to for my article, talked about “post-publication peer review,” essentially trying to quantify or at least aggregate what qualified people say about an article or a concept after it gets published, on social media or in the comments. Darpa could, I suppose, build some kind of LEED certification-style system for experiments along the lines of what Brian Nosek’s Center for Open Science has championed, with badges for things like pre-registering study designs to obviate p-hacking. And in his Fragile Families Challenge, Salganik is working with a big prospective study of child success to bring computer science methods of validation and quantification to the (worthy, respected) study.
Ann: Shall we argue about whether it’s DARPA or Darpa? No? So the possibilities for fixing social science are 1) have other social scientists get together afterward and say whether the study was credible; and 2) have other social scientists look at the design of a study before it’s done and say whether it’s science? Does this seem as lame to you as it does to me? Aren’t these the things science is supposed to do, and if it doesn’t, it stops calling itself science? I think I’m just getting mad all over again. And what if I’m really interested in fragile families and success? What if I want to see a big (i.e., statistically meaningful), prospective (follow the kids as they grow up) study on children who succeed?
That one actually sounds promising. I can see how to sensibly define “succeed” in ways that wouldn’t rely on dumb multiple choice questions. So what would be “computer science methods of validation?”
Adam: In this case it’s called the common task method, which has apparently been one of the things that helped machine learning make its big strides over the last few decades. As I understand it, you take your big longitudinal database and break off a chunk to hold aside. Then you let different groups try models on the data that’s left, to see if they predict what happens in your hold-out. If they do, you have a winner. And if the different models all do some things well, you combine them. But see, you’re doing it for predicting human behavioral outcomes instead of, like, machine translation or whatever.
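Ed. (Ann) note: For the curious, the hold-out workflow Adam describes can be sketched in a few lines of code. Everything here is invented for illustration (toy data, two stand-in “teams,” mean squared error as the score); it is not the Fragile Families Challenge’s actual data or scoring, just the shape of the common task method.

```python
import random

random.seed(0)

# Toy "longitudinal database": x is a measured trait, y is the outcome.
# (Purely synthetic: y is roughly 2x plus noise.)
data = [(x, 2.0 * x + random.gauss(0, 0.5))
        for x in [random.uniform(0, 10) for _ in range(200)]]

# 1. Break off a chunk and hold it aside, unseen by the competitors.
holdout, training = data[:50], data[50:]

# 2. Competing teams build models on the training data only.
def fit_mean_model(train):
    """A lazy team: predict the average outcome for everyone."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_linear_model(train):
    """A better team: least-squares line through the training data."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + slope * (x - mx)

models = {"mean": fit_mean_model(training),
          "linear": fit_linear_model(training)}

# 3. Score every model against the same hidden hold-out.
def mse(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

scores = {name: mse(m, holdout) for name, m in models.items()}
winner = min(scores, key=scores.get)
print("winner:", winner)
```

The point of the hidden hold-out is that no team can tune its model to the test data, so a good score is evidence the model captured something real rather than memorized the training set.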
Ann: This is a slippery idea that I only faintly understand but to the extent that I do, I think it might be related to what astronomers call “mocking the data.” Anyway, it’s a way of using huge databases to test different ideas of what might be in there.
Adam: Maybe I’m naive, but I have to admit that this idea doesn’t seem lame to me. I mean, I don’t know if the Darpians will be able to turn social science into a reliable, predictive field. But I love the idea of continuing to add arrows to the quiver—of bringing new tools and techniques to the ways science polices itself. I’m getting all it’s-the-journey-not-the-destination here, maybe, but that’s how you know it’s science: It says, eh, maybe we should check this whole thing just one more time.
Ed. (Ann) note: If you want to know more about these ideas, I do recommend reading Adam’s story.