After 18 years of practicing as a midwife, Ellen Tilden knew that pregnant women had a lot of questions—and most of the time, she had the answers. She knew which kinds of cheese and fish they shouldn’t eat, how to tell practice contractions from the real thing, what kinds of pain relief medications they could have during labor. When she was stumped, she could usually find the information by consulting a colleague or reading a study. Her patients usually left feeling reassured—one of the most satisfying aspects of Tilden’s job.
But sometimes, Tilden didn’t know. For example: Is it safer to have the baby at home or in a hospital? When certain risk factors were present—like diabetes, for example, or a breech baby—the answer was pretty clear: Hospital births were better. But Tilden, who practices and researches at Oregon Health & Science University, didn’t know what to say to healthy women with uneventful pregnancies, and neither did any of her colleagues. “So we went into conversations about anecdotal information,” she said. “It became a really personal conversation really fast.”
Tilden didn’t have all the facts, and neither did science. The reason for this goes to the heart of how medical studies work. The gold standard of research is the randomized controlled trial—where participants are randomly divided into groups, each group follows a different protocol, and the results are compared. But for certain situations—like contrasting home birth and hospital birth—that kind of experimental design doesn’t work very well. In fact, when British researchers attempted it in 1996, they began with 71 women, but so many dropped out of the study or changed their minds that by the end only 11 participants were left. “That just isn’t enough,” Tilden says. “Anything could happen in that small a group.”
So researchers are left to look at existing data sets, trying to draw conclusions based on patterns, in what are known as “retrospective cohort studies.” Those aren’t always ideal, because researchers have to control for all sorts of variables, a clunky process that’s sometimes so inexact the results can be practically meaningless. That problem is especially pronounced for obstetrics, because birth isn’t actually a medical problem: Unlike, say, heart attacks, which are associated with certain risk factors, all kinds of women have babies, and usually it turns out fine.
Home birth is an extreme case of this kind of obstetrical research problem, but it’s far from the only one. In fact, there’s a lot we don’t know about how the nearly 4 million babies who are born in the United States every year come into the world. Scientists still can’t explain, for example, why labor is quick and easy for one woman and long and difficult for another of roughly the same age and with similar demographics. Many of the guidelines that doctors still use to assess how childbirth progresses—how long each phase takes, for example, or when an obstetrician should call for a c-section—are based at least in part on retrospective studies conducted more than fifty years ago.
The women in those studies looked very different from American moms today: Almost all were likely white and in their early twenties, and few would have been obese. Many likely smoked and drank through their pregnancies; probably not many exercised. We now know that certain women—those who are black, or older, or overweight, or smokers, for example—have elevated risks of complications, both for themselves and their babies. And yet we still don’t have enough information about those groups to reduce their risks. There is little research on how certain controllable variables—such as whether labor is induced and which induction drugs are used, what position women give birth in, and whether the doctor uses forceps to help the baby out—affect diverse groups of women. Even though the data to answer these questions probably now exists in the form of a decade or more of electronic medical records from hospitals around the world, scientists aren’t sure how best to manage data sets of that size using the familiar techniques from retrospective cohort studies.
A few years ago, Tilden began discussing these problems with her colleague, OHSU epidemiologist Jonathan Snowden. The dilemma of being prisoners of retrospective cohort studies made Snowden think of a new statistical tool that he had learned at UC Berkeley, where he completed his PhD in 2011. His mentor, who had trained at Harvard, taught his graduate students to use diagrams and specialized algorithms to figure out which variables they should control for in retrospective cohort studies—and which might distort the findings of the study. At that time, the method, called causal inference, was “considered very esoteric and very daunting,” recalls Snowden. Economists had been employing it for years—it was helpful for figuring out why people made certain financial decisions, for example. But it wasn’t widely used in medicine. “As I got to understand these issues about the practical realities of enrolling pregnant women in trials,” Snowden said, “I began to realize that these methods could be an ideal match.”
For people without a background in statistics, causal inference can at first seem like other methods that researchers use to figure out whether something unusual about their data might be throwing off the results—and then to control for the unusual data to make the results more accurate. That process—controlling for variables—is an essential part of data analysis. But when there are many complex, interrelated variables, controlling for them can sometimes skew the results. Let’s say you’re trying to figure out whether living in a home where someone smokes raises a newborn’s risk of dying shortly after birth. A study using typical methods would likely control for the baby’s weight at birth, because babies who weigh less than 5 pounds 8 ounces have an elevated risk of dying as newborns. The problem is that smoking during pregnancy can actually cause babies to be born too small—so if you control for low birthweight, you’re missing a valuable piece of information about a possible cause of a newborn’s death.
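To make that logic concrete, here is a minimal toy simulation of the smoking-and-birthweight scenario, not anything the researchers actually ran. It assumes Python with numpy and statsmodels, and every number in it is invented. The point is simply that a model which adjusts for birthweight, a variable that lies on the causal pathway, reports essentially no smoking effect, while the unadjusted model recovers the total effect of smoking.

```python
# Toy simulation of the smoking / birthweight example above, not any real analysis.
# Assumes numpy and statsmodels are installed; every number here is invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000

smoking = rng.binomial(1, 0.2, n)                        # household smoking during pregnancy
birthweight = rng.normal(3400 - 300 * smoking, 450, n)   # smoking lowers birthweight (grams)

# In this toy world, newborn death risk depends only on birthweight,
# so smoking's entire effect on death flows through birthweight.
p_death = 1 / (1 + np.exp(4.0 + 0.004 * (birthweight - 3400)))
death = rng.binomial(1, p_death)

# Model that "controls for" birthweight, a variable on the causal pathway.
adjusted = sm.Logit(death, sm.add_constant(np.column_stack([smoking, birthweight]))).fit(disp=0)

# Model that leaves the pathway open and so estimates the total effect of smoking.
total = sm.Logit(death, sm.add_constant(smoking)).fit(disp=0)

print("smoking coefficient, adjusted for birthweight:", round(adjusted.params[1], 2))  # close to 0
print("smoking coefficient, total effect:            ", round(total.params[1], 2))     # clearly positive
```

In this made-up world, smoking is genuinely dangerous, but the birthweight-adjusted analysis would suggest otherwise; that is the kind of distortion causal inference methods are designed to catch.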
Causal inference allows researchers to use that kind of relationship in their data analysis to obtain results that reflect much greater levels of nuance, which is particularly exciting for people who study maternal health and childbirth. Faced with a dizzyingly complex web of variables, researchers are trying hard to understand how they all relate to each other. Take home birth. A conventional study would attempt to control for race, level of education, and number of previous children, among other variables, across the entire group of participants. But adjusting for all those variables at once “sometimes means you’re comparing apples to oranges,” says Snowden.
So in 2015, Tilden and Snowden decided to apply causal inference to the nagging question of home birth safety. They had a large data set: the birth certificate data for 80,000 babies who were born in Oregon in 2012 and 2013. In order to make sense of the vast number of variables, Tilden and Snowden used a causal inference technique called a propensity score: They calculated how likely each participant was to have her baby at home based on previously studied factors like race, level of education, and number of previous children—women who choose home birth tend to be whiter, richer, and better educated than those who choose hospital births. After analyzing the data with propensity scores, they published a study in The New England Journal of Medicine in 2015 that concluded that babies born at home had a slightly higher risk of dying as newborns, while moms who gave birth in the hospital had a slightly higher risk of having a cesarean section.
To explain how the scores work, they gave me the example of two hypothetical participants: Jane, 35, is white and college educated, with two previous kids. A score of 100 percent means that the researchers are certain a given woman will have her baby at home. Jane’s score is 25 percent, which is pretty high, because home birth is still relatively uncommon. Then there’s Maria, 37, who has three other children, is Hispanic, and went to grad school. Let’s say her score is 23 percent. Now let’s say Jane chose to deliver at the hospital, while Maria had her baby at home. “Here are two women who were reasonably likely to make the same choice but didn’t,” says Snowden. “So when you look at their birth outcomes, you can be reasonably confident that you’re seeing how the birth setting influenced the outcome, rather than one of the many other variables.”
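For readers who want to see the mechanics, here is a stripped-down, hypothetical sketch of the propensity-score idea in Python. It does not use the Oregon birth-certificate data or reproduce the published analysis; the covariates, effect sizes, and outcome are all invented, and a real study would involve many more variables and diagnostics. The sketch estimates each woman’s probability of choosing home birth from a few covariates, pairs each home-birth woman with the hospital-birth woman whose score is closest (the Jane-and-Maria idea), and compares outcomes across the pairs.

```python
# Hypothetical illustration of propensity-score matching, not the published NEJM analysis.
# Assumes numpy and scikit-learn; all covariates, coefficients, and outcomes are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 50_000

# Invented covariates loosely echoing the article's examples.
age = rng.normal(30, 5, n)
college = rng.binomial(1, 0.4, n)
prior_births = rng.poisson(1.2, n)

# In this toy world, older, college-educated women with more children
# are somewhat more likely to plan a home birth.
logit_home = -4.5 + 0.05 * age + 0.8 * college + 0.3 * prior_births
home_birth = rng.binomial(1, 1 / (1 + np.exp(-logit_home)))

# A made-up binary outcome (say, cesarean delivery), less common after home birth here.
p_outcome = np.clip(0.25 - 0.15 * home_birth + 0.002 * (age - 30), 0.01, 0.99)
outcome = rng.binomial(1, p_outcome)

# Step 1: estimate each woman's propensity to give birth at home.
X = np.column_stack([age, college, prior_births])
propensity = LogisticRegression(max_iter=1000).fit(X, home_birth).predict_proba(X)[:, 1]

# Step 2: pair each home-birth woman with the hospital-birth woman
# whose propensity score is closest, then compare outcomes across the pairs.
home_idx = np.where(home_birth == 1)[0]
hosp_idx = np.where(home_birth == 0)[0]
order = np.argsort(propensity[hosp_idx])
sorted_scores = propensity[hosp_idx][order]

matched_hosp = []
for i in home_idx:
    j = np.searchsorted(sorted_scores, propensity[i])
    candidates = [k for k in (j - 1, j) if 0 <= k < len(order)]
    best = min(candidates, key=lambda k: abs(sorted_scores[k] - propensity[i]))
    matched_hosp.append(hosp_idx[order[best]])

diff = outcome[home_idx].mean() - outcome[np.array(matched_hosp)].mean()
print(f"Matched difference in outcome rate (home minus hospital): {diff:.3f}")
```

A real analysis would use far more covariates, check that the matched groups are actually balanced, and quantify uncertainty; the sketch only captures the core intuition of comparing women who were similarly likely to choose home birth but made different choices.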
In a series of three papers aimed at OB-GYNs, midwives, and other women’s healthcare providers, Tilden and Snowden presented the basic methods. They explained that propensity scores are just one of several causal inference techniques—others include diagrams that map the relationships between variables, and statistical methods for determining which questions a given data set can best answer. The reception was mostly positive, said Tilden, but there is an old guard of scientists who are reluctant to believe any results that don’t come from randomized controlled trials. From them, she has felt “a little chill,” she said. “There are people who don’t want to change the way that we do things, even though it’s clear we’re not serving women with the current methods.”
One problem, says Julia Phillippi, a midwife and researcher at Vanderbilt University, is that many physicians simply might not understand causal inference. “When they went through their stats courses, they were trained to look only for certain things, and those things aren’t here,” she said. “But what they don’t get is that these are actually higher quality studies.” Leslie Farland, a University of Arizona professor of epidemiology and biostatistics who specializes in pregnancy and childbirth, agrees. “Given the increasing awareness of maternal mortality and women’s health as urgent public health issues,” she wrote to me in an email, “there is a need for advanced methodologic approaches.”
Other scientists I talked to pointed out more possible uses for causal inference: Researchers could use the technique to help determine whether certain medications are safe for pregnant women—a challenge, because it’s still considered unethical to enroll expectant mothers in clinical trials. Then there are the myriad complexities of race and childbirth: Even when a randomized controlled trial is logistically feasible, its results are often skewed because African American women enroll in trials far less frequently than their white counterparts.
And in fact, Snowden has already seen some evidence that causal inference can help doctors better serve black women. In 2016, Snowden and his colleagues studied an Oregon policy that aimed to curb the rising rate of elective cesarean sections and labor inductions before a baby reached full term. Their conventional data analysis found that while use of those procedures did decrease, moms’ and babies’ outcomes remained the same. But when they ran the numbers again with a causal inference technique, they did find evidence that the policy had reduced the number of health complications—particularly among black women, who were more likely to have elective c-sections than their white counterparts.
Next, Snowden is tackling a question that has baffled researchers for years: He’s trying to figure out why black moms are three to four times more likely to die during childbirth than their white counterparts. Between 2011 and 2013, for example, for every 100,000 births, 12 white women died, compared to 40 black women. It might be his most complicated causal inference project yet. The data sets are small, because, luckily, maternal death is uncommon. But the list of possible explanations is long: Certain health conditions seem to increase the risk of maternal death, as do poor prenatal care and low hospital quality. Most likely, says Snowden, it’s a combination—and the trick will be to figure out how some risk factors lead to others.
“I’m tired of reading studies that just talk about associations,” he says. “I want to know what’s causing this epidemic of black maternal death, what’s driving it.” Tilden, too, is excited about the prospect of using causal inference to learn more about vexing racial disparities in childbirth. “To be able to point out the differences there,” she says, “that is extremely useful.”