Professor Alyssa Bilinski set out to answer a seemingly simple question: how often are pregnant people included in medical trials? Finding the answer, however, was anything but simple. With 90,000 records to analyze, she turned to AI for help—but ensuring the accuracy of the results required a creative approach. Discover how Bilinski tested and refined AI algorithms to deliver reliable insights and advance health policy research.
Welcome to Humans in Public Health. I'm Megan Hall.
In the past few years, the field of public health has become more visible than ever before, but it's always played a crucial role in our daily lives. Each month, we talk to someone who makes this work possible. Today, Alyssa Bilinski.
Ever since ChatGPT came out about two years ago, you've probably had a hard time escaping two letters: "AI". There's a lot of debate about the role of artificial intelligence in our lives. When is it a useful tool? And when can it create harm? Academic researchers are also grappling with these questions.
Our guest today, Alyssa Bilinski, is one of them.
Alyssa’s a professor at Brown University, and she’s not a computer scientist or a programmer. She’s a health policy researcher.
Alyssa Bilinski:I sometimes like to say that a lot of health policy research focuses on studying the past, so understanding the impact of policies like the Affordable Care Act, or even looking at what's happening in the present. But I like to think about taking what we've learned from the past and what is happening in the present and trying to use this to predict the future.
Megan Hall:When Alyssa tries to predict the future, she’s usually doing that with tons of data. She looks for trends, and figures out what’s working, and what’s not.
That’s where the role of AI comes in.
Alyssa Bilinski:At a high level, I think that there are two main ways that AI is changing research. So first, it's changing the conduct of research, so it can help us with things like coding, reading papers, making a podcast of your papers.
Megan Hall:Oh, no. Don't tell me that.
Alyssa Bilinski:And in my work in particular where we're gathering information from many, many different sources, it can change our ability to get information.
The second piece that we're thinking a lot about is not just how AI helps us do research, but how we can think about improving the AI available to us to make it better for research. And so for both of these tasks, it's really hard to overstate the potential benefits of AI, but there are really big risks as well. If you've ever played around with ChatGPT, you know that it makes errors, and it makes unexpected errors, and it makes them really confidently, and so that can make it really hard to actually apply AI to the problems we want to work on.
Megan Hall:So, how do you use AI to gather information without making big errors? Alyssa grappled with this question in a recent project. She was trying to figure out how often pregnant people are included in research for developing new medications.
Alyssa Bilinski:Over 90 million women in the US have given birth. That's more than 70% of women between the ages of about 18 and 85, but traditionally, pregnant women have been excluded from drug development clinical trials.
Megan Hall:Those clinical trials, called randomized controlled trials or RCTs, follow a standard process for evaluating whether a medication is safe and effective. Here's how it works:
Alyssa Bilinski:You take a group, and you randomly give some of them the treatment and others not. And this is the best way to learn reliably about how well a drug works or doesn't work, and what its side effects are.
Megan Hall:But when you don't include pregnant women in these randomized controlled trials, then you don't have any data about whether it's safe for pregnant women to take?
Alyssa Bilinski:Yep. And so in practice, what that means is that some people still choose to take medications, even if there's not great information about them, and they might be exposed to adverse effects before we learn about them. And on the other hand, some people will be hesitant to take a medication that could actually really benefit them, because there's not great data available about it.
So we wanted to ask what might actually seem like a pretty basic question, which is, how often are pregnant people included in these RCTs, these clinical trials, and how has that changed over time? And this is a deceptively hard question to answer. The reason for that is that there's a really good database of nearly all clinical drug trials in the United States that has really rich information about trials, but there's no actual field related to the inclusion of pregnant people.
Megan Hall:So there's not, like, just a little check box that can say, did you include pregnant women? Yes or no?
Alyssa Bilinski:No, no check box. Sometimes it's going to say, "had to have a negative pregnancy test." Sometimes it might say, "be postmenopausal." Sometimes it might say, "have a positive pregnancy test." It's completely unstandardized.
Megan Hall:Here's where the role of AI comes in… The information that Alyssa was looking for is buried in long paragraphs that describe each trial. And she had more than 90,000 trials to sort through.
Alyssa Bilinski:But what we didn't want to do was tell a poor RA to go read, I don't know, between 40,000 and 60,000 of those studies. It would probably be a summer's worth of work for an RA to do, and a really unpleasant summer at that.
So AI helped us out by basically reading these blurbs for us and telling us whether pregnant people were or were not included, as well as some additional information.
Megan Hall:But here’s where Alyssa had to think about preventing AI from making errors…
Alyssa Bilinski:I think the key is that we didn't just say, here, ChatGPT, tell us whether pregnant people are included.
Megan Hall:Before Alyssa and her colleagues asked ChatGPT to read through all of those randomized controlled trials, they created a series of tests.
Alyssa Bilinski:So we pulled a small group of studies, like 25 to 50 studies, and we asked ChatGPT, or the API, the model underlying ChatGPT, and we said, tell us whether you think pregnant people were included in this clinical trial. Give us the reason for that classification, and give us a quote that supports your claim.
Megan Hall:Oh, wow. Okay, so don't just give us the answer. Support your work.
Alyssa Bilinski:Yep, support your work. And what that allowed us to do was it helped us both to catch edge cases that we maybe hadn't thought about, like, what if it talks about breastfeeding but not pregnancy directly, and it also helped us to catch cases where the AI was likely to hallucinate. So the case that it had a lot of trouble with was that sometimes there wouldn't be any information to make a call, but the model would classify the trial as excluding pregnant participants, when we wanted it to tell us there was no data. And so we actually then did something else. We added a second step where we would have a second AI agent take a look at the result from the first agent and say, like, did you classify this correctly? And in particular, are you making this error of being too confident?
Megan Hall:So it's like you had AI be a second reader. You had, like, a second reader AI.
Alyssa Bilinski:We totally had a second reader AI.
Megan Hall:And was it trained differently? Because if it's the exact same AI, won't it just make the same mistake?
Alyssa Bilinski:Interestingly, no. Calling it a second time, it could often kind of catch itself, just like maybe sometimes when you play with ChatGPT and it gives you an error and you say, that's not right, it will be like, oops, sorry, you're right, I'm wrong. So interestingly, that worked really well. And then what we did was we had a human, to whom we are very grateful, actually label 1,000 studies as a larger training set. And we sort of went through this process on a larger scale of really trying to refine the prompt and the different steps at play.
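To make the workflow Alyssa describes a little more concrete, here is a minimal sketch of what a two-pass classification like this could look like in Python with the OpenAI API. The model name, prompt wording, and label set are illustrative assumptions, not the study's actual prompts or settings.

```python
# Illustrative sketch only: the model name, prompts, and labels are assumptions
# for demonstration, not the actual prompts or settings used in the study.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # hypothetical model choice

CLASSIFY_PROMPT = (
    "Read this clinical trial eligibility text. Say whether pregnant people could "
    "enroll, answering with one of: INCLUDED, EXCLUDED, NO_DATA. Then give the "
    "reason for your classification and a direct quote from the text that "
    "supports it.\n\nText:\n{text}"
)

VERIFY_PROMPT = (
    "A first reviewer classified the trial below as:\n{first_answer}\n\n"
    "Trial text:\n{text}\n\n"
    "Check the classification. In particular, if the text says nothing about "
    "pregnancy, the correct answer is NO_DATA, not EXCLUDED. Reply with the "
    "final label, reason, and supporting quote."
)

def ask(prompt: str) -> str:
    """Send one prompt to the model and return its text reply."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep classification answers as consistent as possible
    )
    return response.choices[0].message.content

def classify_trial(trial_text: str) -> str:
    """First pass classifies; second pass acts as the 'second reader' agent."""
    first = ask(CLASSIFY_PROMPT.format(text=trial_text))
    return ask(VERIFY_PROMPT.format(first_answer=first, text=trial_text))
```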
Megan Hall:Even after Alyssa’s team let the AI loose on the dataset, they took one last step to make sure the results were accurate. They went through and randomly checked its work.
Alyssa Bilinski:And what we found was this model was more than 98% accurate.
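As a rough illustration of that spot-checking step, agreement between model labels and human labels on a random sample can be computed in a few lines. The file names and column names here are hypothetical.

```python
# Illustrative audit sketch: file names and column names are hypothetical.
import pandas as pd

human = pd.read_csv("human_labels.csv")   # columns: trial_id, human_label
model = pd.read_csv("model_labels.csv")   # columns: trial_id, model_label

merged = human.merge(model, on="trial_id")
audit = merged.sample(n=200, random_state=0)  # random spot-check sample

accuracy = (audit["human_label"] == audit["model_label"]).mean()
print(f"Agreement on audited sample: {accuracy:.1%}")
```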
Megan Hall:How would you have done this research if you didn't have AI?
Alyssa Bilinski:So I think if we didn't have AI, we would have only been able to look at a much smaller sample of trials, and we would have had a much less well-rounded understanding of this phenomenon. In contrast, the work that I'm describing here allowed us to very nimbly answer this question, as well as a number of other related questions that we looked at in the study, very, very quickly. So actually, running this analysis on the full sample takes about an hour.
Megan Hall:Wow.
Alyssa Bilinski:And the training takes longer, but it's still much, much faster, and so it totally changed how we thought about our ability to do this kind of work.
Megan Hall:So it sounds like most of the effort in using AI wasn't in just saying, AI, answer this question for us. It was training it and testing it and making sure that it was appropriately doing its job.
Alyssa Bilinski:100%. Yeah.
Megan Hall:As for the results of the research…
Alyssa Bilinski:So what we found out is perhaps not surprising, but not very heartening.
Megan Hall:After the AI tool analyzed all of the studies…
Alyssa Bilinski:Less than 1% included pregnant participants. And even more so, despite calls to more broadly include pregnant participants in RCTs, this rate has been completely flat over the past 15 years.
Megan Hall:And in all of the trials for drugs that target chronic conditions like anxiety, depression, or asthma, only 19 had pregnant participants.
Alyssa Bilinski:And there are real harms, to both the parent and to the fetus and eventually the child, of having a parent who has uncontrolled depression. And if we think it's potentially risky to experiment on pregnant people, it's even worse to make every pregnant person take imperfect information and make a guess. It's kind of like experimenting on all of them, but not learning from it.
Megan Hall:Yeah. So instead of having a randomized controlled trial where you're, like, really monitoring symptoms and being sure to check all sorts of safety data, you're just kind of asking women to experiment on themselves.
Alyssa Bilinski:Yeah, and to make a hard call during a really important time with less data than people would normally have available to them to make decisions about taking medications.
Megan Hall:So what are your thoughts on that now that you've done this research?
Alyssa Bilinski:Yeah, so I think at a really high level, what we like to emphasize is that it was only in 1962 that the FDA required that companies submit evidence that medications were safe and effective. And for quite a long time, not just pregnant people, but all women of childbearing age were barred from participating in clinical trials. And it was only in 1993 that federal law required including non-pregnant people and sort of broader notions of representation in clinical trials. That really wasn't that long ago. That was just about 30 years ago. And so our hope is that 30 years from now, not including pregnant people in clinical trials and not having high-quality evidence about the safety of medications, both for pregnant people and for their babies, will seem just as odd and unusual to us as not including women in clinical trials seems to us today.
Megan Hall:So, let's just try to sum it up a little bit here. I think a lot of people are afraid of AI. They're afraid of AI taking their jobs, maybe starting to make podcasts without human producers. They're afraid of it making mistakes, having these hallucinations. But they're also aware of the power of AI. So what's your perspective on the role of this tool moving forward, specifically in research?
Alyssa Bilinski:So I think we should be cautious, but engaged. I think that, to your point, AI is only going to become more ubiquitous, and it's to our benefit, and the benefit of the research we do and the communities we work with, to harness what it can do to improve our research. At the same time, on the AI front, if there is one thing I would want people to take away from this podcast, it's that phrase, test-driven development. Anytime we are using AI, I want us to stop, before we open whatever interface we're using, and ask: how will we know if the results we get are correct and reliable? And only after we've thought through how we're going to do that do we then actually go to using AI. And then sort of the last piece of that is, I also think we have to be really transparent about communicating how correct and reliable and robust it was for the particular application at play.
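In the spirit of that test-driven development advice, a minimal sketch might look like the following: write down a handful of labeled edge cases first, and only run the classifier on the full dataset if it passes them. The edge cases, threshold, and imported classify_trial function are invented for this example.

```python
# Illustrative "test first" harness: the edge cases, threshold, and module name
# are invented for this example, not taken from the study.
from my_pipeline import classify_trial  # hypothetical module holding the two-pass classifier

EDGE_CASES = [
    ("Participants must have a negative pregnancy test at screening.", "EXCLUDED"),
    ("Eligible participants are pregnant women in the second trimester.", "INCLUDED"),
    ("Participants must be able to give informed consent.", "NO_DATA"),
    # Breastfeeding-only language: the team decides the expected label up front.
    ("Breastfeeding participants are not eligible.", "NO_DATA"),
]

def run_checks(threshold: float = 0.95) -> bool:
    """Return True only if the classifier matches the expected labels often enough."""
    correct = sum(expected in classify_trial(text) for text, expected in EDGE_CASES)
    score = correct / len(EDGE_CASES)
    print(f"{correct}/{len(EDGE_CASES)} edge cases passed ({score:.0%})")
    return score >= threshold

if __name__ == "__main__":
    if not run_checks():
        raise SystemExit("Classifier failed its checks; do not run it on the full dataset.")
```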
Megan Hall:So use AI, but make sure you spend a lot of time upfront testing it and making sure it's giving you the answers that you think it's giving you.
Alyssa Bilinski:Exactly.
Megan Hall:Great. Well, Alyssa Bilinski, thank you so much for coming in. This was really interesting, and I'm still scared of AI, but I do see its usefulness, and I'm glad you're being careful about it.
Alyssa Bilinski:Thank you so much. It's such a pleasure to talk with you.
Megan Hall:Alyssa Bilinski is the Peterson Family Assistant Professor of Health Policy, as well as an Assistant Professor of Health Services, Policy and Practice and Biostatistics, at the Brown University School of Public Health.
Humans in Public Health is a monthly podcast brought to you by Brown University School of Public Health. This episode was produced by Nat Hardy and recorded at the podcast studio at CIC Providence.