Today: Testing GPT4 Variability in the Clinical Setting
Episode 17914th September 2023 • This Week Health: Newsroom • This Week Health
00:00:00 00:12:32

Transcripts

Today in health, it testing the reproducibility of GPT four in a clinical setting. My name is bill Russell. I'm a former CIO for a 16 hospital system. And creator of this week health, a set of channels and events dedicated to leveraging the power of community. To propel healthcare forward. We want to thank our show sponsors who are investing in developing the next generation of health leaders. Short test artist, site parlance, certified health, notable and service. Now check them out at this week. Health. Dot com slash today. We want to help us out, share this podcast with a friend or colleague, use it as a foundation for daily or weekly discussions on topics that are relevant to you and the industry. They can subscribe wherever they listen to podcasts.

All right today, I'm at one of the 2 29 round table events. With, , some security officers, some healthcare leaders, and we're going to have some discussions around that. The exciting news around. That is that, , we've raised another $5,000 for childhood cancer. And every time we get together, we ask the sponsors to contribute on behalf of the members. Who come to the 2 29 events. And, , these sponsors stepped up and contributed another $5,000. So we are up over $55,000 for the year on our goal of 50,000. So we really appreciate. Your generosity and generosity of everybody involved. If you want to contribute, go ahead and hit our website top right-hand column. You're gonna see a logo for the lemonade. Stand, click on that to give today. We believe in the generosity of our community and we thank you. In advance. All right. I love it when, , you know, things progress. I don't know how to say it any differently. So, , did a podcast. And we talked about, , John Comments on generative AI and how it's not reliable. And th the main topic there was that generative AI is probabilistic and not deterministic. So most algorithms are deterministic. They have a set of rules that you take, , some, ask some data and you pass it through a set of rules that it comes out with the same thing.

Every time that's deterministic. Probabilistic means. It is. Essentially. Identifying what the next most probable word is in the sentence. And so when you take things like you throw a whole bunch of notes, At a clinical note at a system and ask it to summarize those things. It's it's probabilistic. It's saying, well, this is probably the next word, probably the next word and so forth and so on. And it's amazing given that probabilistic nature, how accurate GPT four and other large language models are. But at the end of the day, they will produce different answers. And so Kevin Malloy. Is, , someone who listens to the show and somebody who I've had on the show. He is what I would call a physician nerd.

And I love clinical nerves because they will make this work. They will take their medical background and they will take their knowledge of technology. And they will bring the two together and they will make sure that this technology gets implemented in a way that is safe. That is effective. For the clinician and that moves healthcare forward. And so He, , wrote an article on LinkedIn. Where he shared an experiment that he did. And he said, I have been enthralled with open AIS functioned calling where you can basically request a response in Jason. To plug into your user interface. I E fire app M page, et cetera. So I created a simple web app get experiments that patient. Dot dev. That would use function calling. And let the user enter a triage nurse blurb and get the top five likely conditions according to GPT four, as well as probability and reasoning. So if you go to that site and you can. And again, it's, , get experiments. Actually that takes, let's see, get experiments that patient, that dev. All right. So I go over here has system prompt. And as user prompt that has description of table fields, and then it comes back with a condition. Now, if you do this, you're gonna have to put in your open AI key. He doesn't store it. He's not collecting them, but, , it's just required in order to utilize the API and get it over there. Now, if you have your own, you could write this code. , it's, , it's not overly hard. And if you have some programming backgrounds, Over the hard to write this kind of code. And if you have the API APIs really not hard, you could also use something like, , Like Zapier. Which is what we're using over here for generating notes and whatnot around our. , interviews that we do. So we do a 45 minute interview. We send it over to Zapier and we use their interface to, , to go. Back and forth with our open API. , a, , open, , chat GPT, , open AI API. There you go. And, , it provides a response and that's essentially what's going on here. So let me give you a little bit of what's what he has done. As I play around with it. I noticed that even with the same exact input you could T4 would give me different answers. That's the probabilistic nature of GPT for. So I captured 50 sequential API requests using functional calling against the open API. Open AI API, put them in a Google sheet. You can check out the exact object sent at the bottom and copy and paste it yourself. And he has a link to the Google sheet. This by the way is in LinkedIn. If you want to find it on, like, then

you could follow Kevin Malloy, M D M a L a Y. And the article title is re reproducibility of GPT four in healthcare and experiment. Okay. And this is what he sent the nurse triage note. User prompt was 55 year old male. History of diabetes. , mellitus. Hypertension with two weeks of right. Upper quadrant pain, chest pain. There you go. And they said 49 out of 50, 98% of the time. I returned the top, most probable diagnosis. As having to do with gallbladder disease, acute. Cola cystitis, chronic Cola, cystitis, and gallstones. So there you go. , oddly the one non gallbladder. Related top diagnosis was GERD.

Which our clinicians would know is gastro. Esophageal reflux disease. Which is somewhat disturbing. As I asked CPT to imagine they were an ER doc and I don't know any ER, doc who would think the most probable diagnosis in a 55 year old male. With those symptoms would be GERD. I imagine most would think of more life threatening diagnosis as acute coronary syndrome, gallbladder, pathology, pancreatitis, et cetera. As GBT, advanced data analysis to help with how similar each run was. And it suggested using the Jakarta similarity. On average, the runs have its courage similarity of 3.4, 1% to 6.97%. Indicating that there's a moderate amount of variation in the diagnosis generated across different runs. His take on this. He summarizes it. He has to, one function calling has insane potential for fire apps. And pages anything in the EHR that can do simple network requests and then process Jason. And that's absolutely true. The more I used GPT four, the more I just ruminate on different ideas of how we can utilize it. Over here. And if I were in the, , CIO role for a health system, I would be doing the same thing except I would be doing it with a set of nurses and doctors and exposing them to the technology and asking them how it might be used. , the other thing I would say is I also agree that that there's one thing to sit in front of the prompt and go ahead and put it in. And they now have a, an iPhone app that you can have it read on your iPhone and put the prompts in. It's one thing to have that it's another thing to use the API because the API. Builds it right into the workflow. It's really, really fascinating. Anyway, his second thing is there's a considerable amount of variation. Which worries me, but it probably produces more relevant and useful NLP responses than anything we have. It brings to mind the saying don't let perfect be the enemy of getting stuff done. Given the right use cases, such as copilot for differential based. On minimal triage, text function calling in GPT can be helpful in the ER, especially at 3:00 AM. And that gets back to, I mean, my soul went on, this is absolutely, , recognize the power of this, especially with an API. I agree a thousand percent.

And then the second thing, which he talks about, , you know, don't let perfect be the enemy of making progress, but in order to do that, we have to adhere to the copilot design construct. Which is really important and we have to encourage people not to get lazy. It is very easy to get lazy with these kinds of tools and to rely on them too heavily. And then all of a sudden we've created a problem which gets written up in a journal. You get the, the whole idea. The co-pilot design construct. Drive this into your healthcare organizations. AI as a copilot AI, not as the pilot. And hopefully if you drive that in and it's, there's a cultural adoption to this that cannot be overlooked. Yes. We're going to talk about the technology. Yes. We're going to talk about API APIs. Yes. We're going to talk about, the. The probabilistic versus deterministic. We're going to talk about a lot of, , techie kind of things, but at the end of the day, there's a cultural adoption that needs to happen here. And we can't err, on either side, we can't air on the side of, , you know, Hey adopt at all costs. And we can't err on the side of, , moving too slowly. In doing that, we have to have the right design construct. We have to educate people on how they can use AI effectively. In their environment. So again, Bring teams together. Start having conversations. I read a great article. On how. Cedar Sinai is bringing people together on the AI thoughts. I guess we had hackathons for a while and now we have AI thoughts, but essentially they're bringing people together and they're. Soliciting their ideas and their thoughts on how generative AI can be brought into the clinical setting. , And it's everything from the patient journey to note summary to you name it if you put if you put enough clinicians into a room especially Ones that on the scale 10 towards the to the nerdy side they like technology they play with technology And they have become familiar with gpt They're going to have ideas they're going to have interesting ideas now Some are going to be viable some art that's the reason you do ai athens And one of the things that they've done at cedars is the best idea actually gets some funding And they actually move forward and i i love that concept i love the exposure creates i love the culture creates and i love the incentive that it creates for clinicians to get involved in moving this technology forward In a safe and effective way I want to thank kevin malloy again if you haven't followed him, follow him on like did he And i recorded a show a while back so you can find him on our website as well And he is on the nerdy scale probably , pretty far over he he does a lot of work with fire And a lot of , programming work and whatnot so And i love it i love where he's taking this i love the experiments that he's doing and moving this body of knowledge forward so thanks kevin for doing that appreciate it And that's all for today So don't forget Share this podcast with a friend or colleague really helps out And start having conversations about it We want to take our channel partners who are investing in our mission to develop the next generation of health leaders Short test artist 📍 site parlance certified health buildable, and service now check them out at this week health.com/today. Thanks for listening That's all for now

Chapters

Video

More from YouTube