
How to Build a Defensible AI Startup – With Dr. Thomas Kelly from Heidi Health

Episode 39 • 25th July 2025 • In The Blink of AI with Georgie Healy • DayOne.FM

Episode Summary

Dr. Tom Kelly, founder of Heidi Health, is building one of Australia’s fastest-growing AI startups, and he’s doing it differently. In this episode, Tom reveals how Heidi Health transforms messy doctor-patient conversations into medical-grade notes in seconds, why batch transcription beats live, and why trust and time, not flashy features, are the future of healthcare AI.

Georgie and Tom unpack why most B2B SaaS startups may not survive, what not to do as a non-technical founder in AI, and how to build trust in high-stakes industries. They also explore personal branding, attention hacking, agents, and AI's limits in life-or-death decision-making.

Plus, Tom shares a killer AI travel hack and plays Late Stage Startup Bingo, guessing the hottest Aussie AI companies, from Relevance AI to Leonardo.

Time Stamps

03:20 – What is Heidi Health and how does it work?

07:30 – Doctors spend half their time on documentation

10:45 – How many Australians have unknowingly used Heidi

12:15 – Does every AI founder need a personal brand?

14:20 – Attention hacking vs. building real trust

18:00 – AI Hack of the Week: voice agents as personal tour guides

20:00 – Gemini’s financial report guessing game

23:00 – Late Stage Startup Bingo: Aussie AI edition

28:30 – The batch processing decision that made Heidi better

31:10 – Why 2% better quality can mean 30–40% more adoption

35:45 – How non-technical founders can get AI-literate fast

40:30 – Should you fine-tune models early on?

44:30 – RAG vs. unlimited context: what comes next

49:10 – The dream of lifelong medical memory

50:50 – Rapid Fire: B2B SaaS, AI agents & Aussie gov support

Resources

🌐 Heidi Health: https://heidihealth.com

🔗 Dr. Tom Kelly on LinkedIn: https://www.linkedin.com/in/tomkeykong/

In the Blink of AI is made possible by our wonderful partners.

- Vanta: Vanta is the all-in-one solution for startups to become compliant quickly and build a security foundation with ease.

Startup customers get $1000 off Vanta at http://vanta.com/blink

✨ Connect with Georgie Healy

Linkedin: https://www.linkedin.com/in/georginahealy/

Instagram: https://www.instagram.com/georgina_healy/

Twitter: https://x.com/georgina__healy?lang=en

The Day One Network

In The Blink of AI is part of Day One, the podcast network dedicated to founders, operators & investors.

Mentioned in this episode:

Vanta Ad_BAI Jul25


Georgie: 00:00

What are some critical decisions you made when building Heidi from a technical standpoint?

Tom Kelly: 00:06

If I have to, as a doctor, feel like I can’t trust the output and have to review every single word, every single note, and key facts are wrong, then it very quickly stops being useful.

Georgie: 00:17

Year of the AI agent. Agents, agents, they’re changing our lives. Tom, or are they? What do you think they can do, and what can the everyday person actually get from an agent?

Tom Kelly: 00:27

If Heidi is valuable, I, as a doctor, shouldn’t have to edit it much. It should basically read my mind and write what I would’ve written in the room.

Georgie: 00:35

What kind of AI startups may not survive due to the type of problem they’re solving, or just a very outdated approach?

Tom Kelly: 00:43

All B2B SaaS.

Georgie: 00:44

Yeah. What’s a must-do if you’re building an AI-powered company like Heidi Health? Hello, and welcome to In the Blink of AI, your weekly front-row seat to the AI revolution. I’m Georgie Healy, and this week I’m speaking to Dr Thomas Kelly, the founder of Heidi Health. They’re the AI startup that’s growing faster than Canva.

And on the show, we talk about why accurate medical notes are crucial, the future of how health records are interpreted, attention hacking, and personal brand as an AI founder. We also play an AI game, and if you wait till the end, you’ll get Tom’s surprisingly hot takes. I know this is going to be a fan favourite episode.

We’re keeping up momentum on the show, because next week is our first ever in-studio recording with a multimillionaire founder. The questions I asked actually made me uncomfortable, so subscribe now and it’ll be waiting for you next Friday. But for now, I can’t wait for you to hear from Tom from Heidi Health. Let’s dive in.

Tom Kelly: 01:55

You’re listening to a Day One FM show.

Georgie: 01:59

Hey Tom, thank you for joining In the Blink of AI. I’ve had you on my bucket list for some time, thrilled to have you! Look, Heidi Health is a household name. You’re on the front cover of the AFR, but just in case listeners aren’t aware of what Heidi Health is, can you please give us the elevator pitch?

Tom Kelly: 02:20

Thanks for having me, Georgie. Yeah, I’m hoping most Aussies have run into Heidi at this point, maybe if their GP or physio uses it or something like that. Heidi’s a piece of AI technology that listens into visits and turns clinical conversations into really good clinical notes.

Usually as clinicians, it’s tricky to manage everything in the room. You’ve got a patient, maybe patients, you’re trying to figure out the right questions, what diagnosis they might have, basically what to do next. And you’ve also got to document everything and create all the paperwork the patient needs, all at once.


So basically, if Heidi’s doing its job, they click stop at the end of the visit, Heidi instantly creates great notes, and they review the draft, put it in the record, and that’s it. Fortunately for us, it’s been really popular. Clinicians save a lot of time, and now it’s being used all around the world.

Georgie: 03:35

You guys have been incredibly successful. When you were a doctor, talk us through how much of the work was actually in that chat you were having with the patient versus the follow-up and the write-up of notes and things like that.

Tom Kelly: 03:52

I’d say probably about half of clinical time is spent either before or after visits, roughly in proportion to the time you spend with the patient. My experience was mostly in outpatient clinics. I was a surgical registrar, so we ran a lot of the clinics where patients come to hospital and wait to see the surgical team. Often there’d be three or four scans done beforehand, previous visits, old clinical notes from when they were in hospital, so there’s a lot of looking through the past and trying to figure out what’s going on.

In the room, every conversation you have, at a minimum, you have to write a clinical note and a letter back to the GP. If there’s anything else, you might need to create TAC, WorkCover forms, certificates of capacity, you name it, different kinds of forms. Generally, if it’s a half-hour visit, I’d probably have 15 to 20 minutes of prep and documentation to do. And very often, because it’s so busy and the clinic is only maybe four hours, you just see as many patients as you can and then do all your paperwork after the fact.

Georgie: 05:03

Wow.

Tom Kelly: 05:03

Which is bad quality of care, because you’re not likely to remember everything you said in the first visit. You’re doing your best, but often you’re missing key details or facts of the case. What we actually find is that Heidi’s notes are generally more accurate than manually written notes. It’s not because clinicians don’t do a good job, it’s just that they’re doing them after the fact, so they don’t quite remember everything that happened.

Georgie: 05:30

Oh my gosh. I’ll listen to an episode I recorded the week prior and be like, I don’t remember any of this.

Tom Kelly: 05:36

Yeah, exactly. Memory’s interesting: you remember the highlights and the lowlights, strong emotions, things that really imprint in your mind. You remember the key facts of the case, the main things, but you might miss a lot of details. Those are the things that make people great doctors, or perceived as great doctors: remembering that they have three kids, their names, their ages, all these little details you’d otherwise forget. There’s an increase in quality, but also, for patients, the perception of this amazing doctor who seems to remember everything about them, which is the best part of it for me.

Georgie: 06:14

We hear about bedside manner, or that feeling you get from the doctor. But if the notes suck, it’s like, well, I don’t just want to have a chat, I want to be diagnosed correctly and all that. Doctors are famous for terrible handwriting. What was your handwriting like, Tom? Did you try really hard to make it perfect?

Tom Kelly: 06:37

I have to say, I was an outlier, I actually had really nice handwriting, I think. Really! A lot of doctors from my generation had that millennial thing, highlighters, sticky notes, beautiful notes. I was similar, all these coloured highlighters from Drew Village, surgical anatomy pictures and things. But it was too slow, that was the problem.

Georgie: 07:04

That’d be great if you had three patients in a day, but…

Tom Kelly: 07:06

Exactly.

Georgie: 07:07

I was curious about this because when I recently booked in, I got an email notification saying in advance that the session would be transcribed by an AI tool. I don’t remember if they specifically called it an AI tool, but I thought to myself, I wonder if this is Heidi. Would it be you, and how many Australians inadvertently have been supported by Heidi Health, do you think?

Tom Kelly: 07:39

On the consent point, it’s really important. Out of the box, Heidi has a way to set up consent, we give patient explainers and forms in the waiting room and all sorts of things. So you should definitely expect the clinician to tell you they’re using Heidi, either beforehand, in the waiting room, or during the visit.

As for our reach, we don’t know for sure because we don’t capture any patient data ourselves, but we know how many visits we do a week. We’re almost doing 2 million visits a week now around the world. In Australia, it’s probably around 500,000, so over a year, that’s a lot, maybe 25 million visits. I think probably 20–30% of Australians have run into Heidi at this point.

Georgie: 08:33

That’s incredible. And just on a personal note, you seem to have quite the follower count, you’ve got 10,000 LinkedIn followers as a founder. Is that the game now? Do you need that personal brand and reach, or would you prefer to just stay behind the scenes? Or is it a bit of both?

Tom Kelly: 08:57

Definitely a bit of both. Early on, I found LinkedIn really useful for us, a lot of early adopters, or people looking at creative careers in medicine, are on LinkedIn. It was a good way to find our first users. I remember writing lots of different posts, trying to get people to try Heidi. As it’s grown, I’m still excited to share team achievements, milestones, interviews like this, because my job as CEO is to attract amazing talent and make sure they know this world-class AI story is coming out of Melbourne and Sydney and that they should join us. So for me, it’s mainly about employer brand and recruiting, not so much my own brand.

Georgie: 09:53

The reason I ask is, I’m increasingly of the view that time to market and the availability of incredible tools lets founders get up and running faster. So to differentiate, and for brand, having a social presence is more and more important, because everything else is democratised. Do you subscribe to that philosophy?

Tom Kelly: 10:30

I think so. It’s similar to a lot of media, podcasts like this, the loss of centralised media and channels, TV with less viewership than ever. It’s similar: you don’t have to do it, it’s just an option. There are different channels to get your product in front of people. For us, doctors aren’t typically a group you go directly to; usually people sell top-down to chief medical officers or practice owners, and doctors are an afterthought. We’re unusual in that we go directly to doctors. They might need their practice’s permission, but that was a deliberate strategy. Having been a doctor, I know they’re human, they like to use good products. For us, it was important to build a great brand. We even do a lot of performance advertising, directly to doctors, or pre-roll YouTube ads of me being dumped with notes and talking about the product.

Georgie: 11:46

I need to see this, this sounds good.

Tom Kelly: 11:49

Yeah, so for AI products, especially consumer-first ones, you should go where your users are. A lot of founders go to Twitter (X) because there are heaps of engineers and founders there. If your software is for founders or a productivity tool, that’s the place. For doctors, it’s tricky, LinkedIn, some Facebook groups, but ultimately, you have to have something great that gets word of mouth and spreads on its own.

I’ve seen companies out of YC in San Francisco do “attention hacking”, outlandish, scroll-stopping things just to get the name out there. It doesn’t even matter what they do.

Georgie: 12:49

Have you seen these founders who raised money from a16z? One of them hacked Amazon’s job interview tests, and then they got funding.

Tom Kelly: 13:03

Yeah, that’s exactly the one I was thinking of.

Georgie: 13:05

It’s not the way you’d expect to attract someone trustworthy! But this attention hacking is fascinating.

Tom Kelly: 13:19

I think it depends on your category. For us, trust, safety and privacy are key, so that would never work for us. But there’s no wrong answer, you just want people using your product, just to survive at a minimum. You need users to pay your team and start growing. Once you reach some steady state and word of mouth, then you can be more precise about channels and approach.

For us, and still today, I think brand is the most important thing in healthcare. It’s very networked, doctors talk to each other. The number of patients that have seen Heidi is wild, so it should spread quickly if you’re doing something right. But what the brand stands for and how it makes you feel, “time to care”, giving doctors and patients more time, higher quality of care, these are our brand values. We try to express them in all our channels and everything we do.

Georgie: 14:32

Thank you for unpacking that. That’s genuinely fascinating, and it’s great to have the healthcare perspective. Let’s dive a little deeper into AI. I was pleasantly surprised by how deep in the weeds you get; with the utmost respect, for an ex-doctor and non-technical founder, you seem very passionate about the technology.

Let’s start with something a bit fun, AI Hack of the Week. This is where you and I share a hack: a tool we like or a specific use case. Tom, what’s your hack of the week?

Tom Kelly: 15:20

My favourite thing is, whenever I’m in a new place, it doesn’t have to be a new city, it could just be a new suburb, any chatbot with voice mode. You can use ChatGPT, or Perplexity is pretty good. Just turn it on and there are different ways to do it. The problem with voice modes is ambient noise, if a siren goes by, it’ll suddenly start replying. So what I do is put headphones in, some with a physical mute button, so you can just mute. Walk around, and the intro prompt can be anything you want, “I’m walking around Amsterdam, I’ve never been here before, I’m going to tell you what I see, can you give me history, steer me around?” It does an amazing job. It’s like having a tour guide in your ear. You can use it for language tutoring, practising speeches, all sorts. The trick is muting, it doesn’t really work unless you can mute. Or just hold your phone like a walkie-talkie: open it up when you want to hear it, mute when you’re done. That’s my hack.

Georgie: 16:46

That’s a brilliant hack, Tom. I love it, it’s something everyone can use. Everyone travels and sees new places, and I agree, I’ve got small kids and sometimes I try using voice mode and then within a split second I’m interrupted by a three-year-old’s chatter! So I love that, thank you.

My hack of the week, a good hack’s a stolen hack, I find. This one’s stolen from my husband. He has this party trick, a really terrible party game, where he gets an annual report from a publicly listed company, say Apple, and takes the income statement. He’ll share it with someone in finance and get them to guess what company it is based on the numbers.

That’s how I used to do any hiring and things like that. Well, he’s been using Gemini: he doesn’t tell it anything, just uploads the income statement and asks it for insights and which company it thinks it is. And 100% of the time, it nails it, and he can debate profitability, cash flow and financial health with it. I know it sounds niche and crazy, but it’s a fascinating thing that AI can do.

Tom Kelly: 18:12

Yeah, it’s amazing, the things you can do now, it’s just wild.

Georgie: 18:21

I do feel bad for the graduates trying to get into these industries, because how do you do a better job than that? It’s quickly evolving.

This hack brings me to the next part, a more fun game, I’d argue: Late Stage Startup Bingo. I’ll share a hint about five different Series A or beyond startups, something you at Heidi can relate to as a later-stage startup. I’ll give a one-liner hint, and you tell me which startup you think it is. Ready?

Tom Kelly: 19:07

Yep, ready.

Georgie: 19:08

So, not an Aussie startup, they use AI to generate realistic human-like voices. Things like audiobooks, virtual assistants, those sorts of applications.

Tom Kelly: 19:21

Nice. There are a few of these, but the one I know best is ElevenLabs; I think they’re the most famous. There’s also an underrated one if you want something cheaper: Cartesia is very good. Not quite as uncanny valley, but it’s pretty good and cost-effective.

Georgie: 19:39

Have you used it for professional or personal use cases, and do you find them compelling?

Tom Kelly: 19:46

Yeah, we’ve explored voice quite a bit. At our last quarterly product roundup, we did a little early preview of some of our calling features where Heidi could have a voice-to-voice conversation with a patient, like this voice mode idea but about someone’s health. Some of them are just amazing. If you don’t tell someone and just play the recording, they probably wouldn’t even notice it’s AI.

There’s a trade-off, like with all these models. You can get amazing generative audio that’s indistinguishable from reality, but it’s expensive and a bit slow. For real-time use, generating replies and turning them into voice, there’s a quality trade-off. But I bet in a few years it’ll be a bit scary, we won’t even know. We’ll have to have voice fingerprinting or other biometrics, some way to prove it’s you before a conversation, because the voice alone won’t be enough. It could just be cloned, which is scary.

Georgie: 21:04

We did an episode a few weeks ago where the guest brought on their virtual AI. If I didn’t know, I would have thought I was having an interview with the actual person, I couldn’t tell the difference.

Tom Kelly: 21:17

It’s crazy, yeah. I know some podcasters now who just use an ElevenLabs voice for their ad reads and other things to save time. It’s perfectly on script and can do the intonation, the highs and the lows. It’s unreal.

Georgie: 21:32

You were on the Today Show on Sunday. That’s one thing you couldn’t have hacked with AI yet! Can confirm, this was you, Dr Tom Kelly on the show?

Tom Kelly: 21:44

Touch! The real background, I could touch it.

Georgie: 21:47

I could see it. Okay, what about this one, an AI-powered music generation platform?

Tom Kelly: 21:55

Okay, I’m not sure if it’s the one, but there’s one called Suno, S-U-N-O. That’s the one I know. Our head of product design, Kate, she’s a musician and loves Suno. She makes songs and all sorts of things, she’s super good at it.

Georgie: 22:12

That means a lot, actually, because I played with it briefly and thought it was incredible. I was curious what a musician would think.

Tom Kelly: 22:21

It’s the composition side. She plays guitar and sings, so it’s idea generation for songs for her. She wouldn’t replace actually performing, because that’s the fun part, but for idea generation, she loves it.

Georgie: 22:36

Not staring at a blank manuscript, you can jump right in.

Tom Kelly: 22:38

Exactly.

Georgie: 22:39

Alright, this company uses AI to power precise medical diagnostic solutions for radiology and pathology, aiming to help clinicians identify illnesses earlier. You nodded pretty early on at this.

Tom Kelly: 22:53

Yeah, I think Harrison, probably Harrison.ai, right? Do you know those guys? I’ve met them a couple of times. We both raised money from Blackbird, a few years apart, but I spoke to Angus, one of the founders, as part of Blackbird deciding to invest in us, got the shakedown.

Georgie: 23:13

Yeah, I bet that was fun.

Tom Kelly: 23:15

It was.

Georgie: 23:16

Well, it worked out for all of you, I’m quite sure. Okay, second last one: an AI agent builder and workflow automation platform, enabling businesses to create their own specialised AI workflows.

Tom Kelly: 23:32

Again, there are a few. But if it’s Aussie, probably the Relevance guys, yeah, Relevance. That’s the one I know.

Georgie: 23:41

You’re doing too well! I should have made these harder. Have you met them? We had Jacky on the show a few weeks back.

Tom Kelly: 23:50

I haven’t met them in person, but we’ve had some emails back and forth and explored using Relevance at Heidi.

Georgie: 23:58

You’ve got to move to Sydney, Tom.

Tom Kelly: 24:01

Yeah, I do. You really do. Melbourne’s more lonely.

Georgie: 24:05

I wouldn’t say you have to move here, but the ecosystem here is quite tight, everyone’s met each other now, it’s quite nice.

Tom Kelly: 24:15

Yeah, for sure.

Georgie: 24:17

Last one: an AI-powered platform for generating high-fidelity images, empowering creators in gaming, architecture, and digital media. Aussie one.

Tom Kelly: 24:28

Got it, has to be Leonardo. Surely Leonardo. Love those guys.

Georgie: 24:37

Amazing, you nailed it. I knew you would, but that was a fun game, thank you. So, diving a bit more into AI technical 101, I’d love to know: what are some critical decisions you made when building Heidi from a technical standpoint? What’s a must-do if you’re building an AI-powered company like Heidi Health?

Tom Kelly: 25:06

There are different important parts, but probably the first is actually ignoring all the models, constraints and infrastructure, and just focusing on the end user. What’s the absolute best experience for the doctor, architect, or whoever you’re serving?

For example, a lot of products in our space use live transcription, like speech-to-text on a phone, or Google Voice to text, where you see it in real time. Whenever I see a product doing that, I know Heidi’s at least 30–40% better. We chose not to use live transcription. We still do real-time processing, batching audio and breaking it into chunks, but we don’t retain any of the audio as it’s processed.

There’s a clear split: live transcription is just returning the next word as it goes, without retaining memory of what’s come before. Batch processing retains memory of earlier parts of the conversation as it processes the next word. For example, if I say, “It’s really sore in my chest,” but it comes out strangely or the audio breaks, and later I say, “I’m finding it hard to breathe around my ribs,” batch transcription can link the two and understand it’s about the chest, thanks to context.

So, it’s just more accurate to do batch transcribing, it makes Heidi much higher quality, with a lower word error rate. We actually tested it a lot ourselves and found a huge difference. Once you’re confident about a choice, then you do all the infrastructure, compliance and security work. For us, we don’t retain recordings of sessions, which is essential for compliance. We have to process things live, but with tiny batches.

We often get requests for live transcripts or speaker labels, and we could do it, but it would meaningfully reduce the output quality, which we’re not willing to compromise. Hopefully one day live transcription will be as good, but for now, we create a live experience with better processing behind the scenes, batch processing.
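To make the live-versus-batch distinction concrete, here is a minimal Python sketch. It is not Heidi’s implementation: `model` is a hypothetical stand-in for any speech-to-text model, and the window size of three chunks is an arbitrary assumption. The only difference between the two functions is what context each decode sees; the batch version also keeps just a short rolling buffer of raw audio rather than a full recording.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Sequence

# Hypothetical model interface (not a real library API): takes the audio to
# decode plus the text already produced, returns text for the newest chunk.
Transcriber = Callable[[Sequence[bytes], Sequence[str]], str]


def live_transcribe(chunks: Iterable[bytes], model: Transcriber) -> Iterator[str]:
    """Streaming style: each chunk is decoded on its own, with no history."""
    for chunk in chunks:
        yield model([chunk], [])


def batch_transcribe(
    chunks: Iterable[bytes], model: Transcriber, window: int = 3
) -> Iterator[str]:
    """Small-batch style: each chunk is decoded alongside recent audio and all
    text produced so far, so each decision can draw on earlier parts of the
    conversation.

    Only the last `window` chunks of raw audio are held; older audio is
    discarded as soon as it leaves the buffer.
    """
    recent_audio: Deque[bytes] = deque(maxlen=window)
    history: List[str] = []
    for chunk in chunks:
        recent_audio.append(chunk)
        text = model(list(recent_audio), list(history))
        history.append(text)
        yield text
```

With a fake model that just reports how much context it received, the live version always sees one chunk and no history, while the batch version sees up to three chunks of audio plus every earlier segment of text. The design trade-off is the one Tom describes: more context per decode means better accuracy, at the cost of not showing words the instant they are spoken.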

Georgie: 28:34

Batch processing. How early in building Heidi did you start A/B testing different techniques?

Tom Kelly: 28:45

For us, if you’re building for a specific industry, the founders don’t necessarily have to be from that industry, but you do need a group of people willing to be alpha testers and give early feedback. We tested mostly on ourselves, I’m a doctor, Kieran’s a doctor, Mo runs product and is a doctor, so we’d just do the sessions ourselves, and it was night and day which approach was better.

With our product, quality and accuracy are non-linear: if you’re 2% better in transcript quality, maybe 30–40% more doctors will like Heidi. A small improvement makes a huge impact on adoption and retention.

Georgie: 29:35

Is that because doctors are so focused on quality, or is it universal across customers?

Tom Kelly: 29:44

It’s because of the value, if Heidi’s valuable, as a doctor I shouldn’t have to edit it much. It should basically read my mind and write what I would’ve written in the room. To do that, you have to be very accurate on the transcript. If you’re not, you risk making mistakes with patient names or key facts. If I can’t trust the output and have to review every word and note because key facts are wrong, it stops being useful.

Georgie: 30:25

Forget it. Forget the whole thing.

Tom Kelly: 30:27

Exactly.

Georgie: 30:28

This brings me to my next question, I love that you enjoy getting nerdy about AI technical aspects. What’s critical for a non-technical founder to wrap their head around when building an AI product? I know you could get a software engineer or CTO to do this, but I really recommend you don’t.

Tom Kelly: 30:49

Now more than ever, there are so many tools for non-technical founders. When I was trying to build early versions of Heidi in 2018–19, there was no low-code, no ChatGPT, you just had to learn from books and watch courses on YouTube.

Georgie: 31:10

Yeah, CS50 is a good one, it’s Harvard’s intro computer science, and it’s completely free online.

Tom Kelly: 31:11

Exactly. For non-technical founders, you need to learn the basics of software engineering: how a database works, what a REST API is, how the front end and back end work together, just so you can understand complexity and how to size up tasks. You have to get in sync with your technical co-founder.

For AI specifically, I highly recommend Andrej Karpathy’s videos; he was head of AI at Tesla and worked at OpenAI. He has courses where he rebuilds GPT-2 from scratch and explains how it all works. That helps you build intuition about what models are good at, what they’re not, and where things are heading. As a non-technical founder, especially as CEO, your job is to forecast and point the company in the right direction for the next few years, so you need a perspective on what’s going to happen and why your company will still exist.

Georgie: 33:10

I’m going to look those up after this, thank you! Say you’ve watched all the videos and read a textbook, and you think you understand the basics of infrastructure. Are you still picking a model off the shelf and personalising it later, or do you think you need to start from the ground up and work with an engineer?

Tom Kelly: 33:40

I always go back to the end user. If you can’t get something useful without fine-tuning or building it yourself, you’re probably in a bit of trouble, unless it’s something really specific. Models are so general and powerful that for almost any use case, you should get some utility out of the box. You want feedback like “it works well 40–50% of the time, but there’s a gap”.

For some domains, like chemistry and molecular design, you do have to build from scratch, but for things like professional productivity, or something like Heidi in another industry, my advice is start with state-of-the-art models. Don’t try to cost-optimise before you have product-market fit; you can always solve cost and pricing later (within reason, you still need money to run the business). That’s what we did. When we had the free version of Heidi last year, we gave away the absolute best models, not cheap, but it meant clinicians got the best experience.

There’s extra complexity for us with compliance and privacy, running models in different regions, not every state-of-the-art model is available everywhere. But for founders starting out, I’d use the best models you can afford, and only worry about fine-tuning and training for cost optimisation, not for quality. Fine-tuning is more about making things cheaper or shorter, not about massive quality leaps, except in some cases.

Georgie: 37:09

I love unpacking how a founder got from where they are now to being as successful as you, and how you’d suggest founders starting out should navigate these waters. I’ve got some headline news for you to unpack, Tom. One: the year of the AI agent. Agents, agents, agents, they’re changing our lives. Tom, or are they? What do you think they can do, and what can the everyday person actually get from an agent?

Tom Kelly: 37:44

I think the best use of agents today is still research. I hope everyone’s tried Gemini Deep Research, or ChatGPT’s deep research product, definitely give them a go. Gemini might even be free. Basically, what’s happening is the AI can use tools, it can actually take next steps and hopefully do something useful on your behalf.

Last year, if you asked a question like “What are the best canals to see in Amsterdam?”, the model would just answer based on its training data. Now, with research agents, the model has the tool to search the web, make a plan, look up keywords, read websites, add that into context, and keep doing it until it hits a limit or decides it’s done. The results are often amazing.
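The research loop described here (plan a search, read the results into context, repeat until a limit is hit or the model decides it is done) can be sketched roughly as follows. The planner and search functions are hypothetical placeholders passed in as callables, not any vendor’s deep-research API, and `max_steps` stands in for whatever limit a real product enforces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ResearchState:
    question: str
    queries: List[str] = field(default_factory=list)  # searches issued so far
    context: List[str] = field(default_factory=list)  # text read into context


def research(
    question: str,
    plan_next_query: Callable[[ResearchState], Optional[str]],
    search: Callable[[str], List[str]],
    max_steps: int = 5,
) -> ResearchState:
    """Minimal agent loop: the planner picks the next search or returns None."""
    state = ResearchState(question)
    for _ in range(max_steps):              # hard limit on tool calls
        query = plan_next_query(state)      # model plans the next search...
        if query is None:                   # ...or decides it is done
            break
        state.queries.append(query)
        state.context.extend(search(query))  # results feed the next planning step
    return state
```

Note that the loop only ever knows what its searches returned, which is exactly where the completeness problem discussed next comes from.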

There is a bit of a Dunning–Kruger problem though…

Georgie: 39:16

The graph! I love this graph. Explain it to listeners, in words and hand gestures.

Tom Kelly: 39:22

Exactly, yeah. It’s basically about someone’s knowledge of an area. For example, if you’re an expert, let’s use Amsterdam, so you’re a tour guide there, you’d read that deep research result and say, “Oh, it’s missed all these amazing areas.” For an experienced person, your perception of news and research is generally that it’s not very good quality.

But for someone uninitiated, just a tourist there for a health conference, in my case, you think it’s amazing and detailed, because you’re not aware of all the information out there. So it always seems positive-sum to you; you think it’s amazing and complete, which can be problematic.

This is my link to our use case in medicine. I actually think a lot of those agent use cases are very tricky to do well, because you have to be complete. You can’t have false negatives, like not finding the right blood results, missing the research paper everyone cares about, or not finding the right resources. That can be catastrophic.

Georgie: 40:34

Right, very dangerous.

Tom Kelly: 40:35

Yeah, so we have a context feature, but we always put it on clinicians to select what they want to include in context, to upload it themselves. It’s such an intentional choice. We don’t automatically summarise the record or do pre-chart summaries without input from them, because the risk of a false negative is much worse.

In the visit, the clinician was there, they were present, so if Heidi makes a mistake, at least they were there and should review the notes before putting them in the system. For hidden summarisation tasks, it’s a bit dangerous. So agents today have this retrieval and search problem.

Again, not to get too technical, but it’s the same problem everyone has with RAG, retrieval-augmented generation. If you’ve ever used a RAG-based system, where a search is involved, the product actually devolves to the quality of the search. It’s like doing a Google search for something really obscure: you often just don’t find anything useful. The model says to you, “I couldn’t retrieve anything useful,” or “I could only retrieve this,” and the answer ends up basically useless.

Georgie: 41:50

Oh my gosh, so true. I remember using a RAG search for a shopping use case and thinking it was genius. I didn’t want to use a million filters; I wanted a dress that’s above the knee, blue, this size, whatever. But it was so overwhelmed…

Yeah, exactly.

That didn’t work either. So is RAG coming out of fashion, Tom? Are we not into RAG anymore?

Tom Kelly: 42:20

Okay, this is just what I suspect, not a novel opinion, but I think RAG is like not having enough RAM in the nineties to run a video game. It’s a weird constraint that will probably go away in 10 or 15 years. RAG is a way of working around the context window: you only have so much context.

Also, context windows have varying degrees of precision on retrieval. For example, with Gemini, which is the best at this, you can put a single sentence somewhere in the context, like “If you find this sentence, please include ‘Apple’ as your first word,” and then test whether the model retrieves it and how accurately. Models vary, but Gemini models are amazing; I think Google’s infrastructure is a huge advantage there.
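The test Tom describes is commonly called a needle-in-a-haystack evaluation. Here is a minimal sketch of such a harness, assuming `model` is any hypothetical function that takes a prompt string and returns the model’s reply. The filler text, depths and scoring rule are illustrative choices, not Heidi’s or Google’s actual methodology.

```python
from typing import Callable

NEEDLE = "If you find this sentence, please include 'Apple' as your first word."


def build_haystack(filler: list[str], depth: float, needle: str = NEEDLE) -> str:
    """Insert the needle at a relative depth: 0.0 is the start, 1.0 the end."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    position = round(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def passed(reply: str) -> bool:
    """The model passes if its first word is 'Apple' (case-insensitive)."""
    words = reply.split()
    return bool(words) and words[0].strip(".,;:!?\"'").lower() == "apple"


def sweep(model: Callable[[str], str], filler: list[str], depths: list[float]) -> dict[float, bool]:
    """Run one trial per needle depth and record pass or fail."""
    results = {}
    for depth in depths:
        prompt = build_haystack(filler, depth) + "\n\nSummarise the text above."
        results[depth] = passed(model(prompt))
    return results


if __name__ == "__main__":
    filler = [f"Filler sentence number {i}." for i in range(1000)]
    # Hypothetical stand-in for a model that only "sees" the first 2,000 characters.
    short_sighted = lambda p: "Apple, here is a summary." if NEEDLE in p[:2000] else "Here is a summary."
    print(sweep(short_sighted, filler, [0.0, 0.5, 1.0]))
```

Sweeping the depth matters because a model can find the needle near the start of its context and still miss it further in.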

Georgie: 43:09

We didn’t pay you to say that, no affiliation with the pod!

Tom Kelly: 43:14

Yeah, don’t worry, other models have their strengths too. But the reason I mention Google’s infrastructure is the actual hardware. The larger the context window, the more compute-intensive the query is, and making the context really precise is even more intensive. But if chips keep getting better, faster and cheaper, you could imagine a model with a hundred-million-token context: your whole life, everything you’ve ever done.
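To see why bigger context windows are more compute-intensive, here is some back-of-envelope arithmetic. It assumes standard dense self-attention, where every token is scored against every other token, so the work grows with the square of the context length. Production models, Gemini included, may use attention variants and hardware optimisations that aren’t public, so treat this as an illustration of the scaling trend, not a cost model.

```python
def attention_pairs(context_tokens: int) -> int:
    """Query-key scores computed by one dense self-attention head in one layer."""
    return context_tokens * context_tokens


def relative_cost(small: int, large: int) -> float:
    """How many times more pairwise attention work the larger context needs."""
    return attention_pairs(large) / attention_pairs(small)


if __name__ == "__main__":
    for n in (128_000, 1_000_000, 100_000_000):
        print(f"{n:>11,} tokens -> {attention_pairs(n):.1e} pairs per head per layer")
    # Going from a 1M-token to a 100M-token context multiplies this work by 10,000.
    print(relative_cost(1_000_000, 100_000_000))
```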

In that world, you wouldn’t need to retrieve data; you’d just put the whole medical record in context and let the model find what’s relevant.

Georgie: 43:58

Fascinating. So for the listener: say I had a leg injury 24 or 25 years ago. If it’s outside the scope of the context window, it won’t be taken into account if I get another knee injury now, and that could be a real issue, right?

Tom Kelly: 44:22

Exactly. Imagine you had metal hardware put in, screws and so on, and you present today with fevers, shivering, and weird spots on your hands. I might think you have sepsis, a bacterial infection causing clots, hence the spots and how sick you feel. If I asked an AI system today, “Is there anything in Georgie’s history that’s relevant to a bacterial infection?”, it would have to surface your previous fracture and hardware insertion.

There’s a medical link, but searching 25 years of documents to find it is really hard. The problem is that the search isn’t as good as the model. You can use embeddings and vector search, but these have existed for years; we’ve all put long queries into Google. It comes down to whether you find that bit of the record. If not, Heidi will just say, “I don’t think I found anything relevant,” which might not be dangerous in that case, but could be in others.
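A toy illustration of the false-negative problem. It assumes a bag-of-words cosine similarity as a stand-in for real embeddings, and uses a made-up three-entry record whose dates and wording are invented. The `k` and `min_score` defaults are arbitrary. Because the fracture note shares no words with the query, the retriever never surfaces it, even though the clinical link is there. A learned embedding model may do better, but as Tom says, the answer is still only as good as the search.

```python
import math
import re
from collections import Counter


def vectorise(text: str) -> Counter:
    """Bag-of-words counts: a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, records: list[str], k: int = 2, min_score: float = 0.1) -> list[str]:
    """Return up to k records whose similarity to the query clears min_score."""
    q = vectorise(query)
    scored = sorted(((cosine(q, vectorise(r)), r) for r in records), reverse=True)
    return [r for score, r in scored[:k] if score >= min_score]


# Invented example entries, not real patient data.
SAMPLE_RECORD = [
    "2000: open tibial fracture, metal plate and screws inserted",
    "2023: viral upper respiratory infection, rest advised",
    "2024: blood test for bacterial infection markers normal",
]

if __name__ == "__main__":
    # The fracture-and-screws entry scores 0.0 against this query, so it is never returned.
    for hit in retrieve("anything relevant to a bacterial infection", SAMPLE_RECORD):
        print(hit)
    if not retrieve("sepsis risk", SAMPLE_RECORD):
        print("I couldn't retrieve anything useful.")
```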

The real issue is reliance. In practice, a doctor or resident would go through your whole history and look at every interaction you’ve ever had. To be fair, if it’s that long, they probably wouldn’t find it either.

Georgie: 46:05

Yeah, I know. Worse off, I guess, but yeah.

Tom Kelly: 46:08

Exactly. So for us, as we push the bleeding edge on different use cases, existing practice is the standard we have to test against. It’s the classic Hippocratic Oath: are we causing harm? We shouldn’t cause damage. It shouldn’t be worse than current standard practice.

As long as doctors are taught and understand the high risk of false negatives, and know they ultimately still have to do the search themselves, it’s probably safe to release, but heavily caveated. I look forward to a future with unlimited context windows that don’t break the bank, where you can put all that information in and let the model do the searching. Then you’d get a system as powerful as a doctor, or better, making critical associations.

That’s the trajectory we’re on: as hardware gets better and models get cheaper and smaller, these amazing things become possible. For individuals, it means an infinite memory of everything you’ve done at work, and self-improving AI systems become possible once you can give the AI its own memory. Today it’s not feasible: if every conversation I’d ever had with ChatGPT were in its memory, it’d break the bank.

So yeah, it’s going to be an interesting world as it goes forward.

Georgie: 47:51

So, before we get to rapid fire: I’ve got a three-year-old daughter with a dairy intolerance. When she’s had X-rays, it’s a mess if she’s ever had dairy. It’s a lot of pressure on me as her mum: when she’s 28, maybe the intolerance is gone, but she might have a stomach issue and I wouldn’t know. What’s the best-case scenario that AI and healthcare could provide for someone like her, after 25+ years of doctor visits?

Tom Kelly: 48:29

I think it’ll be critical to have great, interoperable access to everyone’s data. For the uninitiated, that means I should be able to go to my health record and instantly pull every visit I’ve ever had. If it’s paper-based from the nineties, fine, but whatever exists in digital form should go with me. I should have access, my next doctor should have access, and every medical record system should integrate and get that history.

I think AI will push that to become standard, because historically it wasn’t useful: if I get 20 years of records as a doctor, what am I going to do with them? I don’t have time to read every page. But an AI system like Heidi could process it and safety-net the clinician, doing things they can’t: literally reading every single line, every blood result, every investigation. Then it’s not your responsibility, Georgie; the record follows your daughter, so the next time she’s seen, the doctor can have that history surfaced to them.

Hopefully that’s the world we get to. Governments will need to play along, but as Heidi, we can help too. If clinicians are transcribing conversations, maybe you can get a summary on your side and collect these over time to share with the next doctor, a living memory of what happened, which is really useful.

Georgie: 50:32

I’ll sleep better at night when you build that, Tom. I’m looking forward to the future of Heidi Health. We’re at the rapid fire questions. Are you ready for the spiciest, hottest takes of the episode? Ready?

Okay, you have to pick one. Your next hire for Heidi Health: at this stage, what’s most important, a medical background, an AI background, or a sales background?

Tom Kelly: 50:53

AI background.

Georgie: 50:55

Amazing. What’s one bit of criticism Heidi Health has had that’s actually fair?

Tom Kelly: 51:03

Our templates are a bit too hard to make. You almost have to be a prompt engineer to make great templates for really specific things, and some doctors find it hard, which we know about.

Georgie: 51:17

Honest and fair answer. What kind of AI startups may not survive due to the problem they’re solving, or a very outdated approach?

Tom Kelly: 51:28

Pretty much all personal productivity. Actually, this is the hottest take: all B2B SaaS.

Georgie: 51:39

Don’t tell a B2B SaaS investor!

Tom Kelly: 51:42

Well, with one exception: I’d say all B2B SaaS that’s a thin business-logic platform is in trouble. Anything regulated, like Heidi, FinTech, or other tricky areas, is probably fine. That’s why I sleep easy.

Georgie: 51:58

I should have done a whole episode of hot takes! These are great. How could the Australian government better support AI startups, Tom?

Tom Kelly: 52:06

I think Australia does pretty well, and I want to give them some credit for things like R&D tax rebates. I’d love to see more industry programs with universities, maybe more formal grad programs or pathways into companies like Heidi, Harrison, Leonardo, etc. They do engage with us, but more engagement as they create policy would be good. AI could have a transformational effect, so I hope they consider us in their plans.

Georgie: 52:44

Yeah, why aren’t they talking to the people building things? That’d be great! Last question: you’re stuck on a desert island, let’s hope it never happens, and you have to choose between AI or a human to bail you out. Which are you choosing?

Tom Kelly: 53:02

Ah, today? Definitely a human. Maybe if it was an embodied robot that could get energy from the sun and didn’t need feeding and was as strong as me, maybe then I’d pick AI. But for now, I’ll take the human.

Georgie: 53:18

I feel like the human might eat me! I don’t know if this is a reasonable fear, but I’d be delicious. What are they doing? Why choose a human?

Tom Kelly: 53:34

No, I think I’m safe. No one would want to eat me, it’s fine.

Georgie: 53:37

Okay, good. I guess that’s like an if-then diagram: if delicious, choose AI!

Exactly.

Tom, you’ve been such a great sport. I could have spoken to you for another three hours. I love the way you think about the future of AI and how you’re trying to solve a problem that affects everyone: making doctor consultations a better experience for both doctors and patients. Thank you for being on In the Blink of AI. Any shout outs for our listeners?

Tom Kelly: 54:09

If you see Heidi in a doctor’s surgery, think of us! Be excited: it means you’re going to get better quality care. If you’re looking to join a company in AI, we’re hiring a lot, especially for AI engineering and also for sales. Lots of people want to use Heidi, so helping them out is the easiest sell in the world. You can find us at heidihealth.com; hope to see you there.

Georgie: 54:36

Thank you so much.

Tom Kelly: 54:37

Thank you.
