Canary Speech
Episode 91 • 24th November 2025 • Tech Talk with Amit & Rinat • Amit Sarkar & Rinat Malik
Duration: 00:58:19


Shownotes

In this special episode of Tech Talk, Henry O'Connell, co-founder and CEO of Canary Speech, discusses the inception, development, and applications of the company. Canary Speech specializes in using AI to analyse speech for diagnosing medical conditions. Henry recounts his collaboration with Jeff Adams, a pioneer in natural language processing, and their combined efforts to revolutionize healthcare diagnostics through vocal biomarkers. The discussion highlights the reliability, security, and ethical considerations of their technology. Henry also delves into potential uses beyond clinical settings, the importance of accurate and bias-free machine learning models, and future expansions into different languages and childhood conditions like ADHD and autism. The conversation emphasizes the transformative potential of AI in enhancing medical diagnostics and clinician support.

Transcripts

Rinat:

Hi everyone. Welcome to another episode of Tech Talk. This is a special episode because we have a guest with us, Henry O'Connell. And we're gonna talk about the company that he co-founded and is CEO of, Canary Speech. Very excited to know more about it, because it relates to one of the hottest topics of the day: AI and AI-based applications.

Rinat:

Very excited to know more about what Canary Speech does and how it impacts our society. Thank you Henry for coming to this show. And I'll open the floor to you with just the introductory question. What is Canary Speech and how did you come up with this idea?

Henry:

First of all, I appreciate being here with both of you, and to have an opportunity to communicate with each other. And I love talking about Canary Speech. 40 years ago I was at the National Institutes of Health as a researcher in a neurology group.

Henry:

I met a gentleman, Jeff Adams. He was at the NSA at the time, building mathematical models to decrypt spy messages. Now, 40 years ago, speech and language technology did not exist, and Jeff was building models to explore encrypted messages and decode them during the Cold War. Both of us stayed around five years in government service, and then we left and went on to business careers.

Henry:

I pursued a career in the medical technology area, and over 20-plus years I worked as a turnaround CEO; I would go into a distressed company. Jeff had an incredible opportunity and an incredible career. After he left the federal government, he went to work with a gentleman named Ray Kurzweil. While working with Ray Kurzweil, he had the opportunity to develop the first natural language processing tools, some of the key tools available in speech and language: taking speech and creating a textual representation of it.

Henry:

He led the team that built the Dragon NaturallySpeaking medical dictation software. He then joined a company named YAP, where he built the same kind of platform for legal dictation. Along the way, he did a great deal of work on the fundamental tools that are available to us today and are still in practical use across the world.

Henry:

About 15 years ago, Amazon was looking for a team to build probably the most exciting speech consumer product in the world. They decided that the best avenue to successfully build that product was to buy the company Jeff was in. So they purchased that company. They got Jeff and 17, 18 other individuals, speech and language scientists and a patent lawyer with a PhD in machine learning.

Henry:

That became the team that, three years later, launched the Amazon Echo, which is one of the most successful products in the speech platform area ever. So he finished up that work about 10 years ago, and nine years ago we were at a bagel shop down the street here in Provo, Utah, talking about an opportunity to do something together.

Henry:

And I asked him as is the case with all really brilliant people, I said, what's left on your table? What have you really wanted to do that you haven't had the opportunity or the time to execute on? And he said I've always wanted to apply the use of speech to the identification of human condition and disease.

Henry:

And I said, why don't we pursue that? And what was born out of that is Canary Speech. So it was in a bagel shop just about a mile from here in Provo, Utah, that we had a conversation we thought would last the length of lunch, and it lasted eight hours. And that forged the commitment the two of us had to create Canary Speech and to create a product that could impact health and medical conditions, and provide tools for clinicians and clinical team members that didn't exist yet in the world.

Rinat:

What an incredible journey, and the story is very inspiring. I always love hearing startup stories, and hopefully someone in our audience will be inspired to see that the conversations we have in our daily lives can sometimes generate really innovative ideas and change the world in a positive way.

Rinat:

When I came to know about Canary Speech and the way it's trying to make a difference in the world of AI, well, obviously AI has been researched and worked on since long before 2023, when ChatGPT popularized the idea of AI. A lot of people don't know that, but many people like yourselves have been working on generating text from audio and things like that. I've also worked on some projects to digitize handwritten text, and those things were being worked on many years ago.

Rinat:

With the advent of AI, there is now an AI for everything, and a lot of the time those things don't really add much value to society; it's a lot of hype. What I've seen in Canary Speech is that, in med tech, it is actually trying to make a difference. When I read about it, I wanted to understand a little bit more about what the product does. Please tell us, what does Canary Speech do? What are the main products, and how does it work?

Henry:

Oh, certainly, and thank you. After Jeff created commercially available NLP products, he provided tools that people could use both for research and to explore the application of those tools across different marketplaces and applications: in healthcare, in analysis and analytics, in financial applications, a whole range of them, applied in a range of different places. They make it practical for us to speak fluently, create a textual record of that speech, and then search that textual record. And so for 35 years after he created these tools, many very intelligent people attempted to apply the analysis of the words that people speak to diseases.

Henry:

One of the groundbreaking pieces of that work was done at MIT, where they looked at Agatha Christie's novels, 40 years of them, a very rich data set. Agatha passed away with Alzheimer's disease. They were able to go back into her novels, a rich textual record that they digitized, and search for changes in how she created sentences and in her use of language and words.

Henry:

And they were able to pinpoint where they believed her Alzheimer's started and how her language usage decreased over time. They weren't able to create a practical product from that. And the reason is that most of us don't have 40 years of novels when we go to a clinician or a clinic and they don't have the resources or capability to analyze 40 years of my written word.

Henry:

So we looked at that, and after 35 years there were no products in the industry. Now, one of the things that we understood was that the textual level is a representation of multiple layers of data underneath. That data is much richer, much denser, and much more closely associated with the central nervous system's creation of language.

Henry:

And if there is damage in our body to our central nervous system, to our brain, or to other components like our vocal cords, our ability to create those words is going to be impacted. And that can be read in real time. Clinicians and clinical people do this every day when they're sitting with a patient. In that bagel shop that I mentioned earlier, after Jeff described the history of what was going on and how the work had proceeded, I said to him, that's funny. He knows my kids; we've known each other for 40 years. My kids were grown; most of them had completed college by then. And I said, when my daughter Caitlin was still in high school, she would come home from soccer practice.

Henry:

And as she walked across the room, I could tell just from her walk, her gait, whether it was a good day or a bad day. When she turned and looked at me and I could see her face and her body language, I got a great deal more understanding of how rough the day may have been when I would say to her how was your day, honey?

Henry:

I got the same answer that every parent gets in any part of the world. "It was fine" as she continued to strut away, and I would always say to her, no. I said, this is a moment for us to talk. And she would come back and we would chat and I'd find out whether it was algebra that day or whether it was her soccer coach.

Henry:

I'd find out a little bit more about her life. And I said to Jeff, I said, Jeff, she no longer lives with me. She's married, she has her first child and on my best day, I would still have her in my home, but that's not what parents get to do. And I said, but I will admit to calling her at least twice a week.

Henry:

And when I call her, I don't have these physical cues anymore. I can't see her walking, I can't see her face, I can't see her shoulders or her strutting. And it doesn't matter the words she's speaking, it doesn't matter at all. But within moments, and it's irritatingly accurate, I know if it's a good day or a bad day.

Henry:

I know if she's happy or she's sad or she's depressed or she's anxious or she's surprised about something. And I said to Jeff, how are we doing that? And he said I don't know. And I said to him clinicians do that every day. They sit with a patient and a patient says, "I'm doing fine." And they know they're not.

Henry:

The doctor knows they're not, and so they pursue it. I said to Jeff, if we could create a tool that they could hold in their hand, that gave them an objective measurement to support what they were observing, that could be a very powerful addition to the clinical toolbox. So that's how this story got started.

Henry:

And Canary Speech doesn't look at the words. In fact, we ignore the words that people are speaking, just like I did that day and on the many days since then on the phone with my daughter. But our "brain," our machine learning tool, is listening to how language is created. We don't consciously comprehend that this is going on, but we're doing it all the time. And so we listen. We found that there are 2,548 features in speech, and every 10 milliseconds we're reading that entire feature set. We do that for 40 seconds, and we have 12.5 million data elements. Our brain has already picked all of those up, but now we're doing it with a tool that's dispassionate.
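The counting here can be sketched quickly. This is only the frame arithmetic Henry describes, using the figures he quotes (2,548 features, a 10 ms reading interval, a 40-second sample); the actual Canary Speech feature set and pipeline are proprietary, and the function name below is hypothetical. With these exact figures the sketch lands near ten million elements; overlapping windows or additional derived features would account for a higher total such as the 12.5 million he cites.

```python
# Hypothetical sketch of the sampling arithmetic: one full feature
# vector is read per 10 ms hop, over a 40-second speech sample.

def biomarker_element_count(n_features: int, duration_s: float, hop_ms: float) -> int:
    """Number of raw data elements: one feature vector per hop."""
    frames = int(duration_s * 1000 / hop_ms)
    return frames * n_features

frames = int(40 * 1000 / 10)                      # 4,000 feature vectors in 40 s
elements = biomarker_element_count(2548, 40, 10)  # 2,548 features per vector
print(frames, elements)                           # 4000 10192000
```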

Henry:

It doesn't engage itself in making judgements. It's only looking at the data. And then we balanced that and developed correlations with various diseases, working with clinical teams at some of the most prestigious places on earth: Harvard, Beth Israel, NYU, Hackensack Meridian, the National Institutes of Health in Japan, Tallaght Hospital in Dublin, Ireland, Ulster Hospital in Belfast, and many others.

Henry:

And we've had the opportunity to correlate the diagnosis that a doctor was making with these vocal biomarkers from speech, 12.5 million data elements, which we process in a streaming, real-time mode. And while the doctor is still talking to the patient, they get a response on behavioral health or cognitive health, or whether or not the patient has Parkinson's disease or MS.

Henry:

And all that information is flowing back to them while they're still talking with the patient. And that provides them with objective information that helps them manage their engagement and interaction with the patient. So that's where Canary started. It was that aha moment where you think: we've got a machine learning tool doing this; can I train an objective tool that supports the clinical team in what they're viewing and seeing, and helps them make a decision concerning the welfare and health of a patient?

Amit:

Incredible. Henry, I think I can relate to that story about your daughter, because sometimes, and I think we are all married here, when your wife says everything is fine, you know that it's not fine. And it's funny how you have taken a very simple event in your life and tried to drill down into what is happening behind that event.

Amit:

Okay, someone says, "I'm fine," but no, it's not fine. How do we come to that conclusion? How does our brain come to that conclusion? And I think the same thing is going on in my head while my son is growing up. My son is three and a half years old, and when he talks, I can understand what is bothering him or if he needs anything, etc.

Amit:

So it's incredible how, as human beings, we are able to do that. It's even more fascinating that you have taught a machine to do that, to help human beings, especially clinicians. So I think the next natural question is: how would someone use Canary Speech in real life? Is it software? Is it in a machine? Is it with a mic somewhere? How is the speech analyzed in the clinician's office, and is it just restricted to clinicians? Because while doing research on Canary Speech, I think what I realized is you can also use this in a 911 call; you can understand what the emergency is by analyzing the voice immediately, right?

Amit:

So someone calls emergency services, and you can analyze: okay, what is the caller going through? Is it actually an emergency? Is it something they're troubled by? Is it just that they are mentally impaired and have called accidentally, etc.? So just talk me through how it works in real life, and have you applied it to other areas outside of clinics?

Henry:

Sure. One of the wonderful things about a solution like this is that it's possible to capture that data in any conversation, because it is just conversational speech. So wherever we can capture audio, whether it's an engagement between a clinical person and a patient, a mother and a daughter on a telephone, or someone trying to purchase a product, we can do that.

Henry:

We capture that audio, and we take the algorithms that we developed in the clinical setting, which were peer reviewed and validated. Those algorithms provide us, once we have the audio, with the ability to analyze the presence and level of severity of something: the presence and severity of stress or anxiety, or the presence of Parkinson's disease and the extent to which it's progressed.

Henry:

We can do that kind of thing in real time. And it could be the stress that a young daughter is having, like mine coming home from school. She could monitor her own stress levels across time and, as a biofeedback mechanism, help manage that, and a parent can help her do that. With a doctor, what we're providing is, technically, a piece of information called clinical decision support. In the concept of clinical decision support, our information is not making a decision. The trained expertise and experience of a clinician arrives at a diagnosis, but they use a range of different tools.

Henry:

Our tool is an objective measurement. What we're measuring is not consciously decided. When you and I are talking, we're selecting words and our conversation is flowing. We're not consciously reviewing a stream of those words and going, yeah, use that one, and that one, and that one. It's not a conscious decision.

Henry:

It's controlled by our central nervous system. When there's a deficit, either anxiety, depression, or Huntington's or Parkinson's, some kind of a deficit in the system, our brain and our central nervous system work with the orchestra that creates language to make it as fluent as possible.

Henry:

But the path to do that has been altered by the disease. We're measuring that altered path, and we're associating it with a specific disease. So a practical use would be: a patient is in for an annual checkup with a physician. Instead of doing a GAD-7, the written general anxiety test, which is subjective, or a PHQ-9, its counterpart for depression, on the side, the doctor, while they're talking, gets a message back that says "elevated anxiety." They look at that, and it may reinforce what they have already observed, or it may cause them to think and focus more on that. So the next question they ask may be about how the patient's day is going: how busy have you been at work, and is everything all right at home?

Henry:

And their conversation then turns towards the patient. And we don't separate the clinician and the patient, because this is all being done in an ambient listening mode; we don't have to take the patient and put them over here to get the sample. We just listen to both of them speak. Now, we built a lot of AI into this, so that when a doctor and a patient are speaking, we're capable of capturing that audio through what is called an API. That API sends the audio up to the web, but on the way, as we're streaming that information, we separate the two voices.

Henry:

We're capable of doing speaker identification, not their names, but their roles: we know who's the doctor and who's the patient. We know when we have enough information. We know if the signal-to-noise ratio is not right, so we discard bad data and wait until we have good data, and then we process it in milliseconds and return it to the smart device or tablet in the doctor's hand.
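The quality gate described here can be sketched in a few lines. This is purely illustrative: the chunk format, the SNR threshold, and the 40-second target are assumptions, and `Chunk` and `usable_speech` are hypothetical names, not Canary Speech's real API. The idea is simply that noisy chunks are discarded and analysis waits until enough clean speech has accumulated.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    seconds: float
    snr_db: float   # estimated signal-to-noise ratio for this chunk

def usable_speech(chunks, min_snr_db=15.0, target_s=40.0):
    """Keep only clean chunks; report when enough speech has accumulated."""
    kept = 0.0
    for c in chunks:
        if c.snr_db >= min_snr_db:   # discard bad data, wait for good data
            kept += c.seconds
        if kept >= target_s:
            return True, kept
    return False, kept

stream = [Chunk(10, 20), Chunk(10, 5), Chunk(10, 18), Chunk(10, 22), Chunk(10, 19)]
ready, kept = usable_speech(stream)
print(ready, kept)   # True 40.0 (the 5 dB chunk was dropped)
```

In a real streaming system the SNR estimate and role separation would come from dedicated signal-processing and diarization models; the gate logic itself stays this simple.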

Henry:

We can do this with a number of companies; we're primarily B2B. We sell our products as a solution that augments or expands the capability of another platform, and we interface with them through this API, in a clinical environment or in a consumer environment in the home. We're also used by insurance companies, helping them plan for future care and evaluate the needs of patients.

Henry:

But all of this is done in a passive way. The hope was that we could create a product that would be a companion, a copilot, to the clinical team or the call center employee. As a copilot, you don't want to infringe on the relationship between the doctor and the patient; you want to be there passively. And we function at a data security level equal to any healthcare institution in the world. So we're a good partner: we handle data properly, with security and with integrity. And then we return scores, all de-identified, in real time. Now, in the United States, much of what we do is reimbursable by insurance through CPT codes. That provides support to the institution and provides them with additional information at the same time.

Henry:

So those are practical uses for the product. Oh, and I should mention, we do this in Japanese, Spanish, and English, and we're adding Dutch and Arabic right now, expanding our languages to address this across a broader geographic area. English has been tested in Irish English, British English, Canadian, American, and Australian. And what we're trying to do is make this a product that can augment the quality of care provided to individuals regardless of where they live in the world.

Rinat:

Wow. That is amazing. With each of the features you mentioned, I was impressed every time. And the way you are expanding into different languages, that's even more impressive, because the non-English-speaking part of the world is also huge and definitely worth reaching. Now, one of the things that I really like about having a tool is the objectivity it adds to the interaction between two people. A clinician, however expert and experienced, is still human; there is an inherent risk of added biases. Having a tool as a copilot hopefully helps them come to a more objective decision, which would definitely help with diagnosis. Now, the other thing you mentioned is security and how the whole thing is handled, and that's where my next curiosity is. With any kind of breakthrough technology, one of the first questions you think about is how it can be misused by a rogue agent. You've mentioned it's a B2B platform at the moment, but even with B2B, you never know which businesses might want to misuse it to do ethically or morally wrong things. These agents might do it for profit or whatever their agenda is, and it can also be misused by a government, applying it to a scenario where it shouldn't be applied. So what are the possible misuses that you've thought about, and what kind of protection do you have against those kinds of things?

Henry:

This is such a wonderful question, because as we move forward as a world, as a community, we are capable of creating solutions that could bring great benefit, and there's an equal opportunity to do harm. One of the things that we've seen with AI in general is the development of a range of different standards, ISO standards, that control the ethical use of a product, so that you remove biases from the creation of the algorithms that are measuring things; and also ISO standards and HITRUST environments and HITRUST audits that can demonstrate to a partner that the data they send you is being treated in an ethical way and in a secure way. So Canary Speech is audited both for HITRUST and for multiple ISO standards that apply to the use of data in a way that is fair, ethical, and secure. For instance, all the information that we gather from a client, a hospital or otherwise, is sent to us in a de-identified way, so the data we get doesn't have the identity of an individual on it.

Henry:

Now, we have a secure site running in a HITRUST environment, HIPAA compliant, and regularly tested for vulnerabilities and penetration. Even if someone were to penetrate a site, if you have separated the identity of the individual completely from the data itself, then it's less risky for the individual.

Henry:

Now, in working with different organizations, we primarily focus on the healthcare space, where the intent is equally yoked: we're yoked equally with a healthcare organization to bring benefit and quality of care to populations and communities. When you're working with other organizations that are bringing a product in, you have to explore with them: are they functioning with the same intent at the same demonstrated level? If they have gone through the effort, pulled the wagon, if you will, to get the security certifications, the ISO certifications and HITRUST, and opened their doors to be audited for that, then you're probably sitting in a room with a good partner.

Henry:

And those are the kind of things that are routine. You do that as a routine. Your quality systems demand that you perform and act in that way. I always tell people that we have the ability to expand into other languages and to do so reasonably quickly. But in healthcare, particularly in healthcare, you only do that after you have validated and had peer review.

Henry:

So when we're entering Abu Dhabi in the Arabic language, we're working with Cleveland Clinic Abu Dhabi, which has the responsibility and the credibility of acting at the highest level of professionalism. They align with us because they see the same when they look at us, and we align with them for the same reason.

Henry:

So we're very careful about honoring the obligation that we have to bring good. And we think that there is an enormous opportunity to bring good, but as you said, the danger is whenever that opportunity to bring good is so substantial, the potential to do harm is equally substantial. And that's where you have to have the procedures, the processes, the quality, and the securities in place so that you remain on the side of that wall that provides good.

Amit:

I've been listening so far and I have gained some understanding of the product, and I really think it's a very useful product because it acts as a copilot. But I wanted to understand the accuracy side of things, because when we talk about AI these days, we talk about hallucinations, we talk about accuracy, we talk about AI blurting out data that is incorrect, but saying it very confidently.

Amit:

So how do you counter that in your product, so that when the clinician is observing your data, they have at least, say, 80% or 90% confidence? Do you have something like a scoring mechanism, and have you gone through different trials? Because Rinat mentioned biases. You can have biases in the system, biases in the model or in the training data.

Amit:

So you will have some general accuracy in the output that you're receiving. What kind of mechanisms are in place to make sure that the accuracy is also conveyed to the clinician, so that they can make a more informed decision? It's not just an output, but an output with a score. This is my assumption; I don't know how it works behind the scenes with the clinician, so I'm just trying to figure out how it is and how you mitigate such things. Because when we talk about AI, we also talk about hallucinations.

Henry:

Intuitively, you're just spot on. You just described, in many ways, the processes that we follow. We developed an MCI (mild cognitive impairment) model with the National Institutes of Health in Osaka, Japan, and we just published in The Lancet. To get to the point where you have a population of individuals selected to participate in a study, there is a protocol that has been written and approved by both their medical team and our scientists, reviewed by their peers, and reviewed by an IRB, a human research trial board, independent of anything either of us is doing. All of those approvals get put in place, and then individuals are invited to participate voluntarily, and they do so under informed consent.

Henry:

All of those are the basis upon which every model, every algorithm we have ever created is derived. So that's ground one: the diagnosis, or the ground truth, is the experience and expertise of the specific trained physician group, and the review of their decision by peers, as you would do in a clinical setting.

Henry:

And it in fact is a clinical setting in which we do this. So if it's a neurological disease, these are neurologists, and they often specialize in Alzheimer's or Parkinson's or MS, whichever it is. We're working with that team, and then that gets reviewed. The data is collected along with audio, and then there's the machine learning process that we use, which is supported by 14 patents worldwide right now.

Henry:

And 12 pending. That process is then used to create the correlation with the physicians. Now, when we built our first Huntington's models with the Harvard Beth Israel neurology team, the accuracy was 98%. And when it's that high, you say to yourself: is this real? Or am I getting overfitting, or hallucinations, or something?

Henry:

And the Harvard team had some experience in speech and language, and our scientists, of course, have experience in speech and language, and we put all the data on the table and all of us looked at the data. There are about five different ways to look at whether or not you have hallucinations or you have overfitting of data.

Henry:

We ran the tests upside down and backwards, and determined that the data was not overfitted, but simply accurate. So we then took a blind study set and conducted a blind study, testing Canary against it, and we were able to reproduce the same accuracy in a blind study, a double-blind study. Those are standard scientific processes and procedures that you use.

Henry:

Now, the same is true of every single model we've ever done. The mild cognitive impairment model that we built in Japanese was worked on for a period of a few months. Data was collected in the same manner, with the same processes, procedures, and controls. And then, in both cases, we wrote papers: one with Harvard.

Henry:

We wrote a paper with the National Institutes in Japan. One was published at the American Neurological Society meeting as a podium presentation by the Harvard group; the other was published in The Lancet. To get those publications accepted and presented, they had to be reviewed by an independent medical board, so that the research and the procedures and processes used received a completely independent review.

Henry:

I always tell people the lift seems pretty heavy, but functioning at that level of integrity requires that lift. You can't get there by gathering data from Facebook; it's irresponsible to do that. A responsible, ethical individual lifts the weight required to meet the demands of the job. And that's the process we followed. Now, if you do that for every single model, someday you wake up, get out of bed in the morning, look back, and there are 14 different clinical models that have been built over a period of nine years.

Henry:

And now there are three languages that we have, and two more will be added this year. And you think to yourself: at the start, I didn't know that I could lift this more than once. But the truth is, we haven't been alone. Every single effort has had, yoked to that same wagon with us:

Henry:

Some clinical team at an institution like Harvard, like Beth Israel, like the Mayo Clinic. These are individuals that are trying to make a difference in the world, and they're equally yoked with us. We're trying to make a difference in the world, but you have to do it correctly.

Rinat:

Absolutely. That's actually really comforting to know, the level of thoroughness that went into bringing these products to fruition. One of the distinctions I want to understand is about the name, Canary Speech. We've talked about speech analysis, but at the same time it is audio.

Rinat:

Does the product analyze anything but speech, for example the ambient environment, to see what other kinds of information can be derived? There are parts of speech that we humans don't hear, like the pauses or the silences; those are also biomarkers, but they are part of speech. But in terms of random audio, collecting data that is not exactly speech, is there a distinction, and how does the product work on that front?

Henry:

I've said this earlier, but it's such an intuitively important question. When we built our first depression and anxiety models, we built them with three different clinical institutions in English. We then hired three separate CROs, contract research organizations, that collected data from all 50 US states, a cohort of about 6,000 individuals collected under a protocol.

Henry:

Conducted by a professional CRO team independent of Canary, capturing the audio. We did that because of many of the things that you just reviewed so well. When you're working in different clinics in different parts of the country or the world, you want to know that the noise in that environment is not negatively impacting or affecting the accurate measurement of this model.

Henry:

And so we collected it in all 50 states, with different accents, including speakers for whom English may be a second language; at that time we were working in English. And then you build that into your machine learning process, and so the algorithm that comes out of it, the resulting algorithm, is more robust.

Henry:

It's experienced different noise characteristics from different places. Now, Jeff led the team that built the Amazon Echo far-field speech capability, where you can be standing halfway across a room and tell the Amazon Echo to do something and it understands you, with noise bouncing all over the place. We knew the process by which that was developed, and we understood that the robustness of that process would make a more robust model.

Henry:

It would perform more routinely, more consistently across populations. So we did that. There's another thing in machine learning where you go out and get noise, or create artificial noise, and train your model with that as well. You do all of those types of things so that the model can function accurately and robustly across different environments, whether it's a clinic or a hospital and so on. And you encourage people to follow certain processes to ensure that the audio being captured has an appropriate signal-to-noise ratio. But then you use AI and build it in: in real time, you measure what those ratios are.
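The noise-augmentation step described here can be sketched as mixing recorded or synthetic noise into clean training audio at a chosen signal-to-noise ratio. This is a generic, illustrative technique, not Canary Speech's actual pipeline; the function name and the dB target are assumptions:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested SNR, then add it to `clean`."""
    clean = clean.astype(np.float64)
    noise = noise[: len(clean)].astype(np.float64)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise segment is silent")
    # Power the noise must have so that clean_power / noise_power == 10^(snr_db/10)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    return clean + noise * np.sqrt(target_noise_power / noise_power)
```

Training on copies of a corpus mixed at several SNRs (say 20, 10, and 5 dB) is a standard way to make a speech model tolerant of clinic and hospital noise.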

Henry:

And if the ratio falls out of the normal range, you alert the individual to take a new sample, just like you would if you took a blood sample and knew it was faulty; you go take a new blood sample. So we've also integrated AI tools into the process to alert individuals doing a test that they have a viable, reasonable sample to take a measurement on. And then you make the models independently as robust as possible. So that's a process that you follow. And I think it's consistent with the earlier question about delivering something that meets the measure of the day. You really, truly need to understand that you're working within healthcare, and that you're working with a group of individuals who not only have expertise and experience but have a desire to help.
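The real-time quality gate described here, flagging a sample for re-capture when its signal-to-noise ratio falls out of range, might look like this in outline. It is illustrative only; the 15 dB threshold and the use of a noise-only reference segment are assumptions, not Canary Speech's actual criteria:

```python
import numpy as np

def estimate_snr_db(speech: np.ndarray, noise: np.ndarray) -> float:
    """Estimate SNR in dB from a speech segment and a noise-only reference."""
    signal_power = np.mean(speech.astype(np.float64) ** 2)
    noise_power = np.mean(noise.astype(np.float64) ** 2)
    if noise_power == 0:
        return float("inf")
    return 10.0 * np.log10(signal_power / noise_power)

def sample_is_viable(speech: np.ndarray, noise: np.ndarray,
                     min_snr_db: float = 15.0) -> bool:
    """Gate a recording: ask the user to retake it if the SNR is below threshold."""
    return estimate_snr_db(speech, noise) >= min_snr_db
```

In practice the noise reference might come from the silence before the speaker starts, and the gate would run continuously so the retake prompt appears immediately, as in the blood-sample analogy above.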

Henry:

And the tool you give them needs to be built with the same ruggedness and robustness. And that's what we did. We just felt that to be a partner in the space, we needed to do that.

Amit:

Again, very interesting to learn about the measures and steps you've taken to ensure data integrity and handle ambient noise. While having this conversation with you, another question came into my head, and that was regarding the age of the voice.

Amit:

Children have different voices, adults have different voices, and when you go through puberty your voice changes. Normally, what we've seen in the UK (we're both based out of London) is a growing trend of mental health issues in the student population. Has Canary Speech thought about helping schools, and if yes, how do you differentiate between an adult voice and a child's voice, a toddler's voice, or even a student's voice?

Amit:

Because I'm pretty sure there are biomarkers to tell whether it's an adult or a child, and there will be different models involved for these kinds of voices.

Henry:

Honestly, these are the best questions I've ever been asked. I've had nine years to experience this, and you've had 40 minutes. So brilliant, honestly brilliant. There are two broad answers. Once an individual reaches adolescence, for both men and women, their voice becomes an adult voice, and from there on out the models are very consistent. We've trained them with, and used them with, people as young as 16, and the models are recommended for certain age groups all the way up to senior adults. As for younger children, for the first seven and a half years we didn't work with children under that age. We now have multiple childhood models under development. And the reason we didn't was that, in my life personally, there's a sensitivity and a respect for young children and their ability to give informed consent, to understand what they're doing. But the demand and the need are enormous: ADHD, autism, depression, suicide prevention, all of those areas are critical right now, at almost epidemic levels in our schools and among our children.

Henry:

So we're currently working with the CDC, Hackensack Meridian Health, an organization in Edinburgh, and an organization in Chicago, all of whom specialize in childhood illnesses. And we're currently working on ADHD, autism, and behavioral health models for that group. And the reason we chose those first is everything we've already talked about: the demand and the requirement for behavioral health and mental health care for our young children is desperate.

Henry:

To be able to make a product that could be shared between a mom and a daughter, could be used at home, could be used for self-managing their situation if they're old enough. For ADHD and autism, there are developmental issues of the brain in these young children such that, if they could be treated earlier, there can be significant correction of the problem. If they're not, and development progresses too far, certain conditions lock in and become lifelong. In the UK, the US, and other countries right now, the backlog for autism and ADHD is significant, and the primary care level lacks the tools to effectively identify the need for a referral.

Henry:

So we're working to develop a tool, again a companion, a co-pilot, for a primary care physician or a pediatrician that could accurately identify ADHD or autism, or identify elevated levels of depression or anxiety, at the primary care level, and to be able to take those tools home. Most of the triggers for ADHD or autism don't occur when a young person is with the doctor; they occur in the schoolroom, at home, or on the playing field. So it's about connecting the parent to where a trigger happens with their child, to identify it and help them avoid it, or to flag it for the clinical team so they can better understand the condition and make a referral.

Henry:

With the groups we're working with in the US, our intent is to build these models and then have them registered as standard support for diagnosis, so a referral could be made. Some of these children aren't forming words yet; some autistic children aren't forming words. And remember, we're not working on words, and I'll jump back to a question you asked earlier: we're not working primarily on words at all. We're working on how words are created, how sounds are created. So with an autistic child, we have a rich enough data set to be able, pre-speech, to identify very early that autism exists and get the child lined up in the process of diagnosis.

Henry:

The same is true of ADHD. And so those measurements are just critical. You asked about non-vocal biomarkers, things like pauses. In progressive neurological diseases like Parkinson's, you get a weakening of the vocal cords because of the disease, and you get an elongation of words.

Henry:

So we pay attention to that, because it can help us define what stage the Parkinson's patient is at. Um. I just used a filler word. Um, uh, I just did it again. A cognitively impaired individual can also use a lot more filler words, because they pause to think a little. We're a speech company; we pick up on those. We know they're there, and we build them into the algorithms. But for children, we waited until now, after seven and a half years of building our processes, because of our commitment to the respect, integrity, and security of their data. We wanted to be able to demonstrate, one, that this was working, and two, that we had the processes and quality systems in place to honor their care.

Henry:

And so we felt that after seven and a half years we could choose effective partners who are at the same level: pediatric organizations and specialty care facilities for ADHD and autism. So we're at the beginning point with this, but this year we'll have some models for children, and then we'll expand on that.

Rinat:

Wow. That is really amazing to hear. Now I have a technical question I've just thought of while trying to imagine how the product is being used. You mentioned it can analyze in real time while the clinician is talking to the patient. What about the human factor, in the sense that the clinician is also a human?

Rinat:

Is there a cutoff point when the patient is speaking? Or could the AI accidentally also pick up anxiety in the clinician? And what do we do with that? If the clinician, while talking, gets feedback on his or her own diagnosis, how would that scenario play out?

Henry:

So in an interaction between a clinician and a patient, or a clinical team member and a patient, our technology is very sophisticated at identifying the different roles. We know who the patient is, we know who the doctor is, and we are capturing, or could capture, both voices.

Henry:

'Cause they come in as one stream and we separate them. In a normal interaction, we take the clinician's voice and just throw it away. We have had some situations, under informed consent and with the approval of the clinical teams, where we looked generally at the anxiety and depression levels of the professionals.
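The separate-then-discard step described here can be sketched as filtering the output of a diarization stage by role. The `Segment` type and the role labels below are hypothetical; a real system would get them from a speaker-diarization model:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    start: float   # seconds into the recording
    end: float
    speaker: str   # role assigned by a prior diarization step

def keep_patient_segments(segments: List[Segment],
                          drop: str = "clinician") -> List[Segment]:
    """Throw away the clinician's turns; keep the patient's for analysis."""
    return [s for s in segments if s.speaker != drop]
```

Setting `drop` differently (or not at all) corresponds to the consented staff-wellness use case, where both voices are retained and analyzed.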

Henry:

Because of burnout situations, we can measure both voices simultaneously. Where that information goes is directed intelligently: the information about the patient goes back to the doctor, who looks at it, and it helps guide their interaction.

Henry:

That information is gathered only rarely (very few do this, almost none), and always with the informed consent and approval of the clinical team. The most reasonable application of it so far has been across all nurses in a department. If there are 150 nurses in a department, no single nurse is being measured, but the overall health of that nurse population is.

Henry:

So they're looking at whether it's elevated toward higher anxiety or depression, or whether it's stable, that kind of thing. From a community or population-evaluation standpoint, someone like the CDC could use this to look at the wellness of a whole community of people, just by engaging them to participate over the phone or over a smart device with an app.

Henry:

So those kinds of things can be done. But most frequently, 99% of the time, it's the patient only: we're only looking at the patient and providing information for the doctor. The other is an area that may expand and grow. We're working right now in intelligent patient rooms, where a monitoring system watches 24/7 whether the patient has fallen, or is in their bed or not.

Henry:

It also monitors compliance with certain hospital rules: does a clinical team member, nurse, or doctor clean their hands coming in and clean their hands going out? But at the request of hospitals, we're also measuring levels of aggression. In the US that's a major problem; nurses are being attacked by family members or patients on a daily basis.

Henry:

So we're capable, without the audio ever leaving the room (we're deployed within the room), of actually measuring levels of aggression just from audio and speech. And in that case we give a green, yellow, or red light. If it's a red light, you go in with someone, you try to de-escalate the situation and talk with the person.

Henry:

So it's a proactive approach to helping stabilize situations. When someone gets agitated in a room, it's because they're ill, they're sick, sometimes even terminally. In other cases, they just don't feel they know enough. Those are all things you can manage. You can go in and de-escalate a situation and help manage it if you're aware it's there.

Henry:

So that's another type of application for the technology within a healthcare environment: dealing with levels of aggression, identifying them early, and allowing proactive de-escalation of those situations.
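The green/yellow/red alerting described above reduces, in outline, to thresholding a continuous aggression score. The thresholds below are arbitrary placeholders for illustration, not values any deployed system is known to use:

```python
def aggression_light(score: float, yellow: float = 0.4, red: float = 0.7) -> str:
    """Map an aggression score in [0, 1] to a traffic-light alert level."""
    if score >= red:
        return "red"
    if score >= yellow:
        return "yellow"
    return "green"
```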

Rinat:

I know of another project that also deals with understanding various human biomarkers from video. For example, in a live video conversation, the tool analyzes pixel-by-pixel and frame-by-frame differences and can potentially measure the person's heart rate and give you more information based on that. And going back to the story you told us at the beginning about your daughter, how you would pick up on different cues that told you whether she'd had a good day or a bad day: that was based not just on speech but also on various visual cues.

Rinat:

So why limit the input to just audio? Or maybe you are already thinking about a richer input to the tool, so it can give you a richer output for making even better decisions. Is that something you are thinking of expanding into? And if not, why not?

Henry:

We do that already, to a small degree. For instance, when we were working with mild cognitive impairment, we built certain criteria about the patient into the algorithm: their age, their education level, simple things. Their gait analysis, if available, also became a function of the algorithm itself. By using what is called multimodal biomarker analysis, multiple modes of biomarkers, we were able to improve accuracy by about 12%. You can use a whole range of different things; facial analysis and retinal analysis are available even on smartphones now.
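One common way to combine a vocal-biomarker score with other modalities such as gait or age, in the spirit of the multimodal biomarker analysis described here, is a simple late fusion: weight each modality's score and squash the sum. The weights below are made up for illustration; a real system would learn them from data:

```python
import math

def fuse_modalities(scores: dict, weights: dict, bias: float = 0.0) -> float:
    """Toy late fusion: weighted sum of per-modality scores, mapped into (0, 1)
    with a logistic function so the result reads like a probability."""
    z = bias + sum(weights[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))
```

The design choice here is "late" fusion (combining per-modality outputs) as opposed to "early" fusion, where raw features from each modality are concatenated before a single model is trained.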

Henry:

They provide information about blood pressure, temperature, and many other things, even an estimate of the individual's age. We're working with a few companies right now to integrate what Canary Speech does into their platforms, so that we can merge these approaches into a single algorithm and enhance the understanding of the overall health and wellness of a patient.

Henry:

So those things are ongoing. There are some cases with certain diseases where voice alone, which is very easy to capture (it can be captured ambiently in a conversation like the one we're having), is, say, 98% accurate for that particular disease. That's pretty high. It's hard to imagine adding other modes of information and getting to 99%, for instance.

Henry:

And remember, this is not a diagnosis; this is clinical decision support information. The doctor is already reading what they're seeing, and this is potentially supporting that, or prompting them to ask additional questions. Where I think there will be additional value, and we're exploring it, is in understanding the progression of a disease.

Henry:

In Parkinson's, for instance, we can measure very early stage disease and watch as it progresses from mild to more severe. The transition from a phase-two to a phase-three Parkinson's patient is often marked by mild cognitive impairment or cognitive problems.

Henry:

So, similar to Alzheimer's disease. And we can measure when a patient has both Parkinson's and Alzheimer's characteristics, indicating that transition. It could also potentially be measured by the severity of the deterioration of their motor function: their skills, their hands, their voice, and their gait, their walking.

Amit:

I want to follow that up with a quote you mentioned, and I think it relates to what you just said, because sometimes, yes, you're right: if the accuracy is already very high, you don't need other modes. I was just reading the quote. You said, "Next to the human genome, speech is the richest data stream in the body." Was that a quote by you, Henry, or was it from Jeff?

Henry:

That one was mine. When you think about information, there are different types of information in the genome. Of course, the genome can tell us, for instance, that you both probably have brown or dark eyes. My eyes are green. If you had my alleles for eye color, yours would be green too.

Henry:

That's a biomarker. It doesn't matter where in the world it occurs: if you have that set of alleles, you're going to have that trait. However, the genome may also show early-onset Alzheimer's disease. That tells you you're likely to get early-onset Alzheimer's; it doesn't tell you when, or that it has happened.

Henry:

So the genome can tell you, looking forward, the probability of getting something. Even if the likelihood is a hundred percent, you don't know when it will occur. Huntington's disease is a dominant gene: you're going to get Huntington's. But does that happen at 28, or at 38, or 42? When does it manifest? What Canary Speech is capable of doing is measuring a nearly comparably large dataset: millions of data elements in a minute, 15 and a half million. Now, the genome is much denser than that, but very few things give you millions of data elements in a minute. Very few. Speech is one of the only ones, and so we have this very, very rich set of information.

Henry:

Second, I believe, only to the genome, and it does a different thing than the genome does. It tells you that you currently manifest a disease. The genome tells you you're 80% likely, or a hundred percent likely, to get the disease, but when that will happen is unknown. What Canary tells you is that it has happened.

Henry:

So that's the difference between these two sets, and why they augment each other. If you're scanning through a genome and notice an individual has the Huntington's alleles, you can then measure them with Canary Speech. Or, going the other way, while we're very accurate with both Parkinson's and Huntington's, if you measured with Canary Speech and there was any question, you could check whether they have the gene for Huntington's and correlate the two. And that could be done in the same milliseconds, because the information is available in their healthcare record within the hospital system.

Rinat:

It was actually amazing learning about Canary Speech, how it works, and the story and journey that brought it to where it is now. Thank you, both you and Jeff, for making this a reality, and for taking the next step beyond genetic analysis, as you mentioned, to give doctors and clinicians more information for making more informed decisions. It is really neat to learn that this exists. Among all the AI derivatives we're coming to know daily, this seems like one of the groundbreaking ones.

Rinat:

We definitely hope you continue to do more and more amazing things. I very much enjoyed the conversation, Henry. Thank you for coming to our show. And to the audience: if you have any particular questions you would like to ask Henry, we can forward those questions and potentially get them answered.

Rinat:

And, as always, keep coming back to our show. We usually talk about various tech-related topics, and we want more guests like Henry to come on. Please do reach out if any of you would also like to join the show. Thanks again, Henry. With that, we can end the show here. Thank you to all the listeners.
