Today, we explore the evolving landscape of conversational AI with Kylan Gibbs, the CEO and founder of InWorld AI. We focus on how InWorld develops AI products that enhance the creation of scalable applications, particularly in consumer contexts where user interaction is increasingly conversational. Kylan shares insights into the importance of real-time performance and how expectations differ between consumer and business applications. He emphasizes that while businesses often prioritize automation and factuality, consumer applications demand speed and engagement, requiring a nuanced approach to AI design. Join us as we delve into the technical challenges and innovations that are shaping the future of AI interactions.
The podcast features a deep dive into the evolving landscape of software architecture, focusing on the role of AI in modern applications. Our guest, Kylan Gibbs, the CEO of InWorld AI, discusses how his company builds AI products that facilitate real-time conversational experiences for users. Kylan emphasizes that the majority of consumer interactions with AI now occur through conversational interfaces, such as chatbots and voice assistants. He explains that InWorld specializes in creating scalable applications that not only meet user demands but also adapt to varying contexts, such as customer support, gaming, and educational applications. This adaptability is crucial because user expectations differ significantly across different scenarios. The conversation further explores the intricate balance between performance and user experience, highlighting how different user expectations influence the design and functionality of AI-driven applications. Kylan shares insights into the engineering challenges that come with real-time AI interactions, emphasizing the need for robust performance engineering to deliver smooth conversational flows. He believes that as AI technology progresses, the focus should shift towards enhancing user engagement while maintaining high performance standards. Overall, this episode offers valuable insights into how software architects can navigate the complexities of integrating AI into consumer-facing applications while ensuring that the user experience remains at the forefront.
Mentioned in this episode:
How do you operate a modern organization at scale?
Read more in my O'Reilly Media book "Architecting for Scale", now in its second edition. http://architectingforscale.com
What do 160,000 of your peers have in common?
They've all boosted their skills and career prospects by taking one of my courses. Go to atchisonacademy.com.
Hello and welcome to Software Architecture Insights.
Speaker A:Your go to resource for empowering software architects and aspiring professionals with the knowledge and tools they require to navigate the complex landscape of modern software design.
Speaker A:My guest today is Kylan Gibbs.
Speaker A:Kylan is the CEO and founder of InWorld AI.
Speaker A:InWorld develops AI products for builders of consumer applications, enabling the creation of scalable applications that evolve and grow to match increasing user demands.
Speaker A:Kylan, welcome to Software Architecture Insights.
Speaker B:Thank you so much Lee.
Speaker B:Pleasure to be here.
Speaker A:So as I read that description of your company, tell me a bit more about what that exactly means.
Speaker A:What exactly does InWorld do?
Speaker B:Yeah, so it's a bit narrower than perhaps it sounds in that description.
Speaker B:The reality is if you look at how most consumers are interacting with AI today, it is almost all in some kind of conversational interface.
Speaker B:So that could be a chat.
Speaker B:Of course, we're all very used to ChatGPT, that could be voice.
Speaker B:We're getting very used to that now.
Speaker B:I think we're seeing like crazy growth in the last six months, especially around voice uptake and it's all kind of that conversational nature.
Speaker B:And I think the kind of insight there is, you know, if you're building an interface to something that is a human-like intelligence, even if it's for entertainment purposes, you're going to use something conversational.
Speaker B:So we tend to focus very much on real time conversational pipelines, models specifically.
Speaker B:And the reason that we do that is, well one, we're very good at it.
Speaker B:So our team comes from a background in real-time conversational AI, decades of it for some of our team, and we've also basically found that that covers most of the consumer space.
Speaker A:So if I were to say that performance is your number one concern, you would probably come back and say specifically, real time performance is the aspect that's most important to what you do.
Speaker B:That's right.
Speaker B:And what's super interesting is we've kind of started to reach the level of human-like performance.
Speaker B:And what we've started to find is kind of, I think for the last few decades even our goal was, okay, let's try and create AI that sounds like a human.
Speaker B:And now that we're getting to the place where it does start to sound like a human, the question becomes performance in a specific role.
Speaker B:So we work for example with, you know, health and fitness companies and apps, language learning apps, you know, of course, games and media.
Speaker B:And what's interesting is what performance means in each of those contexts is very different.
Speaker B:If you're talking to a customer support agent versus you're talking to your favorite character from a show, you're going to expect a very different type of conversation.
Speaker B:You're going to expect them to use knowledge differently.
Speaker B:You're going to expect them to cut in at certain points very differently if it's an assistant versus, you know, a protagonist character.
Speaker B:And so there's a lot of nuance, I guess to what performance means in each context.
Speaker B:And it's actually very specific to what the user is expecting.
Speaker B:What is the user believing that they're encountering in this moment?
Speaker B:Is it a helpful support assistant?
Speaker B:Is it a favorite character in their game?
Speaker B:Is it their health coach or someone helping them learn a language?
Speaker B:In each of those contexts, the user has a different expectation of what performance or quality really means.
Speaker B:And so a lot of what we do is, of course, building that great default, but then helping adapt it to those vertical or user-specific contexts.
Speaker A:Yeah, in fact, real time may not be the right term in your context; maybe just in time is a better term.
Speaker A:The fastest isn't necessarily the right answer.
Speaker A:It's responding at the right time, being precise about the exact timing of when it goes.
Speaker A:And so you have to be ready very fast, which is where performance comes in.
Speaker A:But there's a lot of nuance to the exact timing of when you want to give a response or interrupt or do whatever.
Speaker A:In a conversational AI.
Speaker B:That's exactly right.
Speaker A:Yeah.
Speaker A:Cool.
Speaker A:So that's a B2C sort of framework.
Speaker A:Right.
Speaker A:Where some business is building an AI that's talking to a consumer.
Speaker A:I guess it doesn't have to be business to consumer, but let's leave it at that for, for a moment.
Speaker A:But a significant portion of the spending of AI right now is in B2B.
Speaker A:And I think you're the one who told me, when we had our last conversation, that about 90% of AI spending is currently in the B2B marketplace.
Speaker A:So how does that translate?
Speaker A:Are the needs the same in B2B or are they different?
Speaker A:And how exactly does that fit into what you do?
Speaker B:Yeah, so maybe some background here.
Speaker B:So I came from DeepMind.
Speaker B:When I was at DeepMind, my job was to basically take DeepMind AI and get it integrated into Google products.
Speaker B:And we were using early LLMs.
Speaker B:There were a lot of really cool ideas around search, learning, and everything else, and really a lot of it ended up landing in Google Cloud for business-facing applications.
Speaker B:And the key thing there, if you're thinking about building, for example, a contact center AI, it's very predictable.
Speaker B:You're kind of focusing on factuality.
Speaker B:And also when you start looking at a lot of the things that people are automating.
Speaker B:So that's kind of on the conversational part.
Speaker B:But beyond that, in a lot of cases now, a lot of business automation is just doing complex tasks.
Speaker B:It really has nothing to do, for example, with entertainment or kind of engagement of a user.
Speaker B:And so what we tend to find is a lot of the business automation that we use today is things like business intelligence.
Speaker B:Am I understanding a huge amount of data?
Speaker B:Am I automating very complex processes like some legal process or an onboarding process for an agency?
Speaker B:In each of those contexts, you don't care if it takes 30 minutes because it's maybe still faster than a human doing it.
Speaker B:You don't really care about cost, because even if it's $30 per 30 minutes, that's still cheaper than a human lawyer.
Speaker B:The reality is, when you come into the consumer context we're talking about, with the real time performance there, the just in time performance, as you put it, you're much more cost sensitive.
Speaker B:First of all, for your users you're looking at maybe a dollar to a few dollars a month in processing costs, dramatically lower than on the business side.
Speaker B:Latency is also, interestingly, much more important in the consumer context than in business.
Speaker B:Users are expecting human-like performance, which is usually under 500 milliseconds, to feel natural.
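To put rough numbers on the cost gap being described here, below is a minimal back-of-envelope sketch in C. The monthly budget, turns per day, and the $30 business task are assumed figures for illustration, not numbers from the conversation.

```c
#include <stdio.h>

/* Back-of-envelope budget for a consumer conversational app.
 * All numbers are illustrative assumptions, not real pricing. */
int main(void) {
    double monthly_budget_usd = 2.00;  /* assumed processing budget per user per month */
    double turns_per_day      = 50.0;  /* assumed conversational turns per user per day */
    double days_per_month     = 30.0;

    double turns_per_month = turns_per_day * days_per_month;
    double budget_per_turn = monthly_budget_usd / turns_per_month;

    /* Compare against a hypothetical long-running business task. */
    double business_task_usd = 30.0;   /* e.g. one 30-minute automated back-office run */

    printf("Consumer budget per turn: $%.5f\n", budget_per_turn);
    printf("One business task buys roughly %.0f consumer turns\n",
           business_task_usd / budget_per_turn);
    return 0;
}
```

With these assumed numbers, a single consumer turn has roughly a tenth of a cent to spend, which is why the consumer side is so much more cost sensitive.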
Speaker B:Other things are then how you even think about quality.
Speaker B:So if I was building, for example, a standard customer support agent for every business, I could probably create the same persona and users would be happy with it.
Speaker B:We, on the other hand, have to take in some cases hundreds of characters that are well known and beloved and help recreate them.
Speaker B:That could be for a language learning app, or it could be for a parks company, or it could be for a streaming platform.
Speaker B:And in each of those contexts, they're really recreating something that engages the users.
Speaker B:There is just this fundamental distinction, I think, between what AI architectures look like when they're doing those complex, long-running business processes, which is the important part, and when they're doing these very fast user-facing ones.
Speaker B:And so yeah, about 90% of the total spend on AI today is on the business-facing side.
Speaker B:But I would also say that's just because the tech wasn't ready for the constraints in the consumer side because it's too expensive, it's too slow, we can't adapt the quality well enough to optimize it.
Speaker B:You know, a lot of people don't even have basic modules like memory that are very important for that long-term relationship with consumers.
Speaker B:And so we're genuinely seeing this kind of big inflection point right now where huge amounts of investment over the last few years have gone into the business facing side developer tools.
Speaker B:Now we're starting to see kind of the winners emerge there and people are realizing that the value is kind of moving up to that application layer and that the kind of most open space, let's say, is on the consumer side.
Speaker B:Because let's be clear, the business space has been kind of really eaten up over the last four years.
Speaker B:And so we're seeing kind of a gold rush of folks on the consumer side, because the tech is ready and basically the market is hungry.
Speaker A:So you expect there to be a huge influx of spending on the consumer side of the world.
Speaker A:Less so on the business side, or maybe not less, but less of an increase on the business side as time goes on.
Speaker A:Yeah.
Speaker A:So what does this mean for the developer then for a team that is working on an AI product, whether it's consumer business, whatever, what does this really mean for them?
Speaker B:So the funny part is what you can see for example, is you can see a lot of folks going out and building very complex workflows.
Speaker B:So you know, for example, there's the OpenAI Agent Kit that was released.
Speaker B:Google's got a lot of these kind of visual flow builders and you can create very complex tasks.
Speaker B:For example, one of the companies we work with does business-facing applications, for a multi-hundred-thousand-dollar-per-month contract.
Speaker B:Even though they're using those very complex business processes, a lot of the challenge is just basically baking in the business logic.
Speaker B:It's not really in how you configure the AI or the performance engineering, let's say.
Speaker B:And so the biggest difference I see when people are kind of starting to step in the consumer space is like performance matters.
Speaker B:When I say performance, I literally mean basic performance engineering.
Speaker B:How fast can you make it?
Speaker B:Are you writing it in C or a slower language?
Speaker B:Are you really optimizing what's done on the client, what's done on the server?
Speaker B:Those little things become much more important when you have these applications that are touching millions of concurrent users.
Speaker B:So if I just said, okay Lee, I'm going to basically create a process by which you could onboard every podcaster, generate some scripts, pull all the knowledge from the web for them.
Speaker B:And it's going to take an hour to run this process.
Speaker B:You're like, great, I don't have to do it now; you'll go for that.
Speaker B:If you were basically saying, okay, now you're going to have a talking Lee that's going to talk to millions of people in the world at the same time, you're going to be much more concerned about those little performance aspects.
Speaker B:And so the funny part is it's much more hardcore engineering, actually, that's required to make this stuff work in the consumer context.
Speaker B:And the other challenge that's kind of associated with that is a lot of the folks who are stepping into that space don't have that background.
Speaker B:People who tend to go into founding consumer teams aren't necessarily from a performance engineering background.
Speaker B:And that creates this funny gap where a lot of the tools that are being built by the big guys are focused on serving those business facing applications.
Speaker B:Because as we said, it's 90% of the market and yet you're seeing this growth of demand on the consumer side.
Speaker B:But frankly the tools are just not quite there and you could build them, but the teams who are building these are five person teams without that kind of background.
Speaker B:And so we're in this kind of funny gap where I think there are still some technical challenges to resolve to unlock this next consumer wave, because the teams building this need it, and they're not necessarily going to want to build it themselves; they want to keep their eyes on the users.
Speaker A:So it's an interesting take.
Speaker A:You're talking about tools and the performance of tools.
Speaker A:You know, if you look at enterprise, the classic B2B compared to maybe an SMB market that may be B2B or maybe B2C.
Speaker A:One of the things that you notice is that enterprise plays tend to focus a lot on low code, no code.
Speaker A:And one of the aspects of that sort of mindset is that performance-optimized code isn't critical; ease of development is what's most critical.
Speaker A:The systems are designed to make it easy to build AI systems that work together.
Speaker A:That's actually one of the reasons, I think, why Python is the language of choice for AI.
Speaker A:It's a simple language, it integrates very well with AI.
Speaker A:There's a lot there; the ecosystem's big because of that.
Speaker A:But it is easy for a non developer to make things work in Python.
Speaker A:Yet Python and the whole AI ecosystem that goes attached to it really isn't designed for performance.
Speaker A:I mean, I'm trying to be polite here, but I don't think there's a lot there.
Speaker A:Why is that?
Speaker A:What does that do to the performance expectations for enterprise and for these low code environments, as opposed to these new demands you're talking about for conversational AI, what's needed there, and how to make that work?
Speaker B:So I just, for some reason, as you said that I thought of a funny analogy.
Speaker B:So if we went back a few decades, or maybe a couple of decades, you could imagine: what's the difference between McKinsey and Google?
Speaker B:Really what they're both doing is serving information to a person.
Speaker B:McKinsey just happens to take a million dollars and six months to produce their information, but you're willing to wait for it.
Speaker B:It's high quality, you know, it's taking away things that you don't want to do and you're paying for that.
Speaker B:Google is serving you search results in fractions of a second to billions of people globally.
Speaker B:And when you think about the architecture of each of those companies, you know, McKinsey basically has this very slow, methodical process.
Speaker B:It's very much about baking in the logic and taking kind of that human understanding and baking it in and the systems basically just kind of being a conduit for that.
Speaker B:On the Google side, the amount of effort that they of course put into search and caching and how they manage knowledge bases over time is incredible, and it's also extremely fast.
Speaker B:And let's also look at the technical innovation of Google versus McKinsey, let's say, over the last few years.
Speaker B:So I say that because I think it's kind of just an interesting point to saying, like when you're serving millions of concurrent users in a kind of standardized context, but that requires a lot of adaptation, you're going to kind of place a lot more technical constraints.
Speaker B:And so let's just relate that then to, you know, say Python versus C. So if I was building a custom workflow using a bunch of different models and my goal is basically just to make it human readable, human understandable, bake in kind of my understanding, my client's understanding, I'd probably write it in Python, or even go one level up and maybe use a visual graph builder, because then I could sit down with my client and really walk through it with them.
Speaker B:If you were doing a Google search and I asked you for every Google search to whiteboard out a flow diagram of how you wanted the search to execute, you'd probably be pretty pissed off.
Speaker B:And so I think it's like a neat analogy because really the goal there is saying, okay, I can't possibly give a McKinsey level answer to every single user, but I'm going to get as close as I can and do it as fast as I can.
Speaker B:And honestly with the AI integrations and everything they're doing now, they're getting pretty close.
Speaker B:And so I think there's just this interesting question of like scale and generality that is forcing you more towards that performance side and where you're probably gonna be writing a lot more in C. You're gonna be much more concerned about your backend systems.
Speaker B:Your infrastructure is gonna matter more than your people knowledge at a certain point.
Speaker B:And so there's, I think, just a fundamental difference there in the way that we're interacting with a lot of these sources of information, let's say, or purposes.
Speaker A:Yeah, I like that analogy.
Speaker A:I think that works really well.
Speaker A:You know, the McKinseys of the world: when you hire a McKinsey or any consultant, a Gartner, anybody, you're bringing in expertise; you expect performance, you expect results.
Speaker A:You expect them to be high quality results, and you spend a ton of money for that.
Speaker A:When you go to Google, you expect to spend no money, get the answer right away and still have a decent quality result.
Speaker A:But you're willing to accept mistakes or problems here, inaccuracies, or more work on your part to go through and figure out which of these 20 answers is the one that really matters to you.
Speaker A:Right.
Speaker A:So you're making a relatively small trade off in effort and quality in order to get a huge trade off in cost.
Speaker A:The cost can be dollars or effort or whatever, you know, however you make that work.
Speaker B:Yes, exactly.
Speaker A:So that same thing applies here, is what you're saying.
Speaker A:So in the AI world you can build the perfect AI system that gives you exactly the research you want, the results you want.
Speaker A:And in fact, you know, I think OpenAI made some announcements as far as better research based models recently, but there are expectations with those models of quality of answers, quality of results and boy, is it going to be expensive to come up with this result.
Speaker A:Expensive not only from a standpoint of time, but also in AI costs and the dollars associated with running them.
Speaker A:On the exact opposite end of that there is this, the new focus on the conversational piece, which is no, I just want to say something.
Speaker A:And I want it to be meaningful and useful, but maybe not perfect, but I need to do it in 500 milliseconds.
Speaker A:And it has to be in the flow of the conversation with right intonation and all that sort of stuff.
Speaker A:Those are two very, very, very different models.
Speaker A:Now, the first one is the background model, and that's what you say we've historically been working on.
Speaker A:But this one is the B2C model.
Speaker A:This is the model that's really starting to come up now.
Speaker A:And this is really the model you're talking about that what you're doing is focusing on.
Speaker B:That's right.
Speaker B:So, I mean, it's funny: when we first started looking at AI and at what would be automated, you know, we thought it'd be robots and factories and that, and it's turned out to be the complete opposite.
Speaker B:And so it's like, you know, an 80/20, or more like 95/5, in terms of, let's say, 95% of the value being in these very narrow business cases, like CEO-level work.
Speaker B:So OpenAI and those, I think, have really pushed to take over those complex tasks.
Speaker B:You hear, for example, Sam talk about PhD-level agents that will cost $20,000 a month.
Speaker B:If you think about what that means, it's saying, okay, we're just going to go after these super high value use cases where it's extremely hard to find anybody in the world who can do them, and we're just going to make those accessible.
Speaker B:Now that's a very different problem from now saying across that long tail of consumers.
Speaker B:Well, they probably don't need that complex answer in a given moment.
Speaker B:They want to be entertained, they want to go through their workout efficiently, they want to learn a language.
Speaker B:And also there's millions of people who want to do that at the same time.
Speaker B:And so I think that the challenge there is, okay, your interface is going to be very different.
Speaker B:You're not going to create a visual graph or whatever.
Speaker B:You're just going to create a chatbot, or you're just going to create a talking character, or you're just going to create a copilot that lives alongside your app.
Speaker B:You're going to fit it into the form factor that's native to the user.
Speaker B:And naturally, what's really interesting about this is that AI is still best thought of, I think, in the metaphor of it always playing a human role of some kind.
Speaker B:And when consumers are interacting with it, the most natural thing is, for example, how you and I are talking now; almost all human interactions are conversational in nature.
Speaker B:Even if it's a letter, a text, an email, it's very hard to interact with the world without it being a conversation of some kind, especially with other people.
Speaker B:And so naturally, when you think about an AI starting to play a human role in some context, even if that's just being a companion or a therapist or a health coach or a learning assistant, in every one of those contexts the person is expecting something that is human-like, and therefore they tend towards conversation.
Speaker B:And I think that's why we just found product market fit.
Speaker B:Like ChatGPT was a huge unlock of course, for the whole AI industry simply because it introduced a conversational interface.
Speaker B:And that's just going to continue to evolve.
Speaker B:And so the way that I see this is like chat, of course was the first conversational interface.
Speaker B:I think now we're going into voice.
Speaker B:Next it will go into video, you know, and basically live streaming of avatars, as well as kind of understanding real time space and, and what's going on in front of me as the user.
Speaker B:And so I just think about it as getting closer and closer to the experience of having a human alongside you in whatever context that is, and being fully aware of the context and every modality associated with it.
Speaker B:And so that is to say, that's a huge challenge though, when we go back to kind of the Google McKinsey thing.
Speaker B:So now I've got to make it work to have this unique conversation for a specific role for millions of concurrent users that's going to cost less than a few dollars per month and it's going to respond in 500 milliseconds.
Speaker B:Like you start to very quickly be like, okay, these are hard engineering problems.
Speaker B:And so it's just, I think, two different directions frankly for AI.
Speaker B:Like, I think there's one big pressure which is let's automate all the business processes and the end result there is reduced cost, higher profit, but it basically doesn't introduce any new revenue into the industry or to market overall.
Speaker B:So to do that we need to create things that are actually engaging to consumers enough that they're spending more money.
Speaker B:And, and to do that we need to solve all the technical problems to make that possible.
Speaker B:And so we're very much focused on solving those technical problems for those real time conversational interfaces which we fundamentally believe are the right path towards bringing the benefits of AI to everyone in the world, which is I think a different like existential pressure than the kind of automation side, which is to say, okay, great, we can just automate business.
Speaker B:Well, where does that leave everyone else then?
Speaker B:If the kind of the benefits of AI aren't flowing to them, we've got to make sure that at the same time we're seeing this portfolio of applications and use cases evolve that improve everyone's lives.
Speaker A:So as a software developer who's trying to decide their next career direction, or as a software team that's trying to figure out where their team fits in within the company, or where what their company is doing fits within the industry, what advice can you give from all of this for those groups?
Speaker B:Yeah, so I think we can start at the top.
Speaker B:So when I say the top, I mean the application.
Speaker B:So you know, if you're a founder or you're an engineer setting off to build a consumer AI application, I think the first thing that you should note is where people kind of get stuck: they reach a quality bar that they're happy with and then they can't produce it at the cost or latency that they need.
Speaker B:And so you really kind of want to bring on, let's say a team that is aware of those challenges of how to do kind of that low level engineering.
Speaker B:You've also got to be aware that you're probably not going to have a single pipeline that's plugging in a single API from OpenAI and getting everything that you want.
Speaker B:You're going to have to do a lot of customization, figure out which models work, making sure that you're choosing the right text to speech for latency, you're choosing the right LLM for latency.
Speaker B:Are you hosting it yourself?
Speaker B:Are you putting it in the cloud, and where is it located?
Speaker B:All of those are very complex problems you have to solve to make consumer work.
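As one way to picture the decisions just listed, here is a hedged sketch of a configuration record a team might keep for a voice pipeline. The type names, fields, and model names are hypothetical, invented for illustration rather than taken from any real SDK.

```c
#include <stdio.h>

/* Hypothetical record of the deployment choices a consumer voice team faces.
 * All names and options are illustrative, not a real API. */
typedef enum { HOST_SELF_MANAGED, HOST_MANAGED_CLOUD } hosting_t;
typedef enum { RUN_ON_CLIENT, RUN_ON_SERVER } placement_t;

typedef struct {
    const char *asr_model;         /* speech-to-text chosen for latency */
    const char *llm_model;         /* LLM chosen for latency vs. quality */
    const char *tts_model;         /* text-to-speech chosen for latency */
    hosting_t   hosting;           /* self-hosted weights or a managed API */
    placement_t placement;         /* which pieces run on client vs. server */
    const char *region;            /* locate inference close to the user */
    int         latency_budget_ms; /* end-to-end target for one voice turn */
} pipeline_config;

int main(void) {
    pipeline_config cfg = {
        .asr_model = "small-streaming-asr",   /* placeholder name */
        .llm_model = "mid-size-llm",          /* placeholder name */
        .tts_model = "low-latency-tts",       /* placeholder name */
        .hosting = HOST_SELF_MANAGED,
        .placement = RUN_ON_SERVER,
        .region = "us-east",
        .latency_budget_ms = 500,
    };
    printf("Voice pipeline target: %d ms end-to-end, hosted in %s\n",
           cfg.latency_budget_ms, cfg.region);
    return 0;
}
```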
Speaker B:I think you saw this a little bit in the early mobile era, where a lot of the basic gacha games and things like Zynga actually innovated a lot on tech to make it possible to reach all those users.
Speaker B:So I think as a founder or engineer at that level, that's what I would say: focus on your users, but be very aware that to serve them properly you're going to have to do some pretty low-level work.
Speaker B:I think as a developer now building tools in this space, the reality is that things like front end code are getting easier and easier to automate, and we're starting to eat into backend code a little bit.
Speaker B:I think that the challenge is then, okay, now I'm starting to do that performance engineering.
Speaker B:I'm setting up my servers.
Speaker B:That's where I think we still need human hands a lot.
Speaker B:And it gets very complex when you're handling, for example, concurrent operations across clusters.
Speaker B:And that becomes difficult.
Speaker B:I think the focus should be on where the AI can't go, because while it can generate front end code, it definitely can't set up a server to reach under 500 millisecond latency and design an inference stack.
Speaker B:And even more so I think that more work needs to be done on understanding the actual architecture of models.
Speaker B:How does inference run, which hardware is efficient for this, and why; it's getting down to that level.
Speaker B:Now, if a lot of those top level problems are kind of going away, the value as an engineer, I think, is in getting really good at that performance engineering, because even as you're connecting models, there's going to be a point at which you can shave off 100 milliseconds here and there in some certain configuration, or in how a model is hosted, or how the inference stack is built.
Speaker B:And I still see most teams thinking that like AI is this kind of magic wand where I just pull a model out of OpenAI or Google and it works great.
Speaker B:The reality is let's just go back to good old software engineering.
Speaker B:A model is basically a set of weights.
Speaker B:You've got to host it, get it set up on an inference stack, figure out your hardware, set up your networking, and make sure that you're co-located with your databases.
Speaker B:It's very basic stuff, but most teams aren't doing that.
Speaker B:And I think it's because we've been kind of trained to say, don't worry about the AI, you know, let us handle it.
Speaker B:We're going to figure out the AI stuff.
Speaker B:You guys just use an API. Well, that's okay, but are you really going to get the best performance out of that?
Speaker B:And so the simple message I would say is: engineering is not done.
Speaker B:We just kind of forgot that AI is engineering in some ways and requires engineering.
Speaker A:That's a great message.
Speaker A:You know, that's, that goes straight to what I hear mostly from developers, which is like, what is my job now that AI is here?
Speaker A:Does my job exist and if so, what am I doing?
Speaker A:And that's kind of going directly towards that.
Speaker A:So I'm encouraging the developers who are listening to this, listen to this advice.
Speaker A:Your job hasn't gone away.
Speaker A:It's changed, but it really hasn't changed that much is kind of what you're saying.
Speaker A:It's the same things you've done before, just applied to a different space.
Speaker B:Yeah, like an LLM is a really complex for loop.
Speaker B:It's all these operations that we got very used to in standard engineering and now we're just introducing operations that are model calls and defining the input output formats.
Speaker B:We've done that before and we've done typing.
Speaker B:And so I think we're just developing a new nomenclature and a new architecture.
Speaker B:It's just that our components are now, for example, maybe not imported functions, but model calls or locally hosted models.
Speaker B:And it just changes, I think, the nature of engineering, but it's still very much required.
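A minimal sketch of that idea, treating a model call as just another typed component with explicit input and output structs; the names and the stubbed call are hypothetical stand-ins, not a real API.

```c
#include <stdio.h>

/* Hypothetical typed interface around a model call: input and output are
 * explicit structs, just like any other component in the system. */
typedef struct { const char *user_text; } llm_request;
typedef struct { char reply[256]; } llm_response;

/* Stub standing in for a locally hosted or remote model call. */
static llm_response call_llm(llm_request req) {
    llm_response out;
    snprintf(out.reply, sizeof out.reply, "echo: %s", req.user_text);
    return out;
}

int main(void) {
    llm_request req = { .user_text = "How do I warm up before a run?" };
    llm_response res = call_llm(req); /* one "operation" inside the larger loop */
    printf("%s\n", res.reply);
    return 0;
}
```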
Speaker B:And I think the faster that engineers can kind of get to that state where they're choosing tools that give them that flexibility, give them that level of control, then they get to actually do engineering.
Speaker B:I think.
Speaker B:I see a lot of teams, especially on the less technical side, where they plug in an API, it works, and they walk away, and you're like, guys.
Speaker B:There's a lot of work left to do.
Speaker B:And I think it's a natural evolution.
Speaker B:But I'm pretty excited, I think, to see that people are starting to figure this out.
Speaker B:And I also think we're starting to enter that flatter part of the curve on model gains; they're still going to keep coming.
Speaker B:It's just going to be proportionally more expensive and take longer now.
Speaker B:And that means, I think that now a lot of the gains need to come from like the engineering work done on top of the models.
Speaker B:Because models aren't going to just be the panacea that they were kind of supposed to be, at least within the next couple of years.
Speaker A:Yeah, there's been a lot of growth and model changes over the last couple years.
Speaker A:But what you're saying is there's going to be less of that; instead we'll be focusing more on the connections.
Speaker A:And I think that kind of plays straight into my next question, which was, you know, around the background of the OpenAI announcements that have occurred recently.
Speaker A:Now, we're recording this episode in mid October, but there were announcements by OpenAI just a couple of weeks ago.
Speaker A:I'd like you to talk about that a little bit.
Speaker A:But also about how that is really a shift in the direction we're taking AI, as opposed to here is version X of this model next.
Speaker B:Yeah, so I would point to a specific trend.
Speaker B:So if you, if you look across the last, maybe even more than four years, let's say eight years, we started out at a place where we were iterating architectures.
Speaker B:We had RNNs, and then transformers came out and kind of won the race.
Speaker B:But there was actual iteration on architectures then.
Speaker B:The next stage was iterating on how to scale up compute, how to set up operations; GPT-3 was fundamentally an engineering innovation in many ways, right.
Speaker B:In terms of how they were able to scale the model.
Speaker B:And I think over the last three or so years, companies have been able to make so much progress by just continuously scaling up the compute. That was the paradigm we were in; that was how innovation worked: basically, how do I build bigger models, how do I find better data sets, and so on and so forth.
Speaker B:And what I think you saw happen last year, and slightly before that, when you started seeing the reasoning models and this thinking process, is that you're starting to push more of the compute towards inference time rather than training time.
Speaker B:And what I would say we're seeing now with the evolution of these agent kits, you know, the frameworks to connect these things, is a further push.
Speaker B:So now you're not only taking reasoning models, you're connecting multiple reasoning models to do a certain process.
Speaker B:And so it's kind of to say, as the modeling part gets harder, the answer is, okay, we're just going to use more model calls to do more complex operations.
Speaker B:And this is kind of what I was alluding to before in saying that, you know, we used to assume that if a model got big enough and powerful enough, it would basically solve every problem.
Speaker B:Well, that might be true, but what if that takes 10 years to get to that model?
Speaker B:Well then you're going to have to do a lot more complex model calls.
Speaker B:What reasoning really is, is just putting some more tokens in and basically doing some internal processing.
Speaker B:What an agent is, is just layering on a connection, a graph, or a pipeline of those models.
Speaker B:And I think you're just seeing this happen across every space.
Speaker B:So what I think is happening there is that the technical constraints are pushing the industry towards more and more inference-time compute and the reasoning models.
Speaker B:The next layer on top of that is naturally building these connections and doing more custom workflows within that, pulling in new knowledge sources, Zapier-style operations, as they're doing now.
Speaker B:And I think this is just like if, if OpenAI had been able to release, you know, GPT 5.5 at this announcement, I'm sure they would have.
Speaker B:But guess what, it's probably going to take another six months and another $20 billion.
Speaker B:So it's easier for them to build up these SDKs.
Speaker B:And I also see this as a pattern across a lot of companies.
Speaker B:So you see companies come up with new model types.
Speaker B:For example, video now is a big one.
Speaker B:You see lots of companies come up with video.
Speaker B:What I expect to happen in the next year is you're going to have bigger companies that are going to figure out how to connect video models and build, you know, these, these long term templates around them.
Speaker B:And so it's kind of to say like you have the main vector there, which is actually the model performance.
Speaker B:But if that starts to slow down on the curve, you start to be able to get those gains primarily off of pushing more and more compute into inference time, which just means chaining more and more model calls.
Speaker B:And I will honestly say though, I don't think this is solving the problems that you need on the consumer side still.
Speaker B:So that's the interesting part.
Speaker B:Like this is basically meaning it's going to get really freaking good for me to be able to take a bunch of market intelligence, dump it in, set up some sales leads, reach out to a bunch of customers.
Speaker B:I can actually do that pretty well, and it's maybe more efficient than a person doing it, but it still doesn't solve the problem of how do I take a specific model or a very simple pipeline and get it to work 10 times faster for 10 times more users in a way that matches the quality each of them expects.
Speaker B:So it's just a different paradigm, I think, and I don't think this is unique to OpenAI.
Speaker B:You know, Google I think is just pushing the space.
Speaker B:You know, they're doing a lot more on the visual graph builders and that. So that's the technical constraint side; there are the technical capabilities and then, let's say, market demand.
Speaker B:I think what they're also trying to do is make these tools more accessible.
Speaker B:So, like, there's always been things like these frameworks to connect models and build it and you can technically code it yourself.
Speaker B:Most of the world doesn't code.
Speaker B:And actually less and less of the world codes, it turns out, because everybody's using these.
Speaker B:And so I think the other part of this is just they're trying to make the, like, customization of these tools using things like graphs and pipelines more accessible to a larger audience so that you can now have marketing folks who are building pipelines for themselves, you can have legal folks who are building pipelines for themselves.
Speaker B:And it's just opening up, I guess, those capabilities to a wider audience.
Speaker B:So it's a very, let's say, rational direction.
Speaker B:But my argument is it's still not opening up to people I really care about, which is everybody in their daily life.
Speaker B:And that is, I think, still the open challenge to solve.
Speaker A:So I want to end with getting back to the performance aspect a little bit again to connect what you've just talked about back to that.
Speaker A:We're now in the world where doing more of the calculation at inference time versus at model creation time is important.
Speaker A:Yet we're also in a world where the amount of actual processing time we can take to make that happen is becoming more and more scarce because the real time nature is very important.
Speaker A:And so the computation needed at the time of the conversation is more and more critical than it's ever been before.
Speaker A:And that's going to continue and that's going to grow.
Speaker A:And that's where the innovation needs to be.
Speaker A:Take that concept and merge it with what you're doing, using a term that you've mentioned multiple times in this episode that I've been wanting to get onto.
Speaker A:But we're going to end with it instead.
Speaker A:And that is the use of C in AI programming.
Speaker A:Now, you don't think of C as an AI programming language at all, ever.
Speaker A:You don't hear anyone talk about using C anywhere and anything having to do with AI typically, yet you talk about it pretty openly.
Speaker A:So a lot of what you do is in C. Tell me why C is the right answer for what you're doing.
Speaker B:Yeah.
Speaker B:So I think you could start with what we're actually trying to solve.
Speaker B:So when we take these real time conversational contexts, in the simplest case you could have a graph of, imagine, a speech to speech context.
Speaker B:So you take ASR with an LLM and TTS.
Speaker B:Then of course you can layer on things like memory and safety and tool calls and all that.
Speaker B:But let's imagine a very, very simple context like that.
Speaker B:So in the context of that simple speech to speech pipeline, the ideal human case is that humans tend to respond between 250 and 500 milliseconds after you stop an utterance.
Speaker B:So there you have a latency constraint, or a goal, of reaching, let's say, between 250 and 500 milliseconds.
Speaker B:That's very fast, like very, very fast.
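As a rough sketch of that constraint, the snippet below allocates an end-to-end budget across the ASR, LLM, and TTS stages mentioned here and checks it against the 250 to 500 millisecond window; the per-stage numbers are assumptions, not measurements.

```c
#include <stdio.h>

/* Target window for a natural-feeling voice reply, per the conversation. */
#define MIN_NATURAL_MS 250
#define MAX_NATURAL_MS 500

int main(void) {
    /* Assumed per-stage allocations for one speech-to-speech turn. */
    int asr_ms             = 120; /* finish transcribing after end of utterance */
    int llm_first_token_ms = 200; /* time to first token from the LLM */
    int tts_first_audio_ms = 120; /* time to first audio chunk from TTS */

    int total = asr_ms + llm_first_token_ms + tts_first_audio_ms;
    printf("Estimated time to first audio: %d ms\n", total);
    if (total <= MAX_NATURAL_MS)
        printf("Within the %d-%d ms human response window\n",
               MIN_NATURAL_MS, MAX_NATURAL_MS);
    else
        printf("Over budget by %d ms\n", total - MAX_NATURAL_MS);
    return 0;
}
```

With these assumed allocations the turn lands at 440 ms, inside the window; a single slow stage is enough to push it out.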
Speaker B:So when you think about the model calls, there's a, there's an aspect of, you know, of course we can actually optimize the model sizes, we can figure out how they're hosted, we can make sure that they're running on better hardware, we can make sure that they're located close to the user, you know, all that, all the basic, you know, networking stuff.
Speaker B:But at some point, you know, at some point you have this graph and let's say you have it at 600 milliseconds and you have the models that you have and you connected them and you're basically using, let's say a Python based framework to build this.
Speaker B:The challenge is that you just lose a lot of control over the basic underlying optimizations. Okay, if this program is, for example, running on your client, and let's say it's a robot for the context of this, right, because it's a fun one: you're running on a client, you have a robot, the robot has to respond in 300 milliseconds, and the executions are happening locally. In that context you're definitely going to want C, also because then it's going to be able to speak to the rest of the system, you're going to be able to manage the memory, and if you're doing it in a robot context, you're probably having to run some other locomotion stuff at the same time off the same chips.
Speaker B:In a gaming context it's kind of similar.
Speaker B:So you're now having to split memory between the actual graphics rendering as well as the character animation as well as this model inference.
Speaker B:So you want, you want that control, but you also just want the simple speed.
Speaker B:So you know, you might get fairly basic gains off of a Python to C exchange, but those gains add up, especially as you start chaining more and more components.
Speaker B:So like that three model structure that we talked about, that could get very complex.
Speaker B:You might have a bunch of branching, different contexts in which different model calls are made at different times, and you may have custom microservices operating on the server as well that you're calling at certain points.
Speaker B:And in all those contexts, as you add up the complexity, you have this explosion where the small latency and cost changes really matter, especially if you had, let's say, 20 model calls in a chain.
Speaker B:And so what we did is when we built our runtime, which is kind of the best way to actually orchestrate these models in complex components, we built it all with a C core.
Speaker B:Now, we recognize of course that people don't necessarily all code in C. We built SDKs on top of that to make it accessible.
Speaker B:But the benefit of that is even if it's running on the server, you're just shaving off those performance aspects.
Speaker B:And it gives us the ability to optimize the actual pipeline end to end and do basic memory management and those kinds of things.
Speaker B:The other thing is that it makes it portable also onto device.
Speaker B:So if you did want to have this as a compiled artifact, you can compile it.
Speaker B:You could ship it on a client, you could run it on a server; it's packaged, it's optimized, and you're shaving off potentially hundreds of milliseconds depending on the complexity of the call.
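Here is a hedged sketch of what such a portable C-core runtime interface might look like: an opaque pipeline handle created once and driven per turn, compilable for either a server or an on-device client. The function names and structure are hypothetical, not taken from InWorld's actual runtime.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical portable runtime interface: the same core could be compiled
 * into a server binary or linked into a client (game, robot, app). */
typedef struct pipeline pipeline;    /* opaque handle to an orchestrated graph */

struct pipeline { int turn_count; }; /* toy internal state for the sketch */

static pipeline *pipeline_create(const char *graph_description) {
    (void)graph_description;         /* a real runtime would parse/compile this */
    return calloc(1, sizeof(pipeline));
}

static void pipeline_run_turn(pipeline *p, const char *audio_in,
                              char *text_out, size_t cap) {
    p->turn_count++;
    (void)audio_in;                  /* stand-in for ASR -> LLM -> TTS execution */
    snprintf(text_out, cap, "reply to turn %d", p->turn_count);
}

static void pipeline_destroy(pipeline *p) { free(p); }

int main(void) {
    pipeline *p = pipeline_create("asr -> llm -> tts");
    char reply[128];
    pipeline_run_turn(p, "raw-audio-bytes", reply, sizeof reply);
    printf("%s\n", reply);
    pipeline_destroy(p);
    return 0;
}
```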
Speaker B:So it's just to say that like as we were noting before, these things don't matter if you don't care about that kind of real time performance.
Speaker B:But most consumers care about that real time performance.
Speaker B:And that's why we've built that all in C. And so yeah, we continue to basically build out the core of all of our systems and infrastructure in C for that reason; it gives us more control.
Speaker B:It gives us that opportunity to shave off latency.
Speaker B:It gives us the option to compile the programs and run them on the cloud or on the client, which is very important because if you think about a game, it might happen to run on a powerful local computer, or it might happen to run in the cloud and be streamed to the computer.
Speaker B:In both contexts you're going to want that same performance.
Speaker B:We're also going to want to basically adapt to a different user device.
Speaker B:And so it just is really kind of all about those like basic things that don't matter if you're looking at these like complex business automation cases, but really, really matter when you're kind of doing something that requires that real time performance.
Speaker B:And when you're dealing with multi millions of concurrent users, you're having to manage all the rate limits and the thresholds of capacity, because that's a lot of processing happening there.
Speaker A:So the fine control over CPU, the fine control over memory, the ability to optimize that in order to make it highly performant both in the SaaS environment and in a constrained local environment like a robot or an AI assistant or whatever.
Speaker A:Both of those are better suited towards a language structure that allows you to have finer control over the CPU and finer control over memory.
Speaker A:And really, nothing beats C in both those aspects.
Speaker B:That's right.
Speaker A:My guest today has been Kylan Gibbs.
Speaker A:Kylan is the founder and CEO of InWorld AI, a framework for improving the development and scaling of large scale AI systems.
Speaker A:Kylan, this has been a great conversation.
Speaker A:Thank you so much for this.
Speaker A:And thank you for joining me on Software Architecture Insights.
Speaker B:It's been a pleasure.
Speaker B:Thank you so much, Lee.
Speaker B:And thanks for everyone listening.
Speaker A:Thank you for joining us on Software Architecture Insights.
Speaker A:If you found this episode interesting, please tell your friends and colleagues you can listen to Software Architecture Insights on all of the major podcast platforms.
Speaker A:And if you want more from me, take a look at some of my many articles at softwarearchitectureinsights.com, and while you're there, join the 2,000 people who have subscribed to my newsletter so you always get my latest content as soon as it is available.
Speaker A:Thank you for listening to Software Architecture Insights.