In this episode, Frank La Vigne sits down with his Red Hat colleague Christopher Newland for a deep dive into the evolving challenges and opportunities at the intersection of AI, open source, and enterprise technology.
Fresh off attending both IBM Think and Red Hat Summit, Christopher Newland shares insights from two very different industry perspectives—executive strategy and hands-on engineering. Together, they explore the elusive “last mile” problem in AI adoption, the rise of agentic systems, the critical role of harnesses and runtimes, and why memory management is becoming the next frontier.
Plus, they discuss the practical realities and future potential of tools like OpenShift AI, IBM Bob, and open source alternatives. Whether you’re a developer grappling with implementation details or a leader focused on ROI, this episode has something for everyone navigating today’s fast-changing AI landscape.
00:00 Comparing IBM Exec and Red Hat Conferences
05:24 Challenges in AI implementation
06:56 Challenges in scaling microservices
11:38 Integrating AI with project management
14:23 Debate on AI model vs. harness
16:54 Discussing model evolution and limitations
22:54 Affordable Power BI Courses Bundle
25:19 Separating and managing runtimes
27:02 Using semantic routing for requests
30:15 Agent memory and compression basics
36:02 New AI approach and vision
38:49 Developing a multi-agent system
40:40 Importance of data chunking
It's very much based off of your requirements, your
Speaker:skills, your knowledge, your processes
Speaker:that now need to be defined within
Speaker:your AI stack. And that really is the last mile. And
Speaker:I think I saw that at both conferences:
Speaker:the realization that there's still a lot of work that needs to be done to
Speaker:get AI to a point where it's actually very
Speaker:fine-tuned, very functional, very efficient.
Speaker:And right now it may work, but it may not be very efficient for scaling,
Speaker:it may not be efficient for cost, it may not be efficient for the new
Speaker:token economy that we're seeing. And the last mile is
Speaker:historically the biggest problem to crack. Right. And once you solve that problem,
Speaker:Amazon has a physical last mile in terms of
Speaker:how they actually execute on delivery, right? Because you can have
Speaker:warehouses, but everybody lives in a different house, right?
Speaker:So there's a lot of little last miles. It's, it's
Speaker:death by a thousand paper cuts, if you will. Proof of concept
Speaker:projects are everywhere. Real business value, that's
Speaker:the hard part. This is Data Driven.
Speaker:I'm Frank La Vigne, and with me I have a very special guest,
Speaker:Christopher Newland, who is a colleague of mine at Red Hat. And
Speaker:we're gonna do a deep dive. You, you travel all the time.
Speaker:I know I traveled two weeks back to back, and it had been
Speaker:a while since I had to do that. But it is conference
Speaker:season and so you
Speaker:were at IBM Think last week
Speaker:and you were also at Red Hat Summit this
Speaker:past week, as was I. I have my I-heart-Red-Hat
Speaker:Atlanta T-shirt on, and
Speaker:so how's it going, Christopher? But yeah, it was nice
Speaker:because one of those two IBM think was
Speaker:actually in my area of Boston, so I was able
Speaker:to attend that locally. Still, still a lot
Speaker:though. Like, you know, you're going in at like 7 in the morning to be,
Speaker:try to beat traffic and then you're leaving at like 10 o' clock at night.
Speaker:But it's very poor. Yeah,
Speaker:yeah. And those conferences were very, very different, targeting very
Speaker:different audiences. So it was, I felt like I got kind of
Speaker:two perspectives of the AI world and
Speaker:what people are concerned about. One from a very like executive
Speaker:lens and another one from more the day to day users,
Speaker:developers, engineers who are actually implementing the AI.
Speaker:So which is which? I think I know the answer, but yeah. So
Speaker:IBM Think is an executive conference.
Speaker:So I think it's normally director level or above.
Speaker:So it's targeting a lot of C suites, senior
Speaker:directors. I think the
Speaker:lowest you would ever see would be like a senior manager
Speaker:of some sort. But for the most part it's a C-suite type of
Speaker:conference. And a lot of the conversation
Speaker:there is more about the business return of
Speaker:AI and what does that look like this year. And then
Speaker:Red Hat Summit is very much about
Speaker:the system administrator, the cluster administrator, the
Speaker:SRE, the developer who's actually
Speaker:utilizing these technologies and actually like implementing something
Speaker:with it or managing something with it. So very like two
Speaker:different lenses to the same challenge within the industry.
Speaker:Yeah, no, it was interesting. And I don't know about you,
Speaker:but the attendees this year had much better AI questions, I think, than at any
Speaker:other Red Hat event I've ever seen before.
Speaker:Right. It seems like people are struggling to
Speaker:implement this in a way that is secure,
Speaker:stable, scalable.
Speaker:And I think we also have a much better platform story this year than we
Speaker:had in previous years. Absolutely. So the way I've been
Speaker:framing it to people, it kind of goes into two terms. So the first
Speaker:term I've been using with people is that last mile.
Speaker:And that then kind of feeds into this
Speaker:concept that you hear a lot about in business and other
Speaker:industries called the 80/20 rule. I think a lot
Speaker:of people are finding that 80/20, where
Speaker:80% of the returns come from
Speaker:20% of the effort. And then what we find is that it
Speaker:flips: that remaining 20%
Speaker:of returns is now going to take 80% of
Speaker:the effort. And that 20% is what I've defined really as
Speaker:the last mile. And I think the
Speaker:conversations I'm having with people is that they now have the tools and they've had
Speaker:POCs and they're seeing results and they're seeing
Speaker:even a lot of times good results. They just don't know how to get
Speaker:it to the point where it's actually returning on investment, the
Speaker:ROI. And this is a question that was happening at both conferences,
Speaker:both from an executive lens point and from the, you know, the
Speaker:general day to day developers. And this is where I think open source
Speaker:is in a great position, because
Speaker:there's so many open source tools out there that we
Speaker:can work with people on, you know, finalizing that last
Speaker:mile. I think what people are most annoyed about though is
Speaker:that there's not a magic button that's going to fix it because it's
Speaker:very much based off of your requirements, your
Speaker:skills, your knowledge, your processes
Speaker:that now need to be defined within
Speaker:your AI stack. And that really is the last mile.
Speaker:And I think I saw that at both conferences:
Speaker:the realization that there's still a lot of work that needs to be
Speaker:done to get AI to a point where it's actually
Speaker:very fine-tuned, very functional, very
Speaker:efficient. And right now it may work, but it may not be very
Speaker:efficient for scaling, it may not be efficient for cost, it may not
Speaker:be efficient for the new token economy that we're seeing.
Speaker:And the last mile is historically the biggest problem to crack. Right.
Speaker:And once you solve that problem. Amazon has a
Speaker:physical last mile, right, in terms of how they actually execute on
Speaker:delivery. Right. Because you can have warehouses, but
Speaker:everybody lives in a different house. Right. So there's a lot of little
Speaker:last miles. It's death by a thousand paper cuts, if
Speaker:you will. Absolutely. And we saw the same thing with microservices
Speaker:back in the 2010s where there are a lot
Speaker:of organizations that developed microservices but
Speaker:then had a lot of challenges and had to overcome a lot of
Speaker:that last mile when it came to data domains.
Speaker:And you know, where does your data exist within this
Speaker:microservice architecture? How do you do contracts and
Speaker:handshakes between services? How do you orchestrate these services? How do you scale
Speaker:them? You know, in many ways these are the
Speaker:problems that Kubernetes kind of developed out of.
Speaker:And now we're seeing a lot
Speaker:of the same challenges with agentic systems and
Speaker:AI: how do we scale that out efficiently?
Speaker:So I love what you said. It's not a new problem. It's
Speaker:just the same problem we've seen recurring over
Speaker:50-plus years of compute history that now
Speaker:just has a different lens to it of the
Speaker:AI problem now. But a lot of the same solutions are still the
Speaker:solutions that we had for many of those advancements in technology
Speaker:that we saw, you know, over the last few decades. That is
Speaker:interesting because, you know, Kubernetes
Speaker:has solved a lot of the same problems, and it doesn't solve them all. But
Speaker:there's a significant overlap, and I got that sense from the conference
Speaker:that people are finally starting to get it. Like, why OpenShift AI?
Speaker:Well, because OpenShift solves a lot of these problems. You just
Speaker:put AI on top of those solved problems and it
Speaker:doesn't fix everything. There's still going to be a lot more room
Speaker:for improvement in terms of how you implement that
Speaker:on your last mile. But it gets you
Speaker:halfway there from the get-go easily. Yes,
Speaker:absolutely. A lot of the questions at
Speaker:IBM Think, it was actually funny,
Speaker:a lot of them were about IBM Bob, and I know you and I have
Speaker:been kind of talking about this for the last two weeks. But
Speaker:at IBM Think, IBM Bob was a very serious conversation of
Speaker:executives wanting to know how can they
Speaker:mimic tools like Claude Code, right, but
Speaker:within their enterprise setting. And the biggest thing about IBM
Speaker:Bob that I learned, actually at IBM Think, from both the engineers there and
Speaker:those who are interested in it, is that a big thing here is what they
Speaker:want for institutional knowledge. They want to keep a record
Speaker:of all that institutional knowledge from the prompts
Speaker:and the context and all the things that you
Speaker:know, are built out of IBM Bob,
Speaker:so that they can keep that information as institutional
Speaker:knowledge. Really being able to then take that knowledge and
Speaker:then kind of re-inject it into their broader agentic
Speaker:engineering. And I think that's actually the, you know, I don't think
Speaker:IBM Bob is actually really meant to be a clone of Claude Code. I think
Speaker:it's really meant to be a manager of institutional knowledge across
Speaker:many different. Yeah, so we have a
Speaker:special guest, a second special guest show up. This is
Speaker:Crystal, my little dachshund pup. And I had to pick her up to stop her
Speaker:from chewing wires, but I was listening. But
Speaker:you're right, though. Bob definitely feels
Speaker:different. I don't know how to describe it.
Speaker:I had issues getting authenticated into it, but the folks at
Speaker:the Bob booth, at the IBM booth, did help with that. But
Speaker:it's unfortunately named, honestly, I think, because I think
Speaker:of Microsoft Bob, and that was not
Speaker:exactly a winning product. Right. But,
Speaker:but I've been playing around with it, and, you know, I had to
Speaker:do kind of the init process on a couple of
Speaker:projects, and it was interesting because it suggested how to take those
Speaker:projects and turn them into MCP servers and agents,
Speaker:which the other ones, Codex and Claude Code,
Speaker:have not. I thought that was interesting, and I didn't prompt it to do that.
Speaker:It just basically said on its own like, you know, you could turn
Speaker:this process into an agent MCP server and things like that.
Speaker:While that was in the back of my mind as I, you know, built these
Speaker:various projects, it was not top of mind. So I thought that was interesting.
Speaker:Yeah, it definitely is not a clone. It's. It's meant to solve a new
Speaker:problem. Yes,
Speaker:I agree, I agree. And I think it really
Speaker:starts feeding into this bigger scope of things like
Speaker:spec-driven development, right, and these other tools
Speaker:of like how do we get the knowledge out of the project managers, how do
Speaker:we get it out of the JIRA and how do we get it
Speaker:into a way that the AI can
Speaker:interpret it but not lose that knowledge along the way?
Speaker:So as the prompts are coming in, as the context is coming in, it then
Speaker:becomes part of that institutional knowledge. And I think that is
Speaker:ultimately what Bob is trying to achieve. That is very
Speaker:different than I think, what a lot of the other alternatives out there are.
Speaker:My hope is that as this grows, we see more opportunities for
Speaker:it to become more open source. That's probably one area
Speaker:where it's a little different than what we do here at Red Hat, where I
Speaker:think we, I mean, we're not supporting the project,
Speaker:but a project that we're eyeing very closely is
Speaker:OpenCode, for example, which is an open source alternative
Speaker:to Claude Code. It's really interesting to see all these
Speaker:different solutions right now. I also like the fact that Bob can be
Speaker:an IDE and mimic more of a Cursor, or it
Speaker:can be a CLI and mimic more of a Claude Code, which obviously
Speaker:with my background I'm more comfortable with the CLI side
Speaker:now. That was a big one. And I would say then obviously agents would
Speaker:be the second biggest thing. Just in general, that was the theme last
Speaker:year at IBM Think. And that didn't change this year. I
Speaker:think we're just seeing the experimentation of
Speaker:agents now, moving into the
Speaker:solidification of agents in the industry.
Speaker:And I think we heard about agents a little bit at a high level
Speaker:at IBM Think. But then for Summit, everything was about
Speaker:agents. Everything went down to
Speaker:how does this implement to the agent, how does the inference of AI
Speaker:implement to the agent, how does the data implement
Speaker:to the agent? You know, the orchestration layer,
Speaker:kubernetes, all these things. It all had to do with the agent.
Speaker:And that was really interesting to see how the conversation
Speaker:over the last two years has shifted from all
Speaker:of these individual parts. I think the last time I was on your show,
Speaker:and I know you and I have talked a lot about how, with AI, there's
Speaker:been a lot of these parts, but nothing has kind of unified them.
Speaker:I think what we're seeing with AI agents is going
Speaker:to be that unification. The agent will become the unification part of all these
Speaker:different parts of the AI industry where all these tools now will come
Speaker:together. And we saw a lot of that at Red
Speaker:Hat Summit. You don't think harnesses will ultimately
Speaker:be the container for that, where all these things will live, and harnesses will be
Speaker:kind of like the top-level abstraction? This is a really good
Speaker:question because this is the big debate within the AI labs
Speaker:and the AI community, are you invested in
Speaker:harness engineering or do you think the models
Speaker:themselves will just supersede
Speaker:the harness and that they can be knowledgeable
Speaker:enough to basically function agentically without one?
Speaker:So obviously the OpenAIs and the
Speaker:Claudes of the world, and Anthropic, they're probably a little bit
Speaker:more on the model side because that would ultimately benefit them.
Speaker:Right where I think the IBMs and the
Speaker:Nvidias and I would say the majority of the industry
Speaker:is probably a little bit more on the harness side because that allows
Speaker:a larger ecosystem of third party tools and something
Speaker:that's a little bit more familiar to people. I don't know.
Speaker:I think over the next year or two it'll definitely be the harness, because that's where
Speaker:we've seen the most advancement. But with things like mixture-of-experts
Speaker:models just continuing to advance in how
Speaker:they can do reasoning and a lot of agentic work,
Speaker:It could be that we see the model layer
Speaker:chip away at the harness layer and is this going to be a back and
Speaker:forth and it really just gets also into
Speaker:how do you inject the context. And this is closely
Speaker:related to the same argument of: is RAG still needed?
Speaker:With context size growing so much, why would you need RAG? And
Speaker:I think from an enterprise standpoint, and I think Red Hat is
Speaker:very big on the harness side because we see the
Speaker:need for different security layers, different integrations into third party
Speaker:tools, different
Speaker:authorization layers, routing,
Speaker:networking that the model
Speaker:will not be able to manage completely, at least for
Speaker:a while. And that's where I think the harness engineering layer will
Speaker:exist because there are all these existing technologies
Speaker:that the agent needs to integrate with and that's all going to
Speaker:happen at that harness layer and then be
Speaker:executed within that runtime layer. Yeah, that's how I see it
Speaker:too. I think the harness layer is really going to be.
Speaker:It may not be a foundational type situation where you build on
Speaker:top of it. I see it more as the mortar between the bricks.
Speaker:I agree. Right. Like, and it's not
Speaker:that the mortar is more important than the bricks, but
Speaker:the bricks are kind of pile of rubble
Speaker:unless you have mortar kind of holding in place. That's kind of how I see
Speaker:the harness story evolving.
Speaker:But I have a hard time
Speaker:imagining models ever being able to be that far advanced.
Speaker:However, you know, we've gotten
Speaker:further with the LLM architecture than I ever thought we would.
Speaker:Synthetic data has done more, and
Speaker:distillation has worked better, than I ever thought it would. So
Speaker:take my thoughts with that in mind. Right. You know,
Speaker:when I looked at synthetic data and kind of
Speaker:distillation in particular, right, there's a meme where they show
Speaker:somebody fishing in the water, and then somebody
Speaker:fishing from that guy's pot, and then
Speaker:somebody fishing from that guy's pail, right? And then they show each subsequent
Speaker:fisherman, like, more and more distorted. We've not really seen
Speaker:that come about. Right. It's not like you're copying VHS
Speaker:tapes, where each subsequent generation gets
Speaker:worse. I'm sure that if you don't do it carefully, you'll
Speaker:get some weird artifacts. But it's not been.
Speaker:That has not been a default case, which I think is interesting. It
Speaker:is interesting too, because most of the models that are out
Speaker:right now are
Speaker:distillations of, actually, the GPT-4 family. Right.
Speaker:Even GPT-5 is still a direct
Speaker:distillation of 4. It was not completely retrained.
Speaker:And Anthropic obviously has their first generations and second
Speaker:generations, but we actually haven't seen very much
Speaker:new generation, just because of how expensive it is to create
Speaker:from fresh. And what I'm imagining is that they've
Speaker:tried and it's just they haven't gotten the results that they wanted.
Speaker:So I think that will be what we see. I don't know. I haven't heard
Speaker:if Mythos is. So, if people aren't following, the Mythos
Speaker:model from Anthropic is
Speaker:a model that they've withheld because supposedly it's too
Speaker:risky. I don't know if that model
Speaker:is a whole new generation. I would imagine that it probably
Speaker:is. But to your point, most of the models are out there now,
Speaker:and what we know from the Chinese models is that they're all just distillations of
Speaker:the American models. We have proof now that they've been
Speaker:mass-hitting the APIs of
Speaker:GPT and Anthropic and Gemini to
Speaker:create the generation of Chinese models that we have now.
Speaker:So that's something. And they're very performant. Like, those models are extremely
Speaker:good. Very good. I mean, it just shows you, this is
Speaker:not the paradigm of, you know, analog VHS copying. Right. This
Speaker:is more, I guess, in the style of, you know,
Speaker:remixing an old song digitally. Right.
Speaker:It's not
Speaker:a well-thought-out analogy, Christopher, but, you know, you'll hear
Speaker:like, you know, a lot of techno songs from the early 2000s. I don't
Speaker:go to clubs anymore, but you'll hear them on my What's
Speaker:New and What's Hot techno playlists on Spotify.
Speaker:Right. I recognize the same backbeat, I recognize the same
Speaker:chorus, right. Like from songs from like 20, 30 years ago, right. Like,
Speaker:and even sampling and rap music, right. Like, it's a bit more like that
Speaker:where you do get a completely fresh perspective based on older parts.
Speaker:And that's something that I did not expect. I just assumed that it would
Speaker:be some kind of. You would start getting really bizarre artifacts after
Speaker:so many generations. But that's not been the case. So,
Speaker:you know, I think it's interesting because we really don't know. This is really uncharted
Speaker:territory, right? Yes, these are based on very well known mathematical
Speaker:principles. But like, as these systems get more complex,
Speaker:it's getting harder and harder to predict not just their behavior, but
Speaker:the range of their behaviors. Yep. One second. I'm going to grab
Speaker:something because we'll do a little bit of show and tell as well. Cool, cool,
Speaker:cool. So while you're away,
Speaker:maybe I can interview a dachshund. So, what do dogs think about AI?
Speaker:Everybody and their cousin and their dog has an AI startup now. So what's your
Speaker:AI startup? Oh, a link shortener.
Speaker:Okay, cool. Because I get it. Your short
Speaker:legs. I get it. That's cool.
Speaker:While we're waiting for Christopher to come back, you all know I'm a big
Speaker:fan of Humble Bundle, so. Humble
Speaker:Bundle. Oh, you're back. Cool. Oh, you can finish your thought.
Speaker:Humble. Oh, so, Humble Bundle. Actually, so, I
Speaker:worked the booth. I had a talk on day one, and I worked
Speaker:the booth on the subsequent days, and
Speaker:you know, a lot of people came by and
Speaker:other Red Hatters, actually. I was showing them
Speaker:Humble Bundle. I'm sorry, go ahead.
Speaker:No, I just said that looks really cool. Yeah, yeah. So if
Speaker:you're not familiar with it, humblebundle.com, it started as games,
Speaker:but if you go and you pick store.
Speaker:Not store, I'm sorry, bundle,
Speaker:you can pick books and there's comic books there. But there's
Speaker:also a lot of stuff here that is particular around
Speaker:software. Right. So in this example here,
Speaker:this is the books on practice
Speaker:exams for AWS and gen AI,
Speaker:all sorts of interesting stuff here. Security.
Speaker:This is actually a hybrid of like courseware.
Speaker:So they also have software, oops, bundles
Speaker:that are, you know, sometimes kind of like image editing
Speaker:tools and things like that. But very often
Speaker:they will have courses for, you know, how to get into OpenClaw and things
Speaker:like that. And, if you don't know, Christopher is really into OpenClaw,
Speaker:he helped me get my Claudia kind of up and working.
Speaker:But if you go here, I know a lot of listeners of Data Driven are
Speaker:big into Power BI. These are basically courses on Power BI
Speaker:and things like that. And the cool thing is it's
Speaker:$20 for 17 courses and a
Speaker:portion of your cost goes to a charity.
Speaker:So it's really cool. You get a lot of material and you know, a
Speaker:charity gets funded and things like that. So definitely
Speaker:check it out. They often have AI books or,
Speaker:you know, app development books. A lot of things around
Speaker:game development too because that's kind of where Humble Bundles started.
Speaker:Nice. That's a great segue too because speaking of
Speaker:OpenClaw, so when I got home from
Speaker:Red Hat Summit, this arrived. Oh, nice.
Speaker:So I haven't gotten a Mac Mini in
Speaker:probably like over 10 plus years. So when this came I was
Speaker:kind of like, wow. Because I don't know if you remember, the old Mac Mini
Speaker:was maybe about the same footprint, but it was much bigger than this.
Speaker:And I'm actually holding this with like one hand right now.
Speaker:But the reason why I got this is because
Speaker:Red Hat in particular wants to make sure that all of
Speaker:the agents that I'm running for Red Hat are isolated
Speaker:at runtime. So I could use
Speaker:my. Let me see if I can pull it over. You have one of those
Speaker:Framework things, right? This is the Framework. Yeah. This is the
Speaker:size of it. So that is actually
Speaker:powering my home lab that has OpenShift in it. I could
Speaker:do that. And that's actually where a lot of our tooling are going
Speaker:towards. But I also need an
Speaker:agent to have access to my email, have access to more of
Speaker:my day-to-day tooling, which actually exists more on a
Speaker:desktop. And that's where this guy comes into play.
Speaker:It's interesting. Now we're separating
Speaker:the harness from its runtime and now I'm dealing with
Speaker:multiple runtimes. I'm going to have runtimes that probably run on the home lab
Speaker:and now we have runtimes that are going to run on this.
Speaker:This one would be. I need it to do something that actually
Speaker:involves some kind of GUI or something that's already on my
Speaker:desktop, which just there's no API set up for me to do.
Speaker:Or I need it to do something, maybe something basic that's really easy to
Speaker:do within the Mac ecosystem. Whereas
Speaker:my home lab, maybe it's an agent that's running
Speaker:AI ops diagnostics on my
Speaker:home lab. Why isn't something up? Why isn't it working correctly?
Speaker:And this is where the whole concept of
Speaker:runtime now has become such a big thing. And I think it will continue to
Speaker:become more important this year. Harness is kind of getting the
Speaker:spotlight, but we need to move more into this runtime conversation of
Speaker:okay, now the harness has put the context together, it's put all the knowledge together
Speaker:and the skills. The agent is running the agentic loop with
Speaker:the model. But now where does the output actually run?
Speaker:Does it run on your personal computer,
Speaker:where it has access to sensitive information and you know, it
Speaker:could do things that it shouldn't or does it run in an
Speaker:isolated environment? So this is probably going to act more as like a little server
Speaker:that runs here in my office where
Speaker:agents. This is just for agents, this box. Where does the inference run?
Speaker:Where does the inference run for your agents? Yeah.
Speaker:Does it run on that? Does it run on your framework or does it run
Speaker:in a hyperscaler cloud service? So I'm actually
Speaker:doing a new technique called semantic routing.
Speaker:All my requests go to my home lab first. Within what we would call the
Speaker:control plane for the agent, there's a router that
Speaker:exists that actually evaluates the information that's coming in and
Speaker:decides based off of sensitivity and complexity where
Speaker:this request should go. About 80% of my traffic actually hits
Speaker:the Framework, for a model that's running within vLLM
Speaker:on the Framework device itself, on
Speaker:OpenShift, and then about 20%, which I've deemed kind of high
Speaker:reasoning, will then get sent off
Speaker:to our corporate Gemini account that we have within Red
Speaker:Hat. So this way it's also really
Speaker:nice, because when I first started working with agents, I mean,
Speaker:I've been working with agents for years and years, but our current,
Speaker:modern-day idea of what agents look like, back at
Speaker:the beginning of this year I was running out of
Speaker:tokens. I was getting throttled by Google, and there was
Speaker:nothing I could do about that, because that was part of our corporate account. There
Speaker:wasn't anything I could do to go in and change the knobs. So moving to
Speaker:this semantic routing approach allowed me to not
Speaker:run into that throttling anymore. Most of my things go local. So right now I'm running
Speaker:a Qwen, the Qwen 3.6
Speaker:35B mixture-of-experts model. Nice.
Speaker:And that's running right now and doing all of my local agentic work.
Speaker:It's doing most of the low reasoning tasks and then all the high
Speaker:reasoning tasks then get sent off to Gemini.
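The semantic routing Christopher describes, a control-plane router that keeps sensitive or routine requests on the local vLLM model and sends high-reasoning work to the hosted Gemini account, can be sketched roughly like this. The endpoint URLs, keyword heuristics, and the 80/20 split are illustrative assumptions, not his actual configuration; a production router would more likely classify requests with an embedding model than with keywords.

```python
# Hypothetical sketch of a semantic router for agent requests.
# Endpoints, keyword sets, and thresholds are illustrative assumptions.

ENDPOINTS = {
    "local": "http://homelab.local:8000/v1",   # vLLM on OpenShift (assumed URL)
    "remote": "https://gemini.example/v1",     # corporate Gemini account (assumed URL)
}

SENSITIVE = {"password", "internal", "customer", "credentials"}
HIGH_REASONING = {"design", "architecture", "plan", "refactor", "prove"}

def route(request: str) -> str:
    """Pick a runtime for a request based on sensitivity and complexity."""
    words = set(request.lower().split())
    if words & SENSITIVE:        # sensitive data never leaves the home lab
        return "local"
    if words & HIGH_REASONING:   # high-reasoning tasks go to the hosted model
        return "remote"
    return "local"               # everything else (~80%) stays on the local MoE model

print(route("summarize this internal customer record"))  # local
print(route("design a migration plan for the cluster"))  # remote
print(route("list my open tickets"))                     # local
```

Since sensitivity is checked before complexity, a request that is both sensitive and high-reasoning still stays local, which matches the idea that the router also acts as a data-governance boundary, not just a cost optimizer.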
Speaker:So do you ever have it set up where the high reasoning task will divvy
Speaker:up a bunch of low reasoning tasks and then send those down to your Qwen?
Speaker:Or is that something in the works? I have
Speaker:experimented some with that. So
Speaker:that gets into some like post inference type of techniques that
Speaker:we've been experimenting with, myself included.
Speaker:I haven't gotten that far yet. This is where
Speaker:areas such as like speculative decoding kind of come into play or
Speaker:post inference technique. Why would speculative
Speaker:decoding come into play here?
Speaker:Yeah, because there could be a speculator that sits
Speaker:at the local model that actually
Speaker:acts as kind of almost like a guardrail to the
Speaker:larger model, where it can actually start reasoning about
Speaker:some of the things earlier on and decide;
Speaker:basically, it acts as a breaker. I got you. And that makes sense.
Speaker:That's where speculative decoding would be kind of the
Speaker:next iteration on that where
Speaker:it's really the management of knowledge and memory and cache at that
Speaker:point. I really haven't gotten into that with my local setup, but that's
Speaker:part of that whole last mile where memory I
Speaker:think will be the last portion of the last mile for
Speaker:everybody. It's going to be memory management, it's going to be cache management.
Speaker:When you say memory, organizational memory, not necessarily the physical memory.
Speaker:When I'm talking about memory, I'm talking about the memory of
Speaker:the agent itself. For
Speaker:OpenClaw, for example, every time it makes decisions,
Speaker:it keeps a compressed record of what it's done in these
Speaker:JSON files and then it will reference that
Speaker:Claude Code does something very similar. Every time you hit your token
Speaker:context window maximum, you'll see that it's doing a bunch of
Speaker:compressions, and it takes a little while. That's actually what
Speaker:we call a form of memory. If you've actually been following the news.
Speaker:Even just today,
Speaker:Google AI just announced a whole new
Speaker:agentic memory platform, a
Speaker:framework that fits right into this. And that's why I think memory
Speaker:is going to be the next iteration on,
Speaker:you know, improving the agentic system. And that's not the KV cache,
Speaker:that's not your physical memory. It's not the
Speaker:agentic memory would be a. Yeah, it's like a gentic memory. It's how your
Speaker:agent has recogn
Speaker:reconciling what it's doing doing and has. It's. It's
Speaker:outside of the context window, but it's not the KV
Speaker:cache. It's something that's like, oh, this is what I've done in the
Speaker:past and this is the context I need that I just
Speaker:need to keep carrying forward in my conversations.
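The compaction behavior described above (keep a compressed record of past turns, re-inject it as context) can be sketched roughly like this. A toy illustration under stated assumptions: the turn limit stands in for a token budget, and `compress` stands in for an LLM summarization call; this is not how OpenClaw or Claude Code actually implements it.

```python
import json

MAX_TURNS = 4  # stand-in for a real token-based context limit

def compress(turns: list[str]) -> str:
    """Toy compressor: in practice an LLM would summarize these turns."""
    return f"[summary of {len(turns)} earlier turns]"

class AgentMemory:
    def __init__(self):
        self.summary = ""   # compressed long-running record
        self.turns = []     # recent, uncompressed turns

    def add(self, turn: str):
        self.turns.append(turn)
        if len(self.turns) > MAX_TURNS:  # "context window" is full
            self.summary = compress([self.summary] + self.turns)
            self.turns = []              # recent turns folded into summary

    def context(self) -> str:
        """What gets carried forward into the next conversation."""
        return json.dumps({"summary": self.summary, "recent": self.turns})
```

The key property is that the summary, not the raw history, is what keeps getting carried forward, which is exactly the "compressed record in JSON files" behavior being described.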
Speaker:It's something that maybe isn't an MD
Speaker:file; it's not permanent knowledge. It could get flushed. You could just say,
Speaker:go ahead and flush your memory,
Speaker:and that may actually be what you need to do, because maybe there's a
Speaker:lot of nonsense in there or it's doing something wrong. It's not
Speaker:meant to be long term. Think of it like human short-term memory. That's exactly what
Speaker:it is. Interesting. Not everything that we do is long term.
Speaker:So long-term memory in this case would be your
Speaker:MD files; it would be your KV,
Speaker:potentially even some layers of your KV cache, though I
Speaker:would actually consider that more like intermediate. But it's really that
Speaker:long lasting context that just keeps getting injected in where
Speaker:this concept of memory that we keep hearing about
Speaker:is more of that short term memory of what knowledge
Speaker:do you need to have right now to make the decisions that you need to
Speaker:make based off of the reasoning and the topics that you're working with
Speaker:right now? So a good example would be. I'm sorry, go ahead. No,
Speaker:no, go ahead. A good example: like your hotel room number when
Speaker:you go to a conference, right? The chance you're
Speaker:going to need to remember that beyond when you check out is very low.
Speaker:Or when you get the two factor authentication, the six digit code. Right. You only
Speaker:need to remember that for a very short window of time.
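The hotel-room and 2FA examples map naturally onto a TTL-based scratchpad: facts expire on their own, and the whole thing can be flushed when it fills with nonsense. A minimal sketch; the class and method names are hypothetical, not any agent framework's API.

```python
import time

class ShortTermMemory:
    """Flushable scratchpad for facts the agent needs right now
    (a hotel room number, a 2FA code) but shouldn't carry forever."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry time)

    def remember(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def recall(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.monotonic() > expiry:  # expired: quietly forget it
            del self._store[key]
            return None
        return value

    def flush(self):
        """Wipe everything, e.g. when the memory has gone wrong."""
        self._store.clear()
```

Long-term memory (the MD files in the conversation) would live elsewhere; this tier exists purely so lookups are fast while the information is still relevant.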
Speaker:Yes, exactly. And that's a prime example where
Speaker:you could long term forget that information. But in the
Speaker:short term it would be very detrimental: if you forget your hotel room, you have
Speaker:to go and ask somebody, and that takes time. Yeah, it takes time, and
Speaker:that's exactly the same narrative. It's not that the agent couldn't get that information,
Speaker:it's just that it's faster for the agent to get that information if it's
Speaker:located in some type of short term memory. And that's
Speaker:where we're seeing so much advancement in, in these
Speaker:agentic platforms. Did you want to add
Speaker:anything to that? I know we're coming up on time, so. Oh no,
Speaker:I mean, no, I appreciate your time. I see that we're up
Speaker:on time.
Speaker:No, I think there's a lot. I think the one thing I learned
Speaker:this week was it's very easy to think that you're behind
Speaker:everyone else. But you know, we had people come into the booth,
Speaker:like, I don't know anything about this, tell me where to get started. And
Speaker:you know, to hear that in 2026 was
Speaker:both shocking and, and
Speaker:refreshing. Right. You know, there were people. I'm not going to name
Speaker:any names, but there are people who are in our
Speaker:division and they've not even installed
Speaker:Claude yet. Open Claw. I mean, I
Speaker:always get those two confused, even though I know they're very different things.
Speaker:But, you know, who've not installed Open Claw
Speaker:on their own. And it's just like, I feel behind because I
Speaker:have Open Claw, but I don't have it set up as well as
Speaker:you do. Right. But I do have it, you know. So it's kind of
Speaker:like, don't be afraid of being behind, because
Speaker:chances are you're probably not. No, no. Part of the reason why
Speaker:I have the dog do that intro now, which of course is, you know, obviously
Speaker:AI generated. Part of the joke of that was: everybody and their dog is
Speaker:an AI expert now. And there's not
Speaker:really any experts. There's probably about half a dozen people
Speaker:worldwide that really are on a whole other level.
Speaker:I mean, the Andrew Ngs. The Andrew Ngs of the world. The
Speaker:Geoffrey Hintons of the world. Right. Like those are the people.
Speaker:Yann LeCun, for sure. You don't
Speaker:hear much from Yoshua Bengio anymore. But you know,
Speaker:like people at that level, right. At that, that strata, like
Speaker:they are, they really are like that far ahead.
Speaker:And it's always interesting seeing like what problems they're trying to solve.
Speaker:I think is very interesting. What is particularly interesting: Yann LeCun
Speaker:is very skeptical of LLMs getting any further
Speaker:along. Yeah. Which I think is
Speaker:interesting. I mean, at this point it's almost an 8- or 9-year-old
Speaker:concept, the LLM
Speaker:transformer. The concept that he created. Right.
Speaker:So the underlying layers. Yeah, yeah. So like a lot
Speaker:of. Go ahead. I was gonna say there's a lot of new
Speaker:interviews that he has out in the last couple weeks about, you know, his
Speaker:new approach to AI and how he
Speaker:sees it superseding LLMs. And that'll be interesting
Speaker:too because he's looking at it from a whole new direction than just
Speaker:how LLMs are just
Speaker:building the next pixel, the
Speaker:next token, where he's looking at it from a whole new
Speaker:direction of, you know, maybe we built this
Speaker:house of cards wrong. We need to just kind of start over and basically
Speaker:start at the basics and build something better from what we've
Speaker:learned. And it'll be very interesting to see what he comes up with out of
Speaker:all this. Yeah, because I'm surprised we've gotten
Speaker:this far this fast with LLMs. I
Speaker:really thought, like, the whole reasoning aspect of LLMs is something
Speaker:I would not have bet real
Speaker:money on them being able to do. Right. But here we are. Like, they
Speaker:clearly can do some level of reasoning. How much is probably
Speaker:debatable. But you
Speaker:hear that they're just like the text-prediction algorithms on your
Speaker:phone that predict the next word. Well, technically true,
Speaker:but I think that doesn't really tell the whole story. Right. That's like saying that
Speaker:the F-35 fighter is the same thing as a
Speaker:paper airplane. Right. Like, they do have to obey the
Speaker:same laws of physics: thrust, lift, gravity, blah, blah, blah.
Speaker:But they are very different animals in that sense.
Speaker:Very much. I agree with that analogy. It's really good. Cool, man. I'd love to
Speaker:have you on the show again. We could talk Open Claw. Yeah. You've done some
Speaker:crazy cool stuff with that. Definitely. I know some of the
Speaker:agents that you've built that people probably don't want me talking about, because I know
Speaker:you made a lot of IT security people very nervous.
Speaker:That's true. But the stuff that you've been able to
Speaker:automate has been nothing short of like, oh, my God, that's amazing.
Speaker:And also super useful. What's crazy for me is that I've
Speaker:been so busy that so much of the stuff I did that people were
Speaker:talking about was from one to two months ago.
Speaker:So there was actually a
Speaker:really popular episode of the
Speaker:AI Daily Brief podcast where he was talking about how
Speaker:everything that's happened over the last six months basically came out of
Speaker:Christmas break. So, like, everyone went home and
Speaker:had a few weeks to just like, play around with this stuff. I was one
Speaker:of those people. So, like, so much of what I did came out of those,
Speaker:like, experimentation phases. And I think I have to repeat
Speaker:that this summer because there are so many new things that we've learned.
Speaker:Right. But I still haven't built on top of that yet. And I think,
Speaker:so for me right now, so many of my agents are doing very simple tasks.
Speaker:They're doing information gathering, they might be looking at
Speaker:meetings and suggesting that I read certain articles
Speaker:correlating to something I'm about to talk about. But I want to go the
Speaker:next level where I get into like a multi agent system where
Speaker:I have like a chief of staff agent who's got one
Speaker:that's doing programming demos and then I have another one that's doing
Speaker:like general administrative assistant work or another
Speaker:one that's front-facing, you know,
Speaker:a model that's on our Slack that people can just ask questions to,
Speaker:based off of the institutional knowledge that I have of
Speaker:our company and our industry. So that's the next phase and that's
Speaker:where the memory stuff has to come into play and the multi agent kind of
Speaker:orchestration, and all these things are being worked on now. So there's
Speaker:not like a clear winner or a clear understanding of what
Speaker:that looks like right now. But we're all kind of playing around with it. So
Speaker:I think that's kind of the next phase. And yeah, I look forward to coming
Speaker:back. And I think that's probably part two of this conversation. Absolutely.
Speaker:What does that look like? What are these tools? How do we kind of build
Speaker:on top of this thing called OpenClaw or
Speaker:Hermes or all these other ones that are out these days. Yeah, that'd be
Speaker:awesome. And even if we just do a deep dive on like kind of what's
Speaker:exactly, you know, what's what. Because I know you mentioned a couple of things that
Speaker:maybe most of our listeners don't fully grok, right. Because
Speaker:we have a lot of data engineers here too. Right. So and the
Speaker:other thing too that really came out was people would ask me questions because
Speaker:we have something that Microsoft folks may know as TFAs, or technical focus
Speaker:areas; we call them pillars. So you're the agentic lead, I
Speaker:believe, and I'm the connecting-models-to-data one. Right.
Speaker:So the RAG and that sort of thing. And a lot of the
Speaker:conversations I had were about how data engineering is
Speaker:more important in AI systems now than it was in the past.
Speaker:Because I don't know exactly how
Speaker:RAG and agentic systems would fail,
Speaker:but when they fail, they probably fail very spectacularly.
Speaker:But I know with RAG systems, right, you know, if
Speaker:your data chunking strategy and your data
Speaker:indexing strategy is not, I wouldn't say perfect, because you'll
Speaker:never really get there, but appropriate to the source documents that you're dealing with,
Speaker:it's gonna fail in a way that is subtle and is only
Speaker:gonna amplify and get worse down the road. Right. So you
Speaker:really have to think through a lot of these things. Right. The one sentence
Speaker:I said most of all was, you know, chunking
Speaker:is an architectural decision. Yes. It's
Speaker:an important one. Treat it with that importance, as opposed to just whatever,
Speaker:you know, paragraph by paragraph or blah, blah, blah.
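To make "chunking is an architectural decision" concrete, here are two strategies side by side: a fixed-size window is simple but can split a thought mid-sentence, while a structure-aware split keeps each chunk a coherent unit of the document. A toy sketch only; character counts stand in for the token counts a real pipeline would use.

```python
def chunk_fixed(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Naive fixed-window chunking with overlap between windows."""
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]

def chunk_paragraphs(text: str) -> list[str]:
    """Structure-aware chunking: split on blank lines so each chunk
    maps to one paragraph of the source document."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```

Which one is appropriate depends on the source documents, which is exactly why the choice belongs in the architecture review rather than being left as a default.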
Speaker:So that was another consistent theme. But I will say that the questions
Speaker:I got were far more evolved than I've gotten at any
Speaker:other conference in a while. I agree.
Speaker:Especially this year. It's just a step up from where we were. So. Yeah. Cool.
Speaker:This was great. Thank you for having me on. No problem. We'd love to
Speaker:have you back. And since the recording's for the podcast, we'll let the music
Speaker:play it out.