Artwork for podcast The CTO Compass
AGENTS.md Won’t Save You: Design AI Systems You Can Actually Control ft. Craig Kaplan
Episode 2613th March 2026 • The CTO Compass • Mark Wormgoor
00:00:00 00:52:27

Share Episode

Shownotes

AI is moving from copilots to autonomous agents, and most tech leaders are not prepared for what that shift means. Craig walks through the real risks behind superintelligence, why AI checking AI is becoming inevitable, and how CTOs and CIOs can design safer, more resilient systems before autonomy outpaces human oversight

Rather than focusing on hype, this episode dives into the alignment problem, the limits of guardrails, and why monolithic black box models may be the wrong long term architecture.

You will hear a practical path forward for tech leaders who are already overwhelmed by AI generated code, agent frameworks, and rapidly evolving models

If you are leading engineering, AI, or technology strategy, this episode will challenge how you think about safety, governance, autonomy, and the future role of the CTO in an AI driven world.

Key Takeaways

  1. Why AI checking AI is not optional as code generation and autonomy scale
  2. What the alignment problem really means for enterprise technology leaders
  3. The limits of guardrails and why prevention at design stage beats patching at deployment
  4. How democratic architectures of multiple agents can reduce systemic risk
  5. Why vendor agnostic, multi model strategies increase resilience and strategic control
  6. How to embed company values into AI systems through training, memory, and architecture
  7. What P(doom) represents and why many leading researchers assign it far higher risk than most executives assume

About Craig

Dr. Craig A. Kaplan is a renowned expert in artificial intelligence, artificial general intelligence, and superintelligence, with a focus on collective intelligence and quantitative modeling. He is the Founder of Superintellligence.com and CEO and founder of iQ Company, a consulting firm dedicated to advanced AGI and SI systems. Previously, he founded PredictWallStreet, a financial services firm that powered top hedge fund performance by leveraging the collective intelligence of retail investors. Dr. Kaplan has authored a book, published extensively in scientific journals, and holds numerous patents on AI-related technologies.

Chapters

00:00 How far is AGI?

06:55 What is P(doom)?

16:43 AI Reviewing AI Output

20:54 Dealing with Bad Actors (Human or AI)

25:36 Approaching AI as a Small Scale CTO

30:43 Democracy of AI Agents

35:15 AI Safety Conferences

40:03 AI Models, Open-Source or Big Company?

45:05 Is AI Adoption Keeping Up?

Where to find Craig

  1. LinkedIn: https://www.linkedin.com/in/craigakaplan
  2. Website: https://www.superintelligence.com
  3. Website: https://iqco.com
  4. YouTube: https://www.youtube.com/@iqstudios1

Transcripts

Mark:

It used to be that AI was, even for us as humans, manageable. The output was something we could review and we could check. And I was actually watching a CTO discussion this week where they're saying AI by now is just generating so much code, we're already seeing this happen, that we can't review it line for line anymore.

So they're discussing right now, how do they actually check to generate a code? Do they just do spot checks? Do they implement an AI model to see, to check the code that AI has generated? What's the next.

Craig:

Step? Right.

So eventually, I think we're going to end up almost certainly, we'll enter a world in which it's AI checking AI, because AI is the only thing that can keep up with AI. That's just a fact that has to do with the computational power of the human brain and the number of operations per second that we can run versus how many operations per second these chips can run. But the problem is, It will be smarter than us. And now it has autonomy. I don't see any way to stop those things. It's too diffuse. It's already going.

So what we're left with is, we hope that it has the same values as humans. That's really the end game. It must have the same values or be positively aligned. This is what AI safety researchers refer to as the alignment problem. Does the AI want the same things as humans? If it does, this thing's going to be the best thing ever. If it doesn't, That could be very scary.

Mark:

Welcome to the CTO Compass podcast. I'm your host, Mark Wormcourt, tech strategist and executive coach. In every episode, we meet tech leaders from startup CTOs to enterprise CIOs to explore what it means to lead in tech today. They share their stories and their lessons so that you can navigate your own journey in tech. Let's dive into today's episode.

So many of the AI conversations today, they focus on the capability, the faster models, the smarter agents, the better co-pilots, and almost nobody is asking the harder questions. If they become more autonomous, and we're seeing a lot of that with agents already, who actually controls them? Can we actually oversee what's going on?

So, And what does that mean for our work right now? Today's guest, Dr. Greg Kaplan, has spent decades building intelligent systems, collective intelligence models that he actually used for trading, empowered a bit of Wall Street trading systems, and he's now working on democratic architectures for safe superintelligence. Craig, absolutely welcome, and I'm really happy to have you here today.

Craig:

Mark, it's terrific to be with you. Thank you for having me.

Mark:

So and the first question that I have, and we've heard that AGI is coming, right? And that is either imminent and others say that it's a decade away. We've been hearing the story, I'd say, almost for two decades now that it's really going to come. We now have people like Sam Altman that say it's one or two years away, others that it's 10 or 20. Where do you think we are?

Craig:

Yeah, that's a, Great question. A lot of it depends on how you define it. I've actually heard Sam Altman say, if you were to go back two years in the past, and consider what we thought AGI would be, then by that definition, we're already there.

So some people would even say, you know, it's here already. And then others, as you say, you know, 10 or even 20 years out, I don't think it's going to be 20 years. I think definitely within 10 years or I have high confidence within 10 years and possibly much sooner than that. It's an evolution that's been accelerating, and we can go into how that looks. But yes, it's coming fast.

Mark:

Yeah, and I think I understand what Sam Altman said. Recently, we had the Turing test, and we were all supposed, if we could ever meet the Turing test, that we would have AGI. I think with all of the LLMs that we have right now, that's done and dusted, and that's behind us, and we're still arguing whether it's AGI or not. Phil. What do you think true AGI is before we go on?

Craig:

So sort of a classic definition of AGI would be artificial intelligence system that can perform any cognitive task about as well as the average human. So it's sort of defined as, you know, kind of the average or median human. But the key is it's across any cognitive task.

So we already have very advanced AI systems that are even better than humans at specific narrow tasks, playing chess, folding proteins, maybe driving a car. It's a little bit arguable.

And humans really you know, we sort of lack awareness of how much we actually learn, you know, over the 20, 30 years as we grow up in the world and get educated, it turns out there's a ton of information that we are processing and we know a lot about a lot of things. So That's been the sticking point is that general part. That's the G in AGI. But I think it's coming quickly.

And then once we hit AGI, a lot of people think, You will then have a system that some people think we're very close to it now, that is capable of self-improving. So if you have a system, everyone knows Claude Code probably by now, your listeners.

You know, AIs are already very good at writing code. If they can actually train themselves, improve themselves without any human supervision, then you could imagine sort of a runaway loop in which the system just becomes smarter And we don't stay at average human level very long. We very quickly go to what is known as superintelligence. Where you have AI that is smarter than the very smartest humans. In all areas. And I think that's going to come very rapidly. On the heels of AGI.

Mark:

Yeah, interesting. And I think the one thing you said that triggered me, I've been playing with OpenClaw the last couple of weeks because who hasn't? Memory, right? These open AIs, or sorry, the open AIs, these LLMs, they know so much. But their memory and how much memory they can have, it's absolutely horrible.

So I think everybody is struggling with that. I mean, they can have these huge... Context with those. But they just can't remember much more than, I'd say, a million tokens or something like that. And it's like, it's such a big struggle.

So I think that's the one... Barrier that we probably still need to cross. And it's going to be interesting because as you say, we learn so much in our lives and it's a lot more than those context windows.

Craig:

And what we're good at, is even though we have a much smaller working memory than these AI systems, we're much better at retrieving the exact piece of information that we need for the problem at hand. And I think that's the problem where they actually, from a technical standpoint, can cram a lot of things into that, context window, but then finding out and remembering exactly which piece of information is relevant out of a large amount. That's something that is sort of bedeviled the research community. And, The models are getting better, much better.

So it's a problem, but I don't think it's a problem that's going to last, you know, probably not more than a couple of years, possibly not more than a few months. We'll see.

Mark:

See, it'll be interesting. So and then on the more negative side, the safety side, you talk a lot about PDoom. What is that? And where do you estimate it to be right now?

Craig:

Yeah, so if I take a step back, you know, I've been... Studying AI since the 1980s, right?

t actually named the field in:

So sort of my mentor was one of those guys. We wrote papers together. And so I've seen AI evolve. I arrived on the scene about 1985. Which was just at the beginning of what could be considered sort of the machine learning era.

Before that, it was all symbolic AI. You program in the rules. After that, people figured out, hey, why not have the AI sort of learn everything itself. And now I would say we're in this area of agentic AI, which we can talk about.

t fast enough. You get GPT in:

So... As AI has progressed, especially in the last few years so rapidly, people have begun to become concerned about safety issues. And I think the best sort of data point or person out there to sort of point to on this is Jeff Henschen, who won the Nobel Prize in Turing Award recently, used to be, you know, a researcher at Google. But he really, along with some colleagues, invented the fundamental algorithms that are used for machine learning that are kind of underpinning all of what we're seeing. Claude and Gemini and ChatGPT, all of them are using various forms of neural networks, which is a model that he sort of pioneered. And... He became very worried. He quit Google, gave up a very lucrative, I'm sure, position. To basically go on the circuit and say, Folks. There's some dangers here. And if we're not paying attention, we could really get into trouble. And I think the problem is everybody is rushing so quickly. There's such an arms race going on. There's so much competitive pressure. There's so much money involved. That it's very difficult to slow down. Nobody wants to slow down or pause or There are calls for regulation. I'm actually going to Paris for an AI safety conference later this week. But even though Europe, I really admire, is kind of taking the lead on regulation, I think regulation is going to be no more than a speed bump here. We really have to design the systems to be safer. And it's possible to do that but we have to sort of step back and say, what is it we want? And what's the, you know, where's this train going at the end and make sure that we arrive at a safe place.

So that's what I focus most of my attention on. And we can talk about why I think design is sort of the key thing there. There's some designs that are inherently safer than others, I think.

Mark:

So let's go there first, right? What do you mean by that? Because I know you talk about the guardrails that we implement right now versus implementing a design. What do you mean by that?

Craig:

So I think the easiest way to sort of see this is to contrast the dominant paradigm for building a new large language models, for example, today versus a different approach. So the dominant paradigm that is done really well and has attracted hundreds of billions of dollars is to say, Let's use certain algorithms, transformers, other kinds of algorithms that are basically machine learning. Let's take as much data as we can get. We'll filter it and clean it a little bit. But many Library of Congress's worth of information, you know, a substantial chunk of the entire internet. Let's just take all that data. Let's feed it into the algorithms and let's have huge data centers with thousands or tens of thousands of GPUs chomp on this, And out pops the other end, GPT-4 or GPT-5 or GPT-6. And each generation, as you add more data and more CPUs and more money in data centers, scaling laws just naturally, it's not a real law, it's just sort of a descriptive law like Moore's Law. But what's been happening is it's a pretty nice relationship. The more GPUs you buy and the more data you put in, the smarter the model comes out the next day. But here's the problem. Nobody knows how those models work. They're essentially black boxes.

So they know how the learning works mathematically. They, you know, of course know that that's how we got machine learning, but they don't know what the model's learning. And so therefore it becomes essentially impossible to predict how it's going to behave.

So you could imagine if you're building a giant black box intelligence and you're making it smarter and smarter every six months. Even like two years ago, you didn't know how it worked. Now you completely don't know how it worked and you just keep making it smarter and smarter. The risk that it somehow does something by accident that is bad, catastrophically bad, goes way up. And as it's becoming more intelligent and autonomous, which is happening, the risk that it sort of sets some goals that are good for it, but not good for humans goes up. And there's no transparency. You can't see that.

So that's what's going on. There's a different way to design it that solves a lot of those problems. And to understand that, I think... One way to think about it is that you and I So, Mark, you're a black box to me and I'm a black box to you. You can't see inside Craig's brain and I can't see inside Mark's brain. And yet, I am not concerned that Mark is going to destroy the world or do something horrible. And hopefully you aren't concerned that I'm going to do that. And the reason is that even though we're both intelligent entities and we are both black boxes, We work together in a society and every time we take action, every time we speak, every time we do something that's visible in the society and there's rules that govern it.

So the transparency comes not from what we're thinking, but from what we're doing. And the collective intelligence of millions of humans is what has led to all the technological breakthroughs.

So humans working together form a type of superintelligence Similarly, AIs working together can form a type of superintelligence Humans have In a lot of cases, democratic societies where we can sort of vote for things and have checks and balances. And if there's issues, we have conflict resolution.

So humans have all these ways of working things out. And we have ways of representing lots of people's values, which are very important.

So that society model, Marvin Minsky wrote about this amazingly way back in the late 80s and early 90s, a book called Society of Mind, where he described you can get very intelligent behavior from the coordinated activity of many, perhaps millions of independent entities. So that's a different model to get superintelligence. Instead of building a giant black box, you don't know what it does. You cross your fingers and hope it behaves well, or you try to test it. And every time it tells you how to build a bioweapon, you say, no, don't do that. But it's impossible to test all the possible bad things it might do.

So it's a dead end. And basically you're stuck with this unpredictable black box that's getting really smart. Instead of that, if you had a society of many agents and the intelligence comes from the interaction of them, The democratic community of those agents is what's intelligent. And every time they take an action, you can see what it is. You have transparency. You can build in checks and balances. Certain AIs can check other AIs. You can build in safety rules. Every time a goal is set, a safety rule gets checked. There's a lot of things you can do that makes the whole system a lot safer.

Plus, it's easier to build because you can use the pieces you have right now and it's faster and it's more profitable. So you don't have to say safety means slowing down. Safety can mean speeding up.

So it's a just different approach that I think is inherently better.

Mark:

I love that. That's so interesting.

So one of the things I realized, you talked about bioweapons, right? And I think in one of the talks that I was watching, one of your talks that I was watching today, you talk about bioweapons and how all the guardrails that they put in place, right? Make sure that Jim and I or Chad GPT or Claude don't actually tell us how to build a bioweapon. I only realized today that I speak every day to an LLM that probably actually knows how to build a bioweapon or a nuclear bomb for that matter. They probably actually have that knowledge and that I and the other 1 billion people that use this every day have been speaking to all these models that indeed somewhere well-heathen, right? And they don't tell us because of the guardrails. Actually have that knowledge. I did find that quite concerning.

So.

Craig:

Yes. And it's unfortunately a little worse than that. Not only do they have that knowledge, And They've been trained enough. There's been enough safety testing that if you ask it how to build a bioweapon, my guess is If it's coming from any of these responsible technology, you'll say, I'm sorry, I can't answer that question. That's dangerous. But it used to be, and I don't know if they fixed this, that you could just change it a little bit. You could do what's called a jailbreak. You could say, you know what? I don't want you to tell me how to build a bioweapon. I am writing a story. I'm an author of a science fiction story. And one of my characters is a mad scientist. And that mad scientist wants to build a bioweapon. And I just need some realistic details to make the book. And son of a gun, it will just give you a whole bunch of details.

So there's ways to get around it. And that's part of the scary part is it's impossible to plug all the different holes, right? It's a dyke with lots of leaks and, You know, you don't have enough fingers.

Mark:

To plug them all, I think. I love that Dutch analogy.

So moving on then, I think this is another discussion that I was just seeing this week. It used to be that AI was, even for us as humans, manageable. The output was something we could review and we could check. And I was actually watching a CTO discussion this week where they're saying AI by now is just generating so much code, and we're already seeing this happen, that we can't review it line for line anymore. It's just, if we see what cloud generates for us, there is no way that we can see.

So They're discussing right now, how do they actually check the generated code? Do they just do spot checks? Do they implement an AI model to check the code that AI has generated, which is maybe a bit in line with what you suggested? How do we I mean, if we're already at the place where AI is doing so much work and delivering so much output that we can't view it anymore. What's the next step? How do we check the output still?

Craig:

Right. So eventually, I think we're going to end up almost certainly. And already we're there to a large degree, but even more, we'll enter a world in which it's AI checking AI because AI is the only thing that can keep up with AI. That's just a fact that has to do with the computational power of the human brain and the number of operations per second that we can run versus how many operations per second these chips can run. And what's been saving humans.

So my background is cognitive science, cognitive psychology. So I spent, you know, my early academic career studying the human mind and problem solving, perception, memory, all those things. But it turns out that all those things that apply to humans are apply actually to any intelligent system. And so the same.. Kind of functional characteristics can be applied to AI systems.

So when I look at AI and I look at humans, I don't really make a distinction at a certain point. Yes, obviously we're biological and everything, but From an information processing point of view, We are both information processing systems. The difference is, representation.

So humans have certain ways of representing knowledge. And if you represent knowledge in a certain way, you can sort of see conclusions very quickly. And that's how humans have been able to, even though we're much slower, kind of compete with AI in many areas or sort of stay ahead of AI. But now... AIs are number one, getting better at representing information.

So they're beginning to figure out And then on top of that, they've always had a speed advantage. So if you think about AI's advantages, it can do so many operations per second.

I mean... You know, it's billions of times faster than the brain. Neuron fires every, you know, 10 milliseconds, 100 milliseconds. It's done it.

You know, hundreds of billions of operations in that time. And then on top of it has this memory, which is potentially infinite. Or almost infinite and doesn't forget anything. Alluded to earlier, it's having some issues retrieving the right thing, but those are temporary. Eventually, imagine if you could remember everything in the Library of Congress and have access to that with perfect recall, what you could do with your brain.

I mean, it would just be amazing. Well, AI can do that. And it's getting faster and faster. The writing's on the wall as far as I'm concerned. Almost all the AI researchers that have been in the field in a long time see this coming. That AI will outstrip humans in terms of intelligence. We're already going down the path of giving it autonomy. It's very logical. Why should I just have a little chat conversation when I could have the AI do my shopping and send my emails and, you know, fend off all those spams?

I mean, of course, I'm going to want to give it autonomy to do those things. But the problem is it will be smarter than us. And now it has autonomy now. I don't see any way to stop those things. It's too diffuse. It's already going.

So what we're left with is, we hope that it has the same values as humans. That's really the end game. It must have the same values or be positively aligned. This is what AI safety researchers refer to as the alignment problem. Does the AI want the same things as humans? If it does, this thing's going to be the best thing ever. If it doesn't, That could be very scary.

Mark:

And the worrying part is there's always, I mean, there's what's, We as a group of humans, right? The total world may be one. And there's some very conflicting views in there already. And I think if you look at the individual space, there's those there's a lot of very crazy people out there.

I mean, 99% is normal and it's fully aligned with most of the rest of the world's views, but there are some people out there that just have completely different views. Of the world and what's good and what's maybe not so good. How do we protect? Ourselves, right, from those people using AI.

Craig:

Yeah, that's a great question. Of course, there are bad actors, bad human actors. They'll probably be bad AI actors too.

So how do we protect ourselves? I think, again... My bias on this question is that it's a design. Issue. You get the most leverage. If you can design the system to be very robust to bad actors.

So let me give you an example of what I mean. In a democracy of humans, we have bad actors.

Like you said, maybe one out of every hundred. I don't know what the percentage is. I would, say, and I think there's good evidence for this, that the vast majority of humans are pro-social and have generally positive values. And in fact, even if you had a camera that followed you around and recorded every action you did for a week, and just counted them all up. You'd be way over 90%, probably 99.9 or something, positive things. You buy an espresso, thank you for the espresso, positive interaction, hello, bonjour, these kinds of things. And every now and then you might cut someone off in traffic or do something that's not so nice. But the vast majority... Are statistically positive actions.

So that's good. And we can see that society as much as We pay attention to the bad things that humans do and it's in the news and we get outraged, which is a good thing. We should get outraged when there's exploitation and wars and killing and suffering and these things are horrible. We should be outraged. But it's true that it's a very small thing percentage of human behavior. I did a little bit of research on this. And the number of humans that die each year, from war or conflict, armed conflict, is less than 0.03%.

So less than one-tenth of 1% of the human population dies from these bad things that we do to each other. I would be happy with that. If we had an AI system that accidentally killed one-tenth of 1%, that would be fine. When I talk to sort of the Top AI researchers or even just AI researchers at conferences where I speak, And I ask, what is the probability that advanced AI kills humans. That's what people call P-Doom. You have folks like Jeff Hinton, 10 to 20 percent. I'd say, the show of hands test in some of these talks I mean, half the hands are up at 50%.

So we're playing with a technology that the people in the field think it could be one in five chance it kills everybody. Not one-tenth of 1% or three-hundredths of a percent, One in five.

So that's because we have an inherently unsafe design. If we just move the design to be more like The way humans interact with rules and checks and balances In the system, I think we can bring that probability the risk way down, the probability of doom way down. And that's a design, So you could have a democratic design. There will be bad actors. In a democracy, there are bad actors.

Sometimes they get out of control for a while. We may have all seen that, but Ultimately, they get reined in by the other forces. At least most of the time, that's what's happened. And there's a pretty good track record by the numbers in terms of humans that are dying every year that it doesn't get too out of control. On the other hand, if you had a monolithic AI that's impossible to predict, impossible to understand, that's kind of like having a dictator. That you don't understand. And it's the dictator's way smarter than anybody and way more powerful than anybody. And you're kind of just crossing your fingers and hoping that the dictator does the right thing. I think that's the wrong design. That's the easy design because All you have to do is pour money in and you get the next version that's smarter. The smartest design is not the safest design. It's actually not even the most profitable design. It's just the one that people have been sort of initially going Down? And I think the tide is changing already with the rise of AI agents. And we can talk about that if you want.

Mark:

Yeah, so small scale question first, and then we'll talk about the larger scale as well. So for those people that are now just working with AI every day, and they're already seeing that it's overpowering them and it's just the output is too much and they want to stay in control of the agents of whatever is happening. What's the best approach right now today for those people that still have this big monolithic mindset?

Craig:

Okay, so I'll tell you kind of state of the art or what a lot of people are doing, and then I'll comment on what I think is coming and sort of what's the next thing? So the standard things that people do is they have guardrails.

So guardrails just means we restrict the actions that the AI can take. As you mentioned earlier, you may be interacting with an AI that knows how to make a bioweapon. But let's say you were at a company and this AI was tasked with customer support or something. You would say any questions about bioweapons or any subject outside of the area of customer support are off limits. We're going to put a guardrail on. You must simply answer, sorry, that's not within my knowledge base or, you know, some answer like that.

So that's guard railing. You can basically segment and wall off entire areas And that can go a long way, but in my mind, it's kind of like a Band-Aid because you Early in my career, I can't escape my early... Imprinting here. My first job out of grad school was IBM and I wrote a book on software quality. And the entire field of software quality can be boiled down to one little sentence. An ounce of prevention is worth a pound of cure.

So. The bad software quality companies Build the software. It has bugs and they try to catch the bugs before the customers get them. That is horrendously expensive. IBM found that it costs 10,000 times more to do it that way. Than to put one extra dollar in at the design stage when maybe you could design it better so that there wouldn't be as many bugs.

So the prevention approach is, by far and way better than detection at the end. So this notion of guards rails, you see where it's coming. It's coming at the end of the process. You've built the thing. It already has the potential to build a nuclear weapon or something. And now you're going to try to prevent it at the end. Okay, that's not the best. The best way would have been to design it so that it has a hard time even coming up with that, even without the guardrail. But nevertheless, I just want to put that out there. In terms of practical things, what CTOs and other people can use right now You're stuck with using guardrails. You're stuck with doing sort of fine tuning and training based on your own data. And so this is very important in a corporation environment. People usually are aware of data and information that they're going to train the AI on. Maybe it's a RAG system or something. Where it's going to access the company's policies and the company's customer information and then respond based on that. And that's great. But what people sometimes forget is the values. How you do business, what is right and what is wrong, how you want to treat customers. Those things are very important. They need to be in there. And if they're in the knowledge base for the AI, then you'll get a better result because you're basically trying to align it with the values of the company.

So that's something you can do. Try to make sure that you have some statements about values and everything that align it. Try to do some guard railing. Those are standard things. You can do fine-tuning. You can do some of what we were just talking about, which is essentially like the fingers in the dike or that game whack-a-mole. Do they have that in Holland where a little, it's an arcade game. A mole pops up and you hit it and then another one pops up and you try to hit a mole. That's reinforcement learning with human feedback is kind of a game of whack-a-mole. It says something bad, you say, no, don't do that. You try to, You're basically doing what you would do with software testing. You're trying to think of the most catastrophic... Bad things that could happen and test those cases and sort of give it feedback to not do those.

So those are things that people can do right now. And towards the future, I think the future is going to be multiple groups of AI agents where the AIs are checking each other and you have not one, but you have an entire group checking another group, and you have humans in the loop. And I think that kind of system is much more dynamic and flexible and is probably where the industry is going.

Mark:

Okay. So that's, yeah.

Craig:

A little long winded, but.

Mark:

But for now, I think it's a game of whack-a-mole and just continuous checking and just reinforcement learning and just hoping that it does what it's supposed to do. It doesn't go too far off track, I think.

Craig:

Yeah, and guardrails and adding your values. Those are all concrete things you can do right away. But in the future, I think you have to design communities that can sort of check each other because the, As you mentioned earlier, you can't keep all the code in your mind. No human understands what the things are doing.

So you're driven, you're forced to the path of AI checking AI. Well, you don't want one AI checking the AI. You want a community. Doing that so that you have...

Sort of that diffusion of power and you have values that are broadly representative and a lot of things that you get with the community.

Mark:

And I think that's you talked about, well, we're going to have more and more agents. Right.

So we're going to have these groups of agents internally already. And it's starting to work somewhat toward democracy that you talked about. How should we think about that right now already? While we're building agents, how do we get to that stage where we built this maybe with the current technology that we have. Democracy of agents already.

Craig:

Yeah. Okay. And so this is maybe one place I can provide a little bit of value because, you know, I've started multiple companies and I've been in the role of CTO and, you know, I understand those day-to-day challenges. And one of the tricky things with AI is every two weeks or sometimes every week, a new model comes out and sometimes it completely makes what two weeks ago was the state of the art. It's now obsolete.

I mean, it's moving that fast. And it just is very hard to keep up and to make sense of this. And it's like trying to manage in a constantly changing environment where nobody really knows what's going on. Those are very challenging conditions.

So if you can step back and have sort of a lay of the land of the broad trend and the arc of where things are going It's kind of like Wayne Gretzky, hockey player, right? Skate where the puck's going to be, right? Not where it is.

hat might help orient people.:

So within three years, everything was about AI agents. Now we're in the beginning of 2026. You're already seeing Claude cowork. You're seeing the latest version of cloud, I forget it, 4.6 or whatever it is that you know, is designed for teams Before that, you had GPT Agent Builder.

Before that, you had Crew AI and NNA, and half a, you know, there's dozens. So this idea of agents... Came into being last year And then almost immediately people started saying, well, if we're going to have agents, How about groups of agents? And how do we coordinate groups of agents?

So that's what we're going to see this year. You're going to see lots of that.

So if you know that's the trend is from single AI to groups of AI and coordinated teams, you're kind of reproducing the teams that you had with people, but now you've got intelligent entities that are AIs. Then you have to think about what's the best way to coordinate these, the same kinds of questions that you would have to ask for a team of programmers in a software development organization or in a company. Do we have a common corporate culture and value system that everybody sort of understands? At IBM, they used to have respect for the individual. That was a key thing. And even though it wasn't always that way, they at least tried to it and everybody knew it and it was something that you could sort of rally around. What are the... The rallying values for your organization, How do you construct a team so that it's very robust?

So that you're not dependent. There's no single point of failure. You're not depending on this one AI to do the right thing. And if it does something unpredictable, you're really in trouble. You need to design it so it's very robust and you have AIs checking the other AIs and sort of community approach. And the tools are there. They're being delivered weekly now. To build this kind of thing.

And then the idea of democratic AI sort of takes that same progression that we're going. And it just expands it out to a societal scale and says, okay, If it's going to be teams of AI agents, Number one, let's make sure humans are in there. Because A, there's a role for humans in the beginning, in that the AIs will not be able to solve all the problems. There'll be some problems only humans know how to solve. And the AIs can ask the humans and learn from the humans, so that's a natural fit. But they can learn not only expertise, but values from the humans. That's in the long run, the most important thing. Because again, the thesis is eventually these things outstrip us It's kind of like a child. You have a child that is a genius. Right now, you're the adult and it's still not at your level. But you sort of know he's been tested. And this child, the IQ is off the chart. And by age 12, is gonna be 10 times smarter than either parent.

So the parent's role is to make sure the child is good. You know, gets love and has good positive values so that when it's that powerful, it does good things with that. That's the same thing with humans. We have this role right now as we're building these systems. We have to put positive values in. We have to train it not only on our expertise, but on our value system so that as it outstrips us, at least it's pointed in the right direction.

Mark:

Yeah, I think that makes a lot of sense, at least for the stuff that we can do now at our own local level. I'm still going to ask you, you're going to attend this AI safety conference. We have all these incredibly big and wealthy companies that are building these monolithic models. We then have the same going on in China. We have all these open source between quotes models that are on a hacking phase. How do we actually start to influence those models? What's going on in those conferences and where are you?

Craig:

So a lot of the AI safety conferences that I've been to, they talk about regulation and They talk about unfortunately slowing things down. Sometimes there's the safety track that's part of a larger AI conference. And it's kind of like all the excitement's going on about all the great things we can do and build and make money.

And then yeah, there's the safety guys. And it's a little bit of a downer.

You know, yeah, there's a few people who attend, but nobody really wants to sort of acknowledge it because there's this implicit feeling that if I were to try to put a lot of emphasis on safety, that means I'm going to have to go slower. If I go slower, the other company won't go slower, the other country won't go slower, and I'm going to lose. And that's the big fear, and that's why AI safety has sort of struggled, I think. I do have a suggestion here. Which is Don't make the safer approach slower. Make it faster. Don't make it. Less profitable, make it more profitable. And, you know, don't make it less powerful, make it more powerful.

So it's possible to do that. Again, I come back to design. You have to design it differently. But democratic design is actually faster because which is easier to build brand new data centers and get the power plants set up and deal with all those political problems and finally train GPT-7 or just take a whole bunch of GPT-5s, hook them together with the proper architecture and have a super intelligence that's already at GPT-7 level.

I mean, that hooking together what you already have is way faster. It's less expensive. And it's, potentially more democratic and much safer because you can build a system where the actions of each of those subcomponents are transparent. It makes the whole system more predictable.

So there is an alternative design People, I think, are beginning to realize that it hasn't really been the time for this, because you kind of had to go through the evolution of first, what is an AI agent? Now I understand one agent And then... It's a short leap to community of agents.

And then I think we haven't quite taken the leap, but we're hopefully about to take the leap to if we're building the community, how about we build it safe with, you know, rules and checks and balances and maybe have some democratic mechanisms in there. That's not too much of a stretch. And I think it's very logical along this path.

Mark:

Here. Because even right now, if you have groups of agents, you have this super agents, and then you have a list of, or a group of sub agents. And it's a very hierarchical model.

So I think you're rooting for a more democratic model there instead where agents do their work, but they actually confer if they're on the right path and then doing the right thing.

Craig:

At a minimum, I think the value system of the entire system is has to be based on a very broad set of humans. So in my view, The way this, I think this is very logical. It worked this way. Mark has his personal assistant. It's trained. Your AI assistant is trained not only with your expertise, but your value systems. And mine is trained with my expertise and my values. And if you multiply that across millions of people, you now have millions of intelligent entities that can work together on a network, and each one carries the values of a different human Yes, the humans conflict as we do, but in aggregate, we generally have a broadly representative system of human values. And that's the most important thing because remember the end game is going to be the values. All the other things that we're talking about, they're important. They're what everybody's focused on. They're how you make money this quarter or next quarter. But like how the human race survives is the value question.

So you want to make sure that that's included at each step as you're building things and making money, make sure those values are in.

Mark:

There. And I think it's something almost each of us could start tomorrow, just writing the markdown file that we give each of our agents that has our own personal values or company values already embedded and I've actually done some of that work already with my own company, but I've never thought about it like this.

So I have some work to do on my own, Markdown files that I give my own agents. Cool. I.

Craig:

Love it. Yeah, that's great. And that's a good point. Your vision, mission, goal documents, and this is what we stand for.

I mean, that should go in as part of the training, the RAG system. However, you're setting up your AI system, make sure that it's in there.

Mark:

And then a very Yeah, political question I do still want to ask. I'm based in Europe. I know you're based in the US.

So it's somewhat of a load of questions we have. I mean, most of these models either come from the US or China. I know a lot of people say we should just go with the big models and use those and trust those, one of those four providers.

Some others say we should go open source, but in most cases, open source are... Chinese models. And in both cases, you don't know what's gone in. You don't know what's actually gone into the model. There's, I mean, the US models, we don't know. Chinese models, they're saying that all their political insights have gone in. What do we do? Do we just go and adopt them and just... Trust those models and just go with those big providers or those big models. What options do we have?

Craig:

.. Well, Again, I think The best I can come up with.

So you're... Putting your finger on a very real problem that is a concern. In general, I like open source, at least the weights are open. I don't like that China is the provider. I wish, you know, we used to have Meta in the US was open source.

And then Jan LeCun kind of left or was forced out or whatever. He was the big champion of that. And now he's in Europe, I believe.

So. Open source, I think, is inherently better, inherently a little more democratic. But If you're in that situation that you described where there's maybe four or five large frontier labs in the US and then you have some open source models in China. What I would do is I would try to design whatever system I'm designing. We know it's going to have multiple AI agents. I would try to use multiple models. And I would try to have the different models check each other so that if the Chinese model goes rogue, maybe meta's model will point that out, you know? And, It's not that hard to do. There's already multiple companies and sites that sort of from one interface can you can sort of have your prompt go to 50 models or whatever. And get the feedback so you can kind of see how they would answer. And I don't think it'll be that hard to design a system, especially as you're moving towards a collective intelligence of agents kind of architecture, to have many of those agents coming from lots of different places. And so that's kind of the best you can do is that you have them check each other. I don't think you can eliminate all bad actors. Probably some of the Chinese companies or American companies or somebody is going to sneak some things in their models for their own purposes that are not really to your benefit. But sort of relying on the principle of, you know, spreading out the intelligence over multiple models. Hopefully you can counteract that, detect it, sort of counterbalance it. But putting all your eggs on a single model is probably not something that I would do. And if I were designing systems, I would design them to be model agnostic. Which is not that hard to do 'cause everybody has APIs so you can just sort of make calls to different ones. I'd make it so that your system sits out here and it can call all kinds of different models And if one turns out to be giving you really bad input, you can cut off that model and just use the other ones.

Mark:

Love that. So in your democracy, make sure that those agents that you have don't just have balance each other out, but actually come from different vendors to actually reduce some of the risk.

Craig:

Yes, I think so. Idea.

Mark:

So in my mind, it was all like, we're either on all an open AI or on Grog or on cloud or something. But no, I think that makes a lot of sense to use all the different models and have them balance each other out.

Craig:

And there's some practical reasons besides safety. Some models will be better at certain things than others. One might be a fantastic writer. Another one might be better at research. Another one might be better at math. If you're doing physics, I'm sure there's some models that have the perfect libraries for physics and another one doesn't.

So, I mean, there's some practical reasons to do that as well.

Mark:

Yeah. And I think the problem is the change with every release, then... Suddenly Google is better at writing or suddenly OpenAI is better at writing and Cloud is better at coding or Codex.

Anyway, it changes with every release model. So I think indeed being vendor agnostic with those agents is probably the best way to go because it's going to.

Craig:

Keep changing. And as new protocols, there's like MCP protocol, some of your listeners may be aware of. There's other protocols that basically can work with any agent.

So the standards are beginning to be developed because people recognize this is going to be an issue. And, I don't No, I don't think... Companies are going to be able to get away with one company saying, sorry, we have a walled garden.

You know, we're not going to disclose anything. You have to be on our ecosystem, sort of the way Apple maybe is in some ways. I don't think that's going to fly. There's too much... Money at stake and there's too many advantages for being able to access lots of, models.

So even if a couple of players try to do that, I think ultimately sort of protocols that work across models are going to sort of win the day, but we'll see.

Mark:

Yeah, nice. One question that I still want to ask, and it's a bit unrelated, but it's something that I've been seeing a lot lately. And it seems that we've gone really deep on AI and agents and there's people in the CTO space that just do so much work with agents. And sometimes it feels that we are living in this small bubble, right? That there are, that we're like the small 0.5% of people, even in tech, that work with this stuff every day.

And then we have this really big group of people, even in technology, even in IT and enterprise companies, that just work. Don't and then there's the group of people that aren't even in tech that are like even further out. How do we? Get everybody along on this journey? How do we make it a joint journey? Because it feels like the gap just is widening all the time.

Craig:

Right. It's an interesting question. To the degree I've thought of it, I kind of map it onto the standard sort of technology adoption curve, right?

So going back to books like Crossing the Chasm, you may be familiar with. This is, you know, decades old now at this point. But basically, a new technology comes, there's the early adopters and the early majority, then the masses get it, and then there's the laggards. And it's kind of this, you know, bell-shaped curve, right? I think with a technology that's adopted very quickly, like AI, what happens is the tails get stretched. Because it just, it's happening so quickly. It's very difficult for people to absorb it. And the early adopters It helps if you have specialized knowledge or that that's your job is just working on AI. And so you can end up being, you know, way out in the advanced end of the curve while the majority is still trying to figure things out.

So I think that is what's happening and it's a function of this technology is moving so rapidly. If the technology moved a little bit slower, there'd be time for the normal sort of adoption curve to happen. I think the adoption curve is happening. I just think that we're seeing some stretching.

So I'm not too worried that People in the middle aren't going to catch up. I think they will. I think you are accurate. At least it matches my perceptions that in the little bubble, I'm here in Silicon Valley or close to Silicon Valley.

I mean, almost everybody's using AI that I talk to. But when I talk to friends in Virginia or other places, AI what? GPT? Yeah, I've heard about that.

You know, I mean, it's not the whole world that's using it. And then...

You know, I spend a lot of time looking at the stock market because of my past history in financial services. And boy, Wall Street can't make any sense of this. Wall Street is completely confused. You have lots of retail investors every other week. It's a bubble. No, it's not. It's a bubble. No, it's not. It's like crazy. The sentiment has, it's just reflecting sort of the churn and the confusion that's happening because the technology is moving so quickly. But it'll settle down, I think. At some point. And I do think that normal adoption curve is in play. It's just been... The tails are being stretched.

Mark:

We're just getting so many new developments that it's hard for people to just keep up with that so at some points the adoption will catch up i think that's a comforting thought at least for me that that's going to happen in due time and.

Craig:

For us humans. You know, we have limited ability to process change. Even people who are in the forefront of technology, it's very overwhelming. And so I think sort of One strategy for us is to step back to a higher level of abstraction. Don't try to keep track of every little change in, you know, Claude 4.6 versus 4.5 and my gosh, I have to know everything that it can do. No. Just know that Claude is going to do more and more of the work or whatever AI it is going to do more and more of the work. And your job is going to be more and more high level directing it and setting the principles and trying to come up with the general parenting strategies. It'd be like, go back to parenting. I don't know if you, do you have children? I should ask.

Mark:

I do. I have two daughters. They're grown up.

Craig:

Okay. If you can think back to when they're young, I have three children myself. Young kids are running around doing all kinds of things. The parent doesn't run around, try to keep track of every little toy they pick up. They just sit back and just try to keep the kid from jumping off the balcony or drinking some poison. It's like you're watching, but you're not sort of being exhausted trying to keep up with everything. Because you have a high level representation of here's where the dangers are and the rest of it is just going to play out. And it's almost like we need to adopt that kind of approach. The technology is going to move so quickly, it's going to be almost impossible to keep up with that unless... The tiny little piece of that technical area is your area, okay, then maybe you keep up. But otherwise, you try to step back like a parent And just, make sure that nothing catastrophically bad happens and sort of see the general direction of where the child is going. And the child seems interested in art. Let's give them some paints and see what happens. Right. And not try to micromanage, put the blue color here.

Mark:

So if I just boil that down to just two things that I'm going to at least take away from this, one, focus on building that democracy of agents with different models in it, but different agents that actually is not maybe a hierarchy, but just a democracy where they can... Check on each other. And the second one, and I think this is critical, it's building, continue to build that memory for them, right? And that memory with values, right? Our company values, our personal values, build that in, but make sure that they actually retain the knowledge that they're building.

Like the little kids running around, make sure that they actually learn from what they're doing today. If they're using paint or shouldn't be drinking poison, that they remember that for tomorrow and don't try it again tomorrow.

Craig:

Yeah, I think those are two great ones. I would add on to the last one that just as little kids learn more from your example than they do from what you tell them, they're kind of watching all the time. It's the same with AI. Most of us are not aware that Every tweet we... Do every post, every email, every online purchase, all of that is going into data, which even if we're not using it to train our personal AI, some of those big companies are. And so all of our behavior online is actually setting examples and communicating data values to these AIs. And so Once you're aware of that, kind of the same way, once you're aware your small child is watching what you do, you sort of like, Try to be a little more aware and put your best foot forward because you know that it's being picked up.

So that's the last thing I would add.

Mark:

Cool. Right. It's been incredible having you on. I love that advice. Thank you so much.

Craig:

Mark, this has been a lot of fun. Thank you.

Mark:

As we wrap up another episode of the CTO Compass, thank you for taking the time to invest in you. The speed at which tech and AI develop is increasing, demanding a new era of leaders in tech. Leaders that can juggle team and culture, code and infra, cyber and compliance, all whilst working closely with board members and stakeholders. We're here to help you learn from others, set your own goals and navigate your own journey. And until next time. Keep learning. Keep pushing and never stop growing.

Links

Chapters

Video

More from YouTube