Artwork for podcast Act of Intelligence
AI Coding Is Solved. Software Engineering Isn’t.
Episode 230th June 2026 • Act of Intelligence • Ajay Medury, Andrew Sierota
00:00:00 01:01:21

Share Episode

Shownotes

If coding is "solved," is software engineering? Ajay Medury (software engineer) and Andrew Sierota (systems engineer) pick up where Episode 1 left off and get into the part that isn't solved: judgment. They trade notes on why weekly usage limits have quietly become the real project budget, what it's like to build a sharded Minecraft world solo as both product manager and principal engineer, what Amazon's New World got wrong about scale, running decorrelated multi-model code reviews, and what an AI "skill" actually is. It might all just be an act of intelligence.

In this episode:

  • Why "coding is solved" but software engineering isn't, and why judgment is the expensive part
  • The new bottleneck: weekly subscription usage limits as a hard budget
  • Breaking a big build into modules and submodules, and shrinking scope to actually ship
  • Wearing every hat at once: product manager and principal engineer
  • New World and the problem of scale at launch
  • Large codebases, heavy test coverage, and review rounds that exposed process gaps
  • Pre-flight checks, linters, and spec-tracing that cut review loops down
  • Decorrelated reviews: several different models reviewing blind, then taking the union of findings
  • What an AI "skill" is: system prompts, user prompts, and guardrails for long workflows
  • Severity tiers for findings: blockers, warnings, defers, suggestions, and nits
  • Why systems admins, as generalists by trade, may be an ideal audience for these tools

Chapters:

  • (00:00:00) - Real intelligence, or just an act?
  • (00:01:02) - Coding is solved; software engineering isn't
  • (00:02:02) - Keeping up with the release pace
  • (00:05:27) - Vibe coding vs. a repeatable process
  • (00:06:40) - Usage limits are the new budget
  • (00:08:40) - Breaking the build into modules
  • (00:12:18) - A sharded world, every hat on one builder
  • (00:17:12) - New World and the problem of scale
  • (00:25:04) - 200k lines and a 90-round review
  • (00:27:20) - Pre-flight checks that cut 90 rounds to 5
  • (00:30:49) - The podcast's own local-GPU pipeline
  • (00:33:27) - Learning by asking "what do you mean?"
  • (00:35:35) - When a large agent run burned through the budget
  • (00:38:05) - What is a skill, really?
  • (00:48:05) - Skills as guardrails for long workflows
  • (00:51:09) - Severity tiers: blockers to nits
  • (00:54:34) - Why sysadmins are ideal builders
  • (00:56:29) - A long-running Minecraft community, the real driver
  • (01:00:56) - Closing: an act of intelligence

Transcripts

Speaker:

Hello, my name is Ajay Medury, and I'm a software engineer, and today I'm

Speaker:

joined by...

Speaker:

My name is Andrew Sierota, and I'm a systems admin.

Speaker:

Awesome, and today we are here to talk about various topics

Speaker:

in the AI ML space for a podcast that we have coined

Speaker:

Active Intelligence, because we're trying to figure out if it's real intelligence or

Speaker:

is it just acting?

Speaker:

And this podcast is for all aspiring creators, creatives, and

Speaker:

builders.

Speaker:

Or those who have already been doing it for a while and just are looking for new

Speaker:

tools and maybe ways to improve their workflows.

Speaker:

I'm curious if, you know, one of the things we talked about last time was

Speaker:

particularly like, you know, software engineering, writing code might be a solved

Speaker:

problem, but is software engineering the solved problem?

Speaker:

And I think, yeah, I was curious.

Speaker:

Yeah, yeah, picking up where we left off on the last episode there, I remember us

Speaker:

saying that coding could be solved, right, but software engineering definitely

Speaker:

isn't, and I think I was touching on this while we were chatting just before the

Speaker:

show, you know, coding is basically, you know,

Speaker:

Claude can write code all day long, faster than anyone can

Speaker:

humanly.

Speaker:

Mm-hmm.

Speaker:

It'll figure it out if you give it enough time, enough tokens.

Speaker:

Oh, yeah.

Speaker:

But judgment isn't free.

Speaker:

Yes.

Speaker:

And that's where humans are still very valuable, is judgment.

Speaker:

And that can be really expensive.

Speaker:

It could be a really expensive mistake if you have poor judgment on the usage of

Speaker:

your code.

Speaker:

And talking about that in particular, here's like bad judgment in terms of

Speaker:

accidentally putting a vulnerability out there that could now all of a sudden be

Speaker:

discovered by models much more easily.

Speaker:

The cost of that is pretty intense.

Speaker:

Yeah, yeah.

Speaker:

And I don't remember what the, that, there was the project that Anthropic did with

Speaker:

like the 30 big companies.

Speaker:

Yes.

Speaker:

To like the pre-release of Mythos.

Speaker:

Yeah.

Speaker:

And they were supposed to like patch everything, supposedly, before they released

Speaker:

Fable.

Speaker:

Yes.

Speaker:

Right.

Speaker:

That's funny, 4.8, Opus 4.8 was only released like two weeks ago.

Speaker:

That's.

Speaker:

And then less than two weeks later, we have Fable now.

Speaker:

Three days later, we don't have Fable.

Speaker:

Which is really interesting to me because traditional software engineering took a

Speaker:

lot more time, had a lot more rituals potentially.

Speaker:

And, you know, again, we kind of broached the subject last time is, were those

Speaker:

rituals still meaningful?

Speaker:

Like, do those still make sense to do today?

Speaker:

Like, because I can't even imagine a time like four or five years ago where you'd be

Speaker:

able to release a, you know, pretty significant version and then release the next

Speaker:

major version within weeks later.

Speaker:

I think you'd be waiting months between these kind of releases.

Speaker:

So.

Speaker:

Yeah.

Speaker:

Kind of, kind of hard to keep up with, to be honest.

Speaker:

You know, there's, there's so much changing so quickly.

Speaker:

By the time this is released, you know, who knows what would have changed.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

I think that a lot of what we're doing now, and I guess

Speaker:

something that I've done, like, obviously the world's changing every day.

Speaker:

There's new tools, you know.

Speaker:

It's hard to, you know, want to use the latest and greatest all the time.

Speaker:

Yes.

Speaker:

And, but also keep adding new requirements.

Speaker:

To your project, right?

Speaker:

Yes.

Speaker:

Because, like, there was a point in time where in the, my Minecraft project right

Speaker:

now, I was just adding so much stuff because I was like, oh, yeah, I like this thing

Speaker:

can code everything for me.

Speaker:

Oh, yeah.

Speaker:

That's no longer a limit, but then once I started to get to, well, is it going to

Speaker:

work, right, there's just too many things to check.

Speaker:

Yeah.

Speaker:

And, and I think I remember last episode, I said, I think, you know, testing would

Speaker:

be 10 times as much as the production.

Speaker:

Yeah.

Speaker:

I'm actually thinking it's going to be 100 times more than the development now.

Speaker:

Yeah.

Speaker:

The realization is now dawning.

Speaker:

It's like, oh, no, this, there's a lot more.

Speaker:

I gave it the ability to build all this stuff.

Speaker:

The time it will take for me to now validate that, yeah, it just feels.

Speaker:

It's going to be a lot.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And I think that's the interesting part is because back when, back when I was in my

Speaker:

previous company, we were a cloud company.

Speaker:

And so we were trying to launch systems for others to use.

Speaker:

The expectations there were always like.

Speaker:

Let's maybe be a little more restricted because if we can restrict the size of the

Speaker:

system that we're like actually shipping out there, we can maybe do a better job of

Speaker:

building it and reduce things like risk, which is I'm imagining a question that

Speaker:

Anthropic is asking right now is what is the risk of actually releasing this model?

Speaker:

So it becomes a similar question.

Speaker:

I think we had, we had those kinds of conversations all the time where it's like,

Speaker:

all right, do we actually do less intentionally?

Speaker:

And the answer at times was yes.

Speaker:

Yeah.

Speaker:

We should do less intentionally.

Speaker:

I don't also think that always applies towards other kinds of systems like our

Speaker:

projects, right?

Speaker:

I do feel like we were briefly talking about this before the podcast about like, oh,

Speaker:

what is, you know, like, what does software engineering look like versus vibe

Speaker:

coding?

Speaker:

And there is definitely a good number of things I can bring up when I start talking

Speaker:

about it.

Speaker:

Though I do want to say like one distinction we kind of were, maybe we liked the

Speaker:

idea of it is software engineering.

Speaker:

Like in the big tech companies, like a well-oiled process, it's a repeatable thing

Speaker:

that they keep repeating to keep, you know, churning out new features, new products

Speaker:

and so on and so forth.

Speaker:

However, I do think that the vibe coding and more like, you know, building

Speaker:

locally, building a system today, like a lot of engineers who do it outside of the

Speaker:

big tech, I feel like it's more like a project where the project is something you

Speaker:

kind of figure out how to execute the project as you go along.

Speaker:

There isn't just an answer for every single thing.

Speaker:

Like you don't get told like, oh, this is where you, you know, release the code,

Speaker:

this is where you talk to next.

Speaker:

You know, you don't go step one, step two, step three with the actual like new world

Speaker:

of building systems, building like projects.

Speaker:

I do feel like there's a lot more variance.

Speaker:

And Andrew, it sounds like, sounds like you're kind of experiencing some of that,

Speaker:

right?

Speaker:

If I'm getting, if I'm getting it right.

Speaker:

Well, absolutely.

Speaker:

And it's funny, like, like in enterprise, you have budgets, you have deadlines, you

Speaker:

have a boss who's breathing down your neck.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

But, but when, when you have your own project, all of a sudden, the bigger

Speaker:

limits is, especially when you're not paying for API, you're paying for a

Speaker:

subscription like ClaudeMax, CodexMax.

Speaker:

The biggest, the next big limit is your usage every week, your usage in every five

Speaker:

hour window.

Speaker:

And that's actually the budget now that I'm working with.

Speaker:

Like, like I have to calculate, like I got a 20 X max subscription for Codex and

Speaker:

Claude.

Speaker:

I can use that up a weekly limit in two days.

Speaker:

And I have two subscriptions.

Speaker:

So I can code for four days a week.

Speaker:

That's your budget.

Speaker:

Yeah.

Speaker:

That's my budget.

Speaker:

And, and then I have to think, well, that's just testing, reviewing the code.

Speaker:

And I haven't even really reached the stage where I'm doing practical tests.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Real, real world.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And so that's why I said, I'm like, wow, even with this process, I'm going to be

Speaker:

able to do it.

Speaker:

But as I'm practically fully automated now, it's still going to take a lot of time

Speaker:

because budgets aren't infinite for most projects.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

And I think that's the idea of like within this budget for this project, how can I

Speaker:

actually figure out to execute the thing that I care about?

Speaker:

And I think that's the process of like, oh, wow.

Speaker:

I'm saying process project over and over again.

Speaker:

Maybe I'll say like, there is a self-reflection that needs to happen in projects to

Speaker:

feel like, Hey, what is, what is done?

Speaker:

done.

Speaker:

Like when do I actually, as you said, like earlier in your project, you're actually

Speaker:

like do more.

Speaker:

And eventually you started to realize like, okay, this might be a little too much

Speaker:

because then the amount of stuff that I can validate and actually make sure the

Speaker:

quality is good might be growing so quickly.

Speaker:

Then you have to take a judgment call, which actually lets you decide, or then you

Speaker:

have to, the AI won't do this for you.

Speaker:

So as a builder, you need to take a judgment call, say that, no, we're going to stop

Speaker:

here.

Speaker:

We're going to actually now figure out the validation.

Speaker:

We're going to start figuring out if everything's working.

Speaker:

At least I think that's the way I'm thinking about it.

Speaker:

I don't know if you feel a similar, like, do you feel like that's the process?

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

I, to go along with that for me, how I've kind of thought about it is, you know, I

Speaker:

broke the project down into phases, right?

Speaker:

And I was like, okay, we need, what can we start out with?

Speaker:

You know, an MVP, right?

Speaker:

Like what can we start out with?

Speaker:

Even that is pretty big in scale.

Speaker:

But at least that once that's done, there's a foundation for the later updates.

Speaker:

Right.

Speaker:

Um, and so right now I think that's why this first phase, even though I've already

Speaker:

reduced the scope, like I have like 16 planned modules, I'm only coding like, you

Speaker:

know, eight of them.

Speaker:

And one of them is a really big commons module.

Speaker:

Yeah.

Speaker:

But the thing is that they're being coded, but a lot of them are just skeletons for

Speaker:

the next phases.

Speaker:

I mean, I think, uh, I may say eight modules, eight modules.

Speaker:

Yeah.

Speaker:

I actually feel like that's a good thing.

Speaker:

Usually being able to break it down into smaller pieces.

Speaker:

So you actually then go and if something breaks, you want to ideally be able to

Speaker:

focus on one module.

Speaker:

Oh, absolutely.

Speaker:

Absolutely.

Speaker:

And do you feel like that's kind of the, it has worked pretty effectively?

Speaker:

So, so one of the modules that I have, um, has been broken down into like five sub

Speaker:

modules.

Speaker:

All right.

Speaker:

Okay.

Speaker:

Cause, cause there were, there were big enough elements individually

Speaker:

that I thought, you know, we need to break this down a little bit more, but for the

Speaker:

sake of the project planning, those modules are now instead of oh four, oh four, a,

Speaker:

b, c.

Speaker:

I don't want to go ahead and change all the other module numbers to squeeze those

Speaker:

in.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

You don't want to renumber everything.

Speaker:

You just want to let it be a submodule of the existing.

Speaker:

Exactly.

Speaker:

So those are just submodules now.

Speaker:

Um, at the end of the day, um, it's all about, can we maintain it?

Speaker:

Um, yeah.

Speaker:

And I think interestingly, what you're experiencing is a, there, there's a little

Speaker:

bit of a mirror.

Speaker:

There's like the other side of the coin.

Speaker:

If you look at software engineering, traditionally, I think we were saying there's

Speaker:

processes that once one follows, uh, and the interesting thing is like what I've

Speaker:

experienced is when you're trying to actually build something, build something new,

Speaker:

usually you're getting requirements from somebody.

Speaker:

You're actually getting requirements from a product manager or from leadership

Speaker:

saying that, Hey, we've identified, uh, this opportunity in the market.

Speaker:

Can you go scope it out?

Speaker:

Can we actually go figure out the process actually entails that, right?

Speaker:

Like the process says, uh, leadership identified an opportunity product manager.

Speaker:

Now it goes and figures out what that opportunity is like, how big is it?

Speaker:

How much scope is it?

Speaker:

How many things need to be built out to capture that opportunity?

Speaker:

And then the software engineering folks get pulled in the senior, you know,

Speaker:

principal folks get pulled in, uh, where they're like, okay, what modules do we need

Speaker:

to actually make this?

Speaker:

Oh, do we need new products?

Speaker:

Do we need eight modules?

Speaker:

Do we need three?

Speaker:

Like, is it one good enough?

Speaker:

Uh, and interestingly, uh, you know, there's a lot of

Speaker:

things we can do with confused systems.

Speaker:

There's a lot of things we can do with skilled systems.

Speaker:

And even with, uh, you know, like the, uh,

Speaker:

the, the, the, the, the applications that we, that we wanted to very, very much, we

Speaker:

didn't want to, we wanted to shift that to the, to the, uh,

Speaker:

you know, the, and the, the second one, we were, we were really thinking

Speaker:

about how we can start working with new things.

Speaker:

All of these things are very different than what it would take traditionally the big

Speaker:

tech companies to do, right?

Speaker:

Because they need individuals to write the code in the past.

Speaker:

Maybe not anymore.

Speaker:

Yeah, yeah.

Speaker:

Hmm.

Speaker:

I think, you know, as I've been letting Claude code it, planning the

Speaker:

project out, having the spec sheets and everything like that, I begin to realize

Speaker:

that I kind of wish I actually shrunk it even more.

Speaker:

Shrunk it even more?

Speaker:

Like a lot more, actually.

Speaker:

A lot more.

Speaker:

Because right now I have, like, if we, I'm not sure we talked about, like, the

Speaker:

architecture of it all, but there's going to be, it's going to be a sharded system.

Speaker:

We're going to have nine separate worlds.

Speaker:

They're each going to be very smoothly transitions for players to go between them.

Speaker:

But that requires a lot of extra networking code on top of Minecraft already.

Speaker:

And I was thinking, I could have actually just done just one server.

Speaker:

Single host, yeah.

Speaker:

Single host.

Speaker:

Exactly.

Speaker:

Ignored all the extra networking stuff.

Speaker:

And, you know, I'd probably already be in game testing right now.

Speaker:

Mm-hmm.

Speaker:

Because I think that that extra layer, on top of all the other things that I wanted,

Speaker:

Yes.

Speaker:

is actually, that's the complicated part that Claude is spending a lot of time on.

Speaker:

Yeah.

Speaker:

There's a lot of gotchas that, on the first pass, it's not going to catch.

Speaker:

Yes.

Speaker:

And I thought about it.

Speaker:

I was like, maybe I should.

Speaker:

But, you know, I've already spent so much.

Speaker:

So you are the product manager and the principal engineer, like, dealing with this

Speaker:

at the same time.

Speaker:

Yes, and I'm like, okay, well, it's going to be worth it when it works, but I will

Speaker:

get back to you on when it works.

Speaker:

And this is where I do feel like the traditional software engineering and big tech

Speaker:

would have been like, oh, no, we failed.

Speaker:

Because the process should have already caught this at some point.

Speaker:

So I feel like that's the interesting differentiation I see right now.

Speaker:

Which is not always to say it was a good thing.

Speaker:

Because I do feel like going through this process of, like, or the project approach

Speaker:

of, like, let's go start, let's just see what we can build, and, you know, let's

Speaker:

make it unrestricted, right?

Speaker:

We're going to learn a lot more.

Speaker:

And I think that means we're actually going to figure out things more individually.

Speaker:

And I'm curious if you feel like doing the project in the way that you have done it

Speaker:

so far has actually taught you a lot more of what you would avoid next time.

Speaker:

And because you are mentioning that maybe you should have started smaller and, like,

Speaker:

started more restrictive.

Speaker:

And I'm curious if you feel like doing the project in the way that you have done it

Speaker:

so far has actually taught you a lot more of what you would avoid next time.

Speaker:

There's, like, history in software engineering that kind of points us to some of

Speaker:

this stuff, right?

Speaker:

And even then, a lot of software engineers don't actually follow it.

Speaker:

A lot of companies don't actually follow it.

Speaker:

There's no guarantee it's going to actually be repeatable and working.

Speaker:

So individual builders now have the same similar power.

Speaker:

And I'm curious, are you, like, do you feel like this is one of those moments where

Speaker:

you're like, all right, I'm going to go ahead and let this run as it does.

Speaker:

But next time, I'm actually going to do single host or try and figure out what

Speaker:

single host looks like.

Speaker:

Yeah, I have a couple ideas of new projects, I suppose.

Speaker:

And I thought about it a little bit.

Speaker:

I do think that the stuff I'm doing now is informing future

Speaker:

projects.

Speaker:

I definitely would have done it a lot differently.

Speaker:

And another part of it is I constantly do change what I do

Speaker:

all the time.

Speaker:

Like, half of the time.

Speaker:

Half of my time is spent actually optimizing the workflow, thinking about where can

Speaker:

we cut costs?

Speaker:

Where can I cut usage?

Speaker:

For example, I have directed Opus to actually use

Speaker:

Sonnet to implement fixes now.

Speaker:

Yes.

Speaker:

Because, I mean, Opus will write the, you know, the plan and then Sonnet's pretty,

Speaker:

pretty good at implementing it.

Speaker:

And at the end of the day, Opus will still review the changes.

Speaker:

So I'm still gaining it behind the front-tier model.

Speaker:

And it's not just Opus, right?

Speaker:

If I'm not mistaken.

Speaker:

Yeah.

Speaker:

Right.

Speaker:

And it's not just Opus.

Speaker:

Yes.

Speaker:

There's other reviewers.

Speaker:

I have GPT 5.5 as well.

Speaker:

DeepSeq.

Speaker:

It's very cheap.

Speaker:

It's very cheap.

Speaker:

It's very cheap.

Speaker:

You're welcome, China.

Speaker:

But, yeah, I think the decorrelated reviews have saved a lot of money,

Speaker:

actually, because I'm having three different models review the code at the same

Speaker:

time.

Speaker:

And there's a lot of use.

Speaker:

There's a lot of unions in their findings, and there's a lot of findings that they

Speaker:

individually would not have picked up.

Speaker:

They're all trained on different data, and that gives you three different

Speaker:

perspectives.

Speaker:

Perspectives, yeah, yeah, I love that.

Speaker:

And I think that's really important when you're trying to build a system that's

Speaker:

robust, and that's actually what I'm trying to do with Minecraft.

Speaker:

Like, the technology behind what's going on here is intentional to be robust,

Speaker:

because there is a lot of different communities that have done similar things.

Speaker:

But not to this scale, and not, like, even

Speaker:

normal MMOs have failed at doing this.

Speaker:

Actually being able to scale out.

Speaker:

Yeah, actually being able to scale and to, like, allow, like, seamless gameplay.

Speaker:

It remains to be seen if I can accomplish this myself, because I get a little

Speaker:

concerned, thinking, like, why hasn't, you know...

Speaker:

I was going to say, for our listeners, can you maybe, like, do you have a specific

Speaker:

thing that you can talk about where the scale was required?

Speaker:

Yes, yes, what was that game?

Speaker:

Amazon Game Studios, we played it a little bit.

Speaker:

Oh, yes, yes, New World, New World.

Speaker:

New World, and see, that was the game that I thought was, I thought Amazon was going

Speaker:

to solve this problem.

Speaker:

And when you say the solve this problem, which problem, if I may say, the problem in

Speaker:

particular.

Speaker:

The problem of scale.

Speaker:

The problem of when your game releases, you have millions of players that arrive,

Speaker:

and all of a sudden you have a 30,000 player queue.

Speaker:

Yes, yes, yes, yes, okay, yeah, yeah.

Speaker:

And not only that, it's not one big world.

Speaker:

There's hundreds of worlds that have lines.

Speaker:

Yes.

Speaker:

They're crashing left and right.

Speaker:

And I was like, come on, AWS, Amazon, had to have had the resources and know-how

Speaker:

to do this.

Speaker:

But yet, they made the same mistake as every other predecessor before them.

Speaker:

And I think that's the really interesting part, is, like, I'm also curious, like, if

Speaker:

I were to go back and ask them that question, what part broke, right?

Speaker:

Like, what was it the fact that now you would have to...

Speaker:

Just have n number of players on the map at the same time?

Speaker:

Was it that they're trying to communicate with each other over voice or something?

Speaker:

And that was essentially what was causing the breakage?

Speaker:

I'm curious of, like, what was their bottleneck?

Speaker:

Because, sorry, yeah, you had something in mind?

Speaker:

I do have something in mind.

Speaker:

Like, I played a lot of different MMOs.

Speaker:

A sharded system is really common, right?

Speaker:

The issue, I think, that Amazon...

Speaker:

That Amazon had with New World was they built the game like it was any other MMO.

Speaker:

They did not take advantage of their expertise.

Speaker:

From the get-go.

Speaker:

Yeah, they built their own game engine, but they didn't do anything unique.

Speaker:

Yeah.

Speaker:

They didn't structure it in a way where they could scale it automatically.

Speaker:

Yes.

Speaker:

Right.

Speaker:

And maybe they did, but it didn't work.

Speaker:

The process failed.

Speaker:

It didn't work.

Speaker:

The risk was not assessed properly.

Speaker:

Yes.

Speaker:

I mean, there were...

Speaker:

We played launch.

Speaker:

We literally could not play for a few days.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

We actually gave up on the weekends.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

We had to wait a few days.

Speaker:

And to me, that's millions of dollars being lost.

Speaker:

Yeah.

Speaker:

I mean, I think New World would be a completely different game.

Speaker:

If it was built with that scale from the get-go.

Speaker:

If it was built properly from the get-go.

Speaker:

Yeah.

Speaker:

And I think that's the interesting, like, trade-off there as well.

Speaker:

Like, this is...

Speaker:

My understanding is also, like, traditional software engineering would also ask you

Speaker:

that question is, do you know if you need that scale?

Speaker:

Like, do you know if the marketing has been effective enough?

Speaker:

And do you know if the product...

Speaker:

Does the product manager actually...

Speaker:

Have they talked to the marketing department and seen an insane amount of, like,

Speaker:

interest?

Speaker:

And have they been able to calculate the amount of interest to then inform the scale

Speaker:

decision?

Speaker:

Because it is very common for software engineering teams during this process of

Speaker:

building the product to ask this question of, like, hey, do we want to be scalable

Speaker:

to, like, two million players on day one?

Speaker:

Or do we want to be scalable, you know, like, one world will be scalable to, like,

Speaker:

100,000 users at a time, that kind of thing.

Speaker:

And they make these decisions so that they can, you know, like, punt some of the

Speaker:

very complicated, very difficult things, in this case being, like, instead of

Speaker:

splitting this module into six different sub-modules, module three into six

Speaker:

different sub-modules, I'm just going to make module three into two sub-modules for

Speaker:

today.

Speaker:

And that'll satisfy my needs for now.

Speaker:

In this case, it feels like that process was a failure because somewhere, somehow

Speaker:

they didn't understand that the demand was so high and one of the core values as

Speaker:

gamers, as, like, people who enjoy playing games, waiting to get in to play your

Speaker:

game is a game-breaking experience.

Speaker:

Especially when you're super excited and you pre-ordered the game.

Speaker:

Yeah, and you paid extra.

Speaker:

Yeah, you paid extra.

Speaker:

And it's such a shame because I actually did, like, once we finally did play.

Speaker:

Yeah, it was fun.

Speaker:

It was fun.

Speaker:

But the game quickly died off

Speaker:

because, I mean, you had millions of players who could not play.

Speaker:

Yes.

Speaker:

Day one.

Speaker:

And then sometimes your friends would end up on the other server or the other world.

Speaker:

Yeah.

Speaker:

And they couldn't come and join you.

Speaker:

So all of a sudden, one of the main reasons I play games is a social, like, it's a

Speaker:

social thing for me.

Speaker:

I want to play with other people.

Speaker:

I want to play with my friends.

Speaker:

And if I can't play with my friends, I'm going to find a much more difficult return.

Speaker:

Absolutely.

Speaker:

So I do genuinely, like, question.

Speaker:

That's the traditional software engineering method.

Speaker:

So, like, waits for such a long span because the cost of solving these problems tend

Speaker:

to be, again, you're making commitments to your boss.

Speaker:

Yeah.

Speaker:

You're answering to leadership.

Speaker:

You're saying that, all right, leadership is saying that, OK, you have a budget of

Speaker:

these many people for these many weeks.

Speaker:

And if you can get the game released in those weeks, great.

Speaker:

Otherwise, we're going to maybe, like, go a few, you know, not give you a promotion,

Speaker:

whatever it is, right?

Speaker:

The value proposition is so different versus I have seen indie games that have been

Speaker:

so successful and they haven't.

Speaker:

But I also do understand that they don't have that same pressure of, like, you know,

Speaker:

corporate top down of, like, you need to release this soon, they will release it

Speaker:

when they want.

Speaker:

So I'm actually also curious.

Speaker:

Do you feel like do you feel pressure to release your project?

Speaker:

So thankfully, I actually have not released public information on this yet.

Speaker:

Oh, so so if someone finds this podcast, they will recognize my voice.

Speaker:

Then then they're going to start asking questions immediately.

Speaker:

There there are some people who know that it's coming.

Speaker:

I haven't given any hard timelines myself because this is,

Speaker:

you know, a project that I'm figuring out as I'm going along.

Speaker:

Absolutely.

Speaker:

But there is pressure because I think I set up a lot of my own

Speaker:

deadlines in my head.

Speaker:

Right.

Speaker:

There's I have a lot of expectations of where I should be by a certain time.

Speaker:

And that's part of the workflow.

Speaker:

And I'm like, OK, I'm going to trim this.

Speaker:

I'm going to you know, I'm going to accept that, you know, a lot of these things

Speaker:

aren't exactly as I want them, but I'm going to leave it and we're going to move on

Speaker:

and just try to get this working right now.

Speaker:

And I actually think I'm trying to get to in game testing as fast as possible,

Speaker:

because like you were saying, like before the podcast, the practical testing can

Speaker:

serve way it's way more efficient than just, you know, traditional

Speaker:

tests.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And I think that's a really interesting topic.

Speaker:

Actually, we talked about we will jump into that a little more after.

Speaker:

I do want to ask.

Speaker:

But you don't feel at this point you don't feel like you would compromise on certain

Speaker:

aspects, though, despite the time pressure, there are certain things that you are

Speaker:

very much like this is a critical thing based on your experience.

Speaker:

Yes.

Speaker:

Yeah.

Speaker:

Based on my experience, there's a lot of things at this point that I'm I'm holding

Speaker:

on to, which is really interesting to me, because when you talk about big

Speaker:

corporations and Amazon releasing it, the distance between the person who actually

Speaker:

understands the experience and the person building the experience is actually non

Speaker:

-trivial.

Speaker:

So I do feel like in big tech or in generally like big software organizations, that

Speaker:

is something that is I I'm really excited about the, you know, like coding tools and

Speaker:

all of these things becoming much more democratized, because the person who actually

Speaker:

understands the most about the experience now can actually ask direct questions

Speaker:

about, like, which parts of the experience are actually going to be implemented

Speaker:

versus not.

Speaker:

And I feel like I made the joke by you being the PM and the, you know, sometimes

Speaker:

that's I do feel like that that's a good thing, because I actually feel like you're

Speaker:

able to not only understand, but ask questions and actually also get into focus in

Speaker:

the right way.

Speaker:

Absolutely.

Speaker:

Though the testing part is still like an all huge, you know, open box.

Speaker:

And I guess I'm curious.

Speaker:

So, like, we were talking about certain number of lines of test versus code.

Speaker:

And do you want to share?

Speaker:

Yeah.

Speaker:

Yes.

Speaker:

I think

Speaker:

Project now has approximately two hundred thousand lines of code.

Speaker:

And it's about it's not exactly fifty fifty, but it's a little bit.

Speaker:

It's about fifty fifty.

Speaker:

It's going to be by the end of it.

Speaker:

I also meant the two hundred thousand lines is non-trivial.

Speaker:

Yeah, that that is a lot.

Speaker:

And I'm clearly not, you know, hand reviewing any of this.

Speaker:

But there is plenty of standards and conventions that we talked about last time that

Speaker:

are being taken into consideration during the iterative review rounds.

Speaker:

And actually to speak on that, give a little bit more since I've actually interfaced

Speaker:

with the project since then, quite a bit.

Speaker:

One module took 90 review rounds to converge into

Speaker:

what I at a certain point I actually had started to strip away requirements for the

Speaker:

testing.

Speaker:

Is that the module that you broke up or is that the module?

Speaker:

That's the module that I wrote.

Speaker:

OK, OK.

Speaker:

That explains all.

Speaker:

That does explain all.

Speaker:

Yeah.

Speaker:

It did finally convert.

Speaker:

Into where it was mostly like comments and code were just not consistent with

Speaker:

previous changes and at a certain point, I'm like, OK, we're going to keep finding

Speaker:

things forever.

Speaker:

Yeah.

Speaker:

And so I'm like, I think this is a good place to stop.

Speaker:

Yes.

Speaker:

And actually, since I actually went through a review review process

Speaker:

with the review cycle.

Speaker:

Yeah.

Speaker:

With Claude.

Speaker:

And I said, OK, let's take a look.

Speaker:

I had it log all 90 rounds.

Speaker:

You reflected on the review process of like this 90 iterations.

Speaker:

OK, well, OK, yeah.

Speaker:

Tell us more.

Speaker:

I had it.

Speaker:

I had it, you know, from the get go log all 90 rounds.

Speaker:

It logged everything from all the different the three models that I used to review

Speaker:

things.

Speaker:

Really good space to do a reflection on 90 rounds is a lot.

Speaker:

I was like, I was like, I burned a lot.

Speaker:

Like

Speaker:

and so it came back and looked through everything and it gave me.

Speaker:

So I was like, what?

Speaker:

I asked it.

Speaker:

You know, simple English.

Speaker:

Yeah.

Speaker:

What can we do to lower the amount of rounds?

Speaker:

Give me the executive review.

Speaker:

Yeah.

Speaker:

And it came back with a lot.

Speaker:

I'm not exactly familiar with maybe all the terminology specifically.

Speaker:

But there were things like I remember it saying, like, it will add it'll do like a

Speaker:

pre-flight check.

Speaker:

It will, like, trace all the methods and classes ahead of time against the specs.

Speaker:

It will, you know, it will have an index of what it needs to look for.

Speaker:

It added a few linters.

Speaker:

OK, yeah.

Speaker:

Yeah.

Speaker:

That's good.

Speaker:

It did.

Speaker:

So those are improvements.

Speaker:

Yeah, yeah.

Speaker:

Yeah.

Speaker:

And it's actually interesting.

Speaker:

The next model on the next module that it reviewed only took five rounds.

Speaker:

That 90 to five.

Speaker:

That's that's pretty.

Speaker:

That's pretty.

Speaker:

OK, I must also play devil's advocate and ask how big was the other one?

Speaker:

The it's I would say that they were similar, similar, similar.

Speaker:

But but here's the thing.

Speaker:

Here's the thing.

Speaker:

There's another reason why.

Speaker:

Yeah, because of the a lot of the rounds were

Speaker:

finding things that it should have picked up the first time.

Speaker:

OK, OK.

Speaker:

Right.

Speaker:

And that's where those things like tracing all the tracing the comments back.

Speaker:

Yeah, the linters, all of these things really did reduce a lot of the

Speaker:

noise.

Speaker:

So interestingly, I recently read this as well as like I was going through Claude's

Speaker:

has updated documentation online.

Speaker:

They actually I think a while ago put this user guide and I think it's pretty

Speaker:

buried.

Speaker:

Unfortunately, I do feel like it's a little buried.

Speaker:

One of their strong recommendations is plan first always.

Speaker:

But even before planning, you should actually ask it to understand research what

Speaker:

this module is doing or what this code looks like.

Speaker:

How does it actually trace down a particular feature?

Speaker:

So it's like, OK, how does your authentication flow work?

Speaker:

Like would be a good question.

Speaker:

Right.

Speaker:

And that actually does pre-work.

Speaker:

It says that, all right, capture all the information about the authentication flow,

Speaker:

because then I know exactly where I need to make the updates.

Speaker:

Sounds like that's you've experienced that firsthand now.

Speaker:

Yes, yes.

Speaker:

And like I was like last show,

Speaker:

there's a lot of things I'm finding out by brute force, like I'm developing the

Speaker:

process that, you know, a software engineer would have known.

Speaker:

Yeah.

Speaker:

Or would have been would have been instructed to do more than even know, like when

Speaker:

told, turn your brain off.

Speaker:

Just follow the process.

Speaker:

Yeah.

Speaker:

Which is I do feel like that's that's actually a very interesting distinction I want

Speaker:

to get back into later is

Speaker:

you're learning the reason why the judgment exists or the process exists.

Speaker:

You're using your judgment and then getting Claude to give you the right information

Speaker:

so that you can take the correct judgments that, you know, like probably engineering

Speaker:

teams have been doing ad nauseum across time.

Speaker:

And that ends up becoming either tribal knowledge or becomes very strict process.

Speaker:

It's what it feels like to me.

Speaker:

Like, that's that's what I'm hearing almost.

Speaker:

I'm curious how many more like do you feel like we also talked about using existing

Speaker:

tools, we also talked about like some project or so get you done as a repository

Speaker:

that I keep I keep talking, yes, yes, we'll we'll we'll see if that gets flagged in

Speaker:

some.

Speaker:

That is probably what got flagged on the podcast.

Speaker:

OK, that is probably all right.

Speaker:

So we were trying to syndicate our podcasts across things and the tool's name has

Speaker:

now become a problem.

Speaker:

So I'm sorry.

Speaker:

Andrew, you're going to have to find it.

Speaker:

You're going to have to get OK.

Speaker:

We're going to get flagged again.

Speaker:

Oh, it's an official.

Speaker:

I'll ask Claude to beep it out.

Speaker:

Yeah, yeah.

Speaker:

If it can do it, that would be fantastic.

Speaker:

That would be interesting.

Speaker:

And would also at some point love to learn more about the whole process.

Speaker:

We should be talking about that at some point, too.

Speaker:

Yes, yes, absolutely.

Speaker:

There's a whole process that I've used to normalize the audio to transcribe

Speaker:

per speaker.

Speaker:

That's yes, that's pretty amazing.

Speaker:

The podcast.

Speaker:

I did.

Speaker:

I did read the transcript, at least blurbs here and there.

Speaker:

I was like, OK, it's pretty good.

Speaker:

It was impressive.

Speaker:

And it was done with local compute as well.

Speaker:

Yeah, on GPF.

Speaker:

That's another that's a successful project right there.

Speaker:

Yes.

Speaker:

Anyway, sorry, coming back into this.

Speaker:

Do you feel like there are you're learning a lot of things based on your individual

Speaker:

judgment as well and you're learning a lot about the tool life is what I feel like

Speaker:

is most likely happening.

Speaker:

So maybe I should ask you that question.

Speaker:

You're right here.

Speaker:

Do you feel like.

Speaker:

You're learning a lot more about the tool and the process that you would follow and

Speaker:

building now with the modern AI tooling?

Speaker:

Definitely.

Speaker:

I guess, too, I think that was a really broad question.

Speaker:

It was very broad, I'm sorry.

Speaker:

So I was like, oh, where do I start with this?

Speaker:

I guess you have a little bit more specific to help me nail something down.

Speaker:

Yeah, yeah, definitely.

Speaker:

Definitely.

Speaker:

And without spending too much.

Speaker:

So like we talked about reviews, we talked about basically analyzing the codebase

Speaker:

ahead of time, basically doing research, pre-research ahead of time, then planning

Speaker:

and then executing.

Speaker:

Do you feel like there are more examples where you're like, OK, this is taking way

Speaker:

longer, this was too big, and I think maybe more like it

Speaker:

didn't understand me here clearly and it did all these things as well.

Speaker:

One of the things I think you mentioned to me earlier was you did tell Claude to

Speaker:

exercise his own judgment.

Speaker:

And that is something you're still working on figuring out if that was effective or

Speaker:

not.

Speaker:

Right.

Speaker:

Yes.

Speaker:

So so I guess, yeah, I remember talking about this last show.

Speaker:

Like, there's a lot of things that Claude will just fill in the blanks, fill in the

Speaker:

gaps, especially if you're not specific or intentful with your instructions.

Speaker:

If you say code this, well, it's going to code it.

Speaker:

But is it going to be the way you wanted it?

Speaker:

Is it going to be, you know, is it going to work more than once?

Speaker:

You know, you know, and will you be will you be able to follow its thought process

Speaker:

as well?

Speaker:

Exactly.

Speaker:

Yes.

Speaker:

I think before the podcast, we were talking a little bit about like how you've also

Speaker:

maybe you're using a particular workflow here to actually also learn new things

Speaker:

because it may tell you things and then you're just like, what do you mean?

Speaker:

Yes.

Speaker:

Yes.

Speaker:

And I'll speak to that.

Speaker:

Yeah, there's a lot of times where Claude will present an issue to me that popped up

Speaker:

for review.

Speaker:

And I'll read over it and I'll be like, what?

Speaker:

What are you saying?

Speaker:

Yeah.

Speaker:

And so I will literally just ask, like, what do you mean?

Speaker:

Like, could you expand on this a little bit more?

Speaker:

And then after it starts talking, I was like, OK, I'm starting to get the picture.

Speaker:

I ask more specific questions.

Speaker:

Right.

Speaker:

And I keep going and I keep going.

Speaker:

And eventually I have the entire picture and then I'll make a decision.

Speaker:

Yes.

Speaker:

And sometimes by the time I get to there, I'll say, well, we don't need.

Speaker:

We don't need any of this.

Speaker:

Yes.

Speaker:

OK.

Speaker:

And I think that is so this is this is great because this is actually what I wanted

Speaker:

to ask you is how much time did you end up spending doing that?

Speaker:

Because that is a huge value add, right?

Speaker:

Like that is all of a sudden you're like not only learning about the system, you're

Speaker:

also able to then detect it's like this was not a valuable portion of the system,

Speaker:

let's just get rid of it.

Speaker:

That's actually a really good thing.

Speaker:

Honestly, like sometimes you know the tool can do so much.

Speaker:

It does a lot.

Speaker:

And then you all of a sudden you can be like, oh, no, we can simplify.

Speaker:

We should simplify it.

Speaker:

And that is something that takes software engineering teams years potentially before

Speaker:

they realize that the systems they built, there are some portions which are not

Speaker:

useful and not needed and we can get rid of them and then also reduce the amount of

Speaker:

time we take or we spend maintaining them.

Speaker:

And that's cost.

Speaker:

That's again leadership cost.

Speaker:

I'm like curious, like how long now it took you to be able to learn some things

Speaker:

based on like just asking Claude.

Speaker:

Was it days?

Speaker:

Was it hours?

Speaker:

Oh, it's usually minutes.

Speaker:

It's usually all right.

Speaker:

Well, it's usually minutes.

Speaker:

I mean, it when something's costing me time, I

Speaker:

think it's going to be an easy lesson.

Speaker:

That's that's because you are definitely being able to see the, you know, your limit

Speaker:

getting exhausted.

Speaker:

Yeah, that's it.

Speaker:

I don't wait to see the percent change on weekly.

Speaker:

I hit refresh after two, three minutes and well, there's another percent gone.

Speaker:

And you're like, oh, that's that's a heavy.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

It was when I had access to Fable for the three days there

Speaker:

was I actually was not specific on something that Fable

Speaker:

was researching for me.

Speaker:

And it presented me with a couple of options.

Speaker:

And I said, expand on this option.

Speaker:

It launched a hundred agent workflow

Speaker:

to research this and came back with the most worthless response

Speaker:

I've had, probably from Claude.

Speaker:

But it was successful in burning 20 percent of my weekly limit in 30 minutes.

Speaker:

Oh, that's 20x too.

Speaker:

Yeah, on 20x.

Speaker:

Yeah, that was about three point three million tokens for that response.

Speaker:

Oh, wow.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And that would have been what that would have been if it was all output.

Speaker:

That would have been like one hundred and fifty bucks an API.

Speaker:

Wow.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

That's that's a lot.

Speaker:

That's an.

Speaker:

And that is also something that happens often.

Speaker:

But no, that's that's really interesting.

Speaker:

So it moves so fast that it essentially did so much work.

Speaker:

That was not really worth much at the end.

Speaker:

It told you something you mostly probably already knew.

Speaker:

Yeah.

Speaker:

And I was really simple.

Speaker:

I just expand on option two versus like go do a million.

Speaker:

You know, I didn't I didn't say do like a deep deep research report.

Speaker:

I don't know a PhD on this.

Speaker:

Yeah.

Speaker:

I just I just wanted a simple explanation.

Speaker:

And the thing was, I'm running multiple workflows.

Speaker:

I all tabbed came back.

Speaker:

I'm like, word, what happened here?

Speaker:

All right.

Speaker:

OK, so I do want to ask this question now is like in your mind, like, do you feel

Speaker:

like with events like this in general, like you now need to

Speaker:

really incorporate more rituals, more processes, make your project more of a

Speaker:

process if you want to make this into a.

Speaker:

Real life project or sorry, into a real life product or a real life.

Speaker:

Like, do you feel like now when you start running into things like this, your

Speaker:

confidence level has dropped enough that you now need to add things like guardrails

Speaker:

to increase your confidence, because one thing you already said was you did change

Speaker:

the process to do pre-work and that pre-work saved you from 90 to five iterations,

Speaker:

so like nine iterations on a single module to five, which is massive.

Speaker:

Do you feel like now you'd be more interested in looking at opportunities?

Speaker:

To add more guardrails, to slow it down intentionally?

Speaker:

So so that's a great thing to ask, because I actually that's my next kind of

Speaker:

process improvement part of the workflow is that I realized that I'm creating

Speaker:

a lot of workflows for it to follow.

Speaker:

It doesn't always follow them correctly.

Speaker:

Right.

Speaker:

Even if it's literally reading a script.

Speaker:

And then I realized I probably should be using skills and I haven't.

Speaker:

And so

Speaker:

my next step actually is to implement all the workflows that I have been using over

Speaker:

and over and implement them as formal like Claude or Kodak skills.

Speaker:

Yes.

Speaker:

Nice.

Speaker:

I said, OK, I am a little I may have some biased thoughts over there because I've

Speaker:

been using skills for a while.

Speaker:

But before we get into that, could I would you like to explain for audiences in case

Speaker:

like what is a skill versus like what is other prompting in traditional software or

Speaker:

prompting?

Speaker:

So, you know, maybe I you might be able to talk more to this, but

Speaker:

but I'll give you my interpretation of it first, because, again, I haven't actually

Speaker:

made a skill with Claude yet.

Speaker:

So that that was the next thing I'm going to do.

Speaker:

There's a really useful tool from Anthropic to do it.

Speaker:

But this is the skill tree.

Speaker:

The skill creator is so, so creative.

Speaker:

From my understanding.

Speaker:

It

Speaker:

essentially

Speaker:

is like a set of instructions that points Claude to where it needs to find the

Speaker:

information to replicate something

Speaker:

over and over, like a workflow.

Speaker:

Right.

Speaker:

Yes.

Speaker:

There is for me, it's like I don't know exactly how it was that different than

Speaker:

me telling you to read a file.

Speaker:

See, so that that's the question, because that's what I had already.

Speaker:

Like, all I have to do is tell Claude will convert this runbook into a skill.

Speaker:

That's all I'm going to say.

Speaker:

OK.

Speaker:

And then let's see if it works.

Speaker:

Right.

Speaker:

I'm hoping it will because it's it's all right.

Speaker:

All the instructions already there.

Speaker:

That's yes.

Speaker:

I think I think generally you've got the right gist of it.

Speaker:

It is a workflow for Claude to execute.

Speaker:

And in a sense of like instructions that it should know how to like methodically

Speaker:

say, do one, two, three, and you'll get the result that is intended as part of this

Speaker:

skill.

Speaker:

Like, for example, a skill could be like a tax preparation skill could be like, OK,

Speaker:

you know, put together the person's W2 and then put together all the right fields,

Speaker:

fill up the 1099 form or whatever the form is and then submit it.

Speaker:

Right.

Speaker:

Those would be the skill.

Speaker:

The thing I can add that I what I understand why it tends to behave differently than

Speaker:

traditionally, like just saying a prompt and then telling it to go read a file is

Speaker:

there is something the concept of system prompts versus user prompts.

Speaker:

Yeah.

Speaker:

So system prompts are what the AI model is essentially used to give themselves

Speaker:

context or like separate the role from like, oh, this is supposed to be me doing

Speaker:

some work versus this is what the person who's talking to me is asking.

Speaker:

So that person talking to me from a model perspective with the user prompt, whereas

Speaker:

the system prompt is essentially the actual model's own identity in a sense.

Speaker:

Like, what is its role?

Speaker:

What is its job?

Speaker:

What context is it?

Speaker:

Am I working for Anthropic?

Speaker:

What is my language supposed to be like?

Speaker:

Am I supposed to avoid saying things like, you know, like avoid profanity, avoid

Speaker:

doing like suggesting things that are not real, always base my things in reality.

Speaker:

So the there are two prompts that every AI system or every chatbot system basically

Speaker:

usually uses.

Speaker:

It is a system.

Speaker:

It is a user prompt because you could have n number of user prompts that get built

Speaker:

up over time.

Speaker:

But there's only one system prompt for that system to build over time, which was a

Speaker:

problem in the past, because if there was some very specific instructions like tax

Speaker:

reparation, that is very repeatable, that's very standard.

Speaker:

The system prompt might not be able to hold every single set of

Speaker:

like tax preparation

Speaker:

and, you know, like a writing expert and let's say, you know, a software engineer.

Speaker:

Right.

Speaker:

You can't put all of those into the system prompt.

Speaker:

It just gets too big.

Speaker:

So what they actually the innovation here with the skill was that the system prompt

Speaker:

would have a stub and then you could replace that stub with a user provided set of

Speaker:

instructions, which was a skill.

Speaker:

So you have within the system prompt, a specific section that's like add skill text

Speaker:

here.

Speaker:

And even the skill is supposed to follow a particular format that works well with

Speaker:

that model, which is supposed to give information like, all right, give me the

Speaker:

circumstances under which the skill needs to be invoked.

Speaker:

Give me, you know, like the actual step by step instruction first.

Speaker:

Give me some examples like how this works.

Speaker:

Give me like some starting text and what the eventual response should look like,

Speaker:

because then all of those things can go into that system prompt.

Speaker:

And then it's like the.

Speaker:

The AI model has a more specific role and is now executing a particular skill like

Speaker:

tax preparation and instead of the user prompt defining all of those things, which

Speaker:

ends up sometimes also being isolated and separated because you're also doing things

Speaker:

like when a user asks you for something, the AI model may or may not do everything

Speaker:

because a user may ask you to do bad things as well, like, you know, maybe a prompt

Speaker:

injection trying to do something negative is also a possibility.

Speaker:

So that same isolation that occurs on the user input or the sanitization

Speaker:

that occurs on user input does not apply to the skill and the system prompt.

Speaker:

Therefore, the skill is intended to be more focused with the instructions and follow

Speaker:

a particular format.

Speaker:

That's that's maybe the overly detailed explanation of it.

Speaker:

But yes, the intention is supposed to be that you can repeatable workflows end up in

Speaker:

skills.

Speaker:

They actually get honored more effectively.

Speaker:

So they actually get treated more like the model's gospel, like things that it'll

Speaker:

follow religiously versus not.

Speaker:

So, yes.

Speaker:

But exactly as you said, it is a workflow.

Speaker:

It is actually intended to be focused on a specific area and solve that repeatedly.

Speaker:

Yes, we just went into a five minute conversation about what a skill is.

Speaker:

But anyway, Andrew, come back.

Speaker:

Sorry.

Speaker:

Yeah, skills.

Speaker:

So you want to try skills next?

Speaker:

Yes.

Speaker:

Yes.

Speaker:

I want to try to create my own skills through the workflows I already have

Speaker:

outstanding and hopefully I will get more consistency.

Speaker:

That's my goal.

Speaker:

Like the workflows work, but there's so many instructions and so many

Speaker:

things that need to be followed.

Speaker:

Yes.

Speaker:

The models are just forgetting, conveniently forgetting certain steps along the way.

Speaker:

I'll be like, why aren't you doing this?

Speaker:

And then it'll be like, oh, it was because I interpreted,

Speaker:

you know, something I said earlier differently.

Speaker:

Like I sometimes it will say, for example,

Speaker:

sometimes I'll say continue autonomously.

Speaker:

Yes.

Speaker:

And it's very simple, straightforward.

Speaker:

Very straightforward.

Speaker:

It understands that.

Speaker:

Right.

Speaker:

But sometimes it will still,

Speaker:

you know, a blocker will come up even though I have the protocol for that.

Speaker:

It will show the decision menu and it would pause the whole workflow.

Speaker:

I'm not looking at it 24-7.

Speaker:

I come back, it's been sitting there for five hours waiting for me to say something.

Speaker:

Yeah.

Speaker:

And you're like, well, you know, you should have just continued autonomously.

Speaker:

You had the information.

Speaker:

Yeah.

Speaker:

And and then I I would ask it, so so why did you stop?

Speaker:

Yep.

Speaker:

And it was funny.

Speaker:

It literally said I didn't have a good reason to stop.

Speaker:

Oh, my gosh.

Speaker:

OK.

Speaker:

It needed your direction, Andrew.

Speaker:

Yeah, yeah, yeah.

Speaker:

And yeah, it was it's it can lose, I guess.

Speaker:

It was a simple thing to remember, but it still got drowned out

Speaker:

over time.

Speaker:

Yeah, I mean, and I definitely.

Speaker:

I think that is a challenge when context windows get long.

Speaker:

Yeah.

Speaker:

Models have been known to bias towards remembering the last thing you told them or

Speaker:

the first thing you told them.

Speaker:

And everything in between kind of just gets muddled.

Speaker:

Modern techniques and modern, you know, like the latest versions of Claude may be

Speaker:

like are better at this in certain

Speaker:

circumstances that are not so that they are still susceptible to them.

Speaker:

So there's still a possibility that will occur.

Speaker:

Something in the middle just gets lost.

Speaker:

When you said I imagine when you said continue autonomously, it's probably like

Speaker:

continue what?

Speaker:

And then it just was like, well, it was a bit better than that.

Speaker:

Well, yeah, I understand.

Speaker:

I totally agree.

Speaker:

Yeah.

Speaker:

So this does feel like there's some aspect of it where it got lost in the sauce

Speaker:

somewhere at some point.

Speaker:

And that is where something like a skill, which is like, OK, regardless of what the

Speaker:

person is asking me, this is what I'm supposed to be able to do is like a kind of

Speaker:

like it's it's kind of like a grounding.

Speaker:

Truth for it.

Speaker:

It's like this is the ground truth for me to follow or the grounding instructions

Speaker:

for me to follow.

Speaker:

So it won't ever like it always consider that, OK, regardless of what this person

Speaker:

said, what is the how does that work into the grounding truth of these instructions

Speaker:

that I'm supposed to follow and then it'll ask for clarification ahead of time and

Speaker:

it won't just arbitrarily just wait so that that is maybe the way to put it as well

Speaker:

as like you can if you have a skill for a researcher, the researcher will be like,

Speaker:

oh, I can't start until they give me all this information.

Speaker:

But once you give me that information, I can do this thing on its own.

Speaker:

And the similar thing is like when you hand it off from a researcher to, let's say,

Speaker:

a actual planning agent, then that planning agent also, if it has the right skill,

Speaker:

can also be like, let me clarify what I need ahead of time and then I can basically

Speaker:

move on.

Speaker:

So that's also like some of the skill benefits are also like giving a very solid

Speaker:

understanding of what's required to start.

Speaker:

And then because now you have a very clear understanding of what's required to

Speaker:

start, the model can also ask questions to make sure that it has enough information.

Speaker:

So you can ask clarifying questions and do all that stuff ahead of time.

Speaker:

So, yeah, sorry.

Speaker:

But yeah, I can keep going.

Speaker:

I think the important part for me

Speaker:

is that just like you were saying, as we started this like little tangent here,

Speaker:

was I like losing confidence in it, you know, performing things that I have

Speaker:

already established.

Speaker:

I have done what the models just did.

Speaker:

But yes, sorry, sorry.

Speaker:

And I'm hoping, like you implied, that the skills

Speaker:

will be a guardrail, that it will protect this workflow.

Speaker:

It's like I feel like the workflow is at a point where it's near perfection.

Speaker:

It's never going to be perfect, but it's near perfection enough that I really want

Speaker:

it to follow it.

Speaker:

And it needs to be able to follow it over multiple hours.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

And I think that's a really important part is over time.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Because there's times where I see it, you know, midstream, there's regression.

Speaker:

It's like, oh, it's no longer like I'll try to give some concrete examples.

Speaker:

Like I would ask it to like it's part of a workflow and which is in a runbook.

Speaker:

So it has a file to check what it needs to do every time.

Speaker:

This is obviously not a skill.

Speaker:

I would say please print out the findings per model.

Speaker:

The severity.

Speaker:

Right.

Speaker:

A little brief description of what it was.

Speaker:

Yes.

Speaker:

Right.

Speaker:

And so that way, when I come over to check the logs later.

Speaker:

Yes.

Speaker:

Yeah, I can see.

Speaker:

OK, Deepsea found this.

Speaker:

OK, Codex found that.

Speaker:

All right.

Speaker:

And I also have some requirements for it to keep track of Deepsea usage since I'm

Speaker:

actually paying API spend on that.

Speaker:

Yes.

Speaker:

There's a lot of notes that I can go over and review later with Claude to optimize

Speaker:

things more, determine whether these models are worth keeping around.

Speaker:

Yeah, right.

Speaker:

And that's great.

Speaker:

I was actually curious.

Speaker:

So do you use these output contracts as the union mechanism as well?

Speaker:

Like, how do you you mentioned that there was all unions between the findings?

Speaker:

Yeah.

Speaker:

You rely on the contracts to essentially.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

Because they're supposed to wait.

Speaker:

The orchestrator is supposed to wait for every single one to finish first.

Speaker:

They're all running blind.

Speaker:

Nice.

Speaker:

All of them have optimized prompts per like like angle for that for that

Speaker:

model.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

For for that specific module, what I'm looking for.

Speaker:

Oh, nice.

Speaker:

Yes.

Speaker:

Yeah.

Speaker:

And it's preloaded again with all the trace patterns that it already knows it needs.

Speaker:

Nice.

Speaker:

And so as the reviews go along, it's only deltas.

Speaker:

And to be one question, just to clarify as well, like I said, contracts.

Speaker:

I didn't clarify.

Speaker:

It's a structure, right?

Speaker:

It's basically like a schema.

Speaker:

It's like, yes.

Speaker:

Yeah.

Speaker:

OK, so sorry, just quick clarification, but continue the workflow of the workflow

Speaker:

here.

Speaker:

Yeah, yeah, yeah.

Speaker:

So so how another part of refining the reviews that I remembered was instead

Speaker:

of reviewing the whole module every single time,

Speaker:

it would start to nail down on its own where the problems, where the seams are

Speaker:

really obvious is maybe a good way to put it.

Speaker:

And at a certain point, the majority of the reviews are only deltas.

Speaker:

What did we change?

Speaker:

What what needs more attention?

Speaker:

Did the fix implement correctly?

Speaker:

Did the test work right?

Speaker:

Yes.

Speaker:

And at a certain point it converges.

Speaker:

There's no more like I stage it like you had priority one, two or three.

Speaker:

Yes.

Speaker:

I have blockers, warnings,

Speaker:

defers, suggestions.

Speaker:

Yeah.

Speaker:

And nits.

Speaker:

Well, yeah, those those are that is actually also very common software engineering

Speaker:

methodology terminology.

Speaker:

Yes, I mean, Claude gave it to me.

Speaker:

Yeah, I mean, I asked it like, well, how would we set this up?

Speaker:

How would we define the different levels?

Speaker:

And that is what it chose for me.

Speaker:

And it made sense.

Speaker:

And I was like, oh, I'll go with it.

Speaker:

It makes sense.

Speaker:

It's pretty, pretty.

Speaker:

It's a good one.

Speaker:

Yeah.

Speaker:

OK, I think the one thing I will definitely say here is that sounds like

Speaker:

we're kind of, in a sense, also like converging onto a particular process,

Speaker:

because a lot of what I'm hearing is actually like very common with like a lot of

Speaker:

the capabilities are the way that, you know, we operate as a software company and

Speaker:

also like the way that my previous companies operated, like treating software

Speaker:

development more as a process.

Speaker:

And I'm even more curious now that as you have if you feel like you're transitioning

Speaker:

from more of a vibe coding to a more of a process driven approach or do you feel

Speaker:

like, oh, no, I don't actually want to get too much in the process, because that's

Speaker:

another big thing I've seen still, like I don't know how prevalent the term still

Speaker:

is.

Speaker:

I still think it's very prevalent.

Speaker:

Vibe coding is I hear it all the time.

Speaker:

So I want to get your take on do you feel like going away from vibe coding and more

Speaker:

engineering is a good thing or do you feel like I don't necessarily want to go in

Speaker:

engineering because there's also a traditional if you talk to people in the

Speaker:

industry, they're like software engineering is so slow is what it be like.

Speaker:

It's a very common thing to hear it as well, because, yes, there are processes and

Speaker:

rituals that make it slower.

Speaker:

So do you feel like you want to avoid becoming software engineering or what's your

Speaker:

take on that?

Speaker:

I'm curious.

Speaker:

Oh, I think.

Speaker:

I think I think this kind of puts everything we've talked about a little bit more.

Speaker:

No, I think it like all comes together here.

Speaker:

I was complaining earlier about New World and the feeling at launch.

Speaker:

Right.

Speaker:

Not being able to scale when clearly there it was Amazon who made the game, they own

Speaker:

AWS.

Speaker:

They have all the skills for this.

Speaker:

What happened?

Speaker:

It didn't make any sense to me, but uh,

Speaker:

they like take everything together.

Speaker:

One of the biggest focuses on this is I would like to do it right.

Speaker:

Yeah.

Speaker:

And and to do it right will require some standardization of processes.

Speaker:

So I guess to answer your question directly, I do think that this is becoming more

Speaker:

process driven than purely vibe coding.

Speaker:

Now, I guess maybe day one it was vibe coding because I didn't really have a

Speaker:

structure to anything yet.

Speaker:

Right.

Speaker:

It was a blank slate for me.

Speaker:

This is the first time I've, I've like, you know, I've written code manually before

Speaker:

in classes and stuff like that, but not two hundred thousand lines of anything.

Speaker:

Oh, yeah.

Speaker:

Oh, yeah.

Speaker:

And one thing I will throw out there is I think, Andrew, we were also talking a

Speaker:

little bit about your profession and since admin is still very there is a lot of

Speaker:

systems that you need to put together.

Speaker:

Right.

Speaker:

Right.

Speaker:

There's a lot of connections.

Speaker:

There's a level of architecture that also you need to consider is like, what is what

Speaker:

are these things actually do and understand them enough enough depth that you can

Speaker:

put them together.

Speaker:

And I feel like, you know, like we talked about the perspective that you're bringing

Speaker:

here, we also talked about this admins in particular, maybe being a really good

Speaker:

audience for this kind of tooling because, you know, software engineering

Speaker:

can be is a very wide term and a software engineer does a lot of things.

Speaker:

You can have like specialized roles.

Speaker:

You can also have generalists.

Speaker:

I feel like this admins, you have to be a generalist up to a large extent because

Speaker:

you're working directly with people and your scope is always huge.

Speaker:

So a lot of what we can do.

Speaker:

Right.

Speaker:

We do try to automate as much as possible.

Speaker:

Right.

Speaker:

That's how we have many more hands.

Speaker:

Solve it once.

Speaker:

Yeah.

Speaker:

Get it fixed one time.

Speaker:

Yep.

Speaker:

We do thankfully have other teams that can take different things like help desk and

Speaker:

stuff like that.

Speaker:

So we're not necessarily doing like all of the front line stuff all the time.

Speaker:

But when it comes to like, like campus infrastructure, networking issues, look at

Speaker:

the websites down or, you know, like, like something like that, that would

Speaker:

definitely fall into take for granted.

Speaker:

Yeah.

Speaker:

And into our wheelhouse.

Speaker:

I guess I think I got away from your question, though.

Speaker:

Could you could you repeat it?

Speaker:

The main aspect being a risk perspective is I do feel like you're building

Speaker:

something in your personal time that you also realize there is only so much time, so

Speaker:

much token budget that you have and a combination of that.

Speaker:

And also wanting to deliver something for you during your personal like for your

Speaker:

personal project.

Speaker:

And do you feel like at this point is risk of token

Speaker:

expenditure, maybe something that drives your decision towards going more into like

Speaker:

a process oriented approach?

Speaker:

Absolutely.

Speaker:

Or do you think it's maybe also a combination of like the experience you've had of

Speaker:

like orchestrating these systems for your professional life?

Speaker:

Or maybe it's a combination of both.

Speaker:

That's interesting.

Speaker:

I think specifically for Minecraft, it's more informed by my previous experiences

Speaker:

running LuxWander.

Speaker:

Ah, yeah.

Speaker:

OK.

Speaker:

Actually.

Speaker:

And actually, I just realized that as a keyword.

Speaker:

I just dropped right in there.

Speaker:

We're going to have to bleep that out later.

Speaker:

No, no, no, it's OK, it's OK.

Speaker:

I think it's more inspired by that.

Speaker:

There's a lot of ways where I could maybe like tweak it to parallel to some things

Speaker:

that work, but I think it's more so a lot of the development and what drives a lot

Speaker:

of the development.

Speaker:

And I think it's a lot of the decisions, it goes comes directly from experience that

Speaker:

I had running LuxWander before, like, and I think like I

Speaker:

guess is like a little fun story when when LuxWander first released in 2010,

Speaker:

it was, you know, Minecraft pre-alpha.

Speaker:

That was, I don't like to say it, 16 years ago.

Speaker:

Yeah, it was 16 years ago.

Speaker:

Oh, yeah.

Speaker:

And don't remind me.

Speaker:

And

Speaker:

you know, I released like some advertisements online, like on like Minecraftforms

Speaker:

.net, right?

Speaker:

Like I had like a thread, you know, and there was rudimentary,

Speaker:

you know, multiplayer Minecraft.

Speaker:

It crashed all the time.

Speaker:

You know, parts of the map corrupted all the time.

Speaker:

Updates were coming every day, you know.

Speaker:

So it almost feels like your original, that predated all of your professional

Speaker:

experience.

Speaker:

So yes, yes.

Speaker:

But your love for actually building this community and this actual like version of

Speaker:

Minecraft that everybody could enjoy in the way that you wanted it to actually was a

Speaker:

more of a driver, essentially, even today continues to be more of a driver to

Speaker:

building something really awesome, not to say that your professional experience

Speaker:

doesn't help a little bit here and there, but maybe it's a combination of like

Speaker:

wanting to build something and having the, you know, like a wish to build something,

Speaker:

but also like a little bit of the learnings that you've had over time, you know,

Speaker:

professionally and personally in your previous experience.

Speaker:

My reasoning for this is like some of the move from a project driven

Speaker:

approach of let's just like vibe coded and hope it works and hope it works like we

Speaker:

can deploy it versus a, oh, I actually have built something that people have used

Speaker:

and I want to build something again that people have will use and will really love

Speaker:

is a very strong driver for saying that I'm not just playing around.

Speaker:

I'm building something from like a place of wanting it to be successful.

Speaker:

And that is maybe also part of like and the risk of like my risk is I actually want

Speaker:

to build something as bad either, and that is like a personal feeling about it as

Speaker:

well.

Speaker:

But it's driving you to now make decisions that are resulting in like more definable

Speaker:

processes and actually improving the quality as well as reducing the cost so you can

Speaker:

actually finish and get it out the door.

Speaker:

And I mean, I said a lot of things.

Speaker:

Let me let me maybe finish it up and ask a question is, do you feel like

Speaker:

these tools essentially have really actually enabled you or do you feel like

Speaker:

these tools are just giving you more of a mirage of like getting the AI tools,

Speaker:

Claude in particular so far?

Speaker:

Was it so far?

Speaker:

So that is a really interesting question because I cannot say definitively

Speaker:

yet until I start testing it right now.

Speaker:

So.

Speaker:

So so I think ask me in a few weeks again.

Speaker:

Yeah, because because right now it's like undefined.

Speaker:

I don't have any proof yet besides the development.

Speaker:

And so I would like to answer that confidently.

Speaker:

But I guess I can answer the kind of like the idea before that.

Speaker:

I do think it's quite empowered me to actually create something that I've always

Speaker:

wanted to do.

Speaker:

Right.

Speaker:

There is there is a lot of things, a lot of like a big wish list of stuff that I had

Speaker:

that last time I ran Lux Wanderer,

Speaker:

but I just didn't really have the manpower.

Speaker:

I didn't have like the skills, you know, that's funny.

Speaker:

Yeah.

Speaker:

You still don't know.

Speaker:

You will.

Speaker:

You will.

Speaker:

You will.

Speaker:

Skills creator.

Speaker:

I will soon.

Speaker:

And

Speaker:

it has closed the gap, though, that I think like you said, it's democratizing,

Speaker:

you know, I guess, intelligence.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And then, you know, it might all just be an act.

Speaker:

And I think that is something that we will have to see if it's if it is a

Speaker:

act of intelligence.

Speaker:

That's all right.

Speaker:

And at that point, I think we got to end the show.

Speaker:

Thank you.

Speaker:

Thank you.

Speaker:

We'll have to come back and see if it is.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

Cut the cut.

Links

Chapters

Video

More from YouTube