If coding is "solved," is software engineering? Ajay Medury (software engineer) and Andrew Sierota (systems engineer) pick up where Episode 1 left off and get into the part that isn't solved: judgment. They trade notes on why weekly usage limits have quietly become the real project budget, what it's like to build a sharded Minecraft world solo as both product manager and principal engineer, what Amazon's New World got wrong about scale, running decorrelated multi-model code reviews, and what an AI "skill" actually is. It might all just be an act of intelligence.
Hello, my name is Ajay Medury, and I'm a software engineer, and today I'm
Speaker:joined by...
Speaker:My name is Andrew Sierota, and I'm a systems admin.
Speaker:Awesome, and today we are here to talk about various topics
Speaker:in the AI ML space for a podcast that we have coined
Speaker:Active Intelligence, because we're trying to figure out if it's real intelligence or
Speaker:is it just acting?
Speaker:And this podcast is for all aspiring creators, creatives, and
Speaker:builders.
Speaker:Or those who have already been doing it for a while and just are looking for new
Speaker:tools and maybe ways to improve their workflows.
Speaker:I'm curious if, you know, one of the things we talked about last time was
Speaker:particularly like, you know, software engineering, writing code might be a solved
Speaker:problem, but is software engineering the solved problem?
Speaker:And I think, yeah, I was curious.
Speaker:Yeah, yeah, picking up where we left off on the last episode there, I remember us
Speaker:saying that coding could be solved, right, but software engineering definitely
Speaker:isn't, and I think I was touching on this while we were chatting just before the
Speaker:show, you know, coding is basically, you know,
Speaker:Claude can write code all day long, faster than anyone can
Speaker:humanly.
Speaker:Mm-hmm.
Speaker:It'll figure it out if you give it enough time, enough tokens.
Speaker:Oh, yeah.
Speaker:But judgment isn't free.
Speaker:Yes.
Speaker:And that's where humans are still very valuable, is judgment.
Speaker:And that can be really expensive.
Speaker:It could be a really expensive mistake if you have poor judgment on the usage of
Speaker:your code.
Speaker:And talking about that in particular, here's like bad judgment in terms of
Speaker:accidentally putting a vulnerability out there that could now all of a sudden be
Speaker:discovered by models much more easily.
Speaker:The cost of that is pretty intense.
Speaker:Yeah, yeah.
Speaker:And I don't remember what the, that, there was the project that Anthropic did with
Speaker:like the 30 big companies.
Speaker:Yes.
Speaker:To like the pre-release of Mythos.
Speaker:Yeah.
Speaker:And they were supposed to like patch everything, supposedly, before they released
Speaker:Fable.
Speaker:Yes.
Speaker:Right.
Speaker:That's funny, 4.8, Opus 4.8 was only released like two weeks ago.
Speaker:That's.
Speaker:And then less than two weeks later, we have Fable now.
Speaker:Three days later, we don't have Fable.
Speaker:Which is really interesting to me because traditional software engineering took a
Speaker:lot more time, had a lot more rituals potentially.
Speaker:And, you know, again, we kind of broached the subject last time is, were those
Speaker:rituals still meaningful?
Speaker:Like, do those still make sense to do today?
Speaker:Like, because I can't even imagine a time like four or five years ago where you'd be
Speaker:able to release a, you know, pretty significant version and then release the next
Speaker:major version within weeks later.
Speaker:I think you'd be waiting months between these kind of releases.
Speaker:So.
Speaker:Yeah.
Speaker:Kind of, kind of hard to keep up with, to be honest.
Speaker:You know, there's, there's so much changing so quickly.
Speaker:By the time this is released, you know, who knows what would have changed.
Speaker:Yeah.
Speaker:Yeah.
Speaker:I think that a lot of what we're doing now, and I guess
Speaker:something that I've done, like, obviously the world's changing every day.
Speaker:There's new tools, you know.
Speaker:It's hard to, you know, want to use the latest and greatest all the time.
Speaker:Yes.
Speaker:And, but also keep adding new requirements.
Speaker:To your project, right?
Speaker:Yes.
Speaker:Because, like, there was a point in time where in the, my Minecraft project right
Speaker:now, I was just adding so much stuff because I was like, oh, yeah, I like this thing
Speaker:can code everything for me.
Speaker:Oh, yeah.
Speaker:That's no longer a limit, but then once I started to get to, well, is it going to
Speaker:work, right, there's just too many things to check.
Speaker:Yeah.
Speaker:And, and I think I remember last episode, I said, I think, you know, testing would
Speaker:be 10 times as much as the production.
Speaker:Yeah.
Speaker:I'm actually thinking it's going to be 100 times more than the development now.
Speaker:Yeah.
Speaker:The realization is now dawning.
Speaker:It's like, oh, no, this, there's a lot more.
Speaker:I gave it the ability to build all this stuff.
Speaker:The time it will take for me to now validate that, yeah, it just feels.
Speaker:It's going to be a lot.
Speaker:Yeah.
Speaker:Yeah.
Speaker:And I think that's the interesting part is because back when, back when I was in my
Speaker:previous company, we were a cloud company.
Speaker:And so we were trying to launch systems for others to use.
Speaker:The expectations there were always like.
Speaker:Let's maybe be a little more restricted because if we can restrict the size of the
Speaker:system that we're like actually shipping out there, we can maybe do a better job of
Speaker:building it and reduce things like risk, which is I'm imagining a question that
Speaker:Anthropic is asking right now is what is the risk of actually releasing this model?
Speaker:So it becomes a similar question.
Speaker:I think we had, we had those kinds of conversations all the time where it's like,
Speaker:all right, do we actually do less intentionally?
Speaker:And the answer at times was yes.
Speaker:Yeah.
Speaker:We should do less intentionally.
Speaker:I don't also think that always applies towards other kinds of systems like our
Speaker:projects, right?
Speaker:I do feel like we were briefly talking about this before the podcast about like, oh,
Speaker:what is, you know, like, what does software engineering look like versus vibe
Speaker:coding?
Speaker:And there is definitely a good number of things I can bring up when I start talking
Speaker:about it.
Speaker:Though I do want to say like one distinction we kind of were, maybe we liked the
Speaker:idea of it is software engineering.
Speaker:Like in the big tech companies, like a well-oiled process, it's a repeatable thing
Speaker:that they keep repeating to keep, you know, churning out new features, new products
Speaker:and so on and so forth.
Speaker:However, I do think that the vibe coding and more like, you know, building
Speaker:locally, building a system today, like a lot of engineers who do it outside of the
Speaker:big tech, I feel like it's more like a project where the project is something you
Speaker:kind of figure out how to execute the project as you go along.
Speaker:There isn't just an answer for every single thing.
Speaker:Like you don't get told like, oh, this is where you, you know, release the code,
Speaker:this is where you talk to next.
Speaker:You know, you don't go step one, step two, step three with the actual like new world
Speaker:of building systems, building like projects.
Speaker:I do feel like there's a lot more variance.
Speaker:And Andrew, it sounds like, sounds like you're kind of experiencing some of that,
Speaker:right?
Speaker:If I'm getting, if I'm getting it right.
Speaker:Well, absolutely.
Speaker:And it's funny, like, like in enterprise, you have budgets, you have deadlines, you
Speaker:have a boss who's breathing down your neck.
Speaker:Yeah.
Speaker:Yeah.
Speaker:But, but when, when you have your own project, all of a sudden, the bigger
Speaker:limit is, especially when you're not paying for API, you're paying for a
Speaker:subscription like Claude Max, Codex Max.
Speaker:The biggest, the next big limit is your usage every week, your usage in every five
Speaker:hour window.
Speaker:And that's actually the budget now that I'm working with.
Speaker:Like, like I have to calculate, like I got a 20x Max subscription for Codex and
Speaker:Claude.
Speaker:I can use up a weekly limit in two days.
Speaker:And I have two subscriptions.
Speaker:So I can code for four days a week.
Speaker:That's your budget.
Speaker:Yeah.
Speaker:That's my budget.
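The budget arithmetic described here can be written down as a small sketch. The figures (two subscriptions, roughly two days of heavy use per weekly limit) come from the conversation; the function name and the seven-day cap are illustrative assumptions, not anything the providers publish.

```python
# Figures from the conversation: each subscription's weekly limit
# lasts about two days of heavy use, and there are two subscriptions.
DAYS_PER_WEEKLY_LIMIT = 2
SUBSCRIPTIONS = ["Claude", "Codex"]


def coding_days_per_week(subscriptions, days_per_limit, days_in_week=7):
    """Days of agent-assisted work available before every weekly limit is spent."""
    return min(len(subscriptions) * days_per_limit, days_in_week)


budget_days = coding_days_per_week(SUBSCRIPTIONS, DAYS_PER_WEEKLY_LIMIT)
idle_days = 7 - budget_days  # days left with no usage to spend
print(budget_days, idle_days)
```

With these numbers the weekly budget is four working days, and everything (coding, review rounds, testing) has to fit inside it.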
Speaker:And, and then I have to think, well, that's just testing, reviewing the code.
Speaker:And I haven't even really reached the stage where I'm doing practical tests.
Speaker:Yeah.
Speaker:Yeah.
Speaker:Real, real world.
Speaker:Yeah.
Speaker:Yeah.
Speaker:And so that's why I said, I'm like, wow, even with this process, I'm going to be
Speaker:able to do it, but even though I'm practically fully automated now, it's still going
Speaker:to take a lot of time because budgets aren't infinite for most projects.
Speaker:Yes.
Speaker:Yes.
Speaker:And I think that's the idea of like within this budget for this project, how can I
Speaker:actually figure out to execute the thing that I care about?
Speaker:And I think that's the process of like, oh, wow.
Speaker:I'm saying process project over and over again.
Speaker:Maybe I'll say like, there is a self-reflection that needs to happen in projects to
Speaker:feel like, hey, what is, what is "done" done?
Speaker:Like when do I actually, as you said, like earlier in your project, you're actually
Speaker:like do more.
Speaker:And eventually you started to realize like, okay, this might be a little too much
Speaker:because then the amount of stuff that I can validate and actually make sure the
Speaker:quality is good might be growing so quickly.
Speaker:Then you have to take a judgment call, which actually lets you decide, or then you
Speaker:have to, the AI won't do this for you.
Speaker:So as a builder, you need to take a judgment call, say that, no, we're going to stop
Speaker:here.
Speaker:We're going to actually now figure out the validation.
Speaker:We're going to start figuring out if everything's working.
Speaker:At least I think that's the way I'm thinking about it.
Speaker:I don't know if you feel a similar, like, do you feel like that's the process?
Speaker:Yeah.
Speaker:Yeah.
Speaker:I, to go along with that for me, how I've kind of thought about it is, you know, I
Speaker:broke the project down into phases, right?
Speaker:And I was like, okay, we need, what can we start out with?
Speaker:You know, an MVP, right?
Speaker:Like what can we start out with?
Speaker:Even that is pretty big in scale.
Speaker:But at least that once that's done, there's a foundation for the later updates.
Speaker:Right.
Speaker:Um, and so right now I think that's why this first phase, even though I've already
Speaker:reduced the scope, like I have like 16 planned modules, I'm only coding like, you
Speaker:know, eight of them.
Speaker:And one of them is a really big commons module.
Speaker:Yeah.
Speaker:But the thing is that they're being coded, but a lot of them are just skeletons for
Speaker:the next phases.
Speaker:I mean, I think, uh, I may say eight modules, eight modules.
Speaker:Yeah.
Speaker:I actually feel like that's a good thing.
Speaker:Usually being able to break it down into smaller pieces.
Speaker:So you actually then go and if something breaks, you want to ideally be able to
Speaker:focus on one module.
Speaker:Oh, absolutely.
Speaker:Absolutely.
Speaker:And do you feel like that's kind of the, it has worked pretty effectively?
Speaker:So, so one of the modules that I have, um, has been broken down into like five sub
Speaker:modules.
Speaker:All right.
Speaker:Okay.
Speaker:Cause, cause there were, there were big enough elements individually
Speaker:that I thought, you know, we need to break this down a little bit more, but for the
Speaker:sake of the project planning, those modules are now, instead of 04, 04a,
Speaker:b, c.
Speaker:I don't want to go ahead and change all the other module numbers to squeeze those
Speaker:in.
Speaker:Yes.
Speaker:Yes.
Speaker:You don't want to renumber everything.
Speaker:You just want to let it be a submodule of the existing.
Speaker:Exactly.
Speaker:So those are just submodules now.
Speaker:Um, at the end of the day, um, it's all about, can we maintain it?
Speaker:Um, yeah.
Speaker:And I think interestingly, what you're experiencing is a, there, there's a little
Speaker:bit of a mirror.
Speaker:There's like the other side of the coin.
Speaker:If you look at software engineering, traditionally, I think we were saying there's
Speaker:processes that one follows, uh, and the interesting thing is like what I've
Speaker:experienced is when you're trying to actually build something, build something new,
Speaker:usually you're getting requirements from somebody.
Speaker:You're actually getting requirements from a product manager or from leadership
Speaker:saying that, Hey, we've identified, uh, this opportunity in the market.
Speaker:Can you go scope it out?
Speaker:Can we actually go figure out the process actually entails that, right?
Speaker:Like the process says, uh, leadership identified an opportunity. Product manager
Speaker:now goes and figures out what that opportunity is, like, how big is it?
Speaker:How much scope is it?
Speaker:How many things need to be built out to capture that opportunity?
Speaker:And then the software engineering folks get pulled in the senior, you know,
Speaker:principal folks get pulled in, uh, where they're like, okay, what modules do we need
Speaker:to actually make this?
Speaker:Oh, do we need new products?
Speaker:Do we need eight modules?
Speaker:Do we need three?
Speaker:Like, is one good enough?
Speaker:Uh, and interestingly, uh, you know, there's a lot of
Speaker:things we can do with confused systems.
Speaker:There's a lot of things we can do with skilled systems.
Speaker:And even with, uh, you know, like the applications that we wanted to, we
Speaker:wanted to shift that to the, uh, you know, and the second one, we were really
Speaker:thinking about how we can start working with new things.
Speaker:All of these things are very different than what it would take traditionally the big
Speaker:tech companies to do, right?
Speaker:Because they need individuals to write the code in the past.
Speaker:Maybe not anymore.
Speaker:Yeah, yeah.
Speaker:Hmm.
Speaker:I think, you know, as I've been letting Claude code it, planning the
Speaker:project out, having the spec sheets and everything like that, I begin to realize
Speaker:that I kind of wish I actually shrunk it even more.
Speaker:Shrunk it even more?
Speaker:Like a lot more, actually.
Speaker:A lot more.
Speaker:Because right now I have, like, if we, I'm not sure we talked about, like, the
Speaker:architecture of it all, but there's going to be, it's going to be a sharded system.
Speaker:We're going to have nine separate worlds.
Speaker:There's going to be very smooth transitions for players to go between them.
Speaker:But that requires a lot of extra networking code on top of Minecraft already.
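To make the sharding idea concrete, here is a minimal sketch of how a position could be mapped to one of nine worlds and how a boundary crossing could be detected. The 3x3 grid layout, the shard size, and every name below are assumptions for illustration; the conversation only says there are nine worlds with smooth transitions, and the real project is not shown.

```python
from dataclasses import dataclass

GRID = 3             # assumption: nine worlds laid out as a 3x3 grid
SHARD_SIZE = 10_000  # assumption: each world covers 10,000 x 10,000 blocks


@dataclass(frozen=True)
class Shard:
    row: int
    col: int

    @property
    def index(self) -> int:
        """Flat shard number, 0..8, counted row by row."""
        return self.row * GRID + self.col


def shard_for(x: float, z: float) -> Shard:
    """Map a global (x, z) position to the shard that owns it, clamping at the map edge."""
    col = min(max(int(x // SHARD_SIZE), 0), GRID - 1)
    row = min(max(int(z // SHARD_SIZE), 0), GRID - 1)
    return Shard(row, col)


def needs_handoff(old_pos, new_pos) -> bool:
    """True when a move crosses a shard boundary and the player must be transferred."""
    return shard_for(*old_pos) != shard_for(*new_pos)
```

The lookup itself is cheap. The expensive part, and the "extra networking code" mentioned above, is everything that has to happen when `needs_handoff` returns true, which a single-host design never has to deal with.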
Speaker:And I was thinking, I could have actually just done just one server.
Speaker:Single host, yeah.
Speaker:Single host.
Speaker:Exactly.
Speaker:Ignored all the extra networking stuff.
Speaker:And, you know, I'd probably already be in game testing right now.
Speaker:Mm-hmm.
Speaker:Because I think that that extra layer, on top of all the other things that I wanted,
Speaker:Yes.
Speaker:is actually, that's the complicated part that Claude is spending a lot of time on.
Speaker:Yeah.
Speaker:There's a lot of gotchas that, on the first pass, it's not going to catch.
Speaker:Yes.
Speaker:And I thought about it.
Speaker:I was like, maybe I should.
Speaker:But, you know, I've already spent so much.
Speaker:So you are the product manager and the principal engineer, like, dealing with this
Speaker:at the same time.
Speaker:Yes, and I'm like, okay, well, it's going to be worth it when it works, but I will
Speaker:get back to you on when it works.
Speaker:And this is where I do feel like the traditional software engineering and big tech
Speaker:would have been like, oh, no, we failed.
Speaker:Because the process should have already caught this at some point.
Speaker:So I feel like that's the interesting differentiation I see right now.
Speaker:Which is not always to say it was a good thing.
Speaker:Because I do feel like going through this process of, like, or the project approach
Speaker:of, like, let's go start, let's just see what we can build, and, you know, let's
Speaker:make it unrestricted, right?
Speaker:We're going to learn a lot more.
Speaker:And I think that means we're actually going to figure out things more individually.
Speaker:And I'm curious if you feel like doing the project in the way that you have done it
Speaker:so far has actually taught you a lot more of what you would avoid next time.
Speaker:And because you are mentioning that maybe you should have started smaller and, like,
Speaker:started more restrictive.
Speaker:There's, like, history in software engineering that kind of points us to some of
Speaker:this stuff, right?
Speaker:And even then, a lot of software engineers don't actually follow it.
Speaker:A lot of companies don't actually follow it.
Speaker:There's no guarantee it's going to actually be repeatable and working.
Speaker:So individual builders now have the same similar power.
Speaker:And I'm curious, are you, like, do you feel like this is one of those moments where
Speaker:you're like, all right, I'm going to go ahead and let this run as it does.
Speaker:But next time, I'm actually going to do single host or try and figure out what
Speaker:single host looks like.
Speaker:Yeah, I have a couple ideas of new projects, I suppose.
Speaker:And I thought about it a little bit.
Speaker:I do think that the stuff I'm doing now is informing future
Speaker:projects.
Speaker:I definitely would have done it a lot differently.
Speaker:And another part of it is I constantly do change what I do
Speaker:all the time.
Speaker:Like, half of the time.
Speaker:Half of my time is spent actually optimizing the workflow, thinking about where can
Speaker:we cut costs?
Speaker:Where can I cut usage?
Speaker:For example, I have directed Opus to actually use
Speaker:Sonnet to implement fixes now.
Speaker:Yes.
Speaker:Because, I mean, Opus will write the, you know, the plan and then Sonnet's pretty,
Speaker:pretty good at implementing it.
Speaker:And at the end of the day, Opus will still review the changes.
Speaker:So I'm still gating it behind the frontier model.
Speaker:And it's not just Opus, right?
Speaker:If I'm not mistaken.
Speaker:Yeah.
Speaker:Right.
Speaker:And it's not just Opus.
Speaker:Yes.
Speaker:There's other reviewers.
Speaker:I have GPT 5.5 as well.
Speaker:DeepSeek.
Speaker:It's very cheap.
Speaker:It's very cheap.
Speaker:It's very cheap.
Speaker:You're welcome, China.
Speaker:But, yeah, I think the decorrelated reviews have saved a lot of money,
Speaker:actually, because I'm having three different models review the code at the same
Speaker:time.
Speaker:And there's a lot of use.
Speaker:There's a lot of unions in their findings, and there's a lot of findings that they
Speaker:individually would not have picked up.
Speaker:They're all trained on different data, and that gives you three different
Speaker:perspectives.
Speaker:Perspectives, yeah, yeah, I love that.
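As a rough sketch of what combining decorrelated reviews might look like, the snippet below merges findings from three reviewers and separates what everyone agreed on from what only one reviewer caught. The reviewer labels and finding identifiers are made up for illustration, and treating findings as comparable string IDs is an assumption; real review output would need normalizing first.

```python
from collections import Counter

# Hypothetical findings keyed by reviewer; the identifiers are invented for this example.
findings = {
    "reviewer_a": {"null-check-missing", "race-in-handoff", "stale-comment"},
    "reviewer_b": {"race-in-handoff", "unbounded-queue"},
    "reviewer_c": {"null-check-missing", "race-in-handoff", "wrong-timeout"},
}


def summarize(findings):
    """Return (all findings, findings every reviewer reported, findings unique to each reviewer)."""
    counts = Counter(f for found in findings.values() for f in found)
    all_findings = set(counts)
    consensus = {f for f, n in counts.items() if n == len(findings)}
    unique = {
        name: {f for f in found if counts[f] == 1}
        for name, found in findings.items()
    }
    return all_findings, consensus, unique


all_findings, consensus, unique = summarize(findings)
```

The `unique` sets are the payoff described above: issues that would have been missed if only one model had reviewed the code.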
Speaker:And I think that's really important when you're trying to build a system that's
Speaker:robust, and that's actually what I'm trying to do with Minecraft.
Speaker:Like, the technology behind what's going on here is intentional to be robust,
Speaker:because there is a lot of different communities that have done similar things.
Speaker:But not to this scale, and not, like, even
Speaker:normal MMOs have failed at doing this.
Speaker:Actually being able to scale out.
Speaker:Yeah, actually being able to scale and to, like, allow, like, seamless gameplay.
Speaker:It remains to be seen if I can accomplish this myself, because I get a little
Speaker:concerned, thinking, like, why hasn't, you know...
Speaker:I was going to say, for our listeners, can you maybe, like, do you have a specific
Speaker:thing that you can talk about where the scale was required?
Speaker:Yes, yes, what was that game?
Speaker:Amazon Game Studios, we played it a little bit.
Speaker:Oh, yes, yes, New World, New World.
Speaker:New World, and see, that was the game that I thought was, I thought Amazon was going
Speaker:to solve this problem.
Speaker:And when you say solve this problem, which problem, if I may say, the problem in
Speaker:particular.
Speaker:The problem of scale.
Speaker:The problem of when your game releases, you have millions of players that arrive,
Speaker:and all of a sudden you have a 30,000 player queue.
Speaker:Yes, yes, yes, yes, okay, yeah, yeah.
Speaker:And not only that, it's not one big world.
Speaker:There's hundreds of worlds that have lines.
Speaker:Yes.
Speaker:They're crashing left and right.
Speaker:And I was like, come on, AWS, Amazon, had to have had the resources and know-how
Speaker:to do this.
Speaker:But yet, they made the same mistake as every other predecessor before them.
Speaker:And I think that's the really interesting part, is, like, I'm also curious, like, if
Speaker:I were to go back and ask them that question, what part broke, right?
Speaker:Like, what was it the fact that now you would have to...
Speaker:Just have n number of players on the map at the same time?
Speaker:Was it that they're trying to communicate with each other over voice or something?
Speaker:And that was essentially what was causing the breakage?
Speaker:I'm curious of, like, what was their bottleneck?
Speaker:Because, sorry, yeah, you had something in mind?
Speaker:I do have something in mind.
Speaker:Like, I played a lot of different MMOs.
Speaker:A sharded system is really common, right?
Speaker:The issue, I think, that Amazon...
Speaker:That Amazon had with New World was they built the game like it was any other MMO.
Speaker:They did not take advantage of their expertise.
Speaker:From the get-go.
Speaker:Yeah, they built their own game engine, but they didn't do anything unique.
Speaker:Yeah.
Speaker:They didn't structure it in a way where they could scale it automatically.
Speaker:Yes.
Speaker:Right.
Speaker:And maybe they did, but it didn't work.
Speaker:The process failed.
Speaker:It didn't work.
Speaker:The risk was not assessed properly.
Speaker:Yes.
Speaker:I mean, there were...
Speaker:We played launch.
Speaker:We literally could not play for a few days.
Speaker:Yeah.
Speaker:Yeah.
Speaker:We actually gave up on the weekends.
Speaker:Yeah.
Speaker:Yeah.
Speaker:Yeah.
Speaker:We had to wait a few days.
Speaker:And to me, that's millions of dollars being lost.
Speaker:Yeah.
Speaker:I mean, I think New World would be a completely different game.
Speaker:If it was built with that scale from the get-go.
Speaker:If it was built properly from the get-go.
Speaker:Yeah.
Speaker:And I think that's the interesting, like, trade-off there as well.
Speaker:Like, this is...
Speaker:My understanding is also, like, traditional software engineering would also ask you
Speaker:that question is, do you know if you need that scale?
Speaker:Like, do you know if the marketing has been effective enough?
Speaker:And do you know if the product...
Speaker:Does the product manager actually...
Speaker:Have they talked to the marketing department and seen an insane amount of, like,
Speaker:interest?
Speaker:And have they been able to calculate the amount of interest to then inform the scale
Speaker:decision?
Speaker:Because it is very common for software engineering teams during this process of
Speaker:building the product to ask this question of, like, hey, do we want to be scalable
Speaker:to, like, two million players on day one?
Speaker:Or do we want to be scalable, you know, like, one world will be scalable to, like,
Speaker:100,000 users at a time, that kind of thing.
Speaker:And they make these decisions so that they can, you know, like, punt some of the
Speaker:very complicated, very difficult things, in this case being, like, instead of
Speaker:splitting this module into six different sub-modules, module three into six
Speaker:different sub-modules, I'm just going to make module three into two sub-modules for
Speaker:today.
Speaker:And that'll satisfy my needs for now.
Speaker:In this case, it feels like that process was a failure because somewhere, somehow
Speaker:they didn't understand that the demand was so high and one of the core values as
Speaker:gamers, as, like, people who enjoy playing games, waiting to get in to play your
Speaker:game is a game-breaking experience.
Speaker:Especially when you're super excited and you pre-ordered the game.
Speaker:Yeah, and you paid extra.
Speaker:Yeah, you paid extra.
Speaker:And it's such a shame because I actually did, like, once we finally did play.
Speaker:Yeah, it was fun.
Speaker:It was fun.
Speaker:But the game quickly died off
Speaker:because, I mean, you had millions of players who could not play.
Speaker:Yes.
Speaker:Day one.
Speaker:And then sometimes your friends would end up on the other server or the other world.
Speaker:Yeah.
Speaker:And they couldn't come and join you.
Speaker:So all of a sudden, one of the main reasons I play games is a social, like, it's a
Speaker:social thing for me.
Speaker:I want to play with other people.
Speaker:I want to play with my friends.
Speaker:And if I can't play with my friends, I'm going to find it much more difficult to return.
Speaker:Absolutely.
Speaker:So I do genuinely, like, question.
Speaker:That's the traditional software engineering method.
Speaker:So, like, waits for such a long span because the cost of solving these problems tend
Speaker:to be, again, you're making commitments to your boss.
Speaker:Yeah.
Speaker:You're answering to leadership.
Speaker:You're saying that, all right, leadership is saying that, OK, you have a budget of
Speaker:these many people for these many weeks.
Speaker:And if you can get the game released in those weeks, great.
Speaker:Otherwise, we're going to maybe, like, go a few, you know, not give you a promotion,
Speaker:whatever it is, right?
Speaker:The value proposition is so different versus I have seen indie games that have been
Speaker:so successful and they haven't.
Speaker:But I also do understand that they don't have that same pressure of, like, you know,
Speaker:corporate top down of, like, you need to release this soon, they will release it
Speaker:when they want.
Speaker:So I'm actually also curious.
Speaker:Do you feel like do you feel pressure to release your project?
Speaker:So thankfully, I actually have not released public information on this yet.
Speaker:Oh, so so if someone finds this podcast, they will recognize my voice.
Speaker:Then then they're going to start asking questions immediately.
Speaker:There there are some people who know that it's coming.
Speaker:I haven't given any hard timelines myself because this is,
Speaker:you know, a project that I'm figuring out as I'm going along.
Speaker:Absolutely.
Speaker:But there is pressure because I think I set up a lot of my own
Speaker:deadlines in my head.
Speaker:Right.
Speaker:There's I have a lot of expectations of where I should be by a certain time.
Speaker:And that's part of the workflow.
Speaker:And I'm like, OK, I'm going to trim this.
Speaker:I'm going to you know, I'm going to accept that, you know, a lot of these things
Speaker:aren't exactly as I want them, but I'm going to leave it and we're going to move on
Speaker:and just try to get this working right now.
Speaker:And I actually think I'm trying to get to in game testing as fast as possible,
Speaker:because like you were saying, like before the podcast, the practical testing can
Speaker:serve way it's way more efficient than just, you know, traditional
Speaker:tests.
Speaker:Yeah.
Speaker:Yeah.
Speaker:And I think that's a really interesting topic.
Speaker:Actually, we talked about we will jump into that a little more after.
Speaker:I do want to ask.
Speaker:But you don't feel at this point you don't feel like you would compromise on certain
Speaker:aspects, though, despite the time pressure, there are certain things that you are
Speaker:very much like this is a critical thing based on your experience.
Speaker:Yes.
Speaker:Yeah.
Speaker:Based on my experience, there's a lot of things at this point that I'm I'm holding
Speaker:on to, which is really interesting to me, because when you talk about big
Speaker:corporations and Amazon releasing it, the distance between the person who actually
Speaker:understands the experience and the person building the experience is actually
Speaker:non-trivial.
Speaker:So I do feel like in big tech or in generally like big software organizations, that
Speaker:is something that is I I'm really excited about the, you know, like coding tools and
Speaker:all of these things becoming much more democratized, because the person who actually
Speaker:understands the most about the experience now can actually ask direct questions
Speaker:about, like, which parts of the experience are actually going to be implemented
Speaker:versus not.
Speaker:And I feel like I made the joke by you being the PM and the, you know, sometimes
Speaker:that's I do feel like that that's a good thing, because I actually feel like you're
Speaker:able to not only understand, but ask questions and actually also get into focus in
Speaker:the right way.
Speaker:Absolutely.
Speaker:Though the testing part is still like an all huge, you know, open box.
Speaker:And I guess I'm curious.
Speaker:So, like, we were talking about certain number of lines of test versus code.
Speaker:And do you want to share?
Speaker:Yeah.
Speaker:Yes.
Speaker:I think the project now has approximately two hundred thousand lines of code.
Speaker:And it's about it's not exactly fifty fifty, but it's a little bit.
Speaker:It's about fifty fifty.
Speaker:It's going to be by the end of it.
Speaker:I also meant the two hundred thousand lines is non-trivial.
Speaker:Yeah, that that is a lot.
Speaker:And I'm clearly not, you know, hand reviewing any of this.
Speaker:But there is plenty of standards and conventions that we talked about last time that
Speaker:are being taken into consideration during the iterative review rounds.
Speaker:And actually to speak on that, give a little bit more since I've actually interfaced
Speaker:with the project since then, quite a bit.
Speaker:One module took 90 review rounds to converge into
Speaker:what I at a certain point I actually had started to strip away requirements for the
Speaker:testing.
Speaker:Is that the module that you broke up or is that the module?
Speaker:That's the module that I broke up.
Speaker:OK, OK.
Speaker:That explains all.
Speaker:That does explain all.
Speaker:Yeah.
Speaker:It did finally converge
Speaker:into where it was mostly like comments and code were just not consistent with
Speaker:previous changes and at a certain point, I'm like, OK, we're going to keep finding
Speaker:things forever.
Speaker:Yeah.
Speaker:And so I'm like, I think this is a good place to stop.
Speaker:Yes.
Speaker:And actually, since then I went through a review of the review process,
Speaker:of the review cycle.
Speaker:Yeah.
Speaker:With Claude.
Speaker:And I said, OK, let's take a look.
Speaker:I had it log all 90 rounds.
Speaker:You reflected on the review process of, like, these 90 iterations.
Speaker:OK, well, OK, yeah.
Speaker:Tell us more.
Speaker:I had it.
Speaker:I had it, you know, from the get go log all 90 rounds.
Speaker:It logged everything from all three of the different models that I used to review
Speaker:things.
Speaker:That's a really good place to do a reflection. 90 rounds is a lot.
Speaker:I was like, I was like, I burned a lot.
Speaker:Like
Speaker:and so it came back and looked through everything and it gave me.
Speaker:So I was like, what?
Speaker:I asked it.
Speaker:You know, simple English.
Speaker:Yeah.
Speaker:What can we do to lower the number of rounds?
Speaker:Give me the executive review.
Speaker:Yeah.
Speaker:And it came back with a lot.
Speaker:I'm not exactly familiar with maybe all the terminology specifically.
Speaker:But there were things like, I remember it saying it'll do, like, a
Speaker:pre-flight check.
Speaker:It will, like, trace all the methods and classes ahead of time against the specs.
Speaker:It will, you know, it will have an index of what it needs to look for.
Speaker:It added a few linters.
Speaker:OK, yeah.
Speaker:Yeah.
Speaker:That's good.
Speaker:It did.
Speaker:So those are improvements.
Speaker:Yeah, yeah.
Speaker:Yeah.
Speaker:And it's actually interesting.
Speaker:The next module that it reviewed only took five rounds.
Speaker:That 90 to five.
Speaker:That's that's pretty.
Speaker:That's pretty.
Speaker:OK, I must also play devil's advocate and ask how big was the other one?
Speaker:I would say that they were similar.
Speaker:But but here's the thing.
Speaker:Here's the thing.
Speaker:There's another reason why.
Speaker:Yeah, because a lot of the rounds were
Speaker:finding things that it should have picked up the first time.
Speaker:OK, OK.
Speaker:Right.
Speaker:And that's where those things like tracing the comments back.
Speaker:Yeah, the linters, all of these things really did reduce a lot of the
Speaker:noise.
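A minimal sketch of the kind of pre-flight check described above, assuming the spec can be reduced to a set of expected class and function names. The function names, `SPEC_NAMES`, and the sample module are hypothetical; the episode does not describe how the real check was built.

```python
import ast


def defined_names(source: str) -> set[str]:
    """Collect every class and function name defined anywhere in the source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    }


def preflight(source: str, spec_names: set[str]) -> dict[str, set[str]]:
    """Compare what the code defines against what the spec expects."""
    found = defined_names(source)
    return {
        "missing": spec_names - found,      # in the spec, not in the code
        "unspecified": found - spec_names,  # in the code, not in the spec
    }


# Hypothetical module and spec, for illustration only.
SAMPLE = '''
class ShardRouter:
    def route(self, player_id): ...
    def rebalance(self): ...
'''
SPEC_NAMES = {"ShardRouter", "route", "handoff"}

report = preflight(SAMPLE, SPEC_NAMES)
print(report)
```

Running a cheap check like this before any model review means the reviewers spend their rounds on behavior rather than on rediscovering names that never matched the spec.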
Speaker:So interestingly, I recently read this as well, as I was going through Claude's
Speaker:updated documentation online.
Speaker:I think a while ago they actually put out this user guide, and I think it's pretty
Speaker:buried.
Speaker:Unfortunately, I do feel like it's a little buried.
Speaker:One of their strong recommendations is plan first always.
Speaker:But even before planning, you should actually ask it to research and understand what
Speaker:this module is doing or what this code looks like.
Speaker:How does it actually trace down a particular feature?
Speaker:So it's like, OK, how does your authentication flow work?
Speaker:Like would be a good question.
Speaker:Right.
Speaker:And that actually does pre-work.
Speaker:It says that, all right, capture all the information about the authentication flow,
Speaker:because then I know exactly where I need to make the updates.
Speaker:Sounds like you've experienced that firsthand now.
Speaker:Yes, yes.
Speaker:And like I was saying last show,
Speaker:there's a lot of things I'm finding out by brute force, like I'm developing the
Speaker:process that, you know, a software engineer would have known.
Speaker:Yeah.
Speaker:Or would have been instructed to do, more than even known, like being
Speaker:told, turn your brain off.
Speaker:Just follow the process.
Speaker:Yeah.
Speaker:I do feel like that's actually a very interesting distinction I want
Speaker:to get back into later:
Speaker:you're learning the reason why the judgment exists or the process exists.
Speaker:You're using your judgment and then getting Claude to give you the right information
Speaker:so that you can make the correct judgments that, you know, engineering
Speaker:teams have probably been making ad nauseam across time.
Speaker:And that ends up becoming either tribal knowledge or very strict process.
Speaker:That's what it feels like to me.
Speaker:Like, that's that's what I'm hearing almost.
Speaker:I'm curious... we also talked about using existing
Speaker:tools, we also talked about some project or so, get you done, a repository
Speaker:that I keep talking about. Yes, yes, we'll see if that gets flagged in
Speaker:some.
Speaker:That is probably what got flagged on the podcast.
Speaker:OK, that is probably all right.
Speaker:So we were trying to syndicate our podcasts across things and the tool's name has
Speaker:now become a problem.
Speaker:So I'm sorry.
Speaker:Andrew, you're going to have to find it.
Speaker:You're going to have to get OK.
Speaker:We're going to get flagged again.
Speaker:Oh, it's an official.
Speaker:I'll ask Claude to beep it out.
Speaker:Yeah, yeah.
Speaker:If it can do it, that would be fantastic.
Speaker:That would be interesting.
Speaker:And would also at some point love to learn more about the whole process.
Speaker:We should be talking about that at some point, too.
Speaker:Yes, yes, absolutely.
Speaker:There's a whole process that I've used to normalize the audio and transcribe
Speaker:per speaker.
Speaker:That's yes, that's pretty amazing.
Speaker:The podcast.
Speaker:I did.
Speaker:I did read the transcript, at least blurbs here and there.
Speaker:I was like, OK, it's pretty good.
Speaker:It was impressive.
Speaker:And it was done with local compute as well.
Speaker:Yeah, on GPU.
Speaker:That's another that's a successful project right there.
Speaker:Yes.
Speaker:Anyway, sorry, coming back into this.
Speaker:Do you feel like you're learning a lot of things based on your individual
Speaker:judgment as well, and you're learning a lot about the tool? That's what I feel like
Speaker:is most likely happening.
Speaker:So maybe I should ask you that question.
Speaker:You're right here.
Speaker:Do you feel like.
Speaker:You're learning a lot more about the tool and the process that you would follow in
Speaker:building now with the modern AI tooling?
Speaker:Definitely.
Speaker:I guess, too, I think that was a really broad question.
Speaker:It was very broad, I'm sorry.
Speaker:So I was like, oh, where do I start with this?
Speaker:I guess, do you have something a little bit more specific to help me nail something down?
Speaker:Yeah, yeah, definitely.
Speaker:Definitely.
Speaker:And without spending too much.
Speaker:So like we talked about reviews, we talked about basically analyzing the codebase
Speaker:ahead of time, basically doing research, pre-research ahead of time, then planning
Speaker:and then executing.
Speaker:Do you feel like there are more examples where you're like, OK, this is taking way
Speaker:longer, this was too big, and I think maybe more like it
Speaker:didn't understand me here clearly and it did all these things as well.
Speaker:One of the things I think you mentioned to me earlier was you did tell Claude to
Speaker:exercise its own judgment.
Speaker:And that is something you're still working on figuring out if that was effective or
Speaker:not.
Speaker:Right.
Speaker:Yes.
Speaker:So so I guess, yeah, I remember talking about this last show.
Speaker:Like, there's a lot of things where Claude will just fill in the blanks, fill in the
Speaker:gaps, especially if you're not specific or intentional with your instructions.
Speaker:If you say code this, well, it's going to code it.
Speaker:But is it going to be the way you wanted it?
Speaker:Is it going to be, you know, is it going to work more than once?
Speaker:And will you be able to follow its thought process
Speaker:as well?
Speaker:Exactly.
Speaker:Yes.
Speaker:I think before the podcast, we were talking a little bit about how you're also
Speaker:maybe using a particular workflow here to actually learn new things,
Speaker:because it may tell you things and then you're just like, what do you mean?
Speaker:Yes.
Speaker:Yes.
Speaker:And I'll speak to that.
Speaker:Yeah, there's a lot of times where Claude will present an issue to me that popped up
Speaker:for review.
Speaker:And I'll read over it and I'll be like, what?
Speaker:What are you saying?
Speaker:Yeah.
Speaker:And so I will literally just ask, like, what do you mean?
Speaker:Like, could you expand on this a little bit more?
Speaker:And then after it starts talking, I was like, OK, I'm starting to get the picture.
Speaker:I ask more specific questions.
Speaker:Right.
Speaker:And I keep going and I keep going.
Speaker:And eventually I have the entire picture and then I'll make a decision.
Speaker:Yes.
Speaker:And sometimes by the time I get to there, I'll say, well, we don't need.
Speaker:We don't need any of this.
Speaker:Yes.
Speaker:OK.
Speaker:And I think that is great, because this is actually what I wanted
Speaker:to ask you: how much time did you end up spending doing that?
Speaker:Because that is a huge value add, right?
Speaker:All of a sudden you're not only learning about the system, you're
Speaker:also able to detect that this was not a valuable portion of the system,
Speaker:let's just get rid of it.
Speaker:That's actually a really good thing.
Speaker:Honestly, like sometimes you know the tool can do so much.
Speaker:It does a lot.
Speaker:And then all of a sudden you can be like, oh, no, we can simplify.
Speaker:We should simplify it.
Speaker:And that is something that takes software engineering teams years potentially before
Speaker:they realize that the systems they built, there are some portions which are not
Speaker:useful and not needed and we can get rid of them and then also reduce the amount of
Speaker:time we take or we spend maintaining them.
Speaker:And that's cost.
Speaker:That's again leadership cost.
Speaker:I'm like curious, like how long now it took you to be able to learn some things
Speaker:based on like just asking Claude.
Speaker:Was it days?
Speaker:Was it hours?
Speaker:Oh, it's usually minutes.
Speaker:It's usually all right.
Speaker:Well, it's usually minutes.
Speaker:I mean, when something's costing me time, I
Speaker:think it's going to be an easy lesson.
Speaker:That's because you're definitely able to see, you know, your limit
Speaker:getting exhausted.
Speaker:Yeah, that's it.
Speaker:I don't wait to see the percent change on weekly.
Speaker:I hit refresh after two, three minutes and well, there's another percent gone.
Speaker:And you're like, oh, that's that's a heavy.
Speaker:Yeah.
Speaker:Yeah.
Speaker:When I had access to Fable for the three days there,
Speaker:I actually was not specific on something that Fable
Speaker:was researching for me.
Speaker:And it presented me with a couple of options.
Speaker:And I said, expand on this option.
Speaker:It launched a hundred-agent workflow
Speaker:to research this and came back with the most worthless response
Speaker:I've had, probably from Claude.
Speaker:But it was successful in burning 20 percent of my weekly limit in 30 minutes.
Speaker:Oh, that's 20x too.
Speaker:Yeah, on 20x.
Speaker:Yeah, that was about three point three million tokens for that response.
Speaker:Oh, wow.
Speaker:Yeah.
Speaker:Yeah.
Speaker:And if it was all output,
Speaker:that would have been like one hundred and fifty bucks on the API.
Speaker:Wow.
Speaker:Yeah.
Speaker:Yeah.
Speaker:Yeah.
Speaker:That's that's a lot.
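The arithmetic behind that estimate can be checked in a few lines. This sketch uses only the two figures quoted above (about 3.3 million tokens, about 150 dollars if all of it were billed as output) and backs out the per-million-token rate they imply. It does not assume any published price, and the function names are illustrative.

```python
def implied_rate_per_million(total_cost_usd: float, tokens: int) -> float:
    """Back out the dollars-per-million-tokens rate implied by a total cost."""
    return total_cost_usd / (tokens / 1_000_000)


def cost_usd(tokens: int, rate_per_million: float) -> float:
    """Cost of a token count at a given dollars-per-million-tokens rate."""
    return tokens / 1_000_000 * rate_per_million


# Figures quoted in the conversation; both are approximate.
TOKENS = 3_300_000
QUOTED_COST_USD = 150.0

rate = implied_rate_per_million(QUOTED_COST_USD, TOKENS)
print(f"implied rate: about {rate:.2f} USD per million tokens")
```

The same `cost_usd` helper can be pointed at any other token count to see what a runaway research request would cost at that implied rate.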
Speaker:That's an.
Speaker:And that is also something that happens often.
Speaker:But no, that's that's really interesting.
Speaker:So it moves so fast that it essentially did so much work.
Speaker:That was not really worth much at the end.
Speaker:It told you something you mostly probably already knew.
Speaker:Yeah.
Speaker:And I was really simple.
Speaker:I just said expand on option two, versus, like, go do a million.
Speaker:You know, I didn't say do, like, a deep research report,
Speaker:I don't know, a PhD on this.
Speaker:Yeah.
Speaker:I just I just wanted a simple explanation.
Speaker:And the thing was, I'm running multiple workflows.
Speaker:I alt-tabbed, came back.
Speaker:I'm like, word, what happened here?
Speaker:All right.
Speaker:OK, so I do want to ask this question now: in your mind, do you feel
Speaker:like with events like this in general, you now need to
Speaker:really incorporate more rituals, more processes, make your project more of a
Speaker:process if you want to make this into a
Speaker:real-life project, or sorry, into a real-life product?
Speaker:Like, do you feel like now when you start running into things like this, your
Speaker:confidence level has dropped enough that you now need to add things like guardrails
Speaker:to increase your confidence? Because one thing you already said was you did change
Speaker:the process to do pre-work, and that pre-work took you from 90 iterations
Speaker:on a single module down to five, which is massive.
Speaker:Do you feel like now you'd be more interested in looking at opportunities
Speaker:to add more guardrails, to slow it down intentionally?
Speaker:So that's a great thing to ask, because that's actually my next
Speaker:process improvement for the workflow. I realized that I'm creating
Speaker:a lot of workflows for it to follow.
Speaker:It doesn't always follow them correctly.
Speaker:Right.
Speaker:Even if it's literally reading a script.
Speaker:And then I realized I probably should be using skills and I haven't.
Speaker:And so
Speaker:my next step actually is to take all the workflows that I have been using over
Speaker:and over and implement them as formal, like, Claude or Codex skills.
Speaker:Yes.
Speaker:Nice.
Speaker:OK, I may have some biased thoughts over there because I've
Speaker:been using skills for a while.
Speaker:But before we get into that, would you like to explain for the audience
Speaker:what a skill is versus other, more traditional
Speaker:prompting?
Speaker:So, you know, you might be able to talk more to this,
Speaker:but I'll give you my interpretation of it first, because, again, I haven't actually
Speaker:made a skill with Claude yet.
Speaker:So that that was the next thing I'm going to do.
Speaker:There's a really useful tool from Anthropic to do it.
Speaker:But this is the skill tree.
Speaker:The skill creator is so, so creative.
Speaker:From my understanding.
Speaker:It
Speaker:essentially
Speaker:is like a set of instructions that points Claude to where it needs to find the
Speaker:information to replicate something
Speaker:over and over, like a workflow.
Speaker:Right.
Speaker:Yes.
Speaker:For me, it's like, I don't know exactly how that's different from
Speaker:me telling it to read a file.
Speaker:See, so that's the question, because that's what I had already.
Speaker:Like, all I have to do is tell Claude, well, convert this runbook into a skill.
Speaker:That's all I'm going to say.
Speaker:OK.
Speaker:And then let's see if it works.
Speaker:Right.
Speaker:I'm hoping it will, because
Speaker:all the instructions are already there.
Speaker:That's yes.
Speaker:I think I think generally you've got the right gist of it.
Speaker:It is a workflow for Claude to execute.
Speaker:In the sense of instructions that it should know how to methodically follow:
Speaker:say, do one, two, three, and you'll get the result that is intended as part of this
Speaker:skill.
Speaker:For example, a tax preparation skill could be like, OK,
Speaker:you know, put together the person's W-2 and then put together all the right fields,
Speaker:fill out the 1099 form or whatever the form is, and then submit it.
Speaker:Right.
Speaker:Those would be the skill.
Speaker:The thing I can add, as I understand it, about why it tends to behave differently than
Speaker:just writing a prompt and then telling it to go read a file, is
Speaker:the concept of system prompts versus user prompts.
Speaker:Yeah.
Speaker:So system prompts are what the AI model essentially uses to give itself
Speaker:context, or to separate the role: this is supposed to be me doing
Speaker:some work, versus this is what the person who's talking to me is asking.
Speaker:So that person talking to me, from a model perspective, is the user prompt, whereas
Speaker:the system prompt is essentially the model's own identity in a sense.
Speaker:Like, what is its role?
Speaker:What is its job?
Speaker:What context is it in?
Speaker:Am I working for Anthropic?
Speaker:What is my language supposed to be like?
Speaker:Am I supposed to avoid saying things like, you know, avoid profanity, avoid
Speaker:suggesting things that are not real, always base my answers in reality?
Speaker:So there are two prompts that every AI system or every chatbot system basically
Speaker:usually uses:
Speaker:a system prompt
Speaker:and a user prompt. You could have n number of user prompts that get built
Speaker:up over time.
Speaker:But there's only one system prompt for that system, which was a
Speaker:problem in the past, because if there were some very specific instructions, like tax
Speaker:preparation, that are very repeatable, very standard,
Speaker:the system prompt might not be able to hold every single set of them,
Speaker:like tax preparation
Speaker:and, you know, a writing expert and, let's say, a software engineer.
Speaker:Right.
Speaker:You can't put all of those into the system prompt.
Speaker:It just gets too big.
Speaker:So the innovation here with the skill was that the system prompt
Speaker:would have a stub, and then you could replace that stub with a user-provided set of
Speaker:instructions, which was a skill.
Speaker:So you have within the system prompt a specific section that's like, add skill text
Speaker:here.
Speaker:And even the skill is supposed to follow a particular format that works well with
Speaker:that model, which is supposed to give information like, all right, give me the
Speaker:circumstances under which the skill needs to be invoked.
Speaker:Give me, you know, like the actual step by step instruction first.
Speaker:Give me some examples like how this works.
Speaker:Give me like some starting text and what the eventual response should look like,
Speaker:because then all of those things can go into that system prompt.
Speaker:And then
Speaker:the AI model has a more specific role and is now executing a particular skill, like
Speaker:tax preparation, instead of the user prompt defining all of those things. The user
Speaker:prompt also ends up being isolated and separated, because
Speaker:when a user asks for something, the AI model may or may not do everything,
Speaker:since a user may ask it to do bad things as well. You know, a prompt
Speaker:injection trying to do something negative is also a possibility.
Speaker:So that same isolation, or sanitization,
Speaker:that occurs on user input does not apply to the skill and the system prompt.
Speaker:Therefore, the skill is intended to be more focused with the instructions and follow
Speaker:a particular format.
Speaker:That's that's maybe the overly detailed explanation of it.
Speaker:But yes, the intention is that repeatable workflows end up in
Speaker:skills.
Speaker:They actually get honored more effectively.
Speaker:So they get treated more like the model's gospel, like things that it'll
Speaker:follow religiously versus not.
Speaker:So, yes.
Speaker:But exactly as you said, it is a workflow.
Speaker:It is actually intended to be focused on a specific area and solve that repeatedly.
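To make the explanation above concrete, here is a toy model of it: a skill with the parts listed in the conversation (when to invoke it, step-by-step instructions, examples) rendered into a slot in a system prompt. Everything here is hypothetical, including the `Skill` class, the `SYSTEM_TEMPLATE` stub, and the rendered layout. It illustrates the description given in the episode and is not Anthropic's actual skill file format or loading mechanism.

```python
from dataclasses import dataclass, field

# Hypothetical system prompt with a slot for skill text, as described above.
SYSTEM_TEMPLATE = "You are a helpful assistant.\n\n{skill_text}"


@dataclass
class Skill:
    name: str
    when_to_use: str
    steps: list[str]
    examples: list[tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        """Render the skill as plain text: trigger, numbered steps, examples."""
        lines = [f"Skill: {self.name}", f"Use when: {self.when_to_use}", "Steps:"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        for request, response in self.examples:
            lines += [f"Example request: {request}", f"Example response: {response}"]
        return "\n".join(lines)


def build_system_prompt(skill: Skill) -> str:
    """Fill the skill slot in the system prompt template."""
    return SYSTEM_TEMPLATE.format(skill_text=skill.render())


# A review-round skill assembled from steps mentioned in the episode.
review_skill = Skill(
    name="review-round",
    when_to_use="A module is ready for another review round.",
    steps=[
        "Run each reviewer model blind",
        "Wait for every reviewer to finish",
        "Print findings per model with severity and a brief description",
    ],
)

print(build_system_prompt(review_skill))
```

The point of the structure is that the trigger and the steps are stated once, in a fixed shape, instead of being restated loosely in every user prompt.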
Speaker:Yes, we just went into a five minute conversation about what a skill is.
Speaker:But anyway, Andrew, come back.
Speaker:Sorry.
Speaker:Yeah, skills.
Speaker:So you want to try skills next?
Speaker:Yes.
Speaker:Yes.
Speaker:I want to try to create my own skills through the workflows I already have
Speaker:outstanding and hopefully I will get more consistency.
Speaker:That's my goal.
Speaker:Like the workflows work, but there's so many instructions and so many
Speaker:things that need to be followed.
Speaker:Yes.
Speaker:The models are just forgetting, conveniently forgetting certain steps along the way.
Speaker:I'll be like, why aren't you doing this?
Speaker:And then it'll be like, oh, it was because I interpreted,
Speaker:you know, something you said earlier differently.
Speaker:For example,
Speaker:sometimes I'll say continue autonomously.
Speaker:Yes.
Speaker:And it's very simple, straightforward.
Speaker:Very straightforward.
Speaker:It understands that.
Speaker:Right.
Speaker:But sometimes it will still,
Speaker:you know, a blocker will come up even though I have the protocol for that.
Speaker:It will show the decision menu and it would pause the whole workflow.
Speaker:I'm not looking at it 24-7.
Speaker:I come back, it's been sitting there for five hours waiting for me to say something.
Speaker:Yeah.
Speaker:And you're like, well, you know, you should have just continued autonomously.
Speaker:You had the information.
Speaker:Yeah.
Speaker:And then I would ask it, so why did you stop?
Speaker:Yep.
Speaker:And it was funny.
Speaker:It literally said I didn't have a good reason to stop.
Speaker:Oh, my gosh.
Speaker:OK.
Speaker:It needed your direction, Andrew.
Speaker:Yeah, yeah, yeah.
Speaker:And yeah, it was it's it can lose, I guess.
Speaker:It was a simple thing to remember, but it still got drowned out
Speaker:over time.
Speaker:Yeah, I mean, and I definitely.
Speaker:I think that is a challenge when context windows get long.
Speaker:Yeah.
Speaker:Models have been known to bias towards remembering the last thing you told them or
Speaker:the first thing you told them.
Speaker:And everything in between kind of just gets muddled.
Speaker:Modern techniques and, you know, the latest versions of Claude are
Speaker:better at this in certain
Speaker:circumstances, but they are still susceptible to it.
Speaker:So there's still a possibility that will occur.
Speaker:Something in the middle just gets lost.
Speaker:When you said I imagine when you said continue autonomously, it's probably like
Speaker:continue what?
Speaker:And then it just was like, well, it was a bit better than that.
Speaker:Well, yeah, I understand.
Speaker:I totally agree.
Speaker:Yeah.
Speaker:So this does feel like there's some aspect of it where it got lost in the sauce
Speaker:somewhere at some point.
Speaker:And that is where something like a skill, which is like, OK, regardless of what the
Speaker:person is asking me, this is what I'm supposed to be able to do, is
Speaker:kind of like a grounding
Speaker:truth for it.
Speaker:It's like, this is the ground truth for me to follow, or the grounding instructions
Speaker:for me to follow.
Speaker:So it will always consider: OK, regardless of what this person
Speaker:said, how does that work into the grounding truth of these instructions
Speaker:that I'm supposed to follow? And then it'll ask for clarification ahead of time, and
Speaker:it won't just arbitrarily wait. So that is maybe the way to put it as well:
Speaker:if you have a skill for a researcher, the researcher will be like,
Speaker:oh, I can't start until they give me all this information.
Speaker:But once you give me that information, I can do this thing on my own.
Speaker:And similarly, when you hand it off from a researcher to, let's say,
Speaker:an actual planning agent, then that planning agent, if it has the right skill,
Speaker:can also be like, let me clarify what I need ahead of time and then I can basically
Speaker:move on.
Speaker:So some of the skill benefits are also giving a very solid
Speaker:understanding of what's required to start.
Speaker:And then because now you have a very clear understanding of what's required to
Speaker:start, the model can also ask questions to make sure that it has enough information.
Speaker:So it can ask clarifying questions and do all that stuff ahead of time.
Speaker:So, yeah, sorry.
Speaker:But yeah, I can keep going.
Speaker:I think the important part for me
Speaker:is that, just like you were asking as we started this little tangent here,
Speaker:I was losing confidence in it, you know, performing things that I have
Speaker:already established.
Speaker:I have done what the models just did.
Speaker:But yes, sorry, sorry.
Speaker:And I'm hoping, like you implied, that the skills
Speaker:will be a guardrail, that it will protect this workflow.
Speaker:It's like I feel like the workflow is at a point where it's near perfection.
Speaker:It's never going to be perfect, but it's near perfection enough that I really want
Speaker:it to follow it.
Speaker:And it needs to be able to follow it over multiple hours.
Speaker:Yes.
Speaker:Yes.
Speaker:And I think that's a really important part is over time.
Speaker:Yeah.
Speaker:Yeah.
Speaker:Because there's times where I see it, you know, midstream, there's regression.
Speaker:It's like, oh, it's no longer... I'll try to give some concrete examples.
Speaker:I would ask it to... it's part of a workflow, which is in a runbook.
Speaker:So it has a file to check what it needs to do every time.
Speaker:This is obviously not a skill.
Speaker:I would say please print out the findings per model.
Speaker:The severity.
Speaker:Right.
Speaker:A little brief description of what it was.
Speaker:Yes.
Speaker:Right.
Speaker:And so that way, when I come over to check the logs later.
Speaker:Yes.
Speaker:Yeah, I can see.
Speaker:OK, DeepSeek found this.
Speaker:OK, Codex found that.
Speaker:All right.
Speaker:And I also have some requirements for it to keep track of DeepSeek usage since I'm
Speaker:actually paying API spend on that.
Speaker:Yes.
Speaker:There's a lot of notes that I can go over and review later with Claude to optimize
Speaker:things more, determine whether these models are worth keeping around.
Speaker:Yeah, right.
Speaker:And that's great.
Speaker:I was actually curious.
Speaker:So do you use these output contracts as the union mechanism as well?
Speaker:Like, you mentioned that there were unions between the findings?
Speaker:Yeah.
Speaker:You rely on the contracts to essentially.
Speaker:Yes.
Speaker:Yes.
Speaker:Because they're supposed to wait.
Speaker:The orchestrator is supposed to wait for every single one to finish first.
Speaker:They're all running blind.
Speaker:Nice.
Speaker:All of them have optimized prompts, per angle, for that
Speaker:model.
Speaker:Yeah.
Speaker:Yeah.
Speaker:For for that specific module, what I'm looking for.
Speaker:Oh, nice.
Speaker:Yes.
Speaker:Yeah.
Speaker:And it's preloaded again with all the trace patterns that it already knows it needs.
Speaker:Nice.
Speaker:And so as the reviews go along, it's only deltas.
Speaker:And one question, just to clarify as well: I said contracts.
Speaker:I didn't clarify.
Speaker:It's a structure, right?
Speaker:It's basically like a schema.
Speaker:It's like, yes.
Speaker:Yeah.
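A minimal sketch of what such an output contract and union step could look like, assuming each reviewer returns a list of findings in one shared shape. The `Finding` fields, the model names used as keys, and the exact-match merge key are assumptions for illustration; the real workflow's schema is not shown in the episode, and real findings would likely need fuzzier matching than an exact string comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    model: str        # which reviewer reported it
    severity: str     # e.g. "blocker", "warning", "nit"
    location: str     # file or symbol the finding points at
    description: str  # brief description of the issue


def union_findings(per_model: dict[str, list[Finding]]) -> dict[tuple[str, str], set[str]]:
    """Merge blind reviews once every reviewer has finished.

    Key: (location, description). Value: the set of models that reported it.
    """
    merged: dict[tuple[str, str], set[str]] = {}
    for model, findings in per_model.items():
        for finding in findings:
            key = (finding.location, finding.description)
            merged.setdefault(key, set()).add(model)
    return merged


# Hypothetical results from three reviewers that ran without seeing each other.
results = {
    "claude": [Finding("claude", "warning", "router.py", "stale comment")],
    "codex": [
        Finding("codex", "warning", "router.py", "stale comment"),
        Finding("codex", "blocker", "shard.py", "missing lock"),
    ],
    "deepseek": [Finding("deepseek", "nit", "shard.py", "unused import")],
}

merged = union_findings(results)
for (location, description), models in sorted(merged.items()):
    print(location, description, sorted(models))
```

Because every reviewer writes the same fields, the orchestrator can merge results mechanically and also see which models tend to agree, which helps when deciding whether a paid reviewer is worth keeping.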
Speaker:OK, so sorry, just a quick clarification, but continue with the workflow
Speaker:here.
Speaker:Yeah, yeah, yeah.
Speaker:So another part of refining the reviews that I remembered was, instead
Speaker:of reviewing the whole module every single time,
Speaker:it would start to nail down on its own where the problems are, where the seams are
Speaker:really obvious, is maybe a good way to put it.
Speaker:And at a certain point, the majority of the reviews are only deltas.
Speaker:What did we change?
Speaker:What what needs more attention?
Speaker:Did the fix implement correctly?
Speaker:Did the test work right?
Speaker:Yes.
Speaker:And at a certain point it converges.
Speaker:There's no more... like, I stage it, like you had priority one, two, or three.
Speaker:Yes.
Speaker:I have blockers, warnings,
Speaker:defers, suggestions.
Speaker:Yeah.
Speaker:And nits.
Speaker:Well, yeah, those those are that is actually also very common software engineering
Speaker:methodology terminology.
Speaker:Yes, I mean, Claude gave it to me.
Speaker:Yeah, I mean, I asked it like, well, how would we set this up?
Speaker:How would we define the different levels?
Speaker:And that is what it chose for me.
Speaker:And it made sense.
Speaker:And I was like, oh, I'll go with it.
Speaker:It makes sense.
Speaker:It's pretty, pretty.
Speaker:It's a good one.
Speaker:Yeah.
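The five levels named above lend themselves to a simple convergence rule. This is a sketch under two assumptions that the episode does not confirm: that the levels are ordered as shown, and that a round counts as converged when nothing at or above a chosen threshold remains. `Severity` and `has_converged` are illustrative names.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Ordering is an assumption; the episode only lists the five names.
    NIT = 0
    SUGGESTION = 1
    DEFER = 2
    WARNING = 3
    BLOCKER = 4


def has_converged(round_findings: list[Severity],
                  threshold: Severity = Severity.WARNING) -> bool:
    """True when no finding in this round is at or above the threshold."""
    return all(severity < threshold for severity in round_findings)


# A late round with only low-severity findings counts as converged;
# a round that still contains a blocker does not.
late_round = [Severity.NIT, Severity.SUGGESTION, Severity.NIT]
early_round = [Severity.NIT, Severity.BLOCKER]

print(has_converged(late_round))
print(has_converged(early_round))
```

An explicit rule like this gives the review loop a defined stopping point, instead of relying on someone noticing that the rounds will keep finding things forever.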
Speaker:OK, I think the one thing I will definitely say here is that it sounds like
Speaker:we're kind of, in a sense, converging onto a particular process,
Speaker:because a lot of what I'm hearing is actually very common with
Speaker:the way that, you know, we operate as a software company and
Speaker:also the way that my previous companies operated, treating software
Speaker:development more as a process.
Speaker:And I'm even more curious now whether you feel like you're transitioning
Speaker:from more of a vibe coding to more of a process-driven approach, or do you feel
Speaker:like, oh, no, I don't actually want to get too much into process. Because that's
Speaker:another big thing I've seen. I don't know how prevalent the term still
Speaker:is.
Speaker:I still think it's very prevalent.
Speaker:Vibe coding is I hear it all the time.
Speaker:So I want to get your take: do you feel like going away from vibe coding and more toward
Speaker:engineering is a good thing, or do you feel like, I don't necessarily want to go into
Speaker:engineering? Because traditionally, if you talk to people in the
Speaker:industry, they're like, software engineering is so slow.
Speaker:It's a very common thing to hear, because, yes, there are processes and
Speaker:rituals that make it slower.
Speaker:So do you feel like you want to avoid becoming software engineering or what's your
Speaker:take on that?
Speaker:I'm curious.
Speaker:Oh, I think.
Speaker:I think this kind of puts everything we've talked about a little bit more...
Speaker:No, I think it all comes together here.
Speaker:I was complaining earlier about New World and the failing at launch.
Speaker:Right.
Speaker:Not being able to scale, when clearly it was Amazon who made the game, and they own
Speaker:AWS.
Speaker:They have all the skills for this.
Speaker:What happened?
Speaker:It didn't make any sense to me, but uh,
Speaker:they like take everything together.
Speaker:One of the biggest focuses on this is I would like to do it right.
Speaker:Yeah.
Speaker:And and to do it right will require some standardization of processes.
Speaker:So I guess to answer your question directly, I do think that this is becoming more
Speaker:process driven than purely vibe coding.
Speaker:Now, I guess maybe day one it was vibe coding because I didn't really have a
Speaker:structure to anything yet.
Speaker:Right.
Speaker:It was a blank slate for me.
Speaker:This is the first time... like, you know, I've written code manually before
Speaker:in classes and stuff like that, but not two hundred thousand lines of anything.
Speaker:Oh, yeah.
Speaker:Oh, yeah.
Speaker:And one thing I will throw out there is, I think, Andrew, we were also talking a
Speaker:little bit about your profession, and as a sysadmin there are still a lot of
Speaker:systems that you need to put together.
Speaker:Right.
Speaker:Right.
Speaker:There's a lot of connections.
Speaker:There's a level of architecture that you also need to consider: what do
Speaker:these things actually do, and do you understand them in enough depth that you can
Speaker:put them together?
Speaker:And I feel like, you know, we talked about the perspective that you're bringing
Speaker:here. We also talked about sysadmins in particular maybe being a really good
Speaker:audience for this kind of tooling because, you know, software engineering
Speaker:is a very wide term, and a software engineer does a lot of things.
Speaker:You can have like specialized roles.
Speaker:You can also have generalists.
Speaker:I feel like as a sysadmin, you have to be a generalist to a large extent because
Speaker:you're working directly with people and your scope is always huge.
Speaker:So a lot of what we can do.
Speaker:Right.
Speaker:We do try to automate as much as possible.
Speaker:Right.
Speaker:That's how we have many more hands.
Speaker:Solve it once.
Speaker:Yeah.
Speaker:Get it fixed one time.
Speaker:Yep.
Speaker:We do thankfully have other teams that can take different things like help desk and
Speaker:stuff like that.
Speaker:So we're not necessarily doing like all of the front line stuff all the time.
Speaker:But when it comes to, like, campus infrastructure, networking issues, look,
Speaker:the website's down, or, you know, something like that, that would
Speaker:definitely fall into take for granted.
Speaker:Yeah.
Speaker:And into our wheelhouse.
Speaker:I guess I think I got away from your question, though.
Speaker:Could you repeat it?
Speaker:The main aspect is the risk perspective. You're building
Speaker:something in your personal time, and you also realize there is only so much time and so
Speaker:much token budget that you have.
Speaker:And you also want to deliver something for your
Speaker:personal project.
Speaker:So do you feel like, at this point, the risk of token
Speaker:expenditure is maybe something that drives your decision toward a more
Speaker:process-oriented approach?
Speaker:Absolutely.
Speaker:Or do you think it's maybe also the experience you've had
Speaker:orchestrating these systems in your professional life?
Speaker:Or maybe it's a combination of both.
Speaker:That's interesting.
Speaker:I think specifically for Minecraft, it's more informed by my previous experiences
Speaker:running LuxWander.
Speaker:Ah, yeah.
Speaker:OK.
Speaker:Actually.
Speaker:And actually, I just realized that's a keyword
Speaker:I just dropped right in there.
Speaker:We're going to have to bleep that out later.
Speaker:No, no, no, it's OK, it's OK.
Speaker:I think it's more inspired by that.
Speaker:There are a lot of ways I could maybe tweak it to parallel some things
Speaker:at work, but I think what drives a lot
Speaker:of the development,
Speaker:and a lot of the decisions, comes directly from the experience
Speaker:I had running LuxWander before. And I
Speaker:guess a little fun story: when LuxWander first released in 2010,
Speaker:it was, you know, Minecraft pre-alpha.
Speaker:That was, I don't like to say it, 16 years ago.
Speaker:Yeah, it was 16 years ago.
Speaker:Oh, yeah.
Speaker:And don't remind me.
Speaker:And
Speaker:you know, I released some advertisements online, like on Minecraftforum.net,
Speaker:right?
Speaker:I had a thread, you know, and there was rudimentary,
Speaker:you know, multiplayer Minecraft.
Speaker:It crashed all the time.
Speaker:You know, parts of the map corrupted all the time.
Speaker:Updates were coming every day, you know.
Speaker:So it almost feels like that original experience predated all of your professional
Speaker:experience.
Speaker:So yes, yes.
Speaker:But your love for actually building this community and this version of
Speaker:Minecraft that everybody could enjoy in the way that you wanted was
Speaker:more of a driver, and even today continues to be more of a driver, for
Speaker:building something really awesome. That's not to say that your professional experience
Speaker:doesn't help a little bit here and there. Maybe it's a combination of
Speaker:wanting to build something
Speaker:and the learnings that you've had over time,
Speaker:professionally and personally, in your previous experience.
Speaker:My reasoning for this is that the move from a project-driven
Speaker:approach of "let's just vibe code it, hope it works, and hope we
Speaker:can deploy it" to "oh, I have actually built something that people have used,
Speaker:and I want to build something again that people will use and will really love"
Speaker:is a very strong driver for saying, I'm not just playing around.
Speaker:I'm building something from a place of wanting it to be successful.
Speaker:And that is maybe also part of the risk: I don't want
Speaker:to build something bad either, and that is a personal feeling about it as
Speaker:well.
Speaker:But it's driving you to now make decisions that are resulting in like more definable
Speaker:processes and actually improving the quality as well as reducing the cost so you can
Speaker:actually finish and get it out the door.
Speaker:And I mean, I said a lot of things.
Speaker:Let me maybe finish it up and ask a question: do you feel like
Speaker:these tools have really enabled you, or do you feel like
Speaker:they are just giving you more of a mirage? I mean the AI tools,
Speaker:Claude in particular so far.
Speaker:Was it so far?
Speaker:So that is a really interesting question, because I cannot say definitively
Speaker:yet, not until I start testing it.
Speaker:So.
Speaker:So I think ask me again in a few weeks.
Speaker:Yeah, because right now it's, like, undefined.
Speaker:I don't have any proof yet besides the development.
Speaker:And so I would like to answer that confidently.
Speaker:But I guess I can answer the idea before that.
Speaker:I do think it has really empowered me to actually create something that I've always
Speaker:wanted to do.
Speaker:Right.
Speaker:There are a lot of things, a big wish list of stuff that I had
Speaker:the last time I ran LuxWander,
Speaker:but I just didn't really have the manpower.
Speaker:I didn't have like the skills, you know, that's funny.
Speaker:Yeah.
Speaker:You still don't know.
Speaker:You will.
Speaker:You will.
Speaker:You will.
Speaker:Skills creator.
Speaker:I will soon.
Speaker:And
Speaker:it has closed the gap, though, that I think like you said, it's democratizing,
Speaker:you know, I guess, intelligence.
Speaker:Yeah.
Speaker:Yeah.
Speaker:And then, you know, it might all just be an act.
Speaker:And I think that is something that we will have to see, if it is an
Speaker:act of intelligence.
Speaker:That's all right.
Speaker:And at that point, I think we got to end the show.
Speaker:Thank you.
Speaker:Thank you.
Speaker:We'll have to come back and see if it is.
Speaker:Yes.
Speaker:Yes.
Speaker:Cut the cut.