When a flaky test can stall a merge queue, “just rerun CI” stops scaling fast.
Cory talks with Trunk co-founder and CEO Eli Schleifer about the outer loop problems that show up as teams ship more code - especially with AI-assisted development increasing PR volume. They break down what a merge queue is, why logical merge conflicts happen even when individual PRs are green, and how predictive testing helps protect main without forcing constant retesting.
Eli also explains how Trunk approaches flaky tests: collecting JUnit results, using quarantines so known flakes don’t block delivery, and fingerprinting failures to tell the difference between “this always times out” and “this was just broken by a recent change.” The conversation closes on how review and quality practices may shift as code generation accelerates - and what still needs strong guardrails like tests, security checks, and reliable CI signals.
Guest: Eli Schleifer, co-founder and CEO of Trunk
Eli Schleifer leads Trunk's technical vision and product strategy, focused on closing the gap between AI-speed code generation and human-speed delivery by removing the bottlenecks that slow modern engineering teams. Trunk's platform eliminates flaky tests, resolves merge queue constraints, and redesigns CI systems to enable high-throughput continuous delivery.
Prior to founding Trunk, Eli was CTO at Directr, which was acquired by Google, and has served in engineering leadership roles at YouTube, Uber, and Microsoft.
Links to interesting things from this episode:
Welcome back to the Platform Engineering Podcast. I'm your host, Cory O'Daniel. My guest today has spent his career inside of some of the most demanding engineering organizations on the planet - Microsoft, YouTube, and Uber. He's currently the co-founder and CEO of Trunk. Eli Schleifer, welcome to the show.
Eli:Thanks for having me, Cory.
Cory:So I'm very excited for you to be here. It is a collision of my worlds.
So I am very much a TDD engineer. I love tests, I love green pipelines, and I feel like honestly you guys are at a very interesting place in how our engineering world is changing, so very excited to talk to you today about CI, keeping builds green... but would love to just learn a little bit about your background, like how you ended up in the CI space and... yeah, we'll go from there.
Eli:I spent obviously a bunch of my career in big tech in the early days, did eleven years at Microsoft and then had a startup that ended up selling to Google. Inside Google, I was just blown away by how easy it was to build software. The google3 stack, the DevX experience in there, was just bar none.
And then I went to Uber, worked on self driving cars at Uber ATG. We had, you know, some six hundred, seven hundred of, probably at that time, the most expensive engineers you could hire - these self driving engineers. And we couldn't get code into the repo. It was just impossible. CI was completely unreliable.
We had a merge queue that would back up for days. If you queued something on Thursday, it would be like, "By Sunday that thing's going to clear and then I can get back to work."
And it's kind of funny because in my early days at Microsoft I worked on Windows Mobile, before even the iPhone came out, and we would have days when you'd wake up and the build would be broken and be like, "Sorry guys, no simulator to run today, nothing to do."
And I was like, this is crazy that we're still... many years later... still hitting these problems. We're just like, "We can't get work done. This is so inefficient."
And engineers want to build, they just want to land their code and go on to the next thing. And it's really hard to do that. So that's why we started Trunk. It was like, "Can we build the Google developer experience that is just...that just works?" Really, you just want the thing to work, you want it to give you signal when you need the signal and otherwise you want it to get out of your way. No one is like, "I wish CI was taking longer." It should just be instantaneous. That would be the perfect world.
So that's kind of like why we started Trunk. It's like, "Can we do that?"
We started with our code quality product, but we quickly launched our merge queue product, which is the most performant merge queue on planet Earth right now. We can handle more pull requests than GitHub itself can handle - we basically maxed out our testing at the limits of GitHub's Ruby monolith.
And then in the last year and a half, we've launched our flaky test product, which we built from scratch to identify flakiness in CI - really looking at the broader spectrum of, "What does it look like to manage tests and CI for organizations so that they can get to work?"
And as you kind of hinted at, it's a crazy time to be doing this, because the inner loop, where all that code generation is happening with Claude and Cursor and Copilot... for people who still use Copilot... is generating a lot of PRs and a lot of code. And the seams in the system - the outer loop, where we're doing CI, where we're trying to actually test and verify things are working - that's falling apart.
We are getting customers every single day just being like, "Our merge queue is backed up, it's totally hosed, our tests are totally flaky, we can't get things through. The agents keep generating code, which is awesome, but we can't actually get it into the system reliably."
And they come to us, and we're really looking to become that outer loop company. We want to keep engineering moving and basically make CI work.
Cory:Yeah.
So for folks who haven't worked with a merge queue before, could you tell us a little bit about what a merge queue is, and at what team size - or what change velocity - teams typically start looking towards using a merge queue?
Eli:Great question. And really it probably historically would have been something like twenty, thirty engineers would need a merge queue. Now, if you're like a really productive team, that's like ten people, plus each of those is running a bunch of agents, you might need it already.
The way the merge queue works, it's basically doing predictive testing against your pull request.
And why this matters is that you, Cory, are writing a pull request and you make some new code that's using a function called "foo" and I go in another pull request and I rename "foo." I rename "foo" to "bar". So the whole code base is now going to have "bar" and you're still calling "foo." You push your stuff up - green CI. I push up my stuff - green CI. If our stuff basically both lands, it's broken. Because you've just introduced code that's using this old function. We call that a logical merge conflict.
The silliest way to get around that is to make sure that every time Main changes, you just have to retest everything... all your active pull requests. That's insanely expensive. Like, you never get any work done.
So the industry basically started building merge queues. And what a merge queue does is predictive testing. As you might imagine, it forms a pipeline: my pull request is in that pipeline and your pull request gets queued behind it. So when your pull request goes into the merge queue, instead of testing against just the head of Main, you're testing against my changes, which rename and get rid of the "foo" function. And now your code, which is using that function, will fail. It'll fail to compile, it'll fail to test, and you'll get an error. You'll get rejected from the queue, you'll be like, "Oh, I didn't know Eli was doing that. I'll fix it up," and no problem, you resubmit to the queue and everything works.
So instead of having a broken build, which basically means no one can get anything done because Main is hosed, instead you move to a model where you're protecting Main and Main is basically always ready to go. And in a world where you have really good CI, you could basically push directly from Main.
Many organizations do it through the CD flow, and that's really giving you, like, "I can change code really quickly and get it into the hands of customers." Everyone wants to change things faster than ever now with AI, so having a reliable merge queue in place is important.
Now, as you might imagine, flaky tests will basically bring a merge queue to a grinding halt.
Cory:Oh yeah.
Eli:If you're sitting in a queue... and just like all queues, like you learned in early computer architecture around pipelining... if the head of that pipeline gets hosed, well, you're going to do a whole pipeline flush and you're going to have to start over again.
We have anti-flake protection built into our system that actually can take advantage of that stacking behavior of a merge queue. Again, I submit that change. I'm renaming a function, you're behind me and now you're also testing my renamed code.
Now if my PR fails because of flakiness, well, we have another bite at that apple. Right behind me is your PR, which has both my changes and your changes. So if that combined change passes, we know that I'm good to go also - that my failure was probably just a flake.
So that's one of the features of the anti-flake protection we have inside our merge queue that basically will keep things moving really quickly. It's a very cool feature that you can kind of tune. And then we also have batching and we have dynamic parallelism.
So companies that basically understand... like, one of the things that engineers hate about merge queues is they'll make a change to a doc or the frontend code, and they're like, "Why am I sitting behind all this backend code? I'm not touching that at all. I'm just trying to change a color. This is nonsense."
So we have a dynamically parallelizable merge queue that basically can use your build system like Bazel or Buck or Nx, or your own globbing pattern of like, "Here's my frontend code, here's my backend code." You give that to us and we can actually dynamically build parallel lanes for your merge queue in real time, so that we get a fan-in, fan-out action as you're changing different parts of the code base.
The net effect of all that super nerdy stuff is that the merge queue just works, and you don't have these backed-up merge queues that last for days. Instead everything's just flowing through the system, and we push massive numbers of pull requests through. And you also get this really cool graph that shows all the PRs and how they're interconnected. That's where we kind of geek out when we see this.
We had one customer, Fair, who's been one of our longtime merge queue customers. At some point they literally had twenty parallel lanes running inside their merge queue, with none of the code interacting. It's as if every one of those PRs had its own merge queue, just running behind the scenes for them. It was a super cool looking graph.
Cory:Yeah, I'm thinking about that. In our own code base we do a lot of domain driven design.
So it seems like if you have very well-defined domains within your monolith, you could probably have even a lane per domain. So even teams aren't waiting behind each other if you're not working on the same product line effectively.
Eli:Absolutely. I think that as we go into deeper and deeper agentic engineering, the agents want to have full context.
A monorepo is the best way for the agent to understand the totality of the system. And if you have multiple repos now you have to tell the agent, "Hey, all these repos interact with each other and they're going to deploy at different times." That's kind of craziness. It's hard for humans to do it. It's why Google moved to a monorepo a long time ago.
And that's why the largest organizations gravitate towards that structure. I think agents are going to push more and more organizations into a monorepo world. And when you're in a monorepo world, then you really want a merge queue with dynamic parallelism, otherwise you get into that stuck single-lane problem.
Cory:Yeah, and I feel like that's... you know, we're a fairly small team. We're three engineers at my day job. We've leaned in pretty hard on AI development over the past six months or so. I was very skeptical last year but, you know... we started getting into it, and originally I was like, "Ah, the output is hit or miss." And then we kind of all took some time over Christmas, sat down, played with the tools, figured out what worked, what didn't, how to tie it into our code base. And we've gotten to a point where we produce pretty good code.
But now we're already at this point... you know, you're saying twenty, thirty engineers, you might need some of these things... but we're already seeing it. The hardest part of our job now is, we go in and there's fifteen, sixteen PRs. And you know, we're one of those teams, we try to keep our PRs small still. We still like the idea of trunk-based development - small, short-lived branches and go.
So like we're not going in and be like, "Refactor the entire code base" and getting like a 20,000 line change. But even a three, four file change, a hundred lines, two hundred lines... when you have twenty or thirty of those stacked up and now you're thinking across... I'm thinking across a lot of changes, I have to think, "How does that affect that?"
It does seem like we are speeding up in our production of code, but we're also speeding up in a lot more engineering and cultural problems in our space that are just cropping up really, really quickly. And I feel like CI and the CI pipeline is going to be just a huge bottleneck for teams as they start to adopt this.
And I feel like it's one of the things that is probably going to trip a lot of teams up in adopting AI, because they don't have great CI, they've got flaky tests, they can't trust the output - and now they have AI tripping those flaky tests even faster.
Like, how are you seeing teams that are starting to adopt AI fit into this world?
Eli:Yeah, I mean, I think that those teams are now all reaching out to us and becoming our customers.
If your CI is not reliable, you can't keep up with velocity. You're so excited, "Oh, look at all these PRs, I want to land that. The product will move forward - what used to take a month now takes a week." But if I can't land those PRs, then I can't actually make progress. So you need to basically be looking at your entire testing infrastructure - How quickly are things running? How much flakiness is in the system? Do these things have to be babysat?
We always talked about... without a flaky test system in place, what's basically going to happen is you'll have a flake, everyone's going to keep track in their heads of what the flakes are, and then they're going to go and rerun it inside CI. And that's really ridiculous. It just slows things down.
Engineering time is always at a precious minimum. So having to wait for CI to run, even if that's twenty minutes, is a big pain. But having to do that agentically - you're talking about a massive amount of burn.
One of the things we actually have built and are seeing really cool opportunities with is the ability to use the information we collect historically around all your tests to actually identify flakes to the agent.
We have a quarantining system which is critical to keep CI green. So you push up your test, the tests that are flaky, you could basically quarantine. We'll basically ignore those failures.
So you're only being blocked by CI when the red is truly red and trying to give you signal, because that's what you want. At the end of the day, when you're running a test, you want to know, "Did I break something?" Because you don't want to break something - you want to fix it.
But when a test fails because of a timeout - and it's a test that times out 5% of the time - you're like, "What am I going to do with that? I'm just going to run it again and hope that this time it's fast enough." And that's very silly. So, with our quarantine system, you can basically ignore all of those well-known flaky situations.
Now, on top of that, when your agents are doing this, they really need to know, "Did this ever fail in this way? Did I break something or is this a known problem?"
So with some of our skills and our MCP integration, you can pull all of your historical test information into Cursor or into Claude. When the agent is looking at CI results and saying, "Oh, this thing failed, do I need to fix something?" - we don't want it to go and try to fix something flaky that's been in the system for six years, because it's just going to be chasing its tail, and also create a PR that, who knows, is completely off track from what it's trying to do. What you want is for it to be like, "Oh, this thing has always been flaky. I'm going to ignore that, not try to fix it, and move forward."
Cory:It's funny that you say that.
We program in Elixir and the Elixir test suite, just the one that comes with it, is amazing and has this really great ability to run tests asynchronously. So I think our entire code base is like nineteen hundred tests or something like that. And it runs in seventeen seconds. It is fast.
But we do have this one flaky test that crops up all the time. And it is around like our... so we're on like the CD side of the world... so like it's when the deployments are happening and, you know, it's a timer related thing. And so it just fails every once in a while.
And I had Claude working on something overnight, left, came back in the morning and there was just hundreds of changes. And I was like, "Whoa, what in the holy hell is this?" And I'm like going through it, I'm like, "Wait a second, it's trying to fix this flaky test."
And what it did was it went to every single test file that we had and switched the test framework to run synchronously instead of asynchronously. And it's like... it 100% fixed the test, but our test suite that ran in seventeen seconds now took like fifteen minutes.
And I'm like, "Well, I'd rather just see that flaky test every once in a while."
Eli:Cory, I have a solution for you. You've got data? We'll get you quarantined, no problem. You guys only have three engineers. You're within our free tier.
Cory:There you go. So for folks that are doing CI, maybe they're using GitHub Actions today... How does Trunk fit into your world if you're using GitHub Actions or you're using GitLab? How do you tie in Trunk into CI?
Eli:We're totally agnostic to what your CI system is. So basically you're still running your tests... I don't know what the Elixir command is, but you're running your Vitest, you're running your Bazel test, you're running your Jest, whatever it might be, your Playwright test for sure.
After that, those tools can all be configured to basically output JUnit XML, which is the lingua franca of test results. Which is funny because it's like a pseudo-spec developed by IBM that is the loosest XML spec you can imagine. It's a travesty that the industry uses this, but this is what it is.
The next step, you're going to call our analytics CLI. That CLI is a Rust-based CLI that will take that JUnit XML and post that information up to our service. It will then also check all the failing tests that were reported in that JUnit output.
If all those tests are flaky and quarantined, we'll actually go and change that exit code from a failure to a success code... if you're using quarantining. Otherwise we'll just start tracking the information. We'll understand, "Here's what happened in this test," and we'll print out, inline in CI and also up in our SaaS dashboard, "Here's what happened in this particular PR for your testing."
Cory:Yeah, so how do you guys handle... let's say it's a flaky test, but let's say we actually just... somebody broke it, like three changes up from me. Like, it's just broken now. How does the quarantining detect the difference between something that's flaking and something that's actually busted now?
Eli:Great question.
So when we're doing our analysis, what we actually do is we'll look at the failure reasons that are inside the JUnit XML and then we'll actually do AI embeddings to understand the distance between that failure type and any previously seen failures. So we call this a failure fingerprint. And you can basically identify when things are being broken in a new way, which is also interesting to the agentic solution case as well.
So someone goes in - this is a giant problem, it's like a broken-windows situation. "Oh, this test is flaky. Ignore it." No, you actually just broke this test. Because we have this fingerprinting design, what you can actually say is, "This thing was failing with this timeout for the last eight years, but now there's a new failure that was introduced at this exact commit." I can tell you exactly when it happened and you can go and fix it.
So that is like a critical piece to this system that we've built where we're doing analysis of what you've actually uploaded to us, doing this fingerprinting analysis and showing you the discrete ways in which a test fails.
Because if you're an engineer that's tasked or an agent that's tasked with going to fix a flaky test, you want to say, "What are the different ways it's flaky?" Because it's not all flaky the same way.
There's a similar parallel problem where, like, an organization is using some Docker pull to do some tests, and like 300 tests rely on this Docker pull. And the Docker pull times out or hits a rate limit, and all of a sudden all those tests are going to be marked as failing. Then you'll retry it and it'll pass. You'll be like, "Oh, these tests are all flaky." That's obviously not the case. It was an infrastructure problem.
So we also have an infrastructure circuit breaker in the system as well. We'll detect that this is a massive number of tests that just failed - these results are actually not indicative of flakiness of the tests, but of flakiness of the system. We can alert you to that situation as well and protect the tests from being marked as flaky in our system.
Cory:Oh, that's very cool.
It's funny, like, our flakiest test is also probably like the one that if it actually shipped broken, would piss people off the most because it's the actual deployment mechanism. So it's like whenever I see it fail, I'm like, "Please for the love of God, tell me that's the flakiness." It's like, I know the signature. Like you're saying the fingerprint. The second I see it, I'm just like, "Rerun, whatever. I know what it is. That's not a problem."
Eli:Yeah, we're going to fix that for you, Cory. We're going to get you integrated. We can do the whole thing in an hour, and never see that thing again unless it's actually broken in a new way.
Cory:I would love that. Love a green build.
So for folks that... I know some of the audience is a bit AI-skeptical, and I feel like... there's two groups of people that are talking at each other and not talking with each other right now.
There's the far booster side of the world that's like, "Definitely figured it out." I feel like I'm starting to move towards that booster world, but I'm also concerned about where AI is going - like, as an industry it kind of freaks me out. And then there's the people that are like, "This stuff just... this sucks and everybody's hallucinating over here."
Like, what are you starting to see team wise? Like, are you seeing teams succeed with their AI initiatives?
I feel like the teams that do the best around CI and code quality tend to see the most success with generative output. But what are you guys seeing as somebody who works right around code quality?
Eli:At the heart of the matter, the future is AI driven development. AI is just much better and faster at writing code than humans are, at the end of the day. What we're seeing is massive success, especially of late.
I think if you were using early versions of Copilot, you'd be like, "Okay, you can sort of do a for loop, good job, but that's sort of just as good as autocomplete and not really getting me there." Now what I would say to anyone who's an AI skeptic is, "Go open up Cursor or Claude on the latest model, give it a prompt and see where it goes for you." Right? And I would say the places where it's best, or the places where it has the most context, are like... Doing UI changes used to be such a pain. You could have frontend engineers who'd be like, "I'm really good at CSS and making things pop up and bubbles and all that stuff." That is just so easy now, and it's my favorite thing too, because I never wanted to learn any of that. The last time I coded frontend was like HTML 2.0, whatever it was. Now I'm like, "Okay, make a pop-up, make the combo box, make it do this thing." And then it's like, "Wow, it worked great, let's go."
I think that that is so exciting because you can deliver really cool, beautiful, polished UI that in the past would have taken you forever and you would have just agonized over it.
Host read ad:Ops teams, you're probably used to doing all the heavy lifting when it comes to infrastructure as code - wrangling root modules, CI/CD scripts and Terraform, just to keep things moving along. What if your developers could just diagram what they want and you still got all the control and visibility you need?
That's exactly what Massdriver does. Ops teams upload your trusted infrastructure as code modules to our registry. Your developers, they don't have to touch Terraform, build root modules, or even copy a single line of CI/CD scripts. They just diagram their cloud infrastructure. Massdriver pulls the modules and deploys exactly what's on their canvas. The result?
It's still managed as code, but with complete audit trails, rollbacks, preview environments and cost controls. You'll see exactly who's using what, where and what resources they're producing, all without the chaos. Stop doing twice the work.
Start making Infrastructure as Code simpler with Massdriver. Learn more at Massdriver.cloud.
Cory:It's really funny.
We had this idea recently, or I had this idea recently, for something in our product that's been challenging for customers. And I sat down and I drew it up like I always do. I do very Balsamiq-esque... I don't know if you're familiar with Balsamiq... I just do very Balsamiq-esque designs, usually because I don't want people to think about colors. And I was like, "This is my idea." And I showed it to one of my teammates and he's like, "I don't get it, I don't get it." Like, he didn't get it, and I'm like, "Damn, I really think this would help." And I'm sitting here looking at this drawing and he's just like, "Yeah, I don't know. It seems a little convoluted."
And then I sat down with Claude and I was like... I haven't done HTML in a long time either. I don't know... I know jQuery. I remember when jQuery came out. Like, that's where I'm at, right? But I am definitely not a TypeScript ninja. I'm not a React 10x rock star, any of that stuff. And so I'm like, "I really want to get this design... like, what it would look like in our site, and then get his opinion." So then I literally just took... I dumped the HTML from the page we're on, popped it in Cursor and was like, "Hey, take this Balsamiq thing I just drew, put it in here." And then I iterated on it to get it how it is in my mind. And then I sent him a screenshot of it and he's like, "Oh yeah, that would absolutely work."
And it was just like... the ability to get something out that's an MVP. I know the code might be dog shit underneath it, especially given it was just working from an HTML dump, not our actual full code base. But that was an idea that will greatly benefit customers that I could not have communicated without it.
And I was at a dead end. I drew it like four times. It's like every time I showed it to him it was like, "I don't get it." And I was like, "Arghhhhah." And now it's something that's going into production and like we've shown it to people and they're like, "Oh, this makes way more sense than what you guys have today." And we're like, "That's fantastic."
I feel like that's maybe like a place where people can start, because it is intimidating to go into your giant code base where you have fifteen flaky tests and so many red herrings that AI can chase. But like to sit down and it's like, "Okay, let's think about this problem that we haven't been able to solve in the business yet and let's see if we can start something new there." I think that's a really good way to kind of get in.
But again, I think from that point, code quality is the thing that is going to keep your output good. And I think that's one of those things teams have just struggled to find the time and ability to do in the face of generating revenue. Our debt is always second to the company making money.
I feel like that's one of the parts that's going to be very, very hard - engineers getting to this place of really good output - until companies put in the same, or maybe a little more, effort than they put into their DevOps cultures and initiatives.
Eli:Yeah, I think, like that whole story you just told, like, has been experienced by everyone who's basically an AI convert at this point. Like, I've definitely gone through the same thing myself.
Like when we needed a Chrome extension to be built, I just like started with like, "You know, all right, let's go build a Chrome extension. Brand new code." Like we did the whole thing in two days and it was so polished. That would have taken us a month. We would never have scheduled the time because it would be not worth it for the amount of effort it would take to maintain. And we did the entire thing in Cursor with V0 as well.
I call this thing "Code First Engineering." I wrote a blog post about it and it's really like the cost of code used to be super high. Generating code was expensive - crazy expensive. So we were like, "Let's not start writing code until we know what we're doing."
And we'd be like, "Let's sit down with design, let's sit down with testing, and let's sit down with a PM and figure out if this thing is going to be viable, and we'll do screenshots and design and figma and all that stuff." And now that's all gone.
Now you could be like, "Hey, I have a drawing from Balsamiq, or I just have like a screenshot that like V0 generated, dump it into Cursor, wire this into the product." Code First.
We tell our team now, tell all of our engineers, "Every one of you is now a product engineer. You're not a frontend engineer, you're not a full stack engineer, you're not a backend engineer, you're a product engineer. You build product."
That was always our job as engineers - we're supposed to build things. No one cares what the code looks like under the hood. They want to make sure it works, it's reliable, and also that it's doing the thing that they want it to do.
And I think that that is really like what product engineering is all about. Now everyone's a product engineer because now you have a PM and you have a designer in your back pocket. You can go to all these tools, they'll do really great work for you. And when we say Code First we say like, "Code first, let's make sure it works and make sure it makes sense."
You showed that to your peer, he's like, "Oh, I get this." Great, now we can go and polish it and get feedback and clean up the code and do all those things to make sure that it's like shippable and not just like, whatever it was... like a single prompt shot. But you know that it'll work. Like, you've done all that testing and we do it with code first, with real data.
Because, like, when we first built the flaky test product, AI wasn't there yet. And we spent months looking at UI and thinking about it, but once you actually had the data in place... it doesn't work at all, like these mocks don't actually match the way the data really feels. And now I would never go and do those mocks. I'd just be like, "Let's get the data, see what it looks like." We can get graphs and charts and tables showing up instantly with AI. Once you have the data, create the visualizations and see what actually works.
Cory:You know, like where you guys sit today... so, like, let's say teams... because the other place I'm kind of seeing people get choked up, especially in our org, is like the actual... and I feel like some people in this space are saying that they're just not doing this anymore.... which, if you aren't, congratulations, I guess... but like, we still have to review the code, right?
The build is green. That's fantastic. The build is green. Like, I ran my Sobelow security scanner or whatever, right? It ran all that stuff, that's all green, but is it green to me and you?
It's like, we can catch some of that stuff in linters, we can catch some of it in style guides, but these are two quality pieces many teams still miss. They have their style guide in the README and they're like, "Hey, do it this way." But nothing to enforce it.
There's teams that have... let's say they've nailed all the stuff in CI. I've got my linters, I've got my style guides. It's all automated - my formatting, my quality, my security scanning, all that stuff. But then at the end it's like, "Okay, but there's still this code that someone's going to maintain." Whether it has a heartbeat or, you know, a CPU, who knows?
How are people dealing with the review process on the other side of the CI green building? And are you seeing people still get hung up there in just like the manual review of the code, or are you seeing people be a bit more, I guess, laissez faire about what they're merging?
Eli:I think that that's a really interesting point where, we'll have to as an industry figure out what do we do as humans to be reviewing this code.
And I think that there definitely are cases - like the browser extension this thing generated, where every single GraphQL call got its own file. I was like, "That's really weird. Why are you doing that?" So I was like, "Don't do that. Be normal. Do it like you would normally do it."
And I think that when we were making GraphQL changes to our frontend service, I sent it to our GraphQL expert and he was like, "Well, the best idiomatic way to do this is this." And then he did the code review and he was like, "We should structure it this way."
He left those comments, and all I did with those comments was say, "Hey, Cursor, look at all those comments, address them and then fix it up." And that was like... the part where I had to do that job, that was silly. He should just write those comments and then an agent should be like, "Great comments, I'll go fix them and land it." Eli doesn't care about the way that it's done, we just want to make sure that it's done right.
I'm not on the front line of engineering right now, so I don't have the context of like, "What should this look like?" I think that you need people to still have taste in how code is designed to make sure it works right. But certain things, I don't want code reviewed.
If it's a whole 10,000 lines of frontend code and you have integration tests to make sure everything is clickable and the right UI changes at the right time, ship it. Don't look at it because it's just going to be tossed out or changed the next day. Do you really want to look at what the CSS looks like here?
You need to care. We've always done this... it's like, let's care about the things that matter most. In the old days it would be like a nit: you spelled this thing wrong in this comment, and if I update that, it's going to run CI again. That's bonkers. You're going to spend thirty minutes rerunning tests... well, in your case, seventeen seconds rerunning tests... because I changed the comment to be spelled correctly. Like, I think you knew what I was talking about.
So I think that the more we get... just like we never... we don't look at assembly because we just know the compiler did the right thing... for the most part, there have been compiler bugs, but for the most part it just works. I think it'll be the same thing with the code we generate today.
We're going to move further and further away from reviewing the code and just reviewing what the product does, because at the end of it, that matters.
Now, AI is really off in certain cases, like building things that are scalable. You want to lay out this data in a reasonable way that's actually going to work when we actually have a billion users or a million users or... you know, we consume billions of test runs every single day. Is that a thing where I would just trust AI to get the schema correct? No way. It's not there yet, it doesn't understand the complexities.
You just have to know where you need to bring in your taste and your design thinking as a large-systems engineer to make this thing work. But the code itself - we're going to do a lot less of that reviewing in the future, because there just won't be time, and also it's not the best use of our time. The best use of our time is making sure the product does what the product is supposed to do.
Cory:Yeah, that's interesting. I can definitely start to feel that in certain parts of the code base.
It's funny, frontend engineering is always a place where I feel like there's just so much going on there, and I feel like people dog on frontend engineers sometimes. It's like, "Is it engineering or not?" It's like, "Yeah, it's fucking engineering. We're putting together things to tell machines to do things. It's engineering work." But at the same time, like... I don't want to use the word throwaway... I know that's the word people are using around like, "Yeah, that's our throwaway code versus our stable code." I don't necessarily love that term because like people are working on that and like to know that your output is like the throwaway thing... but you know, what's really interesting about like CSS and HTML is like, it's not your business logic. You know what I mean?
Like, it's not your domain. It's not the thing that makes you money, right? It's the logic that you've built, the abstraction that you've created, and that software as a service that you've built - that's the thing that makes you money. Like this is a presentation layer and teams change this all the time, right?
I talked about this a few episodes back, but didn't go like into nitty gritty details, but like when we built our landing page originally, we spent fucking like... first we spent $25 on like a React template off of something when we first launched, and it was ugly. And then we hired a marketing and branding team and we blew twenty, thirty grand on our homepage, and it was a catastrophic piece of shit. Like, it was just... it was gnarly, CSS just gnarled everywhere. And we regenerated... like, our current site was four bucks. Four bucks.
I sat down and had the best day of my life. I was like, "Our site looks like shit. I'm going to make it look great today." And we sat down, I burned $4, I generated like 20 versions of it. And we got to a point where I was just focusing on what the aesthetic was like. We got it to a point and then I went to my frontend team and I was like, "Hey, what would we have to have in place that would make you happy about a code base, if this was the new frontend? What would make you happy?"
He started putting all of his details in there, like what he would want. We fed it our existing sitemap and then his constraints, and we got just the tightest, most well-designed React components for everything - the CSS is extremely well scoped instead of just kind of all over the place.
It's like, we would have never done that, right? And at the same time it's throwaway code. Like it cost me $4 and me screwing around to get it. And like that's the reality, I think, of like where a lot of our non-critical business line code is going to end up.
It's like you're sitting there and you're looking at your email system that you built... built in Rails three, six years ago and the templates are all shitty... and you're thinking like, "This AI stuff's never going to work." It's like, dude, point it at that thing that you don't care about anymore and say, "What do I have to do to get the fresh take on this?"
Like, you have a system that works, use that as your constraints and guide rails, and just say, "I'm not going to try to change this. Let me just bang out a new one." And see how it looks.
But like, we've reached this point where like a lot of code to me is like, "Eh." Like it doesn't... that code, it's not that I'm looking at it and I don't understand it, I'm looking at it and it doesn't matter. Like at the end of the day it does what it says on the box and the people that are using it are happy.
Like, who cares if this React component is great or not, so long as it renders fast and it looks right. And I think we're probably getting close to that with a lot of our software. So when we look at our domains: our core provisioning domain, we review the hell out of those PRs, that's where a lot of the security is. But when it gets over to account management, where you're uploading your picture, it's like, "It works. Who gives a shit how it works? It's not storing things publicly where people can access them - I'm good to go."
Eli:Yeah, yeah, exactly. You can also build things that are just so much prettier than you would have ever been able to do or ever be able to justify.
I built like a tool inside our homepage where, like on a marketing site, you like right click on our logo and download our assets. Because I was always like so annoyed... like, I need this asset, I need it to be either the word mark or the trademark, and I need it to basically show up at different sizes. And I just like... I saw another company that had a nice one. I was like... screenshot, "Make this thing work like this." And it was like three hours of just toying with it and then it was amazing.
I was like, "Is this necessary? No." But it was a really fun AI experiment of how well can you just build something that can do these kind of really cool things really quickly. And I was like, "This is so cool."
And it's why I think it is the most fun time ever to be a builder.
You want to build something? You have this army of tools that can go and help you build faster than you ever could before.
It's like... the equivalent of carpentry. If you wanted to do woodworking before power tools - man, that's slow. That's just a slow way to do it. Now every one of us has a full wood shop available to us -
you got a table saw, you got a jointer, you got all these pieces. Everything is powered by 240V and you're rocking and rolling, you're just pounding through it and you can make, you know, furniture at the end of the day much faster. Same thing - we can make software much faster.
And it's so fun, because the things that used to trip us up... we had one of our engineers who was like, "There's this weird thing that fails all the time, and it's not a big deal, but I wanted to fix it." Sent it to Claude on Monday, and Claude did like a four-hour research project and was like, "Oh, the reason why is because there's this weird timing race condition in the way they were doing xyz. I fixed it now, it'll never happen again." You would never go fix that. That's just not where you'd spend a human's time - "Can we make this never happen? It only happens like 0.1% of the time." But now it's fixed and it didn't cost that engineer any time. Just like, "Oh, I didn't know that. Great, now it's fixed, I'll never see that again."
And I think those weird esoteric issues, like where there's a knowledge base in the ether that it can pull from, it's so cool. It's so cool.
Cory:Yeah. I think we're... to me it's like an extremely fascinating time and terrifying. Right?
There's these moments where it's like, there's two sides to the coin where you see these people that aren't engineers and they're building stuff and they're super excited and people like, "Eh, they're full on in psychosis." And it's like, "No they're not, they're excited that they've had this idea forever that they've never been able to communicate to another person effectively to get it created and like now it exists." That's very cool.
The scary part of that is... we've seen like the Postgres root password just sitting in a JavaScript file and all that stuff. And so that's the part of it that's very, very terrifying. Especially seeing the full-blown... I'm probably going to get shot by the YC Mafia for this... but the full-blown, like, YC psychosis that's happening in the current batch.
I'm not sure if you caught up with that online, but it's exciting because like stuff is happening. It's like this is one of those points where it's like, I feel like we were like, "Well, what's going to happen to engineering?" And it's like, well, a lot more businesses are going to be created because people that couldn't create this stuff before can, and someone's going to have to support this and secure it eventually.
Now I think that our lives will be a bit more fraught for those people that are like, "Okay, I'm going to go get my job at, you know, the company that's been AI-developed completely natively by the CEO for two years, and it's making $4 million a year." You get in there day one and you're like, "Holy shit. It's 7.6 million lines of code and it's all somehow in CSS."
I think the other side of it's going to be a little scary, but I don't think it's necessarily going to collapse jobs. I think we're going to see teams probably become a lot smaller, right? And a lot more teams.
But the thing that's really curious to me with your idea of this like product engineer, that your team's kind of becoming, is, how do we train people in this new world? Where it's like, you know, folks like you and I have been in the industry for twenty years, like, how much longer are we going to be here that we can, you know, apprentice and teach our juniors on like, how to think about these systems holistically and how to think about them, you know, scale wise and for millions or billions of users? Like, how do we train future generations of engineers to be able to work with this output when they may not have seen complex systems besides walking into that 7 million line CSS code base?
Eli:Yeah, I've had this conversation many times with different people.
It's kind of like... it reminds me of what's happened in the nuclear power industry in the United States. For, you know, a decade we built a ton of nuclear power plants and we trained up people, and then they all slowly aged out of the system and basically no more people were ever trained in that. And it's like, "Okay, how do you build another whole generation of nuclear power reactors without all the people that know how this works?" And now we have a bunch of people that, you know, maybe they graduated with computer science degrees, but are they product engineers? Do they know how to build? They probably haven't seen large systems, because you're not going to be exposed to that.
And I think it'll move really like into an apprentice system. I think for... you know, in the last decade there was this like shift from people getting MBAs to getting CS degrees and like, "Oh, that's a good way to make a lot of money. Those are high paid jobs." And it's like if you got into it just because it was a high paid job, you didn't actually have like that fire inside of you to build - it's going to be a hard road, I think, or you got to figure out how to find that fire. Because it's not about like, "Oh, I have a skill that I can write Assembly, I have a skill that I can write C++." It's like you need to have a skill where you're like, "I love to build things and I want to see them come to fruition. I think holistically about a whole system, about how do I connect all these pieces together to make something really cool that people will care about." And those are the product engineers that will continue to have great employment.
I won't be super critical, but I don't know how you fill the skills gap of like, "Oh, you've never seen a weird situation like this." You don't.
Like, the other week I was debugging the same Chrome extension, and there was a situation where basically when our cookie was being used to call our GraphQL system, if the organization hadn't been specified in this token, it wouldn't know what to do and the GraphQL queries would fail. And the code... Cursor was like, "Oh no, we're good, everything should work." And I was like, "It's not working." And it was because in our sign-in flow, after you sign in, if you're in multiple organizations, you have to click one, because that cookie was then being used with that organization ID, which was stored in the cookie as well.
And I was like, "I know this because I basically intuited what was actually happening and thought through the whole flow." It took me two hours. I was like, "Oh, that's the thing." Engineers have always had that experience. Like, "Oh, that was dumb, I figured it out." Maybe one day AI figures that out. But that kind of like, "I can debug and figure through a system. I understand how all the pieces work. So when it's not working, I can kind of point you in the right direction." That's like a learned skill.
And that's going to be like apprenticeship, it's going to be an internship at bigger coding shops to understand how these things all work, and having an intuition around being a builder. And I think the builders will have jobs and the people who are like, "I'm a coder" - that's going to be a smaller and smaller part of the pie for software engineering.
Cory:Yeah, I think so, for sure. I think honestly, capital is going to do what capital does. And I think organizations are going to start to see AI native teams shipping quickly and they're going to want to emulate that. And the catch will be, is whether they can or cannot.
That is an unknown. Just like many orgs tried to move to the cloud and they're sitting halfway in between here and there. But, yeah, I think it's going to be pretty critical.
I almost wonder... I know over the past twenty years or so, there's always been talks about what's the difference between a developer and an engineer. Some states and some countries, you can't call us engineers because we're not, technically.
I'm almost curious... if apprenticeship will be it, is that it? Are companies going to make good apprenticeship programs? What is that going to look like? Everybody's architectures are so different. Are we going to see the cloud offerings start to collapse and not offer as much stuff so we can start to standardize on some ways of running systems?
Or, like, do we start to see licensure actually be important? Because we're making things, security is important, seeing all the stuff. Team PCP is out there destroying right now, left and right... like, we are making a lot of software very, very quickly and like, we got to figure out how to secure these things and think about these systems better.
And I feel like that's the part that... on the other side of the very exciting part... that is the glaring hole that is very, very concerning to me. I feel like that's the part in the middle that people aren't talking about. It's like, I can see the value of this. I can't imagine going back to writing code ever again. I like the way that we do it now. I still write code - it's very much like harness-esque things to kind of control the AI and where it's going. I still like reading the code and thinking about the system. But to sit there and kludge through the typing - like, no way, man. I'm not hitting that tab. Autocomplete the entire thing for me.
But this other side, I think is going to be very, very critical for people to kind of pin down and figure out how we do this as an industry.
Eli:No, I think you're right, because... just the other day a CEO of a company called me up... and it was all a vibe-coded thing. Built the entire thing first with Lovable, then Replit, then Gemini Cloud. And he was going through the paces. And he was like, "I just got my first customer, how should I celebrate?" And I was like, "Well, you should get PagerDuty set up so you have an on-call rotation, and you should write some Playwright tests to make sure that thing stays up and does what you want it to do, because you didn't write any of that stuff yet." And he didn't know what those things were, but obviously he taught himself all the way through to understand how these systems interact.
I was like, "Go ask Claude to explain to you what these things are." And he's like, "Oh yeah. I asked Claude and Claude said you're right about the next things." So I'm like, "Yeah, these are the things you've got to do."
Make sure... when you're building a system and people are paying you money, they definitely want the thing to keep working. And you don't want to get a call like, "Hey, this thing broke and you didn't even realize it." So that's where we get into testing, and automation of testing. If you have good surface-area coverage of testing, it's not going to matter that much what's under the hood.
Now, of course, I was also like, "You need to make sure that none of your keys are exposed." So you just put into your Claude prompt, "Make sure none of my keys are ever exposed." And you should run these other linters and you know, static analyzers to make sure that none of those things ever get posted.
Because, at the end of the day, if you don't understand the code, then like you're kind of hosed and like, you need to at least have someone that can be like, "I understand it enough to know where my blind spots are."
Cory:Yeah. Awesome.
Well, I know we're coming up on time. Eli, thank you so much for coming on the show. Where can people find you online?
Eli:You can find us at trunk.io, I'm sure we'll have that in the hyperlink somewhere. Love to talk to everyone who's like kind of exploring this new space and seeing their outer loop being crushed by the inner loop. We love hearing those stories. All of our starting customer calls are now always like, "Do you guys have flaky tests or backed up merge queue?" And generally they're all like, "Both."
We've migrated a lot of people onto our merge queue who were just seeing a bunch of creaky pains from other companies' merge queues, like GitHub's. And we have a lot of fun bringing them onto a more performant solution and getting them... you know, setting them straight.
Like, a couple months ago we just finished onboarding Brex onto our flaky test system. They knew they had flaky tests; they just didn't know the scale of the problem. It was, you know, grinding engineering to a halt, and we've fixed that up for them. They're now able to just fly. And we'll have a case study on that coming out soon.
It's just a very exciting time to be building and enabling teams to build faster than ever before. We used to have a roadmap, and the roadmap is always a dream - like, "Oh, wouldn't this be so cool? But you know, we'll never get to it." And now... like I posted the other day, I'm like, "We're going to hit the end of the roadmap. We're going to hit the end of the road and have to add a new road on, because we can actually hit the things that we say we're going to do." And that makes our customers happy. And it makes me happier, because the team is just delivering at a... we're a smaller team than we ever were - we're now only, you know, sixteen people, eleven engineers total if I include myself - but we're building two to three times the amount of code we ever did before. And not just code... really, just product. We're building two to three times as much product.
Big problem to solve and we're really enjoying the whole thing.
Cory:Yeah, I think one of the things right there that I would love to just put a little note on for people that are still skeptical.
You may have just heard that and thought, "They're creating two to three times the amount of code. That means you have a lot of code to review." But again... if I spent six to eight hours of my day working on that feature and typing it, and it is 900 lines of code - everybody on the team can consume that in much less than the eight hours I took to write it.
And I think that's one of the parts that people are kind of missing, because they're way over here, and they're not giving it a try, they're not experimenting with it. And they're hearing the people way over there being like, "I merged 20,000 lines of code today." First off, that person isn't - they're merging a thousand here or there, unless they're refactoring their node modules. But even the people way over there that are saying, "I'm shipping stuff very, very quickly" - you have to remember that they also have much more time than you do. And so you can get good quality code, you can get good quality reviews.
I think it's really going to come down to accepting that our job's a little different and then managing our time a little differently. We're going to spend more time reading code, and that may or may not be okay with you, but it's going to be very okay with your boss at some point in time, which is the freaky part.
Well, it was so awesome to have you here today.
I want to have you back soon because I actually have about, like, thirty five more things I wanted to talk to you about, so we'll have to put another time on the calendar sometime soon.
Eli:Love nerding out, Cory, with you. And we'd love to get you onboarded onto our system so you never have to see that flaky test again.
Cory:Yeah, let's do it, for sure. Awesome, man.
Eli:Thanks again.
Cory:Let's chat soon. Thanks so much.