Policy as Code: Kyverno and Securing Kubernetes at Scale with Jim Bugwadia
Episode 40 • 19th November 2025 • Platform Engineering Podcast • Cory O'Daniel, CEO of Massdriver
Duration: 00:42:20


Shownotes

Most Kubernetes security breaches don't come from zero-day exploits - they come from misconfigurations. While your team runs scanners and reviews reports, containers are already running as root, network policies are missing, and compliance violations are piling up across dozens of repositories.

Jim Bugwadia, co-founder and CEO of Nirmata and creator of Kyverno, joins Cory to talk about a different approach: policy as code. Instead of asking developers to remember security best practices across every repo, what if your cluster automatically enforced secure defaults and blocked non-compliant deployments before they ever reached production?

You'll learn how to start using Kyverno today without breaking your production environment - from running your first audit scan (no installation required) to implementing enforcement mode with exceptions. Jim explains why micro-segmentation matters more than ever, how to automate network policies for every namespace, and why platform teams are using Kyverno for everything from security to cost optimization.

Whether you're running one cluster or managing Kubernetes at scale, this conversation offers practical strategies for making security a byproduct of your platform - not an afterthought.

Topics covered:

  • Why shift-left security fails and what "shift-down" means for platform teams
  • How to implement Kubernetes policy enforcement without grinding deployments to a halt
  • Automating secure defaults: network policies, resource quotas, and role bindings
  • The crawl-walk-run approach to rolling out policies in existing clusters
  • Real-world use cases beyond security: cost optimization and resource management

Guest: Jim Bugwadia, Co-Founder & CEO of Nirmata and creator of Kyverno

Jim Bugwadia is the Co-founder and CEO of Nirmata, a Kubernetes management platform built for enterprises to simplify and scale cloud-native operations across clouds, data centers, edge, and connected devices. With a mission to democratize cloud-native best practices, Jim brings deep expertise in building large-scale software products and leading high-performing teams. Before founding Nirmata, he led a global consulting team at Cisco, guiding enterprises and service providers on their cloud computing journeys. Earlier in his career, he contributed to innovative products at startups and major companies including Trapeze Networks, Pano Logic, Jetstream, Lucent, and Motorola. A hands-on technologist, Jim continues to code in Go, Java, and JavaScript, reflecting his passion for building in the rapidly evolving world of software.

Jim Bugwadia, X

Nirmata

Kyverno

Links to interesting things from this episode:

Transcripts

Cory:

Welcome back to the Platform Engineering Podcast. I'm your host, Cory O'Daniel, and today I'm joined by Jim Bugwadia, co-founder and CEO of Nirmata. It's a Kubernetes management platform that helps enterprises simplify and scale cloud native operations across clouds, data centers, and edge environments.

Jim's been in the space for a while, from Cisco to startups like Trapeze and Pano Logic, and he's still very much a hands-on technologist, coding in Go, Java, and JavaScript. He's also deeply involved in the Kubernetes community, leading development of Kyverno, a popular open source policy engine for Kubernetes.

Jim, thanks so much for coming on the show today.

Jim:

Thank you, Cory. My pleasure. Happy to be here.

Cory:

I ran into you a few weeks back at a VC little meetup thing and we were talking about Kyverno and I was like, you've got to come on the show.

So we're definitely going to get into Kyverno a bit, but would love to just hear a little bit more about your background, like how you got into the Kubernetes space and you know, what you're kind of working on today.

Jim:

Yeah, so I'm a software engineer by training and background. Did my education out in the Midwest at the University of Illinois. Worked around Chicago. At that time, this was the mid-90s, the telecom industry was going through, you know, its growth period.

Cory:

Oh yeah.

Jim:

So I was fascinated by, you know, some of the technologies coming out there. Worked for quite a few years at Motorola and then at Bell Labs/Lucent in that space.

And it was fascinating to me, as a new employee and a student, how software was getting used to solve real world problems.

This was for telephony, and what was interesting is when I joined these companies, they put you through a very rigorous orientation process and really hammered into you the importance of the software you're building. It's not only business critical, but it's, you know, life or death when people need to make phone calls for emergencies, things like that.

We take these systems for granted these days, but those were the early days of cellular. For those of us who remember, it was awesome to get a call through. It was like, "Wow, this actually works." And there's software behind it, and that was part of the work we were doing with CDMA and TDMA and GSM back at that time.

So my work has always been in regulated industries. And then of course as the Internet came about, I switched into data networking. Worked, like you mentioned, at Cisco, with a lot of different networking technologies, things like that. And what's always been fascinating to me is that business critical, mission critical aspect of software. And if you look at Kubernetes today at scale, that's exactly the appeal.

Now we don't have to think about some of these hard problems lower in the stack. Back in the early systems at Motorola and any telecom, they used to do redundancy and all of that stuff in hardware. Now it's all software driven, and we can spin up clusters on our laptops. It's mind-boggling in terms of how fast the industry has moved and progressed.

But some of the core principles, the patterns, the underlying components tend to be similar things repeated now in different ways with more flexibility, with of course everything we have built in the last couple of decades with virtualization and containers and DevOps.

So it's fascinating to see all of that come along and, like you mentioned, I'm still pretty hands on with the product and software and mostly focused on open source these days - which is also a lot of fun.

Cory:

Yeah. And so you're one of the creators of Kyverno... and so if you're not familiar with this and you're running Kubernetes today, hit pause.

Yeah, actually you don't have to hit pause. We're going to talk through it a bit here. But Kyverno is one of those tools that's like... it's one of the first things I reach for in a cluster.

Now if you're learning Kubernetes, it might not be the first thing you want to install because it's probably going to break all the tutorials you go through. But if you're running a production workload today on Kubernetes, there's so much like outside of the workloads that we have to be concerned with around compliance and security and Kyverno is an amazing tool for kind of getting that into place from a central point of view. Which I think is extremely important for platform engineers.

So can you tell us a little bit about Kyverno? Like what got you interested in the idea originally and... yeah, just a little bit about the project.

Jim:

Yeah. It's like with anything, you know, it's an evolution or, like I was mentioning, taking patterns that were familiar before and repeating them in different ways.

So interestingly, Kyverno came out of Nirmata. We didn't start with open source first, we started with commercial.

Cory:

Oh, very cool.

Jim:

Our initial product was more around Kubernetes management. So we had policy and governance as one of the services we offered, but we also did things like cluster management and workload management. For the last few years we've been focused completely on governance and policy as code, which I'll explain in more detail. But the interesting thing was, as we built, we of course used things that were familiar to us.

I mentioned my background, interestingly my co-founders, Atnin, Marta, Ritesh and Damien, they also come from similar industries where they've worked in regulated industries with highly critical, always on software. Stuff that can never break, right? This has to be always on, has to be always working. For us, like governance, security, policy was just natural for us to bake into everything we did. And that's how we built the initial software and tooling we did at Nirmata.

What happened was, as Kubernetes matured and as we were looking and working with early customers... this was like release 1.14, 1.15, so very early days of Kubernetes... but things like CRDs had finally matured, things like admission controllers were becoming available. We saw the opportunity where... the best way I think about it is back in... for those of us again who worked with data centers and things like that, any change you want to bring into a critical environment, you typically have to go through a review board process. Then you schedule time for a change window, and then you make the change in production. That used to be the way you do things. Now we deploy, we just push code and tooling, Infrastructure as Code, all of that just happens, right? And it's continuous changes all the time.

In that, where is your review board? Who is making sure that this change is compliant with your organizational policies, with industry best practices, with security best practices, and other things? Sure, security tools can find this after things have been deployed, but by then it's too late.

Cory:

You know what else is too late in my opinion is... and this is something that I feel like I rally against a lot on here, so sorry for bringing this point back up, folks... you know, a lot of the boilerplate around like repo and IaC management is the compliance and security stuff, right? And it's like I've got 13 apps, I've got 13 repos, everybody's running their containers as non-root and it's like well we gotta go put that scanner in 13 spots. And it's like somebody adds a new app, it's like you need to remember to add the compliance and security stuff to it too.

This is one of those tools that takes security and compliance scanning from a best effort, maybe we'll figure it out during CI if somebody remembered to put the policies in place, to making your clusters defensible. Right? Like the cluster says, through, you know, validation or mutation, "Hey, you can't do what you just tried to do," or "We're just going to put non-root on everything." And you have a central point. You're not managing 13 cookie cutter stamp-outs of some GitHub Actions someplace. You can say, "Hey, I know that this cluster is not going to end up in an insecure or non-compliant state based on our rules," which is an absolute time saver.
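To make the "put non-root on everything" idea concrete, here is a minimal sketch of a Kyverno mutation policy, built as a Python dict and printed as JSON (JSON is valid YAML, so the output can be committed to Git and applied with kubectl). The field names follow Kyverno's `kyverno.io/v1` ClusterPolicy format as we understand it, and the policy and rule names are hypothetical; check the schema against the docs for your Kyverno version.

```python
import json


def non_root_mutation_policy(name: str = "add-run-as-non-root") -> dict:
    """Build a Kyverno ClusterPolicy (as a plain dict) that adds
    runAsNonRoot: true to a Pod's securityContext when it is absent."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": name},  # hypothetical policy name
        "spec": {
            "rules": [
                {
                    "name": "add-run-as-non-root",
                    "match": {"any": [{"resources": {"kinds": ["Pod"]}}]},
                    "mutate": {
                        "patchStrategicMerge": {
                            "spec": {
                                # "+(key)" is Kyverno's add anchor: set the field
                                # only if the Pod does not already specify it.
                                "securityContext": {"+(runAsNonRoot)": True}
                            }
                        }
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(non_root_mutation_policy(), indent=2))
```

Note that `runAsNonRoot: true` makes the kubelet refuse to start containers whose image runs as UID 0, so a mutation like this surfaces root-only images at startup rather than silently fixing them.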

Jim:

Absolutely. And this is where, you know, these days... of course this podcast is all about platform engineering... we talk about bringing together these best practices. One of the concepts we also talk about within the CNCF... in fact, myself and one of my co-chairs in the Policy Working Group, Poonam, co-wrote a paper on what we called "Shift-Down Security." Because if you think about it, everybody talks about shift left, but, like you just described, shift left is really, really hard because you have to train every software developer about security, and they have enough to do. They have, you know, applications to build, manage, and run. For them, taking on security becomes an afterthought. And the only way you can solve that is by consolidating all these best practices, codifying them with Policy as Code, and deploying them at common enforcement points.

Clearly you want to give that feedback as early as possible, whether it's the pipeline or your IDE if you're writing declarative manifests, but also for defense in depth you have to be able to do admission controls. Kubernetes is fantastic in the sense that it has extensible admission controls where you can plug in these checks. And you want to do background scans. A lot of times people say, "Well, I have network firewalls, isn't that good enough?" No, it used to be good enough 10 years ago or maybe even longer, but it's no longer considered secure. Or if you're running runtime security, it doesn't mean you're done, right? You still need to... and it's mind-boggling that every security company of course focuses on vulnerabilities, but by definition a vulnerability report tells you about something that's already happened.

And if you look at the research from Gartner and others, almost 90% of issues in production come from misconfigurations. So why not solve that, right? Then you have far fewer vulnerabilities to deal with. And even if you have an occasional vulnerability, it's less critical because your workloads are properly segmented, they're isolated, they're secure, and there's no risk of, you know, the blast radius increasing.

Cory:

I mean the other thing that's problematic about reports is, like when you get a report like it's typically not going to everybody. Somebody's going to get it and send it to everybody, right? But like you've got that report and you're like, "Eh, there's a bunch of stuff we didn't do." And now I've got to go delegate it to people. Like that's the sending it to everybody, it's like, "Hey guys, we got to fix this." So now I have to figure out who owns this resource, why is it this way, how long has it been this way and how much debt has been built up on top of this that I can't just change it now.

It's one thing to be like, "Hey, I got a PR, this feature is going out." I merge it and it doesn't go through, or Kyverno makes the change and makes it like non-root or whatnot for me - it's just like great, it's a non issue. But it's another thing to like merge it, now there's usage on top of it, and maybe I've built on one of these constraints I'm not supposed to have in place, right? And so the reporting... like the after-the-fact, the reactive compliance through reporting just creates more work for us when we know these rules.

I think one of the other things that is really novel about it is like going back to that like Change Management Board. When you think about like SOC 2 and compliance and security, I'm going to argue - I'm curious if you disagree, I might get some emails for this one - but the average developer in your organization, I would say, is not necessarily a stakeholder on compliance and security. Like you said, it's hard to shift all that stuff left. And it's kind of table stakes.

And the thing that's wild about it is that the stakeholders will frequently be people who are outside of that development loop. You've got CISOs, CFOs, legal, a bunch of people who may not even be engineers but have concerns about how compliance is working. And they very much have input on how these things work, how we went through our SOC 2, what we're doing, and it feels like they're the ones who need control of that. And that is so much harder to do with a smattering of stakeholders over here who may be non-technical, a smattering of boilerplate scanning over there, and then some reports. That is a recipe for so much work, and so many vulnerabilities are going to be in place.

That's a place where many people sit today, and it's like, "Oh, we're doing a good job, we're running our scanners, we're getting our reports. It's just these gosh darn engineers just won't fix all...they won't shift all the security stuff left." And it's like, why? Because they've got a PM and they've got stuff to ship.

Sorry. So that was a little bit of a tear, but...

Jim:

No, makes complete sense and very much, you know, matches what we see in the space. But I think that the opportunity with Kubernetes... So first of all, Kyverno wouldn't be possible without Kubernetes and the whole resource model, or what I think some folks now call the KRM. The Kubernetes Resource Model with declarative config is brilliant. Like that's one of the game changers.

But that itself comes with complexity as well. If you think about like a pod or a deployment, like you said, there's so much boilerplate like none of us like even... I've been using Kubernetes for almost 10 years now. Since the beginning. I don't know every last detail in like a pod or a replica set or a deploy. And I don't need to.

Cory:

Mm-hmm.

Jim:

I just need to know a few basic attributes to get my stuff running. In fact, I was talking to somebody just yesterday, helping them with a demo they were setting up, and, you know, they hadn't noticed before that Kubernetes by default keeps up to 10 old ReplicaSets. They were asking why Kyverno was flagging violations when they updated, and it was because there were previous ReplicaSets sitting around for rollback, right? And it's like the stuff just adds up. You don't know what's going on. You're like, "Hey, my app is working. We're all happy about that." But there are so many details in the configuration that often get missed.
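The ReplicaSet surprise comes down to simple arithmetic. A Deployment keeps its current ReplicaSet plus up to `revisionHistoryLimit` old ones (Kubernetes defaults this to 10), and each retained ReplicaSet carries its own copy of the pod template. The sketch below assumes every rollout changes the pod template and that the policy being scanned also matches ReplicaSets; the function name is hypothetical.

```python
def retained_replicasets(revisions: int, revision_history_limit: int = 10) -> int:
    """Estimate how many ReplicaSets a Deployment keeps after `revisions`
    pod-template revisions: the current one plus up to
    `revision_history_limit` old ones kept around for rollback."""
    if revisions < 0 or revision_history_limit < 0:
        raise ValueError("counts must be non-negative")
    if revisions == 0:
        return 0
    return min(revisions, revision_history_limit + 1)


# A template that violates a policy can therefore be reported once per
# retained ReplicaSet, not just once for the Deployment.
print(retained_replicasets(15))  # 11 with the default limit of 10
```

Lowering `revisionHistoryLimit` trades rollback depth for fewer stale objects in the cluster and in policy reports.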

Now this is where... and it's not a knock against Kubernetes, all of these things are required, somebody wanted that feature, somebody thought it was a great idea, they implemented that and there's a reason for it being there... so when people say Kubernetes is complex, I always have an issue with that because it's solving a complex set of problems. It should be complex, right? If it were simple, then we would all be frustrated with the lack of configurability and flexibility and other things.

So it's a good thing it exposes all of this. But it also begs for automation: automating those security concerns upfront by using declarative policies, and separating those from the configuration that developers or other teams care about. So at its core, Kyverno is a policy enforcement engine.

It can apply policies as admission controllers, or it can apply policies in your pipeline, in your CI/CD IaC pipelines. You can even just apply using a CLI. But the whole idea is to then allow you to manage your policies as Kubernetes resources in your Git repos using GitOps, and then flexibly apply them at many phases in your deployment lifecycle and give that early feedback, but have a common team control the set of policies.

Now, very interestingly, you mentioned compliance and other things. When I moved over from the telecom world into the data networking world, it was always interesting, I found it curious - why is there a separate security team? Why isn't that just part of engineering?

Because it seemed like that was never something that engineering or others took on from the beginning.

And back in the telecom world, there's an acronym which I remember called FCAPS and it said if you want to manage something, like you're building a management plane, you need Fault, Configuration, Accounting, Performance and Security. That's FCAPS. Security was there. Even though it was last, it was there. And it's part of everything you have to do, you can't say I'm done until you have addressed that part of it.

So here it's interesting that there's sometimes a separate team managing compliance and then coming and saying, "Hey, we can't put this in production because of these reasons." What if you could flip that around? What if you could say, "Well, take your governance document with hundreds of controls," which large organizations will have...

Cory:

Most definitely.

Jim:

"Codify that, put that in a Git repo, just like we do with IaC, make it declarative, make it GitOps based, and let's agree to that as an organization so everybody sees that there's change management processes on it." And now we can enforce these policies in a very flexible manner using tools like Kyverno in different phases of your life cycle.

So it's kind of like people talk about we've done... Infrastructure as Code is now table stakes, right? We understand that's the way. But when we talk about Policy as Code, to me, Policy as Code leads to Security as Code, which is really where we want to go for the next 10 years of Kubernetes and the industry to allow it to scale and to continue to grow further.

Host read ad:

Ops teams, you're probably used to doing all the heavy lifting when it comes to infrastructure as code: wrangling root modules, CI/CD scripts, and Terraform just to keep things moving along. What if your developers could just diagram what they want and you still got all the control and visibility you need?

That's exactly what Massdriver does. Ops teams upload your trusted infrastructure as code modules to our registry. Your developers don't have to touch Terraform, build root modules, or even copy a single line of CI/CD scripts. They just diagram their cloud infrastructure. Massdriver pulls the modules and deploys exactly what's on their canvas. The result?

It's still managed as code, but with complete audit trails, rollbacks, preview environments and cost controls. You'll see exactly who's using what, where and what resources they're producing, all without the chaos. Stop doing twice the work.

Start making Infrastructure as Code simpler with Massdriver. Learn more at Massdriver.cloud.

Cory:

I think one of the things that's really interesting about this tool in particular, with regards to platform engineering, is I think one of the goals that many teams have is creating that like right level of abstraction. Whether you call that a golden path or whatever term you're using for that. And you know, when you're choosing Kubernetes as the base for that path - I would say not the path, I think it's a tool for building paths, not a path in and of itself - like, the goal is to create an abstraction that makes sense for developers.

And I think some of the teams that I've seen that have done a very good job at this, you almost can't tell that the platform engineering team works for the same company. Like, it almost feels like a fully isolated company that's offering a product. Like true platform as product.

And one of the things that's really interesting about this is when you're aiming for that good abstraction, maybe you've targeted like, "Hey, if you have a Dockerfile, that's our platform; it'll just work once it ends up in the platform engineering team's Kubernetes cluster." And as soon as you say, "Hey, it's up to you to give me some configuration for how to secure this and make it compliant at the network level," that is the leakiest abstraction I've ever seen. And that's how it lands for many teams. It's like, "Hey, we didn't quite get to... [maybe the Dockerfile is the abstraction of how to get into this Kubernetes cluster], it's a Helm chart and good luck."

I think that's really, really key. So what I'd love to know is how are you seeing platform teams start to adopt a tool like Kyverno?

So teams that have already got a couple of clusters out there. Maybe they just have one cluster, but they've already got some applications running, and some of those may be non-compliant... maybe everything's not perfect. I want to use this tool, but I don't want to grind everybody to a halt where all of a sudden their applications are non-compliant and can't be deployed to the cluster. How can I start using Kyverno without potentially introducing breakage to my production environment?

Jim:

Great question. So one thing to state before I dive in deeper into like what's the best way to roll out policies and start kind of advocating them - Kyverno from the very beginning was not just about blocking things or flagging violations and reporting, you know, doing validation checks or verification checks. It does more than that, right? So it can generate secure defaults.

Again, Kubernetes, it's very good, it's a control loop, right? So everything in Kubernetes works that way. And Kyverno also works that way. You tell it through a policy. You're saying, "Hey, here's my desired state for best practices for a deployment," or an application, or a service, or whatever you're trying to write that policy for, and Kyverno can automate certain secure defaults.

Now, at first we often get asked the question, "Isn't that violating the GitOps thing where I want to put everything in my Git repo?" Well, if you think about it, if you take that approach and that stance, then Kubernetes violates GitOps too. Because deployments create pods, right? It is natural for controllers to create resources as well and manage their lifecycle.

So similarly to what... you know, going back to your networking example... one very common use case for Kyverno is every time I create a namespace, create a default network policy. Just do it, right? Don't... no question... just deny all to start with... and then your application team naturally has to start writing policies. And you can have Kyverno even validate the network policies that the application teams develop to ensure that they're not opening up things to the entire world and stuff like that. Based on certain labels and configurations, right?
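The "every namespace gets a default network policy" pattern maps to a Kyverno generate rule. Here is a minimal sketch, built as a Python dict, of a ClusterPolicy that creates a deny-all NetworkPolicy whenever a Namespace is created. The field names follow Kyverno's `kyverno.io/v1` generate-rule format as we understand it, and the policy and rule names are hypothetical; verify against the Kyverno docs for your version.

```python
import json


def default_deny_generate_policy() -> dict:
    """Build a Kyverno ClusterPolicy (as a plain dict) that generates a
    deny-all NetworkPolicy in every newly created Namespace."""
    return {
        "apiVersion": "kyverno.io/v1",
        "kind": "ClusterPolicy",
        "metadata": {"name": "add-default-deny-networkpolicy"},  # hypothetical
        "spec": {
            "rules": [
                {
                    "name": "default-deny",
                    "match": {"any": [{"resources": {"kinds": ["Namespace"]}}]},
                    "generate": {
                        "apiVersion": "networking.k8s.io/v1",
                        "kind": "NetworkPolicy",
                        "name": "default-deny",
                        # Kyverno variable: the name of the Namespace being created.
                        "namespace": "{{request.object.metadata.name}}",
                        # Keep the generated NetworkPolicy in sync with this policy.
                        "synchronize": True,
                        "data": {
                            "spec": {
                                # An empty podSelector selects every pod in the namespace.
                                "podSelector": {},
                                # No ingress/egress rules are listed, so all traffic is denied.
                                "policyTypes": ["Ingress", "Egress"],
                            }
                        },
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(default_deny_generate_policy(), indent=2))
```

Because the sketch lists `Egress` with no egress rules, it also blocks DNS lookups; many teams generate a companion rule that allows egress to the cluster DNS service.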

So you can get started with Kyverno even without thinking about full-on security; you can get started with Kyverno for automation. And that's a very good place for platform teams. We see a lot of platform teams using it for things like default labels, or inheriting labels from the namespace and applying them to pods. You don't want everybody to do this manually. You can automate this stuff in-cluster, right? And it just happens, right? Which is magic.

In fact, I did a demo on micro-segmentation with Cilium and Kyverno at KCD San Francisco and somebody afterwards told me, "Hey, this kind of seemed like magic. Why do I need AI if Kyverno can do all of this?" We're not really trying to replace a language model, but it feels like that level of automation, right?
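The label-inheritance use case can be reasoned about as a simple merge. This is a conceptual Python model of the behaviour, not Kyverno's implementation: selected namespace labels are copied onto a pod only where the pod does not already set them, the same "add if absent" semantics as Kyverno's add anchor. The label keys used in the example are hypothetical.

```python
def inherit_namespace_labels(pod_labels: dict, namespace_labels: dict, keys: list) -> dict:
    """Copy the selected namespace labels onto a pod's labels without
    overwriting any label the pod already sets."""
    merged = dict(pod_labels)
    for key in keys:
        if key in namespace_labels and key not in merged:
            merged[key] = namespace_labels[key]
    return merged


ns_labels = {"team": "payments", "cost-center": "42", "env": "prod"}
print(inherit_namespace_labels({"app": "web"}, ns_labels, ["team", "cost-center"]))
```

Restricting inheritance to an explicit key list keeps unrelated namespace labels, such as `env` above, from leaking onto every pod.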

Cory:

Yeah.

Jim:

It's kind of like once you experience that there's no going back. You can automate so many things, right?

But going back to your original question, "How do I roll out a policy?" So let's say I already have applications, I already have things running. Kyverno has a concept of an audit mode. But first, even before you install Kyverno in your clusters, you can download the CLI and scan your clusters without installing anything. That's a fantastic way to just get an assessment of where you are.

Cory:

I can do that today? I can just grab the binary and run it? I don't have to introduce anything? I don't have to install an agent, like old school agent, in the cluster?

Jim:

Nothing. There's nothing to install on your clusters. You can just scan your clusters as long as you have a kubeconfig with view-only access. It'll run a scan and give you a report, which you can then share to see where you are on basic policy checks.

Out of the box, in the Kyverno community repo we now have 400-plus sample policies. Those are just examples of policies, but we also have a Pod Security policy set which implements the Pod Security Standards that Kubernetes defines. That is definitely something everybody should be applying.

It literally takes about five minutes to go from zero to secure for Pod Security with Kyverno. Right.
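To give a feel for what the Pod Security Standards check, here is a Python model of a small subset of the "restricted" profile: no privileged containers, `allowPrivilegeEscalation: false`, `runAsNonRoot: true` (set at pod or container level), and dropping ALL capabilities. It is a deliberately incomplete illustration (it skips seccomp, volume types, host namespaces, ephemeral containers, and more) and is not the Kyverno policy set itself.

```python
def restricted_violations(pod: dict) -> list:
    """Return human-readable violations of a small subset of the
    Pod Security Standards 'restricted' profile for a Pod manifest dict."""
    spec = pod.get("spec") or {}
    pod_sc = spec.get("securityContext") or {}
    problems = []
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    for c in containers:
        name = c.get("name", "<unnamed>")
        sc = c.get("securityContext") or {}
        if sc.get("privileged") is True:
            problems.append(f"{name}: privileged containers are not allowed")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        # A container-level runAsNonRoot overrides the pod-level setting.
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        if run_as_non_root is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        drops = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in drops:
            problems.append(f"{name}: capabilities must drop ALL")
    return problems


bare_pod = {"spec": {"containers": [{"name": "app", "image": "nginx"}]}}
for problem in restricted_violations(bare_pod):
    print(problem)
```

A bare pod with no security context fails three of these four checks, which is why running the full profile against an existing cluster usually produces a long first report.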

If you want to put things in your cluster and you want to now start... so you've scanned, you've alerted folks, the next step would be to put Kyverno in your cluster but in audit mode, right? Because then it's continuously scanning. Anytime there's a change, anytime a new workload is introduced, or people change configurations, Kyverno will detect that, will rerun the policies, and will produce a policy report.

Now one of the things we recently did with the Policy Working Group in the CNCF and with Kyverno and Kubewarden and even Falco and several other projects, we got together, and we've created something called OpenReports. We created this as a separate project because the problem we are seeing is every tool needs to create reports. We have OpenTelemetry for metrics and spans and other things. There's nothing where you could create a report using standard APIs.

So Kyverno also produces these OpenReport format reports which you can consume. We have tools in the community, something called Policy Reporter which can give you a quick dashboard. There's other tools like, of course, Nirmata and even I believe there's integrations with other commercial tools now where they're starting to consume these OpenReport formatted resources and show them back to developers in your platform or in your other UIs. But that's a good way to get started as an audit mode.
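Once audit results are flowing, the first jobs are usually to summarize them and route failures to owners. The sketch below models the pass/fail/warn/error/skip summary counters used by the Policy Working Group's PolicyReport resources and groups failures by namespace. The flat result shape (`policy`, `rule`, `result`, `namespace`) and the sample policy names are simplifying assumptions for illustration; they are not the exact OpenReports schema.

```python
from collections import Counter, defaultdict

# Result values used in PolicyReport-style summaries.
RESULT_TYPES = ("pass", "fail", "warn", "error", "skip")


def summarize(results: list) -> dict:
    """Count results by type, reporting zero for types that never occur."""
    counts = Counter(r["result"] for r in results)
    unknown = set(counts) - set(RESULT_TYPES)
    if unknown:
        raise ValueError(f"unexpected result values: {sorted(unknown)}")
    return {t: counts[t] for t in RESULT_TYPES}


def failures_by_namespace(results: list) -> dict:
    """Group failing policy/rule pairs by namespace so each team sees its own."""
    grouped = defaultdict(list)
    for r in results:
        if r["result"] == "fail":
            grouped[r["namespace"]].append(f'{r["policy"]}/{r["rule"]}')
    return dict(grouped)


sample = [  # hypothetical audit results
    {"policy": "require-labels", "rule": "check-team", "result": "fail", "namespace": "payments"},
    {"policy": "require-labels", "rule": "check-team", "result": "pass", "namespace": "search"},
]
print(summarize(sample))
print(failures_by_namespace(sample))
```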

You can also then, once you have violations, depending on the management tool you're using, route those to the right teams based on namespaces or labels or other things. But eventually you want to get to a point where you flip the policy to enforce mode, right? You can do that, you can then produce Kyverno as another resource... everything we do in Kyverno is Kubernetes resources, so again, all of this makes it super easy to manage through standard APIs and tools.

But you can create an exception. So if you want to give a team like a time bound exception to say, "Hey, you have 30 days to resolve this, but your workload is not going to be impacted. We've created the violation, we're reporting that, but we give you a grant for 30 days and after which we'll flip into enforce mode." And what that means is it's not going to impact the running workload. But if your pod happens to restart and it's insecure, it will get blocked at that point.

So we don't take down or kill pods that are running. But on the next change it will say, "This is not compliant," and if the policy is in enforce mode and there are no exceptions for that workload, then that workload gets blocked. So it's a process.
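The crawl-walk-run rollout Jim describes can be summarized as a decision table. This is a conceptual Python model of the behaviour as described in the conversation, not Kyverno's implementation: it is evaluated only for new or changed resources (which is why flipping to enforce does not kill running pods), and a time-bound exception is represented by a hypothetical expiry date. How expiry is handled in a real cluster, for example by deleting the exception resource after 30 days, is outside this sketch.

```python
from datetime import date
from typing import Optional


def admission_decision(compliant: bool, mode: str, today: date,
                       exception_expires: Optional[date] = None) -> str:
    """Decide what happens to a create/update request under a policy.

    Returns 'allow', 'allow-and-report', or 'block'. Running pods never reach
    this function; only new or changed resources go through admission.
    """
    if mode not in ("Audit", "Enforce"):
        raise ValueError(f"unknown mode: {mode}")
    if compliant:
        return "allow"
    if mode == "Audit":
        # Audit mode never blocks; the violation only shows up in reports.
        return "allow-and-report"
    if exception_expires is not None and today <= exception_expires:
        # Enforce mode with an unexpired, time-bound exception for this workload.
        return "allow-and-report"
    return "block"


grace_ends = date(2025, 12, 19)  # hypothetical 30-day grant
print(admission_decision(False, "Enforce", date(2025, 12, 1), grace_ends))
print(admission_decision(False, "Enforce", date(2026, 1, 5), grace_ends))
```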

Cory:

There's a lot of baby steps to kind of step into it, right? So I mean I can just grab the binary like day one, pretty much read-only to the cluster. I can get a good report. Like I think you get two things out of that. One, you get a lay of the land. But being able to so easily get an audit that I can now turn around and maybe get some stakeholder support on, right? Like for a team that is, you know, starting to become security and compliance minded, there could be a thousand other priorities. And this is one of those assets that you can kind of create to say like, "Hey, this is really important. We have a whole host of things and we could probably nail 50 to 60% of them by just implementing this tool."

And then you've got a couple of baby steps. You can put it in audit mode now, like in the cluster, so it's just constantly reconciling. Still not blowing anything up if anything goes sideways, but giving you a little bit more interactivity there. And then from there going into enforcement mode. I mean that feels like a great path.

I'd be curious if anyone who's using Kubernetes today is not using it. Just like... yeah, just download that binary, just fire it at one of your clusters and see how many things come back. I will put it in the show notes. You mentioned that there's about 400 policies that are already pre-built in the community. So we'll link to that if you all want to check that out.

So one of the policies that's in there by default is adding a network policy to the namespace.

Jim:

That's right.

Cory:

For folks that aren't familiar with why you would want to do this, just give us a for instance of why. Like, I got a namespace, it's mine, it's my own little virtual cluster, right? Why would I need a network policy?

Jim:

So by default in Kubernetes, even if you're running a CNI, any pod can talk to any other pod, right? Everything's open within your cluster. So you have two options.

You could either say, "I'm going to keep creating separate clusters for teams or applications," and at some point that just becomes unwieldy. And you know... I read somewhere that about 40% of what runs in the cluster is tools, so every time you create a new cluster it's like... that's crazy, right? You're now replicating those tools everywhere, and it's just never going to scale.

I mean, you're wasting a lot of money and creating a big management problem, so what do you do? You go back to namespaces, and you say, okay, I'm going to give an app or a team a namespace or a set of namespaces. At first, managing that seems complicated, because again, if you read some of the literature... look, I represent a vendor, all vendors have views, and of course we tend to be opinionated about our own... it's like, "Oh my goodness, managing namespaces is so difficult." But it's not if you do the right automation, right?

So automate things with Kyverno. First of all, anytime you create a namespace, create a default network policy, create a quota, and create very fine-grained roles and role bindings just for that namespace. Kyverno can do all of this, right? You write the policies once, and this stuff doesn't change often. So once you're done, you have a secure baseline. Anytime somebody gets access to your cluster, or goes through Argo CD, and requests a new namespace, that namespace already has the right secure defaults.
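Kyverno handles this with generate rules that create resources when a Namespace is created. The Python sketch below does not use Kyverno. It builds, as plain dicts, the kinds of Kubernetes objects Jim lists, so you can see what a namespace's "secure defaults" might contain. The object names ("default-deny", "default-quota", "team-deployer"), the quota values, and the Role's permissions are hypothetical choices, not Kyverno or Kubernetes defaults.

```python
# Hypothetical names and values throughout; adjust to your own conventions.

def default_deny_policy(namespace: str) -> dict:
    """NetworkPolicy that selects every pod in the namespace and allows nothing."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector = all pods in this namespace
            # Both types listed with no rules, so all ingress and egress is denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def default_quota(namespace: str, cpu: str = "4", memory: str = "8Gi") -> dict:
    """ResourceQuota capping the total resource requests in the namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "default-quota", "namespace": namespace},
        "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory}},
    }


def team_role(namespace: str) -> dict:
    """Namespaced Role: manage Deployments, read pods and their logs."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "team-deployer", "namespace": namespace},
        "rules": [
            {"apiGroups": ["apps"], "resources": ["deployments"],
             "verbs": ["get", "list", "watch", "create", "update", "patch"]},
            {"apiGroups": [""], "resources": ["pods", "pods/log"],
             "verbs": ["get", "list", "watch"]},
        ],
    }


def team_role_binding(namespace: str, group: str) -> dict:
    """Bind the Role above to a team's group, only inside this namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "team-deployer", "namespace": namespace},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Role", "name": "team-deployer"},
        "subjects": [{"apiGroup": "rbac.authorization.k8s.io",
                      "kind": "Group", "name": group}],
    }


def namespace_defaults(namespace: str, group: str) -> list:
    """Everything a new namespace starts with in this sketch."""
    return [
        default_deny_policy(namespace),
        default_quota(namespace),
        team_role(namespace),
        team_role_binding(namespace, group),
    ]
```

One practical note: denying all egress also blocks DNS lookups, so teams usually pair a default-deny policy with an egress rule that allows traffic to the cluster's DNS service.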

So now the question becomes, "Okay, but my application needs to talk to some external service," or "I have two namespaces, and I want my app in namespace A to talk to my app in namespace B." So the application team, just like they would with firewall rules, can now say, "Okay, these two namespaces need to be allowed to communicate on these ports." Network policies allow that. And whether you're using Cilium or Calico or any other CNI, you can do this with labels, and Kyverno can automate labeling across workloads.

So it becomes a very cool and rich way of saying, "Hey, if a pod is labeled with app A and it wants to talk to application B with the right label, allow that. Everything else, deny it." Immediately you've done a few things: not only have you implemented micro-segmentation within your cluster itself, but you have also restricted the attack radius.
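Here is a rough illustration of "allow app A to reach app B, deny everything else" in code. The sketch below builds a NetworkPolicy that admits traffic from pods labeled `app: a` in one namespace to pods labeled `app: b` in another, on one port. It selects the source namespace by the `kubernetes.io/metadata.name` label that recent Kubernetes versions set automatically on every namespace. It also includes a deliberately simplified evaluator (matchLabels only, numeric ports, ingress only, no ipBlock, and namespaces matched only by that name label) so you can check the intended behavior. The namespace names, labels, and port are hypothetical. The evaluator is a teaching model and does not replace your CNI's enforcement.

```python
# Simplified model: matchLabels only, numeric ports, ingress only, no ipBlock.
NS_LABEL = "kubernetes.io/metadata.name"  # set automatically on namespaces


def allow_from(src_ns: str, src_app: str, dst_ns: str, dst_app: str, port: int) -> dict:
    """Let pods labeled app=src_app in src_ns reach app=dst_app in dst_ns on port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{src_app}-to-{dst_app}", "namespace": dst_ns},
        "spec": {
            "podSelector": {"matchLabels": {"app": dst_app}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # namespaceSelector and podSelector in ONE peer entry are ANDed.
                "from": [{
                    "namespaceSelector": {"matchLabels": {NS_LABEL: src_ns}},
                    "podSelector": {"matchLabels": {"app": src_app}},
                }],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


def _matches(selector: dict, labels: dict) -> bool:
    # An empty selector matches everything.
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def _peer_matches(peer: dict, policy_ns: str, src_ns: str, src_labels: dict) -> bool:
    if "namespaceSelector" in peer:
        ns_ok = _matches(peer["namespaceSelector"], {NS_LABEL: src_ns})
    else:
        ns_ok = src_ns == policy_ns  # a bare podSelector means "same namespace"
    return ns_ok and _matches(peer.get("podSelector", {}), src_labels)


def ingress_allowed(policies, src_ns, src_labels, dst_ns, dst_labels, port) -> bool:
    selecting = [
        p for p in policies
        if p["metadata"]["namespace"] == dst_ns
        and "Ingress" in p["spec"].get("policyTypes", ["Ingress"])
        and _matches(p["spec"]["podSelector"], dst_labels)
    ]
    if not selecting:
        return True  # no policy selects the pod: Kubernetes allows all ingress
    for p in selecting:  # pod is isolated; any matching rule in any policy allows
        for rule in p["spec"].get("ingress", []):
            ports = rule.get("ports")
            port_ok = not ports or any(pp.get("port") == port for pp in ports)
            peers = rule.get("from")
            peer_ok = not peers or any(
                _peer_matches(peer, dst_ns, src_ns, src_labels) for peer in peers)
            if port_ok and peer_ok:
                return True
    return False


# Usage: default deny in team-b plus one explicit allow from team-a.
deny_all = {"metadata": {"namespace": "team-b"},
            "spec": {"podSelector": {}, "policyTypes": ["Ingress"]}}
policies = [deny_all, allow_from("team-a", "a", "team-b", "b", 8080)]
```

With these two policies, `app: a` in team-a can reach `app: b` in team-b on 8080. Any other pod, or any other port, is denied, which is the blast-radius reduction Jim describes next.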

Let's say there's a vulnerability in one application and an attacker gets in through it. They break into the container, maybe break out onto the host, or even from the container try to reach other things. They're not allowed to communicate with other pods or applications because of that network policy. So it's super critical. You want to limit their exposure upfront. You don't want one vulnerable application to impact your entire cluster.

Cory:

And this happens more and more, right? It's funny, thinking about software supply chains, there are two easy ways to get into somebody's environment, right?

There are so many teams, even today... with the Bitnami change, you see everybody going, "Oh man, I've got to move my images and my charts and stuff like that." It's like, you're just running other people's images, right? And I'm not saying Bitnami is untrustworthy, but zero trust is zero trust, right? You never know if one of their images is going to get compromised. Or maybe it's just another image: "Ah, I couldn't find this Docker image, so I just grabbed some random one from Docker Hub." And now it's running in your Kubernetes cluster, right? You might be dependent on that, but does that thing need access to your entire cluster day one?

Like the most recent npm attack... I don't know how to say it, I think it's Shai-Hulud, right? There are plenty of sneaky ways to get into a cluster, and it could be through an ANSI color library, right? So network-level security is even more important today.

I feel like it was such a big deal when we were in the data center, but now that we're running so many workloads in our cloud environments, network security is important not only at the VPC or VNet level, wherever you're at in the cloud, but in Kubernetes as well, right? You can't just have things running hog wild in there. And I feel like many teams aren't aware of that one: "Oh, my namespace is isolated, right?" It's like, "No, no, it's not much at all. It's just a space for your name, really."

Jim:

That's all it is. It's an API resource, right? And Kubernetes, look, it gives you all the tools to do that, but you still have to do that, right? Or you use something like Kyverno to just automate it.

Cory:

Oh, man. So, okay, let's say I've been using Kyverno for a while. I've gone through the crawl, walk, run. I downloaded it right after this episode, ran it, and was like, "Ooh, man, we've got a couple of things we need to fix." We start using it. Kyverno is a CNCF project, but behind it there's also Nirmata. So when would somebody say, "Hey, you know what, I need to start looking at Nirmata's offerings beyond Kyverno"?

Like, where does Nirmata fit into the picture?

Jim:

Yeah, interesting question. And, you know, being completely transparent, even our view of this, our product, and other things have evolved. But there's one thing we wanted from the beginning. We looked at other solutions that were out there before we started Kyverno, and as a developer, I hate when somebody says it's open source, but then I realize I can only do 70% of things with the open source and need the commercial version for the other 30%. The stuff I really need, I now have to pay for, right? So we didn't want to do that.

We didn't want to bait and switch and say, "Hey, Kyverno is great until you try to scale it up, and then you're going to have to buy a license," or "Kyverno is great, but it can only do five policies, and then you need something else." So Kyverno is very full-featured and very easy to use, but it's a policy enforcement point.

It's not for administration, it's not for management, and it has no real team-facing features, etc. So where we draw the line, in simple terms: Kyverno detects, so Kyverno finds things, and Nirmata helps fix things. We complete the workflow above Kyverno that teams are going to need.

Now look, if you're a five-person team getting started with Kubernetes, you should still be using Kyverno because, like you said, every production cluster should have Kyverno in it. But you're not going to need Nirmata at that point. We're very happy to have you in the open source community. Come to our meetings, file issues, we'll help you there, that's fine.

But as you grow, as the business grows, as your applications become mission critical or revenue critical, or there are other reasons why you need SLAs, of course we're there to help with that. But more importantly, we help complete the workflows, the human part of this. Like if you want to do remediation, how do you automatically remediate? How do I inform the right team? Nirmata automates all of that.

So we can detect... and these days of course, like every other business, we're leaning heavily into AI... we're building agents now to complement our SaaS and other things. We have agents that integrate with Argo CD; the agent will automatically figure out whether you're using Argo CD or Flux. It will check who deployed things and where the violations are. It can create a PR for that developer, just like Dependabot does for vulnerabilities. The Nirmata agent can create PRs to fix or remediate policy violations.

Cory:

Yeah.

Jim:

So now you've made it super easy for the team to say, "Hey, I have these things. Here are the recommended changes. Do I want to accept this PR and say, yeah, now I'm secure and compliant, or is there anything else I need to do, like retool my app?" But at least that first problem, communicating it to the right team and making sure they understand the solution and why it's important, becomes automated and self-service with Nirmata.

And then what we also see is the flexibility of Kyverno. Platform teams write policies, right? So it's super interesting. It's not just security teams.

By the way, another big use case for Kyverno is cost optimization and resource optimization, right? There are a lot of fancy tools and solutions, but if you really want to optimize things, you've got to do it at the pod level, which is where Kyverno shines. It can track resources in real time, and it can integrate with VPA recommenders and adjust things. So even for those types of use cases, the platform team authors many policies, and security authors other policies. And for taking those policies, or even helping with authoring them, we have another agent that can plug into Cursor or Claude or other tools as MCP tools and help you with policy authoring and testing. Because ultimately this is policy as code. You want to test every change before you deploy it.
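To make "integrate with VPA recommenders" more concrete, the sketch below compares a pod's container requests against a Vertical Pod Autoscaler recommendation and flags requests that sit well above the recommended target. It works on plain dicts shaped like a pod's `spec.containers` and a VPA's `status.recommendation.containerRecommendations`. It does not call Kyverno or the VPA API. The 20% tolerance and the function names are hypothetical, and the quantity parser handles only the common suffixes (`m` for CPU; `Ki`/`Mi`/`Gi`/`Ti` and `k`/`M`/`G`/`T` for memory).

```python
# Binary (Ki, Mi, ...) and decimal (k, M, ...) suffixes used by Kubernetes quantities.
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(q: str) -> float:
    """'250m' -> 0.25 cores, '2' -> 2.0 cores."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)


def parse_memory(q: str) -> float:
    """'128Mi' -> bytes. Common suffixes only (no exponent notation)."""
    if q[-2:] in _MEMORY_SUFFIXES:
        return float(q[:-2]) * _MEMORY_SUFFIXES[q[-2:]]
    if q[-1:] in _MEMORY_SUFFIXES:
        return float(q[:-1]) * _MEMORY_SUFFIXES[q[-1:]]
    return float(q)


def oversized_requests(containers, container_recommendations, tolerance=0.2):
    """Flag requests more than `tolerance` above the VPA target.

    Returns (container, resource, requested, target) tuples; nothing is patched.
    """
    targets = {r["containerName"]: r["target"] for r in container_recommendations}
    findings = []
    for c in containers:
        target = targets.get(c["name"])
        requests = c.get("resources", {}).get("requests", {})
        if not target:
            continue  # no recommendation for this container yet
        for resource, parse in (("cpu", parse_cpu), ("memory", parse_memory)):
            if resource in requests and resource in target:
                if parse(requests[resource]) > parse(target[resource]) * (1 + tolerance):
                    findings.append((c["name"], resource, requests[resource], target[resource]))
    return findings
```

Reporting rather than rewriting the requests keeps this in the same detect-first spirit as audit mode. An enforcement or mutation step could come later, once the recommendations are trusted.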

So our agents and our AI from Nirmata help with all of that and really reduce that burden for larger teams as you grow your platform and your organization.

Cory:

So for things like... let's say it's a change like run as non-root. Maybe there's something in your app that actually requires root, and switching that would break it. Is there anything in Nirmata that would handle that, or is that just up to your rollbacks? The app couldn't start, so roll it back.

Jim:

Yeah. So currently we will flag those unsafe changes and say, "Hey, this change could impact..." But the next step we're taking with our agent is to actually look at the repo, see if there's a Dockerfile, and suggest a PR for that Dockerfile as well.
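The run-as-non-root case Cory raises involves two separate checks: what the pod spec promises, and what user the image actually defaults to. The sketch below shows both. It is not Nirmata's agent, and both helper names are hypothetical. `containers_allowing_root` applies the Kubernetes rule that a container-level `securityContext` overrides the pod-level one. `dockerfile_final_user` finds the last `USER` in the final build stage, using deliberately simplified parsing that ignores line continuations, build args, and parser directives.

```python
def containers_allowing_root(pod_spec: dict) -> list:
    """Names of containers not guaranteed to run as non-root.

    Container-level securityContext overrides the pod-level one. A container
    passes only if runAsNonRoot is true and runAsUser is not explicitly 0.
    """
    pod_ctx = pod_spec.get("securityContext", {})
    offenders = []
    for key in ("initContainers", "containers", "ephemeralContainers"):
        for c in pod_spec.get(key, []):
            ctx = c.get("securityContext", {})
            non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
            user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
            if not non_root or user == 0:
                offenders.append(c["name"])
    return offenders


def dockerfile_final_user(dockerfile: str):
    """Return the last USER set in the final build stage, or None if unset.

    None means the image inherits its base image's user, which is often root.
    Simplified: ignores line continuations, build args, and parser directives.
    """
    user = None
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, rest = line.partition(" ")
        if instruction.upper() == "FROM":
            user = None  # each stage starts from its base image's user
        elif instruction.upper() == "USER":
            user = rest.strip()
    return user
```

If a container is flagged and its Dockerfile sets no non-root `USER`, enforcing run-as-non-root on that workload will likely break it at startup. That is why an automated fix needs to touch the Dockerfile as well as the manifest.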

Cory:

Very cool.

Jim:

Right. Because that's the next step, you know, and that's what humans would have to do.

Cory:

Yeah.

Jim:

So more and more we're kind of, you know, using agentic workflows to automate all of that.

Cory:

And that's cool, because I feel like that's one of those things: grab this day one, and it's going to be easy to run it and see what the report is. Getting it into audit mode is easy. Even getting it going with mutating webhooks is probably pretty straightforward.

But the hard part is the apps that you know you have to keep in audit mode because... I've seen this in so many companies, where it's like, "Ah, we should change that." And then it's, "Eh, there's a bunch of features built on top of it being that specific way." And that's one of those where you end up back in the snowflake problem: "Oh shit, we have to fix this in all 45 of our repos." And that's one of those things...

Jim:

That's tedious.

Cory:

Dude. That is an awesome place to use agentic AI. Like that is a time saver. I really like that.

Well, I know we're getting close to time here. Where can people find you online and where can they learn more about Kyverno and Nirmata?

Jim:

Yeah, so I hang out usually in the Kubernetes Slack or the CNCF Slack. There's a Kyverno channel. So please do join that. You know, reach out if you need any help getting started.

Other than the Slack channels, I'm pretty much on LinkedIn or Twitter, or X, as just Jim Bugwadia. So feel free to reach out or DM me if you need any help.

Cory:

Awesome. Are you going to have a booth at KubeCon this year?

Jim:

Absolutely. Not only a booth, but this is going to be our first KyvernoCon. So super excited about that. Right?

Cory:

Congrats. That's awesome. Okay, I'm going to be there so send me an invite to KyvernoCon. I am in.

Jim:

Absolutely. Will do. And it was mind-blowing. At first we were like, should we do this? Is anybody going to show up? And we were blown away. KyvernoCon is a half-day event, this time in Atlanta. We had eight spots, and we got about 200 submissions for talks and papers. Amazing talks about actual real-world use cases, how people are using this in production. And I was like, "Why didn't we do this earlier?"

Cory:

Heck yeah. That's awesome.

Jim:

There's obviously such a big need for it.

Cory:

That is very awesome. I'm excited for that. Well, awesome. Thanks so much for coming on the show today and I'm going to see you at KubeCon. I'm going to hunt you down.

I'm definitely going to check out KyvernoCon.
