Cloud infrastructure provisioning can be a challenging task, often requiring a delicate balance between speed and accuracy. In our discussion today, we explore how AI can streamline this process. Our guest, Marcin Wyszynski, co-founder of Spacelift, introduces us to their solution, Spacelift Intent, which simplifies infrastructure management by removing the complexities of traditional tools like Terraform. Marcin explains how the platform lets users express their needs directly, enabling quicker provisioning while maintaining the necessary controls and policies. Join us as we delve into the intricacies of AI-driven cloud provisioning and its potential to make infrastructure management more accessible for developers and data scientists alike.
The discussion centers on the challenges of cloud infrastructure provisioning and how AI can provide solutions. Marcin Wyszynski, co-founder of Spacelift, outlines two extremes in the current landscape: the rapid but unrepeatable manual provisioning of cloud resources through console clicks, and the slow, complex processes involving tools like Terraform that require extensive knowledge and setup. He emphasizes that many users, such as developers and data scientists, do not need to become cloud experts; they simply want to provision resources effectively. Marcin introduces Spacelift Intent, a tool designed to simplify this process by letting users express their needs in natural language, which AI translates into API calls. This approach shortens the lengthy deployment cycle associated with traditional tools, making it easier for users to manage infrastructure without deep technical expertise.
What do 160,000 of your peers have in common?
They've all boosted their skills and career prospects by taking one of my courses. Go to atchisonacademy.com.
Hello and welcome to Software Architecture Insights, your go-to resource for empowering software architects and aspiring professionals with the knowledge and tools they require to navigate the complex landscape of modern software design.
Speaker A:How can AI assist you in developing and managing a cloud-based infrastructure?
Speaker A:My guest today is Marcin Wyszynski.
Speaker A:Marcin is the co-founder and chief R&D officer at Spacelift, and he's also a co-founder of the open source project OpenTofu.
Speaker A:He's here to talk about bringing AI and AI based tools to the world of cloud infrastructure provisioning.
Speaker A:Marcin, welcome to Software Architecture Insights and I hope I didn't mangle your name too much there.
Speaker B:That was perfect.
Speaker B:Like a native Polish speaker.
Speaker B:Thanks for having me and welcome everyone.
Speaker A:So why is cloud infrastructure provisioning so hard?
Speaker B:It's a very good question.
Speaker B:I think we kind of made it hard in a sense.
Speaker B:The reason I think we made it hard is because we put the possible solutions on two sides of an ostensibly very broad spectrum.
Speaker B:One is optimizing for extreme speed.
Speaker B:I just go to the console and I click on things like a caveman and I get things done instantly, right?
Speaker B:I can't repeat them.
Speaker B:I don't know what I did and definitely my colleagues don't know what I did.
Speaker B:But it works, right?
Speaker B:And this is the kind of caveman approach.
Speaker B:And then there is another spectrum or another extreme of that spectrum, which is an extreme ceremony.
Speaker B:And that extreme ceremony is what cloud practitioners would preach.
Speaker B:So that extreme ceremony involves writing Terraform or Pulumi or some other arcane language.
Speaker B:Right?
Speaker B:CloudFormation with its arcane YAML.
Speaker B:And it used to be JSON.
Speaker B:Like it's probably the worst JSON I've ever seen in my life.
Speaker B:So you write it.
Speaker B:You put it under some form of CI/CD.
Speaker B:You need some manual approvals, you'll centralize it.
Speaker B:You'll need some staging space or storage for your state file.
Speaker B:You'll need to learn the arcane language.
Speaker B:But not only that, you'll then need to learn the APIs. In the Terraform or OpenTofu space, you'll need to learn the provider.
Speaker B:Then from that you need to learn the API of a particular resource that you need to provision, right?
Speaker B:So if you're just beginning this journey, well, not everyone has to be a cloud practitioner.
Speaker B:Most people just provision cloud resources to keep going with their job.
Speaker B:They're developers, they're data scientists.
Speaker B:They just need something from the cloud.
Speaker B:They're not necessarily obsessed with getting everything right the first time.
Speaker B:It takes days to do things properly.
Speaker B:Right?
Speaker B:It takes days.
Speaker B:And even for an experienced professional in a highly regulated environment, in a highly controlled environment, without the right tooling and without the best practices, it'll take hours.
Speaker B:Right.
Speaker B:So we go between the two extremes.
Speaker B:There's nothing between doing something fast and stupid and doing something very proper but very ceremonial.
Speaker B:And that's what I think makes it kind of difficult.
Speaker B:Yes.
Speaker B:You probably shouldn't do things in the console.
Speaker B:Right.
Speaker B:I mean, other than incident response, I wouldn't advise that anyone does things this way.
Speaker B:But most of the time, when you're just developing cloud infrastructure and you're just playing around, you want to build it iteratively; you don't want to learn all that stuff.
Speaker B:Like it's usually not your bread and butter.
Speaker B:You're an app developer, as I said, you're a data scientist.
Speaker B:Why would you spend hours learning Terraform?
Speaker B:Why would you spend days setting up pipelines?
Speaker B:Right.
Speaker B:And yet if you want to do things properly, this is what the gods of infrastructure kind of require you to do.
Speaker B:And I think the problem with infrastructure development being so difficult is that we made it difficult by not offering anything in between.
Speaker A:Yeah, the way I like to describe it is, you know, there's the time to first provisioning and then the time to second provisioning.
Speaker A:The console makes the time to first provisioning fast, but the time to do anything follow on hard.
Speaker A:And the traditional approach, the Pulumi, CloudFormation, Terraform, CI/CD approach to infrastructure provisioning: the time to first provisioning is extremely long, but then the second and third and fourth are much easier and better. You can control it very well, make changes, and understand it very well, but the first thing takes so much time and effort, just to get it tuned exactly right before it will do anything.
Speaker A:So you're right, these are two extremes here.
Speaker A:So what I'm assuming we're going to talk about next is the middle ground.
Speaker A:And so what you do is you provide a platform to manage infrastructure provisioning, but you do it kind of in this middle ground area.
Speaker A:Is that true?
Speaker B:Exactly.
Speaker B:That's the point.
Speaker B:We're calling it Spacelift Intent, and it comes as an open source version and as a commercial version.
Speaker B:In a nutshell, what we're doing is we're taking the very, very long process of a Terraform, or in this case OpenTofu, deployment, and we kind of short circuit it.
Speaker B:On one side you have user intent: I want a database, I want an S3 bucket, I want a Kafka cluster, I want a Kinesis topic, right?
Speaker B:And then on the other side you have an API call from AWS, from GCP, from Microsoft.
Speaker B:And in between there is a very, very long process that involves writing code: figuring out the right code to author first and then authoring that code, then taking this code through the deployment, setting up the credentials and all that, and then getting it through all the steps.
Speaker B:And ultimately you're talking to provider APIs, you're talking to the hyperscaler APIs.
Speaker B:So what if we took your intent, we took those APIs, and we proxied them through a very well-defined schema, a Terraform provider's schema?
Speaker B:And we let your LLM be the translator.
Speaker B:So you remove Terraform from the mix entirely.
Speaker B:You let the human, assisted by the LLM, make calls directly to the provider.
Speaker B:And then what we do as Spacelift Intent is, first of all, we facilitate this interaction, but we also capture it; we serve as middleware.
Speaker B:So we will provide gates before you talk to an API and after you talk to the provider.
Speaker B:What that means is we are able to first add a policy-as-code layer.
Speaker B:So before you talk to the provider, we'll make sure that what you're asking for is something that you allow, that your organization allows, and that it generally makes sense.
Speaker B:These are the rules that you can set up yourself.
Speaker B:Your admins can do that.
Speaker B:And once you talk to the provider, we will take the state and we'll save it exactly like you'd save Terraform state.
Speaker B:So essentially you have the full Terraform experience without Terraform and without the whole software development life cycle in between.
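[Editor's note] The gate-then-capture flow Marcin describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Spacelift's actual implementation; every name here (`check_policy`, `provider_create`, `provision`) and the sample rule are invented for the sketch.

```python
# Toy model of the Intent-style middleware: a policy gate runs before the
# provider API call, and the returned state is captured afterwards, much
# like Terraform state. All names and the sample rule are hypothetical.

def check_policy(request):
    """Deterministic policy-as-code gate: deny publicly readable buckets."""
    return request.get("acl") != "public-read"

def provider_create(request):
    """Stand-in for a real provider API call; returns state the way a
    Terraform provider returns the attributes of a created resource."""
    return {"type": request["type"], "name": request["name"],
            "id": f"{request['name']}-1234"}

state_store = []  # stands in for the versioned state database

def provision(request):
    if not check_policy(request):     # gate *before* talking to the provider
        return None
    state = provider_create(request)  # the actual API call
    state_store.append(state)         # capture state *after* the call
    return state

result = provision({"type": "aws_s3_bucket", "name": "logs", "acl": "private"})
```

A denied request never reaches the provider and leaves no state behind, which is the "gates on both sides" behavior described above.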
Speaker B:Now, is it the ultimate solution in infrastructure as code?
Speaker B:First of all, there is no code. But no, I think it's just a tool on a very broad spectrum, and it occupies a sort of sweet spot for when you want to do prototyping and you just don't want to be bothered with this entire dance of learning and writing and deploying and approving, et cetera.
Speaker B:And there's another really cool use case here. You know, we sometimes think that Terraform is always greenfield projects, right?
Speaker B:Like everything is terraform.
Speaker B:And by the way, I thought that Spacelift was being used by companies who are already in the green and want to keep being green.
Speaker B:The reality is that it's almost never the case; almost all Terraform projects are brownfield projects.
Speaker B:So a tiny bit is terraformed, a tiny bit is codified.
Speaker B:It could be some other technology; I'm using Terraform almost as a shorthand, but the reality is that it's the most widespread technology.
Speaker B:But part of it is codified, part of it isn't.
Speaker B:People have been doing things on the console.
Speaker B:People have had like a one off script.
Speaker B:Someone tried Ansible, someone tried this, someone tried that, someone did Terraform, but they forgot.
Speaker B:And now you have terraformed resources, but you don't have the code or the state.
Speaker B:So kind of like someone did something on their laptop and then left, right.
Speaker B:So the project is a brownfield project.
Speaker B:Now terraforming it, pulling it under Terraform management, is extremely hard unless you have something like Intent.
Speaker B:Because the beauty of Intent is that it kind of tricks the whole process, because it saves Terraform state.
Speaker B:So even though it doesn't speak Terraform, because the providers return state, you can take that state and you can create HCL out of it.
Speaker B:So your LLM can essentially scan your entire account, see what you're not managing using Terraform, pull those resources, generate beautiful Terraform for you, generate the state, and kind of promote something to a Terraform project, which is an amazing experience compared to whatever we had in the past.
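[Editor's note] The "state to HCL" step he mentions can be pictured as a simple serializer: given the attributes a provider returned for a resource, emit a Terraform-style resource block. Real import tooling handles nesting, types, and references; this sketch only shows the direction of the translation, and the function name is invented.

```python
# Minimal sketch of generating HCL from captured state. Real resource
# attributes are nested and typed; this handles only flat string attributes.

def state_to_hcl(resource_type, name, attrs):
    lines = [f'resource "{resource_type}" "{name}" {{']
    for key, value in sorted(attrs.items()):
        lines.append(f'  {key} = "{value}"')
    lines.append("}")
    return "\n".join(lines)

hcl = state_to_hcl("aws_s3_bucket", "logs",
                   {"bucket": "my-logs", "acl": "private"})
```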
Speaker A:Okay, I want to talk at some point about non-determinism versus determinism and AI.
Speaker A:But I want to hold on to that for a second.
Speaker A:I want to get back to what you're talking about here a little bit.
Speaker A:So you essentially allow AI to go from the idea to a completed configuration change to a cloud provider, and then you snapshot that state and store that for repeatability purposes.
Speaker A:Is that a fair summary?
Speaker B:Correct.
Speaker B:Yeah, it's quite correct.
Speaker B:I wouldn't overestimate the role of AI here.
Speaker B:AI is more like a translator of intent into a particular schema.
Speaker B:So Terraform resources have a very strict schema, and it doesn't leave you a lot of freedom.
Speaker B:See, an LLM will translate one representation of what is required, which is natural language.
Speaker B:And you know, the more details you provide, the more strict that translation will be into the schema that is provided by the Terraform provider.
Speaker B:Okay, so it turns out that LLMs are very good translators.
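[Editor's note] The point about strict schemas can be made concrete: whatever the LLM produces is validated against the provider's resource schema before it is used, so free-form output simply fails. The schema shape below is invented for illustration and is far simpler than a real Terraform provider schema.

```python
# Toy resource schema and validator: an LLM's proposed payload must conform
# before any API call is made. Schema contents are illustrative only.

SCHEMA = {
    "aws_s3_bucket": {
        "required": {"bucket": str},
        "optional": {"acl": str, "force_destroy": bool},
    }
}

def validate(resource_type, payload):
    """Return a list of violations; an empty list means the payload conforms."""
    spec = SCHEMA[resource_type]
    errors = [f"missing required attribute: {k}"
              for k in spec["required"] if k not in payload]
    allowed = {**spec["required"], **spec["optional"]}
    for key, value in payload.items():
        if key not in allowed:
            errors.append(f"unknown attribute: {key}")
        elif not isinstance(value, allowed[key]):
            errors.append(f"wrong type for attribute: {key}")
    return errors

ok = validate("aws_s3_bucket", {"bucket": "my-logs", "acl": "private"})
bad = validate("aws_s3_bucket", {"region": "us-east-1"})
```

The fewer degrees of freedom the schema leaves, the narrower the space of outputs the LLM can produce that actually pass, which is the guardrail effect described above.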
Speaker A:Yeah, exactly.
Speaker A:So that actually answers, I guess, the core question: are you using AI to modify your infrastructure, or are you using AI to essentially create Terraform scripts?
Speaker A:And you're using those scripts to actually modify your infrastructure.
Speaker B:Something in between.
Speaker A:Something in between.
Speaker B:So there is no HCL, there's no Terraform in between; we don't generate code that we would run through Terraform.
Speaker A:You don't generate the Terraform script per se.
Speaker B:Not a Terraform script per se, but we generate the input that the Terraform script would pass to the provider.
Speaker B:So we short circuit that language translation layer.
Speaker B:Because if you think about it, what Terraform is, is there are two layers of translation here.
Speaker B:The first layer of translation is a human-to-machine process where you use a programming language, or really more like a markup language.
Speaker B:You use a markup language to express something as a computer instruction.
Speaker B:That's the HCL code, the Terraform code.
Speaker B:Right.
Speaker B:So there's one layer of translation, and then what Terraform does is it translates the intent expressed as HCL into an API call to the Terraform provider.
Speaker B:So there are two parts of it.
Speaker B:And what we're doing is we're saying, look, we don't need that middle layer, because if we're doing a translation, you can actually do that translation directly between the user intent, what the user wants, and the schema.
Speaker B:Because an LLM is your assistant in this case.
Speaker B:So you don't necessarily need that extra formality that comes with Terraform and you get something for free.
Speaker A:Now, in a traditional infrastructure-as-code environment, that script plays an important role.
Speaker A:The script is what is checked into revision control and compared against past revisions.
Speaker A:It's used for evaluating changes: you do a pull request to get a change reviewed by your peers and to make sure it's reasonable, with approval processes as well. And during an incident, you can check to see what changes have occurred in the past and whether they might have caused the incident, et cetera.
Speaker A:So all of those sorts of things are value that you get out of that script.
Speaker A:Now if we pull that script out, what part of the system is providing that particular set of value?
Speaker B:So the historical data is still there because the state is versioned.
Speaker B:So if you want to understand what changes have happened, first of all you have a full history of each resource and each operation in your database inside Intent.
Speaker B:For open source that would be a SQLite or a Postgres.
Speaker B:For the commercial version, that is something that Spacelift would host.
Speaker B:So what you'd get from state history is still there.
Speaker B:So your incident management story is still there; it still ticks that box.
Speaker B:Now, what is not covered specifically by Intent is the ability to replay something from that state.
Speaker B:So what we do in this case is we allow you to export an Intent project, a collection of resources in a single logical entity, to HCL.
Speaker B:And so you can take that HCL and either go through Terraform and essentially replay it as a Terraform project, or you can do something even more devious.
Speaker B:You can pass it to an LLM because it's also an expression of intent.
Speaker B:So you kind of export the instructions.
Speaker B:And the LLM doesn't care whether the instructions are in a human language or in a computer language; as long as it can map them directly to a schema, it's good.
Speaker B:And so it is actually very good at translating even Terraform code into API calls without Terraform.
Speaker B:Right.
Speaker B:But in general, the moment that you need to replay what you did in a production environment, I would say that's the moment where you just eject to the HCL and, you know, graduate to a production-ready process.
Speaker B:And you're right, you're getting a lot of value from that process.
Speaker B:The question is at which stage you need to invest that time.
Speaker B:And what Intent gives you is the ability to really get started quickly and not have to pay this, you know, ginormous price up front that you'd otherwise pay to get something terraformed.
Speaker B:Right.
Speaker B:You can start very slowly and kind of ease your way into very formal infrastructure management.
Speaker A:This is the middle ground approach, right?
Speaker A:It's where you provide something that's as easy as using the console, maybe easier, and certainly easier than the formal mechanism.
Speaker A:So you can get started a lot quicker.
Speaker A:But the assumption is that by the time you move into an enterprise-grade production environment, you probably have moved off of Spacelift and into a more formal CI/CD system.
Speaker B:Off of Intent, off of Intent.
Speaker B:So yeah, in Spacelift we would provide you that golden path, where you say, okay, the Intent project is promoted to a Spacelift stack and that's it; the code is generated, the state is there, and it now operates as a version-controlled, formal infrastructure-as-code project.
Speaker B:And then you can replay it in a different environment and you'll get the same results, et cetera.
Speaker B:So yeah, at that point, this is it.
Speaker A:So how does intent help with that migration?
Speaker A:If you're under the assumption that you're building a production stack but you want to build it fast, so you use intent to get up and running and the infrastructure up and going easily.
Speaker A:But now you've got to take the next step and move it.
Speaker A:You're going to go live, you're going to build a production environment and you want to put all the production safeguards in place before you actually do that.
Speaker A:What types of tooling or abilities exist for taking that Intent-managed infrastructure and moving it to a full IaC stack?
Speaker B:That's a good question.
Speaker B:So to clarify what Intent is: Spacelift Intent is an MCP server.
Speaker B:Its external functional surface is that of an MCP server.
Speaker B:So all the functionality is exposed through what are known as MCP tools: essentially APIs that LLMs can call.
Speaker B:It's not very different from RESTful endpoints in HTTP.
Speaker B:It's almost exactly the same thing.
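[Editor's note] The MCP analogy can be sketched as a tiny tool registry with a dispatcher: each tool is a named operation the LLM invokes with structured arguments, much like calling a RESTful endpoint. The tool names and return shapes below are invented, not the actual Spacelift Intent tool surface.

```python
# Toy MCP-style tool registry: named operations an LLM client invokes with
# keyword arguments. Tool names and payloads are hypothetical.

TOOLS = {
    "export_hcl": lambda project: f"# HCL for project {project}",
    "export_state": lambda project: {"project": project, "resources": []},
}

def call_tool(name, **kwargs):
    """Dispatch a tool call the way an MCP server routes a client's request."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

hcl_out = call_tool("export_hcl", project="demo")
state_out = call_tool("export_state", project="demo")
```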
Speaker B:So we expose those abilities as tools, and we expose two tools specifically.
Speaker B:Because Terraform, when you think about it, is actually composed of two main artifacts.
Speaker B:There is the declaration code, the HCL code, the TF files, and there is the state.
Speaker B:We know what resources you're currently managing, so we can generate the HCL for you.
Speaker B:And because we have the state generated by providers, we just give you the state and we give you the HCL code.
Speaker B:So we give you the two artifacts that are sufficient to start managing that through Terraform immediately.
Speaker B:So we essentially give you all the components of a Terraform project, both the state and config.
Speaker B:You're good.
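[Editor's note] The two-artifact handoff in miniature: once both the config and the state exist on disk, a Terraform project can adopt the resources. The file names follow Terraform convention (`main.tf`, `terraform.tfstate`), but the contents here are toy examples, not a complete real state file.

```python
# Write the two artifacts that make a resource adoptable by Terraform:
# the HCL config and a (heavily simplified) state file.
import json

config = 'resource "aws_s3_bucket" "logs" {\n  bucket = "my-logs"\n}\n'
state = {
    "version": 4,  # Terraform state format version
    "resources": [
        {"type": "aws_s3_bucket", "name": "logs",
         "instances": [{"attributes": {"bucket": "my-logs"}}]}
    ],
}

with open("main.tf", "w") as f:
    f.write(config)
with open("terraform.tfstate", "w") as f:
    json.dump(state, f, indent=2)
```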
Speaker A:Got it.
Speaker A:Great, thank you.
Speaker A:So let's get back to this deterministic standpoint here now.
Speaker A:So one of the advantages and disadvantages of modern AI today, and I'll use that term very generically, is that it's non-deterministic, right?
Speaker A:You take a given set of controlled input, supply it to an AI multiple times, and you're going to get different results each time, or some combination of results that differ each time.
Speaker A:But in the act of provisioning infrastructure, you need things to be very deterministic.
Speaker A:I need a server that looks like this and I need it 20 times.
Speaker A:That's a very deterministic set of things to do.
Speaker A:You wanna do the exact same thing 20 times in order to have your 20 servers set up exactly the same way, or you want at least to have a level of predictability in what it does.
Speaker A:And traditional historical AI doesn't do that.
Speaker A:And so how do you deal with that aspect of AI in building and using the Intent project for creating provisioned resources?
Speaker B:It's a very good question; so we kind of live with it, in a sense.
Speaker B:I mean, AI is good at certain things.
Speaker B:It's not very good at certain things.
Speaker B:We're trying to use it for the things that it's good at and not have people use it for the things that it's not very good at.
Speaker B:If you ask the same question of your model 20 times, even if you set temperature to zero, you'll probably get 20 different answers.
Speaker B:There are, however, ways of mitigating that.
Speaker B:First of all, even in the absence of other guardrails, the more strict you are with your expectations of an LLM, the fewer degrees of freedom you leave to the LLM to generate an answer, the more deterministic it is.
Speaker B:Empirically, we found out that the Terraform schemas, with their extreme verbosity and very good descriptions, are very, very prescriptive.
Speaker B:And so they serve as very powerful guardrails that generally keep the LLM focused.
Speaker B:So that's one thing.
Speaker B:The other thing that we did is we set up policy-as-code guardrails.
Speaker B:You know, you say that LLMs are non-deterministic, and you're completely right.
Speaker B:Well, I'll ask a very provocative question.
Speaker B:Who writes code?
Speaker B:Humans write code.
Speaker B:Are humans deterministic?
Speaker A:No.
Speaker B:No.
Speaker B:And the reason is that leaders realize people make mistakes, and it's inevitable that they'll make mistakes.
Speaker B:Right?
Speaker B:Mistakes are part of the job.
Speaker B:So what you do is you set up processes, you set up policies, and you codify policies to make sure that someone having a worst day will not blow up your production.
Speaker B:So the thing that you do for a human, why wouldn't you do it for an LLM?
Speaker B:Of course, the LLM will probably find more ways of being stupid than people.
Speaker B:Although, I mean, what people can do, you probably know yourself that there's probably no limit to what people can do, and there's probably no limit to what LLMs can do.
Speaker B:And so you kind of assume that both will make mistakes and you're trying to codify those policies.
Speaker B:So what Intent will give you is the tooling to do those policies: first of all to author the policies, and then to enforce them.
Speaker B:So essentially, you're trying to keep an LLM on track with two guardrails.
Speaker B:One guardrail is very, very strict instructions from the schema.
Speaker B:And, you know, the stricter the instructions you give in your expression of intent, the more predictable the intersection of the two will be.
Speaker B:And the left guardrail is the policy-as-code framework, which is completely deterministic, right?
Speaker B:Once you have a rule, there is no interpretation.
Speaker B:You'll go through the interpreter that is fully deterministic.
Speaker B:There is input, there's a rule, there is an output, there is a decision.
Speaker B:Right?
Speaker B:The LLM will do nothing about it.
Speaker B:It'll just learn.
Speaker B:Okay, I'm sorry, but I did something I shouldn't have done.
Speaker B:Here's the explanation.
Speaker B:Human, what do I do now?
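[Editor's note] The determinism contrast is worth seeing in code: a policy rule is an ordinary function of its input, so the same request always yields the same decision no matter how many times it runs. The rule below is a Python stand-in; real policy-as-code frameworks typically use a dedicated language such as Open Policy Agent's Rego.

```python
# A deterministic policy decision: input in, rule applied, decision out.
# Running it 20 times on the same input gives 20 identical answers,
# which is exactly what an LLM call does not guarantee.

def decide(request):
    approved = {"t3.micro", "t3.small"}  # illustrative allow-list
    return "allow" if request.get("instance_type") in approved else "deny"

decisions = [decide({"instance_type": "m5.24xlarge"}) for _ in range(20)]
```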
Speaker A:Basically, what you're saying is AIs make mistakes.
Speaker A:Yes, but so do the humans we're trying to replace in this case; they make mistakes too.
Speaker A:The net result is we have processes in place to make sure that what the humans do isn't destructive.
Speaker A:We can do the same thing with AI.
Speaker A:And that's what you do, correct? You put this process in place to prevent the mistakes that are fatal, allow only the livable mistakes, and be able to deal with them.
Speaker B:And ideally, there is a way to make those mistakes cheap.
Speaker B:So you make those mistakes in a throwaway or pre-production environment, and then the production environment is fully deterministic. How do I let an LLM or a human make mistakes that will not cost me my business?
Speaker B:Well, generally, I let them make those mistakes in an air-gapped or sandbox environment, the same way, you know, I wouldn't let my kids run onto train tracks.
Speaker B:But if a child does something that is generally safe, where you know that they'll learn a lesson, then you might actually want to let them make that mistake so that they learn from their own experience.
Speaker B:And in a sense, if we can provide a safe space for LLMs to do that, you know, running preview changes and running those policies in front of any API calls, then an LLM gets the same kind of sandbox learning experience that a human otherwise would.
Speaker B:There's no difference between the two.
Speaker A:So this is an interesting way of doing AI based provisioning and I like it.
Speaker A:But the more classic way of involving AI in the infrastructure as code model is more akin to the vibe programming approach.
Speaker A:You don't bypass the terraform step.
Speaker A:You use AI as a tool to assist you in building Terraform.
Speaker A:Now, my guess is you think the approach that Intent takes is better than that, or at least different.
Speaker A:Under that assumption, what do you think of that approach, and how does it compare to Intent?
Speaker A:And why is one better than the other?
Speaker B:I think neither is superior.
Speaker B:I think they both have their place.
Speaker B:The way I would describe it is, with Terraform, when all you know is Terraform, everything looks like a Terraform problem.
Speaker B:So when all you have is a hammer, everything looks like a nail.
Speaker B:My understanding of AI-assisted Terraform generation is that you still have a hammer, you just make it swing faster.
Speaker B:And that's, by the way, a major win.
Speaker B:But you still need the rest; okay, you can author Terraform a little bit faster.
Speaker B:So you've removed that part of a very long loop.
Speaker B:You haven't removed the rest.
Speaker B:Right.
Speaker B:So you made a hammer faster.
Speaker B:But the rest of a process is still there.
Speaker B:Sometimes you need that process.
Speaker B:I'm not against using AI to generate Terraform.
Speaker B:I think it's a very valid approach at a certain level of maturity of a project and maturity of a team.
Speaker B:And I think an ideal situation would have you be able to create infrastructure from intent, in that sort of human-in-the-loop way that we do it with Spacelift Intent, and then allow you to use the same approach in a more formalized fashion, with code review, and with the Terraform being the thing that is reviewed.
Speaker B:I don't think there is necessarily a conflict or one is better than the other.
Speaker B:With Spacelift Intent we just give you one tool, and I think AI-assisted Terraform generation is another tool.
Speaker B:Both valid, both have their use cases and I think the future belongs to teams choosing the right tool for the job.
Speaker B:Whereas the right tool might be a different tool at various stages of maturity and the project lifecycle.
Speaker A:Intent is a tool along the transition, not the end state, is a good way to put it.
Speaker A:I love that.
Speaker A:Yeah.
Speaker A:Thank you.
Speaker A:My guest today has been Marcin Wyszynski.
Speaker A:Marcin is the co-founder of Spacelift, an AI-enabled cloud infrastructure provisioning product.
Speaker A:Marcin, I want to thank you for joining me on Software Architecture Insights.
Speaker B:Thank you for having me.
Speaker A:Thank you for joining us on Software Architecture Insights.
Speaker A:If you found this episode interesting, please tell your friends and colleagues. You can listen to Software Architecture Insights on all of the major podcast platforms.
Speaker A:And if you want more from me, take a look at some of my many articles at Software Architecture.
Speaker A:And while you're there, join the 2,000 people who have subscribed to my newsletter so you always get my latest content as soon as it is available.
Speaker A:Thank you for listening to Software Architecture Insights.