Today's discussion centers on the vulnerabilities associated with AI systems and the increasing threats they face. Our guest, Preston Wood, the Chief Security and Strategy Officer at Databox, highlights the lack of transparency in AI technologies as a significant factor that makes them more susceptible to attacks. We explore how this obfuscation creates challenges in understanding and defending against potential threats. As AI continues to advance, we also consider the evolving nature of phishing attacks and the importance of robust data management strategies to mitigate risks. This episode aims to provide insights for software architects and leaders on navigating the complexities of AI integration while ensuring security and reliability.
The podcast episode features an insightful discussion about the growing vulnerabilities associated with AI systems. The guest, Preston Wood, the Chief Security and Strategy Officer at Databox, addresses the surge in AI-related attacks, emphasizing the need for greater transparency and understanding of AI operations. He explains that the ambiguous nature of AI systems makes them appealing targets for attackers, who can exploit the lack of visibility into how these systems function. Throughout the conversation, Preston highlights the importance of ensuring that AI-generated data is clean and comprehensible to mitigate risks. He compares today's AI landscape to early phishing attacks, which have evolved into sophisticated threats due to advancements in AI technology. This episode serves as a crucial resource for software architects and technology leaders, offering them guidance on how to navigate the complexities of securing AI systems and understanding the implications of AI on data management and security practices.
Hello and welcome to Software Architecture Insights.
Speaker A:Your go-to resource for empowering software architects and aspiring professionals with the knowledge and tools they need to navigate the complex landscape of modern software design.
Speaker A:My guest today is Preston Wood.
Speaker A:Preston is the Chief Security and Strategy Officer at Databox.
Speaker A:Databox is an AI-powered data pipeline management platform for enterprises to gather, manage, and move data reliably.
Speaker A:Preston, welcome to Software Architecture Insights.
Speaker B:Thanks for having me, Lee.
Speaker A:So as AI use has grown over the last several years, obviously so have threats against AI-based systems.
Speaker A:What would you say is the biggest thing that's made AI-based systems so vulnerable to attack?
Speaker B:Well, I, you know, I think it's a great question.
Speaker B:I mean we certainly have seen just the explosion of AI over the last several years.
Speaker B:I think we're going to continue to see these types of attacks expand, and there's a number of reasons why, but one that's worth mentioning is really the lack of transparency and visibility around what is actually going on within your AI infrastructure and your AI decisioning: what are these models actually doing and how are they functioning.
Speaker B:And because of that, you certainly have products such as ours that are driving towards making sure that the data you are generating and leveraging in AI is clean, understandable, and transparent.
Speaker B:So I think some of the problems come down to the fact that AI is new and there's not a lot of transparency into how some of these agents are actually functioning.
Speaker B:And so attackers are able to capitalize on that shiny new coin.
Speaker A:So I know sometimes people will think AI is just another technology, right?
Speaker A:And so it's really no different than any other technology: when a new technology comes out, we need new ways to protect it, et cetera.
Speaker A:But what you're kind of saying is AI is a little bit more than that simply because it's not at all clear what AI is doing under the hood.
Speaker A:So there's a lot of unknown involved in that that makes it harder to protect.
Speaker A:Is that kind of what you're trying to say?
Speaker B:Yeah, that's one reason, sure, 100%.
Speaker A:So how does that make it harder?
Speaker A:I mean, what is it about the fact that we don't understand how the technology works that makes it easier for bad actors to attack it, or makes it a more likely target for them?
Speaker B:Well, you know, obfuscation has always been a tactic in a hacker's toolbox.
Speaker B:You know, the less transparent something is, the more obtuse it is.
Speaker B:You know, hackers can usually leverage that lack of understanding to drive, you know, some level of attack or drive a user to do something that they normally wouldn't do.
Speaker B:And given the world of AI we're seeing, it's becoming very, very hard to differentiate between what's being generated by AI versus what is not.
Speaker B:Some of this you could actually kind of point back to the early days of phishing campaigns where phishing emails were sort of crafted and sent into organizations to click on something and become a point of compromise.
Speaker B:And in the early days those phishing emails were easy to spot, with misspellings and bad domains and these types of things, but they still caught people because there was an element of wanting to click on that link.
Speaker B:Today's phishing emails are largely driven by AI.
Speaker B:They're clean, the language is good, URLs are good.
Speaker B:And it becomes incredibly difficult for users and organizations to differentiate.
Speaker B:Two, the obtuseness and abstractness of what's going on with AI, how it's making decisions, and what it is actually providing back to you.
Speaker B:And I think we've all kind of heard of the, the concept of hallucinations.
Speaker B:The AI will send something back that sounds very real and very convincing, but it's a hallucination.
Speaker B:And so how AI is actually generating what it's generating to, you know, be leveraged within an attack is unknown sometimes.
Speaker B:And so if you don't understand exactly how that attack is occurring, it's difficult to position yourself in a place that you can thwart that attack.
Speaker B:And then you have the speed factor.
Speaker B:So, you know, the data and information and intelligence is moving very, very rapidly.
Speaker B:And attackers can leverage AI to iterate through developing a world-class phishing email or some sort of exploit, or adapt on the fly.
Speaker B:I think there was actually just a release by Anthropic where they detailed an attack that they witnessed as part of leveraging AI agents.
Speaker A:Yeah, I think one of the things I worry a lot about is the dramatic improvement of phishing attacks in particular. AI is really good at talking to you and having decent conversations with you.
Speaker A:And the fact that it can do that means it can pretend to be whoever you want it to be.
Speaker A:And whether it is or not is another matter.
Speaker A:I was talking to a customer support agent for my home Internet connection the other day, trying to solve a technical problem where rebooting the modem didn't work, and they were going through some setting adjustments and all that sort of stuff to get it working.
Speaker A:And I was talking to this agent for about, oh, 20, 25 minutes, and then I suddenly realized that it was an AI agent going through some very sophisticated debugging techniques, and it was so good at conversation.
Speaker A:I didn't know it was AI until about 15, 20 minutes into the conversation.
Speaker A:With that level of sophistication, I thought to myself: now, what if this was a phishing attack of some sort? It wasn't.
Speaker A:But if it was, how could I, as a mere human, possibly detect that, given this was a high-quality AI with the ability to have rather natural conversations with me?
Speaker A:How could I possibly know it was an attack?
Speaker B:Yeah, it's becoming increasingly difficult. Just pulling on that phishing thread a little bit.
Speaker B:As I mentioned, back in the day the signals around identifying something as a phishing email were much easier.
Speaker B:Misspellings again, bad domains, links that just didn't look right.
Speaker B:Today, those signals are becoming less and less.
Speaker B:And so there's the ability to leverage technology, and in many cases actually leverage AI, to tear apart some of those things in real time so you can find some of those signals.
Speaker B:But for end users, there's always sort of those telltale signs of, well, do you know who this person is?
Speaker B:Does the email look appropriate?
Speaker B:Does the link you're clicking on look like it's supposed to look?
Speaker B:I think the age of AI here is really driving people to be more attuned and more suspicious of the information that they're seeing.
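To make those telltale signals concrete, here is a minimal sketch of the kinds of checks described above: sender and Reply-To domain mismatches, and links whose visible text doesn't match where they actually point. The heuristics, addresses, and domains are illustrative assumptions, not anything from the conversation or from Databox's product.

```python
# Minimal sketch of classic phishing "telltale signal" checks:
# sender/Reply-To mismatches and links whose visible text hides a
# different target domain. Purely illustrative, not a production filter.
from urllib.parse import urlparse

def domain_of(address_or_url: str) -> str:
    """Extract a lowercase domain from an email address or URL."""
    if "@" in address_or_url:
        return address_or_url.rsplit("@", 1)[-1].lower()
    parsed = urlparse(address_or_url if "//" in address_or_url else "//" + address_or_url)
    return (parsed.hostname or "").lower()

def phishing_signals(sender: str, reply_to: str, links: list[tuple[str, str]]) -> list[str]:
    """Return a list of human-readable warnings for a single email."""
    warnings = []
    if domain_of(sender) != domain_of(reply_to):
        warnings.append(f"Reply-To domain {domain_of(reply_to)!r} differs from sender {domain_of(sender)!r}")
    for text, href in links:
        # Visible link text looks like a URL but points somewhere else entirely.
        if "." in text and domain_of(text) and domain_of(text) != domain_of(href):
            warnings.append(f"Link text {text!r} hides target domain {domain_of(href)!r}")
    return warnings

if __name__ == "__main__":
    print(phishing_signals(
        sender="support@yourbank.com",
        reply_to="help@yourbank-secure.net",
        links=[("yourbank.com/login", "https://yourbank.login-verify.ru/session")],
    ))
```

In practice, these string-level checks are exactly the signals that AI-written phishing now defeats, which is why the conversation points toward using AI itself to tear messages apart in real time.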
Speaker A:Yeah, there's always this race, right, between technology for bad actors and technology to detect what bad actors are doing and which one's further ahead.
Speaker A:So AI also can help with detecting these attacks and be very useful from a white hat standpoint as well.
Speaker A:But how is that race going?
Speaker A:Are we improving our technology, for instance, to detect that a voice is AI generated versus real voice?
Speaker A:Are we getting better at our ability to detect faster than the ability for it to be undetectable?
Speaker B:We're getting there.
Speaker B:But you're absolutely right, it's a race.
Speaker B:And information security, and security in general, has always been this race where the bad guy advances and the good guy has to run to catch up.
Speaker B:And you always have this leapfrogging effect that occurs.
Speaker B:And we're in that space right now where the technologies and the ability to detect and flag these things on the fly are getting better, but still obviously need some improvement, as this is such a fast-paced environment.
Speaker B:And I think we're all familiar with Moore's Law with regards to chips.
Speaker B:I wonder what Moore would think when he saw the evolution of AI.
Speaker A:That's true, that's true, exactly.
Speaker A:I'm thinking a lot of different dimensions with that, but absolutely that's a valid point.
Speaker A:Now, you mentioned something, and I think you coined the term hands-off AI, or maybe you didn't coin the term, but you use it in some of the things that you talk about: that most organizations aren't ready yet for hands-off AI.
Speaker A:So first of all, can you tell me what you mean by hands-off AI specifically, so our listeners know exactly what we're talking about here?
Speaker B:Well, I don't know about hands-off so much.
Speaker B:I think there's a couple of ways to think about AI in terms of how it's leveraged.
Speaker B:And I think that this ends up being a journey for organizations.
Speaker B:And organizations are on different paths of this journey right now of leveraging AI to answer some basic questions.
Speaker B:And this could be your customer service prompts, these types of things where there's a deterministic kind of outcome, and then you start moving into non-deterministic outcomes where the AI is generating paths that weren't necessarily determined upon input.
Speaker B:And in those cases, I think organizations are putting a human in the loop to approve certain high-risk activities or transactions where the AI ran down a non-deterministic path, so that the organization gets comfortable that they've got the right guardrails around it.
Speaker B:Yeah, I'm getting the right information.
Speaker B:This looks like a good path to take.
Speaker B:Okay, I'm going to approve this to move forward.
Speaker B:So I think you see organizations putting humans in the loop before you move into fully agentic or hands-off AI, where you're comfortable with the cleanliness of the data you're sending the AI and you're fully aware of what the model is doing.
Speaker B:And even though there may be some non-deterministic paths it's going to take, you've got the right guardrails in place to prompt humans in the middle or otherwise before you're truly in this agentic or hands-off AI mode, and organizations are at varying stages of that journey right now.
Speaker B:And we'll continue to kind of see that comfort grow.
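The human-in-the-loop guardrail described here can be sketched as a simple approval gate: routine, low-risk actions execute automatically, while high-risk or non-deterministic ones wait for a person. The risk scores, threshold, and action names below are hypothetical placeholders under those assumptions, not a description of any real system.

```python
# A minimal sketch of a human-in-the-loop approval gate for agent actions:
# low-risk actions proceed automatically, high-risk ones queue for approval.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ProposedAction:
    name: str
    payload: dict
    risk: float  # 0.0 (routine) .. 1.0 (high risk), assigned by your own policy

@dataclass
class ApprovalGate:
    risk_threshold: float = 0.5
    pending: list[ProposedAction] = field(default_factory=list)

    def submit(self, action: ProposedAction, execute: Callable[[ProposedAction], None]) -> str:
        if action.risk < self.risk_threshold:
            execute(action)                 # hands-off path for routine actions
            return "executed"
        self.pending.append(action)         # a human must approve high-risk actions
        return "awaiting approval"

    def approve(self, name: str, execute: Callable[[ProposedAction], None]) -> None:
        for action in list(self.pending):
            if action.name == name:
                self.pending.remove(action)
                execute(action)

if __name__ == "__main__":
    gate = ApprovalGate()
    run = lambda a: print(f"running {a.name}")
    print(gate.submit(ProposedAction("refresh_dashboard", {}, risk=0.1), run))
    print(gate.submit(ProposedAction("refund_customer", {"amount": 900}, risk=0.9), run))
    gate.approve("refund_customer", run)
```

The design choice is the one Preston describes: the gate sits outside the model, so an organization can loosen the threshold as its comfort with the AI's behavior grows.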
Speaker B:I kind of relate this back to cloud computing.
Speaker B:As cloud computing came on the scene, there was a lot of concern around how do we secure the cloud, how do we manage this?
Speaker B:And so organizations sort of tiptoed into that to where we sit today.
Speaker B:There's an understanding of, you know, how the cloud works and how to secure it and how to manage it.
Speaker B:And we're seeing the same sort of journey with AI right now, but it's moving, in my opinion, much, much quicker than we saw cloud.
Speaker A:Yeah, no, I was very heavily involved in that "how do you secure the cloud" mindset.
Speaker A:And I can almost tell you exactly when the transition occurred from I can't trust the cloud to I can't trust not using the cloud.
Speaker A:You know, it was an amazing transformation that occurred.
Speaker A:So you suspect the same sort of thing will happen with AI at some point in time?
Speaker B:Yes, absolutely.
Speaker B:Absolutely.
Speaker B:We're just on that journey right now of opening up transparency, learning, and understanding what these guardrails need to be for AI.
Speaker A:So I hear you talking about guardrails a lot.
Speaker A:Are guardrails the solution, or is better model training the solution, or is it a combination?
Speaker B:Yeah, I wouldn't say there's a silver bullet here.
Speaker B:Model training, you know, 100%.
Speaker B:How, how is your model reacting?
Speaker B:How is your model being trained?
Speaker B:You know, what alignment does that model have? That's equally as important as the guardrails.
Speaker B:And those guardrails are a number of different things: humans in the loop, managing prompt engineering, managing what is going into versus what is coming out of your AI, in addition to managing the speed and performance of it as well.
Speaker B:So I wouldn't say there's a silver bullet in any of this at this point.
Speaker B:It never is with security.
Speaker B:There's always a layered approach with security.
Speaker B:So it's these guardrails around transparency and understanding how your AI is working, through to a number of those other aspects of securing the ecosystem.
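As a rough illustration of that layered idea, the sketch below checks what goes into a model and what comes out of it, independently of the model itself. The blocklist patterns and the crude word-overlap grounding test are stand-in assumptions; a real deployment would use an organization's own policies and far stronger checks.

```python
# A minimal sketch of layered input/output guardrails around a model call.
import re

BLOCKED_INPUT_PATTERNS = [r"ignore previous instructions", r"\bssn\b", r"\bpassword\b"]

def check_input(prompt: str) -> list[str]:
    """Flag prompts that violate input policy before they reach the model."""
    return [p for p in BLOCKED_INPUT_PATTERNS if re.search(p, prompt, re.IGNORECASE)]

def check_output(answer: str, source_documents: list[str]) -> list[str]:
    """Crude grounding check: flag sentences with little overlap with the sources."""
    issues = []
    source_words = set(" ".join(source_documents).lower().split())
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        words = set(sentence.lower().split())
        if words and len(words & source_words) / len(words) < 0.2:
            issues.append(f"possibly ungrounded: {sentence!r}")
    return issues

def guarded_call(prompt: str, model, sources: list[str]) -> str:
    if (violations := check_input(prompt)):
        return f"blocked: {violations}"
    answer = model(prompt)               # any callable that returns a string
    if (issues := check_output(answer, sources)):
        return f"needs review: {issues}"
    return answer

if __name__ == "__main__":
    fake_model = lambda p: "Our refund policy allows returns within 30 days."
    print(guarded_call("What is the refund policy?", fake_model,
                       ["Refund policy: returns accepted within 30 days of purchase."]))
```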
Speaker A:Yeah, I know one of the things that a lot of people talk about now is using AI as a guardrail against AI. And some of those guardrails, as you say, are inbound data, transparency of data, filtering of output, et cetera.
Speaker A:But some of those are actually using AI to analyze what you're doing and what another AI is responding with. How effective is that, is it an important aspect, and how will that evolve over time?
Speaker B:Yeah, no, I think that that will be an important aspect as well.
Speaker B:It's kind of your A/B testing of AI, right? For any of the AI work that we're working on, we bounce that up against various large language models, the open ones and then some of the smaller ones out there, so that you understand, here's my expected answer to these sorts of questions.
Speaker B:And here's what these different AIs are providing, and kind of understanding why that is the case, and making sure that you're aligning to the right models that can reach your outcome.
Speaker B:So there's the concept of quality testing: ensuring that the LLMs are delivering what you expect, and at the same time bouncing that up against other models to understand if one's hallucinating or not, or if one's got an alignment slant that doesn't fit the questions you're asking.
Speaker B:I think having the understanding and skill set to test your hypotheses and decisions across multiple LLMs will become the norm.
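A minimal sketch of that cross-model quality testing might look like the following: send the same question to several model callables, score each answer against an expected one, and flag outliers. The word-overlap scoring and the lambda "models" are illustrative assumptions standing in for real LLM clients and a proper evaluation metric.

```python
# A minimal sketch of bouncing one question off several models and scoring
# each answer against an expected answer to spot hallucinations or drift.
def overlap_score(expected: str, answer: str) -> float:
    """Fraction of words in the expected answer that appear in the response."""
    want, got = set(expected.lower().split()), set(answer.lower().split())
    return len(want & got) / len(want) if want else 0.0

def compare_models(question: str, expected: str, models: dict) -> dict:
    """Run one question through several model callables and score each answer."""
    results = {}
    for name, model in models.items():
        answer = model(question)
        results[name] = {"answer": answer, "score": round(overlap_score(expected, answer), 2)}
    return results

if __name__ == "__main__":
    models = {
        "model_a": lambda q: "The pipeline retries failed batches three times.",
        "model_b": lambda q: "Batches are never retried.",  # possible hallucination
    }
    expected = "Failed batches are retried three times"
    for name, r in compare_models("How are failed batches handled?", expected, models).items():
        print(name, r["score"], "-", r["answer"])
```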
Speaker A:You know, so if I look at the people who are listening to this podcast, they fall into two camps.
Speaker A:They're either senior leadership or senior managers at enterprises, or they're software architects or software development leaders, all very much technology-focused positions.
Speaker A:For those individuals, what advice can you give them as they go forward with their AI integration strategy for whatever projects they're working on now?
Speaker B:Well, a couple of things come to mind. The first is the way I like to think about how AI has really risen: for AI to actually function, you need three things.
Speaker B:You need compute, you need models, and you need data.
Speaker B:Compute obviously is becoming much more of a commodity, and there's more being built to leverage.
Speaker B:We understand how compute works.
Speaker B:Models: a lot of these models are becoming very powerful, and a lot more open models are out there.
Speaker B:The ability to leverage some of these LLMs and models is becoming more accessible to more organizations.
Speaker B:Then lastly is the data.
Speaker B:The AI is only going to be as useful and valuable as the data that you put into the AI.
Speaker B:So I would encourage senior leaders and architects to take a look at what your data architecture looks like today.
Speaker B:Do you have the ability to AI-ready your data within your data architecture, or is your data architecture sort of stuck in the 90s and hasn't been touched because nobody knows how it works anymore and they don't want to break anything?
Speaker B:So take a close look at your data architectures, and if you've got a monolithic sort of data structure and architecture in place, that's one of the areas to examine.
Speaker B:Because if you can't leverage some of the modular approaches around managing and cleaning and transforming and normalizing your data so it's AI-ready, then you run the risk of any of your AI projects being a failure even before they start, because you're not using good, clean data.
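To illustrate the modular approach to AI-readying data, here is a small sketch of composable pipeline steps that clean, redact, and normalize a record before it reaches any model. The field names and rules are hypothetical examples for the sketch, not Databox's actual pipeline.

```python
# A minimal sketch of modular data-preparation steps: small, composable
# functions that clean, normalize, and transform records before AI use.
from datetime import datetime, timezone

def drop_empty(record: dict) -> dict:
    """Remove null or blank fields so the model never sees noise."""
    return {k: v for k, v in record.items() if v not in (None, "", "N/A")}

def redact_pii(record: dict) -> dict:
    """Mask obviously sensitive fields before the data leaves the pipeline."""
    for key in ("email", "ssn"):
        if key in record:
            record[key] = "***redacted***"
    return record

def normalize_timestamp(record: dict) -> dict:
    """Standardize an epoch-seconds field to ISO-8601 UTC, if present."""
    if "ts" in record:
        record["ts"] = datetime.fromtimestamp(record["ts"], tz=timezone.utc).isoformat()
    return record

def run_pipeline(record: dict, steps) -> dict:
    for step in steps:          # each step is independent and reorderable
        record = step(record)
    return record

if __name__ == "__main__":
    raw = {"user": "jsmith", "email": "j@x.com", "ts": 1700000000, "note": "", "ssn": None}
    print(run_pipeline(raw, [drop_empty, redact_pii, normalize_timestamp]))
```

The contrast with a monolithic data architecture is that each step here can be swapped, reordered, or reused by a different team without touching the rest of the flow.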
Speaker A:I'd like to end with a different sort of question.
Speaker A:App observability is probably one of the hottest areas right now for maintaining the integrity of a running system.
Speaker A:Right?
Speaker A:Whether you're talking about logs or analytics data, whatever that is, observability data is really becoming more and more important to keeping and maintaining a high-functioning SaaS application.
Speaker A:But that data you use for that observability falls into two camps.
Speaker A:It's data that you'd inherently use for performance, availability, scaling, those sorts of things that keep your application working.
Speaker A:And there's data that you use specifically for security and vulnerability detection and validation and verification and threat detection, etc.
Speaker A:And I guess the first question is how similar are those two types of data?
Speaker A:I think the more important question is what role does AI have in creating and using that data?
Speaker A:Let's focus more on the using side of that data in order to keep your application safe.
Speaker A:And how does that radically differ from observability data used for detecting failures and maintaining performance?
Speaker B:Yeah, it's a good question.
Speaker B:We're certainly seeing that uptick in observability and customers and others that are reaching out, asking around how do they better manage their observability data with the pipeline?
Speaker B:And so I would echo that same sort of sentiment.
Speaker B:We're seeing the same type of interest there.
Speaker B:But I'll answer your question in the context of data is just data.
Speaker B:And the reason I say it like that is that how you leverage and what analytics and lens you apply to that data is what's going to drive your decisioning process.
Speaker B:And so as an example, a security team might want to understand sort of the peaks and kind of valleys of CPU performance and file system accesses.
Speaker B:And some of these things are sort of deemed as observability.
Speaker B:But in reality, you can apply a model that's looking for abnormal behavior from a security perspective.
Speaker B:Sometimes systems that have become compromised don't manifest that compromise in the traditional security ways.
Speaker B:You see some of these other performance indicators on your system instead. And the same is true on the IT and observability side: oftentimes some of these teams lack the visibility into understanding what's going on in their firewall environment.
Speaker B:And I've certainly seen my fair share of some sort of outage that happened to be related to some security change that IT teams and observability teams were not aware of.
Speaker B:What I'm really advocating for here is the convergence of a lot of this data, so that it can be modular, repurposed, and leveraged for different types of analytics, and pulled out of these constrained, tool-centric data silos. That's really what a pipeline provides: a centralized, AI-ready fabric that blends all of this information and allows your analysts, whether on the observability side or the cybersecurity side, to leverage that data to make the best decisions.
Speaker B:And so when you can start stitching together data that is traditionally used in some of these other disciplines, in such a way that you're providing more context to these LLMs and AIs to make better inferences and pattern matches for better decisions, you can leverage your AI much more broadly in your organization and move away from some of these siloed data architectures.
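As a rough sketch of applying a security lens to ordinary observability metrics, the code below scores a CPU-utilization series with a rolling z-score and flags samples that deviate sharply from their trailing baseline. The data, window, and threshold are illustrative assumptions; the point is only that the same converged telemetry can feed both performance and security analytics.

```python
# A minimal sketch of an anomaly "lens" over shared observability metrics:
# flag points that deviate sharply from a trailing baseline (rolling z-score).
from statistics import mean, stdev

def anomalies(series: list[float], window: int = 10, threshold: float = 3.0) -> list[int]:
    """Return indices where a point deviates sharply from the trailing window."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    # Mostly steady CPU utilization, then a sudden sustained spike that might
    # indicate crypto-mining or exfiltration rather than a pure performance bug.
    cpu = [22, 25, 24, 23, 26, 24, 25, 23, 22, 24, 25, 24, 23, 91, 95, 93]
    print("suspicious samples at indices:", anomalies(cpu))
```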
Speaker A:Yeah, and you bring up good points.
Speaker A:Almost every time there's a security problem, it ends up being a performance issue.
Speaker A:And many times, not every time, but many times when there are performance issues, they're also related to security issues that are currently going on.
Speaker A:And so by correlating between the different types of data, you get much better results.
Speaker A:My guest today has been Preston Wood.
Speaker A:Preston is the Chief Security and Strategy Officer at Databox.
Speaker A:Preston, thank you so much for joining me today on Software Architecture Insights.
Speaker B:Thank you so much, Lee.
Speaker B:Thanks for having me.
Speaker A:Thank you for joining us on Software Architecture Insights.
Speaker A:If you found this episode interesting, please tell your friends and colleagues. You can listen to Software Architecture Insights on all of the major podcast platforms.
Speaker A:And if you want more from me, take a look at some of my many articles at Software Architecture Insights.
Speaker A:And while you're there, join the 2,000 people who have subscribed to my newsletter, so you always get my latest content as soon as it is available.
Speaker A:Thank you for listening to Software Architecture Insights.