Building an AI Coding Assistant with No Internet Access
Episode 6 • 29th April 2025 • The Fuse and The Flint • 8th Light

Shownotes

In this episode of The Fuse and the Flint, Technical Director Travis Frisinger is joined by Brendon Page, Delivery Manager at Chilisoft. They dive into Brendon’s journey developing an offline AI coding assistant for a secure, regulated environment. 

  • (00:26) - How the concept of an offline AI assistant started as a joke
  • (11:09) - Deciding on using AI over traditional code generation methods
  • (17:20) - Implementation and practical use of AI coding "recipes" by the team
  • (23:53) - Final reflections on AI skepticism, curiosity, and practical applications

Travis Frisinger - Technical Director - 8th Light

Serving as a Technical Director for 8th Light's AI Studio, Travis Frisinger leads high-impact engineering efforts focused on building real-world AI products. With a background in distributed systems, software architecture, and artificial intelligence, he helps organizations turn technical challenges into scalable, maintainable solutions.


Brendon Page - Senior Delivery Manager - Chilisoft

Brendon Page is a Delivery Manager at Chilisoft with extensive experience in software development and project management. He’s passionate about applying innovative, yet pragmatic solutions to technology problems, and enjoys exploring the potential—and limitations—of emerging technologies.


Key Points:

  • Running local AI models (like Meta’s LLaMA and Mistral) on standard workstation hardware proved unexpectedly effective.
  • Overcoming initial skepticism by practically evaluating AI against traditional code generation techniques.
  • AI solutions that empower developers to create and share customizable "recipes," improving productivity and creativity in coding tasks.
  • The importance of curiosity, hands-on experimentation, and pragmatic skepticism when integrating new technologies.


Thanks for tuning in to The Fuse & The Flint. Check out our YouTube channel, follow us on LinkedIn, and visit us at 8thLight.com.

We hope today’s conversation ignited your curiosity, inspiring you to think differently about technology and its capability to transform. Until next time.

If you'd like to receive new episodes as they're published, please follow The Fuse & The Flint on Apple Podcasts, Spotify, or wherever you get your podcasts. If you enjoyed this episode, please consider leaving a review on Apple Podcasts or Spotify. It really helps others find the show.

Podcast episode production by Dante32.

Transcripts

Travis Frisinger:

Hello, and welcome to The Fuse and The Flint, where technology, innovation, and development intersect in exciting ways.

Travis Frisinger:

I'm Travis Frisinger, Technical Director of AI at 8th Light, here with Brendon Page, Delivery Manager from Chilisoft.

Travis Frisinger:

In today's discussion, we're going to explore how to leverage AI offline.

Travis Frisinger:

So, Brendon, I hear this started as a joke. Can you tell me a little bit about that?

Brendon Page:

Well, first off, Travis, thanks for having me. Yes, it did start off as a joke, but I'd like to start with a bit of context on the story. We're currently doing a project in a "secure, regulated environment", which is how I'm allowed to describe it.

Brendon Page:

And therefore we have no Internet access on our project PCs. And as with any coding project, when you're building something from scratch, there are some tedious tasks.

Brendon Page:

And, as per usual, you have some options with tedious tasks. You could try to write some fancy code, or you could try some code generators. But we're also trying to manage complexity here. We're leaving a project behind that the client's developers have to take on, and there is a complexity threshold we don't want to exceed. So we don't want to write fancy code, and

Brendon Page:

we also don't necessarily want to leave behind code generators, because, well, they're fancy code as well,

Brendon Page:

like T4 templates and whatnot. I know there are other custom ones out there. But yeah, we didn't want to exceed that complexity threshold.

Brendon Page:

So I was chatting to my boss one day, saying, I think I really need to do something like this, but I want to approach it deliberately, because, you know, taking on complexity is a deliberate action, in my opinion.

Brendon Page:

And he sort of threw out a joke, one of those disposable, throwaway comments: "Oh, why don't we get the AI to do it?" Because we both know the situation we're in. There's no connection to Anthropic or, what's the other one, OpenAI. So it didn't seem practical, and we both chuckled, and then immediately there was silence, like two and a half seconds after the laughing, and we thought, hey,

Brendon Page:

hold on a second, because I had just recently been chatting to him about my dabbling in running local models on my laptop.

Brendon Page:

I'm lucky enough to have a laptop with enough VRAM. And I'm not very good at creative writing,

Brendon Page:

and I have a personal project that involves a little bit of creative writing. I'm just coding a little game, and I want some item descriptions.

Brendon Page:

So I was trying to figure out how to do this without giving these companies any money, because, I don't know, I don't trust any of them. To be honest, I think I'm just training them;

Brendon Page:

all my chats just help train them. That's my conspiracy theory.

Brendon Page:

So I was stumped about it, and then I found Ollama, and I was quite surprised at the ease with which you can just host models on your laptop. I just have a laptop. Okay, there's a bit of privilege there. It's a gaming laptop, because I'm a gamer, and I think I've got 8 gigs of VRAM.

Brendon Page:

But anyhow, I was quite surprised at how far it had come.
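
For context, this kind of local call is about as simple as it sounds. Here is a minimal sketch of asking a locally hosted model for a game item description over Ollama's default local API; the model name and prompt are purely illustrative, not the ones Brendon used:

```typescript
// Minimal sketch: ask a locally hosted model (Ollama's default HTTP API on
// localhost:11434) for a game item description. Model and prompt are illustrative.
async function describeItem(itemName: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "mistral", // a quantized 7B model fits comfortably in ~8 GB of VRAM
      prompt: `Write a two-sentence flavour description for a game item called "${itemName}".`,
      stream: false, // return one JSON object instead of a token stream
    }),
  });
  const data = await res.json();
  return data.response; // the generated text
}

describeItem("Rusty Compass").then(console.log);
```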

Travis Frisinger:

Your machine didn't have a lot of hardware, but you still managed to make it happen.

Brendon Page:

Yeah, "not having a lot of hardware" is sort of one of those debatable things. It is a fairly beefy machine, but in terms of what you would have needed five years ago, it's almost nothing. It's a stock-standard, off-the-shelf

Brendon Page:

gaming laptop, that's it. Yes.

Travis Frisinger:

Sure.

Brendon Page:

Yeah. So I was quite surprised. I was like, hey, I can run these models locally. And I was even more surprised by the models being put out now by Facebook, so the Meta models, and Mistral, I think, is the French startup. They have those lower-parameter quantized ones, so 4-bit, 7-billion-parameter models, and

Brendon Page:

I couldn't believe how well they performed. Now, they don't perform well at everything. But if I compare just chatting to one of those models versus chatting to GPT, or, I don't know, pick any other one,

Brendon Page:

there wasn't much difference. So I thought, wow, great, I can do this, I could use this in my game, and then I needed, of course—

Travis Frisinger:

Please, stop there for a second. So when you say there wasn't much difference, obviously it's kind of restricted to the type of activities you were doing, and—

Brendon Page:

Yes, I was using small context windows, and I was trying to get it to generate a reasonable amount of output. So not

Brendon Page:

thousands upon thousands of lines. Certainly, in my experimentation, I wasn't trying to ingest an entire code base or do some document retrieval and then put a big, fat context into a prompt.

Brendon Page:

I was controlling it quite precisely, at least for my personal project.

Travis Frisinger:

So you had a really constrained need, small scope, and—

Brendon Page:

Yeah.

Travis Frisinger:

It just worked for you in that scenario.

Brendon Page:

So, yes, it worked really well. And I was surprised, because prior to that I had attempted to get something running, and

Brendon Page:

I would have had to learn how to set up a Python environment.

Brendon Page:

There's all sorts of stuff. It's not my stack; Python is not my stack. So it was a whole other world. So I was surprised at how accessible it was. That's the end of the story in my personal project.

Brendon Page:

So if I rewind back to that conversation with my boss, the reason why

Brendon Page:

we both paused after thinking, ha ha, that's a ridiculous suggestion, is that I had recently been discussing my discoveries in my personal project with him,

Brendon Page:

and then I thought, Oh, wait!

Brendon Page:

I said to him, no, wait. Maybe I could do the same thing

Brendon Page:

that I've been doing for my personal project. And that's where it all started. That's where the joke

Brendon Page:

and the context behind the joke came from. That's how the joke led to me actually spending production time on,

Brendon Page:

first off, just doing some experiments, and, through the success of those experiments,

Brendon Page:

coding, or doing some prompt engineering, to achieve that code generation.

Brendon Page:

Now, there was quite a bit of angst, I have to just put that in there. I had to try to evaluate, you know: am I just over-engineering? Am I over-complicating? Am I trying to use the shiny thing?

Brendon Page:

Because, and I actually had some sleepless nights over it, because

Brendon Page:

we have that in every developer who's passionate: you want to try and explore the fancy, shiny thing that everybody's currently pointing at going, "Look!"

Brendon Page:

So for me, what ended up clinching it was

Brendon Page:

that there was a bit of fuzziness to the code generation,

Brendon Page:

and the more I tried to do a POC for it using more traditional code generation, and the more I got into it, the more edge cases I found.

Brendon Page:

I was just like, oh no, I gotta deal with that. And oh no, I gotta deal with that. And I just didn't want to.

Travis Frisinger:

When you say—

Brendon Page:

Hmm.

Travis Frisinger:

When you say edge cases, are you talking technical edge cases, or limitations of the technology? Like, what kind of—

Brendon Page:

Okay.

Travis Frisinger:

What were you experiencing?

Brendon Page:

So I started to do some POCs,

Brendon Page:

because again, I had the angst around just normal code generation. And

Brendon Page:

I started to find situations. I can't share too many details, because you know.

Travis Frisinger:

I understand.

Brendon Page:

Yeah. So I started to experience some edge cases where I would have to do more complex code generation. You know, I would have to understand types and type mapping, because I was converting between different languages, let's put it that way.

Brendon Page:

And I was like, oh no, now I have to sit and code all the type mappings, and then you go, oh great, but what about the enumerable type mappings? And then it just keeps on going, and before I know it I'm going to be flipping doing a full visitor for an AST, and, you know.

Travis Frisinger:

So what you're saying is, the shiny object started uncovering a lot more complexity than—

Brendon Page:

Yeah.

Travis Frisinger:

the value it might bring.

Brendon Page:

Well, actually, let's just be clear here. I was hesitant about the shiny object, which was the AI. And then I thought,

Brendon Page:

okay, well, I'm hesitant about just using AI because it's cool, and I'd had success in my personal project. So I went down the traditional code generation route and did some POCs there. And that's where I was uncovering

Brendon Page:

these edge cases. And it's just an infinite rabbit hole. If anybody's done code generation before, and you want to translate from one

Brendon Page:

syntax to another syntax, you just end up having to basically pull out the spec and go, what's possible

Brendon Page:

over here, and how does it map to over here? And it's a ton of work. I thought, okay, I could do it, but it's a ton of work. So I was quite glad I did that, because my instinct was: I had success in my personal project, let me just jump straight in. I've got approval from my boss, because he thought it was an interesting idea as well. But I put the brakes on it and thought, okay, let me do some POCs, let's see how much it's going to cost.

Brendon Page:

And the estimation was weeks. And I had multiple different types of code generation I wanted to do, not just conversion from

Brendon Page:

language A to language B. There was generation within the same language as well: I wanted to generate classes based off of source classes. So let's say you have a data model, and I wanted a test data builder, or at least a starting point for a test data builder.

Brendon Page:

So I was like,

Brendon Page:

I could just see this code generation path, the traditional path, the non-AI path, just getting deeper and deeper and deeper. And yes, I would have been speeding the team up. But I thought, okay,

Brendon Page:

I've now

Brendon Page:

evaluated it. I've sort of cured my angst. I know what that path looks like; there are fewer unknown unknowns down there.

Brendon Page:

And

Brendon Page:

look, I said fuzziness earlier, and there isn't fuzziness down that path, there's just a lot of detail if you go down it. So I thought, okay, it's deterministic, but again, you have to understand everything. So I thought, let's see if, using AI,

Brendon Page:

I can cut out the need for some of that complexity. I mean, after all, these things are

Brendon Page:

word calculators. You know. They're probabilistic. They do operate well in fuzzy scenarios.

Travis Frisinger:

So just to clarify here, making sure that I'm understanding you: you tried to use traditional code generators, and that's where you bumped up against a lot of the

Travis Frisinger:

kind of nuances of being able to do low-level language work. And that's when you turned to AI, to see if it could assist you in overcoming some of those gaps.

Brendon Page:

Exactly. Well, yes. I mean, I would say there wasn't a gap as in

Brendon Page:

it wouldn't work. There was just a lot of effort, and the path was clear. And I thought, okay, I can do this, but every time I write a new generator I have to fully understand the source and the destination. And it's not just a matter of doing a simple one, because, of course, these guys are writing a real system.

Brendon Page:

They're writing a real system with real code, with real edge cases. So I would have to deal with those edge cases, otherwise the output is just going to break, you know, and it's going to stop generating, or it's just going to leave something out. And

Brendon Page:

there's going to be a lack of trust anyhow, unless I put in massive effort. And actually, I'm glad my brain popped out that phrase, lack of trust. I thought, I'm not willing to put that effort in, because it's not going to save enough time,

Brendon Page:

and therefore they're not going to trust it; they'll have to review it anyhow. So why don't I try the tool which I know is going to hallucinate, slash, output fuzzier results, because they're going to have to review it anyhow, and let's see how fast I can get that tool out using AI.

Travis Frisinger:

Oh, well, so really, you found a use case where AI can accelerate. You're like, I'm still going to have issues, so let's just try and get this concept out as quickly as possible to validate if it's going to add value.

Brendon Page:

Exactly. Yeah, because I was faced with a fork in the road: spend weeks and weeks, potentially weeks every time I write a code generator, and I think we've ended up with 10 or 15, somewhere around there, because people have added their own, so I've actually lost count, but I certainly got 10 out. So do I spend a whole bunch of time every single time

Brendon Page:

to make it super deterministic, or do I invest less time,

Brendon Page:

and then it's super fuzzy and people have to code review it? And I didn't have weeks and weeks for each one. So I thought, well, if I don't have the time to make it super deterministic, which is what people expect from a code generator, then let me see how much of the engineering work, or cost in time, is going to be on the AI side,

Brendon Page:

because if I can put in the same amount of effort or less, then perhaps it's better to go that route. It

Brendon Page:

was a little bit of the "hey, there's the shiny thing", but there was a deliberate choice to go that route, slowing down before I just dived into it, or dove into it, whichever version of English you want.

Travis Frisinger:

Were you up against American versus the Queen's English here?

Brendon Page:

Yes, that's.

Travis Frisinger:

Yeah, yeah, fair enough.

Brendon Page:

You can leave it in if you want. Sorry.

Travis Frisinger:

I love the pragmatism that you've approached this with. You're like, okay, I have an issue, I understand the traditional ways of solving this, I'm familiar with some of the complexities. I mean, you're a seasoned veteran of this industry, you know, over two decades of experience. So you're like, okay, maybe this is one of those scenarios, given all the constraints, where if I go after the shiny thing, I might actually be able to unlock some value.

Travis Frisinger:

And you had some experience there because of what you were experimenting with in your own time, your own learning and growth. So you came in knowledgeable. It's not like you were just chasing a shiny object; you're like, I do think this may actually have utility for my problem. You were very, very pragmatic about how you approached this.

Brendon Page:

Yes.

Brendon Page:

Yes, and look, it comes from skepticism. As you said, two decades. I've lived through all the hype cycles, well, not all of them, there were some before me, but I've lived through many hype cycles.

Brendon Page:

Yeah. The pragmatism came from living through many hype cycles in the two decades that you referenced, from the first round of low-code/no-code to the second round of low-code/no-code hype cycles, you know. So it's one of those lessons I've learned: don't do conference-driven development. Conferences are great.

Brendon Page:

You get new information. You get to meet really interesting people with really interesting opinions, and they will broaden your worldview. But if you just go back to your context and you take everything that's shiny from their context.

Brendon Page:

You're going to create a disaster because

Brendon Page:

most of the time they're showcasing what's shiny, not what context it worked in.

Travis Frisinger:

Yes, I think that's a great aside there, you know: be pragmatic about this, gather insights, but filter them through your experience. So I wanted to touch on something interesting you said. You said there might be 10 or 15 of these. What are these like? How are you getting your team empowered to use this AI?

Brendon Page:

Okay. So let me sort of rewind slightly, I got a bit excited and continued the story. We were talking quite a bit about my hardware, and how it's more possible than it has been previously to run this, especially with the quantization, down to, look, now you get 1.5-bit, 2-bit or 2.5-bit models, whatever.

Brendon Page:

I went to work one day and I decided: Device Manager, let me go and see what graphics cards these things have. And because the client has purchased hardware

Brendon Page:

for longevity, they have purchased workstations. So they have workstation GPUs, which means they come with a minimum of 8 gigs of VRAM

Brendon Page:

in the newer models.

Brendon Page:

And I thought, hey, 8 gigs of VRAM, that's what mine has. Let's see if the Nvidia CUDA drivers install on this, and they did, which was great. So I got some downloads onto the offline network, set it up, and I started experimenting. And to answer your question, I didn't really know what I wanted to build. I just knew I wanted

Brendon Page:

to. I didn't know what they were going to be called, but I knew the outcome: I needed to generate some code. So, input,

Brendon Page:

and then code following a particular recipe, slash, pattern, as output.

Brendon Page:

So I just started off by calling them recipes. And as I started to talk to people about them, we realized that they kind of are coding assistants because

Brendon Page:

they start off with a particular recipe, or seeded chat in my world, but they persist after that, unlike standard code generators where you just generate the code, you get the output, you use it. This is a chat window. I built a little custom chat interface that allowed me to edit any message, because that's advantageous for some prompt engineering,

Brendon Page:

and every time I use a chat interface that doesn't have that, I'm frustrated.

Brendon Page:

But anyhow, back to the point.

Brendon Page:

So yes, I called these things recipes, and through some conversation we landed on AI coding assistants, because you can ask them questions. It has a seeded chat. It's like a custom GPT, sort of, but it's more like a custom GPT on steroids, because with a custom GPT you can set the system prompt,

Brendon Page:

and then that's the custom GPT, right? Whereas when you're working with the API, and Ollama has an API, you can control more. You can control temperature and other model parameters, the system prompt; you can seed the chat, because underlying all of these chats is a giant array of text.
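
As a rough illustration of what such a "recipe" can look like, a seeded chat sent over Ollama's chat API with its own system prompt, seed messages, and temperature; the recipe content, model name, and settings below are invented for the example, not taken from Brendon's project:

```typescript
// Sketch of a "recipe": a seeded chat posted to Ollama's /api/chat endpoint.
// System prompt, seed messages, model, and temperature are illustrative.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const testDataBuilderRecipe: ChatMessage[] = [
  { role: "system", content: "You generate test data builders. Output code only." },
  // A seeded example exchange so the model locks onto the expected shape.
  { role: "user", content: "class Customer { public string Name; public int Age; }" },
  { role: "assistant", content: "public class CustomerBuilder { /* builder pattern */ }" },
];

async function runRecipe(recipe: ChatMessage[], userInput: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "codellama",
      messages: [...recipe, { role: "user", content: userInput }],
      options: { temperature: 0.2 }, // keep the output as predictable as possible
      stream: false,
    }),
  });
  const data = await res.json();
  return data.message.content; // the assistant's reply
}
```

Because the whole conversation really is just that array of messages, editing any message and re-sending it is straightforward, which is the feature Brendon built into his custom chat interface.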

Brendon Page:

And I've had varying success with different techniques, actually learned a whole bunch of stuff. But yes.

Brendon Page:

Long story short, they are called AI coding assistants. And I built in some really cool, interesting stuff and experimented with different techniques to effectively get the AI to do what I wanted, because—

Travis Frisinger:

Just a question for you.

Brendon Page:

Shoot.

Travis Frisinger:

So does each developer then have a version of this running on their machine that they can—

Brendon Page:

No worries.

Travis Frisinger:

interact with?

Travis Frisinger:

All right. And is there a catalog of recipes they're able to pick from, based on the task they need to achieve?

Brendon Page:

Yes. So it is a very simple interface. You boot it up,

Brendon Page:

you have to have Ollama running locally, so yes, it's running on their machines, because they each have that GPU in them.

Brendon Page:

There's a list of recipes, and then also the ability to start plain, unseeded, unconfigured, non-custom chats with the various models installed on their machines, so—

Travis Frisinger:

So yeah.

Travis Frisinger:

They've got the ability to use these templates and still have a regular chat dialogue if they wanted to explore maybe a new recipe, for instance.

Brendon Page:

Yes, in fact, one of their favorite things is asking it to generate regexes for finding types of code. Well, one of the guys at least. I think he uses the Code Llama model to ask it to generate a regex that would match something, because he wants to find or replace something across many files, and coming up with a regex by hand is just a disaster, a time-consuming disaster.
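
A one-off ask like that could look something like the following; the model tag and the example prompt are illustrative, not the team's actual usage:

```typescript
// Sketch: ask a local Code Llama model for a regex to drive a find-and-replace.
// The prompt is a made-up example of the kind of pattern you might want.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "codellama",
    prompt: "Write a regex that matches TODO comments containing a ticket number like ABC-123. Return only the regex.",
    stream: false,
  }),
});
console.log((await res.json()).response); // candidate regex to paste into the editor's find-and-replace
```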

Travis Frisinger:

So I love that, because you've empowered your team, and now that they've got access to this new utility, they're finding their own use cases, finding ways of empowering themselves and streamlining their workflows.

Brendon Page:

Yes, and in fact, the entire

Brendon Page:

project that I created is in the monorepo, under dev tools, because we have a dev tools folder in the monorepo. All the prompts are just text files that load up when the website loads up, so the team is completely free to create their own prompts and commit them back to source, and they have been doing that. So they're available to the entire team.
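
A sketch of how recipes-as-committed-text-files might be wired up is below; the folder name and file-naming convention are assumptions for illustration, not the project's actual layout:

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Sketch: load every prompt file committed under a dev-tools folder, so any
// recipe a teammate pushes shows up in the list after the next fetch.
// The "dev-tools/recipes" path and ".prompt.txt" suffix are assumptions.
interface Recipe {
  name: string;   // derived from the file name, e.g. "test-data-builder"
  prompt: string; // the seed prompt text committed to the monorepo
}

function loadRecipes(dir = "dev-tools/recipes"): Recipe[] {
  return readdirSync(dir)
    .filter((file) => file.endsWith(".prompt.txt"))
    .map((file) => ({
      name: file.replace(".prompt.txt", ""),
      prompt: readFileSync(join(dir, file), "utf8"),
    }));
}
```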

Brendon Page:

And even if it's just them using it, the rest of the team will, next time they fetch, see that new coding assistant

Brendon Page:

and be able to make use of it. Yeah.

Travis Frisinger:

I really like this story of curiosity unlocking this empowerment of your team, and really showing the value, how you went from skepticism to unlocking value in a very unlikely scenario.

Brendon Page:

Yes, I mean, that's a great way to phrase it, because I was skeptical, but I'm always skeptical, because, as I said, I've been burnt quite a bit before, and that's why I was trying it out personally. But I suppose the one thing I would also like to say is, what this has taught me is that

Brendon Page:

it is just like big data and low-code/no-code and all of the trends that have come along, like Docker and Swarm and Kubernetes and all the things. At the end of the day, they're tools, right? And yes, the AIs are interesting, scary tools, because they're very impressive, yet at the same time incredibly stupid. I've learned that through this particular endeavor of mine.

Brendon Page:

At the end of the day they're a tool. And so the question I've started asking myself is, how do they fit into my various stacks?

Brendon Page:

My dev stack, my production stack? Because

Brendon Page:

the ability to run these now is—

Brendon Page:

well, the barrier to entry to running them is incredibly low. If you've realistically got a pair of

Brendon Page:

4090s, or 4080s with enough VRAM, you could even use them in a production scenario. You don't even need the fancy AI GPUs that Nvidia makes. So,

Brendon Page:

as developers, we should be skeptical.

Brendon Page:

But we should also understand that they are tools, and we should understand what they're good at, and to do that we should be playing around with them. As you said, you should be curious. Play around with them in your personal projects, or

Brendon Page:

maybe you don't have a personal project, not everybody likes that, but at the end of the day, these are new tools moving at breakneck speed,

Brendon Page:

and you better play around with them. Otherwise you're not going to understand how to use them.
