Solopreneur Workflow: Simplifying Big Data Analysis with AI Agents
Episode 33 • 7th April 2026 • Lone Wolf Unleashed - avoid exhaustion, reclaim your time using tools, systems and AI • Mike Fox


Shownotes

Your AI tool isn't broken. It's just full.

Hi, I'm Mike Fox, host of this podcast, "Lone Wolf Unleashed." I help solo founders systemise their businesses so they can switch off sooner and live larger. This week I'm pulling back the curtain on a real data project: 103,000 rows, a client locked into Microsoft Copilot, and a categorisation task that would've taken weeks to do manually.

Here's what I worked through — and what you can take straight into your own business.

The context window is the AI's working memory. Once it runs out, the quality of your outputs tanks — or the conversation just stops. Understanding this constraint is the difference between AI that saves you hours and AI that wastes them.

Working within real-world limitations (not every client is on Claude), I built a strategy to break down a massive data set into token-efficient chunks, set up a structured workflow for Microsoft Copilot to process them in sequence, and then used a manager-agent review layer to QA the outputs before any human had to.

The same principles apply whether you're running Claude, ChatGPT, or whatever tool your organisation has decided is the one. The constraints change. The framework doesn't.

What you'll learn:

  • What a context window is and why it limits what your AI can do with large data sets
  • How to make your data and your prompts token-efficient before you send them
  • A practical chunking strategy for splitting large Excel or CSV files across multiple AI sessions
  • How to use a manager-agent role to review and QA your AI outputs
  • Which model settings to use for heavy analytical tasks

If you're using AI to make decisions — not just write emails — this episode is for you.

Resources, frameworks, and tools: lonewolfunleashed.com/resources

Mentioned in this episode:

This podcast is part of the Podknows Podcasting ICN Network

You might also like...

Check out the "Websites Made Simple" podcast with Holly Christie at https://websitesmadesimple.co.uk/

Transcripts

Speaker:

G'day. My name is Mike from Lone Wolf Unleashed, and today we're going to talk about how to interact with AI agents and your AI tools to get the best possible outcome for what you're trying to do with them. This week I had a data set that we were looking at. It had 103,000 lines, which is a lot. It's a lot of data.

And there is a particular problem when you're dealing with data sets this large, and that is it will consume your AI's context window. Now, what is that? The context window is basically how much information the AI can read before it needs to compact that and reload it, or you need to start a new conversation. A few months ago, Claude would just run out of context window and you would need to start a new conversation. You would basically need to start again, and remember wherever you were up to. It has since got a little bit better than that: it now auto-compacts, providing a summary of what the conversation was about so it can continue, and I haven't had to start over since. A couple of weeks ago, Anthropic actually increased my context window to about a million tokens, which was five times the amount I was on at the time. It's a game changer, because I haven't seen it compact since.

When we're dealing with data sets this large, we need to be thinking about how much room the AI has to work with. So this is going to be a little bit of a chat about how we go about planning to use data of this kind with an AI tool, and some strategies that will help us get around some of our constraints.
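To make that constraint concrete, a common rule of thumb (an assumption, not an exact figure; real tokenisers vary by model) is roughly four characters per token for English text, which lets you estimate whether a file will fit before you ever paste it in. A minimal sketch in Python:

```python
# Rough context-window fit check. The 4-chars-per-token ratio is a
# heuristic, not an exact figure; real tokenisers vary by model.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return int(len(text) / chars_per_token)

def fits_in_window(text: str, window_tokens: int, reserve: int = 2000) -> bool:
    # Reserve part of the window for the prompt and the model's reply.
    return estimate_tokens(text) <= window_tokens - reserve

# Stand-in for the real 103,000-line export (column names hypothetical).
rows = "row_no,description\n" * 103_000
print(estimate_tokens(rows))  # hundreds of thousands of tokens
```

Even this crude estimate makes it obvious that the full file will blow through most context windows, which is why the chunking below is needed at all.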

So the customer that I'm working with for this one is constrained to, yep, you guessed it, Microsoft Copilot. It's not my favourite AI tool. Let's just say it's probably near the bottom of the list. But you know what? We have to deal with constraints every day, and this is the one that the customer's dealing with. The idea here is that we need to be able to feed Copilot this data set and have it return a good set of outputs based on the outcome or objective that we want.

Basically, what we wanted to do was scan over a bunch of categories and the descriptions of the cases that were in there, and return a new categorisation based on a different parameter, which is being driven by a legislative requirement. Doing this manually in the past, you would have to search for trigger words. It would be a very, very long exercise to go through that many records. The idea here is that Copilot will run over those fields and complete that task. The problem is that it cannot consume that much information at a time. So, number one, let's have a look at our token usage.

We want to make this data set as token-efficient as possible, and we want to make our prompt as efficient as possible. I've gone through before how to build AI agents, and we want to be able to articulate what the agent or the AI is supposed to do. The principles are the same whether you're dealing with an AI agent, an agent team, or just interacting in a chat: we want to make all of those things token-efficient. So the first step is to strip out all the IDs that were in there. They were fairly long. Let's just number the rows one through 103,000, and we can match them back to the original data set with a matching field. That's number one. Okay, we've saved a few tokens.
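That ID-stripping step can be sketched in plain Python. The `case_id` column name and the file layout are hypothetical stand-ins for whatever the real export looks like:

```python
import csv

def strip_ids(in_path: str, out_path: str, map_path: str, drop_cols: set[str]) -> None:
    """Replace long case IDs with a 1..N row number and drop unneeded
    columns, keeping a separate mapping file so AI results can be
    matched back to the original data set afterwards."""
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return
    # Mapping file: row number -> original ID ('case_id' is hypothetical).
    with open(map_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["row_no", "case_id"])
        for i, row in enumerate(rows, start=1):
            w.writerow([i, row["case_id"]])
    # Token-slim file: sequential number plus only the columns we need.
    kept = [c for c in rows[0] if c != "case_id" and c not in drop_cols]
    with open(out_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["row_no", *kept])
        for i, row in enumerate(rows, start=1):
            w.writerow([i, *(row[c] for c in kept)])
```

The mapping file never goes to the AI at all; it only exists so you can join the categorised output back to the source system later.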

What other columns do we not need? Let's strip out every single piece of information we don't need in there. Then we want to articulate the outcome in a prompt, and we're going to do some testing on how much information is in there. Right. So we tried the first time. The results were rubbish: it tried to consume too much information, and the results weren't great. So now we're going to chunk our data set down into manageable pieces.

Let's say we want to do this the long way around, just to make sure that it works. Okay, let's break this down into 20 data sets. We're going to split this out into 20 separate Excel workbooks, and we're going to have a master Excel workbook where each run returns some summary information. So we're going to have 20 workbooks with just over 5,000 records each. This is much more manageable for the AI tool, because now we're not consuming nearly as many of the tokens it has available for these activities.
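The chunking step, splitting one big file into 20 smaller ones with the header repeated in each, might look like this. The episode works with Excel workbooks; CSV is used here only to keep the sketch dependency-free, and the file naming is illustrative:

```python
import csv
import math

def chunk_csv(in_path: str, out_prefix: str, n_chunks: int = 20) -> list[str]:
    """Split one big CSV into up to n_chunks smaller files (e.g. 103,000
    rows into 20 files of just over 5,000 rows each), repeating the
    header row in every file so each chunk stands alone."""
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    size = math.ceil(len(rows) / n_chunks)
    paths = []
    for i in range(n_chunks):
        part = rows[i * size:(i + 1) * size]
        if not part:
            break
        path = f"{out_prefix}_{i + 1:02d}.csv"
        with open(path, "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(header)
            w.writerows(part)
        paths.append(path)
    return paths
```

Each chunk is then small enough to hand to the AI tool in its own session without exhausting the context window.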

Now, if you're on a Pro version of Microsoft Copilot, you'll be able to do things like point it towards your documents as knowledge sources, and all those kinds of things. In this case, I'm going to assume you're paying for it. What I would do is set up an agent to take care of this. There is a "create agent" type option in there. Create an agent, and in your instructions, put in what your original prompt was going to be for what it needs to do.

Then we're going to direct it on how it is supposed to interact with the files. We have the initial part, which is easy: hey, look at this data, provide me some insights as to these new categories. That's the easy part. Now we want to direct it to go: okay, here is data set one. We're going to take this, and we're going to put the result in a new workbook so we can consolidate it all again. The first thing you're going to do is name the file that you were in. Great. It should know that, because it has the title of the thing. Then it's going to return the total number of cases that are in there. And in the original workbook, it's going to categorise them all out.

This creates a quick check for us, because we can open the new workbook that has the summaries in it to see that it's working. We don't have to open up all 20 of the other ones; we can just have the one open where we can see that it's returning those totals. Once it's done all 20 of those, we should have our total number of cases with these new categories, and we should be able to combine the other 20 back into one main data set.
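The consolidation and quick-check steps can be sketched the same way: merge the per-chunk result files back together and compare each file's row count against the totals the AI wrote into the summary workbook. File names and column headings here are hypothetical:

```python
import csv

def consolidate(result_paths: list[str], summary_path: str, out_path: str) -> None:
    """Merge per-chunk result files back into one data set, checking each
    file's actual row count against the total the AI reported in the
    summary file ('file' and 'total_cases' are hypothetical headings)."""
    with open(summary_path, newline="") as f:
        reported = {r["file"]: int(r["total_cases"]) for r in csv.DictReader(f)}
    header, all_rows = None, []
    for path in result_paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            rows = list(reader)
        if reported.get(path) != len(rows):
            # The quick check: reported and actual totals should match.
            print(f"WARNING: {path} has {len(rows)} rows, summary says {reported.get(path)}")
        all_rows.extend(rows)
    with open(out_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(header)
        w.writerows(all_rows)
```

A mismatch between the summary totals and the actual row counts is your earliest signal that a chunk was processed badly, before you've read a single categorisation.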

Now, this is still a little bit manual. Yes, I know I could probably run this through Claude in my current setup in one shot, but again, if we're dealing with the constraints, that's what we need to do.

Why are we doing this exercise? Well, in this case, the client is ultimately trying to make a determination about the type of resourcing that needs to be held within teams, based on case volume. And there is an understanding at the moment that there are invisible cases that are not being raised, just because their operations are fragmented, or there's not enough governance, or there's not enough ownership. The reporting, because of that, is very limited.

So what we're trying to do here is look at the problem from a different perspective, a different angle, and go: okay, well, if we've got all these cases and case information, maybe they've been categorised incorrectly in the past, or maybe there are other layers to these cases that we haven't previously understood. Can we figure out what order of magnitude we're looking at here, in terms of any new resource that we would need? That's the idea. The idea for this particular exercise was not to go down into very specific case-by-case matters; it was more about figuring out the bigger problem. So that's, in a nutshell, how to break down these big data sets when working with AI tools.

The basic premise when doing exercises like this, particularly around data analysis, is: remember, we need to check that it's doing it right. We need to scrutinise its outputs. We can't just take what it produces as gospel. The other thing, too, is that it's possible to run a multi-stage exercise here: we can feed it the original data, it returns its results, and then we can have a manager agent run over the top of it and go, we want to check the work of this one. Provide us with any gaps around the assumptions or the outputs. Then it can return to you, and you can test and refine from there.
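The manager-agent pattern is mostly about prompt structure, so here is a minimal sketch. `ask_model` is a hypothetical stand-in for whichever chat API or tool you actually use; the point is only that the review prompt assigns a different role from the one that produced the output:

```python
# Two-stage flow: a worker prompt produces the categorisation, then a
# separate "manager" prompt reviews it. ask_model is a hypothetical
# stand-in for your actual chat API or tool.

WORKER_PROMPT = (
    "You are a case analyst. For each row below, assign one of these "
    "categories: {categories}. Return row_no and category only.\n\n{data}"
)

MANAGER_PROMPT = (
    "You are a reviewing manager. Check the analyst's categorisations "
    "below against the source data. List any rows where the category "
    "looks wrong, and any assumptions or gaps in the output.\n\n"
    "Source data:\n{data}\n\nAnalyst output:\n{output}"
)

def review_chunk(ask_model, data: str, categories: str) -> tuple[str, str]:
    """Run one chunk through the worker role, then the manager role."""
    first = ask_model(WORKER_PROMPT.format(categories=categories, data=data))
    critique = ask_model(MANAGER_PROMPT.format(data=data, output=first))
    return first, critique
```

Because the manager prompt frames the task as reviewing someone else's work, the model is not simply re-running the role that produced the output in the first place.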

I always highly recommend, and I've mentioned this before, running a test, a battle test, a manager-review type exercise over the AI outputs. Remember that the AI is given a role when it is doing a task for you. And you might go: well, aren't we just getting it to check its own homework, in a sense? Yes, you are. But the role it used to produce the original content isn't the role you've now given it to go and review. It also cuts down the time you spend spot-checking and reviewing when you do your own review, which is really super helpful.

Keep those things in mind. The other thing we need to think about is the type of model that we're using. So if you're in a tool like OpenAI's ChatGPT or Anthropic's Claude, you'll want to make sure it's in the right mode, or using the right model, for these really heavy tasks. I would be using Opus 4.6, for example, with extended thinking mode on. Yes, it does consume more tokens, but you get a way better outcome, and that's what we're looking for. Better outcomes mean less of your time spent doing admin work.

What have we covered today? We've covered that we need to be thinking about constraints, and working within the constraints of using AI tools in your business. Maybe it was a decision you made earlier on about which tool to use. I don't really recommend going out to purchase all these different AI tools: we pick one, we stick with it, and we get really good at using it. That's the idea. We've covered how to break down these really large data sets. If you're wanting to get insights on unstructured data, or data that you need an AI tool to run over, where the task isn't necessarily deterministic, then we need to break it down in a token-efficient way to get it to work properly and provide good outcomes. And at the end, we've covered how to make sure it's given the right output, by running a manager-type role over it and then spot-checking ourselves. That's going to do us today.

Guys, thank you so much for joining me. You could be doing so many other things, but instead you decided to hang out with me and learn how to break down these data sets for use with AI tools when you're trying to make big decisions. This is all about saving time. You can save more time by heading over to my website, lonewolfunleashed.com/resources. There's a whole bunch of resources there that you can use: there's AI stuff, there's process stuff, there's procedure stuff, there's presentations, there's heaps of stuff. Go over there, check that out, and I'll see you next week.
