Solopreneur Workflow: Simplifying Big Data Analysis with AI Agents
Episode 33 • 7th April 2026 • Lone Wolf Unleashed - avoid exhaustion, reclaim your time using tools, systems and AI • Mike Fox


Shownotes

Your AI tool isn't broken. It's just full.

Hi, I'm Mike Fox, host of this podcast, "Lone Wolf Unleashed." I help solo founders systemise their businesses so they can switch off sooner and live larger. This week I'm pulling back the curtain on a real data project: 103,000 rows, a client locked into Microsoft Copilot, and a categorisation task that would've taken weeks to do manually.

Here's what I worked through — and what you can take straight into your own business.

The context window is the AI's working memory. Once it runs out, the quality of your outputs tanks — or the conversation just stops. Understanding this constraint is the difference between AI that saves you hours and AI that wastes them.

Working within real-world limitations (not every client is on Claude), I built a strategy to break down a massive data set into token-efficient chunks, set up a structured workflow for Microsoft Copilot to process them in sequence, and then used a manager-agent review layer to QA the outputs before any human had to.

The same principles apply whether you're running Claude, ChatGPT, or whatever tool your organisation has decided is the one. The constraints change. The framework doesn't.

What you'll learn:

  • What a context window is and why it limits what your AI can do with large data sets
  • How to make your data and your prompts token-efficient before you send them
  • A practical chunking strategy for splitting large Excel or CSV files across multiple AI sessions
  • How to use a manager-agent role to review and QA your AI outputs
  • Which model settings to use for heavy analytical tasks

If you're using AI to make decisions — not just write emails — this episode is for you.

Resources, frameworks, and tools: lonewolfunleashed.com/resources

Mentioned in this episode:

This podcast is part of the Podknows Podcasting ICN Network

You might also like...

Check out the "Websites Made Simple" podcast with Holly Christie at https://websitesmadesimple.co.uk/

Transcripts

Speaker:

G'day. My name is Mike from Lone Wolf Unleashed, and today we're going to talk about how to interact with AI agents and your AI tools to get the best possible outcome for what you're trying to do with them. This week I had a data set that we were looking at. It had 103,000 lines, which is a lot. It's a lot of data.

And there is a particular problem when you're dealing with data sets this large, and that is it will consume your AI's context window. Now, what is that? The context window is basically how much information the AI can read before it needs to compact that and reload it, or you need to start a new conversation. A few months ago, Claude would just run out of context window and you would need to start a new conversation. You would basically need to start again, and remember wherever you were up to. It has since got a little bit better than that: it now auto-compacts, providing a summary of what the conversation was about so it can continue, and I haven't had to start over since. A couple of weeks ago, Anthropic actually increased my context window to about a million tokens, which was five times the amount I was on at the time. It's a game changer, because I haven't seen it compact since.

When we're dealing with data sets this large, we need to be thinking about how much room the AI has to work with. So this is going to be a little bit of a chat about how we go about planning to use data of this kind with an AI tool, and some strategies that will help us get around some of our constraints.
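To make that constraint concrete, a common rule of thumb (an assumption, not an exact figure; real tokenisers vary by model) is roughly four characters per token for English text, which lets you estimate whether a file will fit before you ever paste it in. A minimal sketch in Python:

```python
# Rough context-window fit check. The 4-chars-per-token ratio is a
# heuristic, not an exact figure; real tokenisers vary by model.
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    return int(len(text) / chars_per_token)

def fits_in_window(text: str, window_tokens: int, reserve: int = 2000) -> bool:
    # Reserve part of the window for the prompt and the model's reply.
    return estimate_tokens(text) <= window_tokens - reserve

# Stand-in for the real 103,000-line export (column names hypothetical).
rows = "row_no,description\n" * 103_000
print(estimate_tokens(rows))  # hundreds of thousands of tokens
```

Even this crude estimate makes it obvious that the full file will blow through most context windows, which is why the chunking below is needed at all.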

So the customer that I'm working with for this one is constrained to, yep, you guessed it, Microsoft Copilot. It's not my favourite AI tool. Let's just say it's probably near the bottom of the list. But you know what? We have to deal with constraints every day, and this is the one that the customer's dealing with. The idea here is that we need to be able to feed Copilot this data set and have it return a good set of outputs based on the outcome or objective that we want.

Basically, what we wanted to do was scan over a bunch of categories and the descriptions of the cases that were in there, and return a new categorisation based on a different parameter, which is being driven by a legislative requirement. Doing this manually in the past, you would have to search for trigger words. It would be a very, very long exercise to go through that many records. The idea here is that Copilot will run over those fields and complete that task. The problem is that it cannot consume that much information at a time. So, number one, let's have a look at our token usage.

We want to make this data set as token-efficient as possible, and we want to make our prompt as efficient as possible. I've gone through before how to build AI agents, and we want to be able to articulate what the agent or the AI is supposed to do. The principles are the same whether you're dealing with an AI agent, an agent team, or just interacting in a chat: we want to make all of those things token-efficient. So the first step is to strip out all the IDs that were in there. They were fairly long. Let's just number the rows one through 103,000, and we can match them back to the original data set with a matching field. That's number one. Okay, we've saved a few tokens.
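That ID-stripping step can be sketched in plain Python. The `case_id` column name and the file layout are hypothetical stand-ins for whatever the real export looks like:

```python
import csv

def strip_ids(in_path: str, out_path: str, map_path: str, drop_cols: set[str]) -> None:
    """Replace long case IDs with a 1..N row number and drop unneeded
    columns, keeping a separate mapping file so AI results can be
    matched back to the original data set afterwards."""
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return
    # Mapping file: row number -> original ID ('case_id' is hypothetical).
    with open(map_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["row_no", "case_id"])
        for i, row in enumerate(rows, start=1):
            w.writerow([i, row["case_id"]])
    # Token-slim file: sequential number plus only the columns we need.
    kept = [c for c in rows[0] if c != "case_id" and c not in drop_cols]
    with open(out_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["row_no", *kept])
        for i, row in enumerate(rows, start=1):
            w.writerow([i, *(row[c] for c in kept)])
```

The mapping file never goes to the AI at all; it only exists so you can join the categorised output back to the source system later.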

What other columns do we not need? Let's strip out every single piece of information we don't need in there. Then we want to articulate the outcome in a prompt, and we're going to do some testing on how much information is in there. Right. So we tried the first time. The results were rubbish: it tried to consume too much information, and the results weren't great. So now we're going to chunk our data set down into manageable pieces.

Let's say we want to do this the long way around, just to make sure that it works. Okay, let's break this down into 20 data sets. We're going to split this out into 20 separate Excel workbooks, and we're going to have a master Excel workbook where each run returns some summary information. So we're going to have 20 workbooks with just over 5,000 records each. This is much more manageable for the AI tool, because now we're not consuming nearly as many of the tokens it has available for these activities.
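The chunking step, splitting one big file into 20 smaller ones with the header repeated in each, might look like this. The episode works with Excel workbooks; CSV is used here only to keep the sketch dependency-free, and the file naming is illustrative:

```python
import csv
import math

def chunk_csv(in_path: str, out_prefix: str, n_chunks: int = 20) -> list[str]:
    """Split one big CSV into up to n_chunks smaller files (e.g. 103,000
    rows into 20 files of just over 5,000 rows each), repeating the
    header row in every file so each chunk stands alone."""
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    size = math.ceil(len(rows) / n_chunks)
    paths = []
    for i in range(n_chunks):
        part = rows[i * size:(i + 1) * size]
        if not part:
            break
        path = f"{out_prefix}_{i + 1:02d}.csv"
        with open(path, "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(header)
            w.writerows(part)
        paths.append(path)
    return paths
```

Each chunk is then small enough to hand to the AI tool in its own session without exhausting the context window.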

Now, if you're on a Pro version of Microsoft Copilot, you'll be able to do things like point it towards your documents as knowledge sources, and all those kinds of things. In this case, I'm going to assume you're paying for it. What I would do is set up an agent to take care of this. There is a "create agent" type option in there. Create an agent, and in your instructions, put in what your original prompt was going to be for what it needs to do.

Then we're going to direct it on how it is supposed to interact with the files. We have the initial part, which is easy: hey, look at this data, provide me some insights as to these new categories. That's the easy part. Now we want to direct it to go: okay, here is data set one. We're going to take this, and we're going to put the result in a new workbook so we can consolidate it all again. The first thing you're going to do is name the file that you were in. Great. It should know that, because it has the title of the thing. Then it's going to return the total number of cases that are in there. And in the original workbook, it's going to categorise them all out.

This creates a quick check for us, because we can open the new workbook that has the summaries in it to see that it's working. We don't have to open up all 20 of the other ones; we can just have the one open where we can see that it's returning those totals. Once it's done all 20 of those, we should have our total number of cases with these new categories, and we should be able to combine the other 20 back into one main data set.
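The consolidation and quick-check steps can be sketched the same way: merge the per-chunk result files back together and compare each file's row count against the totals the AI wrote into the summary workbook. File names and column headings here are hypothetical:

```python
import csv

def consolidate(result_paths: list[str], summary_path: str, out_path: str) -> None:
    """Merge per-chunk result files back into one data set, checking each
    file's actual row count against the total the AI reported in the
    summary file ('file' and 'total_cases' are hypothetical headings)."""
    with open(summary_path, newline="") as f:
        reported = {r["file"]: int(r["total_cases"]) for r in csv.DictReader(f)}
    header, all_rows = None, []
    for path in result_paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            rows = list(reader)
        if reported.get(path) != len(rows):
            # The quick check: reported and actual totals should match.
            print(f"WARNING: {path} has {len(rows)} rows, summary says {reported.get(path)}")
        all_rows.extend(rows)
    with open(out_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(header)
        w.writerows(all_rows)
```

A mismatch between the summary totals and the actual row counts is your earliest signal that a chunk was processed badly, before you've read a single categorisation.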

Now, this is still a little bit manual. Yes, I know I could probably run this through Claude in my current setup in one shot, but again, if we're dealing with the constraints, that's what we need to do.

Why are we doing this exercise? Well, in this case, the client is ultimately trying to make a determination about the type of resourcing that needs to be held within teams, based on case volume. And there is an understanding at the moment that there are invisible cases that are not being raised, just because their operations are fragmented, or there's not enough governance, or there's not enough ownership. The reporting, because of that, is very limited.

So what we're trying to do here is look at the problem from a different perspective, a different angle, and go: okay, well, if we've got all these cases and case information, maybe they've been categorised incorrectly in the past, or maybe there are other layers to these cases that we haven't previously understood. Can we figure out what order of magnitude we're looking at here, in terms of any new resource that we would need? That's the idea. The idea for this particular exercise was not to go down into very specific case-by-case matters; it was more about figuring out the bigger problem. So that's, in a nutshell, how to break down these big data sets when working with AI tools.

The basic premise when doing exercises like this, particularly around data analysis, is: remember, we need to check that it's doing it right. We need to scrutinise its outputs. We can't just take what it produces as gospel. The other thing, too, is that it's possible to run a multi-stage exercise here: we can feed it the original data, it returns its results, and then we can have a manager agent run over the top of it and go, we want to check the work of this one. Provide us with any gaps around the assumptions or the outputs. Then it can return to you, and you can test and refine from there.
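The manager-agent pattern is mostly about prompt structure, so here is a minimal sketch. `ask_model` is a hypothetical stand-in for whichever chat API or tool you actually use; the point is only that the review prompt assigns a different role from the one that produced the output:

```python
# Two-stage flow: a worker prompt produces the categorisation, then a
# separate "manager" prompt reviews it. ask_model is a hypothetical
# stand-in for your actual chat API or tool.

WORKER_PROMPT = (
    "You are a case analyst. For each row below, assign one of these "
    "categories: {categories}. Return row_no and category only.\n\n{data}"
)

MANAGER_PROMPT = (
    "You are a reviewing manager. Check the analyst's categorisations "
    "below against the source data. List any rows where the category "
    "looks wrong, and any assumptions or gaps in the output.\n\n"
    "Source data:\n{data}\n\nAnalyst output:\n{output}"
)

def review_chunk(ask_model, data: str, categories: str) -> tuple[str, str]:
    """Run one chunk through the worker role, then the manager role."""
    first = ask_model(WORKER_PROMPT.format(categories=categories, data=data))
    critique = ask_model(MANAGER_PROMPT.format(data=data, output=first))
    return first, critique
```

Because the manager prompt frames the task as reviewing someone else's work, the model is not simply re-running the role that produced the output in the first place.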

I always highly recommend, and I've mentioned this before, running a test, a battle test, a manager-review type exercise over the AI outputs. Remember that the AI is given a role when it is doing a task for you. And you might go: well, aren't we just getting it to check its own homework, in a sense? Yes, you are. But the role it used to produce the original content isn't the role you've now given it to go and review. It also cuts down the time you spend spot-checking and reviewing when you do your own review, which is really super helpful.

Keep those things in mind. The other thing we need to think about is the type of model that we're using. So if you're in a tool like OpenAI's ChatGPT or Anthropic's Claude, you'll want to make sure it's in the right mode, or using the right model, for these really heavy tasks. I would be using Opus 4.6, for example, with extended thinking mode on. Yes, it does consume more tokens, but you get a way better outcome, and that's what we're looking for. Better outcomes mean less of your time spent doing admin work.

What have we covered today? We've covered that we need to be thinking about constraints, and working within the constraints of using AI tools in your business. Maybe it was a decision you made earlier on about which tool to use. I don't really recommend going out to purchase all these different AI tools: we pick one, we stick with it, and we get really good at using it. That's the idea. We've covered how to break down these really large data sets. If you're wanting to get insights on unstructured data, or data that you need an AI tool to run over, where the task isn't necessarily deterministic, then we need to break it down in a token-efficient way to get it to work properly and provide good outcomes. And at the end, we've covered how to make sure it's given the right output, by running a manager-type role over it and then spot-checking ourselves. That's going to do us today.

Guys, thank you so much for joining me. You could be doing so many other things, but instead you decided to hang out with me and learn how to break down these data sets for use with AI tools when you're trying to make big decisions. This is all about saving time. You can save more time by heading over to my website, lonewolfunleashed.com/resources. There's a whole bunch of resources there that you can use: there's AI stuff, there's process stuff, there's procedure stuff, there's presentations, there's heaps of stuff. Go over there, check that out, and I'll see you next week.
