Your AI tool isn't broken. It's just full.
Hi, I'm Mike Fox, host of this podcast, "Lone Wolf Unleashed." I help solo founders systemise their businesses so they can switch off sooner and live larger. This week I'm pulling back the curtain on a real data project: 103,000 rows, a client locked into Microsoft Copilot, and a categorisation task that would've taken weeks to do manually.
Here's what I worked through — and what you can take straight into your own business.
The context window is the AI's working memory. Once it runs out, the quality of your outputs tanks — or the conversation just stops. Understanding this constraint is the difference between AI that saves you hours and AI that wastes them.
Working within real-world limitations (not every client is on Claude), I built a strategy to break down a massive data set into token-efficient chunks, set up a structured workflow for Microsoft Copilot to process them in sequence, and then used a manager-agent review layer to QA the outputs before any human had to.
The same principles apply whether you're running Claude, ChatGPT, or whatever tool your organisation has decided is the one. The constraints change. The framework doesn't.
If you're using AI to make decisions — not just write emails — this episode is for you.
Resources, frameworks, and tools: lonewolfunleashed.com/resources
This podcast is part of the Podknows Podcasting ICN Network
You might also like...
Check out the "Websites Made Simple" podcast with Holly Christie at https://websitesmadesimple.co.uk/
G'day. My name is Mike from Lone Wolf Unleashed and today we're
Speaker:going to talk about how to interact with AI agents and your
Speaker:AI tools to get the best possible outcome for what you're trying
Speaker:to do with them. This week I had a data set that
Speaker:we were looking at. It had 103,000 lines,
Speaker:which is a lot. It's a lot of data.
Speaker:And there is a particular problem when you're dealing with data sets
Speaker:this large and that is it will consume your AI context window.
Speaker:Now what is that? The context window is basically how much information
Speaker:the AI can read before it needs to compact the conversation,
Speaker:or you need to start a new one.
Speaker:A few months ago, Claude would just run out of context window
Speaker:and you would need to start a new conversation.
Speaker:You would basically need to start again
Speaker:and remind it wherever you were up to.
Speaker:It has since got a little bit better than that.
Speaker:It now auto-compacts: it provides a summary of what the conversation
Speaker:was about so it can continue.
Speaker:And I haven't had to start over since.
Speaker:A couple of weeks ago, Anthropic actually increased my context window to
Speaker:about a million tokens, which was five times the amount that I
Speaker:was on at the time. It is a game changer because I
Speaker:haven't seen it compact since.
Speaker:When we're dealing with data sets this large,
Speaker:we need to be thinking about how much room the AI has
Speaker:to work with. This is going to basically be a little bit
Speaker:of a chat about how we go about planning out how we're
Speaker:going to use data of this kind with an AI tool and
Speaker:some strategies that will help us get around some of our constraints.
Speaker:So the customer that I'm working with for this one is constrained
Speaker:to — yep, you guessed it —
Speaker:Microsoft Copilot. It's not my favourite AI tool.
Speaker:Let's just say that it's probably near the bottom of the list.
Speaker:But you know what? We have to deal with constraints every day
Speaker:and this is the one that the customer's dealing with.
Speaker:The idea here is that we need to be able to feed
Speaker:Copilot this data set and we need to be able to have
Speaker:it return a good set of outputs based on the outcome or
Speaker:the objective that we want.
Speaker:Basically what we wanted to do was to scan over a bunch
Speaker:of categories and the descriptions of the cases that were in there
Speaker:and to return us a new categorisation based on a different parameter,
Speaker:which is being driven by legislative requirement.
Speaker:Doing this manually in the past,
Speaker:you would have to search for trigger words.
Speaker:It would be a very, very long exercise to go through that
Speaker:many records. And the idea here is that Copilot will run itself
Speaker:over those fields and be able to complete that task.
Speaker:The problem is that it cannot consume that much information at a
Speaker:time. So number one, let's have a look at our token usage.
Speaker:We want to be able to make this data set as token-efficient
Speaker:as possible, and we want to be able to make our prompt
Speaker:as efficient as possible. I have talked before about how to
Speaker:build AI agents, and we want to be able to articulate what
Speaker:the agent or the AI is supposed to do.
Speaker:The principles are the same whether you're dealing with an AI agent
Speaker:or an agent team or whether you're just interacting in a chat.
Speaker:We want to be able to make all of those things token-efficient.
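One way to think about token efficiency before you send anything is a rough budget. The sketch below uses the common rule of thumb of roughly four characters per token for English text; the actual tokeniser behind Copilot, Claude, or ChatGPT will count differently, and the sample row is invented for illustration.

```python
# Rough token budgeting before sending a chunk to the AI tool.
# Assumption: ~4 characters per token is a common rule of thumb;
# the real tokeniser behind your tool will vary.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about one token per 4 characters."""
    return max(1, len(text) // 4)

# Hypothetical rows: one with a long case ID, one with a sequential ref.
long_id_row = "CASE-2023-00018245-AU,Billing dispute,Customer charged twice"
short_id_row = "1,Billing dispute,Customer charged twice"

# The saving per row looks small, but multiply it by 103,000 rows.
saved = estimate_tokens(long_id_row) - estimate_tokens(short_id_row)
print(saved * 103_000)
```

A per-row saving of even a handful of tokens compounds quickly at this scale, which is why the ID stripping below is worth doing first.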
Speaker:So the first one is, well,
Speaker:let's strip out all the IDs that were in there.
Speaker:They were fairly long. Let's just number the rows one through 103,000
Speaker:and create a matching field so we can tie everything
Speaker:back to the original data set.
Speaker:That's number one. Okay, we've saved a few tokens.
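The ID-stripping step might look something like this minimal sketch. The field names and ID format are hypothetical, not the client's real schema; the point is the pattern of swapping long IDs for sequential integers while keeping a lookup table outside the AI's input.

```python
# Strip long case IDs down to sequential integers, keeping a lookup
# table so results can be matched back to the original data set.
# Field names and ID format here are hypothetical.

cases = [
    {"case_id": "CASE-2023-00018245-AU", "description": "Billing dispute"},
    {"case_id": "CASE-2023-00018246-AU", "description": "Refund request"},
]

id_lookup = {}  # sequential ref -> original ID, never sent to the AI
for ref, case in enumerate(cases, start=1):
    id_lookup[ref] = case["case_id"]
    case["case_id"] = ref  # a short integer costs far fewer tokens

print(cases[0])      # {'case_id': 1, 'description': 'Billing dispute'}
print(id_lookup[2])  # CASE-2023-00018246-AU
```

Because the lookup stays local, you can always rejoin the AI's outputs to the full original records afterwards.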
Speaker:What other columns do we not need?
Speaker:So let's strip out every single piece of information we don't need
Speaker:in there. And then what we want to be able to do
Speaker:is we want to be able to articulate the outcome in a
Speaker:prompt. And we're going to do some testing about how much information
Speaker:is there. Right. So we tried it the first time and the
Speaker:results were rubbish: it tried to consume too much information at once.
Speaker:So now we're going to chunk our data set
Speaker:down into manageable pieces.
Speaker:Let's say we want to do this the long way around just
Speaker:to make sure that it works.
Speaker:Okay, let's break this down into 20 data sets.
Speaker:So we're going to split this out into several different Excel workbooks
Speaker:and we're going to have a master Excel workbook where it's going
Speaker:to return some information. So we're going to have 20 workbooks with
Speaker:just over 5,000 records each. This is much more manageable for the
Speaker:AI tool because now we're not consuming nearly as many of the
Speaker:tokens that it has available to use for these activities.
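The chunking step can be sketched as below. The episode used Excel workbooks; plain CSV is used here just to keep the example self-contained, and the row contents are invented.

```python
import csv
import math

# Split 103,000 rows into 20 roughly equal chunks.
# The real job used Excel workbooks; CSV keeps this sketch simple.

def split_into_chunks(rows, n_chunks=20):
    size = math.ceil(len(rows) / n_chunks)
    return [rows[i:i + size] for i in range(0, len(rows), size)]

rows = [[ref, f"case {ref}"] for ref in range(1, 103_001)]
chunks = split_into_chunks(rows)
print(len(chunks), len(chunks[0]))  # 20 chunks of 5,150 rows each

# Write one file per chunk, named so the agent can report which
# file it is working from.
for n, chunk in enumerate(chunks, start=1):
    with open(f"dataset_{n:02d}.csv", "w", newline="") as f:
        csv.writer(f).writerows(chunk)
```

103,000 splits evenly into 20 files of 5,150 rows each, which lines up with the "just over 5,000 records" target mentioned above.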
Speaker:Now, if you're in a Pro version of Microsoft Copilot,
Speaker:you'll be able to do things like point it towards your documents
Speaker:for knowledge sources and all those kinds of things.
Speaker:In this case, I'm going to assume you're paying for it.
Speaker:What I would do is I would set up an agent to
Speaker:take care of this. There is a create agent type option in
Speaker:there. Create an agent, and in your instructions,
Speaker:put in what your original prompt was going to be,
Speaker:covering what it needs to do.
Speaker:And then we're going to direct it on how it is supposed
Speaker:to interact with the files. We have the initial part,
Speaker:which is easy. It's the —
Speaker:hey, look at this data, provide me some insights as to these
Speaker:new categories. That's the easy part.
Speaker:Now we want to direct it to go,
Speaker:okay, here is data set one.
Speaker:We are going to take this,
Speaker:we're going to put the result in a new workbook so we
Speaker:can consolidate it all again. And the first thing it's going to
Speaker:do is name the file that it's working in.
Speaker:It should know that because it has the title of the file.
Speaker:Then it's going to return the total number of cases
Speaker:that are in there. And in the original workbook,
Speaker:it's going to categorise them all.
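Those per-chunk directions could be captured in an instruction template like the sketch below. The wording is purely illustrative: it is not Copilot's actual agent syntax, and the file name is a hypothetical one from the chunking step.

```python
# A hypothetical per-chunk instruction template for the agent.
# Illustrative wording only; not Copilot's actual agent syntax.

INSTRUCTIONS = """\
You are a case-categorisation assistant.
For the attached workbook:
1. State the name of the file you are working from.
2. Report the total number of cases it contains.
3. Categorise every case against the new legislative categories.
4. Write the file name and case total into the master summary workbook.
"""

# Build the prompt for one chunk; repeat for dataset_02, dataset_03, etc.
prompt = INSTRUCTIONS + "\nWorkbook: dataset_01.xlsx"
print(prompt)
```

Keeping the instructions identical across all 20 chunks is what makes the outputs comparable and the master summary trustworthy.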
Speaker:This creates a quick check for us because we can open the
Speaker:new workbook that has the summaries in it to see that it's
Speaker:working. We don't have to open up all 20 of the other
Speaker:ones. We can just keep the one open and see that
Speaker:it's returning those totals. Once it's done all 20 of those,
Speaker:we should have our total number of cases that we're looking at
Speaker:with these new categories.
Speaker:And we should be able to then combine the other 20 back
Speaker:into one main big one as a new data set.
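That recombination step comes with a cheap sanity check: the per-workbook totals in the master summary should add back up to the original row count. The totals below are made up for illustration.

```python
# Sanity check after recombining: per-workbook case counts reported in
# the master summary should sum back to the original 103,000 rows.
# The totals here are hypothetical, for illustration only.

summary = {f"dataset_{n:02d}": 5_150 for n in range(1, 21)}

total = sum(summary.values())
assert total == 103_000, f"expected 103,000 cases, summary shows {total}"
print("all 20 workbooks accounted for")
```

If the assertion fails, you know immediately which kind of problem to hunt for: a chunk the agent skipped, or rows it dropped or double-counted.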
Speaker:Now, this is still a little bit manual.
Speaker:Yes, I know I could probably run this through Claude in my
Speaker:current setup in one shot, but again,
Speaker:if we're dealing with the constraints,
Speaker:that's what we need to do.
Speaker:Why are we doing this exercise?
Speaker:Well, in this case, the client is trying to make a determination
Speaker:ultimately around the type of resourcing that needs to be held within
Speaker:teams based on case volume. And there is an understanding at the
Speaker:moment that there are invisible cases that are not being raised,
Speaker:just because their operations are fragmented,
Speaker:or there's not enough governance, or there's not enough ownership.
Speaker:And the reporting because of that is very limited.
Speaker:So what we're trying to do here is we're trying to come
Speaker:from a different perspective, a different angle —
Speaker:look at the problem from a different angle and go,
Speaker:okay, well if we've got all these cases and case information,
Speaker:maybe they've been categorised incorrectly in the past,
Speaker:or maybe there are other layers to these cases that we haven't
Speaker:previously understood. Can we figure out what order of magnitude we're looking
Speaker:at here in terms of any new resource that we would need?
Speaker:That's the idea. The idea for this particular exercise was not to
Speaker:go down into very specific case-by-case matters.
Speaker:It was more to do with how to figure out the bigger
Speaker:problem. So that's basically in a nutshell how to break down these
Speaker:big data sets when working with AI tools.
Speaker:The basic premise when doing exercises like this,
Speaker:particularly around data analysis, is —
Speaker:remember we need to check that it's doing it right.
Speaker:We need to scrutinise its outputs.
Speaker:We can't just take what it has as gospel.
Speaker:The other thing is that it's possible to run a multi-stage
Speaker:exercise here: we feed it the original data and it returns its outputs.
Speaker:We can then have a manager agent run over the top of
Speaker:it and go: we want to check the work of this
Speaker:one. Provide us with any gaps around the assumptions or
Speaker:the outputs. Then it returns to you and you can
Speaker:test and refine from there.
Speaker:I always highly recommend, and I've mentioned this before,
Speaker:running a battle test, a manager-review type exercise,
Speaker:over the AI outputs.
Speaker:Remember that the AI is given a role when it is doing
Speaker:a task for you, and you might go:
Speaker:well, aren't we just getting it to check its own homework?
Speaker:In a sense, yes. But the role that it used to
Speaker:produce the original content isn't the role that you've now given it
Speaker:to review with. It also cuts down the time you spend
Speaker:spot-checking when you're doing your own
Speaker:review, which is really super helpful.
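The manager-review pass boils down to prompting the same model twice under different roles. In the sketch below, `ask_model` is a placeholder, not a real Copilot or Claude API; substitute whatever chat interface your tool exposes, and the role wording is invented for illustration.

```python
# Two-pass review: the same model prompted under two different roles.
# `ask_model` is a placeholder stub, not a real Copilot or Claude call;
# swap in whatever chat interface your tool actually exposes.

def ask_model(role: str, prompt: str) -> str:
    # Stub: returns a canned string so the flow can be traced end-to-end.
    return f"[{role.split('.')[0]}] response to: {prompt[:40]}"

WORKER_ROLE = "You are a case-categorisation analyst."
MANAGER_ROLE = (
    "You are a QA reviewer. Check the categorisations below for gaps "
    "in the assumptions, inconsistent categories, and miscounts."
)

# Pass 1: produce the work. Pass 2: review it under a different role.
draft = ask_model(WORKER_ROLE, "Categorise these cases against ...")
review = ask_model(MANAGER_ROLE, f"Review this work:\n{draft}")
print(review)
```

The design point is that the reviewing role is given different instructions and success criteria than the producing role, so it is not simply marking its own homework.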
Speaker:Keep those things in mind. The other thing that we need to
Speaker:think about is the type of model that we're using.
Speaker:So if you're in a tool like OpenAI's ChatGPT or Anthropic's Claude,
Speaker:then you'll want to make sure it's in the right mode or
Speaker:using the right model for these really heavy tasks.
Speaker:I would be using Opus 4.6,
Speaker:for example, with extended thinking mode on.
Speaker:Yes, it does consume more tokens,
Speaker:but you get a way better outcome and that's what we're looking
Speaker:for. Better outcomes mean less of your time spent doing admin work.
Speaker:What have we covered today? We've covered that we need to
Speaker:be thinking about, and working within, the constraints
Speaker:of using AI tools within your business.
Speaker:Maybe it's a decision that you made earlier on about which
Speaker:tool to use and all that sort of stuff.
Speaker:I don't recommend really going out to purchase all these different AI
Speaker:tools. We pick one and we stick with it and we get
Speaker:really good at using it.
Speaker:That's the idea. We've covered how to break down these really large
Speaker:data sets. If you're wanting to get insights on unstructured
Speaker:data, or data that needs an AI tool run over
Speaker:it rather than a deterministic process,
Speaker:then we need to break it down in a token-efficient way to
Speaker:get the tool to work properly and provide good outcomes.
Speaker:And then at the end we've covered how to make sure that
Speaker:it's giving good output by running a manager-type role over it
Speaker:and then spot-checking ourselves. That's going to do us today.
Speaker:Guys, thank you so much for joining me.
Speaker:You could be doing so many other
Speaker:things, but instead you decided to hang out with me and learn
Speaker:how to break down these data sets to use in AI tools
Speaker:when you're trying to make big decisions.
Speaker:This is all about saving time.
Speaker:You can save more time by heading over to my website,
Speaker:lonewolfunleashed.com/resources. There's a whole bunch of resources there that you can use.
Speaker:There's AI stuff, there's process stuff,
Speaker:there's procedure stuff, there's presentations —
Speaker:there's heaps of stuff. Go over there,
Speaker:check that out and I'll see you next week.