Most days now, I have multiple agents running in parallel on my computer.
One is fixing dozens of broken tests. It doesn't get bored or frustrated. It keeps making adjustments across multiple workers until the tests pass again. Another spent the night experimenting with the backend build process. By morning, while I was sleeping, it had cut the build time by 8 minutes.
A third is trying to export a financial model to Excel with the correct formulas. Export to Excel, upload to Google Sheets, export to CSV, compare the calculated values, fail, adjust, then try again. Over and over. Slow, iterative debugging like this typically sits in the backlog for days.
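To make the "compare calculated values" step concrete, here is a minimal sketch of what that check could look like. It is not the agent's actual code. The function names, the sample data, and the tolerance are all assumptions made up for illustration. It only compares the numeric cells of two CSV exports and reports where they disagree.

```python
import csv
import io
import math

def load_values(csv_text):
    """Parse CSV text into {(row, col): float} for every numeric cell."""
    values = {}
    for r, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for c, cell in enumerate(row):
            try:
                values[(r, c)] = float(cell)
            except ValueError:
                continue  # skip labels and blanks
    return values

def diff_exports(expected_csv, actual_csv, rel_tol=1e-9):
    """Return a sorted list of (cell, expected, actual) where values disagree."""
    expected = load_values(expected_csv)
    actual = load_values(actual_csv)
    mismatches = []
    for cell in sorted(expected.keys() | actual.keys()):
        e, a = expected.get(cell), actual.get(cell)
        if e is None or a is None or not math.isclose(e, a, rel_tol=rel_tol):
            mismatches.append((cell, e, a))
    return mismatches

# Hypothetical sample data: the model's own numbers vs. the round-tripped CSV.
model_csv = "item,q1,q2\nrevenue,100.0,120.0\ncost,60.0,70.0\nmargin,40.0,50.0\n"
sheet_csv = "item,q1,q2\nrevenue,100.0,120.0\ncost,60.0,70.0\nmargin,40.0,45.0\n"

mismatches = diff_exports(model_csv, sheet_csv)
for (row, col), e, a in mismatches:
    print(f"row {row}, col {col}: expected {e}, got {a}")
```

An empty list means the round trip preserved every calculated value. Anything else tells the agent exactly which cell to go fix before the next attempt.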
My computer currently crashes about 8 times a day because every agent launches its own environment. We ran out of RAM and had to move everything to the cloud so we could run as many instances as we needed. It's a silly problem to have.
But this is what AI actually looks like right now: one person managing small systems that work in parallel and never get tired. Finance doesn't work this way yet, but it will. And when that happens, the difference between who benefits and who doesn't will no longer be about who bought the right tools. It will come down to something that has always been important, long before AI.
There is a lot of discussion in finance about AI: which vendors to choose, how to write policies, how to roll out training. Those questions matter, but they are downstream. The real constraint is management. If you can clearly define the problem, provide enough context, and recognize when the output is wrong, AI can help you quickly. If you can't, the results will be inconsistent and disappointing. And that depends on how well you already know how to think.
AI mental models are already outdated
Much of the current skepticism in finance comes from bad experiences people had in 2025. They tried an agent, the results were wrong, they concluded that AI wasn't ready yet, and they moved on. That conclusion made sense at the time.
But in late November and early December of 2025, something important happened. Claude Opus 4.5 and GPT-5 crossed a threshold in long-horizon planning and in the ability to evaluate their own work, and most people have not yet registered it. If you've used Claude Code or OpenAI's Codex recently, you probably know what I mean. A year ago, you would have watched a model in action and thought, “We're clearly a long way from AGI.” Now that answer is much harder to give. The philosophical debate continues, but for most tasks the practical difference has become very hard to see. The output is almost indistinguishable from that of an average human.
Most products have not kept up with what the underlying models can now do. This means that most people's mental models of AI are calibrated to something that no longer exists.
The “I tried it and it didn't work” problem is real, but when you dig into it, in most cases it comes down to the people: what they asked for, how much context they provided, and how the work was defined.
AI reveals how you manage
Engineer Geoffrey Huntley says, “The LLM is a reflection of the skill of the operator,” and that rings true in my experience. What you get out of your agent depends on how clearly you define its tasks, how much context you provide, the constraints you set, and how well you judge the output.
None of this is new. It's just management.
And many people aren't particularly good at it.
Here's how most people use AI. They give vague prompts, keep most of the context in their heads, never define what good looks like, get generic results, and conclude that AI doesn't work. When I ask whether they tried giving more background or being more specific about what they wanted, the answer is usually “no.” Changing just one thing, such as defining the outcome more precisely or providing examples of good answers, often improves the result disproportionately.
This is the same dynamic as with new employees. A manager hands off a task without context, the employee gets it wrong, and instead of asking, “Did I explain this well enough?” the manager decides that the hire is incompetent. With people, we eventually recognize this pattern and try to correct it. We invest in onboarding and in mentoring managers. But with AI, we just stop using it.
Finance makes the gap visible
In finance, this goes even deeper. Great finance professionals carry multiple layers of context in their heads. They know which systems can be trusted, which variances actually matter, which controls are sacrosanct and which are just legacy, and what the CEO really cares about this quarter (even if nobody has told them directly). They operate on intuition built from experience rather than on what is written down. Translating that intuition into explicit context is very difficult, whether the audience is a human or a model, so it often doesn't happen.
If the context remains implicit, the AI cannot infer it. So the model produces something that looks reasonable at first glance but falls apart on closer inspection. At that point, many teams conclude that the technology isn't ready yet. What they have often actually discovered is that their organization hasn't learned how to externalize that judgment in a way that others can work with.
That constraint has always been there. AI only makes it harder to ignore.
Questions to ask before implementing AI
Before introducing AI to your team, it's worth asking how well you onboard talented junior analysts. When talented people join your team, can you articulate what good work looks like? Can you give them enough context in writing that they don't have to interrupt you every 15 minutes? When they miss the mark, can you explain exactly where and why?
If you're already good at this, you'll tend to find AI easier to work with. If not, you might end up blaming the tool.
So before you think about which tools to buy, it's worth stopping to consider something more basic.
- Can your team clearly articulate in writing what they want?
- Can they externalize context rather than keeping it in their heads?
- Can they tell the difference between output that is actually correct and output that just sounds correct?
If the answer is no, that's the first problem to solve. Because no model can solve management problems.
This is the work
When organizations respond to AI, their instinct is often to build training programs and policies. I understand the instinct (it feels responsible), but it tends to frame AI as something separate from the actual work, something you prepare for now and apply later.
But attending a session or reading a guide doesn't give you a feel for what these systems can do. You find out by running them against your own data, observing where they fail, adjusting your prompts, and realizing that what seemed precise in your head was actually vague. That's when learning happens.
So the goal each week should be a workflow that is meaningfully better than the last one and that you can actually use next week. There is no point if it cannot survive contact with reality.
To accelerate adoption, teams also need to see people they respect doing something truly useful with AI. But that only happens if you build your own intuition first. You won't get it by reading about it or by delegating it to an “AI champion” to figure out. You have to get your hands on it, use it in real work, hit failures, understand why they happened, and try again. That is how you develop judgment about where the model is strong and what kind of context it needs. And that judgment will be yours to share.
Once you have that judgment, share it openly. Demonstrate the workflow you built and explain what went wrong. Use real data. Concrete, honest examples are more valuable than polished success stories, and they give others something to try.
The skills shift is already happening
The finance teams that win over the next 12 months will be the ones whose leaders can clearly define what good looks like, externalize their judgment, give real context, and tell the difference between what is right and what just sounds confident.
These have always been the skills that make someone good at managing teams of people.
They are the same skills that make someone good at managing a fleet of agents.
Many finance leaders won't realize this until they see their competitors moving faster with fewer people. Some will notice earlier.
I feel this in my own work now. Running agents in parallel strains my machine. It's annoying, it crashes frequently, and it's far from elegant. But the impact is real, and it's already here.
