Published on Thursday 7 May 2026
IT 5259

Article written by Elgar Weijtmans.

Talking to the Nerds (Part 3): What exactly is an agent?

If I had a euro for every time someone asked me this year, “Shouldn’t we be doing something with agents too?”, I could have retired quietly by now. The word is on the agenda of every board meeting, the Wall Street Journal is full of it, consultants are publishing glossy reports, and it crops up somewhere in virtually every product update.

That dynamic (read: hysteria) sounds familiar. A few years ago the same question was about “blockchain”; before that, “apps”. I recognise the same nervous rush, the fear of missing out, and the hope that there’s a ready-made solution out there somewhere.

But what exactly is an agent, and just as importantly, what isn’t it? Let me start with the awkward part: there is no single definition. Every supplier has its own interpretation, and marketing departments are now slapping the label on everything. What used to be called a chatbot is now called an agent. An automation? An agent. A little script that runs through a fixed step-by-step plan under the bonnet: agent.

Even so, a distinction can be made. What most people call an agent is, in reality, a workflow: a predefined series of steps with AI involved somewhere along the way. Useful, often handy, but not fundamentally different from what we’ve been doing for years. “Real” agents are something else entirely, and they are still rare in practice. With a real agent, the system itself decides which steps to take, chooses which tools to use, and keeps going until the task is done. Crucially, it is not you, the human, who has mapped out the steps. The model works them out as it goes.

That difference is more than just a matter of semantics. In a recent piece, I wrote about how generic AIs such as Claude are competing with specialised legal AI platforms (such as Legora and Harvey), and are increasingly making their mark in the legal market. Anyone trying to gauge where that market is headed would do well to understand how agents work and what they are made of.

Claude Code as an example
One of the most advanced examples of a real agent in practice today is Claude Code. It is Anthropic’s AI assistant for programmers, and it is a good deal more than the chat window we’ve all come to know. In Claude Code you give a command (something like “solve this problem” or “improve this function”), and the system gets to work. It reads files, executes commands, modifies code, tests whether everything works, fixes any issues, and reports back when the task is complete. It is precisely this autonomous, self-directed behaviour that sets an agent apart from a workflow.

Most of my fellow programmers, myself included, now mainly do two things: instruct these systems and check their work. In a short space of time, we’ve gone from pilot to co-pilot to air traffic controller. Much of the code we used to write ourselves is now written for us by the agent.

The same approach is starting to appear in other professions. Claude Cowork is the “little brother” of Claude Code, aimed at knowledge work, including legal work. You can hand over complex tasks to it, such as analysing and drafting documents. Earlier this year Anthropic also released a plugin within Cowork specifically for legal work, with commands such as /review-contract. The trend is becoming increasingly clear: what is already a daily reality for programmers is set to become one for lawyers too.

Recently, the source code for Claude Code was released. Naturally, I pored over it with great interest. It gives a far more detailed picture than you would normally get from documentation or marketing material.

The best way to understand how agents work is to look at how Claude Code works under the bonnet. I realise that sounds technical, but rest assured, I’ll do my best to explain it in a way that requires no technical knowledge.

In doing so, I’ll walk you through the five building blocks that make up Claude Code: the cycle that drives everything, the tools the system uses to get things done in the “real world”, the permissions layer that guards the boundaries, the memory (which is always too small), and the ability to delegate work to itself.

But first, something that struck me.

Most of it isn’t AI

What surprised me most was how little of the source code actually deals with the AI itself. The vast majority is run-of-the-mill software: permissions management, error handling, memory management, and interfaces with other systems.

That is an important observation. When executives or consultants talk about “agentic AI”, it sounds as though the intelligence of the language model is the deciding factor. In reality, the usability, reliability and security of such a system lie precisely in everything around it. That is where the important decisions are made, and where things most often go wrong.

The cycle
So what actually happens when you give an agent a task? It is surprisingly simple. When you give an instruction (say, “amend this contract in line with our standard”), the system kicks off a cycle that repeats itself until the task is complete. Within that cycle, the same thing happens every time.

The task, plus all the information (“the context”) gathered up to that point, goes to the AI, a language model such as ChatGPT or Claude. The model either responds straight away or asks for a follow-up step (“read this file first”, “search the database”, “execute this command”). If it responds straight away, the cycle ends. If it asks for a follow-up step, the system checks whether that is permitted, executes it, adds the result to the context, and starts the cycle again.

That is all there is to it. And that is precisely where it differs from a workflow: in a workflow, you (or the supplier) have determined in advance which steps are to be followed and when the workflow is complete. In an agent, the model decides on the fly, based on what has happened up to that point.
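For readers who like to see things spelled out, the cycle can be sketched in a few lines of heavily simplified Python. This is an illustration of the loop described above, not Anthropic's actual code; every name here (`run_agent`, `Step`, and so on) is a hypothetical label of my own.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str                 # "answer" (respond straight away) or "tool" (follow-up step)
    text: str = ""            # the final answer, if kind == "answer"
    tool: str = ""            # the tool to run, if kind == "tool"
    args: dict = field(default_factory=dict)

def run_agent(task, model, tools, is_permitted):
    """Minimal agent cycle: repeat until the model answers directly."""
    context = [task]
    while True:
        step = model(context)                # task + context go to the model
        if step.kind == "answer":            # model responds straight away:
            return step.text                 # the cycle ends
        # model asked for a follow-up step (a tool call)
        if not is_permitted(step.tool, step.args):
            context.append(f"{step.tool}: permission denied")
            continue
        result = tools[step.tool](**step.args)
        context.append(result)               # add the result, start the cycle again
```

Note that nothing in the loop says *which* steps will be taken or how many: the model decides that on each pass, which is exactly the difference from a predefined workflow.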

Tools: the system’s hands
A language model on its own can actually do relatively little. It produces text, and that is where it stops. What makes an agent an agent are the tools: the hands the model uses to get things done in the “real world”. Claude Code has more than fifty such tools at its disposal. Opening a file. Modifying a file. Executing a command. Searching the web. Calling an external system (via the Model Context Protocol).

For law firms, this is perhaps the most interesting aspect. The tools determine what an agent can actually do. No link to legal sources means no case law research. No link to the document management system (DMS) means no insight into the case file. The intelligence sits in the model; the usability sits in the system around it.
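In code, a tool is little more than a name, a description the model can read, and a function the system can call. The sketch below is hypothetical (the tool names are invented for a legal setting), but it shows why a missing link simply means a missing capability:

```python
# Hypothetical tool registry: name -> description + function.
tools = {
    "search_case_law": {
        "description": "Search a case-law database for a query string.",
        "run": lambda query: f"3 cases found for '{query}'",
    },
    "read_dms_file": {
        "description": "Read a document from the document management system.",
        "run": lambda doc_id: f"contents of document {doc_id}",
    },
}

def call_tool(name, **kwargs):
    """Execute a tool the model asked for, if it exists at all."""
    if name not in tools:
        return f"unknown tool: {name}"   # no link means no capability
    return tools[name]["run"](**kwargs)
```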

Permissions: the brake
Between the model’s decision and the actual execution of a tool sits a permissions mechanism. Claude Code has various modes, ranging from “always ask first” to “just get on with it”.

This is where policy, compliance and control are embedded within such a system. For lawyers in particular, this is an essential component. The moment a firm deploys an agent, the question “how is the permissions layer set up?” is at least as important as “which model is behind it?”.
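A permissions layer along these lines can be sketched in a handful of lines. The mode names and the notion of "risky" tools below are my own simplification, not Claude Code's actual configuration:

```python
# Hypothetical permission modes, from "always ask first"
# to "just get on with it".
ALWAYS_ASK, ASK_FOR_RISKY, NEVER_ASK = "always_ask", "ask_for_risky", "never_ask"
RISKY_TOOLS = {"execute_command", "modify_file"}   # illustrative choice

def is_permitted(tool, mode, ask_user):
    """Decide, between the model's request and execution,
    whether a human must approve this step."""
    if mode == NEVER_ASK:                  # "just get on with it"
        return True
    if mode == ALWAYS_ASK:                 # "always ask first"
        return ask_user(tool)
    # middle ground: only ask for tools classed as risky
    return tool not in RISKY_TOOLS or ask_user(tool)
```

How `RISKY_TOOLS` is populated and which mode is the default is precisely where a firm's policy and compliance choices end up living.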

One important point: research by Anthropic (the creator of Claude) itself shows that new users automatically approve around a fifth of all actions, while for experienced users that figure rises to nearly half. Human oversight as the ultimate safeguard is, to put it mildly, fragile. And as people work with these systems more often, that oversight becomes more lax rather than stricter. Something to bear in mind whenever someone clings to the idea that “the user is always ultimately responsible for quality and safety”.

Memory: the notepad fills up
An agent does not have unlimited memory. Everything the model “knows” about a task (the context) has to fit on a kind of digital notepad that is sent along with every step. That notepad has a fixed capacity. As soon as it threatens to fill up, the system starts clearing it out: older notes are shortened, long texts summarised, and if it really won’t fit any more, the system produces a concise summary of everything up to that point, deletes the rest, and carries on with that.

The result is inevitable: an agent forgets things. Not out of carelessness, but because the system has no other choice. For long, complex matters, this means you cannot blindly trust that all the decisions and agreements made earlier are still intact on the notepad.
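The clearing-out of the notepad can be made concrete with a small sketch. The thresholds and the two-stage strategy here are assumptions for illustration; the real mechanism is more sophisticated, but the principle (shorten first, summarise as a last resort) is the same:

```python
def compact(notepad, limit, summarize):
    """Keep the notepad under a fixed capacity: first shorten older
    notes, and as a last resort replace everything with one summary."""
    size = lambda notes: sum(len(n) for n in notes)
    if size(notepad) <= limit:
        return notepad                      # still fits, nothing lost
    # first pass: shorten everything except the two most recent notes
    trimmed = [n[:100] for n in notepad[:-2]] + notepad[-2:]
    if size(trimmed) <= limit:
        return trimmed                      # older detail is now gone
    # last resort: one concise summary of everything so far
    return [summarize(trimmed)]
```

Either branch discards information; the only question is how much. That is why decisions made early in a long matter may no longer be on the notepad later on.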

Sub-agents: outsourcing to oneself
If a task becomes too complex, the lead agent can hand part of it off to a so-called sub-agent. A kind of trainee with a defined assignment and their own blank notepad. That sub-agent works independently, eventually returns with an answer, and only that answer is passed back to the lead agent. Everything in between is discarded afterwards.

Sound familiar? It is very similar to how a partner delegates work to an associate. You give a briefing, you get a result, and you (usually) trust that everything in between went well.
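The delegation itself is simple enough to sketch. The function below is a hypothetical illustration of the mechanism: the sub-agent starts with a blank notepad, and only its final answer flows back to the lead agent.

```python
def delegate(lead_context, briefing, sub_agent):
    """Hand a sub-task to a sub-agent with its own blank notepad.
    Only the answer returns; the working notes are discarded."""
    sub_notepad = [briefing]            # fresh notepad: briefing only
    answer = sub_agent(sub_notepad)     # works independently, fills its own notepad
    lead_context.append(answer)         # only the answer is passed back
    return lead_context                 # sub_notepad is thrown away here
```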

The right follow-up questions
Back to the original question. What is an agent?

An agent is not an AI that “can do everything”, and certainly not every system currently marketed under that banner. It is a cycle centred on a language model, equipped with tools, governed by a permissions layer, constrained by the notepad, and capable of delegating work to itself. The AI itself is only a small part of the whole. The value, and the risks, lie in the system around it.

When someone offers an “agent-based” system, the first question should be: are we talking about a real agent, or a workflow with an AI twist? Both can be valuable, but they call for a completely different assessment. And if it is a real agent, the follow-up is just as important: how are the tools, the permissions layer and the memory layer configured?

I believe that in the coming years, competition between providers will be less and less about which model is used (those are converging rapidly), and increasingly about how well the rest is designed. And the next time someone asks you, “Shouldn’t we be doing something with agents too?”, I hope this piece helps you ask the right follow-up questions. What’s more, what a real agent does is not nearly as alien as it sometimes seems: it can delegate to itself, but someone still has to decide which tools exist, which permissions apply, and which judgement is reliable enough to build upon. That work bears a striking resemblance to what we have always done.