DigitalbyDefault.ai
Developer Tools · 9 min read

Build a Multi-Agent Trading Bot in an Afternoon — Alpaca, Claude, and LangChain

A single LLM given a brokerage API is a loaded gun with no safety. A well-structured multi-agent system is a lot more interesting — and it turns out to be the difference between a toy bot and something you would actually let run overnight. Here is the architecture, the pitfalls, and what actually happens when you try.

Erhan Timur · 19 April 2026 · Founder, Digital by Default

Giving a single LLM unfettered access to a brokerage API is a bad idea. Not because LLMs are stupid — they aren't — but because the prompt pattern that makes a model good at one thing makes it worse at another, and real trading needs four very different things done well: research, strategy, risk, and execution.

This is where multi-agent architectures stop being an architectural fashion statement and start being a safety requirement.

I spent an afternoon last week building exactly that setup — a four-agent trading bot on top of Alpaca's paper-trading API, using Claude Opus 4.7 as the underlying model, the MCP Server v2 for tool access, and LangChain for the orchestration glue. It isn't going to make anyone rich. But it's a much more honest reference implementation of what "agentic trading" actually looks like once you take it seriously, and the lessons transfer directly to any high-stakes-agent problem you're working on — not just finance.

Here is how it's built, what breaks, and what surprised me.

Why multi-agent, not just one clever prompt

The one-prompt approach is seductive. You write a prompt like *"You are an expert trader. Research current market conditions, identify high-conviction opportunities, manage risk, and execute trades. Start."* Wire it to tools, hit enter, see what happens.

What happens is the model does all four things badly at once. It skims the research so it can get to the trade. It skips risk checks because the instruction is implicit. It over-anchors on whichever piece of data it saw most recently. It doesn't ask for a second opinion because it *is* the only opinion. And when something goes wrong — the ticker was wrong, the spread was wider than expected, the market was halted — there's no internal pushback, because there's no internal adversary.

Multi-agent systems solve this by decomposing the problem into specialists who each do one thing and have to hand off to each other. The research agent doesn't know how to place trades. The execution agent doesn't know how to evaluate strategies. The risk agent exists specifically to say no.

This isn't speculation — it's the same design pattern that works in human trading desks, for the same reasons. You want a trader, a quant, a risk officer, and an execution trader, not one person doing all four. When the work matters, specialisation plus disagreement beats generalist-plus-vibes every time.

The stack

Four components, all glued together in a weekend-buildable setup:

  • Alpaca paper-trading account. Free, instant, $100k of fake money. US residency not required. Both the Trading API and Market Data API exposed.
  • Alpaca's MCP Server v2. Self-hosted, configured with your paper-trading keys. Run it locally. Point your agent framework at it.
  • Claude Opus 4.7 as the reasoning model for each agent. Sonnet 4.6 works fine too and is cheaper; if you're budget-constrained, run Sonnet everywhere and reserve Opus for the risk agent, where the strongest reasoning matters most.
  • LangChain (Python) for the agent graph, tool routing, and conversation state. LangGraph specifically if you want explicit control over handoffs, which you do.

Total cost for a week of paper-trading experimentation: about $15 in Claude API usage. You can run the whole thing on a laptop.

The four agents and how they talk to each other

The architecture is a linear pipeline with a feedback loop, not a free-for-all. Each agent has a narrow scope, a defined output format, and a short memory window — which keeps the prompts tight and the failures legible.

1. The Researcher

Role: gather the relevant context and nothing else.

The Researcher takes a universe of interest — "US large-cap tech stocks" or "the S&P 500 financials sector" — and produces a structured report: current prices, recent news headlines, earnings surprises, analyst revisions, unusual options activity. It has read-only access to the MCP Server's market-data tools. It cannot place orders. It cannot even see the portfolio.

Its output is a JSON report, not prose. This matters. If you let the model write prose, the next agent reads the prose differently every time. Structured output is the backbone of every multi-agent system that actually works.
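The article doesn't publish its schemas, but a minimal sketch of what a structured Researcher report might look like, using only stdlib dataclasses (the field names and caps here are my own illustration, not the author's):

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class TickerReport:
    # One ticker's worth of research context (illustrative fields).
    symbol: str
    last_price: float
    news_headlines: list = field(default_factory=list)  # capped at 3 by the Researcher prompt
    earnings_surprise_pct: float = 0.0

@dataclass
class ResearchReport:
    universe: str
    tickers: list  # max 5, enforced by the orchestrator, not just the prompt

def to_json(report: ResearchReport) -> str:
    # Serialise to the exact JSON the Strategist consumes.
    return json.dumps(asdict(report), indent=2)

report = ResearchReport(
    universe="US large-cap tech",
    tickers=[TickerReport("AAPL", 231.40, ["Earnings beat"], 4.2)],
)
```

Because the schema is the contract, the orchestrator can reject any Researcher output that fails to parse into it before the Strategist ever sees it.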

What breaks: the Researcher will over-fetch. You have to constrain it — "maximum five tickers, maximum three news items each" — or it will happily pull down 200 data points for a single-decision problem.

2. The Strategist

Role: turn a research report into a specific trade thesis.

Given the Researcher's report, the Strategist proposes one trade: symbol, direction (long or short), size as a percentage of buying power, instrument (equity or a specific multi-leg option structure), and a written thesis explaining *why*. It has access to no tools directly — it is pure reasoning over the Researcher's report. The only thing it outputs is a proposed order + thesis.

Separating research from strategy is counter-intuitive if you think an LLM should "just think end-to-end," but it's the single most important split in the whole system. The Researcher is rewarded for completeness. The Strategist is rewarded for conviction. Mixing those incentives in one prompt produces mushy outputs that do neither well.

What breaks: the Strategist will occasionally hallucinate a ticker the Researcher never mentioned. The fix is simple — the orchestrator refuses any proposal whose ticker isn't in the report.
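That guard is a one-liner in the orchestrator. A sketch, with hypothetical field names matching the report schema above:

```python
def validate_proposal(proposal: dict, report: dict) -> bool:
    """Reject any Strategist proposal whose ticker the Researcher never surfaced."""
    researched = {t["symbol"] for t in report["tickers"]}
    return proposal["symbol"] in researched

report = {"tickers": [{"symbol": "AAPL"}, {"symbol": "MSFT"}]}
assert validate_proposal({"symbol": "MSFT", "direction": "long"}, report)
assert not validate_proposal({"symbol": "NVDA", "direction": "long"}, report)  # hallucinated ticker
```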

3. The Risk Officer

Role: say no, often.

The Risk Officer reads the proposed trade and the current portfolio state (fetched via read-only MCP tools) and produces a binary verdict: approved or rejected, with reasoning. Its system prompt is explicit: *"Your job is to reject trades that violate the risk policy. Your default posture is skeptical. Approving a bad trade costs the desk money. Rejecting a good trade costs nothing but opportunity. Default to rejecting when unsure."*

Risk policies are externalised into a config file, not the prompt — maximum 5% of portfolio per single position, no more than 30% in any sector, no new position if the drawdown in the last 7 days exceeds a threshold, no trades in the last 15 minutes before close. The Risk Officer reads these and enforces them mechanically.
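A sketch of what that externalised policy and its mechanical enforcement might look like. The thresholds mirror the rules listed above; the structure, field names, and the drawdown value (which the article leaves configurable) are my own assumptions:

```python
# Externalised risk policy: lives in a config file, not the prompt.
RISK_POLICY = {
    "max_position_pct": 0.05,            # max 5% of portfolio per single position
    "max_sector_pct": 0.30,              # no more than 30% in any one sector
    "max_7d_drawdown": 0.08,             # assumed threshold; configurable in the article
    "no_trade_minutes_before_close": 15,
}

def mechanical_checks(proposal: dict, portfolio: dict, policy: dict = RISK_POLICY):
    """Deterministic rejections the LLM Risk Officer never gets to overrule."""
    reasons = []
    if proposal["size_pct"] > policy["max_position_pct"]:
        reasons.append("position too large")
    exposure = portfolio["sector_pct"].get(proposal["sector"], 0.0) + proposal["size_pct"]
    if exposure > policy["max_sector_pct"]:
        reasons.append("sector concentration")
    if portfolio["drawdown_7d"] > policy["max_7d_drawdown"]:
        reasons.append("drawdown circuit breaker")
    if portfolio["minutes_to_close"] < policy["no_trade_minutes_before_close"]:
        reasons.append("too close to market close")
    return (len(reasons) == 0, reasons)
```

Running the hard limits in code and reserving the LLM call for judgment on top of them means a prompt failure can never bypass the policy itself.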

The subtle trick: give the Risk Officer its own LLM call, with a different temperature (lower — 0.0 or 0.1) and a different system prompt that leans into skepticism. You get measurably more rejections, which is what you want.

What breaks: if you use the same model with the same temperature for Risk as for Strategy, the Risk Officer often rubber-stamps the Strategist because LLMs tend to agree with themselves when you ask the same thing twice. The fix is to deliberately break that symmetry — different model variant if you can, different temperature, different framing, different example set.
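One way to encode that deliberate asymmetry is a per-role settings table the orchestrator applies when constructing each agent's LLM call. The temperature and model-tiering choices follow the article's recommendations; the structure and the placeholder model names are my sketch:

```python
# Per-role LLM settings: deliberately break the symmetry between the
# Strategist and the Risk Officer so they don't just agree with each other.
AGENT_SETTINGS = {
    "researcher": {"model": "sonnet-tier", "temperature": 0.3},
    "strategist": {"model": "sonnet-tier", "temperature": 0.7},
    # Risk gets the strongest model, near-zero temperature, skeptical framing.
    "risk": {"model": "opus-tier", "temperature": 0.0,
             "system_hint": "Default to rejecting when unsure."},
    "executor": {"model": "haiku-tier", "temperature": 0.0},
}

def settings_for(role: str) -> dict:
    settings = AGENT_SETTINGS[role]
    # Safety invariant: Risk must never share the Strategist's sampling setup.
    if role == "risk":
        assert settings != AGENT_SETTINGS["strategist"]
    return settings
```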

4. The Executor

Role: place the approved order.

The Executor reads the approved trade proposal and submits it via the MCP Server's trading tools. It has write access. It is the only agent that does. Its prompt is minimal — "submit this order as specified, confirm fill, report back" — because the last thing you want in the execution layer is creative reasoning.

The Executor also emits a structured log of what it did, which feeds the next research cycle's context. If a trade filled at a price significantly different from the proposed price, the next Strategist call knows that and adjusts.

What breaks: the Executor will occasionally misinterpret a multi-leg options structure. The fix is a deterministic pre-check — the orchestrator constructs the order explicitly from the Strategist's proposal and passes it to the Executor as a fully-formed payload, not as natural language.
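A sketch of that deterministic construction step. The output keys loosely follow Alpaca's order schema (`symbol`, `side`, `type`, `time_in_force`, and notional dollar sizing are real Alpaca order concepts); the mapping logic and proposal fields are my assumptions:

```python
def build_order_payload(proposal: dict, buying_power: float) -> dict:
    """Deterministically turn an approved Strategist proposal into a
    fully-formed order payload. The Executor submits this verbatim;
    it never re-derives the order from natural language."""
    notional = round(buying_power * proposal["size_pct"], 2)
    return {
        "symbol": proposal["symbol"],
        "notional": notional,  # dollar-sized order avoids share-count arithmetic
        "side": "buy" if proposal["direction"] == "long" else "sell",
        "type": "market",
        "time_in_force": "day",
    }

payload = build_order_payload(
    {"symbol": "AAPL", "direction": "long", "size_pct": 0.04},
    buying_power=100_000,
)
```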

The orchestrator

LangGraph handles the state machine — Researcher to Strategist to Risk Officer, then either to Executor (on approval) or back to Strategist with rejection reasoning (on rejection), with a configurable max-retries before giving up on the cycle entirely.
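Stripped of the LangGraph specifics, the control flow reduces to the loop below. The stub lambdas stand in for the real LLM-backed agents; function names and signatures are my own sketch of the handoff contract:

```python
def run_cycle(research, strategize, assess_risk, execute, max_retries=3):
    """One trading cycle: Researcher -> Strategist -> Risk Officer, then
    Executor on approval, or back to the Strategist with the rejection
    reasoning, up to max_retries before giving up on the cycle."""
    report = research()
    feedback = None
    for _ in range(max_retries):
        proposal = strategize(report, feedback)
        approved, reasoning = assess_risk(proposal)
        if approved:
            return execute(proposal)
        feedback = reasoning  # the Strategist sees exactly why it was rejected
    return None  # give up on this cycle entirely

# Stub agents: the Risk Officer rejects the first proposal, approves the second.
verdicts = iter([(False, "sector concentration"), (True, "ok")])
result = run_cycle(
    research=lambda: {"tickers": ["AAPL"]},
    strategize=lambda report, fb: {"symbol": "AAPL", "prior_rejection": fb},
    assess_risk=lambda proposal: next(verdicts),
    execute=lambda proposal: {"status": "filled", **proposal},
)
```

The key design point is that rejection reasoning flows back into the next Strategist call as first-class input, not as lost conversation history.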

The whole loop runs once every 30 minutes during market hours. Logs go to a local SQLite database — every prompt, every response, every tool call, every order. This is critical. An agentic system you can't replay is an agentic system you can't trust.
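The replayable log can be as simple as one append-only SQLite table. A stdlib-only sketch (the schema and event kinds are my illustration):

```python
import json
import sqlite3
import time

def open_log(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS events (
        ts REAL, cycle INTEGER, agent TEXT, kind TEXT, payload TEXT)""")
    return db

def log_event(db, cycle: int, agent: str, kind: str, payload: dict) -> None:
    # kind: 'prompt', 'response', 'tool_call', or 'order'
    db.execute("INSERT INTO events VALUES (?, ?, ?, ?, ?)",
               (time.time(), cycle, agent, kind, json.dumps(payload)))
    db.commit()

def replay(db, cycle: int) -> list:
    # Everything that happened in one cycle, in order: the replay trail.
    rows = db.execute("SELECT agent, kind, payload FROM events "
                      "WHERE cycle = ? ORDER BY ts", (cycle,)).fetchall()
    return [(agent, kind, json.loads(payload)) for agent, kind, payload in rows]

db = open_log()
log_event(db, 1, "risk", "response", {"approved": False, "reason": "drawdown"})
```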

What actually happens when you run it

For the first three cycles, nothing happens. The Researcher produces reports, the Strategist proposes trades, the Risk Officer rejects them all, usually correctly: sector concentration, insufficient buying-power buffer, no edge versus the recent price action.

That's the correct behaviour. A well-tuned risk officer should reject 70-90% of proposals. If yours approves most of them, your risk rules are too loose or your prompt is too permissive.

By the fourth or fifth cycle, a trade makes it through. In my test week, the first approved trade was a long position in a mid-cap industrial that had just printed a positive earnings surprise with analyst upgrades but was still trading below its pre-earnings level — a reasonably textbook setup. The position filled, the logs captured everything, and 48 hours later the Strategist proposed closing it for a small profit.

That's the boring, correct shape of it. It's not making 20% a week. It's making small, logged, defensible decisions under constraint.

The three things that surprised me

1. The Risk Officer is the hardest agent to prompt well. Research and strategy are well-understood patterns for LLMs. Risk is a fundamentally different mode — it's about articulating reasons to *not* act. Default LLM behaviour is to be helpful, which means agreeing. You have to fight that with explicit prompting, temperature adjustments, and ideally a different model, and even then the Risk Officer occasionally slides into sycophancy. This is the open problem of the whole architecture.

2. Structured output gets you 80% of the safety. Forcing each agent to emit strict JSON instead of prose eliminates an entire class of misunderstandings at the boundaries. The Researcher can't sneak an implicit recommendation into its "analysis." The Strategist can't wriggle out of the size parameter. The schema is the interface, and the interface is where most of the discipline lives.

3. The MCP Server makes this stack 10x less painful than it was six months ago. Previously I would have been writing Python wrappers around Alpaca's REST endpoints, mapping them to LangChain tools by hand, babysitting auth and rate limits. The MCP Server v2 exposes all of this as LangChain-compatible tools out of the box. The amount of glue code in this project is a fraction of what it would have been in 2025.

Where to extend

  • Add a Compliance agent that reviews trades against a written policy document. This is the piece that makes the system deployable inside a firm — not just for your own paper account.
  • Add a Post-Trade Analyst that reviews fills after the fact, compares them to the proposed thesis, and produces a weekly review. Feeding this back into the next Strategist call is how you get an agent that learns from its own track record.
  • Add paper-trading competitors. Run three different Strategist configurations in parallel — momentum, mean-reversion, event-driven — let them each propose trades, let the Risk Officer see all three, and pick the best one per cycle. This is where multi-agent starts to outperform a single sophisticated prompt.
  • Swap models at each role. Use Sonnet for the high-volume research passes, Opus for the risk and strategy calls, a smaller model (Haiku 4.5) for logging and audit. The point of multi-agent is you can tune cost-to-capability per role.

What this is really about

If you squint, this isn't a tutorial about trading. It's a tutorial about how to build agentic systems that you'd actually trust to do something real.

The same pattern — specialist agents, structured handoffs, externalised policy, logged everything, a deliberately skeptical reviewer in the loop — is what you want for agents that operate a CRM, send emails, modify infrastructure, or take any action with real-world consequences. Finance is the cleanest sandbox for practicing it because the feedback loop is fast and objective. But the lessons travel.

The single-prompt "let the AI figure it out" pattern is going to keep failing at scale for the same reason a single human doing all four jobs fails on a trading desk. Solve that problem properly in a small domain, and you've solved it for everything.

That's the work worth doing in 2026. Alpaca + Claude + a multi-agent discipline is a fine place to practice it.


Want to try it? Alpaca is in the marketplace with paper-trading sign-up details. For the broader picture on multi-agent AI patterns, the Developer Tools section has most of the orchestration tools referenced above. If you're building something in this space and want a second opinion on the architecture, get in touch.

Alpaca · Claude · LangChain · Multi-Agent · AI Agents · LangGraph · Algorithmic Trading · 2026