About

Building AI agents for a D&D app with the Laravel AI SDK and Claude

Four agents, one campaign manager, and a lot of thinking about what makes AI output actually useful.

Opening

DM Forge started out as a simple session management tool and grew more capable over time. The AI features—generating NPCs, building sessions, writing recaps—were always part of the plan, but initially they were just API calls to Claude wrapped in a service class. That worked, but it felt like a stopgap: each feature was its own custom integration, inconsistent and difficult to test. When Laravel released its official AI SDK earlier this year, I rewrote the AI layer on top of it. The result is cleaner, easier to test, and, honestly, more enjoyable to work on. Here's what I built and what I learned in the process.


What the Laravel AI SDK Actually Is

The short version: the Laravel AI SDK provides a unified, expressive API for interacting with AI providers such as OpenAI, Anthropic, Gemini, and more. Instead of writing raw HTTP calls to different provider endpoints with different response shapes, you write Laravel-native agent classes, and the SDK handles the provider layer underneath.

Agents are the fundamental building block — each one is a dedicated PHP class that encapsulates the instructions, conversation context, tools, and output schema needed to interact with a large language model. Think of them as specialists: one agent knows how to generate monsters, another knows how to write session recaps. Each has a clear responsibility and can be prompted, tested, and reasoned about independently.

For DM Forge, which connects to Claude via Anthropic, the SDK meant I could stop thinking about HTTP headers and response parsing and start thinking about what I actually wanted the agents to do.
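To make the "agent as a dedicated PHP class" idea concrete, here is a minimal sketch. The class name, constructor, and method names are my illustrative assumptions about the shape described above, not the SDK's verbatim API:

```php
<?php

// A sketch of a dedicated agent class, per the description above. The
// method names are assumptions, not the SDK's actual API. The point is
// the shape: one class holding the instructions and context for one
// clearly scoped responsibility.

class RecapNarratorAgent
{
    public function __construct(
        private array $sessionLog, // structured logs from the finished session
    ) {}

    // The system prompt that defines this specialist's voice and job.
    public function instructions(): string
    {
        return 'You write narrative recaps of D&D sessions in second person, '
            . 'present tense, staying faithful to the structured log provided.';
    }
}
```

Because each agent is just a class, it can be constructed with its context up front and unit-tested like any other dependency.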


The Four Agents

DM Forge currently has four AI agents. Here's what each one does and why it's structured the way it is.

1. The Campaign Wizard

The entry point for new campaigns. A DM gives it a premise — "a grimdark political thriller set in a collapsing empire" — and the agent generates a campaign skeleton: factions, locations, key NPCs, and a rough story arc.

The interesting constraint here is specificity. A generic fantasy campaign generator is easy to build and mostly useless. What makes the Campaign Wizard useful is that the system prompt is loaded with the DM's stated preferences, tone, and any world-building they've already done. The agent isn't generating a campaign from nothing — it's extending something that already has a voice.

Structured output is critical here. The response comes back as a typed PHP object — factions as an array, each with name, goals, and affiliations; locations with names and descriptions; NPCs with backstory and motivation fields. That structured response goes straight into the database without any parsing logic in between.
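A sketch of that persistence step, with the response shown as a plain array so the example stands alone. The field names follow the shapes described above; the sample data and the `$response` variable are stand-ins, not actual DM Forge code:

```php
<?php

// Illustrative persistence step: the structured response maps straight
// onto storage with no parsing layer in between. Sample data below is
// invented for illustration.

$response = [
    'factions' => [
        ['name' => 'The Gilded Court', 'goals' => 'Preserve the old order', 'affiliations' => ['The Crown']],
    ],
    'locations' => [
        ['name' => 'Vharen', 'description' => 'A capital hollowed out by debt'],
    ],
];

$rows = [];
foreach ($response['factions'] as $faction) {
    // In the app this would be something like Faction::create($faction);
    // here we just collect rows to show there is no transformation step.
    $rows[] = $faction;
}
```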

2. The NPC Generator

Probably the most-used agent in the app. Pass it a concept — "a dwarven blacksmith who secretly knows the location of a lost forge and is haunted by an apprentice's death" — and it returns a fully fleshed-out NPC: personality traits, voice description, motivations, secrets, and potential hooks for the story.

The lesson I kept learning with this one: the quality of the output is almost entirely determined by the quality of the context you pass in. An NPC generated with access to the campaign's factions, existing characters, and current story arc is dramatically more useful than one generated in isolation. Wiring up that context — pulling the right data from the campaign before prompting — turned out to be more of the work than writing the prompt itself.
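The context-assembly step might look something like the sketch below. The field and relationship names are assumptions about DM Forge's models, flattened to plain arrays so the example runs on its own:

```php
<?php

// Sketch of context assembly: pull the relevant campaign data before
// prompting, so the NPC is generated in context rather than isolation.
// Field names are assumptions; sample data is invented.

function buildNpcContext(array $campaign): array
{
    return [
        'factions'  => $campaign['factions'],   // names + goals to stay consistent with
        'npcs'      => $campaign['npcs'],       // existing characters to avoid duplicating
        'story_arc' => $campaign['story_arc'],  // where the campaign currently is
    ];
}

$campaign = [
    'factions'  => [['name' => 'Iron Pact', 'goals' => 'Control the mines']],
    'npcs'      => [['name' => 'Serra', 'role' => 'guild fixer']],
    'story_arc' => 'The party is closing in on the Pact\'s smuggling route.',
];

$context = buildNpcContext($campaign);
```

The prompt itself then becomes a thin layer on top of this: most of the value is in deciding what goes into `$context`.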

3. The Session Builder

Given a campaign's current state and a rough plot intention — "the party finally confronts the faction leader in the capital" — the Session Builder generates a complete session structure: scenes, encounters within each scene, branching decisions, suggested loot, and NPCs to pull in.

This one uses structured output extensively. A session has a specific shape in the DM Forge data model — scenes contain encounters, encounters contain monster slots, branches point to other scenes — and the agent needs to return something that maps directly to that structure. The HasStructuredOutput interface requires defining a schema method that tells the SDK exactly what shape the response should take, which means the agent's output slots into the database almost without transformation.
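As a sketch, a schema method for that nested session shape might look like this. The post confirms HasStructuredOutput requires a schema method; the exact schema format the SDK expects (rendered here as a nested array of type names) is my assumption:

```php
<?php

// Hypothetical schema() for the session structure. The nested-array
// notation is an assumption about the SDK's schema format; the nesting
// itself mirrors the data model described above: scenes contain
// encounters, branches point at other scenes.

class SessionBuilderSchema
{
    public function schema(): array
    {
        return [
            'scenes' => [[                      // a session is a list of scenes
                'title'      => 'string',
                'encounters' => [[              // each scene holds encounters
                    'name'          => 'string',
                    'monster_slots' => ['string'],
                ]],
                'branches'   => [[              // branches reference other scenes
                    'choice'      => 'string',
                    'leads_to'    => 'string',  // target scene title
                    'consequence' => 'string',  // how the world state changes
                ]],
            ]],
        ];
    }
}
```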

The branching logic was the trickiest part. Getting Claude to generate genuinely divergent story branches — rather than two scenes that converge on the same outcome anyway — required being explicit in the system prompt that branches should have meaningfully different consequences for the world state.

4. The Recap Narrator

The one that gets the most love from people who actually use the app.

After a session, the DM has a set of logs: scene notes, decisions taken at each branch, combat outcomes, and which NPCs survived or died. The Recap Narrator reads those logs and generates a written narrative summary — the kind of session recap that reads like it was written by someone who was there, that you can share with your players or keep as a campaign journal.

This is the agent where prompting felt most like writing. The system prompt establishes a narrative voice — present tense, second person ("you push open the door to find..."), with space for small dramatic details — and the structured logs give it the factual spine to work from. Getting that balance right, between faithfully recapping what happened and making it enjoyable to read, took more iteration than any of the other agents.

The structured input matters as much as the structured output here. Passing raw session notes to Claude produced mediocre results. Passing structured data — scene by scene, decision by decision, with clear labels — produced recaps that players actually wanted to read.
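For illustration, the structured log might look like the sketch below: scene by scene, decision by decision, with clear labels. The field names and sample data are assumptions based on the description above, not DM Forge's actual log format:

```php
<?php

// Illustrative shape of the structured log the Recap Narrator receives.
// Field names and sample data are invented for illustration; the point
// is the labeling: the model gets facts, not a wall of raw notes.

$sessionLog = [
    'scenes' => [
        [
            'title'    => 'The Sunken Archive',
            'notes'    => 'The party searched the flooded stacks for the ledger.',
            'decision' => 'Spared the archivist in exchange for the ledger',
            'combat'   => ['outcome' => 'victory', 'rounds' => 4],
            'npcs'     => ['Archivist Vel' => 'alive'],
        ],
    ],
];
```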


What the SDK Made Easier

A few things that would have been friction without it:

Testing. The SDK ships with fake agents built in, so you can test AI features with real coverage without making actual API calls in your test suite. For DM Forge, that meant I could write feature tests for the session builder that verify the generated structure is saved correctly without hitting the Claude API on every test run.
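The SDK ships its own fakes; the sketch below hand-rolls the same pattern so the example runs on its own. Treat the interface and class names as assumptions, and expect the real SDK's faking API to differ:

```php
<?php

// A hand-rolled stand-in for the faking pattern: swap the real agent for
// one that returns a canned structure instead of calling Claude. Names
// here are illustrative, not the SDK's faking API.

interface SessionBuilder
{
    public function generate(string $intent): array;
}

class FakeSessionBuilder implements SessionBuilder
{
    public function generate(string $intent): array
    {
        return ['scenes' => [['title' => 'Throne Room', 'encounters' => []]]];
    }
}

// In a feature test you would bind the fake into the container, hit the
// endpoint, then assert the generated structure was saved. Here we just
// exercise the fake directly.
$builder = new FakeSessionBuilder();
$session = $builder->generate('The party confronts the faction leader');
```

The feature test can then assert the canned structure landed in the database, with no network call anywhere in the suite.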

Structured output. Before the SDK, getting structured JSON back from Claude meant prompting it to return JSON, hoping it did, stripping markdown fences, and parsing it in a try/catch. The SDK's HasStructuredOutput interface handles all of that. You define the schema, the SDK enforces it, and you get back a typed response you can treat like a PHP array.
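The pre-SDK parsing dance looked roughly like this: a self-contained sketch, with the helper name and sample payload invented for illustration:

```php
<?php

// What manual structured-output handling looked like before the SDK:
// strip the markdown fences the model sometimes wraps around its JSON,
// then decode and hope. No SDK involved; this is plain PHP.

function parseModelJson(string $raw): array
{
    // Remove leading ```json and trailing ``` fence lines, if present.
    $clean = preg_replace('/^```(?:json)?\s*|\s*```$/m', '', trim($raw));

    // Throws JsonException instead of silently returning null on bad JSON.
    return json_decode($clean, true, 512, JSON_THROW_ON_ERROR);
}

$raw = "```json\n{\"name\": \"Brunhilde\", \"role\": \"blacksmith\"}\n```";
echo parseModelJson($raw)['name']; // prints "Brunhilde"
```

With the SDK's schema enforcement, this entire layer, and the try/catch around it, disappears.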

Provider switching. You can configure multiple providers so the SDK fails over automatically during rate limits or outages — no fallback logic to write yourself. For a side project that runs on my own infrastructure, this is low priority, but it's the kind of thing that matters the moment you have real users.
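A configuration sketch of what that might look like. This fragment is entirely hypothetical: the post only says multiple providers can be configured for automatic failover, and these key names are invented for illustration, not the SDK's real config file:

```php
// Hypothetical config fragment — key names invented for illustration.
'providers' => [
    'anthropic' => ['api_key' => env('ANTHROPIC_API_KEY')],  // primary
    'openai'    => ['api_key' => env('OPENAI_API_KEY')],     // failover
],
```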


The Honest Part

The SDK is still relatively new and shows it in places. The documentation covers the happy path well, but gets thinner when you push into less common patterns — loading large amounts of campaign context without exceeding token limits, for instance, required some experimentation that wasn't really covered anywhere.

The bigger lesson, though, isn't about the SDK — it's about AI integration in general. The technology is not the hard part. The hard part is figuring out what context to pass in, what shape the output should take, and what "good" looks like for your specific use case. I spent maybe 20% of my time on the Laravel integration and 80% on prompt iteration and context design.

That ratio feels about right for anyone approaching this kind of work.