AI design · Interaction design · Physical fabrication

Story Forge

A locally hosted platform that transforms books into interactive text adventure games using local large language models — and the four-prototype process behind making AI tell a coherent story.

Role

Designer, Researcher, Developer

Timeline

2024 – 2025

Tools

Python, FastAPI, React, Vite, Ollama, Pygame

Context

M.Des Thesis — OCAD University

Outcome

Prototypes built and tested

Terminal interface through to full web app. Each prototype was driven by a specific failure in the previous version.

LLM models benchmarked

Ranging from 600MB to 20GB. Model selection was a design decision, not a technical one.

Exhibited at DFX 2025

Inside a custom CNC machined arcade cabinet. Forge Mode became the dominant play choice on the exhibition floor.

Story Forge demonstrated that narrative coherence in LLM-driven games is a solvable design problem — not just a technical limitation — and that the tension between structure and freedom can be given directly to the player as a mode choice.

Problem

Large language models can generate engaging text turn by turn but have no mechanism for staying faithful to an existing story.

Without constraints, LLMs lose the thread of source material within a few exchanges. Narrative deviations accumulate, characters behave inconsistently, and the original plot disappears entirely. The design question was not whether AI could generate story content — it clearly could — but whether it could be structured to stay faithful to an existing narrative while still responding meaningfully to player choices. One early playtest had the mutineers and loyalists swap roles entirely. Another turned a historic naval conflict into a dispute over who stole the breadfruit.

Prototype 1

Terminal interface — the core problem revealed

The first prototype ran entirely in a terminal. Using Python, Ollama, and the llama2 model, I built a simple game loop where a system prompt instructed the model to act as a text adventure narrator following the story of the Mutiny on the HMS Bounty. The player selected numbered options each turn and the model advanced the narrative. Testing revealed the core problem immediately. The model could handle individual turns well but after five or six exchanges it began to contradict itself, lose track of plot details, and generate content with no relationship to the source material. It was generating plausible text, not telling a specific story.

Prototype 2

Inventing the Story Card

The failure of Prototype 1 made one thing clear: the model needed a persistent reference it could check against at each turn. Drawing on Rob Śliwa's work with local LLMs and his use of player and companion cards to anchor narrative behavior, I developed what I called a Story Card — a pre-written YAML summary of the HMS Bounty narrative covering the mission, the tension between Bligh and Fletcher Christian, the mutiny, Bligh's 3,600 nautical mile open boat voyage to Timor, and the fate of the mutineers. At each turn the model checked its response against this summary before generating output. Major plot deviations dropped substantially. But the model still occasionally ignored the Story Card when it could generate more interesting content without it.

What changed

Story Card introduced as a persistent YAML reference checked at every model turn. The most consequential single design decision of the project.

Prototype 3

Model benchmarking and the Pygame GUI

Before building Prototype 3, I ran structured benchmarking across seven models ranging from 600MB to 20GB, including GPT-4-Turbo, llama2, llama3, Phi-2, Mistral, and Gemma. Larger models did not perform better. GPT-4-Turbo was prone to overgeneration. Smaller models like Phi-2 drifted from the narrative frequently. The 7 to 10 billion parameter range, specifically llama3 8B, produced the most consistent Story Card adherence with reasonable response times. I also lowered the model temperature from 0.6 to 0.2 after observing that higher settings consistently weakened adherence. The terminal was replaced with a Pygame GUI designed to feel like a typewriter. I presented Prototype 3 at the OCAD notQuiteThere(yet) exhibition in October 2024. Engagement increased substantially but the interface still felt like a prototype wrapper, not a platform.

What changed

Model changed to llama3 8B. Temperature reduced to 0.2. Terminal replaced with Pygame GUI. Presented publicly for the first time.

Prototype 4

Web stack, two play modes, and the physical cabinet

For the final prototype I rebuilt the architecture entirely. Backend moved to FastAPI. Frontend rebuilt in Vite with React — both new to me going into this project. Story Forge was now a locally hosted web application. Two insights from watching players in Prototype 3 drove the interface redesign. First, long horizontal text streams were hard to read — I moved to a two column layout with narrative on the left and controls on the right. Second, text input created confusion when the model was already offering numbered choices — I replaced the text box with physical numbered buttons. The redesign also introduced Story Mode (temperature 0.2, closely aligned with the Story Card) and Forge Mode (temperature 0.8, significantly more latitude for unexpected branching). At the DFX exhibition, Forge Mode became the dominant choice. Players gathered to watch sessions unfold.

What changed

Full rebuild in React and FastAPI. Two column layout. Numbered button input. Story Mode and Forge Mode — temperature as an explicit player-controlled design variable.

Final design

A locally hosted React and FastAPI web application with a custom Ollama backend running llama3 8B. A Story Card system providing persistent YAML narrative reference at every model turn. Two distinct play modes giving the player direct control over the structure-to-freedom ratio. A two column interface with scrollable narrative text on the left and numbered button input on the right. Housed inside a custom CNC machined arcade cabinet built from half inch MDF with painted panels and vinyl decals, enclosing a 27 inch monitor with the backend running inside the enclosure.

Reflection

The most important design decision in this project was not a UI choice. It was the invention of the Story Card. Every other improvement — temperature tuning, model selection, the web stack, the two column layout, the mode system — was built on top of that one structural idea: give the model a persistent reference it checks at every turn and narrative coherence becomes manageable. The other lesson was that the tension between structure and freedom in AI storytelling is not a problem to solve but a parameter to tune. Story Mode and Forge Mode did not resolve that tension. They made it explicit and gave the player control over where on that spectrum they wanted to be. What I would push next: multi book support with a Story Card authoring interface, cloud model options for faster response times, and a multiplayer mode where two players navigate branching decisions together.

Next case study

OCAD University Student Analytics Dashboard →