Building the System Design Simulator with Codex CLI
An end-to-end account of building the System Design Simulator with Codex CLI, covering data modeling, diagrams, scoring, and feedback.
Summary
Walkthrough of the end-to-end build: data model, scoring matrix, and how decisions translate into a readable system design narrative.
Key takeaways
- Model scenarios as structured steps with explicit tradeoffs and feedback.
- Treat diagram rendering as a first-class output, not a byproduct.
- Keep scoring transparent so users can reason about outcomes.
- Store revisions so the simulator evolves without losing history.
Here’s the full build story: what we decided, what we built, and what we actually verified (and what we didn’t).
The simulator is a multi-step system design experience where choices update the diagram and the scoring in real time. This post walks through the path we took to make that work end to end.
If you want the interview framing first, see System Design Interview: From Zero to Hero. For product context, start with System Design Simulator: Uber-like Architecture Walkthrough.
Planning the simulator with Codex CLI
We started with a quick planning pass to make the scope explicit. The non‑negotiables were:
- The simulator is a multi-step quiz-style flow.
- It should show a diagram on one side that updates after each choice.
- There are no strictly correct or incorrect answers.
- Every answer should produce per-answer feedback.
- Earlier choices should influence later outcomes (in scoring and feedback).
- Start with a single scenario, fully fleshed: an Uber-like app.
To keep scope tight, we wrote a clear definition of done: one scenario, a fixed number of steps, a small set of choices per step, fully localized copy, and a results view that explains every signal shift. We also decided early that all metrics would share the same directionality (higher is better) to avoid mixed polarity. That choice shaped the model, the UI, and the feedback language.
We picked Mermaid because it was quick to integrate and good enough for a polished MVP.
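For context on how light that integration is, here is a minimal sketch of a diagram renderer, assuming Mermaid v10's promise-based `render` API and a React client component; the component name and props are illustrative, not the actual implementation:

```tsx
"use client"; // assumes a Next.js-style client component

import { useEffect, useRef } from "react";
import mermaid from "mermaid";

// We control rendering ourselves, so auto-start is disabled.
mermaid.initialize({ startOnLoad: false });

export function DiagramRenderer({ definition }: { definition: string }) {
  const containerRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    let cancelled = false;
    // mermaid.render (v10+) returns a promise resolving to the generated SVG markup.
    mermaid
      .render(`diagram-${Date.now()}`, definition)
      .then(({ svg }) => {
        if (!cancelled && containerRef.current) {
          containerRef.current.innerHTML = svg;
        }
      })
      .catch(() => {
        // If a new definition fails to parse, keep the previous diagram on screen.
      });
    return () => {
      cancelled = true;
    };
  }, [definition]);

  return <div ref={containerRef} aria-label="Architecture diagram" />;
}
```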
Then we laid out a simple build plan:
- Add Mermaid dependency.
- Create a localized public route for the simulator.
- Build a diagram renderer component.
- Build a simulator component with step logic and result matrix.
- Define a scenario data model with metrics, flags, and conditional effects.
- Update dictionaries with simulator UI text.
Before coding, we asked a few blunt questions: when should feedback show up, can users change their mind, how many steps is enough, and how do we keep choices balanced. That quick Q&A prevented scope drift and gave us concrete acceptance criteria.
We also compared a few strategies for the core mechanics:
- Diagram rendering: a full graph editor vs. a lightweight renderer. We picked Mermaid for speed, clarity, and low integration cost.
- Feedback timing: inline after each step vs. a consolidated end summary. We chose end‑summary to keep the flow focused and to make cross‑step influence easier to understand.
- Scoring model: mixed polarity vs. unified polarity. We chose “higher is better” for all metrics to avoid mental overhead.
- Data organization: logic in code vs. declarative data. We chose a data‑first model with flags and conditional deltas so the scenario drives behavior.
Once those choices were made, the rest was straightforward: the code was mostly a translation of the plan.
The Uber-like scenario (single, fully fleshed)
We started with a single scenario: an Uber-like ride-hailing platform. It includes:
- A base diagram with Rider/Driver apps, an API gateway, and core services.
- Five steps, each with three options (3^5 = 243 possible paths).
Step 1: Service boundary strategy
- Modular monolith + single Postgres
- Service-oriented + per-domain databases
- Event-driven core with a shared bus
Step 2: Location storage
- Redis Geo with TTL
- Postgres + PostGIS
- Cassandra wide-column store
Step 3: Real-time updates
- WebSockets gateway
- Server-Sent Events
- Client polling
Step 4: Matching pipeline
- Synchronous matching
- Queue + matcher workers
- Stream processing pipeline
Step 5: Caching and resilience
- No cache, rely on DB
- Redis cache + fallbacks
- Geo-sharded cache + fallback
Each choice includes metric deltas and per-choice feedback. This keeps the
scenario concrete and ensures the final result matrix is meaningful.
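To make that shape concrete, here is a rough sketch of what the declarative scenario model could look like; the type and field names (and the example values) are illustrative assumptions, not the exact schema in the codebase:

```ts
// Illustrative scenario schema -- names and values are assumptions, not the real data.
type Metric = "latency" | "cost" | "reliability" | "complexity" | "scalability";

type Choice = {
  id: string;
  label: string;
  metrics: Partial<Record<Metric, number>>; // unconditional deltas; higher is always better
  flags?: string[];                         // markers that later steps can react to
  conditionalMetrics?: {
    whenFlag: string;                       // applies only if an earlier choice set this flag
    metrics: Partial<Record<Metric, number>>;
    note: string;                           // feeds the "influences" section of the summary
  }[];
  feedback: string;                         // per-choice note shown in the results view
};

type Step = { id: string; title: string; choices: Choice[] };

type Scenario = {
  id: string;
  baseDiagram: string;                      // Mermaid definition of the starting architecture
  steps: Step[];
};

// Example choice (values invented for illustration): a WebSockets gateway that
// gains a reliability bonus if a queue-based matcher was chosen in an earlier step.
const websockets: Choice = {
  id: "websockets",
  label: "WebSockets gateway",
  metrics: { latency: 2, complexity: -1 },
  flags: ["persistent-connections"],
  conditionalMetrics: [
    { whenFlag: "queue-matching", metrics: { reliability: 1 }, note: "Queue absorbs reconnect bursts." },
  ],
  feedback: "Lowest push latency, but you now own connection state.",
};
```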
If you want to refresh the fundamentals, browse Architecture questions.
System design simulator UI and step engine
Under the hood, the simulator is state-driven:
- `answers` maps step IDs to chosen choice IDs.
- `currentStepIndex` tracks progress.
- The diagram is recomputed every time answers change.
Navigation is intentionally simple. You can go back and change an answer, but you cannot move on without picking one for the current step. That keeps the diagram and scoring in sync. When a choice changes, we reset the summary so it always reflects the latest path. Small detail, big difference.
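As a rough sketch of that state (reusing the illustrative `Scenario` type from the earlier snippet and assuming React hooks; the hook name and the `buildDiagram` helper are hypothetical):

```ts
import { useMemo, useState } from "react";

export function useSimulator(
  scenario: Scenario,
  buildDiagram: (scenario: Scenario, answers: Record<string, string>) => string,
) {
  const [answers, setAnswers] = useState<Record<string, string>>({}); // stepId -> choiceId
  const [currentStepIndex, setCurrentStepIndex] = useState(0);
  const [showSummary, setShowSummary] = useState(false);

  // The diagram is a pure function of the answers, so it recomputes whenever they change.
  const diagram = useMemo(() => buildDiagram(scenario, answers), [scenario, answers]);

  function choose(stepId: string, choiceId: string) {
    setAnswers((prev) => ({ ...prev, [stepId]: choiceId }));
    setShowSummary(false); // changing any answer invalidates the previous summary
  }

  function next() {
    const step = scenario.steps[currentStepIndex];
    if (!answers[step.id]) return; // cannot advance without picking a choice for the current step
    setCurrentStepIndex((i) => Math.min(i + 1, scenario.steps.length - 1));
  }

  function back() {
    setCurrentStepIndex((i) => Math.max(i - 1, 0));
  }

  return { answers, currentStepIndex, diagram, showSummary, setShowSummary, choose, next, back };
}
```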
Scoring logic
Scoring works in a few steps:
- Initializes a metric totals object with zeros.
- Iterates over each answered step.
- Applies each choice’s `metrics` deltas to the totals.
- Applies any `conditionalMetrics` when flags match.
- Stores per-step results for the final report.
We do not call answers right or wrong; we label the tradeoffs like this:
- Strong if total ≥ +2
- Balanced if total is between −1 and +1
- Risk if total ≤ −2
That is the intent: show tradeoffs instead of right answers.
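Here is a minimal sketch of that pass, reusing the illustrative types from the scenario snippet; the thresholds match the labels above, while the function and field names are assumptions:

```ts
type StepResult = { stepId: string; total: number; verdict: "strong" | "balanced" | "risk"; feedback: string };

function score(scenario: Scenario, answers: Record<string, string>) {
  const totals: Record<Metric, number> = { latency: 0, cost: 0, reliability: 0, complexity: 0, scalability: 0 };
  const activeFlags = new Set<string>();
  const stepResults: StepResult[] = [];

  for (const step of scenario.steps) {
    const choice = step.choices.find((c) => c.id === answers[step.id]);
    if (!choice) continue; // unanswered steps contribute nothing

    let stepTotal = 0;

    // Unconditional deltas from the choice itself.
    for (const [metric, delta] of Object.entries(choice.metrics) as [Metric, number][]) {
      totals[metric] += delta;
      stepTotal += delta;
    }

    // Conditional deltas that only apply when an earlier choice set the matching flag.
    for (const cond of choice.conditionalMetrics ?? []) {
      if (!activeFlags.has(cond.whenFlag)) continue;
      for (const [metric, delta] of Object.entries(cond.metrics) as [Metric, number][]) {
        totals[metric] += delta;
        stepTotal += delta;
      }
    }

    choice.flags?.forEach((flag) => activeFlags.add(flag));

    stepResults.push({
      stepId: step.id,
      total: stepTotal,
      verdict: stepTotal >= 2 ? "strong" : stepTotal <= -2 ? "risk" : "balanced",
      feedback: choice.feedback,
    });
  }

  return { totals, stepResults };
}
```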
Scoring matrix and feedback details
At the end, users get a single summary panel. It includes:
- Summary metrics (latency, cost, reliability, complexity, scalability)
- Per-step strengths and risks
- A tradeoff summary
- An "influences" section for conditional effects
- The new "Why signals moved" section
We keep the flow clean by saving feedback for the end. That lets people stay focused while they choose, then reflect afterward. Pairing per-step notes with global metrics makes the story feel cohesive — you can see both the local reasoning and the overall system behavior.
This is where the story comes together: you can see how early decisions were softened or amplified by later choices.
Because the matrix is derived from the scenario config, new scenarios do not require changes to the rendering logic.
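For illustration, one row of that matrix might look like the shape below, built entirely from the per-step results in the scoring sketch; the field names are assumptions:

```ts
// Hypothetical shape of a results-matrix row; everything here is derived from scenario data plus answers.
type MatrixRow = {
  stepTitle: string;                           // taken straight from the scenario config
  choiceLabel: string;                         // the option the user picked
  verdict: "strong" | "balanced" | "risk";     // from the per-step total
  feedback: string;                            // the per-choice note defined in the scenario
  influences: string[];                        // notes from conditionalMetrics entries that actually fired
};
```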
Verification and known gaps (system design simulator)
We did a light validation of the dictionary JSON by parsing it with Node. We did not run the full lint/test suite during this build.
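For reference, that check amounts to a one-off parse along these lines; the file names and paths are assumptions about where the dictionaries live:

```ts
// check-dictionaries.ts -- hypothetical helper script; run with Node or Bun.
import { readFileSync } from "node:fs";

for (const locale of ["en", "pl"]) {
  const path = `dictionaries/${locale}.json`; // assumed location of the locale dictionaries
  JSON.parse(readFileSync(path, "utf8"));     // throws (and fails the script) if the JSON is malformed
  console.log(`${path} parsed OK`);
}
```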
If you want a full verification pass, the next steps should be:
- `bun install`
- `bun lint`
- Open the simulator in both English and Polish locales in light and dark themes.
This is the minimum to confirm the simulator renders, Mermaid loads correctly, and the UI looks good under both themes.
Using Codex skills for planning and commits
We leaned on Codex to keep the work tidy. The planning step forced us to answer
the tricky questions up front: scope, feedback timing, and how choices should
influence each other. That made the definition of done concrete and kept us from
drifting.
If you want the exact skill definitions we used, they live in codex-skills.
We also compared options side by side: diagram engines, scoring polarity, and where logic should live. We chose the simpler, data-driven path so new scenarios could be added without changing the engine. It saved us from over-engineering a custom editor or hardcoding a decision tree.
Here is a trimmed version of the Q&A we kept in front of us:
Q: When should feedback be shown? A: At the end, so users stay in flow while making decisions.
Q: Can users change their mind? A: Yes. Backtracking is allowed, and results recompute immediately.
Q: Are there correct answers? A: No. Every option is viable but comes with tradeoffs.
Q: How do earlier choices affect later ones? A: Use flags + conditional deltas to model dependencies.
Q: How many steps should the MVP include? A: Five steps with three choices each.
We also kept a tiny tradeoff matrix to make decisions explicit:
| Decision | Option A | Option B |
| --- | --- | --- |
| Diagram engine | Mermaid (fast, readable) | Custom editor (powerful, heavy) |
| Feedback timing | End summary (cohesive) | Inline (interrupts flow) |
| Scoring polarity | Higher-is-better (clear) | Mixed polarity (confusing) |
| Logic placement | Data-driven (extensible) | Hardcoded (brittle) |
| Scenario scope | One deep scenario | Many shallow scenarios |
Finally, the commit workflow kept history clean: review the changes, stage them as one coherent unit, and write a clear Conventional Commit message. A clean example looks like this:
feat: add system design simulator with diagram feedback
Final summary
The System Design Simulator is now a first-class experience in the app:
- It shows a dynamic architecture diagram that updates after each choice.
- It gives a transparent, reasoned explanation of every metric shift.
- It captures inter-step dependencies via conditional scoring.
- It stays fully localized and theme-aware.
Most importantly, it creates a learning experience where users understand why tradeoffs exist and how architecture decisions compound over time.
If you want a deeper technical breakdown, I can include exact code snippets for any part of the simulator or generate a full decision tree for the 243 possible paths.
Read next
System Design Interview: From Zero to Hero
Learn how to approach complex system design problems and communicate your architecture decisions effectively.
System Design Simulator: Uber-like Architecture Walkthrough
A behind-the-scenes look at the new System Design Simulator: how decisions shape the diagram, metrics, and tradeoffs.