Part 6 · Security & Production Deployment
Shipping an agent to real users is far harder than running it on a developer's laptop. This part covers "security" and "production" together because they are inseparable.
Key terms in this Part
- Defense-in-depth: every layer assumes the one above failed; security never lives in one place · §6.1, G.5
- SSRF blocklist: bans `127.0.0.1`, `169.254.169.254`, link-local, and RFC1918 ranges by default; only extend, never disable · §6.3, G.5
- Working-directory golden rule: one tenant = one CWD, never shared; file tools resolve there · §6.4
- Session pool: TTL + LRU eviction over `(tenant_id, session_id)` keys; the production lifecycle pattern · §6.7
- Sticky session: `StatefulSet` + PVC + `sessionAffinity`; how the same session lands on the same pod · §6.8
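To make the SSRF blocklist term concrete, here is a minimal sketch of an "extend, never disable" check using Python's standard `ipaddress` module. The names `BLOCKED_NETWORKS` and `is_blocked` are hypothetical and not taken from the framework; the sketch only checks a literal IP address, whereas a real guard would also resolve hostnames and re-check every redirect hop.

```python
import ipaddress

# Hypothetical baseline mirroring the defaults named in the key terms.
# It is a tuple on purpose: callers can add networks but not remove any.
BLOCKED_NETWORKS = (
    ipaddress.ip_network("127.0.0.0/8"),     # loopback, includes 127.0.0.1
    ipaddress.ip_network("169.254.0.0/16"),  # link-local, includes 169.254.169.254
    ipaddress.ip_network("10.0.0.0/8"),      # RFC1918
    ipaddress.ip_network("172.16.0.0/12"),   # RFC1918
    ipaddress.ip_network("192.168.0.0/16"),  # RFC1918
    ipaddress.ip_network("::1/128"),         # IPv6 loopback
    ipaddress.ip_network("fe80::/10"),       # IPv6 link-local
)


def is_blocked(ip: str, extra_networks=()) -> bool:
    """Return True if `ip` falls in the baseline or any extra blocked network."""
    addr = ipaddress.ip_address(ip)
    # Extra networks are appended to the baseline, never substituted for it.
    networks = list(BLOCKED_NETWORKS) + [
        ipaddress.ip_network(n) for n in extra_networks
    ]
    # Membership is always False when address and network versions differ.
    return any(addr in net for net in networks)
```

With this shape, a deployment can pass `extra_networks=["203.0.113.0/24"]` to tighten the policy, but there is no argument that loosens it.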
Coverage
- 6.1 Defense-in-Depth Model — 7-layer defense stack, 5 threat categories, minimum vs ideal
- 6.2 Shell Sandbox & Command Control — macOS sandbox-exec, 3 built-in profiles, Linux alternatives
- 6.3 Network & SSRF Defense — Domain layering, httpx redirects, MCP network isolation
- 6.4 Multi-Tenant & Filesystem Isolation — working_directory golden rule, DB isolation, /tmp pollution
- 6.5 Secrets & Prompt-Injection Defense — Five commandments, attack surfaces, red-team checklist
- 6.6 Observability & Audit — 4 observation axes, built-in replay, compliance logs
- 6.7 Resource Governance & Concurrency — Session pool, TTL, token budgets, memory estimation
- 6.8 Deployment, Canary & Rollback — Dockerfile, K8s StatefulSet, canary dimensions
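As a preview of the session pool covered in 6.7, the following is a rough sketch of TTL plus LRU eviction over `(tenant_id, session_id)` keys. The class name `SessionPool`, its parameters, and the `factory` callback are assumptions made for illustration; the framework's own pool may differ, and a production version would also close or flush sessions when evicting them.

```python
import time
from collections import OrderedDict


class SessionPool:
    """Hypothetical TTL + LRU pool keyed by (tenant_id, session_id)."""

    def __init__(self, max_size=100, ttl_seconds=1800.0, clock=time.monotonic):
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = OrderedDict()  # key -> (last_used, session)

    def get_or_create(self, tenant_id, session_id, factory):
        key = (tenant_id, session_id)
        now = self._clock()
        self._evict_expired(now)
        if key in self._entries:
            _, session = self._entries.pop(key)
        else:
            session = factory()
        # Re-inserting moves the entry to the most-recently-used end.
        self._entries[key] = (now, session)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # drop least recently used
        return session

    def _evict_expired(self, now):
        # Entries are ordered by last use, so expired ones sit at the front.
        while self._entries:
            key, (last_used, _) = next(iter(self._entries.items()))
            if now - last_used < self._ttl:
                break
            del self._entries[key]

    def __len__(self):
        return len(self._entries)
```

Keying on the tenant as well as the session means two tenants that happen to reuse the same `session_id` never receive each other's session object.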
Paths by role
| Role | Suggested sections |
|---|---|
| DevOps / SRE | 6.6 → 6.7 → 6.8 |
| Security review | 6.1 → 6.4 → 6.5 |
| Platform engineer | 6.1 → 6.2 → 6.3 → 6.4 |
| PM (understand risk) | 6.1 → 6.5 risk sections |
Read by task
| What you're doing now | Shortest path |
|---|---|
| Establish a production security baseline | 6.1 Defense-in-depth → 6.4 Multi-tenant isolation → 6.5 Secrets & prompt injection |
| Tighten tool / shell / network boundaries | 6.2 Shell sandbox → 6.3 Network & SSRF → 5.4 Permission Engine |
| Prepare launch and operations | 6.6 Observability → 6.7 Resource governance → 6.8 Deployment |
| Run a pre-launch review | 6.1 Pre-deployment checklist → 6.4 Self-check → 6.7 Pre-load-test checklist |
Minimum pre-launch checks
- Isolation: each tenant has a separate `working_directory`, and file tools cannot escape it.
- Permissions: high-risk tools use `requires_confirmation` or permission rules, with a deny policy for batch mode.
- Network: SSRF blocklists stay enabled, and MCP / HTTP tools can reach only expected domains.
- Secrets: secrets are injected at runtime, with no plaintext credentials in logs, prompts, or memory.
- Observability: tool calls, permission denials, confirmations, errors, and token cost are traceable.
- Resources: session pools have TTL / LRU behavior, and load tests cover peak concurrency, long sessions, and abnormal exits.
- Rollback: deployment has a canary dimension, and agent, tool, and prompt versions can be tied back to logs.
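The isolation check in the list above ("file tools cannot escape it") can be spot-tested with a small containment helper. This is a sketch under assumptions: `resolve_in_workdir` is a hypothetical name, the example assumes POSIX-style paths, and the framework's file tools may enforce the boundary differently.

```python
from pathlib import Path


def resolve_in_workdir(working_directory: str, user_path: str) -> Path:
    """Resolve `user_path` under the tenant's directory or raise PermissionError."""
    root = Path(working_directory).resolve()
    # Joining with an absolute path discards `root`, and resolve() collapses
    # ".." segments and follows existing symlinks, so all three escape routes
    # end up outside `root` and fail the check below.
    candidate = (root / user_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError(f"{user_path!r} escapes the working directory")
    return candidate
```

A pre-launch review can feed the helper inputs such as `../other-tenant/secrets.txt` and `/etc/passwd` and confirm that each one raises `PermissionError`.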
Mental model
Security is layered, not a single checkpoint; production is governance, not luck. Each layer assumes the one above it has already failed — that's how you always have a safety net.