Part 6 · Security & Production Deployment

Shipping an agent to real users is far harder than running it on a developer's laptop. This part covers security and production deployment together because, in practice, the two are inseparable.

Key terms in this Part

Defense-in-depth — every layer assumes the one above failed; security never lives in one place · §6.1, G.5
SSRF blocklist — bans 127.0.0.1, 169.254.169.254, link-local, RFC1918 by default; only extend, never disable · §6.3, G.5
Working-directory golden rule — one tenant = one CWD, never shared; file tools resolve paths relative to it · §6.4
Session pool — TTL + LRU eviction over (tenant_id, session_id) keys; the production lifecycle pattern · §6.7
Sticky session — StatefulSet + PVC + sessionAffinity; how the same session lands on the same pod · §6.8
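
As a rough illustration of the SSRF blocklist idea above, here is a minimal sketch in Python using only the standard-library `ipaddress` module. The names `BLOCKED_NETWORKS`, `is_blocked_ip`, and `extra_networks` are hypothetical and are not taken from any particular framework; the ranges are simply the ones listed in the key term (loopback, link-local including 169.254.169.254, and RFC 1918). The `extra_networks` parameter reflects the "only extend, never disable" rule: callers can add ranges, but there is no way to remove the defaults.

```python
import ipaddress

# Default blocked ranges (assumed from the key term above): loopback,
# link-local (covers the 169.254.169.254 metadata address), RFC 1918.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),     # IPv4 loopback
    ipaddress.ip_network("169.254.0.0/16"),  # IPv4 link-local
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),   # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),  # RFC 1918
    ipaddress.ip_network("::1/128"),         # IPv6 loopback
    ipaddress.ip_network("fe80::/10"),       # IPv6 link-local
]


def is_blocked_ip(address: str, extra_networks=()) -> bool:
    """Return True if `address` falls in a default or extra blocked range.

    `extra_networks` can only add ranges; the defaults cannot be removed.
    """
    ip = ipaddress.ip_address(address)
    # Treat IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) as the embedded IPv4.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    networks = list(BLOCKED_NETWORKS)
    networks += [ipaddress.ip_network(n) for n in extra_networks]
    return any(ip.version == net.version and ip in net for net in networks)
```

A check like this is only meaningful when applied to the IP address a hostname actually resolves to, and re-applied on every redirect hop; checking the URL string alone is not enough.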

Coverage

6.1 Defense-in-Depth Model — 7-layer defense stack, 5 threat categories, minimum vs ideal
6.2 Shell Sandbox & Command Control — macOS sandbox-exec, 3 built-in profiles, Linux alternatives
6.3 Network & SSRF Defense — Domain layering, httpx redirects, MCP network isolation
6.4 Multi-Tenant & Filesystem Isolation — working_directory golden rule, DB isolation, /tmp pollution
6.5 Secrets & Prompt-Injection Defense — Five commandments, attack surfaces, red-team checklist
6.6 Observability & Audit — 4 observation axes, built-in replay, compliance logs
6.7 Resource Governance & Concurrency — Session pool, TTL, token budgets, memory estimation
6.8 Deployment, Canary & Rollback — Dockerfile, K8s StatefulSet, canary dimensions

Paths by role

Role	Suggested sections
DevOps / SRE	6.6 → 6.7 → 6.8
Security review	6.1 → 6.4 → 6.5
Platform engineer	6.1 → 6.2 → 6.3 → 6.4
PM (understand risk)	6.1 → 6.5 risk sections

Read by task

What you're doing now	Shortest path
Establish a production security baseline	6.1 Defense-in-depth → 6.4 Multi-tenant isolation → 6.5 Secrets & prompt injection
Tighten tool / shell / network boundaries	6.2 Shell sandbox → 6.3 Network & SSRF → 5.4 Permission Engine
Prepare launch and operations	6.6 Observability → 6.7 Resource governance → 6.8 Deployment
Run a pre-launch review	6.1 Pre-deployment checklist → 6.4 Self-check → 6.7 Pre-load-test checklist

Minimum pre-launch checks

Isolation: each tenant has a separate working_directory, and file tools cannot escape it.
Permissions: high-risk tools use requires_confirmation or permission rules, with a deny policy for batch mode.
Network: SSRF blocklists stay enabled, and MCP / HTTP tools can reach only expected domains.
Secrets: secrets are injected at runtime, with no plaintext credentials in logs, prompts, or memory.
Observability: tool calls, permission denials, confirmations, errors, and token cost are traceable.
Resources: session pools have TTL / LRU behavior, and load tests cover peak concurrency, long sessions, and abnormal exits.
Rollback: deployment has a canary dimension, and agent, tool, and prompt versions can be tied back to logs.
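
The isolation check in the list above ("file tools cannot escape it") can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework's actual implementation: `resolve_in_tenant_dir` and its parameters are hypothetical names, and it relies on `pathlib.Path.is_relative_to`, which requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_in_tenant_dir(tenant_root: str, requested: str) -> Path:
    """Resolve `requested` inside `tenant_root`, or raise PermissionError.

    Hypothetical helper: a file tool would call this before every read
    or write so that paths cannot leave the tenant's working directory.
    """
    root = Path(tenant_root).resolve()
    # Joining with an absolute path discards `root`, and resolve() collapses
    # ".." segments and follows symlinks, so both escape routes show up
    # as a path outside `root`.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes tenant directory: {requested!r}")
    return candidate
```

For example, with a tenant root of `/srv/tenants/a` (an illustrative path), `notes/todo.txt` resolves normally, while `../b/secret.txt` and `/etc/passwd` both raise `PermissionError`. A check like this complements, rather than replaces, OS-level isolation such as separate users or containers.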

Mental model

Security is layered, not a single checkpoint; production is governance, not luck. Each layer assumes the one above it has already failed, so no single failure leaves you without a safety net.

→ Start with 6.1 Defense-in-Depth Model
