
Part 6 · Security & Production Deployment

Shipping an agent to real users is far harder than running it on a developer's laptop. This part covers security and production deployment together, because in practice they are inseparable.

Key terms in this Part

  • Defense-in-depth: every layer assumes the one above it has failed, so security never lives in one place · §6.1, G.5
  • SSRF blocklist: blocks loopback (127.0.0.1), the cloud metadata endpoint (169.254.169.254), link-local, and RFC 1918 private ranges by default; extend it, never disable it · §6.3, G.5
  • Working-directory golden rule: one tenant = one CWD, never shared; file tools resolve paths there · §6.4
  • Session pool: TTL + LRU eviction over (tenant_id, session_id) keys; the production lifecycle pattern · §6.7
  • Sticky session: StatefulSet + PVC + sessionAffinity; how the same session lands on the same pod · §6.8
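The session pool term above can be made concrete with a small sketch. This is an illustration only, not the framework's actual implementation: the class name `SessionPool`, its parameters (`max_size`, `ttl_seconds`, `clock`), and the `factory` callback are all hypothetical. Only the idea of TTL plus LRU eviction over `(tenant_id, session_id)` keys comes from the text.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Tuple

Key = Tuple[str, str]  # (tenant_id, session_id)


class SessionPool:
    """Hypothetical pool: TTL expiry plus LRU eviction, keyed by tenant and session."""

    def __init__(self, max_size: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        # key -> (session, last_used); ordered from least to most recently used.
        self._items: "OrderedDict[Key, Tuple[Any, float]]" = OrderedDict()

    def _expire(self) -> None:
        """Drop entries idle for at least the TTL."""
        now = self._clock()
        # The front entry is always the oldest, so stop at the first live one.
        while self._items:
            _key, (_session, last_used) = next(iter(self._items.items()))
            if now - last_used < self._ttl:
                break
            # A real pool would also close the session's resources here.
            self._items.popitem(last=False)

    def get_or_create(self, tenant_id: str, session_id: str,
                      factory: Callable[[], Any]) -> Any:
        """Return the cached session for this key, or build one with factory()."""
        self._expire()
        key = (tenant_id, session_id)
        if key in self._items:
            session, _ = self._items.pop(key)
        else:
            session = factory()
            # Make room by evicting the least recently used entries.
            while len(self._items) >= self._max_size:
                self._items.popitem(last=False)
        # (Re)insert at the end: this key is now the most recently used.
        self._items[key] = (session, self._clock())
        return session

    def __contains__(self, key: Key) -> bool:
        return key in self._items

    def __len__(self) -> int:
        return len(self._items)
```

Keying on the pair rather than on `session_id` alone means two tenants that happen to reuse a session ID never share a session object, which matches the one-tenant-one-CWD rule in the same list.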

Coverage

Paths by role

Role                  Suggested sections
DevOps / SRE          6.6 → 6.7 → 6.8
Security review       6.1 → 6.4 → 6.5
Platform engineer     6.1 → 6.2 → 6.3 → 6.4
PM (understand risk)  6.1 → 6.5 risk sections

Read by task

What you're doing now                      Shortest path
Establish a production security baseline   6.1 Defense-in-depth · 6.4 Multi-tenant isolation · 6.5 Secrets & prompt injection
Tighten tool / shell / network boundaries  6.2 Shell sandbox · 6.3 Network & SSRF · 5.4 Permission Engine
Prepare launch and operations              6.6 Observability · 6.7 Resource governance · 6.8 Deployment
Run a pre-launch review                    6.1 Pre-deployment checklist · 6.4 Self-check · 6.7 Pre-load-test checklist

Minimum pre-launch checks

  • Isolation: each tenant has a separate working_directory, and file tools cannot escape it.
  • Permissions: high-risk tools use requires_confirmation or permission rules, with a deny policy for batch mode.
  • Network: SSRF blocklists stay enabled, and MCP / HTTP tools can reach only expected domains.
  • Secrets: secrets are injected at runtime, with no plaintext credentials in logs, prompts, or memory.
  • Observability: tool calls, permission denials, confirmations, errors, and token cost are traceable.
  • Resources: session pools have TTL / LRU behavior, and load tests cover peak concurrency, long sessions, and abnormal exits.
  • Rollback: deployments support canary rollout, and every log entry can be traced back to the agent, tool, and prompt versions that produced it.
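Two of the checks above, Isolation and Network, are easy to express as small guard functions. The sketch below is an illustration under stated assumptions, not the framework's own code: `resolve_in_workdir`, `is_blocked_address`, `DEFAULT_BLOCKED`, and `extra_blocked` are hypothetical names, and blocking the whole `127.0.0.0/8` loopback range (rather than only `127.0.0.1`) is an assumption of this sketch.

```python
import ipaddress
from pathlib import Path
from typing import Iterable

# Assumed defaults: loopback, link-local (which contains the
# 169.254.169.254 metadata address), and the RFC 1918 private ranges.
DEFAULT_BLOCKED = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("169.254.0.0/16"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_blocked_address(ip: str, extra_blocked: Iterable[str] = ()) -> bool:
    """Return True if ip falls in a blocked network.

    Callers can only add networks via extra_blocked; there is deliberately
    no parameter that removes the defaults ("extend, never disable").
    """
    addr = ipaddress.ip_address(ip)
    networks = DEFAULT_BLOCKED + [ipaddress.ip_network(n) for n in extra_blocked]
    return any(addr in net for net in networks)


def resolve_in_workdir(working_directory: str, user_path: str) -> Path:
    """Resolve user_path inside the tenant's working directory.

    Raises PermissionError if the resolved path lands outside it, which
    covers "../" traversal, absolute paths, and symlinks that point out.
    """
    root = Path(working_directory).resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} escapes the tenant working directory")
    return candidate
```

Note that this covers IPv4 literals only. A production check would also need to resolve hostnames and validate the address actually connected to, since a hostname can resolve to a blocked address after it was checked.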

Mental model

Security is layered, not a single checkpoint; production is governance, not luck. Each layer assumes the one above it has already failed, so a single failed layer never leaves you without a safety net.

Start with 6.1 →