Skip to content

7. Context & Status

/context shows token usage and lets you raise or lower the limit. /status is a one-screen snapshot of session state. Together they're your dashboard.

/status was already covered in 1. Getting Started; this page focuses on /context and the broader question of "why is my session getting expensive."

/context — token-budget dashboard

text
> /context

Context Window Status:
  Estimated tokens: 47,231
  Max tokens:       200,000
  Usage:            23.6%
  Messages:         54
  Compact failures: 0/3
  Last compact:     2026-04-30 14:08:12  82,419 → 31,204 tokens | 21 summarized, 18 kept
  Re-injected files: src/auth.py, tests/test_auth.py
FieldMeaning
Estimated tokensApproximate token count of the conversation as it would be sent now
Max tokensConfigured upper bound (default 200,000)
UsageEstimated / Max. Color: green (<55%), yellow (<65%), red (>=65%)
MessagesNumber of messages in agent.messages
Compact failuresHow many times the auto-compactor has failed in this session, out of the circuit-breaker limit. Hits the limit → auto-compact disables for safety.
Last compactWhen auto-compaction last ran; pre/post token counts; how many messages were summarized vs kept
Re-injected filesFiles re-attached to context after compaction (kept "live" because the agent recently read them)

Why the color thresholds are below 100%

The bar isn't "you crashed at 100%". Auto-compaction triggers well before the model's hard limit because part of the budget is reserved for the next response and tool outputs. By the time you're at 65%+ red, compaction is imminent.

/context limit <n> — change the budget

text
> /context limit 100000
Context limit set to 100,000 tokens

> /context limit 500000
Context limit set to 500,000 tokens

What it does:

  • Sets context_manager.max_tokens for this session
  • Affects when auto-compaction triggers (compaction fires before approaching this number)
  • Resets on restart — to make it persistent, set the AGENTAO_CONTEXT_TOKENS environment variable (see 10. Configuration Reference)

Minimum: 1,000. Below that the CLI refuses.

When to lower:

  • You want compaction to kick in earlier (cheaper turns at the cost of more summarization)
  • You're using a smaller model with a smaller real context window than 200K

When to raise:

  • You're on a 1M-context model and want fewer compactions
  • You're running long, file-heavy plan sessions where the model genuinely needs more state

What auto-compaction actually does

When /context usage approaches the configured limit, the context manager:

  1. Picks an older block of messages from the conversation
  2. Asks the LLM to summarize them into a [Conversation Summary] block
  3. Replaces those messages with the summary in agent.messages
  4. Keeps the most recent N messages and the in-progress tool loop intact
  5. Re-injects file contents the agent recently read (the Re-injected files line)
  6. Writes the summary to the session_summaries table (see 6. Memory)

The summary lives both in the live message history (so the next turn sees it) and in the DB (so future sessions can reference it via memory).

The "circuit breaker" is a safety: if compaction itself fails (LLM timeout, parse error) more than CIRCUIT_BREAKER_LIMIT times in a row, auto-compaction disables for the rest of the session — better to refuse a turn than spiral.

/compact — compact on demand

/compact runs the same full-compaction path as the auto-compactor, but right now instead of waiting for the usage bar to climb.

text
> /compact
Compacted history: 54 → 19 messages, ~47,231 → ~12,880 tokens (6.4% of window).

What happens:

  • Calls context_manager.compress_messages(..., is_auto=False) — summarize an older block into a [Conversation Summary], keep recent messages + the in-progress tool loop, re-inject recently-read files.
  • Fires the same CONTEXT_COMPRESSED and session-summary observability events as auto-compaction, and dispatches matching PreCompact plugin hooks (with trigger="manual"), so replay and hooks see manual /compact the same way they see the threshold-driven path.
  • Refreshes the context-usage % shown in the prompt.

When to use it:

  • Before a big task — you know history is bloated and you'd rather pay the summarization cost now than have it land mid-turn.
  • Right after /sessions resume — squash a long restored history before the first new turn.
  • Bills creeping up/context shows 50%+ but you're not near the auto-compaction trigger yet.

Edge cases:

  • Fewer than 5 messages → Not enough conversation history to compact yet. (nothing to summarize).
  • If compaction can't make progress — circuit breaker open, no safe split point, or the summarization LLM call failed — you get Compaction made no change … and history is left untouched (check agentao.log).
  • Same lossiness as auto-compaction: anything not in the summary or in re-injected files is gone from the agent's perspective.

/status quick-reference (full content in chapter 1)

text
> /status
LineAction it suggests
Conversation summary shows huge message countConsider /clear to rotate
Permission Mode is full-accessConsider stepping back to workspace-write
Loaded sources lists unexpected pathsAudit ~/.agentao/permissions.json and the built-in preset
Markdown rendering OFF and you wanted it ON/markdown toggles
Task List shows pending itemsAgent has open todos — ask it to continue
ACP servers 0/N runningServers crashed or never started — /acp status to investigate

Combining the two: triage flow

When something feels off, run both:

text
> /status      # see what's loaded and how
> /context     # see how much you're spending
SymptomFirst check
Each turn is slow/context — usage % and last compact time
Bills spiking/context for tokens, /status for active skills (each adds prompt size)
Agent forgot something obvious/memory status (chapter 6) — recall errors > 0?
Tool calls keep failing/status permission mode + /mcp / /acp (chapter 8)

Pitfalls

  • Estimated tokens is approximate — the manager uses a heuristic per character, not the model's tokenizer. Real OpenAI/Anthropic counts can be 5–15% off. Use it as a trend, not a precise gauge.
  • Compaction is lossy — anything not in the summary or in re-injected files is gone from the agent's perspective. If the agent suddenly "forgets" something specific, check Last compact — it may have been summarized.
  • Lowering limit mid-session can trigger immediate compaction — if you set a limit below current usage, the next turn will compact aggressively. Sometimes desirable, sometimes surprising.
  • Re-injected files reflect recency, not importance — if a critical file hasn't been touched in a while, it may not survive compaction. To force preservation, ask the agent to read it again.

Where to go next

Want to…Read
Inspect what memory is bloating context6. Memory/memory status
Tune compaction thresholds in config10. Configuration Reference
Understand the embedded compaction APIPart 4 · Event Layer

Where this fits

The context manager is agent.context_manager. Embedding hosts can read cm.get_usage_stats(agent.messages) to power a host-side "context bar" UI, or call cm.compact() directly to force a compaction. The auto-compaction trigger is the same in both paths.

Authoritative help

Command syntax: /help. /context body: agentao/cli/commands/context.py. /compact body: agentao/cli/commands/compact.py. Compaction logic: agentao/context_manager.py.