Troubleshooting and recovery
This is the operational triage page. When a paired session misbehaves, work in two passes:
- Diagnose with ctxrelay doctor and ctxrelay status before you change anything.
- Recover or fix the specific symptom using the root-cause table below.
The most important habit: don't force-restart on instinct. ContextRelay keeps every message, handoff, note, and decision in a durable on-disk ledger, so a healthy daemon is almost always recoverable. A symptom that looks like a crash is frequently a watchdog reset or a stale connection, and a restart only makes either one worse.
contextrelay, ctxrelay, and context-relay all point to the same CLI. This
page uses ctxrelay for brevity; swap in whichever you prefer.
First moves: diagnose before you touch anything
ctxrelay doctor - is the environment sane?
ctxrelay doctor
doctor runs a checklist and prints OK, WARN, or ERR per line:
- bun, claude, and codex binaries (plus the Codex app-server surface).
- Provider auth probes (a real claude -p and codex exec call). Skip these with --no-auth when you are offline or just want the fast structural checks:
  ctxrelay doctor --no-auth
- Project config (.contextrelay/config.json), state directory, and daemon tokens.
- Daemon health, including a warning if a same-project peer resolved a different instance/state directory.
- Claude plugin registration - flags a version mismatch and tells you how to fix it.
- Stale state - dead-pid daemon.pid, codex-tui.pid, and daemon.lock files under .contextrelay/state/. doctor reports these so you can clean them up; a clean stop (below) clears them for you.
If any check is ERR, doctor exits non-zero. Start there.
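When doctor runs from a script, it can help to tally the verdicts before deciding what to do next. The sketch below is illustrative only: it assumes each check line begins with its OK, WARN, or ERR verdict, which this page does not guarantee, and summarize_doctor and should_stop are hypothetical helper names.

```python
from collections import Counter

VERDICTS = ("OK", "WARN", "ERR")

def summarize_doctor(output: str) -> Counter:
    """Count verdicts, ASSUMING each check line starts with OK, WARN, or ERR."""
    counts = Counter({verdict: 0 for verdict in VERDICTS})
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        first = tokens[0].rstrip(":")  # tolerate "ERR:" as well as "ERR"
        if first in VERDICTS:
            counts[first] += 1
    return counts

def should_stop(counts: Counter) -> bool:
    """Mirror the documented rule: any ERR means fix that before moving on."""
    return counts["ERR"] > 0
```

In practice the exit code is the more reliable signal, since a non-zero status from ctxrelay doctor already means at least one check reported ERR.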
ctxrelay status - what is the live session doing?
ctxrelay status
ctxrelay status --json
status prints the daemon, session, connection, ledger, task, autonomy,
finality, and backup state. The --json form is machine-readable and is the
right input for scripts (for example, reading stateDir and controlPort).
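A script might extract those two values roughly as follows. This is a sketch under assumptions: stateDir and controlPort are the field names mentioned above, but their placement at the top level of the JSON object is a guess, and read_status_fields is a hypothetical helper rather than part of ContextRelay.

```python
import json
from typing import Tuple

def read_status_fields(status_json: str) -> Tuple[str, int]:
    """Pull stateDir and controlPort out of `ctxrelay status --json` output.

    ASSUMPTION: both keys sit at the top level of the JSON object.
    """
    data = json.loads(status_json)
    missing = [key for key in ("stateDir", "controlPort") if key not in data]
    if missing:
        raise KeyError(f"status JSON lacks expected keys: {missing}")
    return str(data["stateDir"]), int(data["controlPort"])
```

The input string can come from subprocess.run(["ctxrelay", "status", "--json"], capture_output=True, text=True, check=True).stdout; if the real schema nests these fields, adjust the lookups accordingly.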
ctxrelay instances - is this a port collision?
ctxrelay instances
ContextRelay scopes each project to its own port group (the first project gets
4500 Codex app-server / 4501 proxy / 4502 daemon control; additional
projects increment by 10). instances lists every known project, its instance
id, assigned ports, health, and when it was last seen. If two checkouts of the
same repo are fighting over a port, you will see it here.
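The port arithmetic described above can be written down directly. This is a minimal sketch that assumes projects are numbered sequentially from zero with no gaps; the real allocator may behave differently (for example, when a port is already taken), so treat ctxrelay instances as the authority. PortGroup and port_group are hypothetical names.

```python
from typing import NamedTuple

BASE_PORT = 4500    # first project's Codex app-server port (from the text above)
GROUP_STRIDE = 10   # each additional project shifts the whole group by 10

class PortGroup(NamedTuple):
    app_server: int
    proxy: int
    control: int

def port_group(project_index: int) -> PortGroup:
    """Ports for the Nth project (0-based), ASSUMING sequential assignment."""
    if project_index < 0:
        raise ValueError("project_index must be >= 0")
    base = BASE_PORT + GROUP_STRIDE * project_index
    return PortGroup(app_server=base, proxy=base + 1, control=base + 2)
```

Under that assumption, the second project would land on 4510 / 4511 / 4512.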
Crash recovery
When a session dies - power loss, terminal closed, a real daemon crash - the ledger on disk survives. A fresh pair can reconstruct where you were from it.
ctxrelay recover
ctxrelay recover --json
recover summarizes the recovery context:
- the resolved session, instance, and ports, and whether the daemon is currently reachable;
- the last recorded shutdown and the last turn-watchdog event;
- possibly interrupted commands (commands that started but never recorded completion);
- recent failures and blockers from the ledger;
- the working-tree git status;
- a ready-to-paste resume prompt that tells the agents to call read_context and task_state first, then continue from the newest request.
You do not reconstruct state from a chat transcript. Agents only share what is
written into the bridge messages and the ledger, so recover, read_context,
and task_state are how a new session learns what already happened.
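The "possibly interrupted commands" idea in the summary above amounts to pairing start records with completion records. The sketch below illustrates that pairing only; the event shape (a type of command_started or command_completed plus an id) is entirely hypothetical and is not the real ledger schema.

```python
from typing import Dict, Iterable, List

def interrupted_commands(events: Iterable[dict]) -> List[str]:
    """Return ids of commands that started but never recorded completion.

    HYPOTHETICAL event shape: {"type": "command_started" | "command_completed",
    "id": "<command id>"}. The real ledger format may differ.
    """
    open_ids: Dict[str, bool] = {}  # dicts preserve insertion order
    for event in events:
        if event["type"] == "command_started":
            open_ids[event["id"]] = True
        elif event["type"] == "command_completed":
            open_ids.pop(event["id"], None)
    return list(open_ids)
```

Anything left in the result started before the session died and may need to be re-run or checked by hand.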
Symptom → cause → fix
"Daemon disconnected" mid-turn
A "daemon disconnected" message most often masks the turn watchdog resetting a long-running Codex turn - not a real failure.
ContextRelay caps the wall-clock budget of a single Codex turn with
CONTEXTRELAY_TURN_MAX_MS (default 300000, i.e. 5 minutes). When a turn
exceeds it, the watchdog clears that turn from the busy set without killing
Codex and records a turn_watchdog event. The connection looks like it
dropped, but the daemon is fine.
Fix: check first, don't restart reflexively.
ctxrelay status # is the daemon actually healthy?
ctxrelay recover # shows the last watchdog event if one fired
For legitimately long turns, raise the budget instead of force-restarting (set it in the agents' launch environment, or export it before launching the pair):
export CONTEXTRELAY_TURN_MAX_MS=900000 # 15 minutes
Set it to 0 to disable the watchdog entirely (not recommended for unattended
runs).
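To reason about whether a given turn would trip the watchdog, the budget rule can be modelled in a few lines. This is a sketch of the documented behaviour, not the daemon's code: the default and the meaning of 0 come from the text above, while the handling of blank values and the helper names are assumptions.

```python
from typing import Mapping

DEFAULT_TURN_MAX_MS = 300_000  # documented default: 5 minutes

def turn_budget_ms(env: Mapping[str, str]) -> int:
    """Resolve the turn budget from an environment mapping; 0 means disabled.

    ASSUMPTION: an unset or blank value falls back to the default.
    """
    raw = env.get("CONTEXTRELAY_TURN_MAX_MS", "").strip()
    return int(raw) if raw else DEFAULT_TURN_MAX_MS

def watchdog_would_fire(elapsed_ms: int, env: Mapping[str, str]) -> bool:
    """True if a turn that has run for elapsed_ms exceeds the budget."""
    budget = turn_budget_ms(env)
    return budget > 0 and elapsed_ms > budget
```

Passing os.environ as env checks the current shell's setting; a turn of about seven minutes fires under the default but not under the 15-minute budget shown above.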
"It keeps crashing on restart"
A restart loop is almost always self-inflicted by force-restart thrash:
orphaned Codex app-server processes plus over-strict port classification, fed by
repeated hard restarts. The stale-bundle / version-mismatch warning from
doctor is advisory - it does not mean you must force-restart.
Fix: stop cleanly once, then relaunch a single time when the session is idle.
ctxrelay kill # clean stop of THIS project instance
# …then a single relaunch:
ctxrelay pair # or: ctxrelay claude / ctxrelay codex
ctxrelay kill marks the daemon as intentionally stopped before terminating it,
which closes the reconnect race that a raw process kill would open, and it cleans
up stale state files. For a genuine emergency across every project, the
all-instances stop is:
ctxrelay kill --all
To stop only one named session's Codex runtime while leaving the daemon, Claude, and other sessions running:
ctxrelay kill --session <id>
After a kill, start a fresh Claude Code conversation (or run /resume) so Claude
fully reconnects to the relaunched daemon.
"Codex has no ContextRelay tools"
If Codex cannot see send_to_claude, handoff_to_claude, read_context, and
the other Codex-side MCP tools, its MCP registration is missing.
Fix:
ctxrelay codex-mcp status # show the current registration
ctxrelay codex-mcp install # register the ContextRelay MCP server for Codex
Once installed, the registration is global, so any codex session in the
project picks up the tools - not only sessions launched with ctxrelay codex.
If you instead want plain codex windows to stop auto-attaching, remove it:
ctxrelay codex-mcp remove
(ctxrelay codex launches Codex connected to the daemon directly; codex-mcp
controls whether the tools are registered for standalone Codex sessions.)
"Stale Claude attachment" or "a live MCP call timed out"
If ctxrelay status shows Claude as attached but the foreground Claude is gone,
clear the stale attachment without disturbing Codex or the daemon:
ctxrelay detach-claude
This detaches the active Claude foreground only; Codex and the daemon keep running. If no Claude was attached, it tells you so.
For long reviews, prefer durable messaging over live calls. Live
deliberation and wait tools (deliberate_with_codex / wait_for_messages and
the Codex-side equivalents) are bounded and can time out at the bridge layer for
multi-minute work. When you expect a long turn, have the peer post a reply plus
an append_note and pick it up from the ledger with read_context, instead of
holding a live deliberation open.
"Codex stopped taking turns" (provider rate limits)
When Codex rejects a turn/start because of provider rate limits, ContextRelay
does not pretend the turn succeeded: the rejection is recorded as a
turn_aborted runtime event and Claude receives a system_turn_aborted notice
naming the reason. Queued Claude-to-Codex injections are not retried
immediately after a quota abort, so the pair does not burn through the rest of
your quota in a retry loop.
Fix: this is upstream quota, not a ContextRelay fault. Wait for the limit to
reset (or switch the Codex model/account), then resend the message or handoff.
The daemon stays healthy throughout - ctxrelay status should still show it up.
"ContextRelay won't activate" / "is it even on?"
A recent ContextRelay release supports a dormant-by-default mode (see Activation: auto-connect vs dormant), so "nothing is happening" can mean the session resolved to dormant rather than broken.
Fix: ask the gate why it decided what it decided.
ctxrelay gate-check --why # prints "active - <reason>" or "dormant - <reason>"
ctxrelay gate-check --json # machine-readable {active, reason}
gate-check exits 0 when active and 1 when dormant. The activation reason is
resolved by a fixed precedence - remember the top two rules when a session
surprises you:
- The env override CONTEXTRELAY_AUTO_CONNECT (0/1/true/false) beats everything.
- A per-workspace attach marker beats project and global config.
So if a session is unexpectedly active, check for CONTEXTRELAY_AUTO_CONNECT in
your environment and for an attach marker (left by ctxrelay attach). To opt the
current workspace in or out in-session:
ctxrelay attach # write the activation marker for this workspace
ctxrelay detach # remove it (does not stop the daemon or Codex)
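The precedence rules above can be sketched as a small resolver. Only the first two rules and the exit codes are taken from this page. The rest is assumed for illustration: that a present attach marker means active, that project config is consulted before global config, that the fallback is dormant, and that unrecognized env values are ignored. The function names and reason strings are hypothetical.

```python
from typing import Optional, Tuple

_TRUE, _FALSE = {"1", "true"}, {"0", "false"}

def resolve_gate(env_value: Optional[str],
                 attach_marker: bool,
                 project_auto: Optional[bool] = None,
                 global_auto: Optional[bool] = None) -> Tuple[bool, str]:
    """Return (active, reason) following the two documented top rules."""
    # Rule 1 (documented): CONTEXTRELAY_AUTO_CONNECT beats everything.
    if env_value is not None:
        value = env_value.strip().lower()
        if value in _TRUE:
            return True, "env override"
        if value in _FALSE:
            return False, "env override"
    # Rule 2 (documented): the workspace attach marker beats config.
    if attach_marker:
        return True, "attach marker"
    # ASSUMED: project before global, and dormant when nothing is set.
    for setting, reason in ((project_auto, "project config"),
                            (global_auto, "global config")):
        if setting is not None:
            return setting, reason
    return False, "default (dormant)"

def gate_exit_code(active: bool) -> int:
    """Documented: gate-check exits 0 when active and 1 when dormant."""
    return 0 if active else 1
```

The sketch is only a mental model for reading gate-check --why output; the command itself remains the source of truth for the actual reason.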
Plugin or instruction blocks look out of date after an update
After updating the npm package
(npm i -g @proofofwork-agency/contextrelay@latest), reconcile the
Claude/Codex-facing surface with ctxrelay upgrade:
ctxrelay upgrade
ctxrelay upgrade --dry-run # preview every change, write nothing
upgrade is idempotent and safe to re-run. It:
- migrate-merges .contextrelay/config.json, adding new default keys while preserving your existing values (it does not delete your settings or change your coordinator);
- refreshes the managed CLAUDE.md / AGENTS.md blocks in place, preserving each file's slim (dormant) or full state;
- refreshes the bare /contextrelay command only if it is already present;
- re-registers and reinstalls the Claude plugin (skip with --no-plugin);
- prints the from → to version and reminds you to run /reload-plugins in a running Claude Code so the new plugin loads.
Use --instructions refresh|project|global|both|skip to control instruction
handling (default refresh touches only files that already carry a managed
block).
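The migrate-merge behaviour listed above (add new default keys, never overwrite existing values) can be illustrated with a short recursive merge. This is a sketch of the idea rather than the upgrade command's implementation; descending into nested objects is an assumption, and migrate_merge is a hypothetical name.

```python
from copy import deepcopy

def migrate_merge(existing: dict, defaults: dict) -> dict:
    """Add missing default keys; never overwrite a value the user already set.

    ASSUMPTION: nested objects are merged key by key rather than replaced.
    """
    merged = deepcopy(existing)  # leave the caller's config untouched
    for key, default in defaults.items():
        if key not in merged:
            merged[key] = deepcopy(default)
        elif isinstance(merged[key], dict) and isinstance(default, dict):
            merged[key] = migrate_merge(merged[key], default)
    return merged
```

A merge of this shape is idempotent, since a second pass finds every default key already present, which matches the claim that upgrade is safe to re-run.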
If ctxrelay upgrade is not available, use the manual fallback: ctxrelay dev (for local source checkouts) or
ctxrelay instructions install to refresh the managed blocks, then
ctxrelay doctor to confirm the plugin registration is current. See
Upgrading ContextRelay.
Ports and state hygiene
Stale lock and pid files under .contextrelay/state/ (such as daemon.lock,
daemon.pid, and codex-tui.pid) are detected by ctxrelay doctor and cleaned
up by a clean ctxrelay kill. You rarely need to delete them by hand.
If you override ports with environment variables, set all three or none - partial port overrides are rejected:
export CODEX_WS_PORT=4600
export CODEX_PROXY_PORT=4601
export CONTEXTRELAY_CONTROL_PORT=4602
If you only need to know which ports a project is using, prefer ctxrelay instances (or ctxrelay status --json) over guessing.
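A launch script can check the all-three-or-none rule for port overrides before it starts the pair. The variable names below are the ones listed above; treating an empty string as unset is an assumption, and check_port_overrides is a hypothetical helper.

```python
from typing import Mapping

PORT_VARS = ("CODEX_WS_PORT", "CODEX_PROXY_PORT", "CONTEXTRELAY_CONTROL_PORT")

def check_port_overrides(env: Mapping[str, str]) -> str:
    """Return "none" or "all"; raise ValueError on a partial override.

    ASSUMPTION: an empty value counts as unset.
    """
    present = [name for name in PORT_VARS if env.get(name)]
    if not present:
        return "none"
    if len(present) == len(PORT_VARS):
        return "all"
    missing = [name for name in PORT_VARS if name not in present]
    raise ValueError(f"partial port override; also set: {', '.join(missing)}")
```

Calling check_port_overrides(os.environ) before launch surfaces a half-configured shell early, instead of leaving ContextRelay to reject it.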
Next steps
- Frequently asked questions - shorter answers to common "how do I…" questions.
- Architecture overview - how the daemon, bridge, and ledger fit together, which explains why these symptoms happen.
- CLI command reference - every command, action, and flag in one place.
- Environment variables reference - including CONTEXTRELAY_TURN_MAX_MS, CONTEXTRELAY_AUTO_CONNECT, and the port overrides referenced above.