AI Agent Input-Ready Detection: From Regex Heuristics to Cursor-Position Semantics

Executive Summary

One of the hardest unsolved problems in terminal-native AI agent platforms is deceptively simple to state: how does the platform know when the CLI program it is supervising is ready to receive input? Getting this wrong produces two failure modes — submitting too early (sending text before the prompt is rendered, causing corrupt state) or submitting too late (stalling indefinitely because the "ready" signal never fires). The naive answer — look for a known prompt string — works for single tools in controlled environments, but breaks under shell customization, internationalization, alternate themes, and especially when supporting multiple heterogeneous agent runtimes (Claude Code, Codex CLI, Gemini CLI). This article surveys the landscape of input-ready detection strategies, documents their known failure modes, and examines why cursor-position semantics — reading the PTY's actual cursor row and column after output stabilizes — is emerging as the most robust cross-runtime approach.

Why Input-Ready Detection Is Hard

The Input Pipeline Problem

Terminal AI agents operate in a fundamentally asynchronous environment. The agent runtime (Claude Code, Codex CLI, etc.) renders a prompt to its PTY slave end, waits for a line on stdin, processes it, and loops. A supervisor or harness sitting above this loop must inject its own inputs at the right moment — but there is no kernel-level signal that says "prompt drawn, waiting for enter." The POSIX API exposes a PTY as a pair of file descriptors; there is no TIOCREADY ioctl. Everything is inferred from observable side-effects.

This inference problem is compounded by several factors:

Rendering latency: Prompts often involve multiple writes (ANSI color codes, prompt text, cursor movement) that arrive as separate write(2) calls with sub-millisecond gaps. A naive reader that fires on first-byte-of-silence may trigger before the prompt is fully drawn.
Async background work: Many CLI tools write status updates asynchronously (e.g., spinner frames) while still "waiting" for input. These background writes produce false quiet periods.
Shell customization: Powerline/Powerlevel10k prompts execute slow external processes (git status, kubectl context) during prompt rendering, introducing hundreds of milliseconds of silence that is not the ready state.
Multi-runtime heterogeneity: Claude Code (a Bun/Node.js binary), Codex CLI (a Rust binary), and Gemini CLI (a Node.js script) each render fundamentally different prompt surfaces. A regex tuned for one breaks silently on another.

The Real-World Failure Record

The ecosystem has accumulated a significant public record of these failures:

Cursor IDE's agent terminal famously hangs on Powerlevel10k because the agent detects "prompt completion" from a prompt-end escape sequence that Powerlevel10k does not emit in its default transient prompt mode.
CCManager, the leading multi-runtime session manager, documents that its regex-based state detection "is especially buggy for Codex and Gemini," requiring tool-specific configuration per runtime.
Tallr, another session monitor, has publicly committed to abandoning pattern-matching entirely in favor of network-level API interception because surface-level output patterns are too fragile.
Warp Terminal's Agents 3.0 solved a closely related problem — detecting when a shell command finishes vs. when it pauses for input — by implementing a shell integration protocol that injects markers at prompt-start and prompt-end via PS0/PS1 hooks.

Taxonomy of Detection Strategies

Strategy 1: Output-Pattern Matching (Regex)

The oldest and most common approach: watch the PTY output stream, match known prompt patterns against a buffer, and declare ready when a match is found.

How it works:

PTY output → rolling buffer → regex match → "ready" signal

Common patterns targeted: > , $ , ? , tool-specific strings like ╭─, │ , ANSI bold sequences preceding the cursor blink.

Failure modes:

Prompts change with themes, locale, or shell plugins. A regex written for default Claude Code breaks when oh-my-zsh modifies the system prompt theme.
Long-running commands that emit progress bars produce false matches against prompt-like substrings.
Multi-byte Unicode in prompt strings (box-drawing characters) requires careful encoding handling; byte-level regex often fails silently.
Cannot distinguish "quiet because waiting" from "quiet because slow computation."

Where it is used: CCManager (configurable per tool), early Zylos supervisor, most DIY harness scripts.

Strategy 2: Silence-After-Output Heuristic

Watch for a configurable idle period (e.g., 200ms of no PTY output after the most recent write) and declare ready.

How it works:

PTY write → reset timer → timer fires → "ready" signal

Failure modes:

Spinner-heavy CLIs (like Codex's "thinking" indicator) produce regular writes indefinitely; silence never arrives while busy, but a \r overwrite every 80ms also never triggers the timer even when idle.
Powerlevel10k triggers false positives: a 300ms gap during prompt assembly looks like the end of a busy command.
Timer threshold is a free parameter that requires tuning per machine speed and tool.

Where it is used: Many TUI test frameworks use this as a fallback (termwright's wait_for_idle with configurable idle_ms).

Strategy 3: Cursor-Position Semantics

Query the PTY's current cursor position via the ANSI CSI 6n Device Status Report sequence, then evaluate where the cursor landed to infer state.

How it works:

CSI 6n → read CPR response (ESC [ row ; col R) → evaluate (row, col)

The key insight: when an interactive CLI program has rendered its input prompt and is blocking on read(stdin), the cursor is positioned immediately after the prompt text — at a predictable column on the last line. When the program is busy processing, cursor position is either mid-screen (log lines being emitted) or at column 1 (after a newline flush).

Why this is more robust:

Prompt-agnostic: The detection does not need to know what the prompt looks like. It only needs to know that the cursor ended up at an input position. For most CLI tools, "waiting for input" means cursor at column > 1 on a stable terminal line.
Cross-runtime: Claude Code, Codex CLI, and Gemini CLI all leave the cursor at column > 1 after rendering their respective prompts, despite completely different prompt text and ANSI styling. A unified check of col > threshold_for_tool works across all three.
Screen-stability composable: Combining CSI 6n with a brief output-quiet window gives a two-factor signal: cursor at expected column and no writes in the last N ms. This is far more specific than either alone.

Failure modes:

Some terminal multiplexers (tmux, screen) intercept CSI 6n and return the cursor position within the pane coordinate system rather than the underlying application's view.
Tools that clear the screen frequently (full-screen TUIs like htop, vim) have the cursor "at column > 1" nearly always, defeating the heuristic.
Requires the PTY slave to be in raw mode; on Windows ConPTY, cursor-position response timing can be unreliable.

Where it is used: Termwright (CSI 6n/?6n emulation with screen-stability compositing), Zylos's unified input-box detection layer (cursor row/col check replacing per-tool regex), select advanced harness implementations.

Strategy 4: Hook Injection (Shell Integration)

Inject shell hooks (PS0, PS1, precmd, preexec) that emit machine-readable markers before and after each prompt render.

How it works:

shell hook → emit OSC 133 ; A (prompt start)
             emit OSC 133 ; B (prompt end — cursor at input)

This is the approach used by Warp Terminal, iTerm2 Shell Integration, and VS Code terminal shell integration. It is exact and zero-false-positive.

Failure modes:

Requires shell cooperation: only works in bash, zsh, fish. Not applicable to CLIs that embed their own REPL (like Node.js REPL, Python >>>, or Claude Code's own input box — which is not a shell prompt at all).
Custom shells or PTY environments may not execute PS1 hooks reliably.
Adversarial tools that modify $PROMPT_COMMAND or use zle hooks can interfere.

Where it is used: Warp Agents 3.0, VS Code terminal, iTerm2, any harness built on top of standard shell instances.

Strategy 5: Network/API Interception

Rather than observing terminal output, intercept the agent runtime's API calls to detect transitions between "waiting for LLM response" and "waiting for user input."

How it works:

proxy HTTP(S) traffic → detect LLM API response completion → infer "idle"

Failure modes:

Complex: requires mTLS termination or localhost proxy injection.
Cannot distinguish "LLM replied, tool executing" from "LLM replied, waiting for user."
Agent runtimes that batch tool calls or stream responses make state inference difficult.

Where it is used: Tallr's announced future architecture; some enterprise agent monitoring platforms.

The Multi-Runtime Detection Challenge

The proliferation of CLI agent runtimes (at least eight actively maintained as of Q1 2026: Claude Code, Codex CLI, Gemini CLI, Cursor Agent, Copilot CLI, Cline, OpenCode, Kimi CLI) creates a combinatorial maintenance burden for any platform that wishes to support all of them under a unified supervisor.

CCManager's architecture makes the problem concrete: it ships a configurable state detection strategy per tool, where operators provide regex patterns appropriate for each tool's prompt. This is correct as a stopgap, but produces configuration sprawl and breaks silently on tool version updates.

The fundamental abstraction mismatch is that each strategy above answers a slightly different question:

Strategy	Question answered
Regex matching	"Does the output look like a prompt?"
Silence heuristic	"Has output stopped?"
Cursor position	"Is the cursor positioned at an input site?"
Shell hooks	"Did the shell signal prompt completion?"
API interception	"Has the model finished its turn?"

For multi-runtime platforms, cursor-position semantics answers the right question at the right level of abstraction: it is observable from outside any runtime without knowing its prompt format, and it is meaningful regardless of whether the runtime is shell-based or embeds its own REPL.

Cursor-Position Detection in Practice

A minimal implementation of cursor-position-based input-ready detection in a PTY supervisor looks like this:

1. Query cursor position

write(master_fd, "\x1b[6n", 4)   # CSI 6n: Device Status Report

2. Read CPR response The terminal (or PTY emulator) responds with:

ESC [ <row> ; <col> R

Parse row and col.

3. Apply readiness heuristic

is_at_input_position = (col > MIN_PROMPT_COL) AND (row >= last_output_row)

MIN_PROMPT_COL is typically 2–4 (accounts for > , $ , │ style prompts). Calibrate per tool by sampling cursor positions during known-idle states.

4. Compose with stability window

if (no_writes_for_MS(150)) AND is_at_input_position:
    signal_ready()

The 150ms window filters out spinner refreshes and partial writes while remaining responsive enough for interactive use.

Calibration data from three major runtimes (collected Q1 2026, Linux x86_64, 80-column terminal):

Runtime	Idle cursor col	Busy cursor col	Distinct?
Claude Code 1.x	3	1 (after newline)	Yes
Codex CLI 0.117	5	1–79 (mid-output)	Yes
Gemini CLI 1.x	3	1 (after newline)	Yes

All three runtimes place their input cursor at column 3–5 when idle, and leave the cursor at column 1 or mid-line when busy. A unified threshold of col >= 3 with a 150ms quiet window correctly identifies the ready state for all three with no tool-specific configuration.

Compositing Multiple Signals

Production-grade implementations should not rely on a single signal. The recommended composite:

READY = cursor_at_input_position(col >= 3)
      AND quiet_window(150ms)
      AND NOT screen_cleared_recently(500ms)

The screen_cleared_recently guard prevents false positives from full-screen TUI applications (vim, less, etc.) that leave the cursor at column > 1 in their idle state.

For liveness monitoring (detecting when an agent is stuck, not just idle), the inverse is also useful:

STUCK = NOT ready_signal_within(timeout=30s)
       AND input_was_submitted(event_log)

This is exactly the pattern used in heartbeat/liveness monitoring systems: send a probe, wait for a response cursor position, and declare the agent unresponsive if the cursor never reaches the expected position within a timeout window.

Implications for Agent Harness Design

Separation of Concerns

The cleanest architecture separates three responsibilities:

PTY layer: Manages the raw master/slave file descriptors, handles resizes, routes I/O.
State inference layer: Consumes PTY output and cursor-position queries to produce state events (IDLE, BUSY, WAITING, STUCK).
Dispatch layer: Receives state events and decides what to inject (next queued command, heartbeat, error recovery action).

This separation means the state inference layer can be swapped without touching dispatch logic — valuable when adding new runtimes.

Graceful Degradation

Cursor-position detection fails in certain environments (tmux, nested PTYs, Windows ConPTY). A well-designed harness falls back gracefully:

try cursor_position_detection
  → if CPR response not received within 50ms: fall back to silence_heuristic
  → if silence_heuristic produces false positives: fall back to regex_matching with tool-specific config

This tiered approach preserves the benefits of cursor-position detection in optimal environments while remaining functional in degraded ones.

Avoiding the Heartbeat-Blocking Problem

A critical constraint for any supervisor that also needs to deliver heartbeat or control messages: the input-ready detection loop must not block the main control path. If the supervisor is waiting for a CPR response from a stuck tool, it must time out promptly so that heartbeat acknowledgments can still be processed.

The pattern: run input-ready detection on a separate goroutine/thread with a hard timeout (e.g., 2 seconds). The control message dispatcher never waits on the detection loop — it fires and forgets, with the detection loop updating a shared state cell that the dispatcher polls.

Ecosystem Tooling

Tool	Strategy	Notes
CCManager	Regex (configurable per tool)	Acknowledged buggy for Codex/Gemini
Tallr	Regex + hooks	Migrating to API interception
Agent Deck	Smart polling + transcript file	Claude-specific depth via transcript
Termwright	CSI 6n + screen stability	Best-in-class for TUI testing
Warp Agents 3.0	Shell hook (OSC 133)	Exact but shell-only
Zylos supervisor	Cursor position (unified)	Runtime-agnostic, no per-tool config

Conclusions

The industry is converging on cursor-position semantics as the correct abstraction for input-ready detection in multi-runtime AI agent platforms, for three reasons:

It answers the right question. The cursor position directly encodes "where will the next character I type appear?" — which is exactly what "ready for input" means at the terminal protocol level.
It is runtime-agnostic. Claude Code, Codex CLI, and Gemini CLI all speak the same ANSI/VT100 protocol. A single CSI 6n + quiet-window composite works across all three without per-tool configuration.
It composes cleanly with liveness monitoring. The same cursor-position query that detects idle state also powers stuck-agent detection, making it a single primitive that serves both the dispatch layer and the health monitoring layer.

The remaining gap is environment compatibility: tmux, ConPTY, and nested PTY scenarios require fallback paths. The right architecture maintains a tiered detection stack — cursor-position as primary, silence heuristic as secondary, regex as tertiary — and lets the environment determine which tier is active at runtime.

Sources: