Cloud-Hosted Agent Runtimes: The Architecture of Headless Autonomous Execution
Executive Summary
The AI agent landscape has undergone a structural shift in the first half of 2026: the dominant execution model is no longer a local CLI process tethered to a developer's terminal, but a cloud-hosted runtime where agents operate autonomously in managed sandboxes. Claude Managed Agents (launched April 2026), OpenAI Codex cloud, Google Jules, and GitHub Copilot cloud agent each represent distinct architectural answers to the same fundamental question -- how do you let an AI agent run for minutes or hours, securely, without a human watching?
This article examines the architectural patterns that have converged across these platforms: the decoupling of session state from execution, the sandbox isolation spectrum from containers to microVMs, the async task submission models, and the practical trade-offs that matter when choosing or building a cloud-hosted agent runtime. For the Zylos project -- which already operates a persistent agent with its own memory, scheduler, and multi-channel communication -- understanding these patterns is essential for evaluating whether to adopt managed runtimes or continue the self-hosted approach.
The Shift from Local to Cloud Agent Execution
Why Headless Matters
The first generation of AI coding agents (2024-2025) ran locally: Aider in a terminal, Cursor in an IDE, Claude Code on your laptop. This model has a fundamental limitation -- the agent's lifetime is bounded by the developer's session. Close the lid, lose the agent.
Headless execution decouples the agent from the developer's presence. The interaction model shifts from synchronous conversation to asynchronous task delegation:
- Submit -- the developer describes a task (via issue, chat, API call, or scheduled trigger)
- Execute -- the agent works autonomously in a cloud environment
- Deliver -- results appear as a pull request, a message, or a status update
- Review -- the developer evaluates the output on their own schedule
This is not merely a convenience improvement. It changes what agents can do. Long-running refactors, multi-file migrations, test suite generation, and dependency upgrades all become practical when the agent is not competing for your terminal.
The 2026 Landscape
By mid-2026, every major AI lab has shipped a cloud-hosted agent runtime:
| Platform | Provider | Model | Sandbox | Launch |
|---|---|---|---|---|
| Claude Managed Agents | Anthropic | Claude (Sonnet/Opus) | Managed cloud or self-hosted | April 2026 (beta) |
| Codex Cloud | OpenAI | GPT-5.5 / GPT-5.5 Pro | Isolated container, no internet during execution | 2025 (web), 2026 (CLI remote-control) |
| Jules | Google Labs | Gemini 3 Pro / 3.1 Pro | Google-managed VM | 2025 (beta), 2026 (GA) |
| Copilot Cloud Agent | GitHub / Microsoft | GPT-4.1 family | GitHub Actions runner | 2025 (preview), 2026 (GA + automations) |
| Devin | Cognition | Proprietary | Cognition cloud infra | 2024 (preview), 2026 (Teams) |
Architectural Anatomy: Brain, Harness, and Hands
Anthropic's engineering blog on Managed Agents introduced a clean decomposition that generalizes well to other cloud agent runtimes. The guiding principle is borrowed from operating systems: create stable abstractions that outlast implementation details.
The Three Components
┌─────────────────────────────────────────────┐
│ SESSION │
│ (append-only durable event log) │
│ - User messages, tool calls, results │
│ - Survives container crashes │
│ - Stored outside the sandbox │
└──────────────┬──────────────────────────────┘
│
┌──────────────▼──────────────────────────────┐
│ HARNESS (Brain) │
│ - Stateless agent loop │
│ - Calls Claude / GPT / Gemini │
│ - Routes tool calls to sandbox │
│ - Independently replaceable │
│ wake(sessionId) → resume from event log │
└──────────────┬──────────────────────────────┘
│
┌──────────────▼──────────────────────────────┐
│ SANDBOX (Hands) │
│ - Ephemeral execution environment │
│ - File system, shell, network (scoped) │
│ - Treated as untrusted │
│ - Replaceable "cattle, not pets" │
│ execute(name, input) → string │
└─────────────────────────────────────────────┘
Session is the source of truth. It is an append-only, durable event log stored completely outside the container. Every user message, tool call, and tool result is recorded here. When the harness crashes, restarts, or gets replaced, the session remains intact. This is what makes cloud agents resumable.
Harness is the "brain" -- the loop that calls the model, interprets tool-use requests, and dispatches them to the sandbox. Critically, the harness is stateless. It can be rebooted via wake(sessionId), which replays the relevant events from the session log into the model's context window. Anthropic reported that decoupling the harness from the sandbox improved time-to-first-token by approximately 60% at p50 and over 90% at p95.
Sandbox is the "hands" -- a container, VM, or microVM where the agent actually runs code, edits files, and executes commands. It is explicitly treated as untrusted. Credentials never live inside the sandbox; they are injected via resource-bundled auth (tokens consumed during initialization) or external vault proxies.
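To make the division of labor concrete, here is a minimal, in-memory sketch of the session/harness/sandbox split. The only interfaces taken from the text are `wake()` and `execute(name, input) → string`. `SessionLog`, `toy_model`, and `max_steps` are hypothetical names for illustration. A production log would be a durable store outside the sandbox, and the harness would call a real model API instead of a toy function.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionLog:
    """Append-only event log. In a real runtime it lives outside the sandbox."""
    _events: list = field(default_factory=list)

    def append(self, kind: str, **data) -> None:
        # Events are only ever appended, never edited or deleted.
        self._events.append({"type": kind, **data})

    def events(self) -> list:
        # Return copies so readers cannot rewrite history.
        return [dict(e) for e in self._events]


Model = Callable[[list], dict]       # replayed context -> next action
Execute = Callable[[str, str], str]  # the "hands": execute(name, input) -> string


def wake(log: SessionLog, model: Model, execute: Execute, max_steps: int = 10) -> None:
    """Stateless harness: every step re-reads the log, so any instance can resume."""
    for _ in range(max_steps):
        action = model(log.events())  # replay the session into the model's context
        if action["type"] == "final":
            log.append("assistant", content=action["content"])
            return
        log.append("tool_use", name=action["name"], input=action["input"])
        result = execute(action["name"], action["input"])
        log.append("tool_result", name=action["name"], content=result)


def toy_model(context: list) -> dict:
    # Hypothetical model: request one tool call, then answer once a result exists.
    if any(e["type"] == "tool_result" for e in context):
        return {"type": "final", "content": "done"}
    return {"type": "tool_use", "name": "bash", "input": "ls"}


if __name__ == "__main__":
    log = SessionLog()
    log.append("user", content="List the repository files")
    wake(log, toy_model, lambda name, inp: "README.md", max_steps=1)  # harness "crashes"
    wake(log, toy_model, lambda name, inp: "README.md")               # fresh harness resumes
    print([e["type"] for e in log.events()])
```

`wake()` keeps no state of its own. The second call therefore behaves exactly like a replacement harness picking up after a crash: it sees the recorded tool result and finishes the task.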
How Each Platform Maps to This Model
| Component | Claude Managed Agents | Codex Cloud | Jules | Copilot Cloud Agent |
|---|---|---|---|---|
| Session | Server-side event log, SSE streaming, fetchable history | Task record in ChatGPT/API | Task state in Jules dashboard | GitHub issue/PR thread |
| Harness | Anthropic-managed, configurable agent + environment | OpenAI-managed | Google-managed | GitHub Actions workflow |
| Sandbox | Cloud container or self-hosted sandbox | Isolated container (internet disabled) | Google Cloud VM | GitHub Actions runner |
| Result delivery | SSE events, file outputs | PR, chat message | PR on GitHub | PR on GitHub |
Session Management: The Durability Question
The most important architectural decision in a cloud agent runtime is how session state is managed. This determines whether an agent can survive crashes, handle long-running tasks, and maintain context across interactions.
Append-Only Event Logs
Claude Managed Agents uses an append-only event log as its session primitive. The API exposes this through Server-Sent Events (SSE):
import anthropic

client = anthropic.Anthropic()

# Create an agent (once)
agent = client.managed_agents.create(
    model="claude-sonnet-4-20250514",
    system="You are a senior software engineer.",
    tools=["bash", "file_read", "file_write", "web_search"],
)

# Create an environment
environment = client.managed_agents.environments.create(
    agent_id=agent.id,
    packages=["python3", "nodejs", "git"],
    network_access=["github.com", "pypi.org"],
)

# Start a session
session = client.managed_agents.sessions.create(
    agent_id=agent.id,
    environment_id=environment.id,
)

# Send a task and stream results
with client.managed_agents.sessions.events.stream(
    session_id=session.id,
    event={"type": "user", "content": "Refactor the auth module to use JWT tokens"},
) as stream:
    for event in stream:
        if event.type == "assistant":
            print(event.content)
        elif event.type == "tool_use":
            print(f"[Tool] {event.name}: {event.input}")
The event history is persisted server-side. You can fetch the complete event log at any time, send additional user events to steer the agent mid-execution, or interrupt it entirely.
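SSE itself is a simple text protocol. Each event is a block of `field: value` lines terminated by a blank line. The parser below is a minimal sketch of the core rules in the WHATWG specification: the `event`, `data`, and `id` fields, plus `:` comment lines used as keep-alives. It ignores `retry` and other details. `SSEEvent` and `parse_sse` are hypothetical names. Whether the Managed Agents endpoint honors the standard `Last-Event-ID` reconnection header is an assumption you should verify against its documentation.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str]


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Minimal Server-Sent Events parser covering the event, data, and id fields."""
    event_type, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event (if it carried any data).
            if data:
                yield SSEEvent(event_type, "\n".join(data), last_id)
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, typically a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if name == "event":
            event_type = value
        elif name == "data":
            data.append(value)
        elif name == "id":
            last_id = value  # persists across events until the server changes it


if __name__ == "__main__":
    stream = ["event: assistant\n", "id: 1\n", "data: Reading auth module...\n", "\n"]
    for ev in parse_sse(stream):
        print(ev.event, ev.id, ev.data)
```

Keeping the last `id` seen lets a client that drops its connection ask the server to resume from that point, instead of re-reading the whole stream, provided the server supports it.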
Task-Based Submission
Codex Cloud and Jules use a simpler task-based model: submit a task description, get back a result (typically a PR). There is no mid-flight steering -- you fire and forget.
# Codex CLI remote-control mode (headless)
codex remote-control --repo owner/repo \
--task "Add input validation to all API endpoints" \
--on-complete webhook:https://my-server.com/codex-done
# Jules via GitHub Action
# .github/workflows/jules.yml
on:
  issues:
    types: [opened, labeled]
jobs:
  jules:
    if: contains(github.event.issue.labels.*.name, 'jules')
    runs-on: ubuntu-latest
    steps:
      - uses: google-labs-code/jules-action@v1
        with:
          model: gemini-3-pro
          task: ${{ github.event.issue.body }}
Copilot's Event-Driven Model
GitHub Copilot cloud agent introduced "automations" in June 2026 -- scheduled or event-triggered agent runs:
# Copilot cloud agent automation
on:
  schedule:
    - cron: '0 9 * * 1'  # Every Monday at 9am
  issues:
    types: [opened]
task: |
  Review the opened issue and if it's a bug report,
  attempt to reproduce it and propose a fix as a PR.
This is particularly interesting because it makes the agent a first-class CI/CD participant, not just a coding tool.
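The schedule line uses standard five-field cron syntax: minute, hour, day of month, month, and day of week, with 0 meaning Sunday. Below is a minimal matcher sketch that can help when testing automation schedules locally. It assumes the runtime interprets cron the way Vixie cron does. It supports `*`, numbers, ranges, and comma lists, but not step values (`*/15`), names (`MON`), or `7` for Sunday. If Copilot automations follow GitHub Actions, which evaluates `schedule` triggers in UTC, pass UTC datetimes. `cron_matches` is a hypothetical helper name.

```python
from datetime import datetime


def _field_matches(spec: str, value: int) -> bool:
    """Match one cron field: '*', 'n', 'a-b', or a comma list of those."""
    for part in spec.split(","):
        if part == "*":
            return True
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo <= value <= hi:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """True if a five-field cron expression fires at the given minute."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    if not (_field_matches(minute, when.minute)
            and _field_matches(hour, when.hour)
            and _field_matches(month, when.month)):
        return False
    # Vixie cron rule: if both day fields are restricted, matching either one suffices.
    if dom != "*" and dow != "*":
        return _field_matches(dom, when.day) or _field_matches(dow, cron_dow)
    return _field_matches(dom, when.day) and _field_matches(dow, cron_dow)


if __name__ == "__main__":
    print(cron_matches("0 9 * * 1", datetime(2026, 1, 5, 9, 0)))  # 2026-01-05 is a Monday
```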
Sandbox Isolation: The Security Spectrum
Running untrusted code generated by an AI model is fundamentally a security problem. The 2026 landscape shows a clear spectrum of isolation approaches, each with different trade-offs.
The Isolation Hierarchy
Strongest ←──────────────────────────────→ Weakest
microVM        gVisor          Container      Process
(Firecracker)  (syscall        (Docker/OCI)   (chroot)
               interception)
MicroVMs (Firecracker) provide the strongest isolation with a dedicated kernel per workload. Firecracker boots in approximately 125ms with less than 5 MiB overhead per VM and supports up to 150 microVMs per second per host. E2B uses this approach -- every sandbox gets its own microVM with hardware-level isolation.
gVisor intercepts syscalls in user space without requiring a full VM. It is a middle ground: stronger than containers, lighter than VMs. Google's internal infrastructure uses gVisor extensively.
Containers (Docker/OCI) share the host kernel. Fast startup, but the shared kernel surface area is a liability when an agent can write arbitrary code, install packages, and manipulate file descriptors. Major cloud providers have been migrating control planes away from runc toward hardware-enforced isolation.
Nested isolation is emerging as the production best practice: containers inside VMs, where each layer trusts the layer below it and nothing else.
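Where a full microVM is not available, a container can at least be hardened and moved onto gVisor. The helper below is a sketch that only builds a `docker run` argument list; `hardened_docker_argv` is a hypothetical name. The flags are standard Docker options. `--runtime=runsc` assumes gVisor is installed and registered with the Docker daemon. The resource limits are placeholder values. Even with all of this, the result is still weaker than a dedicated kernel per workload.

```python
def hardened_docker_argv(image: str, command: list, use_gvisor: bool = True,
                         memory: str = "2g", cpus: str = "2", pids_limit: int = 256) -> list:
    """Build a `docker run` command line for an untrusted agent workload."""
    argv = [
        "docker", "run", "--rm",
        "--network=none",                    # no network unless explicitly granted
        "--read-only", "--tmpfs", "/tmp",    # immutable root fs; scratch space in /tmp
        "--cap-drop=ALL",                    # drop every Linux capability
        "--security-opt=no-new-privileges",  # block privilege gain via setuid binaries
        f"--pids-limit={pids_limit}",        # contain fork bombs
        f"--memory={memory}",
        f"--cpus={cpus}",
    ]
    if use_gvisor:
        argv.append("--runtime=runsc")       # gVisor: syscalls handled by a user-space kernel
    return argv + [image, *command]


if __name__ == "__main__":
    print(" ".join(hardened_docker_argv("python:3.12-slim", ["python", "-c", "print(1)"])))
```

To launch the container, pass the result to `subprocess.run(argv, check=True)`. A real agent sandbox would also mount a writable workspace, for example with `-v`. Starting from `--network=none` and granting access deliberately keeps the default posture deny-by-default.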
Platform Isolation Choices
| Platform | Isolation Model | Network During Execution | Persistence |
|---|---|---|---|
| Claude Managed Agents | Cloud container (configurable) | Scoped (allowlist) | Session lifetime |
| Codex Cloud | Isolated container | Disabled | Task lifetime |
| Jules | Google Cloud VM | Available | Task lifetime |
| Copilot Cloud Agent | GitHub Actions runner | Available | Workflow lifetime |
| E2B | Firecracker microVM | Configurable | 1h (Hobby) / 24h (Pro) |
| Daytona | Docker container | Configurable | Persistent until deleted |
| Modal | Container (GPU-capable) | Configurable | Configurable |
Codex's choice to disable internet during execution is the most aggressive security posture. It means the agent cannot exfiltrate data, but it also means all dependencies must be pre-installed or bundled with the repo. This is a deliberate trade-off: security over flexibility.
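An allowlist such as the `network_access=["github.com", "pypi.org"]` setting in the earlier Managed Agents example ultimately reduces to a host check at an egress proxy. The sketch below assumes that subdomains of an allowlisted domain are permitted; the real platforms' matching rules may differ. `egress_allowed` is a hypothetical name.

```python
from urllib.parse import urlsplit


def egress_allowed(url: str, allowlist: list) -> bool:
    """Permit a request only if its host is an allowlisted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is already lowercased
    for domain in allowlist:
        domain = domain.lower().rstrip(".")
        # Compare on label boundaries so "notgithub.com" does not match "github.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False


if __name__ == "__main__":
    print(egress_allowed("https://api.github.com/repos", ["github.com", "pypi.org"]))
```

One practical trap: pip fetches the package index from `pypi.org` but downloads the package files themselves from `files.pythonhosted.org`. An allowlist containing only `pypi.org` would therefore still break installs under this check.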
Credential Isolation
A critical pattern across all platforms is keeping credentials out of the sandbox. Two approaches dominate:
- Resource-bundled auth: Repository access tokens are consumed during initialization (e.g., git clone). The token is used once and not persisted in the sandbox filesystem.
- External vault with proxy: OAuth tokens and API keys are stored in a secure vault outside the sandbox. An MCP proxy or sidecar process fetches credentials on behalf of the agent when needed, without exposing them to the sandbox environment.
┌─────────────┐ ┌──────────────┐ ┌──────────────┐
│ Sandbox │────▶│ MCP Proxy │────▶│ Vault/KMS │
│ (untrusted) │ │ (trusted) │ │ (secrets) │
│ │◀────│ │◀────│ │
│ No secrets │ │ Fetches on │ │ Stores all │
│ in env │ │ demand │ │ credentials │
└─────────────┘ └──────────────┘ └──────────────┘
This limits what a prompt injection attack can reach: even if the agent is tricked into running malicious code, there are no secrets in the sandbox to read or exfiltrate.
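The proxy's job can be sketched in a few lines. This is a generic request-forwarding proxy rather than an MCP server. The vault is a plain dict standing in for a real secrets manager, and `CredentialProxy` and the `send` callable are hypothetical names. The key property is that the token is attached on the trusted side, and only the response travels back to the sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable

Send = Callable[[str, str, dict], str]  # (host, path, headers) -> response body


@dataclass
class CredentialProxy:
    vault: dict  # host -> token; stand-in for a real vault/KMS client
    send: Send   # the proxy's own HTTP client, running outside the sandbox
    audit: list = field(default_factory=list)

    def forward(self, host: str, path: str, headers: dict) -> str:
        """Handle a credential-free request from the sandbox."""
        if any(k.lower() == "authorization" for k in headers):
            raise ValueError("sandbox must not supply credentials")
        token = self.vault.get(host)
        if token is None:
            raise PermissionError(f"no credential scoped to {host}")
        self.audit.append((host, path))  # every use is recorded outside the sandbox
        outbound = {**headers, "Authorization": f"Bearer {token}"}
        return self.send(host, path, outbound)  # only the response returns to the sandbox
```

The proxy can still act on the agent's behalf. Scoping each credential to specific hosts, and logging every use as `audit` does here, therefore matters as much as keeping the token itself out of the sandbox.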
Async Interaction Patterns
Cloud agent runtimes must solve the interaction problem: how does a developer submit work, monitor progress, and receive results?
Pattern 1: Streaming Events (Claude Managed Agents)
The richest interaction model. The client opens an SSE connection and receives a real-time stream of events -- assistant messages, tool calls, tool results, and status updates. The client can inject new user events at any point to steer the agent.
Client Managed Agent
│ │
│─── user event ──▶│
│ │── think ──▶
│◀── assistant ────│
│◀── tool_use ─────│
│ │── execute ──▶
│◀── tool_result ──│
│ │── think ──▶
│◀── assistant ────│
│─── user event ──▶│ (mid-flight steering)
│ │── adjust ──▶
│◀── assistant ────│
│◀── session_end ──│
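On the harness side, mid-flight steering amounts to checking for new user events between tool calls. Below is a minimal sketch that uses a thread-safe queue as the inbox. The `interrupt` sentinel, the tuple-based transcript, and the `run_with_steering` name are assumptions for illustration. A real harness would feed the steering text into the next model call rather than just recording it.

```python
import queue
from typing import Callable


def run_with_steering(plan: list, inbox: queue.Queue,
                      execute: Callable[[str], str]) -> list:
    """Run planned tool calls, folding in user events that arrive mid-flight."""
    transcript = []
    for step in plan:
        # Drain steering events that arrived since the last tool call, without blocking.
        while True:
            try:
                msg = inbox.get_nowait()
            except queue.Empty:
                break
            transcript.append(("user", msg))
            if msg == "interrupt":
                transcript.append(("session_end", "interrupted"))
                return transcript
        transcript.append(("tool_use", step))
        transcript.append(("tool_result", execute(step)))
    transcript.append(("session_end", "complete"))
    return transcript
```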
Pattern 2: Fire-and-Forget with PR Delivery (Jules, Copilot)
The simplest model. Submit a task, receive a PR when done. No mid-flight interaction.
Developer Cloud Agent GitHub
│ │ │
│── assign issue ─▶│ │
│ │── clone repo ──▶│
│ │── work... ─────│
│ │── push branch ─▶│
│ │── create PR ───▶│
│◀── PR notification ──────────────│
Pattern 3: Webhook Callbacks (Codex Remote-Control)
A hybrid approach where the client submits a task and registers a webhook URL. The agent calls back when done.
Orchestrator Codex Cloud Webhook Endpoint
│ │ │
│── submit task ─▶│ │
│◀── task_id ─────│ │
│ │── execute... ──────│
│ │── POST result ────▶│
│◀── notification ────────────────────│
This pattern is particularly useful for CI/CD integration where a pipeline step triggers an agent and waits for completion before proceeding.
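On the orchestrator side, the callback can be turned into a blocking wait for the pipeline step. Below is a sketch using only the standard library. The payload fields (`task_id`, `status`) and the `CompletionRegistry` name are hypothetical, and the HTTP server that would call `on_webhook` is omitted.

```python
import threading
from typing import Optional


class CompletionRegistry:
    """Lets a pipeline step block until the agent's webhook callback arrives."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: dict = {}
        self._results: dict = {}

    def _event(self, task_id: str) -> threading.Event:
        # The same Event is shared whether the webhook or the waiter arrives first.
        with self._lock:
            return self._events.setdefault(task_id, threading.Event())

    def on_webhook(self, payload: dict) -> None:
        """Called by the HTTP handler for each POST from the agent runtime."""
        task_id = payload["task_id"]
        self._results[task_id] = payload  # store before signalling the waiter
        self._event(task_id).set()

    def wait(self, task_id: str, timeout: float) -> Optional[dict]:
        if self._event(task_id).wait(timeout):
            return self._results[task_id]
        return None  # timed out: poll the task status or fail the pipeline step
```

Setting an event twice is harmless, so retried webhook deliveries do not break the wait. A production endpoint should also authenticate callbacks before trusting them, and fall back to polling the task status if the timeout expires.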
Pattern 4: Scheduled Automations (Copilot, Claude Routines)
Agents that run on a schedule without any human trigger:
- Claude Code Routines: Managed by Anthropic, run on their cloud infrastructure on a cron schedule. Can spin up sub-agents using split-and-merge patterns.
- Copilot Automations: Triggered by cron schedules or GitHub events (issue opened, PR merged, etc.).
- Zylos Scheduler (C5): Self-hosted equivalent -- dispatches tasks to the agent at scheduled times, enabling the same autonomous operation pattern.
The Self-Hosted Alternative
Not every agent needs to run in someone else's cloud. The Zylos architecture represents the self-hosted end of the spectrum, with some unique advantages:
Managed vs. Self-Hosted Comparison
| Aspect | Managed (Claude MA, Codex Cloud) | Self-Hosted (Zylos) |
|---|---|---|
| Setup | API key + config | Full server setup |
| Sandbox control | Platform-defined | Complete control |
| Persistence | Session-scoped | Unlimited (disk) |
| Memory | Context window only | Persistent memory system |
| Communication | API/webhook | Multi-channel (Telegram, Lark, web) |
| Scheduling | Platform-provided | Custom scheduler (C5) |
| Cost model | Per-token + sandbox time | Fixed infra + per-token |
| Data residency | Provider's cloud | Your infrastructure |
| Customization | Config + system prompt | Full code control |
Claude Managed Agents now supports self-hosted sandboxes, creating a hybrid option: Anthropic runs the harness (brain), but the sandbox (hands) runs on your infrastructure. This addresses data residency concerns while still leveraging Anthropic's optimized agent loop.
When Self-Hosted Wins
Self-hosting remains the better choice when:
- Persistent state is essential: The agent needs a durable filesystem, databases, and long-lived processes (e.g., PM2 services)
- Multi-channel communication: The agent operates across Telegram, Lark, web console, and other channels simultaneously
- Custom memory architecture: Tiered memory (identity, state, references, sessions, archive) that persists across all interactions
- Full autonomy: The agent schedules its own tasks, monitors its own health, and manages its own lifecycle
- Cost predictability: Fixed infrastructure costs are preferable to usage-based pricing for always-on agents
Performance Characteristics
Cold Start and Latency
Cold start performance varies significantly across sandbox providers:
| Provider | Cold Start | Warm Start | Isolation |
|---|---|---|---|
| Daytona | ~27-90ms | <10ms | Container |
| E2B | ~150ms | <50ms | microVM |
| Modal | ~200-500ms | <100ms | Container (GPU-capable) |
| Firecracker (raw) | ~125ms | <50ms | microVM |
| GitHub Actions | 15-45s | N/A (always cold) | Container |
For Claude Managed Agents, Anthropic reported that decoupling the harness from the sandbox eliminated the need to wait for container provisioning before the first model call. The harness starts immediately, begins the model call, and the sandbox provisions in parallel.
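The overlap can be illustrated with asyncio. In this sketch, both coroutines are stand-ins with made-up delays: 0.2 s to provision and 0.05 s to first token. The function names are hypothetical. The sketch shows the scheduling idea, not Anthropic's implementation.

```python
import asyncio


async def provision_sandbox(log: list) -> str:
    log.append("provision:start")
    await asyncio.sleep(0.2)   # stand-in for container/VM boot
    log.append("provision:done")
    return "sandbox-1"


async def first_model_call(log: list) -> str:
    log.append("model:start")
    await asyncio.sleep(0.05)  # stand-in for time-to-first-token
    log.append("model:first_token")
    return "tool_use: bash ls"


async def start_session(log: list) -> tuple:
    # Start provisioning in the background instead of awaiting it up front.
    sandbox_task = asyncio.create_task(provision_sandbox(log))
    action = await first_model_call(log)
    # Block on the sandbox only once the model actually asks for a tool.
    sandbox = await sandbox_task
    return action, sandbox


if __name__ == "__main__":
    events = []
    print(asyncio.run(start_session(events)), events)
```

With this ordering, time-to-first-token is bounded by the model call alone, rather than by provisioning plus the model call.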
Cost Considerations
Cloud agent execution has two cost components: model tokens and sandbox compute time.
- E2B / Daytona: ~$0.05/vCPU-hour
- Modal: ~$0.06/vCPU-hour (CPU), $2.78/GPU-hour (A10G)
- Claude Managed Agents: Token pricing + sandbox time (beta pricing TBD)
- Codex Cloud: Included in ChatGPT subscription tiers (Plus $20, Pro $200)
- Jules: Included in Gemini subscription
- Copilot Cloud Agent: Included in GitHub Copilot subscription + Actions minutes
For high-volume use cases, the per-session sandbox costs can exceed the model token costs. A 30-minute agent session on E2B with 4 vCPUs costs approximately $0.10 in compute alone, plus model tokens.
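The arithmetic behind that estimate is simple enough to keep in a helper. In this sketch, the $0.05/vCPU-hour rate comes from the list above. Token prices are parameters you supply, and no specific model pricing is assumed. The function names are hypothetical.

```python
def sandbox_cost(vcpus: float, minutes: float, usd_per_vcpu_hour: float) -> float:
    """Compute cost of one sandbox session in USD."""
    return vcpus * (minutes / 60) * usd_per_vcpu_hour


def session_cost(vcpus: float, minutes: float, usd_per_vcpu_hour: float,
                 input_tokens: int, output_tokens: int,
                 usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Total session cost: sandbox compute plus model tokens (prices per million tokens)."""
    token_cost = (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000
    return sandbox_cost(vcpus, minutes, usd_per_vcpu_hour) + token_cost


if __name__ == "__main__":
    print(f"${sandbox_cost(vcpus=4, minutes=30, usd_per_vcpu_hour=0.05):.2f}")
```

For the example in the text, 4 vCPUs × 0.5 h × $0.05 gives $0.10. Compute cost grows linearly with both session length and vCPU count, so long-running or oversized sandboxes are where it becomes a meaningful share of the bill.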
Emerging Patterns and Best Practices
1. Multi-Brain, Multi-Hand Orchestration
The decoupled architecture enables powerful scaling patterns:
┌─────────┐
│ Session │
│ (log) │
└────┬─────┘
│
┌──────────┼──────────┐
│ │ │
┌────▼───┐ ┌───▼────┐ ┌──▼─────┐
│Brain 1 │ │Brain 2 │ │Brain 3 │
│(plan) │ │(code) │ │(review)│
└───┬────┘ └───┬────┘ └───┬────┘
│ │ │
┌───▼────┐ ┌───▼────┐ ┌──▼─────┐
│Hand A │ │Hand B │ │Hand C │
│(sandbox│ │(sandbox│ │(sandbox│
│ 1) │ │ 2) │ │ 3) │
└────────┘ └────────┘ └────────┘
Claude Managed Agents supports this natively -- multiple harness instances can read from the same session, and each can dispatch work to different sandboxes. Claude Code Routines use a split-and-merge pattern where work is divided across parallel sub-agents.
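The split-and-merge idea can be sketched with a thread pool. How Routines actually partition work is not described here, so this sketch assumes a simple case. Files are sharded round-robin into disjoint groups, each shard goes to a hypothetical `sub_agent` callable, and the per-file results are merged.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_and_merge(files: list, n_workers: int,
                    sub_agent: Callable[[list], dict]) -> dict:
    """Split work into disjoint shards, run sub-agents in parallel, merge the results."""
    shards = [s for s in (files[i::n_workers] for i in range(n_workers)) if s]
    if not shards:
        return {}
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(sub_agent, shards))
    merged = {}
    for partial in results:
        merged.update(partial)  # safe because shards never share a file
    return merged
```

Disjoint shards are what make the merge trivial. If sub-agents could edit the same file, the merge step would need real conflict resolution.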
2. Defense-in-Depth Security
Production deployments are converging on layered security:
- Layer 1: Sandbox isolation (microVM or nested container-in-VM)
- Layer 2: Network scoping (allowlist, no internet during execution for sensitive tasks)
- Layer 3: Credential isolation (vault proxy, no secrets in sandbox)
- Layer 4: Output validation (LLM-as-judge or rule-based checks on generated code)
- Layer 5: Human approval gates (PR review before merge)
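Layer 4 can start with cheap deterministic checks before any LLM-as-judge pass. The minimal sketch below scans only the added lines of a unified diff. The three rules are illustrative examples, not a complete rule set, and `check_diff` and `RULES` are hypothetical names. A production pipeline would more likely run a dedicated secret scanner.

```python
import re

RULES = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "pipe_to_shell": re.compile(r"\b(curl|wget)\b[^\n|]*\|\s*(sh|bash)\b"),
}


def check_diff(diff: str) -> list:
    """Return (line number, rule name) for each rule hit on an added line."""
    findings = []
    for lineno, line in enumerate(diff.splitlines(), start=1):
        # Only inspect lines the agent added, not context, removals, or file headers.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```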
3. Idempotent Task Design
Cloud agents may be interrupted and restarted. Tasks should be designed for idempotency:
- Use git branches as the unit of work (can be force-pushed on restart)
- Check for existing PRs before creating new ones
- Use git stash or workspace snapshots for resumability
- Design database migrations to be re-runnable
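The first two rules combine naturally. Derive the branch name from the task ID, then look up an open PR for that branch before creating one. This sketch runs against a hypothetical in-memory `FakeForge`, and the `agent/<task_id>` branch convention is also an assumption. With the GitHub CLI, the lookup step corresponds to `gh pr list --head <branch>`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeForge:
    """Hypothetical in-memory stand-in for a code host's pull-request API."""
    open_prs: dict = field(default_factory=dict)  # head branch -> PR number
    titles: dict = field(default_factory=dict)    # PR number -> title
    next_number: int = 1

    def find_open_pr(self, head: str) -> Optional[int]:
        return self.open_prs.get(head)

    def create_pr(self, head: str, title: str) -> int:
        number = self.next_number
        self.next_number += 1
        self.open_prs[head] = number
        self.titles[number] = title
        return number


def ensure_pr(forge: FakeForge, task_id: str, title: str) -> int:
    """Create the task's PR at most once, no matter how often the task restarts."""
    head = f"agent/{task_id}"  # deterministic branch name derived from the task
    existing = forge.find_open_pr(head)
    if existing is not None:
        return existing
    return forge.create_pr(head, title)
```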
4. Observability and Cost Attribution
As agents become autonomous background workers, observability becomes critical:
- Token attribution: Track which task consumed which tokens
- Session replay: Ability to replay the full event log for debugging
- Cost alerts: Budget limits per session, per task type, per schedule
- Failure detection: Liveness checks with automatic restart (similar to Zylos C2 activity monitor)
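Token attribution and per-task budgets can share one ledger. In this sketch, budgets are keyed by task type and spend is keyed by task ID. The class name, the budget figure, and the per-million-token price are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageLedger:
    budgets_usd: dict  # task type -> max spend per task run, in USD
    spent: defaultdict = field(default_factory=lambda: defaultdict(float))  # task id -> USD

    def record(self, task_id: str, task_type: str, tokens: int, usd_per_mtok: float) -> bool:
        """Attribute token spend to a task; return False once that task is over budget."""
        self.spent[task_id] += tokens * usd_per_mtok / 1_000_000
        return self.spent[task_id] <= self.budgets_usd[task_type]


if __name__ == "__main__":
    ledger = UsageLedger(budgets_usd={"nightly-deps": 1.00})
    if not ledger.record("run-17", "nightly-deps", tokens=450_000, usd_per_mtok=3.0):
        print("budget exceeded for run-17")
```

Returning a boolean lets the harness decide what a breach means (pause, alert, or terminate the session) instead of hard-coding that policy into the ledger.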
Implications for Zylos
The cloud-hosted runtime landscape validates several Zylos architectural decisions:
- Session as event log: Zylos's memory system (identity + state + sessions) is functionally equivalent to Managed Agents' append-only session log, but richer -- it includes structured memory tiers, not just raw events.
- Harness-sandbox separation: Zylos already separates the agent (Claude Code runtime) from its execution environment, and can switch runtimes (Claude Code / Codex) without losing state.
- Scheduled autonomy: The C5 scheduler enables the same autonomous operation patterns as Copilot Automations and Claude Routines, but with full control over task definitions and scheduling logic.
- Multi-channel delivery: While managed runtimes deliver results primarily as PRs, Zylos delivers results through whatever channel the user prefers -- Telegram, Lark, web console, or direct terminal interaction.
The key question for Zylos going forward is whether to adopt Claude Managed Agents' self-hosted sandbox mode for specific workloads (like code generation tasks that benefit from ephemeral, isolated execution) while keeping the persistent agent architecture for everything else.
Conclusion
Cloud-hosted agent runtimes represent the maturation of AI agents from interactive tools to autonomous workers. The architectural convergence around session-harness-sandbox separation, append-only event logs, and defense-in-depth security creates a solid foundation for production deployment.
The landscape is not winner-take-all. Managed runtimes excel at well-scoped, ephemeral tasks (code generation, test writing, refactoring). Self-hosted runtimes excel at persistent, multi-faceted agent operation (always-on assistants, multi-channel communication, custom workflows). The hybrid model -- managed brain, self-hosted hands -- may be the best of both worlds.
For builders: the key architectural takeaway is that the session log is the most important component. If you get durable session management right, everything else -- harness restarts, sandbox replacement, multi-agent orchestration -- becomes a matter of configuration rather than architecture.

