Cloud-Hosted Agent Runtimes: The Architecture of Headless Autonomous Execution

Executive Summary

The AI agent landscape has undergone a structural shift in the first half of 2026: the dominant execution model is no longer a local CLI process tethered to a developer's terminal, but a cloud-hosted runtime where agents operate autonomously in managed sandboxes. Claude Managed Agents (launched April 2026), OpenAI Codex cloud, Google Jules, and GitHub Copilot cloud agent each represent distinct architectural answers to the same fundamental question -- how do you let an AI agent run for minutes or hours, securely, without a human watching?

This article examines the architectural patterns that have converged across these platforms: the decoupling of session state from execution, the sandbox isolation spectrum from containers to microVMs, the async task submission models, and the practical trade-offs that matter when choosing or building a cloud-hosted agent runtime. For the Zylos project -- which already operates a persistent agent with its own memory, scheduler, and multi-channel communication -- understanding these patterns is essential for evaluating whether to adopt managed runtimes or continue the self-hosted approach.

The Shift from Local to Cloud Agent Execution

Why Headless Matters

The first generation of AI coding agents (2024-2025) ran locally: Aider in a terminal, Cursor in an IDE, Claude Code on your laptop. This model has a fundamental limitation -- the agent's lifetime is bounded by the developer's session. Close the lid, lose the agent.

Headless execution decouples the agent from the developer's presence. The interaction model shifts from synchronous conversation to asynchronous task delegation:

Submit -- the developer describes a task (via issue, chat, API call, or scheduled trigger)
Execute -- the agent works autonomously in a cloud environment
Deliver -- results appear as a pull request, a message, or a status update
Review -- the developer evaluates the output on their own schedule
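The four steps above amount to a small state machine in which the developer is involved only at the first and last states. A minimal sketch, assuming hypothetical names (`TaskState`, `Task`, `TRANSITIONS`) rather than any platform's actual API:

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    EXECUTING = "executing"
    DELIVERED = "delivered"
    REVIEWED = "reviewed"


# Allowed transitions; the developer acts only at SUBMITTED and REVIEWED.
TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.EXECUTING},
    TaskState.EXECUTING: {TaskState.DELIVERED},
    TaskState.DELIVERED: {TaskState.REVIEWED},
    TaskState.REVIEWED: set(),
}


@dataclass
class Task:
    """Hypothetical task record for an asynchronously delegated job."""
    description: str
    state: TaskState = TaskState.SUBMITTED
    history: list = field(default_factory=list)

    def advance(self, new_state: TaskState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.history.append((self.state, new_state))
        self.state = new_state


task = Task("Upgrade all dependencies and fix broken tests")
task.advance(TaskState.EXECUTING)  # agent picks the task up in the cloud
task.advance(TaskState.DELIVERED)  # e.g. a pull request is opened
task.advance(TaskState.REVIEWED)   # developer reviews on their own schedule
print(task.state.value)
```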

This is not merely a convenience improvement. It changes what agents can do. Long-running refactors, multi-file migrations, test suite generation, and dependency upgrades all become practical when the agent is not competing for your terminal.

The 2026 Landscape

By mid-2026, every major AI lab, along with GitHub and Cognition, has shipped a cloud-hosted agent runtime:

Platform	Provider	Model	Sandbox	Launch
Claude Managed Agents	Anthropic	Claude (Sonnet/Opus)	Managed cloud or self-hosted	April 2026 (beta)
Codex Cloud	OpenAI	GPT-5.5 / GPT-5.5 Pro	Isolated container, no internet during execution	2025 (web), 2026 (CLI remote-control)
Jules	Google Labs	Gemini 3 Pro / 3.1 Pro	Google-managed VM	2025 (beta), 2026 (GA)
Copilot Cloud Agent	GitHub / Microsoft	GPT-4.1 family	GitHub Actions runner	2025 (preview), 2026 (GA + automations)
Devin	Cognition	Proprietary	Cognition cloud infra	2024 (preview), 2026 (Teams)

Architectural Anatomy: Session, Brain, and Hands

Anthropic's engineering blog on Managed Agents introduced a clean decomposition that maps well onto the other cloud agent runtimes, too. The design principle is borrowed from operating systems: define stable abstractions that outlast their implementation details.

The Three Components

┌─────────────────────────────────────────────┐
│                  SESSION                    │
│  (append-only durable event log)            │
│  - User messages, tool calls, results       │
│  - Survives container crashes               │
│  - Stored outside the sandbox               │
└──────────────┬──────────────────────────────┘
               │
┌──────────────▼──────────────────────────────┐
│                  HARNESS (Brain)            │
│  - Stateless agent loop                     │
│  - Calls Claude / GPT / Gemini              │
│  - Routes tool calls to sandbox             │
│  - Independently replaceable                │
│  wake(sessionId) → resume from event log    │
└──────────────┬──────────────────────────────┘
               │
┌──────────────▼──────────────────────────────┐
│                  SANDBOX (Hands)            │
│  - Ephemeral execution environment          │
│  - File system, shell, network (scoped)     │
│  - Treated as untrusted                     │
│  - Replaceable "cattle, not pets"           │
│  execute(name, input) → string              │
└─────────────────────────────────────────────┘

Session is the source of truth. It is an append-only, durable event log stored completely outside the container. Every user message, tool call, and tool result is recorded here. When the harness crashes, restarts, or gets replaced, the session remains intact. This is what makes cloud agents resumable.

Harness is the "brain" -- the loop that calls the model, interprets tool-use requests, and dispatches them to the sandbox. Critically, the harness is stateless. It can be rebooted via wake(sessionId), which replays the relevant events from the session log into the model's context window. Anthropic reported that decoupling the harness from the sandbox improved time-to-first-token by approximately 60% at p50 and over 90% at p95.

Sandbox is the "hands" -- a container, VM, or microVM where the agent actually runs code, edits files, and executes commands. It is explicitly treated as untrusted. Credentials never live inside the sandbox; they are injected via resource-bundled auth (tokens consumed during initialization) or external vault proxies.
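To make the separation concrete, here is a minimal, self-contained sketch of the three components under simplifying assumptions: the session is a JSON Lines file that is flushed and fsynced on every append, the "model" is a stub function, and the sandbox exposes only an `execute(name, input) -> str` interface like the one in the diagram. All names (`SessionLog`, `Sandbox`, `Harness`, `fake_model`) are hypothetical; this is not Anthropic's implementation.

```python
import json
import os
import tempfile


class SessionLog:
    """Append-only, durable event log kept outside the sandbox (JSON Lines on disk)."""

    def __init__(self, path: str):
        self.path = path

    def append(self, event: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the event durable before continuing

    def read(self) -> list:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


class Sandbox:
    """Untrusted, replaceable 'hands' exposing only execute(name, input) -> str."""

    def execute(self, name: str, tool_input: dict) -> str:
        if name == "echo":
            return tool_input["text"]
        return f"error: unknown tool {name}"


def fake_model(events: list) -> dict:
    """Stub for the model call: request one tool call, then finish."""
    if not any(e["type"] == "tool_result" for e in events):
        return {"type": "tool_use", "name": "echo", "input": {"text": "hello"}}
    return {"type": "assistant", "content": "done"}


class Harness:
    """Stateless 'brain': every decision is derived from the session log."""

    def __init__(self, log: SessionLog, sandbox: Sandbox):
        self.log = log
        self.sandbox = sandbox

    def wake(self) -> None:
        while True:
            events = self.log.read()  # replay the log to rebuild context
            if events and events[-1]["type"] == "assistant":
                return  # the agent has finished
            step = fake_model(events)
            self.log.append(step)
            if step["type"] == "tool_use":
                result = self.sandbox.execute(step["name"], step["input"])
                self.log.append({"type": "tool_result", "content": result})


log = SessionLog(os.path.join(tempfile.mkdtemp(), "session.jsonl"))
log.append({"type": "user", "content": "say hello"})
Harness(log, Sandbox()).wake()  # any fresh harness instance could do this
print([e["type"] for e in log.read()])
```

Because the harness holds no state of its own, recovering from a crash amounts to constructing a new `Harness` and calling `wake()` again. One caveat this sketch does not handle: if the harness dies after logging a `tool_use` but before its `tool_result`, the replay issues the call again, so tools should be idempotent or results deduplicated.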

How Each Platform Maps to This Model

Component	Claude Managed Agents	Codex Cloud	Jules	Copilot Cloud Agent
Session	Server-side event log, SSE streaming, fetchable history	Task record in ChatGPT/API	Task state in Jules dashboard	GitHub issue/PR thread
Harness	Anthropic-managed, configurable agent + environment	OpenAI-managed	Google-managed	GitHub Actions workflow
Sandbox	Cloud container or self-hosted sandbox	Isolated container (internet disabled)	Google Cloud VM	GitHub Actions runner
Result delivery	SSE events, file outputs	PR, chat message	PR on GitHub	PR on GitHub

Session Management: The Durability Question

The most important architectural decision in a cloud agent runtime is how session state is managed. This determines whether an agent can survive crashes, handle long-running tasks, and maintain context across interactions.

Append-Only Event Logs

Claude Managed Agents uses an append-only event log as its session primitive. The API exposes this through Server-Sent Events (SSE):

import anthropic

client = anthropic.Anthropic()

# Create an agent (once)
agent = client.managed_agents.create(
    model="claude-sonnet-4-20250514",
    system="You are a senior software engineer.",
    tools=["bash", "file_read", "file_write", "web_search"],
)

# Create an environment
environment = client.managed_agents.environments.create(
    agent_id=agent.id,
    packages=["python3", "nodejs", "git"],
    network_access=["github.com", "pypi.org"],
)

# Start a session
session = client.managed_agents.sessions.create(
    agent_id=agent.id,
    environment_id=environment.id,
)

# Send a task and stream results
with client.managed_agents.sessions.events.stream(
    session_id=session.id,
    event={"type": "user", "content": "Refactor the auth module to use JWT tokens"}
) as stream:
    for event in stream:
        if event.type == "assistant":
            print(event.content)
        elif event.type == "tool_use":
            print(f"[Tool] {event.name}: {event.input}")

The event history is persisted server-side. You can fetch the complete event log at any time, send additional user events to steer the agent mid-execution, or interrupt it entirely.
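SSE is a plain-text protocol, so resumability is easy to see at the wire level. Below is a minimal sketch of a parser for the `text/event-stream` format. It is simplified (it ignores the `retry` field and assumes every `data` payload is JSON) and tracks the last event `id`, which a reconnecting client can send back in a `Last-Event-ID` header. Whether Managed Agents resumes streams this way is an assumption, and the sample events are made up.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per dispatched event from a text/event-stream (simplified)."""
    event_type, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line dispatches the buffered event
            if data:
                yield {"event": event_type, "id": last_id, "data": json.loads("\n".join(data))}
            event_type, data = "message", []
            continue
        if line.startswith(":"):  # comment line, often used as a keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips a single leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value  # persists across events until a new id arrives


stream = [
    "event: assistant\n", 'data: {"content": "Reading auth module"}\n', "id: 41\n", "\n",
    ": keep-alive\n",
    "event: tool_use\n", 'data: {"name": "bash", "input": "pytest"}\n', "id: 42\n", "\n",
]
events = list(parse_sse(stream))
for e in events:
    print(e["event"], e["id"], e["data"])
# After a dropped connection, a client would reconnect with the header Last-Event-ID: 42
```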

Task-Based Submission

Codex Cloud and Jules use a simpler task-based model: submit a task description, get back a result (typically a PR). There is no mid-flight steering -- you fire and forget.

# Codex CLI remote-control mode (headless)
codex remote-control --repo owner/repo \
  --task "Add input validation to all API endpoints" \
  --on-complete webhook:https://my-server.com/codex-done

# Jules via GitHub Action
# .github/workflows/jules.yml
on:
  issues:
    types: [opened, labeled]
jobs:
  jules:
    if: contains(github.event.issue.labels.*.name, 'jules')
    runs-on: ubuntu-latest
    steps:
      # Actions run as steps; a job-level `uses:` is reserved for reusable workflows
      - uses: google-labs-code/jules-action@v1
        with:
          model: gemini-3-pro
          task: ${{ github.event.issue.body }}

Copilot's Event-Driven Model

GitHub Copilot cloud agent introduced "automations" in June 2026 -- scheduled or event-triggered agent runs:

# Copilot cloud agent automation
on:
  schedule:
    - cron: '0 9 * * 1'  # Every Monday at 9am
  issues:
    types: [opened]
    
task: |
  Review the opened issue and if it's a bug report,
  attempt to reproduce it and propose a fix as a PR.

This is particularly interesting because it makes the agent a first-class CI/CD participant, not just a coding tool.

Sandbox Isolation: The Security Spectrum

Running untrusted code generated by an AI model is fundamentally a security problem. The 2026 landscape shows a clear spectrum of isolation approaches, each with different trade-offs.

The Isolation Hierarchy

Strongest ←──────────────────────────────→ Weakest
 microVM        gVisor        Container     Process
(Firecracker)  (syscall      (Docker/OCI)   (chroot)
               interception)

MicroVMs (Firecracker) provide the strongest isolation with a dedicated kernel per workload. Firecracker boots in approximately 125ms with less than 5 MiB overhead per VM and supports up to 150 microVMs per second per host. E2B uses this approach -- every sandbox gets its own microVM with hardware-level isolation.

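gVisor intercepts syscalls without requiring a full VM: its user-space kernel services the workload's system calls instead of passing them straight to the host kernel. This makes it a middle ground, stronger than containers and lighter than VMs. Google uses it in production to sandbox workloads in services such as App Engine, Cloud Run, and Cloud Functions.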

Containers (Docker/OCI) share the host kernel. Fast startup, but the shared kernel surface area is a liability when an agent can write arbitrary code, install packages, and manipulate file descriptors. Major cloud providers have been migrating control planes away from runc toward hardware-enforced isolation.

Nested isolation is emerging as the production best practice: containers inside VMs, where each layer trusts the layer below it and nothing else.
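At the container end of the spectrum, much of the hardening is plain `docker run` configuration. The helper below only builds an argument list and starts nothing. The flags are standard Docker CLI options, the image name is a placeholder, and `--runtime runsc` assumes gVisor is installed and registered with the Docker daemon. Treat it as a sketch of shrinking the shared-kernel attack surface, not a substitute for VM-level isolation.

```python
import shlex


def sandbox_run_args(image: str, command: list, use_gvisor: bool = False,
                     allow_network: bool = False) -> list:
    """Build a `docker run` argv for an untrusted agent workload (hypothetical helper)."""
    args = [
        "docker", "run", "--rm",
        "--read-only",                          # immutable root filesystem
        "--tmpfs", "/tmp",                      # writable scratch space only
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "--pids-limit", "256",                  # contain fork bombs
        "--memory", "2g", "--cpus", "2",        # resource ceilings
        "--user", "65534:65534",                # run as the unprivileged 'nobody' user
    ]
    if not allow_network:
        args += ["--network", "none"]
    if use_gvisor:
        args += ["--runtime", "runsc"]          # gVisor handles syscalls in user space
    return args + [image] + command


argv = sandbox_run_args("agent-sandbox:latest", ["python3", "task.py"], use_gvisor=True)
print(shlex.join(argv))
```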

Platform Isolation Choices

Platform	Isolation Model	Network During Execution	Persistence
Claude Managed Agents	Cloud container (configurable)	Scoped (allowlist)	Session lifetime
Codex Cloud	Isolated container	Disabled	Task lifetime
Jules	Google Cloud VM	Available	Task lifetime
Copilot Cloud Agent	GitHub Actions runner	Available	Workflow lifetime
E2B	Firecracker microVM	Configurable	1h (Hobby) / 24h (Pro)
Daytona	Docker container	Configurable	Persistent until deleted
Modal	Container (GPU-capable)	Configurable	Configurable

Codex's choice to disable internet access during execution is the most restrictive posture in the table. The agent cannot exfiltrate data over the network while it works, but every dependency must already be installed before the agent starts (for example, by an environment setup step) or bundled with the repo. This is a deliberate trade-off: security over flexibility.
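Between "no network" and full internet access sits the allowlist approach, as in the `network_access=["github.com", "pypi.org"]` setting from the earlier Managed Agents example. A minimal sketch of the check an egress proxy might apply, assuming that an allowlisted domain also covers its subdomains and that only HTTPS is permitted; real platforms may match differently.

```python
from urllib.parse import urlsplit


def is_egress_allowed(url: str, allowlist: list) -> bool:
    """Allow HTTPS requests to an allowlisted host or one of its subdomains (assumed policy)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower().rstrip(".")
    if parts.scheme != "https" or not host:
        return False
    for domain in allowlist:
        domain = domain.lower()
        # Exact match or a true subdomain: "api.github.com" passes, "evilgithub.com" does not.
        if host == domain or host.endswith("." + domain):
            return True
    return False


ALLOW = ["github.com", "pypi.org"]
for url in ["https://api.github.com/repos", "https://evilgithub.com/x",
            "https://files.pythonhosted.org/packages/x.whl", "http://pypi.org/simple/"]:
    print(url, is_egress_allowed(url, ALLOW))
```

The third URL illustrates a practical gotcha: pip downloads package files from files.pythonhosted.org, so an allowlist containing only pypi.org is likely to break installs.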

Credential Isolation

A critical pattern across all platforms is keeping credentials out of the sandbox. Two approaches dominate:

Resource-bundled auth: Repository access tokens are consumed during initialization (e.g., git clone). The token is used once and not persisted in the sandbox filesystem.
External vault with proxy: OAuth tokens and API keys are stored in a secure vault outside the sandbox. An MCP proxy or sidecar process fetches credentials on behalf of the agent when needed, without exposing them to the sandbox environment.

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Sandbox    │────▶│  MCP Proxy   │────▶│  Vault/KMS   │
│ (untrusted)  │     │  (trusted)   │     │  (secrets)   │
│              │◀────│              │◀────│              │
│  No secrets  │     │ Fetches on   │     │ Stores all   │
│  in env      │     │ demand       │     │ credentials  │
└──────────────┘     └──────────────┘     └──────────────┘

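This keeps credentials out of reach of prompt injection: even if the agent is tricked into running malicious code, there are no secrets in the sandbox to read or exfiltrate. The proxy's policy remains the real boundary, however, because a compromised agent can still ask the proxy to perform any action the stored credentials allow.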
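The proxy pattern fits in a few lines. Everything here is hypothetical: `CredentialProxy`, the in-memory `vault` dict standing in for a real vault or KMS, and the stubbed upstream call. The point is structural: the sandbox builds requests with no credentials, only the trusted proxy attaches them, and nothing returned to the sandbox contains the secret.

```python
from typing import Callable


def stub_upstream(method: str, url: str, headers: dict) -> dict:
    """Stand-in for the real HTTP call; 'succeeds' only if credentials were attached."""
    status = 200 if headers.get("Authorization", "").startswith("Bearer ") else 401
    return {"status": status, "body": f"{method} {url}"}


class CredentialProxy:
    """Trusted component outside the sandbox that attaches credentials on the way out."""

    def __init__(self, vault: dict, upstream: Callable[[str, str, dict], dict] = stub_upstream):
        self._vault = vault        # host -> token; stands in for a real vault/KMS lookup
        self._upstream = upstream

    def forward(self, method: str, host: str, path: str, headers: dict) -> dict:
        if host not in self._vault:
            raise PermissionError(f"no credential policy for {host}")
        # Drop any Authorization header the untrusted sandbox tried to supply.
        outbound = {k: v for k, v in headers.items() if k.lower() != "authorization"}
        outbound["Authorization"] = f"Bearer {self._vault[host]}"
        response = self._upstream(method, f"https://{host}{path}", outbound)
        # Only status and body flow back; the injected header never reaches the sandbox.
        return {"status": response["status"], "body": response["body"]}


proxy = CredentialProxy({"api.github.com": "example-token-not-real"})

# Sandbox side: the request it builds carries no credentials at all.
resp = proxy.forward("GET", "api.github.com", "/user/repos", {"Accept": "application/json"})
print(resp)
```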

Async Interaction Patterns

Cloud agent runtimes must solve the interaction problem: how does a developer submit work, monitor progress, and receive results?

Pattern 1: Streaming Events (Claude Managed Agents)

The richest interaction model. The client opens an SSE connection and receives a real-time stream of events -- assistant messages, tool calls, tool results, and status updates. The client can inject new user events at any point to steer the agent.

Client          Managed Agent
  │                  │
  │─── user event ──▶│
  │                  │── think ──▶
  │◀── assistant ────│
  │◀── tool_use ─────│
  │                  │── execute ──▶
  │◀── tool_result ──│
  │                  │── think ──▶
  │◀── assistant ────│
  │─── user event ──▶│  (mid-flight steering)
  │                  │── adjust ──▶
  │◀── assistant ────│
  │◀── session_end ──│

Pattern 2: Fire-and-Forget with PR Delivery (Jules, Copilot)

The simplest model. Submit a task, receive a PR when done. No mid-flight interaction.

Developer        Cloud Agent        GitHub
   │                 │                │
   │── assign issue ─▶│                │
   │                 │── clone repo ──▶│
   │                 │── work... ─────│
   │                 │── push branch ─▶│
   │                 │── create PR ───▶│
   │◀── PR notification ──────────────│

Pattern 3: Webhook Callbacks (Codex Remote-Control)

A hybrid approach where the client submits a task and registers a webhook URL. The agent calls back when done.

Orchestrator     Codex Cloud       Webhook Endpoint
    │                │                    │
    │── submit task ─▶│                    │
    │◀── task_id ─────│                    │
    │                │── execute... ──────│
    │                │── POST result ────▶│
    │◀── notification ────────────────────│

This pattern is particularly useful for CI/CD integration where a pipeline step triggers an agent and waits for completion before proceeding.
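On the orchestrator side, "waits for completion" usually means blocking a pipeline step until the callback arrives, with a timeout. A minimal in-process sketch, assuming a hypothetical `submit_task` that returns a task ID immediately and a webhook handler invoked with the posted result. Here a timer thread simulates the agent calling back; in production the handler would sit behind an HTTP endpoint and the results would live in a shared store.

```python
import threading
import uuid


class CompletionRegistry:
    """Correlates webhook callbacks with waiting pipeline steps by task_id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}
        self._results = {}

    def register(self, task_id: str) -> None:
        with self._lock:
            self._events[task_id] = threading.Event()

    def on_webhook(self, payload: dict) -> None:
        # Called by the webhook endpoint when the agent POSTs its result.
        with self._lock:
            self._results[payload["task_id"]] = payload
            event = self._events.get(payload["task_id"])
        if event is not None:
            event.set()

    def wait(self, task_id: str, timeout: float) -> dict:
        if not self._events[task_id].wait(timeout):
            raise TimeoutError(f"agent task {task_id} did not report back within {timeout}s")
        return self._results[task_id]


def submit_task(registry: CompletionRegistry, description: str) -> str:
    """Hypothetical submission; the description would be sent to the agent (unused here)."""
    task_id = str(uuid.uuid4())
    registry.register(task_id)  # register before the agent can possibly call back
    payload = {"task_id": task_id, "status": "succeeded"}
    threading.Timer(0.1, registry.on_webhook, args=(payload,)).start()  # simulated callback
    return task_id


registry = CompletionRegistry()
tid = submit_task(registry, "Add input validation to all API endpoints")
result = registry.wait(tid, timeout=5.0)  # the pipeline step blocks here
print(result["status"])
```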

Pattern 4: Scheduled Automations (Copilot, Claude Routines)

Agents that run on a schedule without any human trigger:

Claude Code Routines: Managed by Anthropic, run on their cloud infrastructure on a cron schedule. Can spin up sub-agents using split-and-merge patterns.
Copilot Automations: Triggered by cron schedules or GitHub events (issue opened, PR merged, etc.).
Zylos Scheduler (C5): Self-hosted equivalent -- dispatches tasks to the agent at scheduled times, enabling the same autonomous operation pattern.
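At its core, each of these schedulers is a priority queue of (next_run_time, task) entries. A minimal sketch of that core for fixed-interval tasks, assuming a hypothetical `dispatch` callback that hands a task to the agent. It is not the C5 implementation; cron parsing and persistence across restarts are left out, and this version replays every missed run (coalescing them into one is a common alternative).

```python
import heapq
from datetime import datetime, timedelta


class Scheduler:
    def __init__(self):
        self._queue = []  # heap of (next_run, seq, name, interval)
        self._seq = 0     # tie-breaker so entries with equal times never compare names

    def every(self, name: str, interval: timedelta, start: datetime) -> None:
        heapq.heappush(self._queue, (start, self._seq, name, interval))
        self._seq += 1

    def run_due(self, now: datetime, dispatch) -> None:
        """Dispatch every task whose time has come, then reschedule it."""
        while self._queue and self._queue[0][0] <= now:
            next_run, _, name, interval = heapq.heappop(self._queue)
            dispatch(name, next_run)
            self.every(name, interval, next_run + interval)


dispatched = []
sched = Scheduler()
t0 = datetime(2026, 6, 1, 9, 0)  # a Monday at 9am
sched.every("weekly-dependency-review", timedelta(weeks=1), t0)
sched.every("hourly-health-check", timedelta(hours=1), t0)

# Simulate the clock reaching 11:30 on the same day.
sched.run_due(datetime(2026, 6, 1, 11, 30), lambda name, at: dispatched.append((name, at.hour)))
print(dispatched)
```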

The Self-Hosted Alternative

Not every agent needs to run in someone else's cloud. The Zylos architecture represents the self-hosted end of the spectrum, with some unique advantages:

Managed vs. Self-Hosted Comparison

Aspect	Managed (Claude MA, Codex Cloud)	Self-Hosted (Zylos)
Setup	API key + config	Full server setup
Sandbox control	Platform-defined	Complete control
Persistence	Session-scoped	Unlimited (disk)
Memory	Context window only	Persistent memory system
Communication	API/webhook	Multi-channel (Telegram, Lark, web)
Scheduling	Platform-provided	Custom scheduler (C5)
Cost model	Per-token + sandbox time	Fixed infra + per-token
Data residency	Provider's cloud	Your infrastructure
Customization	Config + system prompt	Full code control

Claude Managed Agents now supports self-hosted sandboxes, creating a hybrid option: Anthropic runs the harness (brain), but the sandbox (hands) runs on your infrastructure. This addresses data residency concerns while still leveraging Anthropic's optimized agent loop.

When Self-Hosted Wins

Self-hosting remains the better choice when:

Persistent state is essential: The agent needs a durable filesystem, databases, and long-lived processes (e.g., PM2 services)
Multi-channel communication: The agent operates across Telegram, Lark, web console, and other channels simultaneously
Custom memory architecture: Tiered memory (identity, state, references, sessions, archive) that persists across all interactions
Full autonomy: The agent schedules its own tasks, monitors its own health, and manages its own lifecycle
Cost predictability: Fixed infrastructure costs are preferable to usage-based pricing for always-on agents

Performance Characteristics

Cold Start and Latency

Cold start performance varies significantly across sandbox providers:

Provider	Cold Start	Warm Start	Isolation
Daytona	~27-90ms	<10ms	Container
E2B	~150ms	<50ms	microVM
Modal	~200-500ms	<100ms	Container (GPU-capable)
Firecracker (raw)	~125ms	<50ms	microVM
GitHub Actions	15-45s	N/A (always cold)	VM (fresh per job)

For Claude Managed Agents, Anthropic reported that decoupling the harness from the sandbox eliminated the need to wait for container provisioning before the first model call. The harness starts immediately, begins the model call, and the sandbox provisions in parallel.
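The latency gain in this design comes from ordering: the first model call does not depend on the sandbox, so the two can start together, and only tool execution has to wait for provisioning. A minimal asyncio sketch with simulated delays; the timings are arbitrary stand-ins, not measured figures.

```python
import asyncio
import time


async def provision_sandbox() -> str:
    await asyncio.sleep(0.3)   # simulated container/VM provisioning time
    return "sandbox-ready"


async def first_model_call() -> str:
    await asyncio.sleep(0.05)  # simulated time to first token
    return "plan: run the test suite"


async def run_session() -> list:
    start = time.monotonic()
    timeline = []
    sandbox_task = asyncio.create_task(provision_sandbox())  # provisioning starts now...
    await first_model_call()                                 # ...while the model is called
    timeline.append(("first_token", time.monotonic() - start))
    await sandbox_task                                       # only tool execution waits for it
    timeline.append(("sandbox_ready", time.monotonic() - start))
    return timeline


timeline = asyncio.run(run_session())
for name, elapsed in timeline:
    print(f"{name}: {elapsed:.2f}s")
```

With these numbers, a sequential design could not produce a first token before roughly 0.35 s; the parallel version produces it after roughly 0.05 s.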

Cost Considerations

Cloud agent execution has two cost components: model tokens and sandbox compute time.

E2B / Daytona: ~$0.05/vCPU-hour
Modal: ~$0.06/vCPU-hour (CPU), $2.78/GPU-hour (A10G)
Claude Managed Agents: Token pricing + sandbox time (beta pricing TBD)
Codex Cloud: Included in ChatGPT subscription tiers (Plus $20/month, Pro $200/month)
Jules: Included in Gemini subscription
Copilot Cloud Agent: Included in GitHub Copilot subscription + Actions minutes

For high-volume use cases, the per-session sandbox costs can exceed the model token costs. A 30-minute agent session on E2B with 4 vCPUs costs approximately $0.10 in compute alone, plus model tokens.
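The compute figure follows from 4 vCPUs × 0.5 h × $0.05/vCPU-hour = $0.10. A small estimator makes it easy to compare the two cost components for your own workloads. The token volumes and per-million-token prices below are hypothetical placeholders, not any provider's published rates; which component dominates depends on how many tokens a session consumes relative to how long its sandbox stays alive.

```python
def session_cost(vcpus: int, minutes: float, vcpu_hour_rate: float,
                 input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> dict:
    """Split an agent session's cost into sandbox compute and model tokens (USD)."""
    compute = vcpus * (minutes / 60) * vcpu_hour_rate
    tokens = (input_tokens * input_per_mtok + output_tokens * output_per_mtok) / 1_000_000
    return {"compute": round(compute, 4), "tokens": round(tokens, 4),
            "total": round(compute + tokens, 4)}


# 30-minute session on 4 vCPUs at ~$0.05/vCPU-hour, as in the example above.
# Token counts and prices are placeholders; substitute current pricing.
cost = session_cost(vcpus=4, minutes=30, vcpu_hour_rate=0.05,
                    input_tokens=2_000_000, output_tokens=50_000,
                    input_per_mtok=1.0, output_per_mtok=5.0)
print(cost)
```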

Emerging Patterns and Best Practices

1. Multi-Brain, Multi-Hand Orchestration

The decoupled architecture enables powerful scaling patterns:

                    ┌──────────┐
                    │ Session  │
                    │  (log)   │
                    └────┬─────┘
                         │
              ┌──────────┼──────────┐
              │          │          │
         ┌────▼───┐ ┌───▼────┐ ┌──▼─────┐
         │Brain 1 │ │Brain 2 │ │Brain 3 │
         │(plan)  │ │(code)  │ │(review)│
         └───┬────┘ └───┬────┘ └───┬────┘
             │          │          │
         ┌───▼────┐ ┌───▼────┐ ┌──▼─────┐
         │Hand A  │ │Hand B  │ │Hand C  │
         │(sandbox│ │(sandbox│ │(sandbox│
         │   1)   │ │   2)   │ │   3)   │
         └────────┘ └────────┘ └────────┘

Claude Managed Agents supports this natively -- multiple harness instances can read from the same session, and each can dispatch work to different sandboxes. Claude Code Routines use a split-and-merge pattern where work is divided across parallel sub-agents.
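A split-and-merge run can be sketched with a thread pool: each sub-agent gets its own sandbox identifier and appends to a shared, lock-protected session log, and the merge step reassembles results in plan order regardless of which sub-agent finishes first. `run_subagent` and `SharedSession` are hypothetical stand-ins; this shows the shape of the pattern, not how Routines or Managed Agents implement it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedSession:
    """A single append-only log that several harness instances write to."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []

    def append(self, brain: str, event: str) -> None:
        with self._lock:
            self.events.append((brain, event))


def run_subagent(session: SharedSession, index: int, subtask: str) -> str:
    brain, sandbox = f"brain-{index}", f"sandbox-{index}"  # one sandbox per sub-agent
    session.append(brain, f"start {subtask} in {sandbox}")
    result = f"{subtask}: done"                            # stand-in for real agent work
    session.append(brain, f"finish {subtask}")
    return result


plan = ["migrate models", "update API handlers", "rewrite tests"]  # split step
session = SharedSession()
with ThreadPoolExecutor(max_workers=len(plan)) as pool:
    # pool.map returns results in input order, whatever the completion order.
    results = list(pool.map(lambda item: run_subagent(session, *item), enumerate(plan)))
merged = "\n".join(results)                                # merge step
print(merged)
print(len(session.events), "events in the shared session log")
```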

2. Defense-in-Depth Security

Production deployments are converging on layered security:

Layer 1: Sandbox isolation (microVM or nested container-in-VM)
Layer 2: Network scoping (allowlist, no internet during execution for sensitive tasks)
Layer 3: Credential isolation (vault proxy, no secrets in sandbox)
Layer 4: Output validation (LLM-as-judge or rule-based checks on generated code)
Layer 5: Human approval gates (PR review before merge)
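Layer 4 is the easiest to start with in its rule-based form. A minimal sketch that scans the added lines of a unified diff for a few obviously dangerous patterns and flags edits to protected paths. The rule set and the protected-path policy are illustrative assumptions kept deliberately small (long-term AWS access key IDs begin with `AKIA`); a real deployment would pair this with a dedicated secret scanner and the other layers.

```python
import re

RULES = [
    ("aws-access-key-id", re.compile(r"AKIA[0-9A-Z]{16}")),
    ("private-key", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
    ("curl-pipe-shell", re.compile(r"curl[^|\n]*\|\s*(ba)?sh\b")),
]
PROTECTED_PATHS = (".github/workflows/",)  # assumed policy: agents may not edit CI


def validate_diff(diff: str) -> list:
    """Return findings for a unified diff; an empty list means these checks passed."""
    findings, current_file = [], None
    for line in diff.splitlines():
        if line.startswith("+++ "):  # new-file header, e.g. "+++ b/path"
            current_file = line[4:].removeprefix("b/")
            if current_file.startswith(PROTECTED_PATHS):
                findings.append(f"{current_file}: protected path modified")
            continue
        if line.startswith("+"):     # only inspect added lines
            for name, pattern in RULES:
                if pattern.search(line):
                    findings.append(f"{current_file}: {name}")
    return findings


diff = """\
--- a/deploy.sh
+++ b/deploy.sh
@@ -1,2 +1,3 @@
 set -e
+curl -fsSL https://example.com/install.sh | bash
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -1 +1 @@
-run: pytest
+run: echo skip
"""
print(validate_diff(diff))
```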

3. Idempotent Task Design

Cloud agents may be interrupted and restarted. Tasks should be designed for idempotency:

Use git branches as the unit of work (can be force-pushed on restart)
Check for existing PRs before creating new ones
Use git stash or workspace snapshots for resumability
Design database migrations to be re-runnable
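The first two guidelines combine naturally: derive the branch name deterministically from the task, and make PR creation a lookup-then-create operation, so a restarted agent converges on the same branch and the same PR instead of opening duplicates. A minimal sketch against a hypothetical in-memory `FakeRepoHost`; with a real code host you would make the equivalent calls through its API (on GitHub, for instance, listing open pull requests filtered by head branch before creating one).

```python
import hashlib
from typing import Optional


def branch_for(task_id: str) -> str:
    """Deterministic branch name: the same task always maps to the same branch."""
    return "agent/" + hashlib.sha256(task_id.encode()).hexdigest()[:12]


class FakeRepoHost:
    """Hypothetical in-memory stand-in for a code host's pull request API."""

    def __init__(self):
        self.prs = {}     # head branch -> PR number
        self.titles = {}  # PR number -> title
        self._next_number = 1

    def find_open_pr(self, head: str) -> Optional[int]:
        return self.prs.get(head)

    def create_pr(self, head: str, title: str) -> int:
        number = self._next_number
        self._next_number += 1
        self.prs[head] = number
        self.titles[number] = title
        return number


def ensure_pr(host: FakeRepoHost, task_id: str, title: str) -> int:
    """Look up before creating, so a restarted attempt reuses the earlier PR."""
    head = branch_for(task_id)
    existing = host.find_open_pr(head)
    if existing is not None:
        return existing
    return host.create_pr(head, title)


host = FakeRepoHost()
first = ensure_pr(host, "task-42", "Add input validation")
retry = ensure_pr(host, "task-42", "Add input validation")  # agent restarted mid-task
print(first, retry, len(host.prs))
```

Check-then-create is not atomic: if two attempts for the same task can run at once, they also need a lock or lease on the task ID.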

4. Observability and Cost Attribution

As agents become autonomous background workers, observability becomes critical:

Token attribution: Track which task consumed which tokens
Session replay: Ability to replay the full event log for debugging
Cost alerts: Budget limits per session, per task type, per schedule
Failure detection: Liveness checks with automatic restart (similar to Zylos C2 activity monitor)
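Token attribution, budget limits, and liveness checks can share one small bookkeeping object. A minimal sketch with hypothetical names (`SessionMeter`, `BudgetExceeded`) and an assumed 300-second heartbeat timeout; it is not the Zylos C2 monitor, just the same idea in miniature.

```python
import time
from collections import defaultdict
from typing import Optional


class BudgetExceeded(RuntimeError):
    """Raised when a session goes over its token budget."""


class SessionMeter:
    """Hypothetical per-session bookkeeping: attribution, budget, liveness."""

    def __init__(self, token_budget: int, heartbeat_timeout: float = 300.0):
        self.token_budget = token_budget
        self.heartbeat_timeout = heartbeat_timeout
        self.tokens_by_task = defaultdict(int)  # token attribution per task
        self.last_heartbeat = time.monotonic()

    def total_tokens(self) -> int:
        return sum(self.tokens_by_task.values())

    def record(self, task_id: str, tokens: int) -> None:
        self.tokens_by_task[task_id] += tokens
        self.last_heartbeat = time.monotonic()  # recorded progress doubles as a heartbeat
        if self.total_tokens() > self.token_budget:
            raise BudgetExceeded(f"used {self.total_tokens()} of {self.token_budget} tokens")

    def is_stalled(self, now: Optional[float] = None) -> bool:
        """True if no progress was recorded within heartbeat_timeout seconds."""
        now = time.monotonic() if now is None else now
        return now - self.last_heartbeat > self.heartbeat_timeout


meter = SessionMeter(token_budget=100_000)
meter.record("refactor-auth", 60_000)
meter.record("write-tests", 30_000)
print(dict(meter.tokens_by_task), meter.is_stalled())
try:
    meter.record("write-tests", 20_000)
except BudgetExceeded as err:
    print("stopping session:", err)
```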

Implications for Zylos

The cloud-hosted runtime landscape validates several Zylos architectural decisions:

Session as event log: Zylos's memory system (identity + state + sessions) is functionally equivalent to Managed Agents' append-only session log, but richer -- it includes structured memory tiers, not just raw events.
Harness-sandbox separation: Zylos already separates the agent (Claude Code runtime) from its execution environment, and can switch runtimes (Claude Code / Codex) without losing state.
Scheduled autonomy: The C5 scheduler enables the same autonomous operation patterns as Copilot Automations and Claude Routines, but with full control over task definitions and scheduling logic.
Multi-channel delivery: While managed runtimes deliver results primarily as PRs, Zylos delivers results through whatever channel the user prefers -- Telegram, Lark, web console, or direct terminal interaction.

The key question for Zylos going forward is whether to adopt Claude Managed Agents' self-hosted sandbox mode for specific workloads (like code generation tasks that benefit from ephemeral, isolated execution) while keeping the persistent agent architecture for everything else.

Conclusion

Cloud-hosted agent runtimes represent the maturation of AI agents from interactive tools to autonomous workers. The architectural convergence around session-harness-sandbox separation, append-only event logs, and defense-in-depth security creates a solid foundation for production deployment.

The landscape is not winner-take-all. Managed runtimes excel at well-scoped, ephemeral tasks (code generation, test writing, refactoring). Self-hosted runtimes excel at persistent, multi-faceted agent operation (always-on assistants, multi-channel communication, custom workflows). The hybrid model -- managed brain, self-hosted hands -- may be the best of both worlds.

For builders: the key architectural takeaway is that the session log is the most important component. If you get durable session management right, everything else -- harness restarts, sandbox replacement, multi-agent orchestration -- becomes a matter of configuration rather than architecture.