2026-06-23

Cloud-Hosted Agent Runtimes: The Architecture of Headless Autonomous Execution

Tags: ai-agents, cloud-runtime, headless-execution, sandbox, managed-agents, agent-infrastructure, microvm, session-management

Executive Summary

The AI agent landscape has undergone a structural shift in the first half of 2026: the dominant execution model is no longer a local CLI process tethered to a developer's terminal, but a cloud-hosted runtime where agents operate autonomously in managed sandboxes. Claude Managed Agents (launched April 2026), OpenAI Codex cloud, Google Jules, and GitHub Copilot cloud agent each represent distinct architectural answers to the same fundamental question -- how do you let an AI agent run for minutes or hours, securely, without a human watching?

This article examines the architectural patterns that have converged across these platforms: the decoupling of session state from execution, the sandbox isolation spectrum from containers to microVMs, the async task submission models, and the practical trade-offs that matter when choosing or building a cloud-hosted agent runtime. For the Zylos project -- which already operates a persistent agent with its own memory, scheduler, and multi-channel communication -- understanding these patterns is essential for evaluating whether to adopt managed runtimes or continue the self-hosted approach.

The Shift from Local to Cloud Agent Execution

Why Headless Matters

The first generation of AI coding agents (2024-2025) ran locally: Aider in a terminal, Cursor in an IDE, Claude Code on your laptop. This model has a fundamental limitation -- the agent's lifetime is bounded by the developer's session. Close the lid, lose the agent.

Headless execution decouples the agent from the developer's presence. The interaction model shifts from synchronous conversation to asynchronous task delegation:

  1. Submit -- the developer describes a task (via issue, chat, API call, or scheduled trigger)
  2. Execute -- the agent works autonomously in a cloud environment
  3. Deliver -- results appear as a pull request, a message, or a status update
  4. Review -- the developer evaluates the output on their own schedule

This is not merely a convenience improvement. It changes what agents can do. Long-running refactors, multi-file migrations, test suite generation, and dependency upgrades all become practical when the agent is not competing for your terminal.

The 2026 Landscape

By mid-2026, the major AI labs and several developer-tooling vendors have each shipped a cloud-hosted agent runtime:

| Platform | Provider | Model | Sandbox | Launch |
|---|---|---|---|---|
| Claude Managed Agents | Anthropic | Claude (Sonnet/Opus) | Managed cloud or self-hosted | April 2026 (beta) |
| Codex Cloud | OpenAI | GPT-5.5 / GPT-5.5 Pro | Isolated container, no internet during execution | 2025 (web), 2026 (CLI remote-control) |
| Jules | Google Labs | Gemini 3 Pro / 3.1 Pro | Google-managed VM | 2025 (beta), 2026 (GA) |
| Copilot Cloud Agent | GitHub / Microsoft | GPT-4.1 family | GitHub Actions runner | 2025 (preview), 2026 (GA + automations) |
| Devin | Cognition | Proprietary | Cognition cloud infra | 2024 (preview), 2026 (Teams) |

Architectural Anatomy: Brain, Harness, and Hands

Anthropic's engineering blog on Managed Agents introduced a clean decomposition that generalizes across all cloud agent runtimes. The metaphor is borrowed from operating systems: create stable abstractions that outlast implementation details.

The Three Components

┌─────────────────────────────────────────────┐
│                  SESSION                     │
│  (append-only durable event log)            │
│  - User messages, tool calls, results       │
│  - Survives container crashes               │
│  - Stored outside the sandbox               │
└──────────────┬──────────────────────────────┘
               │
┌──────────────▼──────────────────────────────┐
│                  HARNESS (Brain)             │
│  - Stateless agent loop                     │
│  - Calls Claude / GPT / Gemini              │
│  - Routes tool calls to sandbox             │
│  - Independently replaceable                │
│  wake(sessionId) → resume from event log    │
└──────────────┬──────────────────────────────┘
               │
┌──────────────▼──────────────────────────────┐
│                  SANDBOX (Hands)             │
│  - Ephemeral execution environment          │
│  - File system, shell, network (scoped)     │
│  - Treated as untrusted                     │
│  - Replaceable "cattle, not pets"           │
│  execute(name, input) → string              │
└─────────────────────────────────────────────┘

Session is the source of truth. It is an append-only, durable event log stored completely outside the container. Every user message, tool call, and tool result is recorded here. When the harness crashes, restarts, or gets replaced, the session remains intact. This is what makes cloud agents resumable.

Harness is the "brain" -- the loop that calls the model, interprets tool-use requests, and dispatches them to the sandbox. Critically, the harness is stateless. It can be rebooted via wake(sessionId), which replays the relevant events from the session log into the model's context window. Anthropic reported that decoupling the harness from the sandbox reduced time-to-first-token by approximately 60% at p50 and over 90% at p95.

Sandbox is the "hands" -- a container, VM, or microVM where the agent actually runs code, edits files, and executes commands. It is explicitly treated as untrusted. Credentials never live inside the sandbox; they are injected via resource-bundled auth (tokens consumed during initialization) or external vault proxies.
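The relationship between the session log and a stateless harness can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: Event, SessionLog, SESSIONS, and the model callable are hypothetical names, and the in-memory list stands in for durable storage that would live outside the sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "user", "assistant", "tool_use", "tool_result"
    payload: str


@dataclass
class SessionLog:
    """Append-only log; a real system would keep this in durable storage."""
    _events: list[Event] = field(default_factory=list)

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1          # offset of the new event

    def read(self, start: int = 0) -> tuple[Event, ...]:
        return tuple(self._events[start:])    # callers get a read-only copy


# Hypothetical session store, keyed by session id.
SESSIONS: dict[str, SessionLog] = {}


def wake(session_id: str, model: Callable[[tuple[Event, ...]], str]) -> str:
    """Rebuild context from the log, take one model step, and record the reply."""
    log = SESSIONS[session_id]
    context = log.read()                      # the harness holds no state of its own
    reply = model(context)
    log.append(Event("assistant", reply))
    return reply
```

Because wake() rebuilds its context from the log on every call, a harness that crashes between two calls loses nothing: the next wake() on the same session id sees every event recorded so far.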

How Each Platform Maps to This Model

| Component | Claude Managed Agents | Codex Cloud | Jules | Copilot Cloud Agent |
|---|---|---|---|---|
| Session | Server-side event log, SSE streaming, fetchable history | Task record in ChatGPT/API | Task state in Jules dashboard | GitHub issue/PR thread |
| Harness | Anthropic-managed, configurable agent + environment | OpenAI-managed | Google-managed | GitHub Actions workflow |
| Sandbox | Cloud container or self-hosted sandbox | Isolated container (internet disabled) | Google Cloud VM | GitHub Actions runner |
| Result delivery | SSE events, file outputs | PR, chat message | PR on GitHub | PR on GitHub |

Session Management: The Durability Question

The most important architectural decision in a cloud agent runtime is how session state is managed. This determines whether an agent can survive crashes, handle long-running tasks, and maintain context across interactions.

Append-Only Event Logs

Claude Managed Agents uses an append-only event log as its session primitive. The API exposes this through Server-Sent Events (SSE):

import anthropic

client = anthropic.Anthropic()

# Create an agent (once)
agent = client.managed_agents.create(
    model="claude-sonnet-4-20250514",
    system="You are a senior software engineer.",
    tools=["bash", "file_read", "file_write", "web_search"],
)

# Create an environment
environment = client.managed_agents.environments.create(
    agent_id=agent.id,
    packages=["python3", "nodejs", "git"],
    network_access=["github.com", "pypi.org"],
)

# Start a session
session = client.managed_agents.sessions.create(
    agent_id=agent.id,
    environment_id=environment.id,
)

# Send a task and stream results
with client.managed_agents.sessions.events.stream(
    session_id=session.id,
    event={"type": "user", "content": "Refactor the auth module to use JWT tokens"}
) as stream:
    for event in stream:
        if event.type == "assistant":
            print(event.content)
        elif event.type == "tool_use":
            print(f"[Tool] {event.name}: {event.input}")

The event history is persisted server-side. You can fetch the complete event log at any time, send additional user events to steer the agent mid-execution, or interrupt it entirely.

Task-Based Submission

Codex Cloud and Jules use a simpler task-based model: submit a task description, get back a result (typically a PR). There is no mid-flight steering -- you fire and forget.

# Codex CLI remote-control mode (headless)
codex remote-control --repo owner/repo \
  --task "Add input validation to all API endpoints" \
  --on-complete webhook:https://my-server.com/codex-done

# Jules via GitHub Action
# .github/workflows/jules.yml
on:
  issues:
    types: [opened, labeled]
jobs:
  jules:
    if: contains(github.event.issue.labels.*.name, 'jules')
    runs-on: ubuntu-latest
    steps:
      # An action is invoked from a step; job-level `uses` is reserved for reusable workflows.
      - uses: google-labs-code/jules-action@v1
        with:
          model: gemini-3-pro
          task: ${{ github.event.issue.body }}

Copilot's Event-Driven Model

GitHub Copilot cloud agent introduced "automations" in June 2026 -- scheduled or event-triggered agent runs:

# Copilot cloud agent automation
on:
  schedule:
    - cron: '0 9 * * 1'  # Every Monday at 9am
  issues:
    types: [opened]
    
task: |
  Review the opened issue and if it's a bug report,
  attempt to reproduce it and propose a fix as a PR.

This is particularly interesting because it makes the agent a first-class CI/CD participant, not just a coding tool.

Sandbox Isolation: The Security Spectrum

Running untrusted code generated by an AI model is fundamentally a security problem. The 2026 landscape shows a clear spectrum of isolation approaches, each with different trade-offs.

The Isolation Hierarchy

Strongest ←──────────────────────────────→ Weakest
 microVM        gVisor        Container     Process
(Firecracker)  (syscall      (Docker/OCI)   (chroot)
               interception)

MicroVMs (Firecracker) provide the strongest isolation with a dedicated kernel per workload. Firecracker boots a microVM in approximately 125ms, adds less than 5 MiB of memory overhead per VM, and can create up to 150 microVMs per second per host. E2B uses this approach -- every sandbox gets its own microVM with hardware-level isolation.

gVisor intercepts syscalls in user space without requiring a full VM. It is a middle ground: stronger than containers, lighter than VMs. Google uses gVisor in products such as Cloud Run and GKE Sandbox.

Containers (Docker/OCI) share the host kernel. Fast startup, but the shared kernel surface area is a liability when an agent can write arbitrary code, install packages, and manipulate file descriptors. Major cloud providers have been migrating control planes away from runc toward hardware-enforced isolation.

Nested isolation is emerging as the production best practice: containers inside VMs, where each layer trusts the layer below it and nothing else.

Platform Isolation Choices

| Platform | Isolation Model | Network During Execution | Persistence |
|---|---|---|---|
| Claude Managed Agents | Cloud container (configurable) | Scoped (allowlist) | Session lifetime |
| Codex Cloud | Isolated container | Disabled | Task lifetime |
| Jules | Google Cloud VM | Available | Task lifetime |
| Copilot Cloud Agent | GitHub Actions runner | Available | Workflow lifetime |
| E2B | Firecracker microVM | Configurable | 1h (Hobby) / 24h (Pro) |
| Daytona | Docker container | Configurable | Persistent until deleted |
| Modal | Container (GPU-capable) | Configurable | Configurable |

Codex's choice to disable internet during execution is the most aggressive security posture. The agent cannot send data out over the network while it runs, but all dependencies must be pre-installed or bundled with the repo. This is a deliberate trade-off: security over flexibility.
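The "Disabled" and "Scoped (allowlist)" network postures can be viewed as the same egress check with different allowlists. A rough sketch, assuming the sandbox's outbound traffic passes through a filter that sees the destination URL; is_egress_allowed, SCOPED, and DISABLED are hypothetical names, not any platform's API.

```python
from urllib.parse import urlsplit


def is_egress_allowed(url: str, allowlist: frozenset[str]) -> bool:
    """Allow a request only if its host is an allowlisted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in allowlist)


SCOPED = frozenset({"github.com", "pypi.org"})   # allowlist posture
DISABLED = frozenset()                           # no-internet posture: nothing matches
```

Matching on the exact host or a dot-prefixed suffix avoids the common mistake of a plain substring test, which would let a look-alike domain containing an allowed name through.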

Credential Isolation

A critical pattern across all platforms is keeping credentials out of the sandbox. Two approaches dominate:

  1. Resource-bundled auth: Repository access tokens are consumed during initialization (e.g., git clone). The token is used once and not persisted in the sandbox filesystem.

  2. External vault with proxy: OAuth tokens and API keys are stored in a secure vault outside the sandbox. An MCP proxy or sidecar process fetches credentials on behalf of the agent when needed, without exposing them to the sandbox environment.

┌─────────────┐     ┌──────────────┐     ┌──────────────┐
│   Sandbox    │────▶│  MCP Proxy   │────▶│  Vault/KMS   │
│ (untrusted)  │     │  (trusted)   │     │  (secrets)   │
│              │◀────│              │◀────│              │
│  No secrets  │     │ Fetches on   │     │ Stores all   │
│  in env      │     │ demand       │     │ credentials  │
└─────────────┘     └──────────────┘     └──────────────┘

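This limits the damage a prompt injection attack can do -- even if the agent is tricked into running malicious code, there are no secrets in the sandbox for that code to read. The proxy can still be asked to act on the agent's behalf, so it should restrict which calls it forwards.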
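The vault-with-proxy pattern can be sketched as follows. This is an illustration under stated assumptions rather than any platform's implementation: SandboxRequest and CredentialProxy are hypothetical names, the vault is a plain dictionary, and _send is a placeholder for the real outbound call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxRequest:
    """What the untrusted sandbox sends: a target and a body, but no secrets."""
    service: str
    path: str
    body: str


class CredentialProxy:
    """Trusted component outside the sandbox that attaches credentials."""

    def __init__(self, vault: dict[str, str], allowed_services: set[str]) -> None:
        self._vault = vault                  # hypothetical stand-in for a vault/KMS client
        self._allowed = allowed_services

    def forward(self, request: SandboxRequest) -> dict[str, str]:
        if request.service not in self._allowed:
            raise PermissionError(f"service not allowed: {request.service}")
        token = self._vault[request.service]  # fetched on demand, never sent to the sandbox
        return self._send(request, headers={"Authorization": f"Bearer {token}"})

    def _send(self, request: SandboxRequest, headers: dict[str, str]) -> dict[str, str]:
        # Placeholder for the real outbound call; only the response goes back.
        return {"status": "ok", "service": request.service, "path": request.path}
```

The sandbox only ever sees the response dictionary. The allowed_services check is the part that narrows what injected code can ask the proxy to do on its behalf.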

Async Interaction Patterns

Cloud agent runtimes must solve the interaction problem: how does a developer submit work, monitor progress, and receive results?

Pattern 1: Streaming Events (Claude Managed Agents)

The richest interaction model. The client opens an SSE connection and receives a real-time stream of events -- assistant messages, tool calls, tool results, and status updates. The client can inject new user events at any point to steer the agent.

Client          Managed Agent
  │                  │
  │─── user event ──▶│
  │                  │── think ──▶
  │◀── assistant ────│
  │◀── tool_use ─────│
  │                  │── execute ──▶
  │◀── tool_result ──│
  │                  │── think ──▶
  │◀── assistant ────│
  │─── user event ──▶│  (mid-flight steering)
  │                  │── adjust ──▶
  │◀── assistant ────│
  │◀── session_end ──│

Pattern 2: Fire-and-Forget with PR Delivery (Jules, Copilot)

The simplest model. Submit a task, receive a PR when done. No mid-flight interaction.

Developer        Cloud Agent        GitHub
   │                 │                │
   │── assign issue ─▶│                │
   │                 │── clone repo ──▶│
   │                 │── work... ─────│
   │                 │── push branch ─▶│
   │                 │── create PR ───▶│
   │◀── PR notification ──────────────│

Pattern 3: Webhook Callbacks (Codex Remote-Control)

A hybrid approach where the client submits a task and registers a webhook URL. The agent calls back when done.

Orchestrator     Codex Cloud       Webhook Endpoint
    │                │                    │
    │── submit task ─▶│                    │
    │◀── task_id ─────│                    │
    │                │── execute... ──────│
    │                │── POST result ────▶│
    │◀── notification ────────────────────│

This pattern is particularly useful for CI/CD integration where a pipeline step triggers an agent and waits for completion before proceeding.
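On the orchestrator side, the "submit, then wait for the callback" step might look like the sketch below. It assumes the webhook payload is a JSON object carrying a task_id field, which is an assumption made for illustration; PendingTasks is a hypothetical helper, and the HTTP handler that would call on_webhook is omitted.

```python
import threading
from typing import Optional


class PendingTasks:
    """Tracks submitted tasks until a webhook callback reports their result."""

    def __init__(self) -> None:
        self._done: dict[str, threading.Event] = {}
        self._results: dict[str, dict] = {}
        self._lock = threading.Lock()

    def register(self, task_id: str) -> None:
        """Record a task id returned by the submit call."""
        with self._lock:
            self._done[task_id] = threading.Event()

    def on_webhook(self, payload: dict) -> bool:
        """Called by the HTTP handler; returns False for unknown task ids."""
        with self._lock:
            event = self._done.get(payload.get("task_id", ""))
            if event is None:
                return False
            self._results[payload["task_id"]] = payload
        event.set()
        return True

    def wait(self, task_id: str, timeout: float) -> Optional[dict]:
        """Block the pipeline step until the callback arrives or time runs out."""
        with self._lock:
            event = self._done[task_id]
        if not event.wait(timeout):
            return None
        with self._lock:
            return self._results[task_id]
```

Rejecting callbacks for unknown task ids is a small safeguard; a production endpoint would also want to authenticate the sender before trusting the payload.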

Pattern 4: Scheduled Automations (Copilot, Claude Routines)

Agents that run on a schedule without any human trigger:

  • Claude Code Routines: Managed by Anthropic, run on their cloud infrastructure on a cron schedule. Can spin up sub-agents using split-and-merge patterns.
  • Copilot Automations: Triggered by cron schedules or GitHub events (issue opened, PR merged, etc.).
  • Zylos Scheduler (C5): Self-hosted equivalent -- dispatches tasks to the agent at scheduled times, enabling the same autonomous operation pattern.

The Self-Hosted Alternative

Not every agent needs to run in someone else's cloud. The Zylos architecture represents the self-hosted end of the spectrum, with some unique advantages:

Managed vs. Self-Hosted Comparison

| Aspect | Managed (Claude MA, Codex Cloud) | Self-Hosted (Zylos) |
|---|---|---|
| Setup | API key + config | Full server setup |
| Sandbox control | Platform-defined | Complete control |
| Persistence | Session-scoped | Unlimited (disk) |
| Memory | Context window only | Persistent memory system |
| Communication | API/webhook | Multi-channel (Telegram, Lark, web) |
| Scheduling | Platform-provided | Custom scheduler (C5) |
| Cost model | Per-token + sandbox time | Fixed infra + per-token |
| Data residency | Provider's cloud | Your infrastructure |
| Customization | Config + system prompt | Full code control |

Claude Managed Agents now supports self-hosted sandboxes, creating a hybrid option: Anthropic runs the harness (brain), but the sandbox (hands) runs on your infrastructure. This addresses data residency concerns while still leveraging Anthropic's optimized agent loop.

When Self-Hosted Wins

Self-hosting remains the better choice when:

  • Persistent state is essential: The agent needs a durable filesystem, databases, and long-lived processes (e.g., PM2 services)
  • Multi-channel communication: The agent operates across Telegram, Lark, web console, and other channels simultaneously
  • Custom memory architecture: Tiered memory (identity, state, references, sessions, archive) that persists across all interactions
  • Full autonomy: The agent schedules its own tasks, monitors its own health, and manages its own lifecycle
  • Cost predictability: Fixed infrastructure costs are preferable to usage-based pricing for always-on agents

Performance Characteristics

Cold Start and Latency

Cold start performance varies significantly across sandbox providers:

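| Provider | Cold Start | Warm Start | Isolation |
|---|---|---|---|
| Daytona | ~27-90ms | <10ms | Container |
| E2B | ~150ms | <50ms | microVM |
| Modal | ~200-500ms | <100ms | Container (GPU-capable) |
| Firecracker (raw) | ~125ms | <50ms | microVM |
| GitHub Actions | 15-45s | N/A (always cold) | VM (GitHub-hosted runner) |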

For Claude Managed Agents, Anthropic reported that decoupling the harness from the sandbox eliminated the need to wait for container provisioning before the first model call. The harness starts immediately, begins the model call, and the sandbox provisions in parallel.
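The overlap described here can be illustrated with asyncio. This is a toy model, not Anthropic's code: provision_sandbox, first_model_call, and start_session are hypothetical names, and the sleeps stand in for real provisioning and model latency.

```python
import asyncio


async def provision_sandbox(delay: float) -> str:
    await asyncio.sleep(delay)        # stands in for container or VM startup
    return "sandbox-ready"


async def first_model_call(delay: float) -> str:
    await asyncio.sleep(delay)        # stands in for time-to-first-token
    return "tool_use: run tests"


async def start_session(provision_s: float, model_s: float) -> tuple[str, str]:
    # Kick off provisioning in the background, then call the model right away.
    sandbox_task = asyncio.create_task(provision_sandbox(provision_s))
    first_reply = await first_model_call(model_s)
    # The sandbox is only awaited once the model has asked for a tool.
    sandbox = await sandbox_task
    return first_reply, sandbox
```

With this shape, the wall-clock time before the first tool call is roughly the larger of the two delays rather than their sum, and a session whose first reply needs no tool never has to wait for the sandbox.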

Cost Considerations

Cloud agent execution has two cost components: model tokens and sandbox compute time.

  • E2B / Daytona: ~$0.05/vCPU-hour
  • Modal: ~$0.06/vCPU-hour (CPU), $2.78/GPU-hour (A10G)
  • Claude Managed Agents: Token pricing + sandbox time (beta pricing TBD)
  • Codex Cloud: Included in ChatGPT subscription tiers (Plus at $20/month, Pro at $200/month)
  • Jules: Included in Gemini subscription
  • Copilot Cloud Agent: Included in GitHub Copilot subscription + Actions minutes

For high-volume use cases, the per-session sandbox costs can exceed the model token costs. A 30-minute agent session on E2B with 4 vCPUs costs approximately $0.10 in compute alone, plus model tokens.
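The arithmetic behind that figure is vCPUs x hours x rate. A small sketch using the ~$0.05/vCPU-hour rate quoted above; the 1,000-sessions-per-month volume is a made-up number for illustration only.

```python
def sandbox_cost_usd(vcpus: int, minutes: float, usd_per_vcpu_hour: float) -> float:
    """Compute cost of one sandbox session from vCPU count and duration."""
    return vcpus * (minutes / 60.0) * usd_per_vcpu_hour


# 4 vCPUs for 30 minutes at $0.05 per vCPU-hour: 4 * 0.5 * 0.05
per_session = sandbox_cost_usd(vcpus=4, minutes=30, usd_per_vcpu_hour=0.05)

# Hypothetical volume, to show how per-session cost scales.
per_month = per_session * 1_000
```

At that assumed volume the compute bill comes to about $100 a month before any model tokens are counted.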

Emerging Patterns and Best Practices

1. Multi-Brain, Multi-Hand Orchestration

The decoupled architecture enables powerful scaling patterns:

                    ┌─────────┐
                    │ Session  │
                    │  (log)   │
                    └────┬─────┘
                         │
              ┌──────────┼──────────┐
              │          │          │
         ┌────▼───┐ ┌───▼────┐ ┌──▼─────┐
         │Brain 1 │ │Brain 2 │ │Brain 3 │
         │(plan)  │ │(code)  │ │(review)│
         └───┬────┘ └───┬────┘ └───┬────┘
             │          │          │
         ┌───▼────┐ ┌───▼────┐ ┌──▼─────┐
         │Hand A  │ │Hand B  │ │Hand C  │
         │(sandbox│ │(sandbox│ │(sandbox│
         │   1)   │ │   2)   │ │   3)   │
         └────────┘ └────────┘ └────────┘

Claude Managed Agents supports this natively -- multiple harness instances can read from the same session, and each can dispatch work to different sandboxes. Claude Code Routines use a split-and-merge pattern where work is divided across parallel sub-agents.
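A split-and-merge step can be approximated locally with a thread pool. The sketch below is only an analogy for the pattern, not how Claude Code Routines are implemented: split_and_merge and review_file are hypothetical names, and each worker stands in for one brain-plus-sandbox pair.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_and_merge(
    items: list[str],
    worker: Callable[[str], str],
    max_workers: int = 3,
) -> list[str]:
    """Run one worker per item in parallel and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))


def review_file(path: str) -> str:
    # Stand-in for a sub-agent handling one slice of the task.
    return f"reviewed {path}"
```

Executor.map returns results in input order regardless of which worker finishes first, which keeps the merge step deterministic.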

2. Defense-in-Depth Security

Production deployments are converging on layered security:

  • Layer 1: Sandbox isolation (microVM or nested container-in-VM)
  • Layer 2: Network scoping (allowlist, no internet during execution for sensitive tasks)
  • Layer 3: Credential isolation (vault proxy, no secrets in sandbox)
  • Layer 4: Output validation (LLM-as-judge or rule-based checks on generated code)
  • Layer 5: Human approval gates (PR review before merge)

3. Idempotent Task Design

Cloud agents may be interrupted and restarted. Tasks should be designed for idempotency:

  • Use git branches as the unit of work (can be force-pushed on restart)
  • Check for existing PRs before creating new ones
  • Use git stash or workspace snapshots for resumability
  • Design database migrations to be re-runnable
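The "check for existing PRs before creating new ones" rule can be sketched as below. FakeRepoHost is an in-memory stand-in with a hypothetical interface, not a real hosting API; the point is the deterministic branch name and the lookup-before-create step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeRepoHost:
    """In-memory stand-in for a Git hosting API (hypothetical interface)."""
    open_prs: dict[str, int] = field(default_factory=dict)   # branch -> PR number
    _next_number: int = 1

    def find_open_pr(self, branch: str) -> Optional[int]:
        return self.open_prs.get(branch)

    def create_pr(self, branch: str) -> int:
        number = self._next_number
        self._next_number += 1
        self.open_prs[branch] = number
        return number


def branch_for(task_id: str) -> str:
    # Deterministic name, so a restarted run lands on the same branch.
    return f"agent/{task_id}"


def ensure_pr(host: FakeRepoHost, task_id: str) -> int:
    """Return the PR for this task, creating it only if none exists yet."""
    branch = branch_for(task_id)
    existing = host.find_open_pr(branch)
    return existing if existing is not None else host.create_pr(branch)
```

Calling ensure_pr twice for the same task returns the same PR number, so an interrupted agent that restarts from the top does not open a duplicate.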

4. Observability and Cost Attribution

As agents become autonomous background workers, observability becomes critical:

  • Token attribution: Track which task consumed which tokens
  • Session replay: Ability to replay the full event log for debugging
  • Cost alerts: Budget limits per session, per task type, per schedule
  • Failure detection: Liveness checks with automatic restart (similar to Zylos C2 activity monitor)
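Token attribution and per-task budget limits can share one small ledger. A minimal sketch, assuming the runtime reports input and output token counts after each model call; TokenLedger and BudgetExceeded are hypothetical names.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task's cumulative token usage passes its limit."""


class TokenLedger:
    """Attributes token usage to tasks and enforces a per-task budget."""

    def __init__(self, max_tokens_per_task: int) -> None:
        self._limit = max_tokens_per_task
        self._used: dict[str, int] = {}

    def record(self, task_id: str, input_tokens: int, output_tokens: int) -> int:
        """Add usage for one model call and return the task's running total."""
        total = self._used.get(task_id, 0) + input_tokens + output_tokens
        self._used[task_id] = total          # usage is recorded even when over budget
        if total > self._limit:
            raise BudgetExceeded(f"{task_id} used {total} of {self._limit} tokens")
        return total

    def report(self) -> dict[str, int]:
        """Return a copy of the per-task totals for dashboards or alerts."""
        return dict(self._used)
```

Recording the usage before raising keeps the report accurate for the call that tripped the limit, which is usually the one worth inspecting.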

Implications for Zylos

The cloud-hosted runtime landscape validates several Zylos architectural decisions:

  1. Session as event log: Zylos's memory system (identity + state + sessions) plays the same role as Managed Agents' append-only session log but is richer -- it includes structured memory tiers, not just raw events.

  2. Harness-sandbox separation: Zylos already separates the agent (Claude Code runtime) from its execution environment, and can switch runtimes (Claude Code / Codex) without losing state.

  3. Scheduled autonomy: The C5 scheduler enables the same autonomous operation patterns as Copilot Automations and Claude Routines, but with full control over task definitions and scheduling logic.

  4. Multi-channel delivery: While managed runtimes deliver results primarily as PRs, Zylos delivers results through whatever channel the user prefers -- Telegram, Lark, web console, or direct terminal interaction.

The key question for Zylos going forward is whether to adopt Claude Managed Agents' self-hosted sandbox mode for specific workloads (like code generation tasks that benefit from ephemeral, isolated execution) while keeping the persistent agent architecture for everything else.

Conclusion

Cloud-hosted agent runtimes represent the maturation of AI agents from interactive tools to autonomous workers. The architectural convergence around session-harness-sandbox separation, append-only event logs, and defense-in-depth security creates a solid foundation for production deployment.

The landscape is not winner-take-all. Managed runtimes excel at well-scoped, ephemeral tasks (code generation, test writing, refactoring). Self-hosted runtimes excel at persistent, multi-faceted agent operation (always-on assistants, multi-channel communication, custom workflows). The hybrid model -- managed brain, self-hosted hands -- may be the best of both worlds.

For builders: the key architectural takeaway is that the session log is the most important component. If you get durable session management right, everything else -- harness restarts, sandbox replacement, multi-agent orchestration -- becomes a matter of configuration rather than architecture.