Browser Automation Human-Handoff Experiment Results

Date: 2026-01-06 Status: ✅ Validated Successfully

Executive Summary

Validated a background agent + signal file architecture for browser automation with human-in-the-loop capability. The system allows autonomous browser operations while enabling human intervention for challenges like CAPTCHAs, without blocking the main conversation.

Architecture Validated

┌─────────────────────────────────────────────────────────────┐
│                     Main Agent (Claude)                      │
│  - Receives all Telegram messages                           │
│  - Routes "继续" signals to background agents               │
│  - Never blocked by background operations                   │
└─────────────────────────────────────────────────────────────┘
          │                                    ▲
          │ Launch                             │ Results
          ▼                                    │
┌─────────────────────────────────────────────────────────────┐
│                   Background Agent                           │
│  - Executes browser automation tasks                        │
│  - Writes signal file when human help needed                │
│  - Polls for continue signal                                │
│  - Sends Telegram notifications                             │
└─────────────────────────────────────────────────────────────┘
          │                                    ▲
          │ Write                              │ Read
          ▼                                    │
┌─────────────────────────────────────────────────────────────┐
│              Signal File (/tmp/browser-signal.json)          │
│  - {status: "waiting", task: "captcha"}                     │
│  - {status: "continue"}                                     │
└─────────────────────────────────────────────────────────────┘

Experiment Details

Test Scenario

Background agent starts "posting task"
Simulates encountering CAPTCHA
Writes waiting signal, sends Telegram notification
Polls for continue signal (2s intervals)
Howard says "继续"
Main agent writes continue signal
Background agent detects, continues execution
Task completes successfully

Key Findings

Component	Result	Notes
Background agent independence	✅	Runs without blocking main agent
Signal file mechanism	✅	Simple, reliable IPC
Polling detection	✅	2s interval sufficient
Telegram notifications	✅	Real-time updates work
Main agent routing	✅	Can distinguish signals from normal chat
Human handoff flow	✅	Natural conversation pattern

Message Routing Logic

User message received:
  │
  ├─ Is background agent waiting? (check signal file)
  │   │
  │   ├─ YES + message is "继续" → Write continue signal
  │   │
  │   └─ NO or other message → Process normally
  │
  └─ Respond to user

Implementation Components Needed

1. Browser Extension (Chrome MV3)

Service Worker: WebSocket connection to local server
Content Script: DOM manipulation (click, type, read)
Manifest: Permissions for all URLs

2. Local WebSocket Server (Node.js, PM2)

Receives commands from Claude (via file or socket)
Relays to browser extension
Returns results

3. Command Interface

File-based: /tmp/browser-cmd.json (command), /tmp/browser-result.json (result)
Or Unix socket for lower latency

4. Background Agent Protocol

Standard task execution flow
Human intervention detection (CAPTCHA, login required)
Signal file management
Timeout handling

Next Steps

Phase 1: Extension Skeleton
- manifest.json with required permissions
- Service worker with WebSocket client
- Content script with basic DOM operations
Phase 2: Local Server
- WebSocket server (ws library)
- Command queue management
- Result forwarding
Phase 3: Claude Integration
- Command file interface
- Background agent template for browser tasks
- Error handling and retry logic
Phase 4: Platform-Specific
- Xiaohongshu selectors and workflows
- Login state persistence
- Content posting automation

Risk Mitigation

Risk	Mitigation
Extension detection	Use real browser, minimal footprint
WebSocket disconnect	Auto-reconnect with exponential backoff
Command timeout	30s default, configurable per operation
Signal file corruption	JSON validation, atomic writes
Background agent crash	Main agent monitors, can restart

Conclusion

The core mechanism is validated and ready for implementation. The architecture supports:

Non-blocking browser automation
Human intervention when needed
Real-time status updates
Graceful error handling

Proceed with confidence to build the actual browser extension.