Prompt Engineering for AI Agent Systems: System Prompts, Tool Descriptions, and Instruction Hierarchies
Executive Summary
Production AI agent systems have moved far beyond basic prompt tips into sophisticated prompt architecture. All major agents (Claude Code, Cursor, Devin, Codex) converge on a five-layer system prompt anatomy: identity framing, behavioral rules, typed tool APIs, safety layers, and conditional sections assembled at runtime. Claude Code's system prompt alone is 110+ separate instruction strings totaling 16,000-25,000 tokens. Tool descriptions have become a critical engineering surface — vague descriptions are the primary driver of tool selection errors, and 31 tools add approximately 4,500 tokens per query. Instruction hierarchy (system > user > tool output) is now trained into models, achieving +63% defense against prompt extraction, though RL-based attacks still achieve 98% bypass rates against current defenses. The dominant multi-turn pattern across all production agents is plan-execute-observe-repeat with bounded iterations, but even 90%+ single-turn accuracy degrades to 10-15% success across full multi-step conversations — making coherence the central unsolved challenge.
System Prompt Architecture: The Constitutional Document Pattern
Five-Layer Anatomy
All major production agents converge on a common structure:
- Identity framing — "You are X, built by Y" (~100 tokens). Sets the agent's role and capabilities.
- Behavioral rules — Task execution principles, reversibility guidelines, tone, output format. The operational constitution.
- Typed tool APIs — Explicit per-tool instruction sections beyond just JSON schemas. How to use each tool, when to prefer one over another, what side effects to expect.
- Safety layers — Embedded as operational rules woven throughout the prompt, not a separate block. Security is architectural, not appended.
- Conditional sections — Mode-specific instructions (plan mode, auto mode, minimal mode) assembled dynamically at runtime.
Claude Code's system prompt is not a monolithic document but 110+ separate instruction strings conditionally assembled based on context — active tools, current mode, project configuration, and skill invocations.
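Conditional assembly can be sketched as a list of instruction fragments, each gated by a predicate over the agent's current context. This is an illustrative model, not Claude Code's actual internals; all names (`Fragment`, `AgentContext`) are assumptions.

```python
# Sketch: assembling a system prompt from conditionally included fragments.
# Illustrative only -- fragment text and context fields are invented examples.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentContext:
    mode: str = "default"                    # e.g. "plan", "auto", "minimal"
    active_tools: set = field(default_factory=set)

@dataclass
class Fragment:
    text: str
    applies: Callable[[AgentContext], bool] = lambda ctx: True  # included by default

FRAGMENTS = [
    Fragment("You are a coding agent."),
    Fragment("Before editing, present a plan and wait for approval.",
             applies=lambda ctx: ctx.mode == "plan"),
    Fragment("Use the Bash tool for shell commands; never pass secrets on the command line.",
             applies=lambda ctx: "bash" in ctx.active_tools),
]

def assemble_prompt(ctx: AgentContext) -> str:
    # Keep only the fragments whose predicate holds for this context
    return "\n\n".join(f.text for f in FRAGMENTS if f.applies(ctx))

print(assemble_prompt(AgentContext(mode="plan", active_tools={"bash"})))
```

Each fragment is an individually testable unit: supporting a new mode means adding a predicate, not editing a monolithic document.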
Project-Layer Overrides: CLAUDE.md and .cursorrules
CLAUDE.md is dynamically injected as a `<system-reminder>` block, not hardcoded into the base prompt. This enables repository-specific behavior without modifying the base agent. The skills architecture extends this further: SKILL.md files are loaded only when invoked ("on-demand prompt expansion"). Cursor's .mdc rules files follow the same pattern — project-level behavioral overrides without touching the core system prompt.
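The injection step can be sketched as a message-assembly hook that wraps the project file, when present, in a reminder block. The wrapper format and function name here are assumptions for illustration:

```python
# Sketch: injecting project-level overrides at message-assembly time
# rather than baking them into the base system prompt.
from pathlib import Path

def project_reminder(repo_root: str) -> list[dict]:
    """Return extra system messages carrying repo-specific rules, if any."""
    rules = Path(repo_root) / "CLAUDE.md"
    if not rules.exists():
        return []                      # no project overrides: inject nothing
    return [{
        "role": "system",
        "content": f"<system-reminder>\n{rules.read_text()}\n</system-reminder>",
    }]
```

Because the override lives outside the base prompt, updating CLAUDE.md changes behavior on the next turn without redeploying the agent.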
Evolution: Shorter and Less Prescriptive
The long-term trend in system prompt design is toward brevity. Claude Code 2.0 (Sonnet 4.5, September 2025) shifted from "MUST" to "should", from "NEVER" to "NEVER... unless explicitly instructed", and removed entire code-convention blocks that the model had absorbed through RLHF training. The exception: safety-critical instructions (git operations, destructive actions) got tighter, not looser — addressing real incidents from earlier versions.
The principle: as model capability increases, reduce prompt verbosity. Find the smallest set of high-signal tokens that maximize the likelihood of the desired outcome.
Tool Description Best Practices
The Choice Overload Problem
Tool hallucinations come in two forms: function selection errors (calling non-existent or wrong tools) and parameter hallucination (fabricating input values). The root cause is often choice overload — presenting all tool descriptions in every prompt. At 31 tools, descriptions alone consume approximately 4,500 tokens per query with measurable accuracy degradation.
Engineering Tool Descriptions
| Practice | Rationale |
|---|---|
| State purpose, constraints, and side effects explicitly | Vague descriptions force the model to guess behavior |
| One clear domain per tool | Multi-purpose tools create ambiguous selection signals |
| Disclose side effects upfront | Hidden consequences lead to misuse |
| Strong typed input schemas with ranges and patterns | Ambiguous inputs drive parameter hallucination |
| Include usage guidance and follow-up steps | Reduces mid-task confusion |
The primary test: if a human engineer cannot definitively say which tool to use in a given situation, an AI agent cannot be expected to do better.
Semantic Tool Filtering
Production systems should filter tools before the agent sees them — use vector similarity against the current query to load only relevant tool definitions. This is the same principle as the on-demand skill loading architecture: do not pay the context cost for tools the agent will not need in the current step.
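A minimal sketch of the filtering step, using a toy bag-of-words similarity in place of a real embedding model (`embed` would normally call an embedding API; the tool set and tokenizer here are invented for illustration):

```python
# Sketch of semantic tool filtering: rank tool descriptions by similarity
# to the query and expose only the top-k to the agent.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase, keep words longer
    # than 3 chars, strip a trailing plural 's'
    return Counter(w.rstrip("s") for w in text.lower().split() if len(w) > 3)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_tools(query: str, tools: dict[str, str], k: int = 3) -> list[str]:
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:k]

TOOLS = {
    "read_file": "Read the contents of a file from disk",
    "write_file": "Write or overwrite a file on disk",
    "run_tests": "Run the project test suite and report failures",
    "search_web": "Search the web for documentation",
}
print(filter_tools("fix the failing unit tests in the repo", TOOLS, k=2))
```

With 31 tools costing roughly 4,500 tokens, loading only the top handful per query recovers most of that budget for actual task context.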
MCP Tool Description Standards
MCP standardizes tool descriptions across providers. Tools declare side effects via readOnlyHint and destructiveHint annotations. Critical security note: treat tool description annotations as untrusted unless from a verified server — descriptions themselves are an injection vector.
```json
{
  "name": "specific_verb_noun",
  "description": "Clear purpose. Constraints. Side effects. Usage guidance.",
  "inputSchema": { "type": "object", "properties": {}, "required": [] },
  "annotations": { "readOnlyHint": true, "destructiveHint": false }
}
```
Instruction Hierarchy and Priority
The Formal Model
Without explicit hierarchy, LLMs treat all inputs equally — enabling injection through any channel.
Priority 0 — System messages (application developer)
Priority 10 — User messages (end user)
Priority 30 — Tool output (web results, API responses, third-party content)
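In code, the hierarchy amounts to tagging each message with a privilege level and resolving conflicts in favor of the more privileged source (lower number wins). A minimal sketch, with invented message shapes:

```python
# Sketch: privilege levels per message role; lower number = higher privilege.
PRIORITY = {"system": 0, "user": 10, "tool": 30}

def more_privileged(a: dict, b: dict) -> dict:
    """Return the message whose instructions should win on conflict."""
    return a if PRIORITY[a["role"]] <= PRIORITY[b["role"]] else b
```

Runtime tagging like this complements, but does not replace, training the hierarchy into the model itself.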
OpenAI trained GPT-3.5 Turbo to enforce this hierarchy with two data-generation techniques: Context Synthesis (decomposing aligned requests across hierarchy levels and training the model to follow them) and Context Ignorance (training the model to answer "as if it never saw" conflicting lower-priority instructions). Results: +63% defense against system prompt extraction and +30% jailbreak robustness, generalizing to unseen attack types.
The Arms Race
RL-Hammer attacks achieved approximately 98% success against GPT-4o's Instruction Hierarchy defense. Training-based hierarchy is necessary but not sufficient against adaptive adversaries. Prompt injection remains OWASP LLM Top 10 #1 in 2025.
Practical Layered Defense
- XML/delimiter separation between instruction blocks and user data
- Filter known injection patterns: "ignore previous", "act as", DAN variants
- LLM guardrails layer (Bedrock Guardrails, Azure AI Content Safety, NeMo)
- RAG source authentication — 5 poisoned documents can manipulate 90% of RAG responses
- Structured queries — treat user input as data structures, not natural language (USENIX Security 2025)
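Two of the cheaper layers above can be sketched directly: known-pattern filtering and delimiter separation of untrusted content. The patterns here are illustrative, not exhaustive, and pattern filters alone do not stop adaptive attacks:

```python
# Sketch: two cheap defense layers -- injection-pattern filtering and
# delimiter separation of third-party content. Patterns are illustrative.
import re

INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous (instructions|prompts)", re.I),
    re.compile(r"\bact as\b", re.I),
    re.compile(r"\bDAN\b"),
]

def flag_injection(text: str) -> bool:
    """Heuristic screen for known injection phrasings."""
    return any(p.search(text) for p in INJECTION_PATTERNS)

def wrap_untrusted(text: str) -> str:
    """Delimit third-party content so the model can treat it as data."""
    return f"<untrusted_content>\n{text}\n</untrusted_content>"
```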
Context Window Management
Static vs Dynamic Content
| System Prompt (Static) | Dynamic Injection (Just-in-Time) |
|---|---|
| Identity and role | Retrieved documents / RAG results |
| Core behavioral rules | Current file or code being edited |
| Tool definitions | Conversation summary |
| Security constraints | User profile and preferences |
| Output format guidance | Skill instructions (on invocation) |
Three Long-Horizon Strategies
Compaction: Summarize conversation history and reinitiate with compressed summaries. Preserve architectural decisions and unresolved problems; discard redundant tool outputs. Best for conversational tasks.
Structured note-taking: Persistent external memory (CLAUDE.md, session files) outside the context window. The agent reads and writes to external files rather than relying on in-context history. Best for iterative development with clear milestones.
Sub-agent architectures: Clean context windows per sub-agent, returning condensed summaries (1,000-2,000 tokens) to the coordinator. Always scope what the sub-agent sees — suppress ancestral history. Best for parallel exploration and preventing context contamination.
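The compaction strategy can be sketched as a token-budget check that replaces the oldest turns with a summary while keeping recent ones intact. `summarize` stands in for an LLM call, and the token counter is a crude proxy; both are assumptions for illustration:

```python
# Sketch of compaction: when history exceeds a token budget, collapse the
# oldest turns into a summary message and keep the most recent turns verbatim.

def count_tokens(msg: dict) -> int:
    return len(msg["content"].split())    # crude proxy for a real tokenizer

def summarize(messages: list[dict]) -> str:
    # Placeholder for an LLM summarization call that would preserve
    # architectural decisions and unresolved problems, and drop
    # redundant tool outputs
    return "Summary of earlier turns: " + "; ".join(m["content"][:40] for m in messages)

def compact(history: list[dict], budget: int, keep_recent: int = 4) -> list[dict]:
    if sum(count_tokens(m) for m in history) <= budget:
        return history                    # still within budget: no-op
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [{"role": "system", "content": summarize(old)}] + recent
```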
Multi-Turn Agent Coherence
The Coherence Gap
Agents scoring 90%+ on individual tool calls succeed in only 10-15% of full multi-step conversations. Context drift and goal loss emerge across turns, not within single turns. This is the central unsolved challenge in agent engineering.
Production Coherence Mechanisms
One-action-per-iteration loops: All production agents use plan → execute → observe → repeat. Each iteration is bounded to a single meaningful action.
Verifiable intermediate goals: Tests as ground truth. Cursor's pattern: write failing tests first, implement until they pass. This provides objective checkpoints independent of model confidence.
Iteration limits with hooks: Cursor's .cursor/hooks.json loops agents until a success condition is met, with a maximum iteration cap. Autonomous completion without infinite loops.
Fresh context for subtasks: Start a new conversation when switching tasks or when the agent shows confusion. Sub-agent architectures provide clean windows by design.
Goal re-injection: For very long tasks, periodically re-inject the original goal to counter drift. Devin's planning phase before each major action implements this pattern.
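Three of these mechanisms compose naturally into one bounded loop: one action per iteration, a hard iteration cap, and periodic goal re-injection. A sketch, where `step` and `is_done` stand in for the model call and the success check (e.g. a passing test suite):

```python
# Sketch: bounded plan-execute-observe loop with goal re-injection.
# step() performs one planned action and returns (action, observation);
# is_done() is an objective success check, e.g. "do the tests pass".

def run_agent(goal, step, is_done, max_iters: int = 20, reinject_every: int = 5):
    state = {"goal": goal, "history": []}
    for i in range(max_iters):
        if reinject_every and i > 0 and i % reinject_every == 0:
            # Counter goal drift by periodically restating the original goal
            state["history"].append(("system", f"Reminder: original goal is {goal}"))
        action, observation = step(state)           # plan + execute one action
        state["history"].append((action, observation))  # observe the result
        if is_done(state):
            return state, True
    return state, False    # iteration cap hit: stop and escalate to a human
```

The cap converts an infinite-loop failure mode into an explicit escalation point, which is the same guarantee hooks-based iteration limits provide.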
Research Direction
Multi-turn PPO with both outcome rewards and intermediate step rewards produces significantly more coherent long-horizon behavior than single-turn training — beginning to appear in production model capabilities.
Production System Examples
Claude Code (v2.1.87)
- 110+ conditional instruction strings, 16-25K tokens at runtime
- Skills loaded via meta Skill tool call, not embedded in system prompt
- Tool permissions scoped per skill invocation via `contextModifier` (least-privilege)
- 40+ conditional system reminders injected based on context
- CLAUDE.md as project-level override, dynamically injected
Cursor
- Minimal edits mandate: `// ... existing code ...` markers to prevent full rewrites
- `.cursor/rules/` for persistent project context, `.cursor/commands/` for reusable workflows
- Natural language skill matching — skill descriptions are themselves the routing signal
- Hooks-based iteration loops for agent autonomy with bounds
Devin
- Full-stack engineer scope: planning, coding, testing, deployment
- Citation-mandatory: file path + line numbers for every claim ("code archaeology" framing)
- Planning phase before each major action serves as goal re-injection
Implications for AI Agent Platform Design
- System prompts are software, not prose. Treat them as conditionally assembled code with version control, testing, and gradual rollout. Claude Code's 110+ instruction strings are individually testable units.
- Tool descriptions are the highest-leverage prompt surface. A well-written tool description prevents more errors than pages of behavioral instructions. Invest engineering time in tool descriptions proportional to their impact on agent accuracy.
- On-demand loading is the scaling pattern. Do not load all tools, all skills, and all context into every prompt. Load identity + rules always; load everything else only when needed. This is how you scale to hundreds of tools without degrading accuracy.
- Instruction hierarchy must be both trained and enforced. Training-level hierarchy defenses are necessary but breakable. Layer runtime defenses (input filtering, output validation, structured queries) on top.
- Multi-turn coherence requires architectural solutions, not just better prompts. Bounded iteration loops, verifiable checkpoints (tests), sub-agent isolation, and periodic goal re-injection are engineering patterns, not prompt tricks.
- Reduce prompt length over time. As models absorb conventions through training, remove redundant instructions. The optimal system prompt shrinks as the model improves — but safety constraints should tighten, not loosen.
Sources
- Piebald-AI/claude-code-system-prompts — complete Claude Code prompt decomposition
- Anthropic Engineering: Effective Context Engineering for AI Agents
- arXiv:2404.13208 — The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
- Cursor: Agent Best Practices (official blog)
- MCP Specification 2025-11-25 and MCP Best Practices
- USENIX Security 2025: Structured Queries Defense against Prompt Injection
- OWASP LLM01:2025 — Prompt Injection
- arXiv:2505.11821 — Multi-Turn RL for Long-Horizon Agent Tasks

