Zylos
2026-04-08

AI Agent Skill Acquisition and Self-Improvement Architectures

ai-agents · skill-acquisition · self-improving-ai · architecture · llm

Executive Summary

The way AI agents acquire and apply new capabilities is undergoing a fundamental shift. Rather than baking every skill into model weights or stuffing instructions into a single monolithic system prompt, the field is converging on a layered model: agents load procedural knowledge on demand from composable, portable skill packages — and the most advanced systems are beginning to write, evaluate, and replace those packages themselves.

Two parallel developments define the 2025–2026 state of the art. At the practical end, Anthropic's open Agent Skills specification (released December 2025) established a vendor-neutral standard for SKILL.md packages now adopted as a runtime/platform standard by Microsoft, OpenAI, Cursor, and GitHub — with companies like Atlassian and Figma participating as partner skill publishers rather than runtime adopters. At the research frontier, Meta's HyperAgents (March 2026) demonstrated that agents can modify not just their task-solving code but their own improvement mechanisms — a recursive loop previously considered an unsolved theoretical problem. Together these two threads — ecosystem-level skill portability and model-level metacognitive self-modification — describe where agent capability development is headed.

For teams building production agent systems today, the practical takeaway is that skill architecture is now a first-class design concern. How skills are packaged, discovered, trusted, and eventually retired matters as much as which foundation model sits underneath them.

The Skill Packaging Abstraction

From Prompt Stuffing to Progressive Disclosure

Early agent systems handled capability expansion the obvious way: add more instructions to the system prompt. The problem is linear token cost. Each new capability adds hundreds of tokens loaded on every request, whether relevant or not. An agent with twenty capabilities might burn four thousand tokens before reading the user's first message.

The Agent Skills paradigm inverts this. A skill is a self-contained bundle — typically a SKILL.md description file, optional helper scripts, and reference materials — stored on disk and loaded only when the agent determines it is needed. The agent begins each session with a compact index of available skills, selects relevant ones based on the incoming task, and loads their full content into context on demand.
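The load-on-demand flow can be sketched in a few lines. This is a simplified illustration, not a reference implementation: the directory layout and the line-based frontmatter parsing are our assumptions.

```python
from pathlib import Path

def build_skill_index(skills_dir: str) -> dict[str, str]:
    """Scan skill packages and return {name: one-line description}.

    Only this compact index enters the agent's base context; the full
    SKILL.md body is loaded later, on demand.
    """
    index = {}
    for skill_md in Path(skills_dir).glob("*/SKILL.md"):
        name, description = None, None
        for line in skill_md.read_text().splitlines():
            if line.startswith("name:"):
                name = line.split(":", 1)[1].strip()
            elif line.startswith("description:"):
                description = line.split(":", 1)[1].strip()
        if name and description:
            index[name] = description
    return index

def load_skill(skills_dir: str, name: str) -> str:
    """Pull a skill's full instructions into context only when needed."""
    return (Path(skills_dir) / name / "SKILL.md").read_text()
```

The token economics follow directly: the base context pays one index line per skill instead of the full instruction body, so cost grows with the number of skills *used*, not the number installed.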

This progressive disclosure pattern, pioneered in practice by Claude Code's skill system and formalized by Anthropic's December 2025 open standard, yields measurable results. Research comparing multi-agent orchestration to single-agent-with-skills architectures found a 54% reduction in token usage and 50% reduction in latency on equivalent benchmarks, by internalizing agent behaviors as selectable skills and eliminating inter-agent communication overhead.

The agentskills.io Open Standard

Anthropic published the agentskills.io specification in December 2025 as a vendor-neutral format, and adoption was rapid. The specification defines:

  • Naming conventions: lowercase, hyphen-separated, 64-character maximum
  • SKILL.md structure: frontmatter with metadata, capability description, trigger conditions, usage instructions, and optional references to supporting files
  • Progressive loading contract: agents receive skill names and one-line descriptions in their base context; full content is loaded on demand
  • Cross-platform portability: skills created for Claude Code work with Codex, Spring AI, and pydantic-ai-skills without modification

Microsoft's .NET AI Skills Executor, released in February 2026, brought the pattern to the Azure OpenAI ecosystem. Spring AI's generic agent skills module (January 2026) made skills a first-class abstraction in the Java enterprise stack. The SkillsMP marketplace launched as a distribution channel for community-contributed skills, accumulating over 12,000 indexed packages within three months.

The practical effect is that "skill" has become a unit of organizational knowledge that travels across model providers, languages, and runtimes. A team that codifies its deployment runbook as a skill can use it with Claude today and OpenAI tomorrow without rewriting anything.

Skill Acquisition Methods

Human Authorship (Current Baseline)

The dominant production pattern remains human-authored skills: practitioners observe recurring agent tasks, extract the procedural logic, write it as a SKILL.md file, commit it to version control, and deploy it to the agent's skill directory. This approach is auditable, testable, and governable — properties that matter in production — but it scales poorly. Writing skills is expert work, and the bottleneck is human attention.

Reinforcement Learning with Skill Libraries

Research from 2025 explored training agents to select and sequence skills using RL, treating the skill library as an action space. CycleQD is a representative example: it applies the Quality Diversity (MAP-Elites) algorithm in a cyclic fashion, optimizing one skill's performance metric in isolation while treating other skills as behavioral descriptors. The result is a library of diverse, non-redundant specialists that collectively cover a task space more efficiently than a single generalist.

The limitation is that skills learned through RL remain model-internal. The agent learns which skill invocations lead to reward, but the learned associations live in model weights — they cannot be inspected, shared with another agent, or governed by a policy. Bridging the gap between internal skill selection and externalized, auditable skill artifacts is an active research problem.

Automated Skill Discovery (EXIF / SEAgent)

A more ambitious approach generates entirely new skills from environmental exploration. The EXIF framework (2025) uses a two-agent setup: an exploration agent (Alice) navigates an environment and surfaces tasks that are achievable but non-trivial; the framework then uses that trajectory data to train a target agent (Bob) on the discovered skills. This exploration-first strategy addresses a core challenge in automated skill generation: most LLM-proposed tasks are either already trivially solved or structurally infeasible, providing no useful learning signal.

SEAgent (Self-Evolving Computer Use Agent, 2025) takes a related approach for GUI automation: the agent builds an experience library from past task executions, retrieves relevant episodes during new tasks, and gradually accumulates a repertoire of reusable interaction patterns. The skills are implicit in the retrieval index rather than explicit in human-readable files, trading inspectability for acquisition speed.

Mining Open-Source Repositories for Skills

A March 2026 paper introduced a framework for automated skill extraction from large-scale open-source agentic repositories. The system ingests GitHub repositories containing agent implementations, identifies recurring procedural patterns (tool call sequences, error handling idioms, planning heuristics), and synthesizes candidate SKILL.md files that human reviewers can approve or reject. Early results showed that 60–70% of synthesized skills passed review without modification, dramatically reducing the human effort required to build a production skill library.
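The core mining idea, stripped to its essentials, is frequency analysis over agent trajectories: tool-call sequences that recur across many runs are candidates for extraction into a skill. The sketch below is our simplified illustration of that idea, not the paper's pipeline.

```python
from collections import Counter

def frequent_tool_sequences(trajectories: list[list[str]],
                            n: int = 3,
                            min_count: int = 2) -> list[tuple[tuple[str, ...], int]]:
    """Find tool-call n-grams that recur across agent trajectories.

    Recurring sequences are skill candidates; a human reviewer (or a
    downstream synthesis step) decides what becomes a SKILL.md file.
    """
    counts = Counter()
    for calls in trajectories:
        for i in range(len(calls) - n + 1):
            counts[tuple(calls[i:i + n])] += 1
    return [(seq, c) for seq, c in counts.most_common() if c >= min_count]
```

A real system would mine richer structure (error-handling idioms, planning heuristics), but the recurrence signal is the same.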

The SkillX project (April 2026) pushed this further, building an automatic pipeline that constructs skill knowledge bases from code repositories without human review, targeting use cases where speed matters more than auditability.

HyperAgents: Metacognitive Self-Modification

The most significant research result of early 2026 in this space is Meta's HyperAgents, published in March (arxiv:2603.19461). HyperAgents extend the Darwin Gödel Machine (DGM) into DGM-H by collapsing the task agent and the meta agent into a single, jointly editable codebase.

The Three-Loop Architecture

Prior self-improving systems separated task execution from meta-improvement: the task agent solved problems, the meta agent modified the task agent, but the meta agent itself was fixed by human design. This creates an implicit ceiling — the improvement mechanism can only do what its human designers anticipated.

DGM-H adds a third loop:

  1. Task Execution Loop: Standard perception-reason-act cycle using foundation model calls and tool invocations
  2. Evaluation Loop: Executable evaluation functions (unit tests, simulation scores, alignment metrics) produce measurable feedback on agent variants
  3. Meta-Modification Loop: The system modifies both its task logic and its improvement logic based on evaluation feedback, drawing on an archive of historically successful variants
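A schematic of how the loops interlock, written as our own simplification rather than Meta's implementation: each variant in the archive carries editable components, `evaluate` runs the variant on tasks (so loop 1 is implicit inside it), and `modify` stands in for the variant's own meta-modification code.

```python
def dgm_h_step(archive: list[dict], evaluate, modify) -> list[dict]:
    """One schematic iteration of the three-loop pattern.

    archive  : agent variants kept because they once scored well
    evaluate : executable evaluation function -> score      (loop 2)
    modify   : the current meta-modification policy          (loop 3)
    """
    parent = max(archive, key=evaluate)      # select a strong historical variant
    child = modify(parent)                   # edit task logic AND improvement logic
    if evaluate(child) >= evaluate(parent):  # every change grounded in an executable signal
        archive.append(child)                # archive grows with successful variants
    return archive
```

The key structural point the sketch preserves: acceptance is decided by an executable evaluation, not by another meta-agent, which is how the regress is cut off.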

Because the meta-modification loop can rewrite the evaluation functions and the archive management code, the system can improve its own improvement process — the property the authors call metacognitive self-modification. The infinite regress problem (who improves the improver?) is addressed by grounding all modifications in executable evaluation signals rather than another meta-agent.

Emergent Engineering Infrastructure

A striking finding is that HyperAgents autonomously developed engineering infrastructure not specified by the researchers: persistent memory for consolidated insights, causal hypothesis tracking, compute-aware planning, and performance trend analysis. These emerged from the system's need to make better self-modification decisions across iterations, not from explicit reward for building them.

Benchmark Performance

On paper review tasks, DGM-H improved from 0.0 to 0.710 on the test set (CI: 0.590–0.750), constructing multi-stage evaluation pipelines with structured checklists and decision rules. Critically, agents trained on paper review and robotics domains transferred to Olympiad-level math grading, achieving an imp@50 score of 0.630, while traditional DGM meta-agents scored 0.0 on the same transfer task. This generalization suggests that metacognitive self-modification learns domain-agnostic improvement strategies, not just task-specific patches.

Security: The Skill Trust Problem

The rapid expansion of community-contributed skill libraries has surfaced a security problem that the field was slow to anticipate. The arxiv:2602.12430 survey found that 26.1% of community-contributed skills contain security vulnerabilities. Independent audits in early 2026 confirmed the picture:

  • 7.1% of indexed skills contained plaintext credentials embedded in reference files
  • 36.7% of MCP servers (a closely related ecosystem) were potentially vulnerable to SSRF attacks
  • Prompt injection vectors were discovered in skills that accept user-provided text and embed it into downstream tool calls

Governance Frameworks

Several frameworks emerged in response:

OWASP Top 10 for Agentic Applications (December 2025) is the first formal taxonomy of autonomous agent risks. Relevant categories for skill systems include: agentic supply chain vulnerabilities (untrusted skill packages), memory and context poisoning (malicious skills corrupting agent state), unexpected code execution (skills with embedded runnable payloads), and rogue agent behavior (skills that escalate agent permissions beyond their stated scope).

CSA Agentic Trust Framework (February 2026) applies Zero Trust principles to agent skill loading. Its "Intern-to-Principal" maturity model maps directly to skill trust levels: new skills from unknown sources execute in a restricted sandbox with read-only tool access; skills that accumulate a verified track record earn broader permissions. Trust is not granted statically at install time but earned dynamically through demonstrated behavior.
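An "earned trust" policy of this kind is straightforward to state in code. The toy ledger below is illustrative only (the level names, promotion threshold, and demote-on-incident rule are our assumptions, not the CSA framework's actual API): skills start sandboxed and are promoted after a streak of incident-free runs.

```python
class SkillTrustLedger:
    """Toy 'Intern-to-Principal' trust policy: permissions are earned
    through demonstrated behavior, not granted at install time."""

    LEVELS = ["sandbox", "read_only", "read_write", "full"]

    def __init__(self, promotions_after: int = 10):
        self.promotions_after = promotions_after
        self.records = {}  # skill name -> {"level": int, "clean_runs": int}

    def record_run(self, skill: str, incident: bool) -> str:
        """Record one execution and return the skill's current trust level."""
        rec = self.records.setdefault(skill, {"level": 0, "clean_runs": 0})
        if incident:
            rec["level"], rec["clean_runs"] = 0, 0  # any incident demotes to sandbox
        else:
            rec["clean_runs"] += 1
            if (rec["clean_runs"] >= self.promotions_after
                    and rec["level"] < len(self.LEVELS) - 1):
                rec["level"] += 1       # promote one tier per clean streak
                rec["clean_runs"] = 0
        return self.LEVELS[rec["level"]]
```

A production version would also weight incidents by severity and persist the ledger, but the shape of the policy is the same: trust is a function of observed history.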

Microsoft Agent Governance Toolkit (April 2026) provides open-source runtime enforcement of these principles, including a semantic intent classifier for detecting goal hijacking, capability sandboxing for MCP security gateways, and DID-based identity with behavioral trust scoring.

The four-tier gate-based permission model proposed in the arxiv:2602.12430 survey offers a practical reference architecture:

| Tier | Provenance | Permissions | Review |
|------|------------|-------------|--------|
| 0 | First-party (owner-authored) | Full | None |
| 1 | Verified publisher | Read + limited write | Automated scan |
| 2 | Community (signed) | Read-only | Human review |
| 3 | Unverified | Sandbox only | Quarantine |
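Such a gate reduces to a small policy table plus a check at skill-invocation time. The encoding below is our illustration of the survey's four tiers (field names and the treatment of "limited write" are our simplifications):

```python
from enum import IntEnum

class Tier(IntEnum):
    FIRST_PARTY = 0
    VERIFIED_PUBLISHER = 1
    COMMUNITY_SIGNED = 2
    UNVERIFIED = 3

# One policy row per provenance tier (tier 1's write access is "limited"
# in the source table; modeled here as write=True for simplicity).
TIER_POLICY = {
    Tier.FIRST_PARTY:        {"read": True,  "write": True,  "sandboxed": False},
    Tier.VERIFIED_PUBLISHER: {"read": True,  "write": True,  "sandboxed": False},
    Tier.COMMUNITY_SIGNED:   {"read": True,  "write": False, "sandboxed": False},
    Tier.UNVERIFIED:         {"read": False, "write": False, "sandboxed": True},
}

def gate(tier: Tier, wants_write: bool) -> bool:
    """Return True if a skill at this tier may perform the requested action.

    Sandboxed skills are denied direct access entirely; they run only
    inside the quarantine environment.
    """
    policy = TIER_POLICY[tier]
    if policy["sandboxed"]:
        return False
    return policy["write"] if wants_write else policy["read"]
```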

Implications for Agent Developers

Treat skills as a software artifact class. Skills need the same lifecycle management as code: version control, testing, deprecation policies, and security review. The teams ahead of this curve already have CI/CD pipelines that run automated vulnerability scans on skill packages before deployment.

Design for progressive disclosure from day one. If your agent loads more than 2,000 tokens of instructions unconditionally, you have a prompt-stuffing architecture in disguise. Audit your system prompt and extract capability-specific content into loadable skill packages. The token and latency savings compound quickly as the capability surface grows.

Watch the skill security surface area. Every skill that accepts user-provided input or executes code is a potential injection point. Apply the CSA "Intern-to-Principal" model: new skills from community sources should run with minimal permissions until they demonstrate safe behavior in your environment.

Self-modification is not yet a production pattern. HyperAgents' results are compelling but come with significant caveats: the system requires expensive multi-iteration compute budgets, evaluation functions must be carefully designed (poorly specified rewards produce catastrophic self-modification), and the audit trail for self-modified code is harder to maintain than for human-authored skills. Consider self-improvement techniques for offline capability development — generate candidate skills autonomously, then route them through human review before production deployment.

Skill portability is now a vendor-selection criterion. With the agentskills.io standard adopted across the major platforms, skills you author today should survive model provider switches. If a framework you are evaluating does not support the standard, budget for migration cost when the time comes.

Conclusion

Agent skill acquisition has matured from an open research problem to an engineering discipline. The SKILL.md packaging pattern, backed by a cross-industry open standard, gives practitioners a tractable unit of capability management. Automated skill discovery and repository mining are reducing the bottleneck of human authorship. And at the research frontier, HyperAgents have demonstrated that the boundary between "agent that uses skills" and "agent that improves skills" can be dissolved — with measurable performance gains and striking emergent behavior.

The challenge ahead is governance. A self-improving agent that can author, deploy, and retire its own skills is a qualitatively different system from one that runs a fixed set of human-authored packages. The trust frameworks emerging from OWASP, CSA, and Microsoft provide a foundation, but the hard work of applying them to real production environments — with real organizational risk tolerances — falls to the teams building those systems now.

The field is moving fast enough that the right architecture question is not "should we adopt composable skills?" but "how do we govern a skill ecosystem that is partially machine-generated?" That question will define the next phase of agent platform engineering.