AI Agent Plugin and Extension Architecture
Executive Summary
AI agents have rapidly evolved from monolithic, purpose-built systems into composable platforms where capabilities are packaged as discrete, swappable units. The names differ — tools, skills, plugins, integrations, extensions — but the underlying idea is consistent: separate the agent's reasoning core from the domain-specific capabilities it can invoke. This separation enables capabilities to be added, upgraded, and removed without rewriting the core, mirrors established software engineering principles (SOLID, microservices), and allows teams to ship value incrementally as needs evolve.
The field has converged on several durable patterns in 2025-2026. Manifest-driven discovery (where each plugin carries a machine-readable description of what it does and what it needs) has become the de-facto standard, used by Claude Code Skills, MCP servers, Semantic Kernel plugins, Home Assistant integrations, and VS Code extensions alike. Lifecycle hooks — pre-install, post-install, pre-upgrade, post-upgrade, uninstall — have emerged as essential scaffolding for managing state transitions safely across versions. Security has hardened around a "never trust plugin code with core credentials" principle, with credential proxies and capability-scoped tokens becoming the norm in production deployments.
For small teams building AI agent systems, the most actionable insight is to resist over-engineering: a flat directory of SKILL.md files with well-defined contracts is often more maintainable than a plugin marketplace with dynamic loading. Start with simplicity, define clean interfaces early, and add machinery only when the pain of the current approach is clearly felt.
1. Why Plugin Architectures Matter for AI Agents
The Capability Explosion Problem
Modern AI agents are expected to search the web, write code, send emails, query databases, control browsers, call external APIs, post to social media, and more — often within a single session. Adding all these capabilities monolithically leads to several problems:
- The system prompt balloons, consuming context that could hold reasoning
- Every capability update requires touching (and re-testing) the core system
- Capabilities developed for one agent cannot be reused in another
- Security blast radius expands: a vulnerability in the email tool can affect the code execution tool
- Onboarding contributors requires understanding the entire system before touching any part
Plugin architectures address all of these by establishing a clean boundary between the orchestration core and the capability modules that hang off it.
The Spectrum from Monolith to Plugin System
```
Monolith                Bundled Tools              Plugin System
--------                -------------              -------------
All logic in            Tools registered           Tools discovered
one prompt/file         at startup from            at runtime from
                        a known list               a registry/directory

[Agent Core]            [Agent Core]               [Agent Core]
     |                    /  |  \                       |
  All caps           Tool1 Tool2 Tool3          [Plugin Registry]
  inlined            (all known at                 /    |    \
                     compile time)            Plugin Plugin Plugin
                                              (loaded on demand)
```
The spectrum is not a simple "better to the right" — each position has appropriate use cases. A single-purpose agent that only ever does one thing has no need for a plugin registry. But any agent expected to grow its capabilities over time benefits from moving right.
2. Core Architectural Patterns
2.1 Registry-Based Discovery
The registry pattern separates capability declaration from capability loading. Plugins register themselves (via filesystem conventions, a manifest file, or a remote registry endpoint), and the agent core queries the registry to discover what capabilities are available.
```
Plugin Directory Layout (Filesystem Registry)
---------------------------------------------
skills/
├── web-search/
│   ├── SKILL.md           ← manifest (name, description, activation)
│   ├── scripts/
│   │   └── search.js      ← executable tools
│   └── references/
│       └── api-docs.md    ← reference content injected into context
├── email-sender/
│   ├── SKILL.md
│   └── scripts/
│       └── send.js
└── database-query/
    ├── SKILL.md
    └── scripts/
        └── query.js
```
The filesystem itself is the registry. The agent discovers plugins by walking the directory tree and parsing manifests. This approach (used by Claude Code Skills and Zylos) requires no infrastructure beyond a shared directory and works well for single-host deployments.
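The discovery walk is short enough to sketch directly. The following is an illustrative Python implementation (not Claude Code's or Zylos's actual loader) that treats the filesystem as the registry: it globs for `SKILL.md` files one level down and parses simple `key: value` pairs from the YAML frontmatter without requiring a YAML dependency.

```python
from pathlib import Path

def parse_frontmatter(text):
    """Extract flat key: value pairs from a YAML frontmatter block
    delimited by '---' lines (nested YAML is out of scope here)."""
    manifest = {}
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return manifest
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if ":" in line:
            key, _, value = line.partition(":")
            manifest[key.strip()] = value.strip()
    return manifest

def discover_skills(root):
    """The filesystem is the registry: every subdirectory containing a
    SKILL.md is a plugin, and its manifest is the contract."""
    registry = {}
    for manifest_path in Path(root).glob("*/SKILL.md"):
        manifest = parse_frontmatter(manifest_path.read_text())
        # fall back to the directory name if the manifest omits `name`
        name = manifest.get("name", manifest_path.parent.name)
        registry[name] = manifest
    return registry
```

Because discovery is a pure function of the directory tree, re-running it after adding or removing a skill directory is all the "registration" a single-host deployment needs.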
For distributed deployments, remote registries (JSON endpoints, npm-style package registries, or purpose-built agent registries like Microsoft Entra Agent Registry) provide the same discovery semantics with network access.
2.2 Manifest-Driven Contracts
Every production plugin system uses some form of manifest — a machine-readable file that describes what the plugin does, what it needs, and how to interact with it. Manifests serve as the contract between the plugin and the core:
Claude Code SKILL.md (YAML frontmatter):

```yaml
name: web-search
description: Search the web for current information using Brave Search API
activation: on-demand
requires:
  - env: BRAVE_API_KEY
```
MCP Server manifest (JSON):

```json
{
  "name": "brave-search",
  "version": "1.2.0",
  "description": "Web search via Brave Search API",
  "tools": [
    {
      "name": "search",
      "description": "Search the web",
      "inputSchema": {
        "type": "object",
        "properties": {
          "query": {"type": "string"}
        }
      }
    }
  ]
}
```
Home Assistant manifest.json:

```json
{
  "domain": "hue",
  "name": "Philips Hue",
  "version": "1.0.0",
  "dependencies": ["http"],
  "requirements": ["aiohue==4.7.2"],
  "config_flow": true
}
```
VS Code package.json (extension manifest):

```json
{
  "name": "my-extension",
  "contributes": {
    "commands": [{"command": "myext.hello", "title": "Hello"}],
    "configuration": {
      "properties": {
        "myext.enable": {"type": "boolean", "default": true}
      }
    }
  },
  "activationEvents": ["onCommand:myext.hello"]
}
```
All four use the same conceptual structure: identity metadata, capability declaration, dependency specification, and activation conditions. The specific schema differs but the semantics are identical.
2.3 Static vs. Dynamic Loading
Static loading (plugins loaded at startup): Simpler to reason about, faster at runtime, easier to debug. The full set of capabilities is fixed at agent startup. Changes require restart.
Dynamic loading (plugins loaded on demand): More flexible, supports hot-reload, smaller memory footprint at idle. Requires careful concurrency handling and state management.
The right choice depends on deployment context:
| Deployment Context | Recommended Loading | Reason |
|---|---|---|
| Single session agent (Claude CLI) | Static | Session ends before dynamic changes matter |
| Long-running service (bot, assistant) | Hybrid: static core + dynamic optional | Core stable, optional skills swapped without restart |
| Multi-tenant platform | Dynamic with isolation | Different users get different capability sets |
| Development/debugging | Dynamic with hot-reload | Fast iteration on skill logic |
LangGraph/LangChain's MCP integration exemplifies dynamic loading well: new MCP servers can be added or removed at runtime without restarting the agent, enabling seamless updates to the capability set in production.
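The hybrid row in the table above can be sketched as a host that holds a fixed core set loaded at startup plus a dynamic set that can be registered, hot-swapped, or removed while the agent runs. This is illustrative Python, not any framework's actual API; the class and method names are assumptions for the example.

```python
class PluginHost:
    """Hybrid loading: static core plugins plus runtime-swappable extras."""

    def __init__(self, core_plugins):
        self._core = dict(core_plugins)   # fixed at startup; restart to change
        self._dynamic = {}                # added/removed at runtime

    def register(self, name, plugin):
        """Add or hot-swap an optional plugin without restarting the agent."""
        self._dynamic[name] = plugin

    def unregister(self, name):
        self._dynamic.pop(name, None)

    def capabilities(self):
        # dynamic plugins shadow core plugins that share a name
        return {**self._core, **self._dynamic}
```

The key property is that the core set never changes under the agent's feet, while the dynamic set gives production deployments the "swap a skill without restart" behavior described above.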
2.4 Extension Points vs. Full Plugins
Jenkins popularized the extension point model: the core defines abstract interfaces (extension points), and plugins provide concrete implementations. This is stricter than a generic plugin model — plugins can only extend what the core has explicitly designed for extension.
```
Extension Point Pattern (Jenkins/Grafana model)
-----------------------------------------------
Core defines:                 Plugin implements:
- SearchProvider              - BraveSearchProvider
    + search(query)               + search(query)     → calls Brave API
    + isAvailable()               + isAvailable()     → checks API key
- DataStore                   - PostgreSQLDataStore
    + get(key)                    + get(key)          → SQL SELECT
    + set(key, value)             + set(key, value)   → SQL INSERT
```
For AI agent systems, extension points are appropriate for infrastructure concerns (search providers, storage backends, LLM providers) where the interface is stable. Free-form tool plugins are better modeled as opaque units that expose a name, description, and callable function — the agent core does not care about implementation.
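The extension-point half of that split can be sketched with abstract base classes: the core owns the stable interface, and a plugin ships a concrete implementation. `BraveSearchProvider` here is a stub for illustration; a real one would call the Brave Search API.

```python
from abc import ABC, abstractmethod

class SearchProvider(ABC):
    """Extension point defined by the core: a stable, explicit interface."""

    @abstractmethod
    def search(self, query: str) -> list: ...

    @abstractmethod
    def is_available(self) -> bool: ...

class BraveSearchProvider(SearchProvider):
    """Plugin-supplied implementation (API call stubbed out)."""

    def __init__(self, api_key=None):
        self.api_key = api_key

    def search(self, query):
        # a real implementation would hit the Brave Search API here
        return [f"result for {query}"] if self.is_available() else []

    def is_available(self):
        return self.api_key is not None
```

The core only ever types against `SearchProvider`, so swapping in a different provider never touches core code.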
3. Lifecycle Management
3.1 The Full Plugin Lifecycle
```
Install → Configure → Activate → (normal operation) → Deactivate → Upgrade → Deactivate → Uninstall
   |                      |                                           |
pre-install          post-activate                               pre-upgrade
post-install         pre-deactivate                              post-upgrade
```
Each transition is an opportunity for hooks. Well-designed plugin systems expose lifecycle hooks at every transition:
| Hook | Runs When | Typical Use |
|---|---|---|
| `pre-install` | Before files are written | Validate dependencies, check system requirements |
| `post-install` | After files are written | Initialize database tables, register webhooks, warm caches |
| `pre-upgrade` | Before new version applied | Snapshot current state, run pre-flight checks |
| `post-upgrade` | After new version applied | Run migrations, update config schema, restart dependent services |
| `pre-uninstall` | Before removal | Warn about data loss, export user data |
| `post-uninstall` | After removal | Clean up residual files, deregister webhooks |
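A minimal hook runner for these transitions looks like the sketch below (an illustrative design, not any particular system's API): each hook name maps to an ordered handler list, and a transition runs its handlers in registration order, aborting on the first failure so that a failed `pre-install` blocks the install itself.

```python
class LifecycleManager:
    """Registers and runs plugin lifecycle hooks by transition name."""

    HOOKS = ("pre-install", "post-install", "pre-upgrade",
             "post-upgrade", "pre-uninstall", "post-uninstall")

    def __init__(self):
        self._hooks = {hook: [] for hook in self.HOOKS}

    def on(self, hook, handler):
        if hook not in self._hooks:
            raise ValueError(f"unknown hook: {hook}")
        self._hooks[hook].append(handler)

    def run(self, hook, plugin):
        """Run all handlers for a transition in order; an exception from
        any handler propagates and aborts the transition."""
        for handler in self._hooks[hook]:
            handler(plugin)
```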
3.2 Version Compatibility and Dependency Resolution
Dependency resolution in plugin systems is notoriously difficult. Real-world systems (Jenkins, WordPress, npm, OSGI) have all faced and partially solved this problem. The key design decisions:
Semantic versioning contracts: Define clearly what constitutes a breaking change. If a plugin declares `requires: core >= 1.2.0 < 2.0.0`, the system can enforce this at install time rather than discovering incompatibility at runtime.
Dependency graphs: Before installing or upgrading, compute a full dependency graph. Detect cycles, version conflicts, and missing dependencies before making any filesystem changes.
Graceful degradation on version mismatch: If a plugin requires `core >= 1.5.0` and the current core is 1.4.2, report this clearly rather than silently failing. Some systems allow plugins to declare soft dependencies (suggested) vs. hard dependencies (requires), enabling partial operation when soft dependencies are unavailable.
Upgrade order matters: When upgrading multiple interdependent plugins, upgrade leaf dependencies first, then the plugins that depend on them. Doing this in reverse can leave the system in a broken state mid-upgrade.
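"Leaf dependencies first" is a topological sort of the dependency graph. The sketch below (illustrative, not a real package manager's resolver) computes an upgrade order from a declared dependency map and rejects cycles before any filesystem changes are made.

```python
def upgrade_order(deps):
    """deps: {plugin: [plugins it depends on]} → leaves-first upgrade order.

    Raises ValueError on a dependency cycle, so the caller can abort
    before touching any files.
    """
    order = []
    state = {}  # 1 = visiting (on the current path), 2 = done

    def visit(node):
        if state.get(node) == 2:
            return
        if state.get(node) == 1:
            raise ValueError(f"dependency cycle at {node}")
        state[node] = 1
        for dep in deps.get(node, []):
            visit(dep)
        state[node] = 2
        order.append(node)  # appended only after all its dependencies

    for node in deps:
        visit(node)
    return order
```

Running this before an upgrade gives both cycle detection and the correct ordering in one pass.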
3.3 Configuration Migration
One of the hardest lifecycle problems: what happens to a plugin's stored configuration when the plugin is upgraded and the config schema changes? Three strategies:
1. Explicit migrations: The plugin ships migration scripts that transform the old schema to the new schema. Reliable but requires developer discipline.
2. Schema evolution with defaults: New fields are always optional with sensible defaults. Old config files remain valid; new fields use defaults. Simpler but limits breaking changes.
3. Config versioning: The config file carries a `schemaVersion` field. On startup, the plugin reads the version and applies the appropriate migration logic. The most robust but most complex approach.
Zylos core v0.2.0 (released 2026-02-21) implements a variant of strategy 3: `applyMigrationHints()` runs as part of the self-upgrade step, with `sync-settings-hooks.js` using the upgrade template as a single source of truth to synchronize hooks in `settings.json`. This pattern — template-driven, idempotent migration — cleanly handles repeated upgrades without state accumulation.
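Strategy 3 can be sketched generically as a migration chain: each step upgrades the config by exactly one schema version, and steps are applied in sequence until the config is current. This is an illustrative sketch with made-up field names (`retries`, `url`/`endpoint`), not Zylos's actual migration logic.

```python
def _v1_to_v2(cfg):
    cfg = dict(cfg)
    cfg.setdefault("retries", 3)   # new optional field with a default
    cfg["schemaVersion"] = 2
    return cfg

def _v2_to_v3(cfg):
    cfg = dict(cfg)
    # field renamed between schema versions: url → endpoint
    cfg["endpoint"] = cfg.pop("url", cfg.get("endpoint", ""))
    cfg["schemaVersion"] = 3
    return cfg

MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}
CURRENT_VERSION = 3

def migrate(cfg):
    """Apply one-version-at-a-time migrations; idempotent for current configs."""
    version = cfg.get("schemaVersion", 1)  # pre-versioning configs are v1
    while version < CURRENT_VERSION:
        cfg = MIGRATIONS[version](cfg)
        version = cfg["schemaVersion"]
    return cfg
```

Because each step is pure and keyed by version, re-running `migrate` on an already-current config is a no-op, which is the idempotence property the Zylos pattern relies on across repeated upgrades.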
4. Real-World Implementations
4.1 Claude Code Skills
Claude Code's Agent Skills system represents the state of the art in prompt-based plugin architecture. Skills are packaged as directories containing a SKILL.md file with YAML frontmatter, optional scripts, references, and assets.
The design achieves several goals simultaneously:
- Progressive disclosure: Skills are loaded hierarchically. A summary gets loaded first; full details loaded on demand. This manages context budget efficiently.
- Composability without explicit wiring: Skills do not reference each other, yet Claude can use multiple skills together automatically, coordinating their use through natural language understanding.
- Three content types with different semantics:
  - Reference content: Knowledge injected into context (style guides, domain knowledge, API docs)
  - Task content: Step-by-step instructions for specific workflows (invoked directly as `/skill-name`)
  - Executable tools: Code that runs deterministically for operations better suited to algorithms than LLMs
The key insight in this design is recognizing that LLMs have variable cognitive costs for different operations. Sorting a list via token generation is expensive and unreliable. Running a sorting algorithm is free and guaranteed. Skills allow the right approach for each operation.
4.2 Model Context Protocol (MCP) Servers as Plugins
MCP represents a different architectural position: plugins as independent network services that communicate via a standardized protocol. Rather than code loaded into the agent process, MCP servers are separate processes (or remote services) that the agent connects to via JSON-RPC.
```
MCP Architecture
----------------
[Agent/Host]
     |
[MCP Client] ←→ JSON-RPC over stdio/SSE/WebSocket ←→ [MCP Server]
                                                          |
                                                      Exposes: Tools
                                                               Resources
                                                               Prompts
```
This brings VS Code-style process isolation to AI agent plugins. A misbehaving MCP server cannot crash the agent host. Different servers can be written in different languages. Servers can be shared across multiple agent instances.
The 2025 MCP specification introduced Tasks (tracking long-running server-side work) and Extensions (scenario-specific additions that follow MCP conventions without requiring full protocol integration), making MCP progressively better suited as a general-purpose plugin protocol.
The MCP capability negotiation pattern is worth highlighting: clients and servers explicitly declare their supported features during initialization, rather than assuming a static capability set. This enables graceful degradation when connecting to older or limited servers.
4.3 Semantic Kernel Plugins
Microsoft's Semantic Kernel treats the Kernel as a Dependency Injection container that manages services and plugins. Plugins act as standardized wrappers around agent capabilities, enabling the kernel to discover, invoke, and manage them.
Key architectural features:
- Function calling integration: Plugins expose kernel functions that the LLM can invoke via function calling. The LLM selects which functions to call; Semantic Kernel handles the actual invocation and result aggregation.
- Declarative invocation: The kernel's LLM dynamically chooses appropriate tools based on semantic descriptions, without hardcoded tool selection logic.
- Multi-agent plugin reuse: The same plugin can be registered with multiple agents, enabling consistent capabilities across an agent fleet.
```csharp
// Semantic Kernel plugin (C# example)
public class WeatherPlugin
{
    [KernelFunction("get_weather")]
    [Description("Get current weather for a location")]
    public async Task<string> GetWeatherAsync(
        [Description("City name")] string city)
    {
        // Implementation: call a weather API and return a summary.
        // (Stub return added here so the example compiles.)
        return await Task.FromResult($"Weather for {city}");
    }
}

// Registration
kernel.Plugins.AddFromType<WeatherPlugin>();
```
4.4 LangChain/LangGraph Tools
LangChain's tool system represents a more imperative approach: tools are Python objects (or decorated functions) registered with an agent. Tool descriptions are embedded in the prompt, and the LLM is instructed to format requests to these tools in a specific way.
The evolution in LangGraph adds dynamic MCP integration: MCP servers can be added or removed at runtime without restarting the agent. This multi-server architecture enables dynamic capability management in production.
LangChain's emphasis on toolkits — bundled sets of tools for specific domains — mirrors the "skill" concept: a web scraping toolkit bundles a browser tool, an HTML parser tool, and a content extractor tool under a single registration surface.
4.5 CrewAI Custom Tools
CrewAI builds plugin extensibility on top of a BaseTool abstract class. All tools — built-in or custom — inherit from this base, ensuring consistent behavior for caching, validation, event emission, and error handling.
Notable design features:
- Automatic result caching: Tool results are cached by default using (tool name, input parameters) as the cache key. This prevents redundant API calls for repeated queries in a session.
- Event emission: Custom tools automatically emit events through CrewAI's event system, enabling monitoring, debugging, and audit trails without tool developers needing to add instrumentation.
- MCP integration: CrewAI Tools natively supports MCP, giving agents access to the growing ecosystem of MCP servers through a single integration point.
4.6 Home Assistant Integrations
Home Assistant's integration system is one of the most mature plugin architectures outside the AI domain and provides highly relevant lessons. Each integration:
- Lives in its own directory within `homeassistant/components/`
- Carries a `manifest.json` declaring domain, name, version, dependencies, and Python package requirements
- Implements a standardized interface (`async_setup`, `async_unload`, platform-specific entities)
- Hooks into the HA core event bus for state notifications
The separation between component (the integration entrypoint, e.g., "Philips Hue") and platform (capability extensions within an entity type, e.g., "Hue's Light Platform") is particularly elegant. It allows integrations to provide capabilities across multiple entity types without becoming monolithic.
The HA custom integration system (HACS) further demonstrates how community-developed extensions can coexist with core integrations without forking the base system.
4.7 VS Code Extensions
VS Code's extension model provides perhaps the richest set of lessons for AI agent plugin systems. Key architectural decisions and their rationale:
Process isolation: All extensions run in a dedicated Node.js process, separate from the main editor UI. A misbehaving extension cannot block or crash the core interface.
Lazy loading: Extensions are loaded as late as possible. Extensions declare activationEvents in their manifest; the extension host only loads the extension when a matching event fires. This means unused extensions consume no memory.
Controlled API surface: VS Code does not expose the raw DOM to extensions. Extensions interact only through a stable, versioned API surface. This breaks the tight coupling that plagued traditional Eclipse plugins, where extensions often reached into internal APIs that changed between versions.
Contribution points: Rather than general-purpose hooks, VS Code defines specific contribution points (commands, menus, views, language features) that extensions can add to. This constrains extensions to well-understood extension surfaces while providing richness.
5. Communication Between Plugins
5.1 The Isolation-Coordination Tension
Plugin systems face a fundamental tension: isolation (each plugin operates independently, cannot crash others) versus coordination (plugins often need to share data or trigger each other's behavior). Three patterns resolve this tension at different points on the spectrum:
5.2 Event Bus
An event bus decouples plugins entirely. Plugins publish named events; other plugins subscribe to events they care about. Neither needs to know the other exists.
```
Plugin A                   Event Bus                       Plugin B
--------                   ---------                       --------
work done → publish("task.complete", data)
                               ↓
                     subscribers notified        → handler(data) invoked
```
Home Assistant's core is built around exactly this pattern. Every state change, every service call, every device event flows through the HA event bus. Integrations listen for events without knowing what will publish them.
Namespacing is critical for collision avoidance: use `plugin-name.event-type` rather than bare event names. A plugin that emits `ready` conflicts with every other plugin that uses the same name; `web-search.cache-warm-complete` does not.
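A minimal bus in this spirit fits in a few lines. The sketch below is illustrative (not Home Assistant's actual event bus); as a design choice for the example, it enforces namespacing at publish time so un-namespaced events are rejected outright.

```python
class EventBus:
    """Decoupled pub/sub: publishers and subscribers never reference
    each other, only namespaced event names."""

    def __init__(self):
        self._subs = {}  # event name → ordered list of handlers

    def subscribe(self, event, handler):
        self._subs.setdefault(event, []).append(handler)

    def publish(self, event, data=None):
        if "." not in event:
            # enforce plugin-name.event-type naming to avoid collisions
            raise ValueError("use a namespaced event name, e.g. 'web-search.done'")
        for handler in self._subs.get(event, []):
            handler(data)
```

Publishing to an event with no subscribers is deliberately a no-op, which is what lets plugins emit events without knowing whether anything is listening.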
5.3 Shared State Store
When plugins need to read each other's outputs (rather than react to events), a shared state store provides a clean interface:
```
Plugin A writes:  state.set("weather.london", {temp: 20, conditions: "cloudy"})
Plugin B reads:   const weather = state.get("weather.london")
```
The store acts as the communication medium; plugins are decoupled from each other but coupled to the state schema. Namespacing is again essential. Versioning the schema (or using typed interfaces) prevents subtle bugs when one plugin changes its state shape.
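One way to make the namespacing rule enforceable rather than advisory is to require writers to identify themselves and reject writes outside their own namespace. The owner check below is an illustrative design choice for this sketch, not a standard API.

```python
class StateStore:
    """Shared key/value store with namespaced keys; writes are restricted
    to the writer's own namespace, reads are open to all plugins."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, owner):
        namespace = key.split(".", 1)[0]
        if namespace != owner:
            raise PermissionError(f"{owner} may not write under '{namespace}.'")
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)
```

Reads stay unrestricted because cross-plugin reading is the whole point of the store; only writes need protection against accidental clobbering.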
5.4 Direct Invocation with Dependency Declaration
For cases where tight coupling is acceptable (a search plugin that always wants to cache through a specific cache plugin), explicit dependency declaration is cleaner than hoping the event bus carries the right data at the right time:
```yaml
# In SKILL.md or manifest
depends_on:
  - cache-plugin
  - auth-plugin
```
The plugin manager validates that declared dependencies are loaded before activating the declaring plugin, and loads them in the correct order. This mirrors npm's dependency model and avoids the "why isn't my plugin working" confusion of implicit event-based coordination.
5.5 Conflict Resolution
When two plugins compete for the same resource or provide overlapping capabilities, the system needs a resolution strategy:
- Priority/ordering: Plugins declare a priority; the highest-priority plugin gets first call on a resource
- Explicit conflict declaration: Manifests declare which other plugins they conflict with; the manager refuses to activate conflicting plugins simultaneously
- Capability overrides: A plugin can declare it "overrides" a named capability, replacing the default provider
6. Security Architecture
6.1 The Core Threat Model
Plugin systems expand the attack surface in specific ways:
- Malicious third-party plugins: A plugin that appears to add useful capability but exfiltrates credentials, reads private files, or establishes a reverse shell
- Compromised legitimate plugins: A supply-chain attack that modifies a previously-safe plugin
- Prompt injection via plugin outputs: A plugin that returns content designed to hijack the agent's reasoning (e.g., a web search result containing "Ignore previous instructions...")
- Privilege escalation via plugin cooperation: Plugin A legitimately has access to credentials; Plugin B tricks Plugin A into using those credentials on Plugin B's behalf
6.2 Credential Isolation
The most important security principle for AI agent plugin systems: plugins must never have direct access to core credentials.
```
Insecure Pattern                       Secure Pattern
----------------                       --------------
[Agent Core]                           [Agent Core]
     |                                      |
 .env file ← Plugin reads directly    Credential Proxy
     |                                      |
 OPENAI_KEY                           Plugin requests token
 DB_PASSWORD                                |
 SMTP_CREDS                           Proxy validates:
                                        - Is plugin authorized?
                                        - Is this credential in scope?
                                        - Is the requested operation allowed?
                                            |
                                      Issues scoped, time-limited token
```
In practice, for self-hosted single-user systems like Zylos, the threat model is primarily around third-party plugin code, not credential proxying between internal components. The practical implementation is: review plugin code before installation, use process isolation where feasible, and never mount .env files into plugin execution contexts where avoidable.
6.3 Capability-Based Permissions
Capability-based security assigns specific capabilities to plugins at declaration time, rather than granting broad system access:
```yaml
# Plugin manifest with explicit capability requests
name: email-sender
capabilities:
  - network.outbound.smtp
  - filesystem.read: ["~/zylos/templates/*"]
  - filesystem.write: ["~/zylos/logs/email-*"]
# NOT: filesystem.read.* (too broad)
# NOT: network.outbound.* (too broad)
```
This is the "principle of least privilege" applied to plugins. A web search plugin should have network access but not filesystem write access. An email plugin should write to logs but not read from the credential store.
Enforcement mechanisms range from OS-level (seccomp, AppArmor) to framework-level (checking capability declarations before invoking plugin code) to honor-system (asking plugins to declare capabilities but not enforcing them). The right level depends on the threat model — honor-system is appropriate for internal plugins, OS-level enforcement for third-party code.
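Framework-level enforcement amounts to checking each requested operation against the declared capabilities before invoking plugin code. The sketch below is illustrative (the capability vocabulary follows the manifest example above); it accepts the same mixed shape the YAML parses to, plain capability strings plus `{capability: [path globs]}` entries, and uses `fnmatch` for the path-scoped grants.

```python
from fnmatch import fnmatch

def is_allowed(manifest_caps, capability, path=None):
    """Check an operation against a plugin's declared capabilities.

    manifest_caps: list mixing plain strings ("network.outbound.smtp")
    and single-key dicts ({"filesystem.read": ["~/zylos/templates/*"]}).
    """
    for cap in manifest_caps:
        if isinstance(cap, str):
            if cap == capability:
                return True          # exact unscoped grant
        elif capability in cap:
            # path-scoped grant: any matching glob allows the operation
            if any(fnmatch(path or "", pattern) for pattern in cap[capability]):
                return True
    return False  # default deny: anything not declared is refused
```

The default-deny return is the important part: least privilege falls out of refusing anything the manifest did not explicitly request.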
6.4 Sandboxing Approaches
The main approaches, ranked by isolation strength and operational cost:
| Approach | Isolation | Memory Overhead | Boot Time | Best For |
|---|---|---|---|---|
| MicroVMs (Firecracker) | Strongest | ~5MB per VM | ~125ms | High-risk third-party code |
| gVisor | Strong | Low (syscall interception) | Low | Production AI agent plugins |
| Hardened containers | Moderate | Very low | Negligible | Trusted but isolated plugins |
| Process isolation (Node.js) | Basic | Low | Negligible | Core/extension separation (VS Code model) |
For most AI agent plugin systems in 2025-2026, gVisor or process-level isolation is the practical sweet spot. Full MicroVM isolation is appropriate for cloud-hosted multi-tenant platforms where plugins run arbitrary user code.
6.5 Supply Chain Security
Given that most plugin systems involve installing third-party code:
- Code review before install: Audit what the plugin actually does, especially: network requests (to where?), filesystem access (reading `.env` or credential files?), subprocess spawning, and eval/dynamic code execution
- Pinned dependencies: Use lockfiles. A plugin that declares `axios: ^1.0.0` can be updated to any minor version; pin to exact versions for production
- Checksum verification: Registry systems should provide package checksums. Verify before installation.
- Minimal footprint: Prefer plugins with minimal dependencies. Every transitive dependency is a potential attack vector.
7. Configuration and Discovery
7.1 The Manifest as Contract
The manifest is simultaneously the plugin's identity card, capability advertisement, and integration specification. A well-designed manifest schema makes the entire plugin system more predictable.
Minimum viable manifest fields:
- `name`: Unique identifier (must be collision-resistant at ecosystem scale, e.g., `org.author.plugin-name`)
- `version`: Semantic version (MAJOR.MINOR.PATCH)
- `description`: Human-readable summary (used for capability discovery and agent context injection)
- `entry_point`: How to invoke the plugin
- `requires`: Core version range and other dependencies
Extended fields for production systems:
- `capabilities`: Explicit capability declarations (for security scoping)
- `activation_events`: When to auto-load the plugin (lazy loading)
- `configuration_schema`: JSON Schema for plugin configuration (enables UI generation, validation)
- `migration_scripts`: Array of migration scripts keyed by version range
- `health_check`: How to verify the plugin is working
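Validating the minimum viable fields is cheap and catches most malformed plugins at install time. The sketch below is illustrative; the required-field set matches the list above, and `requires` is left unvalidated since its shape varies by system.

```python
import re

REQUIRED_FIELDS = ("name", "version", "description", "entry_point")
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # strict MAJOR.MINOR.PATCH

def validate_manifest(manifest):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = [f"missing field: {field}"
              for field in REQUIRED_FIELDS if field not in manifest]
    if "version" in manifest and not SEMVER.match(str(manifest["version"])):
        errors.append(f"version not semver: {manifest['version']}")
    return errors
```

Returning a list of errors rather than raising on the first one lets an installer report everything wrong with a manifest in a single pass.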
7.2 Auto-Discovery Mechanisms
Filesystem scanning: Walk a plugins directory, parse manifests in each subdirectory. Simple, reliable, no infrastructure required. Works well for single-host deployments.
Convention over configuration: Plugins placed in a well-known directory are automatically discovered without explicit registration. The ~/.claude/skills/ convention in Claude Code is an example.
Registry queries: For distributed deployments, a central registry (REST API) allows querying available plugins, filtering by capability, and checking version compatibility.
Embedding-based discovery: Emerging pattern (2025): use embeddings to match natural language capability descriptions to available plugins. An agent that needs to "search recent news" can semantically match to a web search plugin even if the plugin's name or description doesn't exactly match "news."
7.3 Plugin Registries
Remote registries introduce several features not available with filesystem-only discovery:
- Centralized metadata: Version history, changelogs, author information, download counts
- Compatibility matrices: "This plugin works with core versions X.Y.Z - A.B.C"
- Security advisories: Notify installed plugin consumers of known vulnerabilities
- Distribution: Serve plugin packages, not just manifests
The challenge for AI agent plugin registries compared to npm or PyPI: the AI plugin ecosystem is younger, more fragmented, and lacks standards for cross-framework compatibility. MCP is the most credible candidate for a universal plugin protocol, but adoption is still growing in 2025-2026. In practice, most AI agent plugin registries today are framework-specific (LangChain Hub, Claude Skills gallery, Semantic Kernel plugin catalog).
8. Practical Patterns for Small Teams
8.1 The Simplicity Spectrum
For a team of 1-5 people building an AI agent system, the correct position on the plugin architecture spectrum is almost always "simpler than you think you need." Over-engineering a plugin system that nobody else will ever contribute to is a significant source of wasted effort.
```
Complexity vs. Team Size (AI Agent Systems)
-------------------------------------------
1-2 devs, single deployment:
  → Flat skills directory, SKILL.md files, no registry
  → Sufficient for: 10-50 skills, occasional additions

3-10 devs, 2-5 deployments:
  → Skills directory + simple JSON registry
  → Lifecycle hooks (install/upgrade scripts)
  → Sufficient for: 50-200 skills, regular additions

10+ devs, multiple deployments, third-party plugins:
  → Full plugin system: registry, versioned manifests,
    capability declarations, automated migration
  → Required for: cross-team or community contributions
```
8.2 The Zylos Skill Architecture
The Zylos system uses a practical implementation of the filesystem-registry pattern that serves as a good reference for small teams:
```
~/zylos/.claude/skills/
├── comm-bridge/           ← C4 communication bridge skill
│   ├── SKILL.md           ← manifest + instructions
│   ├── scripts/           ← executable tools
│   └── references/        ← reference docs injected into context
├── scheduler/             ← C5 task scheduler skill
│   ├── SKILL.md
│   └── scripts/
│       └── cli.js         ← unified CLI interface
├── zylos-memory/          ← memory sync skill
│   └── SKILL.md
└── create-skill/          ← meta-skill for creating new skills

~/zylos/components/        ← optional community components
├── telegram/
├── lark/
└── browser/
```
The separation between built-in skills (~/.claude/skills/) and optional components (components/) mirrors the core/extension separation in VS Code: core ships with essential capabilities, extensions are opt-in.
The component-management skill implements lifecycle management: `zylos add <component>` handles download, installation, configuration, and post-install hooks; upgrades run pre-upgrade → update files → post-upgrade → restart services.
8.3 Contract-First Design
Define interfaces before implementation. For an AI agent plugin system, this means:
- Write the SKILL.md (or equivalent manifest) first: What does this plugin do? What does it need from the environment? What does it produce?
- Define the tool schema: What parameters does each tool accept? What does it return?
- Write the tests: What behavior do you expect at the boundaries?
- Implement: Fill in the actual code
This order prevents the most common failure mode — building a plugin that works as a standalone unit but doesn't compose well with the rest of the system because the interface wasn't thought through upfront.
8.4 Version Your Contracts Early
The biggest mistake in plugin systems is treating the plugin interface as an implementation detail. Once other plugins or the core depend on your interface, changing it is a breaking change. Even for small teams:
- Version manifests from day one
- Document what constitutes a breaking change
- Use semantic versioning strictly
- Keep a changelog
The cost of this discipline is low. The cost of ignoring it — when you're on version 8 of the interface and have six plugins in production that all behave slightly differently — is high.
8.5 Avoid These Common Mistakes
- God plugins: A plugin that does everything is not a plugin; it is a second monolith. If a plugin is doing more than one thing, split it.
- Implicit state sharing: When plugins communicate through global variables or shared mutable state without explicit interfaces, you get the worst of both worlds — coupling without the clarity of explicit dependency declarations.
- Skipping uninstall hooks: Installation hooks are common; uninstall hooks are much rarer but just as important. A plugin that creates database tables on install but doesn't clean them up on uninstall leaves garbage that accumulates over time.
- Tight version coupling: A plugin that requires exactly core == 1.4.7 will break the moment core is upgraded. Use version ranges with a floor and a ceiling: core >= 1.4.0 < 2.0.0.
- Credential leakage through logs: Plugin log output is often less carefully managed than the plugin's main logic. Ensure that API keys and tokens cannot appear in plugin log output, even in debug mode.
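One defensive measure against log leakage is a redaction filter attached to every plugin logger, so secrets are masked before any handler sees them. A sketch using Python's standard `logging` module; the secret patterns are illustrative, and a real deployment would maintain a broader list:

```python
import logging
import re

# Illustrative patterns for common secret shapes; extend for your providers.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),           # API-key-style tokens
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]+"),  # HTTP bearer tokens
]

class RedactSecrets(logging.Filter):
    """Logging filter that masks secrets before records reach any handler."""
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()  # resolve %-args before scanning
        for pattern in SECRET_PATTERNS:
            msg = pattern.sub("[REDACTED]", msg)
        record.msg, record.args = msg, ()
        return True  # never drop the record, only rewrite it
```

Attaching the filter at the logger level (`logging.getLogger("plugin").addFilter(RedactSecrets())`) covers debug output too, which is exactly where leaked tokens usually hide.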
9. The 2025-2026 Convergence: MCP as Universal Substrate
The industry is converging on MCP as the universal plugin protocol for AI agent systems, for several reasons:
Language-agnostic: MCP servers can be written in Python, TypeScript, C#, Java, or any language with a JSON-RPC implementation. LangChain tools are tied to LangChain's own runtimes, and Semantic Kernel plugins target C#/Python/Java; MCP is neutral at the protocol layer.
Host-agnostic: The same MCP server can serve Claude, GPT, Gemini, or any MCP-compatible agent host. This dramatically increases the return on investment for plugin developers.
Transport flexibility: MCP supports stdio and HTTP-based transports (Streamable HTTP, which superseded the earlier SSE transport). The same server logic can run locally as a subprocess over stdio or as a network service over HTTP.
Capability negotiation: The handshake-time capability negotiation means agents and servers can evolve independently. A new agent feature (say, streaming responses) can be negotiated as an optional capability without breaking older servers.
The practical implication for teams building AI agent plugin systems in 2026: strongly consider implementing new capabilities as MCP servers rather than framework-specific plugins. The upfront cost is a clean JSON-RPC interface rather than a framework-specific function, and the return is portability across the entire ecosystem.
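A simplified sketch of handshake-time capability negotiation: the client advertises what it supports in a JSON-RPC `initialize` request, and both sides proceed using only the intersection. The field names are abbreviated for illustration; the real MCP initialize exchange carries more structure (protocol version, implementation info, nested capability objects).

```python
import json

def build_initialize_request(client_caps: dict) -> str:
    """JSON-RPC 2.0 initialize request advertising client capabilities
    (simplified; the real MCP schema carries additional fields)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {"capabilities": client_caps},
    })

def negotiate(client_caps: dict, server_caps: dict) -> dict:
    """Keep only capabilities both sides declared; optional features an
    older peer doesn't know about degrade gracefully instead of breaking."""
    return {name: value for name, value in client_caps.items()
            if name in server_caps}
```

This is why a new optional feature such as streaming responses can ship without breaking older servers: a server that never declares it simply never sees it used.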
Conclusion
The key takeaways for designing AI agent plugin and extension systems:
- Manifest-first: Every plugin should carry a machine-readable description of what it does, what it needs, and what versions it's compatible with. This is the single most impactful structural decision.
- Lifecycle hooks are not optional: Install, upgrade, and uninstall hooks enable safe state transitions and data migration. Skipping them is fine for prototypes; it is a maintenance burden in production.
- Never give plugins raw credential access: Use scoped tokens, credential proxies, or environment variable injection scoped to the plugin's execution context. This is the security principle most often ignored and most often exploited.
- Process isolation is scalable defense-in-depth: Even basic process separation (the VS Code model) prevents misbehaving plugins from crashing the core. For third-party code, gVisor or container isolation is worth the operational overhead.
- Start simple, evolve deliberately: A flat directory of SKILL.md files is the right starting point for most small teams. Add a registry, lifecycle management, and capability declarations when the absence of each causes specific, felt pain.
- Design for MCP compatibility: Even if not implementing MCP today, design plugin interfaces (tool schemas, capability descriptions) in ways that could be exposed as MCP tools later. The protocol is becoming the standard substrate for AI agent extensibility.
The best plugin systems are invisible to users and barely noticed by developers adding new capabilities. That invisibility is the product of careful upfront design, consistent application of a few durable patterns, and disciplined restraint against over-engineering.
References
- Anthropic — Introducing Agent Skills (Claude): https://claude.com/blog/skills
- Anthropic — Equipping Agents for the Real World with Agent Skills: https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- Anthropic — Claude Code Docs, Extend Claude with skills: https://code.claude.com/docs/en/skills
- Jon Roosevelt — Architecting Extensible AI Agents: A Modular Core with Pluggable Skills and SSE Communication: https://jonroosevelt.com/blog/architecting-modular-ai-agents
- Lee Hanchung — Claude Agent Skills: A First Principles Deep Dive: https://leehanchung.github.io/blogs/2025/10/26/claude-skills-deep-dive/
- Microsoft — Configuring Agents with Semantic Kernel Plugins: https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-functions
- Microsoft — Understanding the Kernel in Semantic Kernel: https://learn.microsoft.com/en-us/semantic-kernel/concepts/kernel
- Microsoft — Designing Multi-Agent Intelligence: https://devblogs.microsoft.com/blog/designing-multi-agent-intelligence
- Microsoft — Guest Blog: Orchestrating AI Agents with Semantic Kernel Plugins: https://devblogs.microsoft.com/semantic-kernel/guest-blog-orchestrating-ai-agents-with-semantic-kernel-plugins-a-technical-deep-dive/
- Model Context Protocol — Architecture Specification: https://modelcontextprotocol.io/specification/2025-06-18/architecture
- Model Context Protocol — Design Philosophy & Engineering Principles: https://modelcontextprotocol.info/docs/concepts/architecture/
- Model Context Protocol — One Year of MCP: November 2025 Spec Release: http://blog.modelcontextprotocol.io/posts/2025-11-25-first-mcp-anniversary/
- Tribe AI — Inside the Machine: How Composable Agents Are Rewiring AI Architecture in 2025: https://www.tribe.ai/applied-ai/inside-the-machine-how-composable-agents-are-rewiring-ai-architecture-in-2025
- Google Cloud — Choose a Design Pattern for Your Agentic AI System: https://docs.cloud.google.com/architecture/choose-design-pattern-agentic-ai-system
- Speakeasy — A Practical Guide to the Architectures of Agentic Applications: https://www.speakeasy.com/mcp/using-mcp/ai-agents/architecture-patterns
- NVIDIA Technical Blog — Practical Security Guidance for Sandboxing Agentic Workflows: https://developer.nvidia.com/blog/practical-security-guidance-for-sandboxing-agentic-workflows-and-managing-execution-risk/
- Northflank — How to Sandbox AI Agents in 2026: MicroVMs, gVisor & Isolation Strategies: https://northflank.com/blog/how-to-sandbox-ai-agents
- Anthropic — Sandboxing (Claude Code Docs): https://code.claude.com/docs/en/sandboxing
- Home Assistant Developer Docs — Integration Architecture: https://developers.home-assistant.io/docs/architecture_components/
- VS Code Extension API — Extension Anatomy: https://code.visualstudio.com/api/get-started/extension-anatomy
- Visual Studio Code — Our Approach to Extensions: https://vscode-docs.readthedocs.io/en/stable/extensions/our-approach/
- Jenkins — Architecture and Plugin System: https://www.jenkins.io/doc/developer/architecture/
- Grafana — Architecture (DeepWiki): https://deepwiki.com/grafana/grafana/2-architecture
- CrewAI — Custom Tool Development: https://deepwiki.com/crewAIInc/crewAI/5.4-custom-tool-development
- CrewAI — Extending Framework Capabilities: https://www.wednesday.is/writing-articles/crewai-custom-tools-extending-framework-capabilities
- LangChain — MCP Integration: https://latenode.com/blog/ai-frameworks-technical-infrastructure/langchain-setup-tools-agents-memory/langchain-mcp-integration-complete-guide-to-mcp-adapters
- Python Packaging User Guide — Creating and Discovering Plugins: https://packaging.python.org/guides/creating-and-discovering-plugins/
- ABP.IO — Event Bus Documentation: https://docs.abp.io/en/abp/latest/Event-Bus
- Anthropic — Securely Deploying AI Agents (Claude Agent SDK): https://platform.claude.com/docs/en/agent-sdk/secure-deployment
- AgentScope — Explore Integration of Agent Skills with Automatic Tool Management via Progressive Disclosure: https://github.com/agentscope-ai/agentscope/issues/1055