A Claude Code agent orchestration system with 14 plugins, 43 agents, and 30 skills.
A modular runtime and orchestration system for AI agents.
Renamed from `awesome-slash` — the `awesome-` prefix implies a curated list of links, but this project is a functional software suite and runtime. Please update your installs: `npm install -g agentsys`
14 plugins · 43 agents · 30 skills (across all repos) · 26k lines of lib code · 3,357 tests · 3 platforms
Plugins distributed as standalone repos under agent-sh org — agentsys is the marketplace & installer
Commands · Installation · Website · Discussions
Built for Claude Code · Codex CLI · OpenCode
New skills, agents, and integrations ship constantly. Follow for real-time updates:
AI models can write code. That's not the hard part anymore. The hard part is everything around it — task selection, branch management, code review, artifact cleanup, CI, PR comments, deployment. AgentSys is the runtime that orchestrates agents to handle all of it — structured pipelines, gated phases, specialized agents, and persistent state that survives session boundaries.
Building custom skills, agents, hooks, or MCP tools? agnix is the CLI + LSP linter that catches config errors before they fail silently: real-time IDE validation, auto-suggestions, auto-fix, and 155 rules for Cursor, Claude Code, Cline, Copilot, Codex, Windsurf, and more.
An agent orchestration system — 14 plugins, 43 agents, and 30 skills that compose into structured pipelines for software development. Each plugin lives in its own standalone repo under the agent-sh org. agentsys is the marketplace and installer that ties them together.
Each agent has a single responsibility, a specific model assignment, and defined inputs/outputs. Pipelines enforce phase gates so agents can't skip steps. State persists across sessions so work survives interruptions.
The system runs on Claude Code, OpenCode, and Codex CLI. Install via the marketplace or the npm installer, and the plugins are fetched automatically from their repos.
Code does code work. AI does AI work.
Certainty levels exist because not all findings are equal:
| Level | Meaning | Action |
|---|---|---|
| HIGH | Definitely a problem | Safe to auto-fix |
| MEDIUM | Probably a problem | Needs context |
| LOW | Might be a problem | Needs human judgment |
This came from testing on 1,000+ repositories.
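As a minimal sketch, a consumer of these findings might branch on the level like this (the finding shape is illustrative, not the tool's actual output):

```js
// Hypothetical finding objects; only the certainty field drives routing.
const findings = [
  { file: 'src/auth.js', line: 42, rule: 'debug-statement', certainty: 'HIGH' },
  { file: 'src/db.js', line: 7, rule: 'verbose-comment', certainty: 'MEDIUM' },
];

// HIGH goes straight to the fixer; everything else queues for review.
const autoFix = findings.filter((f) => f.certainty === 'HIGH');
const review = findings.filter((f) => f.certainty !== 'HIGH');
console.log(`${autoFix.length} auto-fixable, ${review.length} need context or human judgment`);
```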
| Command | What it does |
|---|---|
Each command works standalone. Together, they compose into end-to-end pipelines.
30 skills included across the plugins:
| Category | Skills |
|---|---|
Skills are the reusable implementation units. Agents invoke skills; commands orchestrate agents. When you install a plugin, its skills become available to all agents in that session.
| Section | What's there |
|---|---|
| The Approach | Why it's built this way |
| Commands | All 14 commands overview |
| Skills | 30 skills across plugins |
| Command Details | Deep dive into each command |
| How Commands Work Together | Standalone vs integrated |
| Design Philosophy | The thinking behind the architecture |
| Installation | Get started |
| Research & Testing | What went into building this |
| Documentation | Links to detailed docs |
Purpose: Complete task-to-production automation.
What happens when you run it:
Phase 9 uses the orchestrate-review skill to spawn parallel reviewers (code quality, security, performance, test coverage) plus conditional specialists.
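A minimal sketch of that fan-out shape, assuming a generic `spawnAgent` helper (not the actual orchestrate-review implementation):

```js
// Illustrative only: run the core reviewers in parallel, add
// conditional specialists, and merge their findings afterwards.
async function reviewPhase(spawnAgent, project) {
  const reviewers = ['code-quality', 'security', 'performance', 'test-coverage'];
  if (project.hasDatabase) reviewers.push('database');
  if (project.hasApi) reviewers.push('api-design');

  // Each reviewer runs independently with a narrow scope.
  const results = await Promise.all(reviewers.map((role) => spawnAgent(role)));
  return results.flatMap((r) => r.findings);
}
```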
Agents involved:
| Agent | Model | Role |
|---|---|---|
| task-discoverer | sonnet | Finds and ranks tasks from your source |
| worktree-manager | haiku | Creates git worktrees and branches |
| exploration-agent | opus | Deep codebase analysis before planning |
| planning-agent | opus | Designs step-by-step implementation plan |
| implementation-agent | opus | Writes the actual code |
| test-coverage-checker | sonnet | Validates tests exist and are meaningful |
| delivery-validator | sonnet | Final checks before shipping |
| ci-monitor | haiku | Watches CI status |
| ci-fixer | sonnet | Fixes CI failures and review comments |
| simple-fixer | haiku | Executes mechanical edits |
Cross-plugin agent:
| Agent | Plugin | Role |
|---|---|---|
| deslop-agent | deslop | Removes AI artifacts before review |
| sync-docs-agent | sync-docs | Updates documentation |
Usage:
/next-task # Start new workflow
/next-task --resume # Resume interrupted workflow
/next-task --status # Check current state
/next-task --abort # Cancel and cleanup
Purpose: Lint agent configurations before they break your workflow. The first dedicated linter for AI agent configs.
agnix is a standalone open-source project that provides the validation engine. This plugin integrates it into your workflow.
The problem it solves:
Agent configurations are code. They affect behavior, security, and reliability. But unlike application code, they have no linting. You find out your SKILL.md is malformed when the agent fails. You discover your hooks have security issues when they're exploited. You realize your CLAUDE.md has conflicting rules when the AI behaves unexpectedly.
agnix catches these issues before they cause problems.
What it validates:
| Category | What It Checks |
|---|---|
| Structure | Required fields, valid YAML/JSON, proper frontmatter |
| Security | Prompt injection vectors, overpermissive tools, exposed secrets |
| Consistency | Conflicting rules, duplicate definitions, broken references |
| Best Practices | Tool restrictions, model selection, trigger phrase quality |
| Cross-Platform | Compatibility across Claude Code, Cursor, Copilot, Codex, OpenCode, Gemini CLI, Cline, and more |
155 validation rules (57 auto-fixable) derived from:
Supported files:
| File Type | Examples |
|---|---|
| Skills | SKILL.md, */SKILL.md |
| Memory | CLAUDE.md, AGENTS.md, .github/CLAUDE.md |
| Hooks | .claude/settings.json, hooks configuration |
| MCP | *.mcp.json, MCP server configs |
| Cursor | .cursor/rules/*.mdc, .cursorrules |
| Copilot | .github/copilot-instructions.md |
CI/CD Integration:
agnix outputs SARIF format for GitHub Code Scanning. Add it to your workflow:
- name: Lint agent configs
  run: agnix --format sarif > results.sarif
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
Usage:
/agnix # Validate current project
/agnix --fix # Auto-fix fixable issues
/agnix --strict # Treat warnings as errors
/agnix --target claude-code # Only Claude Code rules
/agnix --format sarif # Output for GitHub Code Scanning
Agent: agnix-agent (sonnet model)
External tool: Requires agnix CLI
npm install -g agnix # Install via npm
# or
cargo install agnix-cli # Install via Cargo
# or
brew install agnix # Install via Homebrew (macOS)
Why use agnix:
Purpose: Takes your current branch from "ready to commit" to "merged PR."
What happens when you run it:
Platform Detection:
| Type | Detected |
|---|---|
| CI | GitHub Actions, GitLab CI, CircleCI, Jenkins, Travis |
| Deploy | Railway, Vercel, Netlify, Fly.io, Render |
| Project | Node.js, Python, Rust, Go, Java |
Review Comment Handling:
Every comment gets addressed. No exceptions. The workflow categorizes comments and handles each:
If something can't be fixed, the workflow replies explaining why and resolves the thread.
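A sketch of that loop, assuming hypothetical `fix`, `reply`, and `resolve` helpers (the real workflow's categories and mechanics may differ):

```js
// Illustrative: every comment ends in either a fix or an explanation,
// and every thread gets resolved.
async function handleReviewComments(comments, { fix, reply, resolve }) {
  for (const comment of comments) {
    if (comment.actionable) {
      await fix(comment); // apply the requested change
    } else {
      // Can't (or shouldn't) fix: explain why, then close the thread.
      await reply(comment, comment.reasonNotFixed);
    }
    await resolve(comment); // no thread is left open
  }
}
```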
Usage:
/ship # Full workflow
/ship --dry-run # Preview without executing
/ship --strategy rebase # Use rebase instead of squash
Purpose: Finds AI slop—debug statements, placeholder text, verbose comments, TODOs—and removes it.
How detection works:
Three phases run in sequence:
Phase 1: Regex Patterns (HIGH certainty)
- Debug statements: `console.log`, `print()`, `dbg!()`, `println!()`
- Comment markers: `// TODO`, `// FIXME`, `// HACK`

Phase 2: Multi-Pass Analyzers (MEDIUM certainty)
Phase 3: CLI Tools (LOW certainty, optional)
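For illustration, Phase 1 amounts to running pre-indexed regexes over source lines; a minimal sketch (patterns abbreviated, not the plugin's actual pattern set):

```js
// HIGH-certainty detection needs no LLM: plain regex matches.
const SLOP_PATTERNS = [
  { rule: 'debug-statement', regex: /\bconsole\.log\s*\(|\bdbg!\s*\(|\bprintln!\s*\(/ },
  { rule: 'comment-marker', regex: /\/\/\s*(TODO|FIXME|HACK)\b/ },
];

function scanLine(line, lineNo, file) {
  return SLOP_PATTERNS
    .filter((p) => p.regex.test(line))
    .map((p) => ({ file, line: lineNo, rule: p.rule, certainty: 'HIGH' }));
}
```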
Languages supported: JavaScript/TypeScript, Python, Rust, Go, Java
Usage:
/deslop # Report only (safe)
/deslop apply # Fix HIGH certainty issues
/deslop apply src/ 10 # Fix 10 issues in src/
Thoroughness levels:
- `quick` - Phase 1 only (fastest)
- `normal` - Phase 1 + Phase 2 (default)
- `deep` - All phases if tools available

Purpose: Structured performance investigation with baselines, profiling, and evidence-backed decisions.
10-phase methodology (based on recorded real performance investigation sessions):
Agents and skills:
| Component | Role |
|---|---|
| perf-orchestrator | Coordinates all phases |
| perf-theory-gatherer | Generates hypotheses from git history and code |
| perf-theory-tester | Validates hypotheses with controlled experiments |
| perf-analyzer | Synthesizes findings into recommendations |
| perf-code-paths | Maps entrypoints and likely hot paths |
| perf-investigation-logger | Structured evidence logging |
Usage:
/perf # Start new investigation
/perf --resume # Resume previous investigation
Phase flags (advanced):
/perf --phase baseline --command "npm run bench" --version v1.2.0
/perf --phase breaking-point --param-min 1 --param-max 500
/perf --phase constraints --cpu 1 --memory 1GB
/perf --phase hypotheses --hypotheses-file perf-hypotheses.json
/perf --phase optimization --change "reduce allocations"
/perf --phase decision --verdict stop --rationale "no measurable improvement"
Purpose: Compares your documentation and plans to what's actually in the code.
The problem it solves:
Your roadmap says "user authentication: done." But is it actually implemented? Your GitHub issue says "add dark mode." Is it already in the codebase? Plans drift from reality. This command finds the drift.
How it works:
1. JavaScript collectors gather data (fast, token-efficient)
2. A single Opus call performs the semantic analysis, matching plan items to code evidence (e.g., `auth/`, `login.js`, `session.ts`)

Why this approach:
Multi-agent collection wastes tokens on coordination. JavaScript collectors are fast and deterministic. One well-prompted LLM call does the actual analysis. Result: 77% token reduction vs multi-agent approaches.
Tested on 1,000+ repositories before release.
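A sketch of the collector-then-single-call shape, assuming a hypothetical `collectEvidence` helper (not the plugin's actual modules):

```js
import { glob } from 'glob';

// Deterministic, token-free collection: map a plan item to candidate
// code evidence by file-name affinity (e.g., "auth" -> auth/, login.js).
async function collectEvidence(planItem) {
  const files = await glob(`**/*${planItem.keyword}*`, { ignore: 'node_modules/**' });
  return { item: planItem.title, candidateFiles: files.slice(0, 20) };
}

// All collected evidence then goes into one well-prompted LLM call,
// which judges which plan items have drifted from the code.
```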
Usage:
/drift-detect # Full analysis
/drift-detect --depth quick # Quick scan
Purpose: Multi-agent code review that iterates until issues are resolved.
What happens when you run it:
Up to 10 specialized role-based agents run based on your project:
| Agent | When Active | Focus Area |
|---|---|---|
| code-quality-reviewer | Always | Code quality, error handling |
| security-expert | Always | Vulnerabilities, auth, secrets |
| performance-engineer | Always | N+1 queries, memory, blocking ops |
| test-quality-guardian | Always | Coverage, edge cases, mocking |
| architecture-reviewer | If 50+ files | Modularity, patterns, SOLID |
| database-specialist | If DB detected | Queries, indexes, transactions |
| api-designer | If API detected | REST, errors, pagination |
| frontend-specialist | If frontend detected | Components, state, UX |
| backend-specialist | If backend detected | Services, domain logic |
| devops-reviewer | If CI/CD detected | Pipelines, configs, secrets |
Findings are collected and categorized by severity (critical/high/medium/low). All non-false-positive issues get fixed automatically. The loop repeats until no open issues remain.
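A sketch of that loop (bounds and severity handling are assumptions, not the actual implementation):

```js
// Illustrative review loop: re-run reviewers until nothing is open,
// with a pass limit so it cannot spin forever.
async function auditUntilClean(runReviewers, applyFixes, maxPasses = 10) {
  for (let pass = 1; pass <= maxPasses; pass++) {
    const findings = await runReviewers();
    const open = findings.filter((f) => !f.falsePositive);
    if (open.length === 0) return { passes: pass, clean: true };
    await applyFixes(open); // critical/high/medium/low all get fixed
  }
  return { passes: maxPasses, clean: false }; // bail out, report remaining issues
}
```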
Usage:
/audit-project # Full review
/audit-project --quick # Single pass
/audit-project --resume # Resume from queue file
/audit-project --domain security # Security focus only
/audit-project --recent # Only recent changes
Purpose: Analyzes your prompts, plugins, agents, docs, hooks, and skills for improvement opportunities.
Seven analyzers run in parallel:
| Analyzer | What it checks |
|---|---|
| plugin-enhancer | Plugin structure, MCP tool definitions, security patterns |
| agent-enhancer | Agent frontmatter, prompt quality |
| claudemd-enhancer | CLAUDE.md/AGENTS.md structure, token efficiency |
| docs-enhancer | Documentation readability, RAG optimization |
| prompt-enhancer | Prompt engineering patterns, clarity, examples |
| hooks-enhancer | Hook frontmatter, structure, safety |
| skills-enhancer | SKILL.md structure, trigger phrases |
Each finding includes:
Auto-learning: Detects obvious false positives (pattern docs, workflow gates) and saves them for future runs. Reduces noise over time without manual suppression files.
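A sketch of what that persistence can look like (file name and shape are assumptions):

```js
import { readFile, writeFile } from 'node:fs/promises';

const SUPPRESS_FILE = '.enhance-suppressions.json'; // hypothetical path

// Merge newly detected false positives into the saved set so
// future runs filter them without a manual suppression file.
async function learnFalsePositives(falsePositives) {
  const known = JSON.parse(await readFile(SUPPRESS_FILE, 'utf8').catch(() => '[]'));
  const merged = [...new Set([...known, ...falsePositives.map((f) => f.fingerprint)])];
  await writeFile(SUPPRESS_FILE, JSON.stringify(merged, null, 2));
}
```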
Usage:
/enhance # Run all analyzers
/enhance --focus=agent # Just agent prompts
/enhance --apply # Apply HIGH certainty fixes
/enhance --show-suppressed # Show what's being filtered
/enhance --no-learn # Analyze but don't save false positives
Purpose: Builds an AST-based map of symbols and imports for fast repo analysis.
What it generates:
Output is cached at {state-dir}/repo-map.json and exposed via the MCP repo_map tool.
Why it matters:
Tools like /drift-detect and planners can use the map instead of re-scanning the repo every time.
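A hypothetical shape for the cached map (the actual schema may differ); it shows the kind of lookup the map enables:

```js
// Illustrative repo-map.json contents: symbols and imports per file.
const repoMap = {
  generatedAt: '2025-01-01T00:00:00Z',
  files: {
    'src/auth/index.js': {
      symbols: ['login', 'logout', 'refreshSession'],
      imports: ['./session', 'node:crypto'],
    },
  },
};

// A planner can answer "where is login defined?" without re-scanning:
const hits = Object.entries(repoMap.files)
  .filter(([, info]) => info.symbols.includes('login'))
  .map(([path]) => path);
```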
Usage:
/repo-map init # First-time map generation
/repo-map update # Incremental update
/repo-map status # Check freshness
Required: ast-grep (sg) must be installed.
Purpose: Sync documentation with actual code changes—find outdated refs, update CHANGELOG, flag stale examples.
The problem it solves:
You refactor auth.js into auth/index.js. Your README still says import from './auth'. You rename a function. Three docs still reference the old name. You ship a feature. CHANGELOG doesn't mention it. Documentation drifts from code. This command finds the drift.
What it detects:
| Category | Examples |
|---|---|
| Broken references | Imports to moved/renamed files, deleted exports |
| Version mismatches | Doc says v2.0, package.json says v2.1 |
| Stale code examples | Import paths that no longer exist |
| Missing CHANGELOG | feat: and fix: commits without entries |
Auto-fixable vs flagged:
| Auto-fixable (apply mode) | Flagged for review |
|---|---|
| Version number updates | Removed exports referenced in docs |
| CHANGELOG entries for commits | Code examples needing context |
| Function renames | |
Usage:
/sync-docs # Check what docs need updates (safe)
/sync-docs apply # Apply safe fixes
/sync-docs report src/ # Check docs related to src/
/sync-docs --all # Full codebase scan
Purpose: Research any topic online and create a comprehensive learning guide with RAG-optimized indexes.
What it does:
Depth levels:
| Depth | Sources | Use Case |
|---|---|---|
| brief | 10 | Quick overview |
| medium | 20 | Default, balanced |
| deep | 40 | Comprehensive |
Output structure:
agent-knowledge/
  CLAUDE.md                 # Master index (updated each run)
  AGENTS.md                 # Index for OpenCode/Codex
  recursion.md              # Topic-specific guide
  resources/
    recursion-sources.json  # Source metadata with quality scores
Usage:
/learn recursion # Default (20 sources)
/learn react hooks --depth=deep # Comprehensive (40 sources)
/learn kubernetes --depth=brief # Quick overview (10 sources)
/learn python async --no-enhance # Skip enhancement pass
Agent: learn-agent (opus model for research quality)
Purpose: Get a second opinion from another AI CLI tool without leaving your current session.
What it does:
- Maintains continuity with previous consultations (`--continue`)

Supported tools:
| Tool | Default Model (high) | Reasoning Control |
|---|---|---|
| Claude | claude-opus-4-6 | max-turns |
| Gemini | gemini-3.1-pro-preview | built-in |
| Codex | gpt-5.3-codex | model_reasoning_effort |
| OpenCode | (user-selected or default) | --variant |
| Copilot | (default) | none |
Usage:
/consult "Is this the right approach?" --tool=gemini --effort=high
/consult "Review for performance issues" --tool=codex
/consult "Suggest alternatives" --tool=claude --effort=max
/consult "Continue from where we left off" --continue
/consult "Explain this error" --context=diff --tool=gemini
Agent: consult-agent (sonnet model for orchestration)
Purpose: Stress-test ideas through structured multi-round debate between two AI CLI tools.
What it does:
Usage:
# Natural language
/debate codex vs gemini about microservices vs monolith
/debate with claude and codex about our auth implementation
/debate thoroughly gemini vs codex about database schema design
/debate codex vs gemini 3 rounds about event sourcing
# Explicit flags
/debate "Should we use event sourcing?" --tools=claude,gemini --rounds=3 --effort=high
/debate "Valkey vs PostgreSQL for caching" --tools=codex,opencode
# With codebase context
/debate "Is our current approach correct?" --tools=gemini,codex --context=diff
Options:
| Flag | Description |
|---|---|
| `--tools=TOOL1,TOOL2` | Proposer and challenger (comma-separated) |
| `--rounds=N` | Number of debate rounds, 1–5 (default: 2) |
| `--effort=low\|medium\|high\|max` | Reasoning depth per tool call |
| `--context=diff\|file=PATH\|none` | Codebase context passed to both tools |
Agent: debate-orchestrator (opus model for orchestration)
Purpose: Browser automation for AI agents - navigate, authenticate, and interact with web pages.
How it works:
Each invocation is a single Node.js process using Playwright. No daemon, no MCP server. Session state persists via Chrome's userDataDir with AES-256-GCM encrypted storage.
Agent calls skill -> node scripts/web-ctl.js <args> -> Playwright API -> JSON result
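A minimal sketch of that single-process pattern (paths and argument handling are assumptions, not the actual web-ctl.js):

```js
import { chromium } from 'playwright';

const [, , action, target] = process.argv;

// One persistent context per session: cookies and storage live in
// the profile directory and survive across invocations.
const context = await chromium.launchPersistentContext('.sessions/default', {
  headless: true,
});
const page = context.pages()[0] ?? (await context.newPage());

if (action === 'goto') await page.goto(target);

// Print a JSON result and exit; no daemon stays behind.
console.log(JSON.stringify({ ok: true, url: page.url(), title: await page.title() }));
await context.close();
```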
Session lifecycle:
- `session start <name>` - Create session (encrypted profile directory)
- `session auth <name> --url <login-url>` - Opens headed Chrome for human login (2FA, CAPTCHAs). Polls for success URL/selector, encrypts cookies on completion
- `run <name> <action>` - Headless actions using persisted cookies
- `session end <name>` - Cleanup
| Action | Description | Key flag |
|---|---|---|
| `goto <url>` | Navigate to URL | |
| `snapshot` | Get accessibility tree (primary page inspection) | |
| `click <sel>` | Click element | `--wait-stable` |
| `click-wait <sel>` | Click and wait for DOM + network stability | `--timeout <ms>` |
| `type <sel> <text>` | Type with human-like delays | |
| `read <sel>` | Read element text content | |
| `fill <sel> <value>` | Clear field and set value | |
| `wait <sel>` | Wait for element to appear | `--timeout <ms>` |
| `evaluate <js>` | Execute JS in page context | `--allow-evaluate` |
| `screenshot` | Full-page screenshot | `--path <file>` |
| `network` | Capture network requests | `--filter <pattern>` |
| `checkpoint` | Open headed browser for user (CAPTCHAs) | `--timeout <sec>` |
click-wait waits for network idle + no DOM mutations for 500ms before returning. Cuts SPA interactions from multiple agent turns to one.
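A sketch of that stability check using standard Playwright primitives (the 500ms quiet window comes from the description above; the actual implementation may differ):

```js
// Click, wait for network idle, then wait for the DOM to go quiet.
async function clickWait(page, selector, timeout = 10_000) {
  await page.click(selector);
  await page.waitForLoadState('networkidle', { timeout });
  // Resolve once no DOM mutations occur for 500ms.
  await page.evaluate(
    () =>
      new Promise((resolve) => {
        function done() {
          observer.disconnect();
          resolve();
        }
        let timer = setTimeout(done, 500);
        const observer = new MutationObserver(() => {
          clearTimeout(timer);
          timer = setTimeout(done, 500);
        });
        observer.observe(document.body, { childList: true, subtree: true, attributes: true });
      })
  );
}
```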
Error handling:
All errors return classified codes with actionable recovery suggestions:
| Code | Recovery suggestion |
|---|---|
| `element_not_found` | Snapshot included in response for selector discovery |
| `timeout` | Increase `--timeout` |
| `browser_closed` | `session start <name>` |
| `network_error` | Check URL; verify cookies with `session status` |
| `no_display` | Use `--vnc` flag |
| `session_expired` | Re-authenticate |
Security: Output sanitization (cookies/tokens redacted), prompt injection defense ([PAGE_CONTENT: ...] delimiters), AES-256-GCM encryption at rest, anti-bot measures (webdriver=false, random delays), read-only agent (no Write/Edit tools).
Selector syntax: `role=button[name='Submit']`, `css=div.class`, `text=Click here`, `#id`
Usage:
/web-ctl goto https://example.com
/web-ctl auth twitter --url https://x.com/i/flow/login
/web-ctl # describe what you want to do, agent orchestrates it
Install:
agentsys install web-ctl
npm install playwright
npx playwright install chromium
Agent: web-session (sonnet model)
Skills: web-auth (human-in-the-loop auth), web-browse (headless actions)
Standalone use:
/deslop apply # Just clean up your code
/sync-docs # Just check if docs need updates
/ship # Just ship this branch
/audit-project # Just review the codebase
Integrated workflow:
When you run /next-task, it orchestrates everything:
/next-task picks task → explores codebase → plans implementation
↓
implementation-agent writes code
↓
deslop-agent cleans AI artifacts
↓
Phase 9 review loop iterates until approved
↓
delivery-validator checks requirements
↓
sync-docs-agent syncs documentation
↓
[/ship](#ship) creates PR → monitors CI → merges
The workflow tracks state so you can resume from any point.
Frontier models write good code. That's solved. What's not solved:
1. One agent, one job, done extremely well
Same principle as good code: single responsibility. The exploration-agent explores. The implementation-agent implements. Phase 9 spawns multiple focused reviewers. No agent tries to do everything. Specialized agents, each with narrow scope and clear success criteria.
2. Pipeline with gates, not a monolith
Same principle as DevOps. Each step must pass before the next begins. Can't push before review. Can't merge before CI passes. Hooks enforce this—agents literally cannot skip phases.
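As an illustration, a gate can be as small as a hook script that inspects persisted phase state before allowing a command (the state path and payload fields here are assumptions):

```js
import { readFileSync } from 'node:fs';

// Hypothetical pre-tool-use hook: refuse `git push` before Phase 9
// (review) has passed. The hook payload arrives on stdin.
const state = JSON.parse(readFileSync('.workflow/phase.json', 'utf8'));
const hookInput = JSON.parse(readFileSync(0, 'utf8'));

const isPush = /\bgit push\b/.test(hookInput.tool_input?.command ?? '');
if (isPush && state.phase < 9) {
  console.error('Blocked: the review phase has not passed yet.');
  process.exit(2); // non-zero exit rejects the tool call
}
```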
3. Tools do tool work, agents do agent work
If static analysis, regex, or a shell command can do it, don't ask an LLM. Pattern detection uses pre-indexed regex. File discovery uses glob. Platform detection uses file existence checks. The LLM only handles what requires judgment.
4. Agents don't need to know how tools work
The slop detector returns findings with certainty levels. The agent doesn't need to understand the three-phase pipeline, the regex patterns, or the analyzer heuristics. Good tool design means the consumer doesn't need implementation details.
5. Build tools where tools don't exist
Many tasks lack existing tools. JavaScript collectors for drift-detect. Multi-pass analyzers for slop detection. The result: agents receive structured data, not raw problems to figure out.
6. Research-backed prompt engineering
Documented techniques that measurably improve results:
7. Validate plan and results, not every step
Approve the plan. See the results. The middle is automated. One plan approval unlocks autonomous execution through implementation, review, cleanup, and shipping.
8. Right model for the task
Match model capability to task complexity:
Quality compounds. Poor exploration → poor plan → poor implementation → review cycles. Early phases deserve the best model.
9. Persistent state survives sessions
Two JSON files track everything: what task, what phase. Sessions can die and resume. Multiple sessions run in parallel on different tasks using separate worktrees.
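A sketch of what session-surviving state can look like (file names and shapes assumed):

```js
import { readFile, writeFile } from 'node:fs/promises';

// Two small JSON files: which task, which phase. A new session can
// pick up exactly where the last one died.
async function saveState(dir, task, phase) {
  await writeFile(`${dir}/task.json`, JSON.stringify(task, null, 2));
  await writeFile(`${dir}/phase.json`, JSON.stringify({ phase, at: Date.now() }, null, 2));
}

async function resume(dir) {
  const task = JSON.parse(await readFile(`${dir}/task.json`, 'utf8'));
  const { phase } = JSON.parse(await readFile(`${dir}/phase.json`, 'utf8'));
  return { task, phase }; // re-enter the pipeline at this phase
}
```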
10. Delegate everything automatable
Agents don't just write code. They:
If it can be specified, it can be delegated.
11. Orchestrator stays high-level
The main workflow orchestrator doesn't read files, search code, or write implementations. It launches specialized agents and receives their outputs. Keeps the orchestrator's context window available for coordination rather than filled with file contents.
12. Composable, not monolithic
Every command works standalone. /deslop cleans code without needing /next-task. /ship merges PRs without needing the full workflow. Pieces compose together, but each piece is useful on its own.
/plugin marketplace add agent-sh/agentsys
/plugin install next-task@agentsys
/plugin install ship@agentsys
npm install -g agentsys && agentsys
Interactive installer for Claude Code, OpenCode, and Codex CLI.
# Non-interactive install
agentsys --tool claude # Single tool
agentsys --tools "claude,opencode" # Multiple tools
agentsys --development # Dev mode (bypasses marketplace)
Required:
For GitHub workflows:
- GitHub CLI (`gh`) authenticated

For GitLab workflows:
- GitLab CLI (`glab`) authenticated

For /repo-map:
- ast-grep (`sg`) installed

For /agnix:
- agnix CLI installed (`cargo install agnix-cli` or `brew install agnix`)

Local diagnostics (optional):
npm run detect # Platform detection (CI, deploy, project type)
npm run verify # Tool availability + versions
The system is built on research, not guesswork.
Knowledge base (agent-docs/): 8,000 lines of curated documentation from Anthropic, OpenAI, Google, and Microsoft covering:
Testing:
Methodology:
- `/perf` investigation phases based on recorded real performance investigation sessions

| Topic | Link |
|---|---|
| Installation | docs/INSTALLATION.md |
| Cross-Platform Setup | docs/CROSS_PLATFORM.md |
| Usage Examples | docs/USAGE.md |
| Architecture | docs/ARCHITECTURE.md |
| Workflow | Link |
|---|---|
| /next-task Flow | docs/workflows/NEXT-TASK.md |
| /ship Flow | docs/workflows/SHIP.md |
| Topic | Link |
|---|---|
| Slop Patterns | docs/reference/SLOP-PATTERNS.md |
| Agent Reference | docs/reference/AGENTS.md |
MIT License | Made by Avi Fenesh