Skills

Skills are slash commands you can invoke directly in Claude Code. Each skill triggers a structured workflow that gathers context, calls a provider, and returns prioritized findings.

/gemini-review works out of the box with the plugin. /codex-review, /ollama-review, and /antigravity-review require their MCP servers to be added separately — see Plugin Overview.

Review Skills

All four review skills follow the same pattern:

1. Gather staged and unstaged git changes
2. Read project conventions from CLAUDE.md
3. Send the diff and context to the provider
4. Return findings filtered by confidence (80%+ threshold)
5. Group results: Critical (90%+) vs. Important (80–89%)
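
The threshold and grouping in the last two steps can be sketched as a small shell function. This is a minimal illustration, not the plugin's code: the function name `confidence_group` and the `filtered` label are assumptions, and scores are taken to be integers from 0 to 100.

```shell
# Illustrative only: the function name and the "filtered" label are assumptions,
# not the plugin's actual code. Scores are integers from 0 to 100.
confidence_group() {
  score=$1
  if [ "$score" -ge 90 ]; then
    echo "Critical"
  elif [ "$score" -ge 80 ]; then
    echo "Important"
  else
    echo "filtered"   # below the 80% threshold, so it is not reported
  fi
}

for s in 95 84 62; do
  printf '%s%% -> %s\n' "$s" "$(confidence_group "$s")"
done
```

Under these assumptions, a finding scored 84 is reported as Important, while one scored 62 is not reported at all.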

`/gemini-review`

Get a second opinion from Google Gemini on your current code changes.

```text
/gemini-review
```

Uses Gemini's 1M+ token context window, making it ideal for reviewing changes that touch many files or require understanding a large codebase. Requires an enterprise Gemini seat (gated from 2026-06-18); on other plans use /codex-review or /antigravity-review.

`/codex-review`

Get a second opinion from OpenAI Codex (GPT-5.5) on your current changes.

```text
/codex-review
```

Falls back to GPT-5.4-mini automatically if you hit quota limits.

`/ollama-review`

Get a second opinion from a local Ollama model. No API keys needed — all processing stays on your machine.

```text
/ollama-review
```

Requires Ollama running locally with a model pulled (e.g., qwen3.6:27b).
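
A quick pre-flight check might look like the sketch below. It assumes the standard `ollama` CLI (`ollama list`, `ollama pull`, `ollama serve`); the function name `ollama_ready` is hypothetical, and the model tag is the example given above.

```shell
# Pre-flight check for /ollama-review. Assumes the standard `ollama` CLI;
# the function name is hypothetical. Returns 0 only when the model is available.
ollama_ready() {
  model=$1
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama CLI not found on PATH" >&2
    return 1
  fi
  # `ollama list` queries the local server, so it is expected to fail when
  # the server is not running.
  if ! models=$(ollama list 2>/dev/null); then
    echo "Ollama server not reachable; start it with: ollama serve" >&2
    return 1
  fi
  # Skip the header row and compare the NAME column exactly.
  if ! printf '%s\n' "$models" | awk -v m="$model" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'; then
    echo "Model $model not pulled; run: ollama pull $model" >&2
    return 1
  fi
}

if ollama_ready "qwen3.6:27b"; then
  echo "Ready for /ollama-review"
fi
```

If the check fails, the message on stderr names the missing piece (CLI, server, or model).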

`/antigravity-review`

Get a subscription-backed second opinion from Google Antigravity (agy) — uses your Google AI Pro/Ultra plan, no per-token API billing.

```text
/antigravity-review
```

Experimental and one-shot (no multi-turn). Requires agy to be installed and logged in once, and the Antigravity MCP server to be registered (claude mcp add antigravity -- npx -y ask-antigravity-mcp).

Brainstorm Skills

`/brainstorm`

Send a topic to multiple LLM providers AND have Claude Opus perform its own independent research in the same run, then synthesize all findings. The coordinator agent runs:

- Phase 3B: Claude Opus research. Claude reads the actual files, traces real code paths, fetches any referenced external docs, and forms independent findings tagged Verified or Inferred. Always runs: Claude is a first-class participant, not just an orchestrator (see ADR-049).
- Phase 3A: External dispatch. A single foreground blocking Bash call sends the topic to each requested external provider in parallel and waits for all of them. Up to 10 minutes total (Bash tool max). See ADR-050 for why this isn't a background-job dispatch.
- Phase 4: Synthesis. Combines Claude's findings with the external responses:
  - Consensus points (where multiple participants agree; a Claude-verified finding that an external provider also reports carries the highest confidence)
  - Unique insights (findings from only one participant)
  - Contradictions (verified findings outrank inferred ones)
  - Actionable recommendations (prioritized by impact and confidence)
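
The external dispatch phase (one blocking call that fans out to providers in parallel and waits for all of them) follows a common shell pattern, sketched below. This is not the plugin's dispatch script: `ask_provider` is a hypothetical stand-in for whatever command reaches a provider, and the 10-minute cap is not modeled.

```shell
# Hypothetical stand-in for a real provider call; here it only echoes.
ask_provider() {
  provider=$1
  topic=$2
  echo "[$provider] response to: $topic"
}

# dispatch_all <topic> <provider>...
dispatch_all() {
  topic=$1
  shift
  workdir=$(mktemp -d)
  for provider in "$@"; do
    # Each provider runs concurrently as a child of this one call;
    # the call itself stays in the foreground.
    ask_provider "$provider" "$topic" >"$workdir/$provider.out" 2>"$workdir/$provider.err" &
  done
  wait   # block until every provider has finished
  for provider in "$@"; do
    cat "$workdir/$provider.out"
  done
  rm -rf "$workdir"
}

dispatch_all "monorepo or polyrepo?" gemini codex
```

Writing each response to its own file keeps parallel outputs from interleaving, and the final loop prints them in the order the providers were requested.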

```text
# Default external providers (Gemini + Codex), plus Claude Opus always
/brainstorm Should we use a monorepo or polyrepo for this project?

# Custom external providers
/brainstorm gemini,codex,ollama Review this authentication approach
```

Default external providers: gemini,codex, which avoids Ollama calls unless you ask for them. Claude Opus is always a participant because it runs inside the coordinator, so it does not appear in the provider list.

`/brainstorm-all`

Shortcut for /brainstorm gemini,codex,ollama,antigravity <topic>. Sends to all four external providers (Gemini, Codex, Ollama, Antigravity) plus the always-on Claude Opus research phase — up to five participants total.

```text
/brainstorm-all What's the best caching strategy for our API?
```

Requires Ollama running locally since it includes the local provider.

Multi-Provider Review Skills

`/multi-review`

Run independent code reviews from Antigravity and Codex in parallel, verify each finding against the source, then present combined consensus / unique / rejected findings. (Gemini via /gemini-review or the gemini-reviewer agent.)

```text
/multi-review
```

Pipeline:

1. Gather and prepare the diff: git status first; git add -N for untracked files; pathspec exclusion of docs/binaries (:!docs/ :!*.md :!yarn.lock :!*.png); 3-tier size policy (<50KB send as-is, 50–150KB warn about expected wall time, >150KB ask before sending).
2. Dispatch with fallback: the preferred path is the gemini-reviewer and codex-reviewer agents in parallel; when agents are unavailable, it falls back to direct Bash dispatch via the plugin's dist/run.js and dist/codex-run.js runners using the ADR-050 dispatch pattern.
3. Verify each finding: for every finding above 80/100 confidence, Read the file at the cited line and check whether the claim is actually true. Each finding is classified as VERIFIED (claim holds), REJECTED (false positive), or UNVERIFIABLE (cannot confirm without runtime). This step exists specifically because confidence scores aren't an oracle (see ADR-064).
4. Resilient failure handling: when a provider fails (timeout, exit ≠ 0, 0-byte output), the failure is surfaced inline with stderr instead of being silently dropped. Partial results are explicit.
5. Synthesis: combined output with Verified by both, Verified by Gemini only, Verified by Codex only, Rejected (false positives), Unverifiable, and per-provider stats including verification counts.
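
The diff preparation in the first step can be approximated by hand to preview what would be sent. The sketch below rests on a few assumptions: the function names and tier labels are illustrative, 1 KB is taken as 1024 bytes, and `git diff HEAD` is assumed as the way to capture staged and unstaged changes together. The pathspec exclusions are the ones listed above.

```shell
# Tier labels (send / warn / ask) are illustrative names for the three
# outcomes of the size policy. Assumes 1 KB = 1024 bytes.
size_tier() {
  bytes=$1
  if [ "$bytes" -lt 51200 ]; then
    echo "send"    # under 50KB: send as-is
  elif [ "$bytes" -le 153600 ]; then
    echo "warn"    # 50-150KB: warn about expected wall time
  else
    echo "ask"     # over 150KB: ask before sending
  fi
}

# Prints the diff that would be reviewed. Note: this modifies the index by
# marking untracked files as intent-to-add.
prepare_diff() {
  git ls-files --others --exclude-standard | while IFS= read -r path; do
    git add -N -- "$path"
  done
  git diff HEAD -- . ':!docs/' ':!*.md' ':!yarn.lock' ':!*.png'
}

# Example use inside a repository:
#   bytes=$(prepare_diff | wc -c | tr -d ' ')
#   size_tier "$bytes"
size_tier 40000
size_tier 100000
size_tier 200000
```

Marking untracked files with `git add -N` matters because a plain `git diff` skips files git has never seen, so new files would otherwise be missing from the review.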

The verification step guards against the failure mode where one provider returns a high-confidence claim that the actual source contradicts. Such claims are caught and rejected before they reach the user.
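
The three failure conditions named in the pipeline (timeout, non-zero exit, 0-byte output) can be detected with a small wrapper. This is a sketch under stated assumptions rather than the plugin's runner: `run_provider` is a hypothetical name, and it relies on GNU `timeout` from coreutils, which exits with status 124 when the time limit is hit.

```shell
# Hypothetical wrapper: run one provider command with a time limit and surface
# any failure inline. Assumes GNU `timeout`, which exits 124 on timeout.
# Usage: run_provider <name> <seconds> <command> [args...]
run_provider() {
  name=$1
  limit=$2
  shift 2
  out_file=$(mktemp)
  err_file=$(mktemp)
  if timeout "$limit" "$@" >"$out_file" 2>"$err_file"; then
    status=0
  else
    status=$?
  fi
  result=0
  if [ "$status" -eq 124 ]; then
    echo "[$name] FAILED: timed out after ${limit}s"
    result=1
  elif [ "$status" -ne 0 ]; then
    echo "[$name] FAILED: exit $status"
    cat "$err_file"    # show stderr next to the failure
    result=1
  elif [ ! -s "$out_file" ]; then
    echo "[$name] FAILED: 0-byte output"
    cat "$err_file"
    result=1
  else
    cat "$out_file"
  fi
  rm -f "$out_file" "$err_file"
  return "$result"
}

run_provider demo 5 sh -c 'echo "finding: off-by-one in loop"'
run_provider demo 5 sh -c 'echo "quota exceeded" >&2; exit 3' || true
```

Because a failed provider produces an explicit line in the combined output, a reader can tell a clean review from a partial one.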

`/compare`

Side-by-side raw responses from multiple providers. No synthesis, no consensus extraction, no validation pipeline — just verbatim outputs so you can compare directly.

```text
/compare what's the difference between Server-Sent Events and WebSockets?
/compare gemini and codex review @src/auth.ts
```

Use when:

- You want to see how each provider phrases the same answer (style, depth, confidence framing)
- You want a sanity check before picking one provider's recommendation
- You explicitly want to AVOID Claude synthesizing or weighting the responses

If you want consensus extraction, use /brainstorm instead. If you're reviewing a code diff, use /multi-review instead.

Background Continuous Review

`codex-pair` — PostToolUse hook + `/codex-pair` dashboard

codex-pair has two surfaces:

- The PostToolUse hook: fires automatically after every Edit / Write / MultiEdit whenever a project has opted in via a .codex-pair/context.md marker. HIGH and MED concerns appear to Claude as a system reminder on the next turn; LOW concerns are logged. This is the workhorse, the actual review surface.
- /codex-pair: a user-invocable slash command that shows current status (active / paused / not configured) and runs interactive setup on first use. Use this when you want to enable codex-pair on a new project (it auto-detects context from your README + manifests, drafts the marker, and asks you to confirm) or check whether it's currently running. Pairs with /codex-pair-pause and /codex-pair-resume (the imperative toggles).

This is the "hidden" surface of the plugin: the hook ships in every install but is disabled by default until a project opts in. The full mechanism, env vars, and cost characteristics live in Hooks → PostToolUse Hook: codex-pair.

Quick enable — either run /codex-pair (recommended; it auto-detects your project and asks before writing) or create the marker manually:

```bash
mkdir -p .codex-pair
cat > .codex-pair/context.md <<'EOF'
# .codex-pair/context.md

<one-paragraph project-purpose summary>

## Domain invariants Codex can't infer from a single file

- <invariant 1 — something the model can't see by reading one file>
- <invariant 2 — a written spec or protocol your code implements>
- <invariant 3 — a concurrency or state-coordination rule>
EOF
```

Once the marker exists at the project root, every file edit triggers a Codex review with the marker's content as project context. To disable it, delete the marker directory: rm -rf .codex-pair/

How it differs from /codex-review: the ADR-077 benchmark used four structurally different task types (CRUD, URL parsing, RFC-spec implementation, stateful business logic), picked so the result would generalize across domains. Claude alone caught 2 of 10 probes, Claude + /codex-review caught 7 of 10, and Claude + codex-pair caught 10 of 10. The three probes /codex-review missed exemplified the "looks fine, runs wrong" class that its ≥80-confidence filter structurally suppresses: code that compiles and type-checks but produces wrong results at runtime. The recall improvement is task-agnostic; the two surfaces are complementary, not competing.

Whether to use the hook depends on code characteristics, not project domain:

| Use `/codex-review` (precision-first) | Use `codex-pair` (recall-first) |
| --- | --- |
| Routine PR review | Code with hidden invariants the model can't infer from one file |
| Glue code, simple CRUD, refactors | Code where latent bugs cost more than per-edit review (~$0.04–0.07) |
| You want one comprehensive report | Code evolving fast under written constraints (spec, protocol, ADR) |
| You're cost-sensitive (~$0.04 per review) | State coordination, concurrency, anything order-sensitive |
| Default for everything | The "looks fine, runs wrong" failure mode would be expensive to catch later |
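
To put the per-edit figure in context, the arithmetic for a whole session can be sketched as follows. The function name is hypothetical and the range is the approximate ~$0.04–0.07 per-edit cost quoted in the comparison; actual spend depends on file size and model pricing.

```shell
# Rough codex-pair cost for a number of edits, using the approximate
# $0.04-$0.07 per-edit range quoted above. The function name is illustrative.
estimate_pair_cost() {
  edits=$1
  awk -v n="$edits" 'BEGIN { printf "$%.2f-$%.2f\n", n * 0.04, n * 0.07 }'
}

estimate_pair_cost 50   # prints $2.00-$3.50
```

A 50-edit session therefore lands around $2.00 to $3.50, against roughly $0.04 for a single /codex-review run over the final diff.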
