Skip to content

Ollama

Run local LLMs via Ollama's HTTP API. No API keys needed, fully private, zero cost. Uses native fetch against Ollama's local server.

Best for: private, air-gapped review of code that can't leave your machine — zero cost, no API keys, works offline. Not for: frontier-level reasoning. Local 7B models are weaker than hosted frontier models; use Codex when you need maximum capability.

Installation

Run in your terminal:

bash
# Project scope (current project only)
claude mcp add ollama -- npx -y ask-ollama-mcp

# User scope (all projects)
claude mcp add --scope user ollama -- npx -y ask-ollama-mcp

Or install as a plugin (adds slash commands like /multi-review, /brainstorm, /compare, plus reviewer subagents and the opt-in continuous codex-pair review hook):

bash
/plugin marketplace add Lykhoyda/ask-llm
/plugin install ask-llm@ask-llm-plugins

Or install globally:

bash
npm install -g ask-ollama-mcp

Prerequisites

  1. Node.js v20.0.0 or higher
  2. Ollama installed and running locally
  3. A model pulled:
bash
ollama pull qwen3.6:27b

Tools

ToolPurpose
ask-ollamaSend prompts to local Ollama via HTTP. Optional sessionId for multi-turn — server-side conversation replay since Ollama has no native session support (ADR-058)
get-usage-statsPer-session token totals + breakdowns. In-memory (ADR-054)
pingLists locally available Ollama models via /api/tags

ask-ollama returns both human-readable text and a structured AskResponse (provider, response, model, sessionId, usage) via MCP outputSchema — programmatic clients can extract the sessionId and usage fields directly without parsing the response footer (ADR-065). Pass sessionId: "" (empty string) to start a fresh session and have the executor return a new UUID.

Models

  • Default: qwen3.6:27b — Qwen's flagship-level local coding model (~17 GB; needs a capable GPU / plenty of RAM). Set ASK_OLLAMA_MODEL to pick a lighter model.
  • No fallback. Ollama is local — you pull the model you want, so the server never silently substitutes another. If the model isn't installed, you get a clear ollama pull <model> error.

Configuration

Set OLLAMA_HOST environment variable to customize the Ollama server address (defaults to http://localhost:11434).

Sessions

Ollama has no native session support, so the MCP server stores conversation history server-side at /tmp/ask-llm-sessions/<id>.json (ADR-058, hardened in ADR-063):

  • 24-hour TTL — sessions auto-expire
  • 40-message cap — oldest dropped on overflow to bound replay cost
  • Owner-only permissions0o600 on files, 0o700 on directory
  • Atomic temp+rename writes — readers never see partial JSON
  • Symlink rejection via lstatSync — defense-in-depth against /tmp/ race attacks

Each turn replays the full prior conversation (input tokens grow linearly with depth) — but Ollama runs locally so there's no token bill.

Key Features

  • No API keys required
  • Fully local and private — nothing leaves your machine
  • Zero cost per query
  • Server-side session continuity with hardened storage
  • Model auto-detection via /api/tags endpoint
  • No silent model substitution — a clear "pull it first" error if a model isn't installed
  • Structured AskResponse via outputSchema for programmatic clients

npm

Released under the MIT License.