Ollama

Run local LLMs via Ollama's HTTP API. No API keys needed, fully private, zero cost. Uses native fetch against Ollama's local server.

Installation

Run in your terminal:

```bash
# Project scope (current project only)
claude mcp add ollama -- npx -y ask-ollama-mcp

# User scope (all projects)
claude mcp add --scope user ollama -- npx -y ask-ollama-mcp
```

Or install globally:

```bash
npm install -g ask-ollama-mcp
```

Prerequisites

  1. Node.js v20.0.0 or higher
  2. Ollama installed and running locally
  3. A model pulled:
```bash
ollama pull qwen2.5-coder:7b
```

Tools

| Tool | Purpose |
| --- | --- |
| ask-ollama | Send prompts to local Ollama via HTTP. Defaults to qwen2.5-coder:7b. |
| ping | Lists locally available Ollama models via /api/tags. |
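
Under the hood, ask-ollama amounts to a single POST against Ollama's /api/generate endpoint using native fetch. The sketch below shows the general shape; `buildGenerateRequest` and `askOllama` are illustrative names, not the package's actual internals:

```typescript
// Server address, following Ollama's own OLLAMA_HOST convention.
const OLLAMA_HOST = process.env.OLLAMA_HOST ?? "http://localhost:11434";

interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// Build the JSON body that Ollama's /api/generate endpoint expects.
// stream: false asks Ollama for a single JSON response instead of chunks.
function buildGenerateRequest(
  prompt: string,
  model = "qwen2.5-coder:7b",
): GenerateRequest {
  return { model, prompt, stream: false };
}

// Send the prompt and return the model's reply text.
async function askOllama(prompt: string, model?: string): Promise<string> {
  const res = await fetch(`${OLLAMA_HOST}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(prompt, model)),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

Because everything goes through this one local HTTP call, no credentials or outbound network access are involved.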

Models

  • Default: qwen2.5-coder:7b (good balance of speed and capability)
  • Fallback: qwen2.5-coder:1.5b (automatic on model-not-found)
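
The fallback can be sketched as a check of the requested model against the names returned by /api/tags, the same endpoint the ping tool uses. Function names here are illustrative, not the package's actual internals:

```typescript
const FALLBACK_MODEL = "qwen2.5-coder:1.5b";

// Fetch the locally available model names from Ollama's /api/tags endpoint.
async function listModels(host = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${host}/api/tags`);
  const data = (await res.json()) as { models: { name: string }[] };
  return data.models.map((m) => m.name);
}

// Keep the requested model if it is installed; otherwise fall back to 1.5b.
function resolveModel(requested: string, available: string[]): string {
  return available.includes(requested) ? requested : FALLBACK_MODEL;
}
```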

Configuration

Set the OLLAMA_HOST environment variable to point at a different Ollama server address (defaults to http://localhost:11434).
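
Resolving the server address might look like the following sketch; `resolveHost` is a hypothetical helper, not part of the package's API:

```typescript
// Resolve the Ollama server address from the environment, falling back to
// Ollama's default local address, and strip any trailing slash so paths
// like `${host}/api/generate` stay well-formed.
function resolveHost(
  env: Record<string, string | undefined> = process.env,
): string {
  return (env.OLLAMA_HOST ?? "http://localhost:11434").replace(/\/+$/, "");
}
```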

Key Features

  • No API keys required
  • Fully local and private
  • Zero cost per query
  • Model auto-detection via /api/tags endpoint
  • Automatic model fallback from 7b to 1.5b

Released under the MIT License.