# Ollama

Run local LLMs via Ollama's HTTP API. No API keys needed, fully private, zero cost. Uses native `fetch` against Ollama's local server.
## Installation
Run in your terminal:
```bash
# Project scope (current project only)
claude mcp add ollama -- npx -y ask-ollama-mcp

# User scope (all projects)
claude mcp add --scope user ollama -- npx -y ask-ollama-mcp
```

Or install as a plugin (adds slash commands like `/multi-review`, `/brainstorm`, and `/compare`, plus reviewer subagents and a pre-commit hook):

```
/plugin marketplace add Lykhoyda/ask-llm
/plugin install ask-llm@ask-llm-plugins
```

Or install globally:

```bash
npm install -g ask-ollama-mcp
```

## Prerequisites
- Node.js v20.0.0 or higher
- Ollama installed and running locally
- A model pulled:
```bash
ollama pull qwen2.5-coder:7b
```

## Tools
| Tool | Purpose |
|---|---|
| ask-ollama | Send prompts to local Ollama via HTTP. Optional sessionId for multi-turn — server-side conversation replay since Ollama has no native session support (ADR-058) |
| get-usage-stats | Per-session token totals + breakdowns. In-memory (ADR-054) |
| ping | Lists locally available Ollama models via /api/tags |
`ask-ollama` returns both human-readable text and a structured `AskResponse` (`provider`, `response`, `model`, `sessionId`, `usage`) via MCP `outputSchema` — programmatic clients can extract the `sessionId` and `usage` fields directly without parsing the response footer (ADR-065). Pass `sessionId: ""` (empty string) to start a fresh session and have the executor return a new UUID.
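For example, a programmatic client can read those fields straight off the structured result. A minimal sketch, assuming the official TypeScript MCP SDK; the `prompt` argument name is an assumption, not taken from this README:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over stdio, the same way `claude mcp add` runs it.
const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "npx", args: ["-y", "ask-ollama-mcp"] })
);

// sessionId: "" asks the executor to mint a fresh UUID (see above).
const result = await client.callTool({
  name: "ask-ollama",
  arguments: { prompt: "Summarize this repo in one line.", sessionId: "" },
});

// The structured AskResponse rides alongside the human-readable text.
const { sessionId, usage } = result.structuredContent as {
  provider: string;
  response: string;
  model: string;
  sessionId: string;
  usage: unknown;
};
console.log(sessionId, usage); // reuse sessionId on the next call for multi-turn
```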
## Models
- Default: `qwen2.5-coder:7b` (good balance of speed and capability)
- Fallback: `qwen2.5-coder:1.5b` (automatic on model-not-found; see the sketch below)
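One plausible shape for that detection-and-fallback, using the same `/api/tags` endpoint the Key Features section mentions (`resolveModel` is a hypothetical helper, not the package's actual code):

```typescript
const DEFAULT_MODEL = "qwen2.5-coder:7b";
const FALLBACK_MODEL = "qwen2.5-coder:1.5b";

// /api/tags returns { models: [{ name, ... }] } for locally pulled models.
async function resolveModel(host: string): Promise<string> {
  const res = await fetch(`${host}/api/tags`);
  const { models } = (await res.json()) as { models: { name: string }[] };
  return models.some((m) => m.name === DEFAULT_MODEL)
    ? DEFAULT_MODEL
    : FALLBACK_MODEL;
}
```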
## Configuration
Set the `OLLAMA_HOST` environment variable to customize the Ollama server address (defaults to `http://localhost:11434`).
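Inside the server this presumably reduces to a one-line default; only the variable name and default address below come from this section:

```typescript
// OLLAMA_HOST overrides the default local server address.
const host = process.env.OLLAMA_HOST ?? "http://localhost:11434";

// Quick connectivity check against the same endpoint the ping tool uses.
const reachable = (await fetch(`${host}/api/tags`)).ok;
```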
## Sessions
Ollama has no native session support, so the MCP server stores conversation history server-side at `/tmp/ask-llm-sessions/<id>.json` (ADR-058, hardened in ADR-063):

- 24-hour TTL — sessions auto-expire
- 40-message cap — oldest messages dropped on overflow to bound replay cost
- Owner-only permissions — `0o600` on files, `0o700` on the directory
- Atomic temp+rename writes — readers never see partial JSON (see the sketch after this list)
- Symlink rejection via `lstatSync` — defense-in-depth against `/tmp` race attacks
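A sketch of that write path under the guarantees listed above (illustrative, not the package's actual source; `saveSession` is a hypothetical helper):

```typescript
import { lstatSync, mkdirSync, renameSync, writeFileSync } from "node:fs";
import { randomUUID } from "node:crypto";
import { join } from "node:path";

const DIR = "/tmp/ask-llm-sessions";
mkdirSync(DIR, { recursive: true, mode: 0o700 }); // owner-only directory

// Hypothetical helper: persist one session's message history.
function saveSession(id: string, messages: object[]): void {
  const target = join(DIR, `${id}.json`);

  // lstatSync does not follow links, so a symlink planted in /tmp is
  // detected instead of silently followed (the ADR-063 race defense).
  try {
    if (lstatSync(target).isSymbolicLink()) {
      throw new Error(`refusing to write through symlink: ${target}`);
    }
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
  }

  // Write to a temp file, then rename over the target: readers see either
  // the old complete JSON or the new complete JSON, never a partial file.
  const tmp = join(DIR, `.${id}.${randomUUID()}.tmp`);
  writeFileSync(tmp, JSON.stringify(messages.slice(-40)), { mode: 0o600 });
  renameSync(tmp, target);
}
```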
Each turn replays the full prior conversation (input tokens grow linearly with depth) — but Ollama runs locally, so there's no token bill.
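Conceptually, each turn with a `sessionId` amounts to a call like this against Ollama's documented `/api/chat` endpoint. The request body fields match Ollama's chat API; the surrounding function is illustrative:

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Replay the whole stored history plus the new prompt on every turn.
async function askWithReplay(history: Msg[], prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    body: JSON.stringify({
      model: "qwen2.5-coder:7b",
      messages: [...history, { role: "user", content: prompt }],
      stream: false, // one complete JSON response instead of a stream
    }),
  });
  const data = (await res.json()) as { message: Msg };
  return data.message.content;
}
```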
## Key Features
- No API keys required
- Fully local and private — nothing leaves your machine
- Zero cost per query
- Server-side session continuity with hardened storage
- Model auto-detection via the `/api/tags` endpoint
- Automatic model fallback from 7b to 1.5b
- Structured AskResponse via outputSchema for programmatic clients
## npm

- Package: `ask-ollama-mcp`
- Binary: `ask-ollama-mcp`