Ollama

Self-host

A local LLM runner that pulls open models and serves them on your machine through a CLI and an OpenAI-compatible HTTP API.

Visit ollama.com View repository

Category: AI chat · Self-host
Cost: Self-host
Country: community
Licensing: FOSS
Platforms: macOS · Linux · Windows · CLI

Pros and cons

+ what works

+Inference is fully offline after a model is pulled, so conversations never leave the host
+MIT licensed and developed in the open on GitHub
+One-line install and a simple CLI (ollama pull, ollama run) get you talking to a model in minutes
+Large model library including Llama, Qwen, Gemma, Mistral, Phi, and DeepSeek variants
+Runs on consumer hardware (Apple Silicon out of the box, NVIDIA and ROCm on Linux, CPU-only as a fallback)
+Built-in OpenAI-compatible HTTP API at localhost:11434/v1, so most existing tooling drops in

− watch out for

−CLI-first; pair it with Open WebUI or a similar front end for a chat interface
−Hardware matters (16 GB RAM is a realistic floor for small quantized models, larger models want a GPU and 24 to 64 GB)
−Model files are multi-GB downloads, and quality still trails frontier closed models on hard reasoning
−Model pulls and update checks contact ollama.com; the registry sees which models you fetch
−No built-in conversation history or sync; persistence is whatever your client stores

Privacy notes

Ollama runs inference entirely on your hardware. Prompts, responses, and any documents you feed a model never leave the machine. The privacy policy confirms the local binary does not transmit conversation content. Pulling a model reaches out to ollama.com (and to Hugging Face for some models) and the registry logs which model was fetched, along with limited device and usage metadata such as app version, request counts, and IP. Once a model is downloaded, you can run Ollama fully offline.

Does this work for you?

Notes from people who tried it

Comments (0)

No comments yet. Be the first.