
Releasing "Ollama Colab Free Server" — Run Ollama on Google Colab as a Free LLM Server

hiroaki (Individual Developer) · 3 min read
If this site helped you, please support us with a star! 🌟
Star on GitHub

Today I'm releasing Ollama Colab Free Server, an open-source notebook that runs Ollama on Google Colab's free GPU and makes it instantly available as a backend for Claude Code and Continue. Execute the cells top to bottom, and a publicly accessible LLM server is up within minutes.

Background

Coding assistants like Claude Code and Continue are powerful, but API costs can add up quickly, and there is always the concern of sending your code to an external service. Meanwhile, running Ollama locally on a machine with a weak GPU rarely reaches practical inference speeds.

This notebook bridges that gap. By running Ollama on Google Colab's free T4 GPU and exposing it via ngrok, you get a fully free LLM server — no local setup, no data sent to external APIs — accessible entirely from your browser.

Features

The notebook is designed to work without writing any code. In the first cell (Model Registry), enter model names as a comma-separated list; you can edit it freely. Running the cell displays a radio-button UI for picking the model you want.

In the second cell (Server), paste your ngrok token and run. The following steps execute automatically: install Ollama and dependencies, start the Ollama server, establish the ngrok tunnel, and pull the selected model (5–15 minutes on first run). Once complete, the endpoint URL along with ready-to-use config snippets for Continue and Claude Code are printed directly to the terminal.
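The automation in that cell boils down to a few shell commands plus a tunnel. A minimal sketch, assuming the standard Ollama install script, the default port 11434, and the pyngrok client — the function and variable names here are illustrative, not the notebook's actual code:

```python
# Hypothetical sketch of the Server cell's steps; not the notebook's actual code.
OLLAMA_PORT = 11434  # Ollama's default listen port


def server_steps(model: str) -> list[list[str]]:
    """Ordered shell commands the cell would run (tunnel handled separately)."""
    return [
        # 1. Install Ollama via its official install script
        ["sh", "-c", "curl -fsSL https://ollama.com/install.sh | sh"],
        # 2. Start the Ollama server in the background
        ["sh", "-c", "nohup ollama serve &"],
        # 3. Pull the model selected in the Model Registry cell (5-15 min first run)
        ["ollama", "pull", model],
    ]


# The ngrok tunnel would then expose the local port, e.g. with pyngrok:
#   from pyngrok import ngrok
#   ngrok.set_auth_token(token)
#   public_url = ngrok.connect(OLLAMA_PORT, "http").public_url
```

Once the tunnel is up, that public URL is the endpoint the notebook prints alongside the config snippets.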

OpenAI-compatible clients such as Codex CLI are also supported — just append /v1 to the base URL.

Supported Tools

  • Continue (VS Code / JetBrains extension): set the endpoint URL as apiBase
  • Claude Code: set ANTHROPIC_BASE_URL to the endpoint (Ollama v0.14.0+ natively supports the Anthropic Messages API)
  • OpenAI-compatible clients: use the endpoint URL + /v1
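To make the mapping above concrete, here is a small helper (hypothetical, not part of the notebook) that derives each tool's setting from the printed ngrok endpoint URL:

```python
def tool_endpoints(endpoint: str) -> dict[str, str]:
    """Map a public endpoint URL to per-tool settings (illustrative helper)."""
    base = endpoint.rstrip("/")
    return {
        "continue_apiBase": base,         # Continue: apiBase in its config
        "ANTHROPIC_BASE_URL": base,       # Claude Code: environment variable
        "openai_base_url": base + "/v1",  # OpenAI-compatible clients (e.g. Codex CLI)
    }


# Example (hypothetical URL):
# cfg = tool_endpoints("https://example.ngrok-free.app")
# cfg["openai_base_url"]  -> "https://example.ngrok-free.app/v1"
```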

Model Size Guide for T4

The practical range on Google Colab's T4 GPU is 8B to 14B models. Models of 20B or more see significant slowdowns, so choosing a size that matches your use case matters. If you want to benchmark candidates first, Ollama Multi-Model Benchmarker lets you compare multiple models at once.
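A rough way to see why 8B–14B is the sweet spot: the T4 has 16 GB of VRAM, and Ollama's default 4-bit quantization needs about half a byte per parameter, plus headroom for the KV cache and activations. A back-of-envelope estimate (the 0.5 bytes/param and 25% overhead figures are rough assumptions, not measured values):

```python
T4_VRAM_GB = 16.0  # NVIDIA T4 memory


def q4_footprint_gb(params_billions: float, overhead: float = 0.25) -> float:
    """Rough VRAM estimate for a 4-bit-quantized model (assumed figures)."""
    weights_gb = params_billions * 0.5  # ~0.5 bytes per parameter at 4-bit
    return weights_gb * (1 + overhead)  # headroom for KV cache / activations


for size in (8, 14, 20):
    print(f"{size}B: ~{q4_footprint_gb(size):.1f} GB of {T4_VRAM_GB:.0f} GB")
```

By this estimate, 20B+ models already crowd the T4's 16 GB once a longer context grows the KV cache, which is consistent with the slowdowns noted above.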

Get Started

No local environment needed. Just have a free ngrok account and your auth token ready.

Feedback and Pull Requests are welcome.

Technical Details

For architecture, implementation notes, and internals (health checks, shell-injection mitigation, ngrok tunnel management, etc.), see the full documentation.
