Video tutorial coming soon.
Deploy Ollama on Ubuntu with Docker and run Llama 3, Mistral, Gemma, Phi, and hundreds of other open-source language models locally. Both GPU and CPU inference are supported, and the OpenAI-compatible API needs no cloud connection after the initial model download.
Grab the automated bash script from GitHub to follow along with the video.
wget https://raw.githubusercontent.com/mhmdali94/Docker/main/ai/ollama/ollama-ubuntu.sh
chmod +x ollama-ubuntu.sh
sudo bash ollama-ubuntu.sh
The script installs Docker, deploys Ollama with Open WebUI, and auto-detects your NVIDIA GPU if available.
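If you'd rather skip the script, its core steps correspond to two standard docker run commands. This is a minimal sketch: the container names, volume names, and port mappings below are the usual defaults for these images, assumed here rather than copied from the script.
# Ollama server (add --gpus=all on a host with the NVIDIA Container Toolkit installed)
docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama --restart unless-stopped ollama/ollama
# Open WebUI, pointed at the Ollama API
docker run -d --name open-webui -p 3000:8080 -e OLLAMA_BASE_URL=http://<your-server-ip>:11434 -v open-webui:/app/backend/data --restart unless-stopped ghcr.io/open-webui/open-webui:main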
Use the Ollama CLI to download a language model. Llama 3.2 (3B) is a good starting point for CPU-only servers:
docker exec -it ollama ollama pull llama3.2
# For a larger model with GPU:
docker exec -it ollama ollama pull llama3.1:8b
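To confirm a download completed, list the models Ollama has stored locally:
docker exec -it ollama ollama list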
Open your browser and navigate to the Open WebUI interface to chat with your local models (the first account you register becomes the admin):
http://<your-server-ip>:3000
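If the page doesn't respond yet, check that both containers are running. This assumes the container names ollama and open-webui, which match the defaults above:
docker ps --filter name=ollama --filter name=open-webui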
Ollama serves its REST API on port 11434, including an OpenAI-compatible endpoint under /v1, so you can connect any compatible app (AnythingLLM, Dify, or your own scripts). The native generate endpoint looks like this:
curl http://<your-server-ip>:11434/api/generate \
-d '{"model":"llama3.2","prompt":"Hello!"}'
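For OpenAI-style clients, the same server answers chat completions under /v1; point the client's base URL at http://<your-server-ip>:11434/v1 (any placeholder API key will do, since Ollama doesn't check it):
curl http://<your-server-ip>:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"llama3.2","messages":[{"role":"user","content":"Hello!"}]}'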
| Port | Purpose |
|---|---|
| 11434 | Ollama REST API |
| 3000 | Open WebUI (chat interface) |
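The script isn't stated to configure a firewall, so if your server runs UFW you may need to open these ports yourself. A sketch assuming UFW; open 11434 only if remote clients need the API directly:
sudo ufw allow 3000/tcp
sudo ufw allow 11434/tcp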