# Deploying Ollama for HG Content
This guide covers deploying a persistent Ollama service suitable for the free local/open‑source provider option.
## Local Development
- Install Ollama: https://ollama.com/
- Run locally: `ollama run mistral`
- Configure the environment in `.env`:
  - `OLLAMA_ENABLED=true`
  - `OLLAMA_URL=http://localhost:11434`
  - `OLLAMA_MODEL=mistral:latest`
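With the model pulled and the server running, you can sanity-check the local HTTP API with Ollama's standard endpoints (the prompt below is just an example):

```bash
# List locally available models; confirms the server is up on the default port.
curl http://localhost:11434/api/tags

# One-off, non-streaming generation to confirm the model actually responds.
curl http://localhost:11434/api/generate -d '{
  "model": "mistral:latest",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```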
## Railway (CPU): Sample Setup
1) Files
   - `deployment/ollama/Dockerfile`
   - `deployment/ollama/entrypoint.sh` (pulls the model if it is missing; see the sketch after these steps)
   - `deployment/ollama/railway.sample.toml` (volume for models)
2) Environment
   - `OLLAMA_MODEL=mistral:latest` (or a quantized variant such as `mistral:7b-instruct-q4_K_M`)
   - Ensure a volume is attached at `/root/.ollama` so pulled models persist across restarts
3) Deploy
   - Create a new Railway service from the Dockerfile
   - Attach a volume (roughly 20–50 GiB, depending on the models you keep)
   - Start the service; the first boot pulls the model if it is not already cached
4) Configure the app
   - Set `OLLAMA_URL` to the Railway service URL (e.g. `https://ollama-service.up.railway.app`); see the example request after these steps
   - Optionally set `OLLAMA_ENABLED=true` for direct frontend use
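The actual `entrypoint.sh` lives in the repo; a minimal sketch of the "pull the model if missing" behaviour described in step 1 might look like the following (the readiness loop and variable handling are illustrative assumptions, not the repo's exact script):

```bash
#!/usr/bin/env sh
# Illustrative entrypoint: start the server, then pull the configured model
# only if it is not already in the volume-backed cache at /root/.ollama.
set -e

MODEL="${OLLAMA_MODEL:-mistral:latest}"

# Start the Ollama server in the background.
ollama serve &
SERVER_PID=$!

# Wait until the API answers before trying to pull.
until ollama list > /dev/null 2>&1; do
  sleep 1
done

# Pull the model only when it is not already cached on the volume.
if ! ollama list | grep -q "$MODEL"; then
  ollama pull "$MODEL"
fi

# Keep the server process in the foreground so the container stays alive.
wait "$SERVER_PID"
```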
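Once `OLLAMA_URL` points at the Railway service (step 4), the app talks to the same HTTP API as in local development; for example (the URL is the placeholder from above):

```bash
# Same API as the local setup, just behind the Railway service URL.
curl https://ollama-service.up.railway.app/api/generate -d '{
  "model": "mistral:latest",
  "prompt": "Reply with a short readiness check.",
  "stream": false
}'
```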
## Notes
- CPU inference is slower; consider quantized models (e.g. `q4_K_M`) and a lower max-token limit (see the example after these notes).
- For GPU, select an appropriate machine and a non‑quantized model; adjust concurrency as needed.
- Never expose the Ollama service publicly without access controls; prefer private networking.
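One way to cap output length on CPU is the `options.num_predict` field of Ollama's generate API; a sketch using the quantized tag mentioned above (the limit of 256 tokens is an arbitrary example):

```bash
# Request a quantized model and cap the number of generated tokens,
# which keeps CPU-only responses reasonably fast.
curl http://localhost:11434/api/generate -d '{
  "model": "mistral:7b-instruct-q4_K_M",
  "prompt": "Give a one-paragraph status update.",
  "stream": false,
  "options": { "num_predict": 256 }
}'
```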