
Deploying Ollama for HG Content

This guide covers deploying a persistent Ollama service suitable for the free local/open‑source provider option.

Local Development

  • Install Ollama: https://ollama.com/
  • Run locally: ollama run mistral
  • Configure env in .env:
      • OLLAMA_ENABLED=true
      • OLLAMA_URL=http://localhost:11434
      • OLLAMA_MODEL=mistral:latest
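
Once the local server is running, a quick way to confirm the app can reach it is to hit Ollama's HTTP API directly. The sketch below uses the /api/tags and /api/generate endpoints against the default local URL; the prompt is only illustrative.

    # List locally available models (confirms the server is up)
    curl http://localhost:11434/api/tags

    # One-off, non-streaming generation request as a smoke test
    curl http://localhost:11434/api/generate -d '{
      "model": "mistral:latest",
      "prompt": "Say hello in one sentence.",
      "stream": false
    }'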

Railway (CPU) — Sample Setup

1) Files
   • deployment/ollama/Dockerfile
   • deployment/ollama/entrypoint.sh (pulls the model if missing)
   • deployment/ollama/railway.sample.toml (volume for models)
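
The entrypoint script is described as pulling the model when it is missing. Below is a minimal sketch of that behaviour, not the actual contents of deployment/ollama/entrypoint.sh (which may differ); it assumes curl is available in the image and that OLLAMA_MODEL is set in the environment.

    #!/bin/sh
    set -e

    # Start the Ollama server in the background so its API is reachable
    ollama serve &
    SERVER_PID=$!

    # Wait until the API answers before pulling anything
    until curl -sf http://localhost:11434/api/tags > /dev/null; do
      sleep 1
    done

    # Pull the configured model only if it is not already on the volume
    MODEL="${OLLAMA_MODEL:-mistral:latest}"
    if ! ollama list | grep -q "$MODEL"; then
      ollama pull "$MODEL"
    fi

    # Keep the server in the foreground so the container stays alive
    wait "$SERVER_PID"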

2) Environment
   • OLLAMA_MODEL=mistral:latest (or a quantized variant such as mistral:7b-instruct-q4_K_M)
   • Ensure a volume is attached at /root/.ollama to persist models
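
Ollama stores pulled models under /root/.ollama, which is why the volume must be mounted there. A quick check from a shell inside the running container (illustrative only):

    # Models should be listed and should survive a redeploy
    ollama list
    du -sh /root/.ollama/models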

3) Deploy
   • Create a new Railway service from the Dockerfile
   • Attach a volume (20–50 GiB or more, depending on the models you keep)
   • Start the service; the first boot pulls the model if needed
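
After the service starts, you can confirm the initial pull finished by querying the deployed endpoint. The hostname below is a placeholder for your Railway service URL.

    # Should list the pulled model once the first boot completes
    curl https://ollama-service.up.railway.app/api/tags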

4) Configure app
   • Set OLLAMA_URL to the Railway service URL (e.g. https://ollama-service.up.railway.app)
   • Optionally set OLLAMA_ENABLED=true for direct frontend use
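
With OLLAMA_URL pointing at the deployed service, the app talks to the same HTTP API as in local development. A hedged example request (adjust the URL, model, and message to your deployment):

    # Non-streaming chat request against the deployed service
    curl "$OLLAMA_URL/api/chat" -d '{
      "model": "mistral:latest",
      "messages": [{"role": "user", "content": "Summarize this deployment guide."}],
      "stream": false
    }'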

Notes

  • CPU inference is slower; prefer quantized models (q4_K_M) and a lower maximum token count per response (see the example after these notes).
  • For GPU, select an appropriate machine and a non‑quantized model; adjust concurrency as needed.
  • Never expose the Ollama service publicly without access controls; prefer private networking.
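
On CPU, limiting generation length has a large effect on latency. Ollama exposes this per request via the num_predict option; the model tag and value below are illustrative, not recommendations.

    # Cap the response length to keep CPU latency manageable
    curl "$OLLAMA_URL/api/generate" -d '{
      "model": "mistral:7b-instruct-q4_K_M",
      "prompt": "Give a one-paragraph product description.",
      "stream": false,
      "options": { "num_predict": 256 }
    }'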